Ollama Python API: Basic Integration Tutorial
You run local AI models with Ollama. Maybe you use the command line, maybe a web interface. But you can do much more.
You can control these models directly from code. That lets you build custom applications and automate tasks.
The key is Ollama’s API. This article shows you how to use the Ollama Python API: you will learn to send requests and read the responses.
We cover two approaches: the general-purpose requests library and the simpler official ollama library.
Prerequisites for Using the Ollama Python API
Before you start coding, you need a few things ready. First, you must have Ollama installed. Make sure it is running in the background.
You also need Python installed on your computer. We assume you use Python 3.x. Basic Python knowledge helps a lot.
You need two Python libraries. The requests library handles web requests. The ollama library is the official client. Install them using pip:
pip install requests ollama
This command downloads and installs both libraries. Now you are ready to interact with the Ollama Python API.
Understanding Ollama’s API
Ollama runs a web server on your computer. This server exposes a REST API. Your Python code talks to this server.
The default address is http://localhost:11434. Your Python script sends HTTP requests to this address.
You send data in JSON format. You receive responses back in JSON format. We will use two main API endpoints.
The /api/generate endpoint creates text. The /api/tags endpoint lists your installed models. Let’s see how to use them.
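Before touching either endpoint, you may want to confirm the server is reachable at all. The snippet below is a minimal sketch: it assumes Ollama is listening on the default port and simply prints whatever the server answers (a running server typically replies with a short "Ollama is running" message).

import requests

# Quick connectivity check against the local Ollama server
try:
    response = requests.get("http://localhost:11434", timeout=5)
    print(response.status_code, response.text)
except requests.exceptions.RequestException as e:
    print(f"Could not reach Ollama: {e}")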
Method 1: Using the requests Library (Understanding the HTTP Basics)
The requests library is a standard way to make HTTP requests in Python. It shows you the basic API interaction. You send data and handle the response yourself.
Generating Text with /api/generate
To make Ollama generate text, you send a POST request. The endpoint is /api/generate. You send a JSON body with details.
You must include the model name. You also need the prompt text. Ollama sends back the generated response.
Here is a simple Python script. It sends a prompt to the Ollama API using the requests library.
import requests
import json

# The API endpoint URL
url = "http://localhost:11434/api/generate"

# The data payload for the request
payload = {
    "model": "llama2",  # Use a model you have downloaded, like 'llama2'
    "prompt": "Explain prompt engineering in simple terms.",
    "stream": False,  # Set to False for a single response
}

# Send the POST request
try:
    response = requests.post(url, json=payload)
    response.raise_for_status()  # Check for HTTP errors (like 404, 500)

    # Parse the JSON response
    result = response.json()

    # Print the relevant parts
    print("Model:", result.get("model"))
    print("Response:", result.get("response"))
except requests.exceptions.RequestException as e:
    print(f"Error connecting to Ollama: {e}")
    print("Please ensure Ollama is installed and running.")
First, we import the needed libraries. requests sends the HTTP call. json helps handle data, though requests does most of it here.
We set the url to the generate endpoint. The payload dictionary holds our request data. We tell it which model to use and give it the prompt.
stream: False means we wait for the full response. requests.post() sends the data. response.raise_for_status() checks if the server returned an error code.
response.json() converts the JSON response into a Python dictionary. Finally, we print the model name and the generated response text from the dictionary.
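If you set stream to True instead, Ollama sends the answer back piece by piece, one JSON object per line. Here is a rough sketch of how you might read that stream with requests; it assumes each streamed line carries a response fragment and a done flag, which is how the /api/generate endpoint documents its streaming output.

import requests
import json

url = "http://localhost:11434/api/generate"
payload = {
    "model": "llama2",
    "prompt": "Explain prompt engineering in simple terms.",
    "stream": True,  # Ask for a streamed, chunked response
}

with requests.post(url, json=payload, stream=True) as response:
    response.raise_for_status()
    for line in response.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)  # Each streamed line is one JSON object
        print(chunk.get("response", ""), end="", flush=True)
        if chunk.get("done"):  # The final chunk sets "done": true
            print()
            break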
Listing Available Models with /api/tags
You can also ask Ollama which models you have. This uses the /api/tags endpoint. You send a GET request this time.
A GET request does not need a request body. Ollama responds with a JSON list of your models. It’s simple and fast.
Here is the Python code using requests to list models from the Ollama Python API.
import requests

# The API endpoint URL for listing models
url = "http://localhost:11434/api/tags"

# Send the GET request
try:
    response = requests.get(url)
    response.raise_for_status()  # Check for errors

    # Parse the JSON response
    data = response.json()

    print("Available Models:")
    # Check if the response has the expected 'models' key
    if data and "models" in data:
        for model in data["models"]:
            # Access model details from the dictionary
            print(f"- {model['name']} (Size: {model['size']} bytes)")
    else:
        print("No models found or unexpected response format.")
except requests.exceptions.RequestException as e:
    print(f"Error connecting to Ollama: {e}")
    print("Please ensure Ollama is installed and running.")
We import requests again. The url now points to /api/tags. We use requests.get() to send the request.
response.raise_for_status() checks for errors. response.json() turns the response into a Python dictionary.
The response dictionary has a key called models. This key holds a list of dictionaries, one for each model. We loop through this list.
For each model dictionary, we print its name and size. This shows you what models you have ready.
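Because /api/tags gives you the full list, you can build small helpers on top of it. The function below is a simple sketch (not part of the official client) that checks whether a given model name appears in that list before you try to use it.

import requests

def model_is_installed(name, base_url="http://localhost:11434"):
    """Return True if an installed model's name starts with `name`."""
    response = requests.get(f"{base_url}/api/tags")
    response.raise_for_status()
    models = response.json().get("models", [])
    return any(m["name"].startswith(name) for m in models)

print(model_is_installed("llama2"))  # e.g. matches 'llama2:latest'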
Method 2: Using the Official ollama Python Library (The Easier Way)
The Ollama team provides an official Python library. This library makes using the Ollama Python API much simpler. It handles many details for you.
It is the recommended way for most Python projects. It simplifies sending requests and processing responses.
Generating Text with the ollama Library
Using the ollama library to generate text is very clean. You create a client object. Then you call methods on that object.
The library handles the HTTP requests and JSON data. You just provide the model and prompt.
See how much simpler this is compared to using requests directly for the Ollama Python API.
import ollama

# Create a client instance (defaults to http://localhost:11434)
client = ollama.Client()

# Send the generate request
try:
    response = client.generate(model='llama2', prompt='Tell me a short story.')  # Remember to use a model you have!

    # The library gives you the result directly
    print("Response:", response.get("response"))  # Still access the 'response' key
except ollama.ResponseError as e:
    print(f"Ollama API Error: {e}")
    print("Please check the model name and ensure Ollama is running.")
except Exception as e:
    print(f"An unexpected error occurred: {e}")
We import the ollama library. We create a client object. By default, it connects to http://localhost:11434.
We call client.generate(). We pass the model name and the prompt as arguments. The library sends the POST request for us.
The response variable holds the result as a dictionary. We access the generated text using response.get("response").
Error handling uses ollama.ResponseError for API-specific problems. This code is shorter and easier to read.
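The library can also stream. Passing stream=True to client.generate() turns the call into an iterator of partial results instead of a single response. The sketch below assumes a library version whose chunks can be read like dictionaries, matching the response objects used above.

import ollama

client = ollama.Client()

# stream=True makes generate() yield chunks as the model produces them
for chunk in client.generate(model='llama2', prompt='Tell me a short story.', stream=True):
    print(chunk['response'], end='', flush=True)
print()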
Listing Models with the ollama Library
Listing models is also very straightforward with the ollama library. You use the same client object.
There is a specific method for listing models. It returns the list directly. This method makes using the Ollama Python API for model listing very easy.
import ollama

# Create a client instance
client = ollama.Client()

try:
    # Send the list models request
    models = client.list()

    print("Available Models:")
    # The structure is the same as the raw API response
    if models and "models" in models:
        for model in models["models"]:
            print(f"- {model['name']} (Size: {model['size']} bytes)")
    else:
        print("No models found or unexpected response format.")
except ollama.ResponseError as e:
    print(f"Ollama API Error: {e}")
    print("Please ensure Ollama is running.")
except Exception as e:
    print(f"An unexpected error occurred: {e}")
Again, we import ollama and create a client. We call client.list().
The models variable gets the dictionary containing the model list. The structure is the same as the raw API response.
We loop through the models["models"] list. We print the name and size for each model found. This method is much cleaner than handling raw HTTP GET requests.
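You can combine this with the library's pull method to make sure a model is available before you call it. This is only a sketch; it assumes the entries returned by client.list() use the same 'name' key as above.

import ollama

client = ollama.Client()
wanted = "llama2"

installed = [m["name"] for m in client.list().get("models", [])]
if not any(name.startswith(wanted) for name in installed):
    print(f"Pulling {wanted}...")
    client.pull(wanted)  # Downloads the model, like running 'ollama pull llama2'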
Choosing the Right Method
You now know two ways to use the Ollama Python API. Both methods work. But they have different uses.
Use the requests library if you want to learn API basics. It shows you the underlying HTTP communication. It’s also useful if you cannot install the official library.
For most tasks, use the official ollama library. It simplifies code greatly. It handles complex things like streaming responses automatically. It is the recommended approach for integrating Ollama into your Python projects.
Troubleshooting Ollama Python API Issues
Sometimes things go wrong when using APIs. Here are some common problems and their solutions.
If you get a “Connection refused” error, Ollama is likely not running. Or it might be on a different port. Check the Ollama application status.
A “404 Not Found” error means the URL or endpoint is wrong. Double-check http://localhost:11434/api/generate or /api/tags.
If you see “Model not found,” the model name in your code is incorrect. Use ollama list in your terminal. Make sure the model is downloaded and spelled right.
Ensure your Python script can reach localhost:11434. Firewall settings can sometimes block local connections, although this is rare.
What’s Next? Exploring More API Features
This tutorial covered basic text generation and model listing. The Ollama Python API can do much more. You can explore other endpoints.
The API lets you manage models (create, delete). You can also get text embeddings. Embeddings turn text into numbers for analysis.
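For example, the official library exposes an embeddings call. A minimal sketch, assuming the model you name supports embeddings:

import ollama

client = ollama.Client()
result = client.embeddings(model='llama2', prompt='Local models are fun.')
print(len(result['embedding']))  # A list of floats representing the text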
The /api/generate endpoint has many advanced options. You can control creativity (temperature) or how the model chooses words (top_p). Look at the official Ollama API documentation for details.
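As a taste of those options, both the raw endpoint and the library accept an options object. The sketch below passes temperature and top_p through the official library; the values are arbitrary examples, and the full list of tuning knobs is in the Ollama API documentation.

import ollama

client = ollama.Client()

response = client.generate(
    model='llama2',
    prompt='Write a haiku about local AI.',
    options={
        'temperature': 0.8,  # Higher values make output more creative
        'top_p': 0.9,        # Restricts sampling to the most likely tokens
    },
)
print(response.get('response'))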
With this knowledge, you can build cool applications. Create a simple chatbot, process text data, or automate tasks with local AI power using the Ollama Python API.
FAQs about Ollama Python API
Here are answers to common questions about using the Ollama Python API with Python.
Do I need an internet connection? No. Ollama and your models run locally. Your Python script talks only to your local Ollama server.
Can I run the script on a different computer? Yes, but you need to change the URL. Replace localhost with the IP address of the machine running Ollama. Ensure network firewalls allow access to port 11434.
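With the official library, pointing at another machine is a single argument; the IP address below is just a placeholder for wherever your Ollama server actually runs.

import ollama

# Replace the host with the address of the machine running Ollama
client = ollama.Client(host='http://192.168.1.50:11434')
print(client.list())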
How do I use a specific model tag? Just use the full tag name in the model field. For example, llama2:13b or mistral:latest. Use ollama list to see your tags.
Is the API stable? The API is actively developed. Endpoints like /api/generate and /api/tags are core features. They are stable for basic use, but check the official documentation for the latest updates.
Conclusion
You learned how to interact with your local AI models using Python. The Ollama Python API opens up many possibilities. You saw how to use the fundamental requests library.
You also learned about the easier official ollama library. Both let you send prompts and list models programmatically. You can now integrate local LLMs into your own scripts and applications.
Copy the code examples. Run them on your machine. Experiment with different models and prompts. Start building with the Ollama Python API today!