Run Different Ollama Models: Mistral, Mixtral & Beyond
You’ve installed Ollama, run your first ollama run llama2 command, and perhaps had a simple chat. That’s a great start! But Ollama isn’t limited to just one model. It offers a vast universe of different AI models, each with unique strengths, sizes, and characteristics.
Exploring these models allows you to find the best fit for your specific tasks, hardware, and performance needs. This guide will walk you through how to find, understand, and run various model types using Ollama’s simple command line interface.
Interacting with Ollama primarily happens through your computer’s Command Prompt (on Windows) or Terminal (on macOS and Linux). You issue commands starting with ollama to download, run, list, or remove models.
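As a quick preview, the four commands this guide covers look like this (the model names are just examples and can be swapped for any model in the library):
ollama pull mistral   # download a model without starting a chat
ollama run llama2     # start an interactive chat with a model
ollama list           # show every model stored on your machine
ollama rm llama2      # delete a model you no longer need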
Finding Models: The Ollama Library
The official source for discovering models compatible with Ollama is the Ollama model library website. Think of this as a catalog of all the AI models you can easily download and run.
On the library page, you’ll see model names like mistral, mixtral, llama2, codellama, and many others. Crucially, each model often has multiple versions or “tags.” These tags specify different sizes, quantization levels, or fine-tuned variations.
Examples of tags include :latest (usually the most recent default version), :7b (a 7 billion parameter size), :70b (a 70 billion parameter size), :instruct (fine-tuned for following instructions), or :q4_K_M (a specific quantization level).
When you want to use a model, you’ll typically refer to it by its name and tag, separated by a colon, like model_name:tag. The library page for each model lists the available tags.
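For example, assuming the mistral page lists a 7b-instruct tag (always check the model’s library page for the tags that actually exist), the full name:tag reference looks like this:
ollama run mistral:7b-instruct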
Understanding Model Characteristics
Not all AI models are created equal. Understanding a few key characteristics helps you choose the right model for your needs and hardware.
Model Architectures (Mistral, Mixtral, Llama, etc.)
These names refer to the fundamental underlying design or architecture of the AI model. Different architectures can lead to variations in performance, speed, efficiency, and sometimes even the types of tasks they excel at.
For example, models based on the Mistral architecture (like mistral itself or mixtral) are often noted for strong performance and efficiency relative to their size. Mixtral is particularly interesting as a “mixture of experts”: it routes each input through a subset of specialized expert sub-networks rather than the whole model, which helps it deliver high quality while keeping inference fast.
Model Size (Parameters – e.g., 7B, 13B, 70B)
The number of parameters is often described as the model’s “brain size” or complexity. It indicates how many values the model adjusts during training. More parameters generally mean the model has learned more and can perform more complex tasks or has a wider range of knowledge.
However, more parameters also dramatically increase the hardware requirements, especially RAM. A 7B model is relatively small to medium, a 13B model is medium, while a 70B model is considered large and requires significant resources (often 64GB+ of RAM, preferably with a powerful GPU).
Quantization (e.g., q4, q8)
Quantization is a technique used to make models smaller and faster. It involves storing the model’s parameters using lower precision numbers (e.g., 4-bit integers instead of 16-bit floating points). This significantly reduces the amount of RAM and storage space the model needs.
Tags like :q4_K_M or :q8_0 indicate the level and type of quantization. A q4 quantized model will be smaller and faster than a q8 version of the same model, but might be slightly less accurate. Quantization is crucial for running larger models on less powerful hardware.
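As a concrete sketch, the same chat model can often be pulled at two different quantization levels; the exact tags below are assumptions, so confirm them on the model’s library page before pulling:
ollama pull llama2:7b-chat-q4_K_M   # smaller, faster 4-bit build
ollama pull llama2:7b-chat-q8_0     # larger, slightly more accurate 8-bit build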
Fine-tunes/Variations (e.g., instruct, code)
Some model tags indicate that the model has undergone additional training for a specific purpose. For instance, a tag like :instruct means the model was fine-tuned to be better at following instructions and engaging in chat-like conversations. A tag like :code indicates training focused on generating or understanding code.
Choosing a fine-tuned version relevant to your task can significantly improve performance compared to a base model.
Running a Specific Model (The ollama run Command)
You already know the basic ollama run command. To run a specific model or version, you simply add the model name and the desired tag.
The core command structure is:
ollama run <model_name>[:tag]
If the specific model and tag you request aren’t already downloaded on your system, Ollama will automatically begin downloading it first. You’ll see a progress bar indicating the download status.
Here are some examples of running different models and tags:
To run the default (usually latest) version of Mistral:
ollama run mistral
To run the 13 billion parameter version of Llama 2:
ollama run llama2:13b
To run a specific instruction-tuned version of Mixtral:
ollama run mixtral:8x7b-instruct-v0.1
To run a version of Code Llama optimized for coding:
ollama run codellama:7b-code
After you type the command and press Enter, Ollama checks whether the model is already on your machine and downloads it if it isn’t. It then loads the model into memory. Once loaded, you’ll see the chat prompt (>>>), indicating that the specific model you requested is now active and ready to interact with.
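You can also pass a prompt directly on the command line to get a single reply without opening an interactive session, which is a handy way to confirm a model loaded correctly:
ollama run mistral "Explain quantization in one sentence."   # prints the answer and exits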
Downloading Models Explicitly (The ollama pull Command)
Sometimes you might want to download a model without immediately starting a chat session. This is useful for downloading large models ahead of time or scripting downloads.
The command for this is ollama pull:
ollama pull <model_name>[:tag]
For example, to download the large 70B version of Llama 2 without running it yet:
ollama pull llama2:70b
Ollama will show the download progress, similar to when ollama run downloads a model. Once the download is complete, the model is available on your system for future use with ollama run.
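Because ollama pull is non-interactive, it also works well in a simple script; here is a minimal sketch (the model names are just examples) that fetches several models ahead of time:
#!/bin/sh
# Download a set of models up front so later ollama run commands start instantly
for model in mistral llama2:13b codellama:7b-code; do
  ollama pull "$model"
done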
Listing Installed Models (The ollama list Command)
As you download more models, you might lose track of what you have. The ollama list command shows you all the models and tags currently stored on your system:
ollama list
When you run this command, you’ll see output similar to this:
NAME                          ID            SIZE       DIGEST           MODIFIED
llama2:latest                 f7b3b3f84a    3.8 GB     f7b3b3f84a...    2 hours ago
mistral:latest                269410a2c9    4.1 GB     269410a2c9...    About an hour ago
mixtral:8x7b-instruct-v0.1    5415a20b0b    25.8 GB    5415a20b0b...    10 minutes ago
codellama:7b-code             2860b4d301    3.8 GB     2860b4d301...    3 days ago
Pay attention to the NAME column, which shows the model_name:tag, and the SIZE column, which indicates how much disk space each model version is using.
Switching Between Models
Switching models in Ollama is straightforward. If you are currently chatting with one model (e.g., Llama 2), type /bye and press Enter to exit the chat session.
Then, simply run the ollama run command again, specifying the name and tag of the new model you want to use (e.g., ollama run mistral). Ollama will unload the previous model and load the newly requested one.
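Put together, a typical switch from Llama 2 to Mistral looks like this:
/bye                 # typed at the >>> chat prompt to leave the current session
ollama run mistral   # typed at your normal shell prompt to load the new model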
Removing Models to Save Space (The ollama rm Command)
AI models, especially larger ones, can consume significant amounts of disk space. If you’ve finished experimenting with a model or need to free up space, you can remove it using the ollama rm command.
The command structure is:
ollama rm <model_name>[:tag]
For example, to remove the 70B version of Llama 2:
ollama rm llama2:70b
Be careful: ollama rm deletes the model immediately, and the action is irreversible. It’s a good idea to run ollama list first to ensure you are removing the correct model tag.
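A safe removal workflow using only the commands covered above might look like this:
ollama list            # confirm the exact name:tag you want to delete
ollama rm llama2:70b   # remove only that version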
Choosing the Right Model for Your Hardware and Task
Selecting a model isn’t just about finding the “best” one; it’s about finding the best one that runs well on your computer and suits what you want to do.
- Limited RAM or No Strong GPU: Prioritize smaller models (like 7B, or 13B if you have decent RAM) and heavily quantized versions (e.g., tags with q4). These require less memory and process faster on less powerful hardware.
- Good GPU and Plenty of RAM: You can explore larger models (13B, 70B) and less quantized versions (e.g., q8), or even unquantized versions if available. These offer potentially better accuracy and capability but demand significant resources.
- Task-Specific Needs: If you need a model for chatting, look for :instruct or :chat tags. For coding, look for :code tags or models specifically designed for programming tasks like Code Llama.
Experimentation is key! Start with a smaller, quantized version of a model you’re interested in and see how it performs on your system before trying larger versions.
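For instance, a cautious progression on modest hardware is to begin with a small, heavily quantized tag and only step up if it runs comfortably; the specific tags below are examples, so check the library for what is actually available:
ollama run mistral:7b-instruct-q4_K_M   # small, heavily quantized starting point
ollama run llama2:13b                   # try a larger model only if the first runs smoothly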
Troubleshooting Running Models
Encountering issues? Here are a few common ones:
- “Error: model ‘<model_name>[:tag]’ not found”: Double-check the spelling of the model name and tag. Use ollama list to see what you have downloaded, or check the Ollama library for the correct name and tag.
- Download Issues: Ensure you have a stable internet connection and enough disk space for the model. Large models can be tens of gigabytes.
- Runs Very Slowly or Crashes: This is often a sign the model is too large for your computer’s hardware, particularly RAM. Check your system’s resource usage while the model is loading or running. If RAM is maxed out, try a smaller parameter size (e.g., 7b instead of 13b) or a more heavily quantized version (e.g., q4 instead of q8).
FAQs
How much RAM do I need for a specific model size?
It varies greatly depending on the model and quantization, but general estimates for quantized models are: 7B needs ~8GB RAM, 13B needs ~16GB RAM, and 70B needs ~48-64GB+ RAM. More RAM is always better, especially if you want to run other applications simultaneously.
Can I use models not on the Ollama library?
Yes, Ollama supports importing models in other formats (like GGUF) by creating a Modelfile. This is a more advanced topic, but the library is the easiest starting point.
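As a minimal sketch of that advanced path (my-model.gguf is a placeholder for whatever GGUF file you have downloaded), the Modelfile only needs a FROM line pointing at the file:
# Contents of a file named Modelfile
FROM ./my-model.gguf
Then, from the same directory, register and run it:
ollama create my-model -f Modelfile   # registers the weights under the name my-model
ollama run my-model                   # chat with the imported model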
How do I update a model?
Simply run the ollama pull <model_name>[:tag] command again. If a newer version with the same tag exists, Ollama will download and replace the old one.
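For example, to refresh the default Mistral tag:
ollama pull mistral   # re-pulling an installed tag fetches any updated layers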
What’s the difference between Llama and Mistral models?
They are different model architectures developed by different organizations (Meta for Llama, Mistral AI for Mistral). They have different training data, structures, and performance characteristics. Mistral models are often praised for their efficiency and quality relative to size.
Conclusion
You now have the tools to go beyond the default and explore the exciting variety of AI models available for Ollama. You know how to find models in the Ollama library, understand key differences like size and quantization, and use the essential commands:
- ollama pull to download models.
- ollama list to see what you have.
- ollama run to load and interact with a specific model.
- ollama rm to remove models you no longer need.
Experiment with different models like Mistral, Mixtral, and others. See how they perform on your hardware and which ones best suit your tasks. The power of local AI is at your fingertips!