Run Different Ollama Models: Mistral, Mixtral & Beyond

You’ve installed Ollama, run your first ollama run llama2 command, and perhaps had a simple chat. That’s a great start! But Ollama isn’t limited to just one model. It offers a vast universe of different AI models, each with unique strengths, sizes, and characteristics.

Exploring these models allows you to find the best fit for your specific tasks, hardware, and performance needs. This guide will walk you through how to find, understand, and run various model types using Ollama’s simple command line interface.

Interacting with Ollama primarily happens through your computer’s Command Prompt (on Windows) or Terminal (on macOS and Linux). You issue commands starting with ollama to download, run, list, or remove models.
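
If you ever forget which subcommands exist, the CLI can list them for you:

ollama --help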

Finding Models: The Ollama Library

The official source for discovering models compatible with Ollama is the Ollama model library website. Think of this as a catalog of all the AI models you can easily download and run.

On the library page, you’ll see model names like mistral, mixtral, llama2, codellama, and many others. Crucially, each model often has multiple versions or “tags.” These tags specify different sizes, quantization levels, or fine-tuned variations.

Examples of tags include :latest (the default version you get when you don’t specify a tag), :7b (a 7 billion parameter size), :70b (a 70 billion parameter size), :instruct (fine-tuned for following instructions), or :q4_K_M (a specific quantization level).

When you want to use a model, you’ll typically refer to it by its name and tag, separated by a colon, like model_name:tag. The library page for each model lists the available tags.
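
For example, the same base model usually appears under several tags on its library page (the tags below are taken from Llama 2’s page and are illustrative; check each model’s page for its current list):

llama2:latest
llama2:7b
llama2:13b
llama2:7b-chat-q4_K_M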

Understanding Model Characteristics

Not all AI models are created equal. Understanding a few key characteristics helps you choose the right model for your needs and hardware.

Model Architectures (Mistral, Mixtral, Llama, etc.)

These names refer to the fundamental underlying design or architecture of the AI model. Different architectures can lead to variations in performance, speed, efficiency, and sometimes even the types of tasks they excel at.

For example, models based on the Mistral architecture (like mistral itself or mixtral) are often noted for their strong performance relative to their size. Mixtral is particularly interesting as a “mixture of experts” model: each input is routed to only a subset of the model’s expert sub-networks, so it can deliver quality closer to much larger models while activating only part of its parameters at a time. Keep in mind that all the experts still have to fit in memory, so its RAM needs are closer to a large model’s.

Model Size (Parameters – e.g., 7B, 13B, 70B)

The number of parameters is often described as the model’s “brain size” or complexity. It indicates how many values the model adjusts during training. More parameters generally mean the model has learned more and can perform more complex tasks or has a wider range of knowledge.

However, more parameters also dramatically increase the hardware requirements, especially RAM. A 7B model is relatively small/medium, a 13B is medium, while a 70B model is considered large and requires significant resources (often 64GB+ of RAM, preferably with a powerful GPU).
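
As a very rough rule of thumb for the common 4-bit quantized builds (exact figures vary by model and quantization level):

  • 7B: download of roughly 4 GB, runs comfortably with ~8 GB of RAM
  • 13B: download of roughly 7-8 GB, wants ~16 GB of RAM
  • 70B: download of roughly 40 GB, realistically needs ~48-64 GB+ of RAM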

Quantization (e.g., q4, q8)

Quantization is a technique used to make models smaller and faster. It involves storing the model’s parameters using lower-precision numbers (e.g., 4-bit integers instead of 16-bit floating-point values). This significantly reduces the amount of RAM and storage space the model needs.

Tags like :q4_K_M or :q8_0 indicate the level and type of quantization. A q4 quantized model will be smaller and faster than a q8 version of the same model, but might be slightly less accurate. Quantization is crucial for running larger models on less powerful hardware.
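
As a quick experiment, you can pull two quantization levels of the same model and compare their sizes with ollama list (the tags below follow Llama 2’s library page; other models use similar but not identical naming):

ollama pull llama2:7b-chat-q4_K_M
ollama pull llama2:7b-chat-q8_0
ollama list

The q4_K_M build should show up several gigabytes smaller than the q8_0 build of the same model.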

Fine-tunes/Variations (e.g., instruct, code)

Some model tags indicate that the model has undergone additional training for a specific purpose. For instance, a tag like :instruct means the model was fine-tuned to be better at following instructions and engaging in chat-like conversations. A tag like :code indicates training focused on generating or understanding code.

Choosing a fine-tuned version relevant to your task can significantly improve performance compared to a base model.
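
For example, Code Llama publishes several variants of its 7B model, each tuned for a different job (tag names as listed on its library page, which may change over time):

ollama run codellama:7b-instruct
ollama run codellama:7b-code

The instruct variant is better at answering natural-language questions about code, while the code variant is geared toward raw code completion.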

Running a Specific Model (The ollama run Command)

You already know the basic ollama run command. To run a specific model or version, you simply add the model name and the desired tag.

The core command structure is:

ollama run <model_name>[:tag]

If the specific model and tag you request aren’t already downloaded on your system, Ollama will automatically begin downloading it first. You’ll see a progress bar indicating the download status.

Here are some examples of running different models and tags:

To run the default (usually latest) version of Mistral:

ollama run mistral

To run the 13 billion parameter version of Llama 2:

ollama run llama2:13b

To run a specific instruction-tuned version of Mixtral:

ollama run mixtral:8x7b-instruct-v0.1

To run a version of Code Llama optimized for coding:

ollama run codellama:7b-code

After you type the command and press Enter, Ollama checks whether the model is already on your system. If it isn’t, it downloads it first, then loads the model into memory. Once loaded, you’ll see the chat prompt (>>>), indicating that the specific model you requested is now active and ready to interact with.
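
A first session might look something like this (the reply will of course vary; typing /bye ends the session, as covered later in this guide):

ollama run mistral
>>> Explain in one sentence what a parameter is.
(the model’s answer appears here)
>>> /bye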

Downloading Models Explicitly (The ollama pull Command)

Sometimes you might want to download a model without immediately starting a chat session. This is useful for downloading large models ahead of time or scripting downloads.

The command for this is ollama pull:

ollama pull <model_name>[:tag]

For example, to download the large 70B version of Llama 2 without running it yet:

ollama pull llama2:70b

Ollama will show the download progress, similar to when ollama run downloads a model. Once the download is complete, the model is available on your system for future use with ollama run.

Listing Installed Models (The ollama list Command)

As you download more models, you might lose track of what you have. The ollama list command shows you all the models and tags currently stored on your system:

ollama list

When you run this command, you’ll see output similar to this:

NAME                          ID            SIZE       MODIFIED
llama2:latest                 f7b3b3f84a    3.8 GB     2 hours ago
mistral:latest                269410a2c9    4.1 GB     About an hour ago
mixtral:8x7b-instruct-v0.1    5415a20b0b    25.8 GB    10 minutes ago
codellama:7b-code             2860b4d301    3.8 GB     3 days ago

Pay attention to the NAME column, which shows the model_name:tag, and the SIZE column, which indicates how much disk space each model version is using.

Switching Between Models

Switching models in Ollama is straightforward. If you are currently chatting with one model (e.g., Llama 2), type /bye and press Enter to exit the chat session.

Then, simply run the ollama run command again, specifying the name and tag of the *new* model you want to use (e.g., ollama run mistral). Ollama will unload the previous model and load the newly requested one.
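
In practice, a switch looks like this:

>>> /bye
ollama run mistral

A moment later you’re back at the >>> prompt, this time talking to Mistral.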

Removing Models to Save Space (The ollama rm Command)

AI models, especially larger ones, can consume significant amounts of disk space. If you’ve finished experimenting with a model or need to free up space, you can remove it using the ollama rm command.

The command structure is:

ollama rm <model_name>[:tag]

For example, to remove the 70B version of Llama 2:

ollama rm llama2:70b

Ollama removes the model right away and prints a short message confirming the deletion. Be careful: this action is irreversible. It’s a good idea to run ollama list first to ensure you are removing the correct model tag.
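
A safe removal workflow is to check what’s installed, remove the exact tag, and then verify it’s gone:

ollama list
ollama rm llama2:70b
ollama list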

Choosing the Right Model for Your Hardware and Task

Selecting a model isn’t just about finding the “best” one; it’s about finding the best one that runs well on your computer and suits what you want to do.

  • Limited RAM or No Strong GPU: Prioritize smaller models (like 7B or 13B if you have decent RAM) and heavily quantized versions (e.g., tags with q4). These require less memory and process faster on less powerful hardware.
  • Good GPU and Plenty of RAM: You can explore larger models (13B, 70B) and less quantized versions (e.g., q8) or even unquantized versions if available. These offer potentially better accuracy and capability but demand significant resources.
  • Task-Specific Needs: If you need a model for chatting, look for :instruct or :chat tags. For coding, look for :code tags or models specifically designed for programming tasks like Code Llama.

Experimentation is key! Start with a smaller, quantized version of a model you’re interested in and see how it performs on your system before trying larger versions.
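
For instance, on a machine with around 8 GB of RAM, a sensible first pick might be a 4-bit instruct build of a 7B model (the exact tag below is illustrative; confirm it on the model’s library page):

ollama run mistral:7b-instruct-q4_K_M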

Troubleshooting Running Models

Encountering issues? Here are a few common ones:

  • “Error: model ‘<model_name>[:tag]’ not found”: Double-check the spelling of the model name and tag. Use ollama list to see what you have downloaded, or check the Ollama library for the correct name and tag.
  • Download Issues: Ensure you have a stable internet connection and enough disk space for the model. Large models can be tens of gigabytes.
  • Runs Very Slowly or Crashes: This is often a sign the model is too large for your computer’s hardware, particularly RAM. Check your system’s resource usage while the model is loading or running (the command sketched just after this list can help). If RAM is maxed out, try a smaller parameter size (e.g., 7b instead of 13b) or a more heavily quantized version (e.g., q4 instead of q8).
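
Newer Ollama releases also include an ollama ps command, which shows the models currently loaded into memory and how much RAM or VRAM they are using (if your version doesn’t have it, your operating system’s activity/task monitor works too):

ollama ps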

FAQs

How much RAM do I need for a specific model size?

It varies greatly depending on the model and quantization, but general estimates for quantized models are: 7B needs ~8GB RAM, 13B needs ~16GB RAM, and 70B needs ~48-64GB+ RAM. More RAM is always better, especially if you want to run other applications simultaneously.

Can I use models not on the Ollama library?

Yes, Ollama supports importing models in other formats (like GGUF) by creating a Modelfile. This is a more advanced topic, but the library is the easiest starting point.
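
A minimal sketch of that workflow, assuming you already have a GGUF file on disk (my-model.gguf is a placeholder name), is to write a Modelfile that points at it:

FROM ./my-model.gguf

Then create and run the model:

ollama create my-model -f Modelfile
ollama run my-model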

How do I update a model?

Simply run the ollama pull <model_name>[:tag] command again. If a newer version with the same tag exists, Ollama will download and replace the old one.

What’s the difference between Llama and Mistral models?

They are different model architectures developed by different organizations (Meta for Llama, Mistral AI for Mistral). They have different training data, structures, and performance characteristics. Mistral models are often praised for their efficiency and quality relative to size.

Conclusion

You now have the tools to go beyond the default and explore the exciting variety of AI models available for Ollama. You know how to find models in the Ollama library, understand key differences like size and quantization, and use the essential commands:

  • ollama pull to download models.
  • ollama list to see what you have.
  • ollama run to load and interact with a specific model.
  • ollama rm to remove models you no longer need.

Experiment with different models like Mistral, Mixtral, and others. See how they perform on your hardware and which ones best suit your tasks. The power of local AI is at your fingertips!