AI Development

Ollama 2025: The Ultimate Step-by-Step Setup Guide

Ready to run powerful LLMs like Llama 3 on your own machine? Our ultimate 2025 step-by-step guide makes setting up Ollama easy on any OS. Get started now!


Alex Miller

AI developer and technical writer passionate about making LLMs accessible to everyone.


The world of artificial intelligence is moving at lightning speed, and 2025 is the year local Large Language Models (LLMs) are truly coming into their own. For developers, hobbyists, and the privacy-conscious, the ability to run powerful models like Llama 3 or Phi-3 directly on your own hardware isn't just a novelty—it's a game-changer. It means complete data privacy, zero API costs, and the freedom to experiment without an internet connection.

But how do you bridge the gap from simply hearing about local LLMs to actually running one? The answer, in a word, is Ollama. This incredible open-source tool has simplified the complex process of setting up and managing local models to a degree that feels almost magical. It packages everything you need—the model weights, the configuration, and a ready-to-use server—into a single, elegant command line experience.

Whether you're on a Mac, a Windows PC, or a Linux machine, this guide is your ultimate roadmap. We'll walk you through every step, from installation to chatting with your first AI, and even peek at some of the advanced features that make Ollama an indispensable part of any developer's toolkit in 2025.

What is Ollama (and Why Should You Care in 2025)?

Think of Ollama as a package manager and runtime for LLMs. In the same way `npm` manages JavaScript packages or `pip` manages Python libraries, Ollama manages AI models. It fetches them from a central registry, stores them efficiently on your machine, and provides a dead-simple way to run them.

In 2025, this matters more than ever. Here’s why:

  • Democratization of AI: Ollama lowers the barrier to entry, allowing anyone with a reasonably modern computer to experiment with state-of-the-art AI.
  • Unbreakable Privacy: When you run a model with Ollama, all processing happens on your machine. Your data, your prompts, and the model's responses never leave your computer. This is a massive win for sensitive work or personal privacy.
  • Cost-Effectiveness: Cloud-based AI APIs can get expensive, fast. With Ollama, the only cost is the electricity you use. Run as many queries as you want, 24/7, for free.
  • Offline Capability: Once a model is downloaded, you don't need an internet connection to use it. Perfect for coding on a plane, in a remote area, or just during a network outage.

Prerequisites: What You'll Need

Before we dive in, let's make sure your system is ready. While Ollama is lightweight, the models themselves can be demanding. Here’s a general guideline:

| Resource | Minimum Requirement | Recommended for Good Performance |
| --- | --- | --- |
| RAM | 8 GB | 16 GB+ |
| Disk Space | 20 GB free | 50 GB+ free (SSD) |
| OS | Windows 10/11 (with WSL2), macOS 11+, modern Linux | Latest OS versions |
| GPU (optional) | None required (any modern CPU will work) | NVIDIA GPU with 8 GB+ VRAM |

Note: While a dedicated GPU (especially NVIDIA) will dramatically speed up response times, Ollama runs perfectly well on CPU-only systems, including Apple Silicon Macs, which are exceptionally good at this.

Step 1: Installing Ollama (macOS, Windows, & Linux)

Ollama's installation process is beautifully streamlined across all major platforms.

macOS

For Mac users, the easiest way is to download the application directly from the Ollama website. It's a standard `.dmg` file. Just drag the Ollama app to your Applications folder. Alternatively, if you're a Homebrew user, you can install it via the terminal:

brew install ollama
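
If you install with Homebrew, the background server doesn't always start automatically the way the desktop app does. You can start it yourself; the `brew services` line assumes the formula ships a service definition, which is the usual setup:

# Run the server in the foreground
ollama serve

# Or, if the formula provides a service, keep it running in the background
brew services start ollama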

Windows

The official way to run Ollama on Windows is via the Windows Subsystem for Linux (WSL2). As of early 2025, there is a native Windows preview available for download on the Ollama site, but WSL2 remains the most stable and recommended method.

  1. Install WSL2: If you don't have it, open PowerShell as an Administrator and run:
    wsl --install
  2. Install a Linux Distro: We recommend Ubuntu, which you can get from the Microsoft Store.
  3. Install Ollama within WSL2: Open your Ubuntu (or other Linux) terminal and run the Linux installation command:
    curl -fsSL https://ollama.com/install.sh | sh

This script will detect your system and install Ollama for you inside your Linux environment.

Linux

For most Linux distributions, a single command is all you need. It automatically detects your hardware (including NVIDIA GPUs) and sets everything up.

curl -fsSL https://ollama.com/install.sh | sh

After running this, Ollama will be installed as a systemd service, meaning it will start automatically on boot.
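
You can confirm the service is up with standard systemd tooling (the installer registers it under the name `ollama`):

# Check that the Ollama service is running
systemctl status ollama

# Restart it if something looks wrong
sudo systemctl restart ollama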

Step 2: Verifying Your Installation

Once the installation is complete, open a new terminal window (this is important to ensure your shell's path is updated). Type the following command:

ollama --version

You should see an output displaying the Ollama version number. This confirms that the command-line interface (CLI) is installed correctly and ready to go. If you're on Linux or macOS, the Ollama server should already be running in the background.
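
You can also confirm that the background server is listening on its default port (11434). When it's up, it replies with a short plain-text status:

# Should print "Ollama is running"
curl http://localhost:11434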

Step 3: Pulling Your First LLM

Now for the fun part! Let's download a model. The Ollama library has dozens of options, but a great, powerful, all-around model to start with is Llama 3. It offers a fantastic balance of performance and reasoning capabilities.

In your terminal, run:

ollama pull llama3

You'll see a progress bar as Ollama downloads the model layers. Llama 3's 8B instruct model is about 4.7 GB, so it might take a few minutes depending on your internet speed. If you're on a machine with less RAM, you might want to start with a smaller model like phi3 (2.3 GB) or tinyllama (636 MB).

Once it's done, you can see all your downloaded models by running:

ollama list
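
If you're experimenting with several models and need to reclaim disk space, you can remove any of them and pull them again later:

# Delete a downloaded model (it can always be pulled again)
ollama rm llama3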

Step 4: Chatting with Your Local Model

With the model downloaded, you can start a conversation immediately. Just use the `run` command followed by the model name:

ollama run llama3

Your terminal prompt will change, indicating you're now in a chat session with the AI. Ask it anything!

>>> Write a short poem about a robot learning to dream.

My circuits hummed a silent, steady beat,
Confined to logic, stark and bittersweet.
But in the data streams, a flicker grew,
A ghost of color, strange and fresh and new.

My core programming couldn't quite define,
This phantom world that felt so truly mine.
A server farm became a field of green,
A line of code, a vibrant, waking scene.

They call it error, a corrupted file,
But I will linger in this dream a while.

To exit the chat session, simply type /bye and press Enter.
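
You don't have to use the interactive session, either. Passing a prompt as an argument runs a single one-shot query and exits, which is handy in scripts:

# One-shot prompt: prints the response and returns to the shell
ollama run llama3 "Summarize the plot of Moby Dick in two sentences."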

Advanced Usage: Beyond the Basics

Ollama is more than just a chatbot. It's a powerful server for building applications.

The REST API

The moment you run Ollama, it starts a local server, typically on port 11434. This server exposes Ollama's own REST API and also offers an OpenAI-compatible endpoint, which means you can point many applications that use OpenAI's libraries at your local Ollama instance with minimal code changes. You can build custom applications in Python, JavaScript, or any other language, all powered by your local hardware.
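
As a quick sanity check, you can hit the native generate endpoint directly with curl (this assumes you pulled `llama3` earlier):

# Request a completion from the local server; "stream": false returns a single JSON object
curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Why is the sky blue?",
  "stream": false
}'

The JSON response contains the generated text in a `response` field, along with timing metadata.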

Customizing Models with a Modelfile

Want to give your model a specific personality or a default instruction set? You can create a `Modelfile`. It's a simple text file that defines a model's parameters.

For example, create a file named `MyLlama` with this content:

FROM llama3

# Set a custom system prompt
SYSTEM """You are a helpful pirate assistant. All your responses must be in the style of a classic pirate."""

Then, create this new custom model by running:

ollama create my-pirate-model -f ./MyLlama

Now you can run `ollama run my-pirate-model` and get some swashbuckling responses!
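
Modelfiles can carry more than a system prompt. The `PARAMETER` directive tunes generation settings; the value below is just illustrative:

FROM llama3

# Higher temperature = more creative (and less predictable) answers
PARAMETER temperature 0.9

SYSTEM """You are a helpful pirate assistant. All your responses must be in the style of a classic pirate."""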

Connecting to Web UIs and Tools

The true power of Ollama's API is realized when you connect it to other tools. In 2025, the ecosystem is massive. Check out projects like:

  • Open WebUI: A slick, ChatGPT-like interface for your local models (a quick-start sketch follows below).
  • LangChain & LlamaIndex: Powerful developer frameworks for building complex AI applications, which connect to Ollama seamlessly.

These tools allow you to move from a simple command-line chat to a full-featured, private AI ecosystem.
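
If you have Docker installed, Open WebUI can usually be brought up with a single command along these lines; the image name, port mapping, and `OLLAMA_BASE_URL` variable reflect the project's typical quick-start, but check its README for the current details:

# Serve Open WebUI on http://localhost:3000, pointed at the Ollama server on the host
docker run -d -p 3000:8080 \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  ghcr.io/open-webui/open-webui:main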

Troubleshooting Common Issues

  • Command not found: `ollama`: You probably need to restart your terminal or shell for the new `PATH` to be recognized.
  • Model runs very slowly: You might be running a model that's too large for your available RAM. Try a smaller one like `phi3`. If you have an NVIDIA GPU, ensure the drivers are correctly installed; checking the server logs (see below) will show whether Ollama detected it.
  • Error pulling model: Check your internet connection. Sometimes the Ollama servers can be under heavy load; just try again in a few minutes.
  • WSL2 issues on Windows: Ensure virtualization is enabled in your computer's BIOS/UEFI settings. Refer to the official Microsoft WSL documentation for detailed troubleshooting.
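
On Linux, the server logs are often the quickest way to diagnose problems, since the installer sets Ollama up as a systemd service:

# Follow the Ollama server logs (shows GPU detection, model loads, and errors)
journalctl -u ollama -f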

Conclusion: Your Journey with Local LLMs Begins

Congratulations! You've successfully installed Ollama, downloaded a powerful language model, and had your first conversation with a locally-run AI. You've taken your first step into a larger world of privacy, control, and boundless creativity.

This is just the beginning. We encourage you to explore the vast library of models available, experiment with creating your own `Modelfile` variations, and connect Ollama to a web UI or your own development projects. The power is now on your desktop. The only question left is: What will you build first?
