Build Your First AI App with Ollama & Python in 2025
Ready to build your first AI application? This 2025 guide walks you through creating a Python app with Ollama to run powerful LLMs locally. Start today!
Alexandre Dubois
Senior AI Engineer specializing in local model deployment and open-source tooling.
Ever feel like the world of AI is a private club, locked behind expensive API keys and corporate clouds? Well, 2025 is the year that changes. We're witnessing a massive shift towards local, powerful, and private AI, and at the forefront of this revolution is a tool called Ollama.
If you've ever wanted to build your own AI-powered applications without relying on a third-party service, you're in the right place. This guide will walk you through everything you need to know to build your very first AI chat application using Ollama and Python, right on your own computer.
What is Ollama and Why Should You Care?
In simple terms, Ollama is a tool that makes it incredibly easy to download, run, and manage large language models (LLMs) on your local machine. Think of it like Docker, but for AI models. Instead of complex setup procedures and dependency nightmares, you get a simple command-line interface to get state-of-the-art models like Llama 3, Mistral, and Code Llama running in minutes.
Why is this a game-changer for 2025?
- Privacy First: When you run a model locally with Ollama, your data never leaves your machine. No more sending sensitive information to a third-party API.
- Cost-Effective: Forget paying per token or per API call. Once you've downloaded a model, you can use it as much as you want, completely free.
- Offline Capability: Your AI app will work even without an internet connection. This opens up a world of possibilities for on-device, edge-computing applications.
- Ultimate Control: You have direct access to the model. You can fine-tune it, customize its behavior, and integrate it deeply into your applications without platform limitations.
Ollama vs. Cloud-Based APIs: A Quick Comparison
To put things in perspective, let's see how running a local model with Ollama stacks up against using a traditional cloud-based API like OpenAI's GPT-4 or Google's Gemini.
Feature | Ollama (Local) | Cloud APIs (e.g., OpenAI) |
---|---|---|
Cost | Free (after hardware purchase) | Pay-per-use (can get expensive) |
Privacy | 100% private; data never leaves your machine | Data is sent to a third-party server |
Latency | No network round trip; speed depends on your hardware | Dependent on internet connection and server load |
Offline Access | Yes, fully functional offline | No, requires an active internet connection |
Setup | Simple command-line installation | Sign up, get an API key, manage billing |
Model Power | Access to powerful open-source models | Access to state-of-the-art proprietary models |
Setting Up Your Local AI Lab
Alright, let's get our hands dirty. The setup process is surprisingly straightforward.
Prerequisites
- A reasonably modern computer (Mac, Windows, or Linux). A machine with at least 8GB of RAM is recommended, but 16GB+ is ideal for running larger models.
- Python 3.8 or newer installed.
- A bit of comfort with the command line/terminal.
Step 1: Install Ollama
Ollama has a brilliant one-line installer for macOS and Linux. Open your terminal and run:
curl -fsSL https://ollama.com/install.sh | sh
For Windows users, simply download and run the installer from the official Ollama website. Once installed, Ollama runs as a background service.
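Before moving on, it's worth a quick sanity check that the background service is actually running. Assuming you kept the defaults (Ollama's local API listens on port 11434), something like this should work:
# Confirm the CLI is on your PATH
ollama --version
# The local API answers on port 11434 by default
curl http://localhost:11434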
Step 2: Pull Your First AI Model
With Ollama installed, you can now download an LLM. We'll start with Meta's Llama 3, a powerful and versatile model. It's a great all-rounder.
In your terminal, run:
ollama run llama3
This command will download the Llama 3 model (it might take a few minutes depending on your internet speed) and then immediately drop you into a chat session with it. You'll see something like this:
>>> Send a message (/? for help)
Go ahead, ask it a question! Type "Why is the sky blue?" and press Enter. You're now chatting with an AI running entirely on your computer. To exit, type `/bye`.
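While you're in the terminal, a few other standard Ollama commands are worth knowing; they're how you'll manage models day to day:
# List the models you've downloaded so far
ollama list
# Download a model without starting a chat session
ollama pull mistral
# Remove a model and free up disk space
ollama rm mistral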
Building Your First AI App with Python
Now for the fun part: integrating this local power into a Python application. We'll start with a simple script and then build a slightly more advanced interactive chat.
Step 3: Set Up Your Python Project
Let's create a dedicated folder for our project and install the necessary Python library.
# Create and navigate into your project directory
mkdir my-ollama-app
cd my-ollama-app
# Create a virtual environment
python -m venv venv
# Activate the virtual environment
# On macOS/Linux:
source venv/bin/activate
# On Windows:
.\venv\Scripts\activate
# Install the official Ollama Python library
pip install ollama
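Before writing any real code, you can confirm that the Python library can talk to the local Ollama service. A minimal check, assuming the service is running and using the library's `list()` helper:
# One-liner sanity check: prints the models Ollama has available locally
python -c "import ollama; print(ollama.list())"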
Step 4: The Basic Python Script
Create a file named `basic_chat.py` and add the following code. This script will send a single, hardcoded prompt to the model and print the response.
import ollama

def run_basic_chat():
    # Make sure the Ollama application is running
    try:
        response = ollama.chat(
            model='llama3',  # The model you want to use
            messages=[
                {
                    'role': 'user',
                    'content': 'Why is Python a popular language for AI?',
                },
            ]
        )
        print("--- AI Response ---")
        print(response['message']['content'])
        print("-------------------")
    except Exception as e:
        print(f"An error occurred: {e}")
        print("Please ensure the Ollama application is running and the 'llama3' model is installed ('ollama run llama3').")

if __name__ == '__main__':
    run_basic_chat()
Run the script from your terminal:
python basic_chat.py
You should see a well-articulated response from Llama 3 explaining Python's popularity in AI. Notice the `messages` list? This is how you structure a conversation. For a single question, you just need one dictionary with the `role` as 'user'.
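For reference, the same `messages` format scales to whole conversations: each entry is a dictionary with a `role` ('system', 'user', or 'assistant') and a `content` string. A multi-turn history (with illustrative content) looks like this:
messages = [
    {'role': 'system', 'content': 'You are a helpful and concise assistant.'},
    {'role': 'user', 'content': 'Why is Python a popular language for AI?'},
    {'role': 'assistant', 'content': 'Largely because of its ecosystem: NumPy, PyTorch, and so on.'},
    {'role': 'user', 'content': 'Which of those libraries should a beginner learn first?'},
]
We'll put exactly this idea to work in the next step.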
Step 5: Making it an Interactive Chatbot
A single response is cool, but a real conversation is better. Let's create a new file, `interactive_chat.py`, that remembers the chat history.
import ollama

def run_interactive_chat():
    print("Starting interactive chat with Llama 3. Type 'exit' to end.")
    # The conversation history, starting with an optional system message
    conversation_history = [
        {'role': 'system', 'content': 'You are a helpful and concise assistant.'}
    ]
    while True:
        user_input = input("You: ")
        if user_input.lower() == 'exit':
            print("Exiting chat. Goodbye!")
            break
        # Add the user's message to the history
        conversation_history.append({'role': 'user', 'content': user_input})
        try:
            # Get the streaming response from the model
            stream = ollama.chat(
                model='llama3',
                messages=conversation_history,
                stream=True,  # This is the key to getting a streaming response
            )
            print("AI: ", end="")
            assistant_response = ""
            # Process and print each chunk of the response as it comes in
            for chunk in stream:
                part = chunk['message']['content']
                print(part, end='', flush=True)
                assistant_response += part
            print()  # Newline after the full response
            # Add the full assistant response to the history
            conversation_history.append({'role': 'assistant', 'content': assistant_response})
        except Exception as e:
            print(f"An error occurred: {e}")
            # Remove the last user message if the call failed
            conversation_history.pop()

if __name__ == '__main__':
    run_interactive_chat()
Run this new script:
python interactive_chat.py
Now you have a real chatbot! Notice a few key improvements:
- Conversation History: We maintain a list called `conversation_history`. The user's prompt is appended before each call, and the AI's full reply is appended once it finishes, so the model always sees the context of the entire conversation.
- Streaming Responses: We set `stream=True`. This makes the AI's response appear token-by-token, just like in ChatGPT, creating a much better user experience than waiting for the full response to generate.
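One more knob worth knowing about: `ollama.chat()` also accepts an `options` dictionary for tuning generation. A small sketch, assuming the standard Ollama model options `temperature` and `num_ctx` (adjust the values to taste):
stream = ollama.chat(
    model='llama3',
    messages=conversation_history,
    stream=True,
    options={
        'temperature': 0.3,  # lower = more focused, higher = more creative
        'num_ctx': 4096,     # size of the context window the model works with
    },
)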
Key Takeaways & What's Next?
Congratulations! You've just built a fully functional, private AI chatbot that runs 100% on your own hardware. That's a huge step into the world of practical AI development.
Here’s what you accomplished:
- ✅ You learned what Ollama is and why local LLMs are a big deal.
- ✅ You installed Ollama and downloaded a powerful model (Llama 3).
- ✅ You used the Ollama Python library to communicate with your local model.
- ✅ You built both a simple, single-turn script and a multi-turn, interactive chatbot that remembers context.
So, where do you go from here? The possibilities are endless.
- Build a Web Interface: Wrap your chatbot logic in a simple web framework like Flask or FastAPI to create a web-based chat interface (see the minimal sketch just after this list).
- Explore Other Models: Try a model specifically for coding, like `codellama`, by running `ollama run codellama`. See how its responses differ for programming questions.
- Try Different Modalities: Explore models that can understand images, like `llava`.
- RAG (Retrieval-Augmented Generation): The next big step is to build a RAG pipeline. This involves letting your LLM answer questions based on your own documents, a powerful technique for creating expert chatbots.
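To make the first idea concrete, here's a minimal, illustrative Flask sketch (it assumes Flask is installed with `pip install flask` and reuses the non-streaming `ollama.chat()` call from Step 4; treat it as a starting point, not a production setup):
import ollama
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/chat', methods=['POST'])
def chat():
    # Expects a JSON body like: {"message": "Hello!"}
    user_message = request.get_json().get('message', '')
    response = ollama.chat(
        model='llama3',
        messages=[{'role': 'user', 'content': user_message}],
    )
    return jsonify({'reply': response['message']['content']})

if __name__ == '__main__':
    app.run(port=5000, debug=True)
Save it as, say, app.py, run it with python app.py, then test it from another terminal with curl -X POST http://localhost:5000/chat -H "Content-Type: application/json" -d '{"message": "Hello!"}'.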
You've taken the first and most important step. The world of local AI is yours to explore. Happy building!