Ollama Provider

Run local open-source models via Ollama with zero API costs and complete privacy, or use Ollama Cloud for hosted models.

Quick Start

# Local Ollama (no API key needed)
ccs ollama "explain this code"

# Ollama Cloud (requires API key)
ccs ollama-cloud "refactor this function"

Variants

Ollama (Local)

Run models on your local machine with complete privacy and zero API costs. Configuration:
  • Base URL: http://localhost:11434
  • Default model: qwen3-coder
  • API key: Not required
  • Context: 32K+ tokens
Prerequisites: Ollama must be installed and running locally.
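
To confirm the daemon is reachable before wiring up CCS, Ollama answers a plain GET on its base URL:
# Prints "Ollama is running" when the daemon is up
curl http://localhost:11434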

Ollama Cloud

Access Ollama’s hosted models via its cloud API. Configuration:
  • Base URL: https://ollama.com
  • Default models: glm-4.7:cloud, minimax-m2.1:cloud
  • API key: Required from ollama.com
  • Context: Varies by model
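
To sanity-check a key outside of CCS, here is a minimal sketch, assuming the cloud endpoint accepts standard Bearer authentication and exposes the usual /api/tags model listing (check ollama.com for the current scheme):
# Assumption: Bearer auth and the standard Ollama /api/tags endpoint
curl -H "Authorization: Bearer $OLLAMA_API_KEY" https://ollama.com/api/tags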

Prerequisites

Installing Ollama (Local)

1. Download Ollama: visit ollama.com and download the installer for your platform.

2. Install: follow the platform-specific installation instructions.

3. Verify the installation:

ollama --version

4. Pull the recommended coding model:

ollama pull qwen3-coder
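
With the model pulled, a one-off prompt confirms local inference works before involving CCS:
# Quick smoke test directly against Ollama
ollama run qwen3-coder "Say hello in one sentence"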

Ollama Cloud Setup

1. Create an account: sign up at ollama.com.

2. Get an API key: navigate to the API settings and generate your API key.

3. Configure CCS:

ccs setup --preset ollama-cloud
# Enter your API key when prompted
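
A short prompt through the new profile confirms the key was saved correctly:
# Should return a completion if the key is valid
ccs ollama-cloud "hello"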

Configuration

Local Ollama Setup

# Interactive setup
ccs setup --preset ollama

# Manual config in ~/.ccs/config.yaml
profiles:
  ollama:
    env:
      ANTHROPIC_BASE_URL: "http://localhost:11434"
      ANTHROPIC_MODEL: "qwen3-coder"
      ANTHROPIC_AUTH_TOKEN: "ollama"
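
The profile is just a set of environment variables, so the same setup can be exercised as a one-off override without editing config.yaml (the same mechanism the model-switching examples below use):
# One-off equivalent of the ollama profile
ANTHROPIC_BASE_URL=http://localhost:11434 \
ANTHROPIC_MODEL=qwen3-coder \
ANTHROPIC_AUTH_TOKEN=ollama \
ccs ollama "smoke test"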

Ollama Cloud Setup

# ~/.ccs/config.yaml
profiles:
  ollama-cloud:
    env:
      ANTHROPIC_BASE_URL: "https://ollama.com"
      ANTHROPIC_MODEL: "glm-4.7:cloud"
      ANTHROPIC_AUTH_TOKEN: "YOUR_OLLAMA_CLOUD_API_KEY"

Model Selection

Model            Size   Context   Use Case
qwen3-coder      7B     32K       Coding (recommended)
deepseek-coder   6.7B   16K       Code completion
codellama        7B     16K       Code generation
mistral          7B     8K        General purpose

Pulling Models

# Install recommended coding model
ollama pull qwen3-coder

# List available models
ollama list

# Remove unused models
ollama rm model-name
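
Recent Ollama releases also provide ollama show for inspecting a pulled model's metadata:
# Show model details (architecture, parameters, context length)
ollama show qwen3-coder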

Cloud Models

Model                Description
glm-4.7:cloud        GLM via Ollama Cloud
minimax-m2.1:cloud   Minimax via Ollama Cloud

Usage Examples

Local Ollama

# Basic usage
ccs ollama "explain this function"

# Switch models
ANTHROPIC_MODEL=deepseek-coder ccs ollama "review this code"

# Custom temperature
ANTHROPIC_TEMPERATURE=0.7 ccs ollama "generate unit tests"

Ollama Cloud

# Use cloud variant
ccs ollama-cloud "debug this error"

# Specific cloud model
ANTHROPIC_MODEL=minimax-m2.1:cloud ccs ollama-cloud "optimize performance"

Troubleshooting

Connection Refused

Symptom: Error: connect ECONNREFUSED 127.0.0.1:11434
Cause: The Ollama service is not running
Solution:
# Start Ollama service
ollama serve

# Or on macOS/Windows, launch Ollama app
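
After starting the service, its version endpoint confirms it is reachable:
# Returns a small JSON payload such as {"version":"0.6.2"}
curl http://localhost:11434/api/version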

Model Not Found

Symptom: Error: model 'qwen3-coder' not found
Cause: The model has not been pulled locally
Solution:
# Pull the model first
ollama pull qwen3-coder

# Verify installation
ollama list

Slow Responses

Symptom: Long response times
Causes & Solutions:
  • CPU-only inference: Use smaller model or add GPU support
  • Large model: Switch to smaller variant (e.g., qwen3-coder:3b)
  • Insufficient RAM: Close other apps, use quantized models
Optimize performance:
# Use smaller quantized model
ollama pull qwen3-coder:q4_0

# Update model in config
ANTHROPIC_MODEL=qwen3-coder:q4_0 ccs ollama "test"
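
To check whether a loaded model actually landed on the GPU or fell back to CPU-only inference:
# The PROCESSOR column reads e.g. "100% GPU" or "100% CPU"
ollama ps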

Ollama Cloud API Errors

Symptom: 401 Unauthorized or 403 Forbidden
Cause: Missing or invalid API key
Solution:
# Verify API key is correct
ccs config
# Navigate to ollama-cloud profile
# Re-enter API key

Performance Tuning

Context Length

# ~/.ccs/config.yaml
profiles:
  ollama:
    env:
      ANTHROPIC_MAX_TOKENS: "32768"  # Adjust based on model
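
Note that ANTHROPIC_MAX_TOKENS caps the response length; the context window itself is set on the Ollama side. A minimal sketch using a Modelfile (num_ctx is Ollama's context parameter; the derived model name qwen3-coder-32k is illustrative):
# Create a derived model with a 32K context window
cat > Modelfile <<'EOF'
FROM qwen3-coder
PARAMETER num_ctx 32768
EOF
ollama create qwen3-coder-32k -f Modelfile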

Concurrency

Ollama handles concurrent requests through an internal queue. For better throughput:
# Increase parallel requests (Ollama config)
OLLAMA_NUM_PARALLEL=4 ollama serve
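
Two related knobs control how many models stay resident and for how long; keeping a model loaded avoids reload latency between requests:
# Allow two loaded models and keep them in memory for 30 minutes
OLLAMA_MAX_LOADED_MODELS=2 OLLAMA_KEEP_ALIVE=30m ollama serve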

Cost Information

Variant          Cost                 Privacy
Ollama (Local)   $0 (hardware only)   Complete - data never leaves your machine
Ollama Cloud     Varies by usage      Depends on Ollama Cloud's privacy policy

Storage Locations

Path                          Description
~/.ollama/models/             Downloaded model files
~/.ccs/config.yaml            CCS profile configuration
~/.ccs/ollama.settings.json   Model preferences (if using the Dashboard)
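
Model files dominate disk usage; to see how much space they occupy:
# Total size of downloaded models
du -sh ~/.ollama/models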

Ollama vs llama.cpp

Feature           Ollama                    llama.cpp
Model format      Ollama format             GGUF (raw)
Setup             Easier (install + pull)   More manual
Model selection   Built-in model library    Any GGUF file
Performance       Good                      Better (more optimization options)
Community         Large, many models        Smaller but growing
Best for          Getting started quickly   Fine-grained control
Use Ollama for quick setup with curated models. Use llama.cpp if you need specific GGUF models or advanced tuning.

Next Steps

  • API Profiles: configure custom Ollama endpoints
  • llama.cpp Provider: alternative GGUF-based local inference
  • Dashboard: manage models via the web interface
  • Remote Proxy: run Ollama on a remote server