Ollama Provider

Run local open-source models via Ollama with zero API costs and complete privacy, or use Ollama Cloud for hosted models.

Quick Start

# Local Ollama (no API key needed)
ccs ollama "explain this code"

# Ollama Cloud (requires API key)
ccs ollama-cloud "refactor this function"

Variants

Ollama (Local)

Run models on your local machine with complete privacy and zero API costs. Configuration:
  • Base URL: http://localhost:11434
  • Default model: qwen3-coder
  • API key: Not required
  • Context: 32K+ tokens
Prerequisites: Ollama must be installed and running locally.
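
To confirm the daemon is reachable before wiring up CCS, Ollama answers a plain GET on its base URL:
# Prints "Ollama is running" when the daemon is up
curl http://localhost:11434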

Ollama Cloud

Access Ollama’s hosted models via its cloud API. Configuration:
  • Base URL: https://ollama.com
  • Default models: glm-4.7:cloud, minimax-m2.1:cloud
  • API key: Required from ollama.com
  • Context: Varies by model
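
To sanity-check a key outside of CCS, here is a minimal sketch, assuming the cloud endpoint accepts standard Bearer authentication and exposes the usual /api/tags model listing (check ollama.com for the current scheme):
# Assumption: Bearer auth and the standard Ollama /api/tags endpoint
curl -H "Authorization: Bearer $OLLAMA_API_KEY" https://ollama.com/api/tags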

Prerequisites

Installing Ollama (Local)

1. Download Ollama: visit ollama.com and download the installer for your platform.

2. Install: follow the platform-specific installation instructions.

3. Verify the installation:

ollama --version

4. Pull the recommended coding model:

ollama pull qwen3-coder
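
With the model pulled, a one-off prompt confirms local inference works before involving CCS:
# Quick smoke test directly against Ollama
ollama run qwen3-coder "Say hello in one sentence"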

Ollama Cloud Setup

1. Create an account: sign up at ollama.com.

2. Get an API key: navigate to the API settings and generate your API key.

3. Configure CCS:

ccs setup --preset ollama-cloud
# Enter your API key when prompted
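
A short prompt through the new profile confirms the key was saved correctly:
# Should return a completion if the key is valid
ccs ollama-cloud "hello"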

Configuration

Local Ollama Setup

# Interactive setup
ccs setup --preset ollama

# Manual config in ~/.ccs/config.yaml
profiles:
  ollama:
    env:
      ANTHROPIC_BASE_URL: "http://localhost:11434"
      ANTHROPIC_MODEL: "qwen3-coder"
      ANTHROPIC_AUTH_TOKEN: "ollama"
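
The profile is just a set of environment variables, so the same setup can be exercised as a one-off override without editing config.yaml (the same mechanism the model-switching examples below use):
# One-off equivalent of the ollama profile
ANTHROPIC_BASE_URL=http://localhost:11434 \
ANTHROPIC_MODEL=qwen3-coder \
ANTHROPIC_AUTH_TOKEN=ollama \
ccs ollama "smoke test"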

Ollama Cloud Setup

# ~/.ccs/config.yaml
profiles:
  ollama-cloud:
    env:
      ANTHROPIC_BASE_URL: "https://ollama.com"
      ANTHROPIC_MODEL: "glm-4.7:cloud"
      ANTHROPIC_AUTH_TOKEN: "YOUR_OLLAMA_CLOUD_API_KEY"

Model Selection

Model            Size   Context   Use Case
qwen3-coder      7B     32K       Coding (recommended)
deepseek-coder   6.7B   16K       Code completion
codellama        7B     16K       Code generation
mistral          7B     8K        General purpose

Pulling Models

# Install recommended coding model
ollama pull qwen3-coder

# List available models
ollama list

# Remove unused models
ollama rm model-name
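
Recent Ollama releases also provide ollama show for inspecting a pulled model's metadata:
# Show model details (architecture, parameters, context length)
ollama show qwen3-coder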

Cloud Models

Model                Description
glm-4.7:cloud        GLM via Ollama Cloud
minimax-m2.1:cloud   Minimax via Ollama Cloud

Usage Examples

Local Ollama

# Basic usage
ccs ollama "explain this function"

# Switch models
ANTHROPIC_MODEL=deepseek-coder ccs ollama "review this code"

# Custom temperature
ANTHROPIC_TEMPERATURE=0.7 ccs ollama "generate unit tests"

Ollama Cloud

# Use cloud variant
ccs ollama-cloud "debug this error"

# Specific cloud model
ANTHROPIC_MODEL=minimax-m2.1:cloud ccs ollama-cloud "optimize performance"

Troubleshooting

Connection Refused

Symptom: Error: connect ECONNREFUSED 127.0.0.1:11434
Cause: The Ollama service is not running
Solution:
# Start Ollama service
ollama serve

# Or on macOS/Windows, launch Ollama app
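
After starting the service, its version endpoint confirms it is reachable:
# Returns a small JSON payload such as {"version":"0.6.2"}
curl http://localhost:11434/api/version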

Model Not Found

Symptom: Error: model 'qwen3-coder' not found
Cause: The model has not been pulled locally
Solution:
# Pull the model first
ollama pull qwen3-coder

# Verify installation
ollama list

Slow Responses

Symptom: Long response times
Causes & Solutions:
  • CPU-only inference: Use smaller model or add GPU support
  • Large model: Switch to smaller variant (e.g., qwen3-coder:3b)
  • Insufficient RAM: Close other apps, use quantized models
Optimize performance:
# Use smaller quantized model
ollama pull qwen3-coder:q4_0

# Update model in config
ANTHROPIC_MODEL=qwen3-coder:q4_0 ccs ollama "test"
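
To check whether a loaded model actually landed on the GPU or fell back to CPU-only inference:
# The PROCESSOR column reads e.g. "100% GPU" or "100% CPU"
ollama ps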

Ollama Cloud API Errors

Symptom: 401 Unauthorized or 403 Forbidden
Cause: Missing or invalid API key
Solution:
# Verify API key is correct
ccs config
# Navigate to ollama-cloud profile
# Re-enter API key

Performance Tuning

Context Length

# ~/.ccs/config.yaml
profiles:
  ollama:
    env:
      ANTHROPIC_MAX_TOKENS: "32768"  # Adjust based on model
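
Note that ANTHROPIC_MAX_TOKENS caps the response length; the context window itself is set on the Ollama side. A minimal sketch using a Modelfile (num_ctx is Ollama's context parameter; the derived model name qwen3-coder-32k is illustrative):
# Create a derived model with a 32K context window
cat > Modelfile <<'EOF'
FROM qwen3-coder
PARAMETER num_ctx 32768
EOF
ollama create qwen3-coder-32k -f Modelfile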

Concurrency

Ollama handles concurrent requests through an internal queue. For better throughput:
# Increase parallel requests (Ollama config)
OLLAMA_NUM_PARALLEL=4 ollama serve
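
Two related knobs control how many models stay resident and for how long; keeping a model loaded avoids reload latency between requests:
# Allow two loaded models and keep them in memory for 30 minutes
OLLAMA_MAX_LOADED_MODELS=2 OLLAMA_KEEP_ALIVE=30m ollama serve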

Cost Information

Variant          Cost                 Privacy
Ollama (Local)   $0 (hardware only)   Complete - data never leaves your machine
Ollama Cloud     Varies by usage      Depends on Ollama Cloud's privacy policy

Storage Locations

Path                          Description
~/.ollama/models/             Downloaded model files
~/.ccs/config.yaml            CCS profile configuration
~/.ccs/ollama.settings.json   Model preferences (if using the Dashboard)
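
Model files dominate disk usage; to see how much space they occupy:
# Total size of downloaded models
du -sh ~/.ollama/models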

Ollama vs llama.cpp

Feature           Ollama                    llama.cpp
Model format      Ollama format             GGUF (raw)
Setup             Easier (install + pull)   More manual
Model selection   Built-in model library    Any GGUF file
Performance       Good                      Better (more optimization options)
Community         Large, many models        Smaller but growing
Best for          Getting started quickly   Fine-grained control
Use Ollama for quick setup with curated models. Use llama.cpp if you need specific GGUF models or advanced tuning.

Next Steps

  • API Profiles: configure custom Ollama endpoints
  • llama.cpp Provider: alternative GGUF-based local inference
  • Dashboard: manage models via the web interface
  • Remote Proxy: run Ollama on a remote server