
Extended Thinking Mode

Control model reasoning depth with the --thinking flag for supported providers.
Available since v7.38.0. This page also covers extended context window support via the --1m flag (1M token context).

Overview

Extended thinking allocates compute tokens for step-by-step reasoning before generating responses. Models “think out loud” internally to solve complex problems through systematic analysis. Use it for complex architecture decisions, multi-step reasoning, debugging obscure issues, mathematical proofs, and strategic planning.

Priority Order

Thinking settings are resolved in this order (highest wins):
  1. --thinking CLI flag
  2. CCS_THINKING environment variable
  3. config.yaml thinking section
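
The resolution order above can be sketched as follows. This is a minimal Python illustration, not CCS's actual implementation; the function name and signature are hypothetical:

```python
import os

def resolve_thinking(cli_flag=None, config_value=None, env=os.environ):
    """Resolve the effective thinking setting: the --thinking CLI flag wins
    over the CCS_THINKING environment variable, which wins over config.yaml."""
    if cli_flag is not None:
        return cli_flag
    env_value = env.get("CCS_THINKING")
    if env_value is not None:
        return env_value
    return config_value  # may be None if nothing is configured anywhere
```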

CCS_THINKING Environment Variable

Override thinking per-session without changing config:
# Named levels
CCS_THINKING=high ccs agy "analyze this"
CCS_THINKING=auto ccs gemini "complex task"

# Disable thinking
CCS_THINKING=off ccs codex "quick task"
CCS_THINKING=none ccs codex "quick task"
CCS_THINKING=disabled ccs codex "quick task"
CCS_THINKING=0 ccs codex "quick task"

# Integer budget (0-100000)
CCS_THINKING=24576 ccs agy "deep analysis"
Accepted values:
  • Named levels: minimal, low, medium, high, xhigh, auto
  • Off values: off, none, disabled, 0 (all equivalent — disable thinking)
  • Integer budget: 0-100000
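
Normalizing these accepted values could look roughly like this. An illustrative Python sketch, not CCS's actual parser; the names are hypothetical:

```python
LEVELS = {"minimal", "low", "medium", "high", "xhigh", "auto"}
OFF_VALUES = {"off", "none", "disabled", "0"}

def parse_thinking(value):
    """Normalize a CCS_THINKING value into ('off', None), ('level', name),
    or ('budget', tokens). Raises ValueError for anything else."""
    value = value.strip().lower()
    if value in OFF_VALUES:          # checked first, so "0" means off
        return ("off", None)
    if value in LEVELS:
        return ("level", value)
    if value.isdigit() and int(value) <= 100000:
        return ("budget", int(value))
    raise ValueError(f"invalid thinking value: {value!r}")
```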

Two Modes

Budget Mode (Token Count)

Specify an exact token budget for the thinking phase.
ccs gemini --thinking 8192       # Allocate 8K tokens
ccs agy --thinking 24576         # Deep analysis

Level Mode (Named Levels)

Use predefined levels for simplified control.
ccs codex --thinking low         # Quick (1K tokens)
ccs codex --thinking medium      # Standard (8K tokens)
ccs codex --thinking high        # Deep (24K tokens)
ccs codex --thinking xhigh       # Maximum (32K tokens)
Level mappings: minimal=512, low=1024, medium=8192, high=24576, xhigh=32768

Provider Support Matrix

Provider      Model                        Type     Range          Dynamic
Antigravity   Claude Opus 4.6              budget   1024-128000
Antigravity   Claude Opus 4.5 Thinking     budget   1024-100000
Antigravity   Claude Sonnet 4.5 Thinking   budget   1024-100000
Gemini        Gemini 2.5 Pro               budget   128-32768
Gemini        Gemini 3 Pro                 levels   low, high
Codex         GPT-5.2 Codex                levels   medium, high, xhigh
Codex         GPT-5 Mini                   levels   medium, high
  • Type: budget (numeric) vs. levels (named presets)
  • Dynamic: Supports auto mode (model decides dynamically)

Usage

ccs agy --thinking auto           # Dynamic budget selection
ccs gemini --thinking auto        # Model optimizes cost/quality

Custom Budgets

ccs gemini --thinking 1024        # Light thinking
ccs gemini --thinking 32768       # Deep analysis
ccs agy --thinking 100000         # Maximum (Antigravity only)

Named Levels

ccs codex --thinking off          # Disable thinking
ccs codex --thinking medium       # Balanced
ccs codex --thinking xhigh        # Maximum effort

Cross-Type Compatibility

CCS automatically converts between budgets and levels.
ccs gemini --thinking high        # → 24576 tokens
ccs codex --thinking 8192         # → "medium" level
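
One plausible way to model this conversion is below. The level table matches the mappings listed earlier; the round-down rule for budgets is an assumption, and CCS's actual rounding may differ:

```python
LEVEL_TOKENS = {"minimal": 512, "low": 1024, "medium": 8192,
                "high": 24576, "xhigh": 32768}

def level_to_budget(level):
    """Named level -> token budget, per the documented mappings."""
    return LEVEL_TOKENS[level]

def budget_to_level(budget):
    """Token budget -> the largest named level at or below it
    (anything smaller than 512 falls back to 'minimal')."""
    best = "minimal"
    for name, tokens in LEVEL_TOKENS.items():
        if tokens <= budget:
            best = name
    return best
```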

Auto-Capping Behavior

CCS validates values and auto-adjusts invalid inputs.

Budget Clamping

ccs gemini --thinking 50000       # → Clamped to 32768 (max)
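
The clamping behavior amounts to a simple range clamp. A sketch, with default bounds taken from Gemini 2.5 Pro's row in the support matrix:

```python
def clamp_budget(budget, lo=128, hi=32768):
    """Clamp a requested thinking budget into the model's supported range."""
    return max(lo, min(hi, budget))
```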

Level Capping

ccs codex --thinking xhigh        # → Capped to "high" (GPT-5 Mini)

Fuzzy Matching

ccs codex --thinking hi           # → "high"
ccs codex --thinking med          # → "medium"
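
Fuzzy matching can be modeled as unique-prefix resolution. The exact matching rule is an assumption here; CCS's matcher may behave differently on ambiguous input:

```python
def fuzzy_level(text, levels=("minimal", "low", "medium", "high", "xhigh", "auto")):
    """Resolve an abbreviated level name by unique prefix match.
    Returns None when the prefix is ambiguous or matches nothing."""
    text = text.strip().lower()
    matches = [lvl for lvl in levels if lvl.startswith(text)]
    return matches[0] if len(matches) == 1 else None
```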

Cost Implications

Higher budgets = more tokens = higher cost.
  • low (1K): Minimal cost, fast
  • medium (8K): Moderate cost, balanced
  • high (24K): Higher cost, deep analysis
  • xhigh (32K): Maximum cost, maximum depth
  • Custom (100K): Very high cost (Antigravity only)
Best practices: Use auto for optimization, reserve high budgets for complex problems, start with low/medium for routine tasks.

Troubleshooting

Model Doesn’t Support Thinking

Error: Model gemini-claude-sonnet-4-5 does not support extended thinking
Solution: Use thinking-enabled variants (e.g., gemini-claude-sonnet-4-5-thinking) or switch to a supported reasoning-first profile such as ccs km when you need Kimi API reasoning. Legacy ccs glmt remains compatibility-only.

Budget Exceeds Maximum

Warning: Thinking budget 50000 exceeds maximum. Clamped to 32768.
Solution: Use a budget within the model's range, or switch to a model with a higher limit.

Level Not Supported

Warning: Level "xhigh" not valid for gpt-5-mini. Mapped to "high".
Solution: CCS auto-maps to the closest valid level. Check the support matrix above.

Dynamic Thinking Unavailable

Warning: Model does not support dynamic/auto thinking
Solution: Specify an explicit level or budget. Check the “Dynamic” column in the matrix.

ccs config thinking Command

Manage thinking configuration interactively or via flags:
# Show current thinking config
ccs config thinking

# Set thinking mode
ccs config thinking --mode auto           # Dynamic tier-based defaults
ccs config thinking --mode off            # Disable thinking entirely
ccs config thinking --mode manual         # Use explicit override

# Persistent override (applies to all providers)
ccs config thinking --override high       # Set persistent override level
ccs config thinking --override 24576      # Set persistent override budget
ccs config thinking --clear-override      # Remove persistent override

# Per-tier defaults
ccs config thinking --tier opus high      # Set opus tier to "high"
ccs config thinking --tier sonnet medium  # Set sonnet tier to "medium"
ccs config thinking --tier haiku low      # Set haiku tier to "low"

# Per-provider overrides
ccs config thinking --provider-override gemini opus xhigh
ccs config thinking --provider-override agy sonnet high
ccs config thinking --clear-provider-override gemini        # Clear all gemini overrides
ccs config thinking --clear-provider-override gemini opus   # Clear gemini opus only

Dashboard Thinking Settings

The CCS Dashboard includes a Thinking settings panel with:
  • Mode selector — auto / off / manual
  • Persistent Override panel — set a global level that overrides tier defaults
  • Tier defaults — configure opus/sonnet/haiku default levels
  • Provider Overrides section — per-provider tier level customization

Extended Context Window

Available since v7.38.0
Enable 1M token context window for supported models using the --1m flag.

Usage

# Enable 1M context
ccs gemini --1m "analyze this large codebase"

# Disable (use default context)
ccs gemini --no-1m "quick prompt"

# Combine with thinking mode
ccs gemini --thinking auto --1m "deep analysis of large project"

How It Works

The --1m flag appends the [1m] suffix to model names, routing requests to extended context variants:
# Without --1m
gemini-claude-sonnet-4-5

# With --1m
gemini-claude-sonnet-4-5[1m]
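
The suffix handling above can be sketched as a small helper. This is illustrative only, not part of CCS:

```python
def apply_1m(model, enabled):
    """Append or strip the [1m] extended-context suffix on a model name,
    mirroring what the --1m / --no-1m flags do to routing."""
    if enabled and not model.endswith("[1m]"):
        return model + "[1m]"
    if not enabled and model.endswith("[1m]"):
        return model[:-len("[1m]")]
    return model
```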

Provider Support

Auto-Enabled:
  • Native Gemini models (always use 1M context by default)
Opt-In:
  • Claude models via Gemini proxy
  • Antigravity models
  • Codex models
Not Supported:
  • Settings-based profiles (GLM, KM, custom APIs)
  • Local models (Ollama)

Best Practices

When to use --1m:
  • Large codebase analysis
  • Multi-file refactoring
  • Documentation generation across many files
  • Complex architectural planning
When NOT to use:
  • Simple queries (wastes quota)
  • Short prompts (no benefit)
  • Rate-limited scenarios (uses more quota faster)

Cost Implications

Extended context consumes quota faster. Use selectively for tasks requiring large context windows.

Disable Extended Context

# Force standard context
ccs gemini --no-1m "your prompt"
Useful when:
  • Quota conservation needed
  • Faster response time preferred
  • Task doesn’t require large context