# Extended Thinking Mode
Control model reasoning depth with the `--thinking` flag for supported providers.

> Available since v7.38.0: extended context window support with the `--1m` flag for 1M-token context.

## Overview
Extended thinking allocates compute tokens for step-by-step reasoning before generating responses. Models "think out loud" internally to solve complex problems through systematic analysis.

Use it for: complex architecture, multi-step reasoning, debugging obscure issues, mathematical proofs, and strategic planning.

## Priority Order
Thinking settings are resolved in this order (highest wins):

1. `--thinking` CLI flag
2. `CCS_THINKING` environment variable
3. `config.yaml` `thinking` section
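The `config.yaml` `thinking` section is the lowest-priority source. Its exact schema isn't reproduced here; a hypothetical fragment (key names are illustrative assumptions, loosely mirroring the Dashboard's mode and level settings) might look like:

```yaml
# Hypothetical fragment -- key names are assumptions, not the documented schema
thinking:
  mode: auto      # auto / off / manual, as in the Dashboard mode selector
  level: medium   # default named level applied when mode is manual
```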
## `CCS_THINKING` Environment Variable
Override thinking per-session without changing config:
- Named levels: `minimal`, `low`, `medium`, `high`, `xhigh`, `auto`
- Off values: `off`, `none`, `disabled`, `0` (all equivalent — disable thinking)
- Integer budget: `0`–`100000`
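For example, to pin the current session to a named level (only the environment setup is shown; the subsequent `ccs` invocation is unchanged):

```shell
# Session-scoped override: CCS_THINKING beats config.yaml but loses to --thinking
export CCS_THINKING=high       # named level
# export CCS_THINKING=16000    # or an explicit token budget
# export CCS_THINKING=off      # or disable thinking entirely
```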
## Two Modes

### Budget Mode (Token Count)

Specify an exact token budget for the thinking phase.

### Level Mode (Named Levels)

Use predefined levels for simplified control.

## Provider Support Matrix
| Provider | Model | Type | Range | Dynamic |
|---|---|---|---|---|
| Antigravity | Claude Opus 4.6 | budget | 1024-128000 | ✓ |
| Antigravity | Claude Opus 4.5 Thinking | budget | 1024-100000 | ✓ |
| Antigravity | Claude Sonnet 4.5 Thinking | budget | 1024-100000 | ✓ |
| Gemini | Gemini 2.5 Pro | budget | 128-32768 | ✓ |
| Gemini | Gemini 3 Pro | levels | low, high | ✓ |
| Codex | GPT-5.2 Codex | levels | medium, high, xhigh | ✗ |
| Codex | GPT-5 Mini | levels | medium, high | ✗ |
- Type: budget (numeric) vs. levels (named presets)
- Dynamic: supports `auto` mode (the model decides dynamically)
## Usage
### Auto Mode (Recommended)
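A sketch of an auto-mode invocation (the prompt text is illustrative):

```shell
# Let the model decide how much thinking the task needs
ccs --thinking auto "Why does this deadlock only appear under load?"
```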
### Custom Budgets
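A sketch with an explicit token budget (the value is illustrative and must fall within the provider's range from the matrix above):

```shell
# Allocate a fixed 16K-token thinking budget
ccs --thinking 16000 "Design a migration plan for the billing schema"
```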
### Named Levels
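A sketch using named levels (prompts are illustrative):

```shell
# Predefined depth presets instead of raw token counts
ccs --thinking medium "Summarize the failure modes of this retry loop"
ccs --thinking high "Prove this invariant holds across all branches"
```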
## Cross-Type Compatibility
CCS automatically converts between budgets and levels.

## Auto-Capping Behavior
CCS validates values and auto-adjusts invalid inputs.

### Budget Clamping
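The clamping behavior can be sketched as a small function, assuming simple min/max clamping to the ranges in the support matrix (the function name is hypothetical, not part of CCS):

```shell
# Clamp a requested thinking budget into a provider's [min, max] range
clamp_budget() {
  value=$1; min=$2; max=$3
  if [ "$value" -lt "$min" ]; then
    echo "$min"
  elif [ "$value" -gt "$max" ]; then
    echo "$max"
  else
    echo "$value"
  fi
}

clamp_budget 50000 128 32768   # Gemini 2.5 Pro range: prints 32768
```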
### Level Capping
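Level capping can be sketched similarly, assuming levels are ordered minimal &lt; low &lt; medium &lt; high &lt; xhigh and requests above a model's maximum are mapped down (the function name is hypothetical):

```shell
# Map a requested level down to the model's highest supported level
cap_level() {
  requested=$1; max=$2; seen_max=0
  for lvl in minimal low medium high xhigh; do
    [ "$lvl" = "$max" ] && seen_max=1
    if [ "$lvl" = "$requested" ]; then
      if [ "$seen_max" -eq 1 ]; then echo "$max"; else echo "$requested"; fi
      return
    fi
  done
}

cap_level xhigh high   # gpt-5-mini caps at "high": prints high
```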
### Fuzzy Matching
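The exact matching rules aren't specified here; a plausible sketch, assuming unique-prefix resolution (entirely an assumption about how CCS resolves abbreviated level names):

```shell
# Resolve an abbreviated level name by unique prefix, e.g. "hi" -> "high"
match_level() {
  input=$1; hits=""
  for lvl in minimal low medium high xhigh auto off; do
    case $lvl in "$input"*) hits="$hits $lvl" ;; esac
  done
  set -- $hits
  if [ $# -eq 1 ]; then echo "$1"; else echo "unknown or ambiguous: $input" >&2; fi
}

match_level hi   # prints high
```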
## Cost Implications
Higher budgets mean more tokens and higher cost.

- low (1K): minimal cost, fast
- medium (8K): moderate cost, balanced
- high (24K): higher cost, deep analysis
- xhigh (32K): maximum cost, maximum depth
- Custom (100K): very high cost (Antigravity only)
Use `auto` for optimization; reserve high budgets for complex problems, and start with low/medium for routine tasks.
## Troubleshooting
### Model Doesn't Support Thinking

**Error:** `Model gemini-claude-sonnet-4-5 does not support extended thinking`

**Solution:** Use thinking-enabled variants (e.g., `gemini-claude-sonnet-4-5-thinking`) or switch to a supported reasoning-first profile such as `ccs km` when you need Kimi API reasoning. Legacy `ccs glmt` remains compatibility-only.
### Budget Exceeds Maximum

**Warning:** `Thinking budget 50000 exceeds maximum. Clamped to 32768.`

**Solution:** Use a budget within the supported range or switch to a model with a higher limit.
### Level Not Supported

**Warning:** `Level "xhigh" not valid for gpt-5-mini. Mapped to "high".`

**Solution:** CCS auto-maps to the closest valid level. Check the support matrix.
### Dynamic Thinking Unavailable

**Warning:** `Model does not support dynamic/auto thinking`

**Solution:** Specify an explicit level or budget. Check the "Dynamic" column in the matrix.
## `ccs config thinking` Command
Manage thinking configuration interactively or via flags:
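An invocation sketch (the subcommand's flags aren't enumerated here):

```shell
# Open the interactive thinking configuration
ccs config thinking
```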
## Dashboard Thinking Settings

The CCS Dashboard includes a Thinking settings panel with:

- Mode selector — auto / off / manual
- Persistent Override panel — set a global level that overrides tier defaults
- Tier defaults — configure opus/sonnet/haiku default levels
- Provider Overrides section — per-provider tier level customization
## Related

- CLI Flags Reference — `--thinking` flag syntax, `CCS_THINKING` env var
- Antigravity Provider — budget mode, 100K max
- Gemini Provider — budget/level hybrid
- Codex Provider — level-based, `maxLevel` caps
- GLMT Deprecation — legacy compatibility and migration guidance
## Extended Context Window

> Available since v7.38.0 via the `--1m` flag.
### Usage
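An invocation sketch (the prompt is illustrative):

```shell
# Enable the 1M-token extended context for this run
ccs --1m "Analyze how the modules in this repository depend on each other"
```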
### How It Works
The `--1m` flag appends a `[1m]` suffix to model names, routing requests to extended-context variants:
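The rewrite can be illustrated as a plain string operation (the model name is just an example):

```shell
# The [1m] suffix selects the extended-context variant of a model
model="gemini-3-pro"
extended="${model}[1m]"
echo "$extended"   # prints gemini-3-pro[1m]
```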
### Provider Support
**Auto-Enabled:**

- Native Gemini models (always use 1M context by default)
- Claude models via Gemini proxy
- Antigravity models
- Codex models
- Settings-based profiles (GLM, KM, custom APIs)
- Local models (Ollama)
### Best Practices
When to use `--1m`:
- Large codebase analysis
- Multi-file refactoring
- Documentation generation across many files
- Complex architectural planning
When to avoid `--1m`:

- Simple queries (wastes quota)
- Short prompts (no benefit)
- Rate-limited scenarios (uses more quota faster)
### Cost Implications
Extended context consumes quota faster. Use it selectively for tasks requiring large context windows.

### Disable Extended Context

Omit the `--1m` flag when:

- Quota conservation is needed
- Faster response time is preferred
- The task doesn't require a large context
