# Architecture
How whisp works under the hood.
## Overview
Whisp consists of three main components, plus a resilience layer that wraps all provider calls:
```
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│    Shell    │────▶│   Daemon    │────▶│ AI Provider │
│ Integration │◀────│   (Rust)    │◀────│    (API)    │
└─────────────┘     └─────────────┘     └─────────────┘
                           │
                    ┌──────┴──────┐
                    │ Resilience  │
                    │    Layer    │
                    └─────────────┘
```

## Components
### Shell Integration
Minimal shell code (bash, zsh, fish) that delegates to the Rust binary:

```shell
# The , function calls whisp shell query
, find large files
# → whisp shell query "find large files"
```

Shell integration provides:

- Shortcuts (`,`, `,,`, `,.`, `,d`, etc.)
- Error detection via preexec/precmd hooks
- Session ID management
- Stdin piping support
### Daemon Process
The daemon is a Rust binary that:
- Listens on a Unix socket (`/tmp/whisp.sock`)
- Gathers context (cwd, shell, history, project type, git info)
- Constructs prompts for the AI
- Sends requests through the resilience layer
- Returns structured responses
The daemon runs continuously to avoid cold-start latency.
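The CLI-to-daemon round trip can be sketched with a toy server standing in for the real daemon. The socket path, line-oriented wire format, and canned reply below are illustrative assumptions, not whisp's actual protocol:

```rust
use std::io::{BufRead, BufReader, Write};
use std::os::unix::net::{UnixListener, UnixStream};
use std::thread;

// Spawn a toy "daemon" that answers one request with a canned command.
fn serve(path: &str) -> thread::JoinHandle<()> {
    let listener = UnixListener::bind(path).expect("bind socket");
    thread::spawn(move || {
        let (stream, _) = listener.accept().expect("accept");
        let mut reader = BufReader::new(&stream);
        let mut request = String::new();
        reader.read_line(&mut request).expect("read request");
        let mut writer = &stream;
        writeln!(writer, "find . -size +100M").expect("write reply");
    })
}

// What the CLI side does: connect, send the query, read one reply line.
fn query(path: &str, prompt: &str) -> String {
    let mut stream = UnixStream::connect(path).expect("connect");
    writeln!(stream, "{prompt}").expect("send");
    let mut reply = String::new();
    BufReader::new(stream).read_line(&mut reply).expect("recv");
    reply.trim_end().to_string()
}

fn main() {
    let path = "/tmp/whisp-demo.sock";
    let _ = std::fs::remove_file(path); // clear a stale socket from a previous run
    let server = serve(path);
    println!("{}", query(path, "find large files"));
    server.join().unwrap();
}
```

Because the listener stays bound between requests, a real daemon amortizes socket setup the same way whisp amortizes process startup.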
### AI Provider Layer
Abstraction over multiple AI providers:
| Provider | Library | Features |
|---|---|---|
| OpenAI | async-openai | Token tracking, custom endpoints |
| Anthropic | reqwest | System prompt extraction |
| Ollama | reqwest | Local inference, no API key |
| Gemini | reqwest | System instructions |
| Cerebras | reqwest | Fast inference |
### Resilience Layer
Wraps all providers with:

- **Rate limiting**: Token bucket algorithm via `governor`
- **Retries**: Exponential backoff with jitter
- **Error handling**: Graceful degradation
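The retry classification and backoff described here can be sketched as follows; the base delay and cap are illustrative, as whisp's actual values are not given in this doc:

```rust
// Classify errors per the retryable/non-retryable split described below.
fn is_retryable(status: Option<u16>) -> bool {
    match status {
        None => true,                               // timeout / connection error
        Some(429) => true,                          // rate limited
        Some(s) if (500..600).contains(&s) => true, // server error
        _ => false,                                 // 400, 401, 403, 404, ...
    }
}

/// Delay before retry `attempt` (0-based): doubles each time, capped,
/// plus up to 50% additive jitter.
fn backoff_ms(attempt: u32, jitter: f64) -> u64 {
    let base = 250u64.saturating_mul(1u64 << attempt.min(6));
    let capped = base.min(8_000);
    capped + (capped as f64 * jitter.clamp(0.0, 0.5)) as u64
}

fn main() {
    for attempt in 0..5 {
        println!("attempt {attempt}: wait {} ms", backoff_ms(attempt, 0.0));
    }
}
```

Jitter spreads out retries from many clients so they don't hammer a recovering provider in lockstep.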
Retryable errors: 429 (rate limit), 5xx (server), timeouts.
Non-retryable: 400, 401, 403, 404.

## Context Gathering
For each query, whisp collects:
| Context | Source | Purpose |
|---|---|---|
| Working directory | pwd | Path context |
| Shell type | Environment | Shell-specific syntax |
| OS info | uname / system files | OS-specific commands |
| Project type | Marker files | Framework awareness |
| Git branch | git rev-parse | Repository context |
| Git dirty | git status | Uncommitted changes |
| Recent commands | Session history | Pattern awareness |
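One plausible shape for this context bundle and the prompt lines rendered from it; the struct fields and output format are assumptions, only the context items come from the table above:

```rust
// Hypothetical context bundle gathered per query.
struct QueryContext {
    cwd: String,
    shell: String,
    os: String,
    project_type: Option<String>,
    git_branch: Option<String>,
    git_dirty: bool,
    recent_commands: Vec<String>,
}

// Render the context into lines the daemon could prepend to the prompt.
fn render(ctx: &QueryContext) -> String {
    let mut lines = vec![
        format!("cwd: {}", ctx.cwd),
        format!("shell: {}", ctx.shell),
        format!("os: {}", ctx.os),
    ];
    if let Some(p) = &ctx.project_type {
        lines.push(format!("project: {p}"));
    }
    if let Some(b) = &ctx.git_branch {
        lines.push(format!("git: {b}{}", if ctx.git_dirty { " (dirty)" } else { "" }));
    }
    if !ctx.recent_commands.is_empty() {
        lines.push(format!("recent: {}", ctx.recent_commands.join("; ")));
    }
    lines.join("\n")
}

fn main() {
    let ctx = QueryContext {
        cwd: "/home/user/project".into(),
        shell: "zsh".into(),
        os: "linux".into(),
        project_type: Some("rust".into()),
        git_branch: Some("main".into()),
        git_dirty: true,
        recent_commands: vec!["cargo build".into()],
    };
    println!("{}", render(&ctx));
}
```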
### Project Type Detection
Detected via marker files:
| Project Type | Marker File |
|---|---|
| Rust | Cargo.toml |
| Node.js | package.json |
| Python | pyproject.toml, setup.py |
| Go | go.mod |
| Ruby | Gemfile |
| Java | pom.xml, build.gradle |
| C++ | CMakeLists.txt |
| Make | Makefile |
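A minimal sketch of this lookup, assuming first-match-wins ordering (so a Rust project with a `Makefile` still reports as Rust); the returned labels are illustrative:

```rust
use std::path::Path;

// Marker files per the table above; order matters because the first
// match wins, so specific markers come before the generic Makefile.
const MARKERS: &[(&str, &str)] = &[
    ("Cargo.toml", "rust"),
    ("package.json", "node"),
    ("pyproject.toml", "python"),
    ("setup.py", "python"),
    ("go.mod", "go"),
    ("Gemfile", "ruby"),
    ("pom.xml", "java"),
    ("build.gradle", "java"),
    ("CMakeLists.txt", "cpp"),
    ("Makefile", "make"),
];

fn detect_project_type(dir: &Path) -> Option<&'static str> {
    MARKERS
        .iter()
        .find(|(marker, _)| dir.join(marker).exists())
        .map(|&(_, kind)| kind)
}

fn main() {
    let cwd = std::env::current_dir().expect("cwd");
    println!("{:?}", detect_project_type(&cwd));
}
```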
## Request Types
The daemon handles multiple request types:
| Type | Purpose | Shortcut |
|---|---|---|
| Query | Generate command | , |
| Explain | Explain command | ,. |
| DryRun | Preview effects | ,d |
| Variants | Alternative commands | ,, |
| Error | Fix suggestions | (automatic) |
| Pipe | Process stdin | cat \| , |
| Chat | Multi-turn conversation | whisp chat |
| SearchHistory | Find past commands | ,/ |
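As a sketch, this request vocabulary could be modeled as a Rust enum. The variant payloads here are assumptions; only the variant names and shortcuts come from the table:

```rust
// Hypothetical request enum; whisp's real wire format is not shown in this doc.
#[allow(dead_code)]
enum Request {
    Query { prompt: String },
    Explain { command: String },
    DryRun { command: String },
    Variants { prompt: String },
    Error { command: String, exit_code: i32, stderr: String },
    Pipe { prompt: String, stdin: String },
    Chat { message: String },
    SearchHistory { pattern: String },
}

// Map a request back to its shell shortcut, per the table above.
fn shortcut(req: &Request) -> Option<&'static str> {
    match req {
        Request::Query { .. } => Some(","),
        Request::Explain { .. } => Some(",."),
        Request::DryRun { .. } => Some(",d"),
        Request::Variants { .. } => Some(",,"),
        Request::SearchHistory { .. } => Some(",/"),
        // Error is triggered automatically; Pipe and Chat have their own entry points.
        _ => None,
    }
}

fn main() {
    let req = Request::Query { prompt: "find large files".into() };
    println!("{:?}", shortcut(&req));
}
```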
## Data Flow
### Normal Query
1. User types: , find large files
2. Shell calls: whisp shell query "find large files"
3. CLI connects to daemon via Unix socket
4. Daemon gathers context (cwd, shell, project, git)
5. Daemon constructs prompt with context
6. Request sent through resilience layer to provider
7. Provider returns generated command
8. Daemon parses response, extracts command/explanation
9. CLI displays result, prompts for confirmation
10. If confirmed, command added to shell history

### Error Recovery
1. User runs: gcc main.c -o main
2. Command fails with exit code 1
3. Shell precmd hook captures error + stderr
4. whisp shell send-error "gcc main.c..." 1 "undefined reference..."
5. Daemon sends error context to AI
6. AI suggests fix: gcc main.c -o main -lm
7. User prompted to run fix

## Security Features
### Secret Redaction
Before sending to AI, sensitive patterns are redacted:
- API keys (OpenAI, Anthropic, AWS, GitHub, etc.)
- Passwords in URLs and environment variables
- PEM private keys
- Database connection strings
- Bearer tokens
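A heavily simplified sketch of a redaction pass, using naive token-prefix rules rather than whisp's real pattern set (which would need proper regexes for the formats listed above):

```rust
// Toy redactor: mask any whitespace-separated word that looks like an
// API key or a credential assignment. The rules are illustrative only.
fn redact(input: &str) -> String {
    input
        .split_whitespace()
        .map(|word| {
            let lower = word.to_lowercase();
            let secret = word.starts_with("sk-")
                || lower.contains("password=")
                || lower.contains("token=")
                || lower.contains("api_key=");
            if secret { "[REDACTED]" } else { word }
        })
        .collect::<Vec<_>>()
        .join(" ")
}

fn main() {
    println!("{}", redact("export OPENAI_API_KEY=sk-abc123"));
}
```

The important property is that redaction runs before anything leaves the machine, so a leaked prompt log never contains raw credentials.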
### Destructive Command Detection
Client-side pattern matching warns about dangerous commands:

`rm -rf`, `dd if=`, `mkfs`, `wipefs`, `shred`, `chmod -R 777 /`

### File Permissions
- **Config file**: `0600` (user-only, contains API keys)
- **Socket**: Created with `umask 0077`
- **PID file**: `0600` with exclusive lock
- **Temp files**: `0600` with ownership validation
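Creating a user-only (`0600`) file on Unix can be sketched like this; the path and helper name are illustrative, not whisp's actual code:

```rust
use std::fs::OpenOptions;
use std::os::unix::fs::{OpenOptionsExt, PermissionsExt};

// Create a file readable and writable only by its owner (0600).
// create_new fails if the path already exists, which also guards
// against symlink-swap tricks on the temp path. Unix-only.
fn create_private(path: &std::path::Path) -> std::io::Result<std::fs::File> {
    OpenOptions::new()
        .write(true)
        .create_new(true)
        .mode(0o600)
        .open(path)
}

fn main() -> std::io::Result<()> {
    let path = std::env::temp_dir().join("whisp-demo.pid");
    let _ = std::fs::remove_file(&path);
    let file = create_private(&path)?;
    // Print the effective permission bits (mode() includes the file type bits).
    println!("{:o}", file.metadata()?.permissions().mode() & 0o777);
    Ok(())
}
```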
### Connection Limits
Daemon limits concurrent connections (100) to prevent resource exhaustion.
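A minimal sketch of such a cap, using an atomic counter as a lock-free admission check; whisp's actual mechanism is not described in this doc:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Admit a new connection only while the active count is below the cap.
struct ConnLimiter {
    active: AtomicUsize,
    max: usize,
}

impl ConnLimiter {
    fn new(max: usize) -> Self {
        Self { active: AtomicUsize::new(0), max }
    }

    // Compare-and-swap loop so concurrent acceptors never exceed `max`.
    fn try_acquire(&self) -> bool {
        let mut cur = self.active.load(Ordering::Relaxed);
        loop {
            if cur >= self.max {
                return false; // at capacity: reject the connection
            }
            match self.active.compare_exchange(cur, cur + 1, Ordering::AcqRel, Ordering::Relaxed) {
                Ok(_) => return true,
                Err(actual) => cur = actual, // raced with another thread; retry
            }
        }
    }

    fn release(&self) {
        self.active.fetch_sub(1, Ordering::AcqRel);
    }
}

fn main() {
    let limiter = ConnLimiter::new(100); // the doc's stated cap
    assert!(limiter.try_acquire());
    limiter.release();
    println!("capacity: {}", limiter.max);
}
```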
### Audit Trail
All interactions logged to `~/.whisp/history.jsonl` (line-delimited JSON):

```json
{
  "timestamp": "2024-01-15T14:32:00Z",
  "session_id": "abc123",
  "entry_type": "query",
  "command": "find . -size +100M",
  "cwd": "/home/user/project",
  "query": "find large files",
  "response": {
    "command": "find . -size +100M",
    "explanation": "Finds files larger than 100MB in the current directory"
  },
  "project_type": "rust",
  "git_branch": "main",
  "token_usage": {
    "input_tokens": 156,
    "output_tokens": 42
  },
  "provider": "openai",
  "model": "gpt-5-nano-2025-08-07",
  "duration_ms": 234
}
```

The log automatically rotates at 10MB, keeping the last 3 rotated files (~40MB total).
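The rotation policy can be sketched as follows. The 10MB threshold and keep-3 count come from the text; the file-naming and shifting order (`history.jsonl.2` → `.3`, `.1` → `.2`, live log → `.1`) are assumptions:

```rust
use std::fs;
use std::path::Path;

// Rotate when the live log exceeds `max_bytes`, keeping `keep` old files.
// Returns Ok(true) if a rotation happened.
fn rotate_if_needed(log: &Path, max_bytes: u64, keep: u32) -> std::io::Result<bool> {
    if fs::metadata(log)?.len() <= max_bytes {
        return Ok(false);
    }
    // Shift older rotations up one slot, dropping the oldest.
    for i in (1..keep).rev() {
        let from = log.with_extension(format!("jsonl.{i}"));
        if from.exists() {
            fs::rename(&from, log.with_extension(format!("jsonl.{}", i + 1)))?;
        }
    }
    // The live log becomes history.jsonl.1; writers then start a fresh file.
    fs::rename(log, log.with_extension("jsonl.1"))?;
    Ok(true)
}

fn main() -> std::io::Result<()> {
    let dir = std::env::temp_dir().join("whisp-rotate-demo");
    fs::create_dir_all(&dir)?;
    let log = dir.join("history.jsonl");
    fs::write(&log, vec![b'x'; 32])?;
    println!("rotated: {}", rotate_if_needed(&log, 16, 3)?);
    Ok(())
}
```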
## Performance
Typical response times:
| Operation | Time |
|---|---|
| Context gathering | ~5ms |
| Socket communication | ~1ms |
| API request (gpt-5-nano) | ~200ms |
| API request (Ollama local) | ~100-500ms |
| Total round trip | ~250ms |
The daemon stays warm to avoid startup overhead.
### Metrics Tracked
- Request count and error count
- Response times (avg, p50, p95, p99)
- Token usage (input/output)
- Request type breakdown
- Memory usage (Linux)
### Token Usage
Each request consumes tokens:
| Component | Tokens |
|---|---|
| System prompt | ~200 |
| Context (cwd, shell, etc.) | ~50 |
| Recent commands | ~100 |
| User query | ~20 |
| Total input | ~370 |
| Generated response | ~80 |
Chat mode uses more tokens due to conversation history (last 20 messages retained).