# Architecture
How whisp works under the hood.
## Overview
Whisp consists of three main components, plus a resilience layer that wraps all provider calls:
```
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│    Shell    │────▶│   Daemon    │────▶│ AI Provider │
│ Integration │◀────│   (Rust)    │◀────│    (API)    │
└─────────────┘     └─────────────┘     └─────────────┘
                           │
                    ┌──────┴──────┐
                    │ Resilience  │
                    │    Layer    │
                    └─────────────┘
```

## Components
### Shell Integration
Minimal shell code (bash, zsh, fish) that delegates to the Rust binary:

```shell
# The , function calls whisp shell query
, find large files
# → whisp shell query "find large files"
```

Shell integration provides:

- Shortcuts (`,`, `,,`, `,.`, `,d`, etc.)
- Error detection via preexec/precmd hooks
- Session ID management
- Stdin piping support
### Daemon Process
The daemon is a Rust binary that:
- Listens on a Unix socket (`/tmp/whisp.sock`)
- Gathers context (cwd, shell, history, project type, git info)
- Constructs prompts for the AI
- Sends requests through the resilience layer
- Returns structured responses
The daemon runs continuously to avoid cold-start latency.
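The CLI-to-daemon round trip can be sketched with a toy server standing in for the real daemon. The socket path, line-oriented wire format, and canned reply below are illustrative assumptions, not whisp's actual protocol:

```rust
use std::io::{BufRead, BufReader, Write};
use std::os::unix::net::{UnixListener, UnixStream};
use std::thread;

// Spawn a toy "daemon" that answers one request with a canned command.
fn serve(path: &str) -> thread::JoinHandle<()> {
    let listener = UnixListener::bind(path).expect("bind socket");
    thread::spawn(move || {
        let (stream, _) = listener.accept().expect("accept");
        let mut reader = BufReader::new(&stream);
        let mut request = String::new();
        reader.read_line(&mut request).expect("read request");
        let mut writer = &stream;
        writeln!(writer, "find . -size +100M").expect("write reply");
    })
}

// What the CLI side does: connect, send the query, read one reply line.
fn query(path: &str, prompt: &str) -> String {
    let mut stream = UnixStream::connect(path).expect("connect");
    writeln!(stream, "{prompt}").expect("send");
    let mut reply = String::new();
    BufReader::new(stream).read_line(&mut reply).expect("recv");
    reply.trim_end().to_string()
}

fn main() {
    let path = "/tmp/whisp-demo.sock";
    let _ = std::fs::remove_file(path); // clear a stale socket from a previous run
    let server = serve(path);
    println!("{}", query(path, "find large files"));
    server.join().unwrap();
}
```

Because the listener stays bound between requests, a real daemon amortizes socket setup the same way whisp amortizes process startup.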
### AI Provider Layer
Abstraction over multiple AI providers:
| Provider | Library | Features |
|---|---|---|
| OpenAI | async-openai | Token tracking, custom endpoints |
| Anthropic | reqwest | System prompt extraction |
| Ollama | reqwest | Local inference, no API key |
| Gemini | reqwest | System instructions |
| Cerebras | reqwest | Fast inference |
### Resilience Layer
Wraps all providers with:

- **Rate limiting**: Token bucket algorithm via `governor`
- **Retries**: Exponential backoff with jitter
- **Error handling**: Graceful degradation
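The retry classification and backoff described here can be sketched as follows; the base delay and cap are illustrative, as whisp's actual values are not given in this doc:

```rust
// Classify errors per the retryable/non-retryable split described below.
fn is_retryable(status: Option<u16>) -> bool {
    match status {
        None => true,                               // timeout / connection error
        Some(429) => true,                          // rate limited
        Some(s) if (500..600).contains(&s) => true, // server error
        _ => false,                                 // 400, 401, 403, 404, ...
    }
}

/// Delay before retry `attempt` (0-based): doubles each time, capped,
/// plus up to 50% additive jitter.
fn backoff_ms(attempt: u32, jitter: f64) -> u64 {
    let base = 250u64.saturating_mul(1u64 << attempt.min(6));
    let capped = base.min(8_000);
    capped + (capped as f64 * jitter.clamp(0.0, 0.5)) as u64
}

fn main() {
    for attempt in 0..5 {
        println!("attempt {attempt}: wait {} ms", backoff_ms(attempt, 0.0));
    }
}
```

Jitter spreads out retries from many clients so they don't hammer a recovering provider in lockstep.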
Retryable errors: 429 (rate limit), 5xx (server), timeouts.
Non-retryable: 400, 401, 403, 404.

## Context Gathering
For each query, whisp collects:
| Context | Source | Purpose |
|---|---|---|
| Working directory | pwd | Path context |
| Shell type | Environment | Shell-specific syntax |
| OS info | uname / system files | OS-specific commands |
| Project type | Marker files | Framework awareness |
| Git branch | git rev-parse | Repository context |
| Git dirty | git status | Uncommitted changes |
| Recent commands | Session history | Pattern awareness |
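One plausible shape for this context bundle and the prompt lines rendered from it; the struct fields and output format are assumptions, only the context items come from the table above:

```rust
// Hypothetical context bundle gathered per query.
struct QueryContext {
    cwd: String,
    shell: String,
    os: String,
    project_type: Option<String>,
    git_branch: Option<String>,
    git_dirty: bool,
    recent_commands: Vec<String>,
}

// Render the context into lines the daemon could prepend to the prompt.
fn render(ctx: &QueryContext) -> String {
    let mut lines = vec![
        format!("cwd: {}", ctx.cwd),
        format!("shell: {}", ctx.shell),
        format!("os: {}", ctx.os),
    ];
    if let Some(p) = &ctx.project_type {
        lines.push(format!("project: {p}"));
    }
    if let Some(b) = &ctx.git_branch {
        lines.push(format!("git: {b}{}", if ctx.git_dirty { " (dirty)" } else { "" }));
    }
    if !ctx.recent_commands.is_empty() {
        lines.push(format!("recent: {}", ctx.recent_commands.join("; ")));
    }
    lines.join("\n")
}

fn main() {
    let ctx = QueryContext {
        cwd: "/home/user/project".into(),
        shell: "zsh".into(),
        os: "linux".into(),
        project_type: Some("rust".into()),
        git_branch: Some("main".into()),
        git_dirty: true,
        recent_commands: vec!["cargo build".into()],
    };
    println!("{}", render(&ctx));
}
```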
### Project Type Detection
Detected via marker files:
| Project Type | Marker File |
|---|---|
| Rust | Cargo.toml |
| Node.js | package.json |
| Python | pyproject.toml, setup.py |
| Go | go.mod |
| Ruby | Gemfile |
| Java | pom.xml, build.gradle |
| C++ | CMakeLists.txt |
| Make | Makefile |
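A minimal sketch of this lookup, assuming first-match-wins ordering (so a Rust project with a `Makefile` still reports as Rust); the returned labels are illustrative:

```rust
use std::path::Path;

// Marker files per the table above; order matters because the first
// match wins, so specific markers come before the generic Makefile.
const MARKERS: &[(&str, &str)] = &[
    ("Cargo.toml", "rust"),
    ("package.json", "node"),
    ("pyproject.toml", "python"),
    ("setup.py", "python"),
    ("go.mod", "go"),
    ("Gemfile", "ruby"),
    ("pom.xml", "java"),
    ("build.gradle", "java"),
    ("CMakeLists.txt", "cpp"),
    ("Makefile", "make"),
];

fn detect_project_type(dir: &Path) -> Option<&'static str> {
    MARKERS
        .iter()
        .find(|(marker, _)| dir.join(marker).exists())
        .map(|&(_, kind)| kind)
}

fn main() {
    let cwd = std::env::current_dir().expect("cwd");
    println!("{:?}", detect_project_type(&cwd));
}
```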
## Request Types
The daemon handles multiple request types:
| Type | Purpose | Shortcut |
|---|---|---|
| Query | Generate command | , |
| Explain | Explain command | ,. |
| DryRun | Preview effects | ,d |
| Variants | Alternative commands | ,, |
| Error | Fix suggestions | (automatic) |
| Pipe | Process stdin | cat \| , |
| Chat | Multi-turn conversation | whisp chat |
| SearchHistory | Find past commands | ,/ |
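As a sketch, this request vocabulary could be modeled as a Rust enum. The variant payloads here are assumptions; only the variant names and shortcuts come from the table:

```rust
// Hypothetical request enum; whisp's real wire format is not shown in this doc.
#[allow(dead_code)]
enum Request {
    Query { prompt: String },
    Explain { command: String },
    DryRun { command: String },
    Variants { prompt: String },
    Error { command: String, exit_code: i32, stderr: String },
    Pipe { prompt: String, stdin: String },
    Chat { message: String },
    SearchHistory { pattern: String },
}

// Map a request back to its shell shortcut, per the table above.
fn shortcut(req: &Request) -> Option<&'static str> {
    match req {
        Request::Query { .. } => Some(","),
        Request::Explain { .. } => Some(",."),
        Request::DryRun { .. } => Some(",d"),
        Request::Variants { .. } => Some(",,"),
        Request::SearchHistory { .. } => Some(",/"),
        // Error is triggered automatically; Pipe and Chat have their own entry points.
        _ => None,
    }
}

fn main() {
    let req = Request::Query { prompt: "find large files".into() };
    println!("{:?}", shortcut(&req));
}
```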
## Data Flow
### Normal Query
1. User types: , find large files
2. Shell calls: whisp shell query "find large files"
3. CLI connects to daemon via Unix socket
4. Daemon gathers context (cwd, shell, project, git)
5. Daemon constructs prompt with context
6. Request sent through resilience layer to provider
7. Provider returns generated command
8. Daemon parses response, extracts command/explanation
9. CLI displays result, prompts for confirmation
10. If confirmed, command added to shell history

### Error Recovery
1. User runs: gcc main.c -o main
2. Command fails with exit code 1
3. Shell precmd hook captures error + stderr
4. whisp shell send-error "gcc main.c..." 1 "undefined reference..."
5. Daemon sends error context to AI
6. AI suggests fix: gcc main.c -o main -lm
7. User prompted to run fix

## Security Features
### Secret Redaction
Before sending to AI, sensitive patterns are redacted:
- API keys (OpenAI, Anthropic, AWS, GitHub, etc.)
- Passwords in URLs and environment variables
- PEM private keys
- Database connection strings
- Bearer tokens
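A heavily simplified sketch of a redaction pass, using naive token-prefix rules rather than whisp's real pattern set (which would need proper regexes for the formats listed above):

```rust
// Toy redactor: mask any whitespace-separated word that looks like an
// API key or a credential assignment. The rules are illustrative only.
fn redact(input: &str) -> String {
    input
        .split_whitespace()
        .map(|word| {
            let lower = word.to_lowercase();
            let secret = word.starts_with("sk-")
                || lower.contains("password=")
                || lower.contains("token=")
                || lower.contains("api_key=");
            if secret { "[REDACTED]" } else { word }
        })
        .collect::<Vec<_>>()
        .join(" ")
}

fn main() {
    println!("{}", redact("export OPENAI_API_KEY=sk-abc123"));
}
```

The important property is that redaction runs before anything leaves the machine, so a leaked prompt log never contains raw credentials.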
### Destructive Command Detection
Client-side pattern matching warns about dangerous commands:

`rm -rf`, `dd if=`, `mkfs`, `wipefs`, `shred`, `chmod -R 777 /`

### File Permissions
- **Config file**: `0600` (user-only, contains API keys)
- **Socket**: Created with `umask 0077`
- **PID file**: `0600` with exclusive lock
- **Temp files**: `0600` with ownership validation
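Creating a user-only (`0600`) file on Unix can be sketched like this; the path and helper name are illustrative, not whisp's actual code:

```rust
use std::fs::OpenOptions;
use std::os::unix::fs::{OpenOptionsExt, PermissionsExt};

// Create a file readable and writable only by its owner (0600).
// create_new fails if the path already exists, which also guards
// against symlink-swap tricks on the temp path. Unix-only.
fn create_private(path: &std::path::Path) -> std::io::Result<std::fs::File> {
    OpenOptions::new()
        .write(true)
        .create_new(true)
        .mode(0o600)
        .open(path)
}

fn main() -> std::io::Result<()> {
    let path = std::env::temp_dir().join("whisp-demo.pid");
    let _ = std::fs::remove_file(&path);
    let file = create_private(&path)?;
    // Print the effective permission bits (mode() includes the file type bits).
    println!("{:o}", file.metadata()?.permissions().mode() & 0o777);
    Ok(())
}
```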
### Connection Limits
Daemon limits concurrent connections (100) to prevent resource exhaustion.
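A minimal sketch of such a cap, using an atomic counter as a lock-free admission check; whisp's actual mechanism is not described in this doc:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Admit a new connection only while the active count is below the cap.
struct ConnLimiter {
    active: AtomicUsize,
    max: usize,
}

impl ConnLimiter {
    fn new(max: usize) -> Self {
        Self { active: AtomicUsize::new(0), max }
    }

    // Compare-and-swap loop so concurrent acceptors never exceed `max`.
    fn try_acquire(&self) -> bool {
        let mut cur = self.active.load(Ordering::Relaxed);
        loop {
            if cur >= self.max {
                return false; // at capacity: reject the connection
            }
            match self.active.compare_exchange(cur, cur + 1, Ordering::AcqRel, Ordering::Relaxed) {
                Ok(_) => return true,
                Err(actual) => cur = actual, // raced with another thread; retry
            }
        }
    }

    fn release(&self) {
        self.active.fetch_sub(1, Ordering::AcqRel);
    }
}

fn main() {
    let limiter = ConnLimiter::new(100); // the doc's stated cap
    assert!(limiter.try_acquire());
    limiter.release();
    println!("capacity: {}", limiter.max);
}
```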
### Audit Trail
All interactions logged to `~/.whisp/history.jsonl` (line-delimited JSON):

```json
{
  "timestamp": "2024-01-15T14:32:00Z",
  "session_id": "abc123",
  "entry_type": "query",
  "command": "find . -size +100M",
  "cwd": "/home/user/project",
  "query": "find large files",
  "response": {
    "command": "find . -size +100M",
    "explanation": "Finds files larger than 100MB in the current directory"
  },
  "project_type": "rust",
  "git_branch": "main",
  "token_usage": {
    "input_tokens": 156,
    "output_tokens": 42
  },
  "provider": "openai",
  "model": "gpt-5-nano-2025-08-07",
  "duration_ms": 234
}
```

The log automatically rotates at 10MB, keeping the last 3 rotated files (~40MB total).
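The rotation policy can be sketched as follows. The 10MB threshold and keep-3 count come from the text; the file-naming and shifting order (`history.jsonl.2` → `.3`, `.1` → `.2`, live log → `.1`) are assumptions:

```rust
use std::fs;
use std::path::Path;

// Rotate when the live log exceeds `max_bytes`, keeping `keep` old files.
// Returns Ok(true) if a rotation happened.
fn rotate_if_needed(log: &Path, max_bytes: u64, keep: u32) -> std::io::Result<bool> {
    if fs::metadata(log)?.len() <= max_bytes {
        return Ok(false);
    }
    // Shift older rotations up one slot, dropping the oldest.
    for i in (1..keep).rev() {
        let from = log.with_extension(format!("jsonl.{i}"));
        if from.exists() {
            fs::rename(&from, log.with_extension(format!("jsonl.{}", i + 1)))?;
        }
    }
    // The live log becomes history.jsonl.1; writers then start a fresh file.
    fs::rename(log, log.with_extension("jsonl.1"))?;
    Ok(true)
}

fn main() -> std::io::Result<()> {
    let dir = std::env::temp_dir().join("whisp-rotate-demo");
    fs::create_dir_all(&dir)?;
    let log = dir.join("history.jsonl");
    fs::write(&log, vec![b'x'; 32])?;
    println!("rotated: {}", rotate_if_needed(&log, 16, 3)?);
    Ok(())
}
```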
## Performance
Typical response times:
| Operation | Time |
|---|---|
| Context gathering | ~5ms |
| Socket communication | ~1ms |
| API request (gpt-5-nano) | ~200ms |
| API request (Ollama local) | ~100-500ms |
| Total round trip | ~250ms |
The daemon stays warm to avoid startup overhead.
### Metrics Tracked
- Request count and error count
- Response times (avg, p50, p95, p99)
- Token usage (input/output)
- Request type breakdown
- Memory usage (Linux)
### Token Usage
Each request consumes tokens:
| Component | Tokens |
|---|---|
| System prompt | ~200 |
| Context (cwd, shell, etc.) | ~50 |
| Recent commands | ~100 |
| User query | ~20 |
| Total input | ~370 |
| Generated response | ~80 |
Chat mode uses more tokens due to conversation history (last 20 messages retained).