Context compression
Deterministic pipeline that trims repeated boilerplate, redundant file contents, and stale system-prompt prefix before your request hits the wire. Same answer, fewer tokens.
Product overview
TokenPak runs on 127.0.0.1 between your LLM client and the model provider. Every outbound request flows through a deterministic pipeline that trims context — repeated boilerplate, redundant file contents, stale system-prompt prefix — without changing the meaning of the prompt. Your agent gets the same answers for fewer tokens.
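The trimming pass can be pictured as deterministic deduplication over the message history. A minimal sketch, assuming a plain list of message strings; the function name and placeholder format are hypothetical, not TokenPak's actual pipeline:

```python
import hashlib

def dedupe_blocks(messages):
    """Replace repeated identical blocks with a short reference marker.

    Deterministic: the same input always produces the same output, so
    the model sees one canonical copy of each repeated block.
    """
    seen = {}   # sha256 digest -> index of first occurrence
    out = []
    for i, text in enumerate(messages):
        digest = hashlib.sha256(text.encode()).hexdigest()
        if digest in seen:
            out.append(f"[duplicate of message {seen[digest]}]")
        else:
            seen[digest] = i
            out.append(text)
    return out

history = ["system prompt", "big file contents", "big file contents", "question"]
print(dedupe_blocks(history))
# → ['system prompt', 'big file contents', '[duplicate of message 1]', 'question']
```

The second copy of the file is collapsed to a pointer at the first, which is the kind of rewrite that saves tokens without changing what the prompt says.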
Compression is one of four first-class concerns. Routing, caching, and telemetry all live at the same layer: your traffic, your machine, your rules.
Rules-based routing with fallbacks across providers (Anthropic, OpenAI, Google, Ollama, local). Swap the model without swapping the client.
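Rules-based routing with fallbacks can be sketched as an ordered rule table walked until an available provider matches. Everything here is illustrative: the rule shape, provider flags, and request fields are hypothetical, not TokenPak's configuration format:

```python
# Hypothetical availability map: provider name -> currently reachable?
PROVIDERS = {"anthropic": False, "openai": True, "ollama": True}

# Ordered rules: (predicate over the request, provider preference list).
RULES = [
    (lambda req: req.get("needs_local"), ["ollama"]),
    (lambda req: True,                   ["anthropic", "openai", "ollama"]),
]

def route(req, available=PROVIDERS):
    """Return the first available provider allowed by the first matching rule."""
    for predicate, prefs in RULES:
        if predicate(req):
            for name in prefs:
                if available.get(name):
                    return name
    raise RuntimeError("no provider available")

print(route({}))                      # anthropic is down, falls back to openai
print(route({"needs_local": True}))   # rule pins this request to ollama
```

The client never sees the swap: it keeps speaking the same API while the preference list decides which backend answers.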
Byte-preserved passthrough keeps provider-side cache hits intact. When TokenPak caches something itself, it says so in the telemetry store — never mixed with provider cache hits.
Every request is logged to a local SQLite store with a version-controlled schema. Savings are attributed: proxy-caused, client-caused, or unclassified — never conflated.
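A local attributed-savings store like the one described above can be sketched with the standard-library sqlite3 module; the table name, columns, and sample rows are hypothetical, not TokenPak's actual schema:

```python
import sqlite3

con = sqlite3.connect(":memory:")  # real deployment would use a file on disk
con.execute("""
    CREATE TABLE requests (
        id INTEGER PRIMARY KEY,
        tokens_in INTEGER NOT NULL,
        tokens_saved INTEGER NOT NULL,
        -- attribution is a closed set: the CHECK keeps categories from
        -- ever being conflated at write time
        attribution TEXT NOT NULL
            CHECK (attribution IN ('proxy', 'client', 'unclassified'))
    )
""")
con.executemany(
    "INSERT INTO requests (tokens_in, tokens_saved, attribution) VALUES (?, ?, ?)",
    [(1200, 300, "proxy"), (800, 0, "unclassified"), (500, 120, "client")],
)
# Savings roll up per attribution bucket, never across them.
for row in con.execute(
    "SELECT attribution, SUM(tokens_saved) FROM requests GROUP BY attribution"
):
    print(row)
```

Because the categories are enforced in the schema rather than in reporting code, a query can always say exactly who caused a saving.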
Three planes. Every TokenPak surface maps onto exactly one of them.
Data plane
proxy/ + services/. Byte-level work: compression, cache, routing, wire-side telemetry. Every outbound request flows through here exactly once.
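"Byte-level work" implies that identity on this plane is identity of wire bytes, not of meaning. A minimal sketch of what that means for caching, with a hypothetical key function that is not TokenPak's actual cache:

```python
import hashlib

def cache_key(raw_body: bytes) -> str:
    # Key on the exact wire bytes: any rewrite, however cosmetic,
    # yields a different key and therefore a cache miss.
    return hashlib.sha256(raw_body).hexdigest()

a = cache_key(b'{"model":"m","messages":[]}')
b = cache_key(b'{"model": "m", "messages": []}')  # same JSON, different bytes
print(a == b)  # → False: byte-level identity, not semantic identity
```

This is why passthrough has to be byte-preserving: reserializing a request, even harmlessly, would invalidate provider-side caches keyed the same way.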
Control plane
MCP over stdio or Streamable HTTP. Tools, resources, prompts, status. No model requests flow on this plane — it only describes and observes.
Semantic layer
Canonical headers, metadata fields, telemetry events, capability labels. TokenPak is the reference implementation of the TokenPak Integration Protocol.
Full architecture — 18 canonical subsystems, plane ownership rules, byte-fidelity guarantees — lives at docs.tokenpak.ai/architecture.
Teams running LLM agents in production or in daily developer workflows where the token bill is starting to show. Claude Code, Cursor, Cline, Continue, Aider, Codex, plus any app built on the Anthropic or OpenAI SDK or LiteLLM.
If your tools already speak the provider API, TokenPak drops in under them. If you're building the tools yourself, there's an SDK. Nothing on this page is per-tool; the integrations are in the OSS package.
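"Drops in under them" typically means repointing the client's base URL at the local proxy. As one concrete illustration: the official OpenAI Python SDK reads the OPENAI_BASE_URL environment variable, so a tool built on it can be redirected without code changes. The port below is hypothetical; use whatever address your proxy actually listens on:

```python
import os

# Hypothetical local listen address for the proxy; the real port is
# whatever your TokenPak instance reports at startup.
os.environ["OPENAI_BASE_URL"] = "http://127.0.0.1:4015/v1"

# Any OpenAI-SDK-based tool launched from this environment now sends
# its traffic through 127.0.0.1 instead of straight to the provider.
print(os.environ["OPENAI_BASE_URL"])
```

The same pattern applies to other SDKs that expose a configurable base URL; the client keeps speaking the provider API and never knows a proxy is in the path.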
pip install tokenpak && tokenpak integrate claude-code --apply. Takes a minute.