TokenPak

Product overview

A local proxy that compresses context before it hits the LLM API.

TokenPak runs on 127.0.0.1 between your LLM client and the model provider. Every outbound request flows through a deterministic pipeline that trims context (repeated boilerplate, redundant file contents, stale system-prompt prefixes) without changing the meaning of the prompt. Your agent gets the same answers for fewer tokens.
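Since the proxy sits on the provider's wire protocol, integration for SDK-based apps should reduce to overriding the base URL. A minimal sketch using the OpenAI Python SDK; the port and the /v1 path are assumptions for illustration, not documented values:

# Point an existing client at the local proxy instead of the provider.
# The port (8484) and path are assumptions; use whatever tokenpak reports.
from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:8484/v1",  # hypothetical local TokenPak endpoint
    api_key="sk-...",                     # your real provider key, forwarded upstream
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Summarize this diff."}],
)
print(response.choices[0].message.content)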

Compression is one of four first-class concerns. Routing, caching, and telemetry all live at the same layer: your traffic, your machine, your rules.

What it does

Context compression

Deterministic pipeline that trims repeated boilerplate, redundant file contents, and stale system-prompt prefixes before your request hits the wire. Same answer, fewer tokens.
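The individual passes aren't enumerated here, but the flavor is easy to sketch. A hypothetical dedup pass over a message list (the marker format and the size threshold are invented); determinism comes from hashing content rather than guessing at it:

# Illustrative sketch of one deterministic trimming pass: collapse repeated
# large contents into a pointer to their first occurrence. Not TokenPak's
# actual pipeline; marker format and 512-char threshold are invented.
import hashlib

def dedupe_file_blocks(messages: list[dict]) -> list[dict]:
    seen: dict[str, int] = {}  # content hash -> index of first occurrence
    out = []
    for i, msg in enumerate(messages):
        content = msg["content"]
        digest = hashlib.sha256(content.encode()).hexdigest()
        if digest in seen and len(content) > 512:
            # Replace the duplicate with a short, deterministic reference.
            content = f"[unchanged, identical to message {seen[digest]}]"
        else:
            seen.setdefault(digest, i)
        out.append({**msg, "content": content})
    return out

Same input always yields the same output, which is what keeps the trimming auditable.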

Model routing

Rules-based routing with fallbacks across providers (Anthropic, OpenAI, Google, Ollama, local). Swap the model without swapping the client.
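The rule format isn't documented on this page; a sketch of the general shape, with invented rule fields and model names:

# Hypothetical routing table; the rule shape is invented for illustration
# and is not TokenPak's documented config format.
ROUTES = [
    # {match predicate} -> ordered fallback chain across providers
    {"when": {"model": "fast"},  "try": ["anthropic/claude-haiku", "ollama/llama3"]},
    {"when": {"model": "smart"}, "try": ["anthropic/claude-sonnet", "openai/gpt-4o"]},
]

def resolve(requested_model: str) -> list[str]:
    """Return the provider fallback chain for a requested model alias."""
    for rule in ROUTES:
        if rule["when"]["model"] == requested_model:
            return rule["try"]
    return [requested_model]  # no matching rule: pass through unchanged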

Caching

Byte-preserved passthrough keeps provider-side cache hits intact. When TokenPak caches something itself, it records that fact in the telemetry store, never mixed in with provider cache hits.
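Provider prompt caches generally key on byte-identical request prefixes, which is presumably why passthrough is byte-preserved. The invariant, as a sketch with an invented function name:

# Sketch of the invariant behind byte-preserved passthrough: any span the
# proxy does not deliberately rewrite must be forwarded with identical
# bytes, or provider-side prompt caches would silently start missing.
def check_passthrough(original: bytes, forwarded: bytes) -> None:
    if original != forwarded:
        raise AssertionError(
            "passthrough bytes changed; provider cache hits would be lost"
        )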

Telemetry + cost tracking

Every request is logged to a local SQLite store with a version-controlled schema. Savings are attributed as proxy-caused, client-caused, or unclassified, never conflated.
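A hypothetical shape for that store; the real schema is version-controlled in the package, and every column name below is invented:

# Hypothetical telemetry schema, for flavor only. The attribution buckets
# mirror the three categories above; the table and column names are invented.
import sqlite3

conn = sqlite3.connect("tokenpak.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS requests (
        id          INTEGER PRIMARY KEY,
        ts          TEXT NOT NULL,
        model       TEXT NOT NULL,
        tokens_in   INTEGER NOT NULL,
        tokens_out  INTEGER NOT NULL,
        saved       INTEGER NOT NULL DEFAULT 0,
        attribution TEXT CHECK (attribution IN
            ('proxy-caused', 'client-caused', 'unclassified'))
    )
""")

# Savings per attribution bucket, reported separately, never summed together.
for row in conn.execute(
    "SELECT attribution, SUM(saved) FROM requests GROUP BY attribution"
):
    print(row)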

Architecture at a glance

Three planes. Every TokenPak surface maps onto exactly one of them.

Data plane

The traffic path

proxy/ + services/. Byte-level work: compression, caching, routing, wire-side telemetry. Every outbound request flows through here exactly once.

Control plane

Capability + diagnostics

MCP over stdio or Streamable HTTP. Tools, resources, prompts, status. No model requests flow on this plane — it only describes and observes.
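A raw probe of the Streamable HTTP transport, for flavor. The endpoint path is an assumption, and a compliant MCP client would run the initialize handshake first (elided here, so a strict server may reject the probe); tools/list itself is a standard MCP method:

# Raw JSON-RPC probe of the control plane over Streamable HTTP.
# Endpoint path is an assumption; initialize handshake elided for brevity.
import json, urllib.request

req = urllib.request.Request(
    "http://127.0.0.1:8484/mcp",  # hypothetical control-plane endpoint
    data=json.dumps({
        "jsonrpc": "2.0",
        "id": 1,
        "method": "tools/list",   # standard MCP method
    }).encode(),
    headers={
        "Content-Type": "application/json",
        "Accept": "application/json, text/event-stream",
    },
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode())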

Semantic layer

TIP-1.0 contracts

Canonical headers, metadata fields, telemetry events, capability labels. TokenPak is the reference implementation of the TokenPak Integration Protocol (TIP).
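To make "canonical headers, metadata fields, telemetry events" concrete, a purely hypothetical sketch; every name below is invented, since this page doesn't enumerate the real ones. Consult the TIP-1.0 spec for the actual contracts:

# Purely illustrative: all names and values below are invented to show the
# kind of surface a wire protocol like TIP-1.0 defines, not its real names.
request_headers = {
    "X-TokenPak-Version": "1.0",            # hypothetical protocol version header
    "X-TokenPak-Pipeline": "trim,dedupe",   # hypothetical: passes applied
}
telemetry_event = {
    "event": "request.compressed",          # hypothetical event name
    "attribution": "proxy-caused",          # matches the attribution buckets above
    "tokens_saved": 1742,                   # placeholder value
}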

The full architecture (18 canonical subsystems, plane ownership rules, byte-fidelity guarantees) lives at docs.tokenpak.ai/architecture.

Who it's for

Teams running LLM agents in production or in daily developer workflows where the token bill is starting to show. Claude Code, Cursor, Cline, Continue, Aider, Codex, plus any app built on the Anthropic or OpenAI SDK or LiteLLM.

If your tools already speak the provider API, TokenPak drops in under them. If you're building the tools yourself, there's an SDK. Nothing on this page is per-tool; the integrations are in the OSS package.

Try it locally

pip install tokenpak && tokenpak integrate claude-code --apply

Takes a minute.