Low latency
Under 50ms compression overhead on typical agent prompts. Your agent does not feel slower.
TokenPak — local LLM proxy
TokenPak is a local proxy that compresses your LLM context before it hits the API — fewer tokens, lower cost, same results.
Built for the parts of an agent workload you cannot renegotiate.
Under 50ms compression overhead on typical agent prompts. Your agent does not feel slower.
pip install tokenpak && tokenpak setup. Interactive wizard detects your API keys, picks a compression profile, and starts the proxy. Per-client auto-integration (tokenpak integrate) is on the roadmap.
Claude Code, Cursor, Cline, Continue, Aider, OpenAI SDK, Anthropic SDK, LiteLLM, Codex. No plugin rewrites.
No cloud component. No credentials stored. Requests still go to your model providers, but compression happens on your machine.
Three steps. No cloud, no rewrites.
Step 1
pip install tokenpak. Runs at 127.0.0.1 as a local proxy.
Step 2
tokenpak setup — interactive wizard wires your keys + starts the proxy. Point your LLM client at http://127.0.0.1:8766 via one env var.
Step 3
Every request is compressed deterministically; savings logged locally.
Refreshed automatically on every release + on a daily safety-net schedule.
**Retroactive SHA256SUMS attestation.**
Curated entry points sourced from tokenpak/docs.
starter
Install TokenPak and see savings in one command.
starter
pip install options, OS notes, troubleshooting install failures.
reference
Every tokenpak verb, flag, and exit code.
reference
Proxy-centered model; three planes; 18 subsystems at a glance.
support
Common symptoms and the fixes that work.
pip install tokenpak && tokenpak integrate claude-code --apply. No cloud, no credentials stored, no code changes.