TokenPak — local LLM proxy

Local. Cheaper. Faster.

TokenPak is a local proxy that compresses your LLM context before it hits the API — fewer tokens, lower cost, same results.

Why TokenPak

Built for the parts of an agent workload you cannot negotiate away: latency, compatibility, and privacy.

Low latency

Under 50ms compression overhead on typical agent prompts. Your agent does not feel slower.

One-command setup

pip install tokenpak && tokenpak setup. Interactive wizard detects your API keys, picks a compression profile, and starts the proxy. Per-client auto-integration (tokenpak integrate) is on the roadmap.

Works with what you already use

Claude Code, Cursor, Cline, Continue, Aider, OpenAI SDK, Anthropic SDK, LiteLLM, Codex. No plugin rewrites.

Local and private

No cloud component. No credentials stored. Requests still go to your model providers, but compression happens on your machine.

How it works

Three steps. No cloud, no rewrites.

Step 1

Install

pip install tokenpak. Runs as a local proxy bound to 127.0.0.1.

Step 2

Setup

tokenpak setup — interactive wizard configures your keys and starts the proxy. Point your LLM client at http://127.0.0.1:8766 via one env var.
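As a sketch of step 2: the exact env var depends on your client. The names below are the standard base-URL overrides read by the OpenAI and Anthropic SDKs (and clients built on them, such as Claude Code); they are not TokenPak-specific, and the port assumes the default 8766 shown above.

```shell
# Point an OpenAI-SDK-based client (OpenAI SDK, Aider, LiteLLM, ...)
# at the local TokenPak proxy instead of api.openai.com:
export OPENAI_BASE_URL="http://127.0.0.1:8766"

# For Anthropic-SDK-based clients (Anthropic SDK, Claude Code),
# the equivalent override is:
export ANTHROPIC_BASE_URL="http://127.0.0.1:8766"
```

Requests still carry your own API keys; the proxy only compresses the context before forwarding them upstream.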

Step 3

Save

Every request is compressed deterministically; savings logged locally.

Documentation highlights

Curated entry points sourced from tokenpak/docs.

Installation (starter): pip install options, OS notes, troubleshooting install failures.

CLI Reference (reference): every tokenpak verb, flag, and exit code.

Architecture (reference): proxy-centered model; three planes; 18 subsystems at a glance.

See savings in one command.

pip install tokenpak && tokenpak setup. No cloud, no credentials stored, no code changes.