TokenPak — local LLM proxy

Local. Cheaper. Faster.

TokenPak is a local proxy that compresses your LLM context before it hits the API — fewer tokens, lower cost, same results.

Why TokenPak

Built for the parts of an agent workload you cannot negotiate away: latency, compatibility, and privacy.

Low latency

Under 50ms compression overhead on typical agent prompts. Your agent does not feel slower.

One-command setup

pip install tokenpak && tokenpak setup. Interactive wizard detects your API keys, picks a compression profile, and starts the proxy. Per-client auto-integration (tokenpak integrate) is on the roadmap.

Works with what you already use

Claude Code, Cursor, Cline, Continue, Aider, OpenAI SDK, Anthropic SDK, LiteLLM, Codex. No plugin rewrites.

Local and private

No cloud component. No credentials stored. Requests still go to your model providers, but compression happens on your machine.

How it works

Three steps. No cloud, no rewrites.

Step 1

Install

pip install tokenpak. Runs as a local proxy bound to 127.0.0.1.

Step 2

Setup

tokenpak setup — interactive wizard configures your keys and starts the proxy. Point your LLM client at http://127.0.0.1:8766 via one env var.
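As a sketch of step 2: the exact env var depends on your client. The names below are the standard base-URL overrides read by the OpenAI and Anthropic SDKs (and clients built on them, such as Claude Code); they are not TokenPak-specific, and the port assumes the default 8766 shown above.

```shell
# Point an OpenAI-SDK-based client (OpenAI SDK, Aider, LiteLLM, ...)
# at the local TokenPak proxy instead of api.openai.com:
export OPENAI_BASE_URL="http://127.0.0.1:8766"

# For Anthropic-SDK-based clients (Anthropic SDK, Claude Code),
# the equivalent override is:
export ANTHROPIC_BASE_URL="http://127.0.0.1:8766"
```

Requests still carry your own API keys; the proxy only compresses the context before forwarding them upstream.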

Step 3

Save

Every request is compressed deterministically; savings logged locally.

Documentation highlights

Curated entry points sourced from tokenpak/docs.

Installation (starter): pip install options, OS notes, troubleshooting install failures.

CLI Reference (reference): every tokenpak verb, flag, and exit code.

Architecture (reference): proxy-centered model; three planes; 18 subsystems at a glance.

See savings in one command.

pip install tokenpak && tokenpak setup. No cloud, no credentials stored, no code changes.