
📦 TokenPak

One proxy between your code and the LLM API. Optimize cache hits, compress context, route smart, track every dollar.

Python 3.10+ · License: MIT · Tests

TokenPak sits between your code and the LLM API. It packs your tokens before they leave your machine — optimizing cache hits, compressing context, and routing requests — then tracks every dollar so you know exactly what you're spending.

Everything runs on your machine. No cloud. No accounts. No data leaves except the (optimized) API call.


Where The Savings Come From

🧊 Cache Optimization

LLM providers discount repeated prompt prefixes. But most SDKs serialize tool schemas and system prompts with non-deterministic key ordering — different bytes every request, cache miss every time.

TokenPak's Tool Schema Registry normalizes everything into identical bytes. Same tools, same bytes, cache hit. On an agent with 20+ tools sending 10-20KB of schemas per request, this is the difference between paying full price and paying 10%.
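The normalization trick can be sketched in a few lines (illustrative; TokenPak's actual registry is more involved): serialize with sorted keys and fixed separators so logically identical schemas always become identical bytes.

```python
import json

def canonical_bytes(tool_schemas: list[dict]) -> bytes:
    """Serialize tool schemas deterministically: sorted keys, fixed separators.

    Providers cache by prompt-prefix bytes, so byte-identical schemas on
    every request are what turns cache misses into cache hits.
    """
    return json.dumps(tool_schemas, sort_keys=True, separators=(",", ":")).encode()

# The same tool described with different key order...
a = [{"name": "search", "input_schema": {"type": "object"}}]
b = [{"input_schema": {"type": "object"}, "name": "search"}]

# ...serializes to different bytes naively, but to identical canonical bytes.
assert json.dumps(a).encode() != json.dumps(b).encode()
assert canonical_bytes(a) == canonical_bytes(b)
```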

🔀 Smart Routing

Not every request needs your most expensive model. TokenPak routes by pattern, token count, or intent — sending simple tasks to cheaper models while keeping complex work on the heavy hitters. Automatic fallback chains mean a failing provider degrades to the next model in line instead of breaking your app.
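A toy version of pattern-based routing, using only the standard library (the rule table and target names here are illustrative, not TokenPak's real configuration):

```python
import fnmatch

# First matching pattern wins; "passthrough" leaves the model untouched.
ROUTES = [
    ("gpt-4*", "anthropic/claude-sonnet-4"),  # redirect one family to a cheaper target
    ("*", "passthrough"),                     # default: send the request as-is
]

def route(model: str) -> str:
    """Return the model a request should actually be sent to."""
    for pattern, target in ROUTES:
        if fnmatch.fnmatch(model, pattern):
            return model if target == "passthrough" else target
    return model

assert route("gpt-4o") == "anthropic/claude-sonnet-4"
assert route("claude-haiku") == "claude-haiku"
```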

📦 Token Compression

Multi-stage pipeline strips redundancy from your context:

  1. Segment — split into semantic blocks
  2. Fingerprint — detect type (code, docs, config, logs)
  3. Compress — apply type-aware recipes
  4. Budget — allocate tokens by priority
  5. Assemble — rebuild with fewer tokens

Content-aware: code gets AST-level compression (tree-sitter), docs get section-aware trimming, JSON/YAML gets schema extraction, logs get pattern dedup. 20-60% reduction depending on content type.
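The five stages can be sketched as a toy pipeline (illustrative only: character counts stand in for tokens, and exact-duplicate removal stands in for the type-aware recipes):

```python
def compress_context(text: str, budget: int) -> str:
    """Toy five-stage pipeline: segment, fingerprint, compress, budget, assemble."""
    # 1. Segment: split into blocks on blank lines.
    blocks = [b.strip() for b in text.split("\n\n") if b.strip()]

    # 2. Fingerprint: crude content-type detection.
    def kind(block: str) -> str:
        return "code" if block.startswith(("def ", "class ", "{")) else "prose"

    # 3. Compress: drop exact-duplicate blocks.
    seen, kept = set(), []
    for block in blocks:
        if block not in seen:
            seen.add(block)
            kept.append((kind(block), block))

    # 4. Budget: spend the budget on code blocks first, then prose.
    kept.sort(key=lambda kb: kb[0] != "code")  # stable sort: code before prose
    out, used = [], 0
    for _, block in kept:
        if used + len(block) <= budget:
            out.append(block)
            used += len(block)

    # 5. Assemble: rebuild the trimmed context.
    return "\n\n".join(out)

ctx = "Intro paragraph.\n\ndef f():\n    return 1\n\nIntro paragraph."
assert compress_context(ctx, 100) == "def f():\n    return 1\n\nIntro paragraph."
```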


Quick Start

pip install tokenpak
tokenpak serve --port 8766

Change one line in your code:

Anthropic (Claude)

import anthropic

client = anthropic.Anthropic(
    base_url="http://localhost:8766",  # 📦 that's it
    api_key="sk-ant-..."              # passes straight through
)

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Explain quantum computing"}]
)
# → Cache optimized. Compressed. Cost tracked. Automatically.

OpenAI

from openai import OpenAI
client = OpenAI(base_url="http://localhost:8766", api_key="sk-...")

Claude Code / OpenClaw

export ANTHROPIC_BASE_URL=http://localhost:8766
# done. every request now goes through TokenPak.

Docker

git clone https://github.com/tokenpak/tokenpak.git && cd tokenpak
cp .env.example .env        # add your API key
docker compose -f docker/docker-compose.yml up -d

See What You're Spending

tokenpak savings

Shows cost breakdown by model, cache hit rates, and what TokenPak saved you — updated in real time from your actual usage.
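The accounting behind the report is straightforward token math. A sketch with hypothetical per-million-token prices (illustrative numbers, not a provider price sheet); the steep `cached_in` discount is what makes cache optimization pay off:

```python
# Hypothetical prices in USD per million tokens; "cached_in" reflects the
# roughly 10x discount providers give for cached prompt-prefix reads.
PRICE = {"claude-sonnet": {"in": 3.00, "cached_in": 0.30, "out": 15.00}}

def request_cost(model: str, in_tok: int, cached_tok: int, out_tok: int) -> float:
    """Cost of one request, splitting input tokens into fresh vs cache-read."""
    p = PRICE[model]
    fresh = in_tok - cached_tok
    return (fresh * p["in"] + cached_tok * p["cached_in"] + out_tok * p["out"]) / 1e6

# A 10k-token prompt with an 8k cached prefix vs the same prompt cold:
cached = request_cost("claude-sonnet", 10_000, 8_000, 500)
cold = request_cost("claude-sonnet", 10_000, 0, 500)
assert cached < cold  # the cached request costs well under half as much
```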

Web dashboard at http://localhost:8766/dashboard.

All local. All your data. Nothing phones home.


Everything It Does

| Feature | What | Why |
|---------|------|-----|
| 🧊 Cache Optimization | Deterministic tool schema serialization | 86% of savings in production |
| 🔀 Smart Routing | Route by model, pattern, intent, token count | Right model for the job, automatic failover |
| 📦 Compression | Content-aware pipeline (code, docs, data, logs) | 20-60% fewer tokens per request |
| 💰 Cost Tracking | Per-request, per-model, per-session pricing | Know exactly what you spend |
| 📊 Dashboard | 10-page web UI with FinOps/Engineering/Audit views | See everything, export anything |
| 🔍 Vault Indexing | Semantic search over your codebase | Zero-token search — never calls an LLM |
| 🧪 A/B Testing | Compare strategies with statistical significance | Data-driven optimization |
| 👻 Shadow Mode | Validate compression without affecting production | Safe to try, safe to ship |
| 🚨 Budget Enforcement | Limits + alerts per session, model, or agent | Never blow your budget |
| 🛡️ DLP Scanning | Detect and redact sensitive data | PII stays on your machine |
| 🔌 Data Connectors | Local, Git, Obsidian, GitHub, Google Drive, Notion | Index any knowledge source |
| ⚡ +2ms Latency | Sub-millisecond compression, minimal proxy overhead | You won't notice it |

Compatibility

| Platform | How |
|----------|-----|
| Anthropic SDK | `base_url="http://localhost:8766"` |
| OpenAI SDK | `base_url="http://localhost:8766"` |
| Google AI (Gemini) | Proxy adapter |
| Claude Code | `export ANTHROPIC_BASE_URL=http://localhost:8766` |
| OpenClaw | Set provider `base_url` to the proxy |
| Cursor | Custom API endpoint in settings |
| LiteLLM | Drop-in middleware or proxy |
| LangChain | `from tokenpak.adapters.langchain import LangChainAdapter` |
| Ollama | Compression + routing (no cost tracking for local) |
| curl / httpx / requests | Standard REST API |

CLI

# Run
tokenpak serve --port 8766          # start local proxy
tokenpak status                     # health check
tokenpak doctor                     # diagnose issues

# Monitor
tokenpak cost --week                # cost report by model
tokenpak savings                    # what you've saved

# Compress
tokenpak compress <file>            # dry-run compression
tokenpak diff <file>                # before/after comparison
tokenpak demo                       # see pipeline on sample data

# Search
tokenpak index <path>               # index a directory
tokenpak vault search "query"       # semantic search (zero tokens)

# Route
tokenpak route add --model 'gpt-4*' --target anthropic/claude-sonnet-4
tokenpak route list

# Debug
tokenpak trace --id <id>            # inspect pipeline run
tokenpak replay <id>                # replay past request

30+ commands. Full reference: docs/cli-reference.md


Architecture

┌──────────────┐     ┌──────────────────────────────────────────┐     ┌──────────────┐
│  Your Code   │────▶│  📦 TokenPak Proxy (:8766)              │────▶│  LLM API     │
│  (any SDK)   │◀────│  Runs on YOUR machine                    │◀────│              │
└──────────────┘     │                                          │     └──────────────┘
                     │  Cache Opt.    Routing      Telemetry    │
                     │  Compression   Budget       Dashboard    │
                     │  A/B Testing   Shadow       DLP Scan     │
                     │  Schema Reg.   Circuit Brk  Conn Pool    │
                     └──────────────────────────────────────────┘
                                         │
                     ┌──────────────────────────────────────────┐
                     │  📁 Local Storage (never leaves)        │
                     │  SQLite telemetry · Vault index · Cache  │
                     └──────────────────────────────────────────┘

Configuration

{
  "proxy": {
    "port": 8766,
    "passthrough_url": "https://api.anthropic.com"
  },
  "compression": {
    "enabled": true,
    "level": "balanced"
  },
  "budget": {
    "monthly_usd": null,
    "alert_at_pct": 80
  }
}

Pre-built configs: anthropic-only · openai-only · cost-saving-max · local-ollama · privacy-first · mixed-routing · single-user · team-internal


Deployment

| Platform | Guide |
|----------|-------|
| pip | `pip install tokenpak && tokenpak serve` |
| Docker Compose | `docker/` |
| Kubernetes | `deployments/k8s/` |
| AWS ECS | `deployments/aws-ecs/` |
| GCP Cloud Run | `deployments/gcp-cloud-run/` |
| systemd | `tokenpak/agent/systemd/` |

Testing

pip install -e ".[dev]"
pytest tests/ -q           # 316 tests, ~16s

Docs

Installation · Configuration · CLI Reference · Architecture · API Reference · Error Codes · Troubleshooting · Security


Protocol

TokenPak implements the TokenPak Protocol v1.0: Block Schema · Compiled Artifact · Evidence Pack


Contributing

See CONTRIBUTING.md.

git clone https://github.com/tokenpak/tokenpak && cd tokenpak
pip install -e ".[dev]" && pytest tests/ -q

License

MIT