TokenPak

TokenPak — local LLM proxy

Local. Context-lean. Measurable.

TokenPak is a local proxy that packs your LLM context before it hits the API — fewer repeated tokens on the wire, more reuse opportunities, and savings you can measure per-request.

TokenPak doesn't replace your AI stack.

It adds the missing logistics layer: packing, routing, reusable context, guardrails, orchestration, and per-request records.

Six capabilities

The logistics layer, as a capability grid.

Pack

Deterministically pack context before it goes on the wire — stop re-sending the same boilerplate, file contents, and system prompts.

Route

Send each request to the right model, with fallback rules you control.

Reuse

Recall and reuse context from local PAKs instead of rebuilding it every session.

Guard

Spend and safety guardrails run as a side-channel gatehouse across every request.

Dispatch

Coordinate scoped, multi-step and multi-agent work. Preview (alpha) — not yet in a published release.

Record

Every request leaves a local record — what was sent, reused, observed, and charged.

How a request flows

One linear path, packed and recorded locally before it leaves your machine.

Request flow
  1. Raw context
  2. Packing Station
  3. Route
  4. Send
  5. Provider
  6. Record

Reuse feed: PAKs feed into the Packing Station — recall and reuse context instead of rebuilding it.

Gatehouse (Guard): spend and safety guardrails run as a side-channel overlay before Send.

Where TokenPak fits

TokenPak overlaps with tools you may already run — and adds a layer they don't. Run them together.

Where it fits
  1. Your agent

    Claude Code

    Cursor · Cline

  2. TokenPak (local)

    pack · measure

    guard · record

  3. Your gateway

    LiteLLM /

    OpenRouter / …

  4. Model provider

    Anthropic · OpenAI

    Google Gemini

Dispatch Center dashboard (preview) — orchestration for multi-step and multi-agent work, around the flow.

Tool type Overlaps on TokenPak adds Together?
Gateways / routers routing deterministic context packing + reusable PAKs Run both.
Observability tools measurement pre-send packing + per-request context-level attribution Run both.
MCP-based workflows ecosystem coordination a semantic contract (TIP) for packing, routing, cost, telemetry Composes.

Savings

Reduce the repeated context your agents re-send. Improve cache reuse. Receipt-backed measurement is coming from benchmarked workloads.

Open source & Pro

TokenPak's core is open source (Apache-2.0). Pro adds team dashboards, advanced routing, and enterprise controls.

Pro is delivered as the tokenpak-paid package via a separate index.

Latest release

Refreshed automatically on every release + on a daily safety-net schedule.

Release

TokenPak v1.10.0

v1.10.0 · Jun 28, 2026

latest

### Added - **TokenPak Dispatch graduates from preview to a released feature.** The `tokenpak dispatch` command (intake/routing, Decision Inbox, run-ledger lifecycle, observability) now ships in the released `pip install tokenpak` package — the Dispatch engine, its registry/schema data, and the user guide are included in the wheel. Delivery/receipt remain an explicit post-alpha preview (no live station execution wired yet).

Documentation highlights

Curated entry points sourced from tokenpak/docs.

starter

Installation

pip install options, OS notes, troubleshooting install failures.

starter

Claude Code guide

Route Claude Code through TokenPak with one environment variable.

starter

OpenAI SDK guide

Point the OpenAI Python SDK at TokenPak with OPENAI_BASE_URL.

starter

Anthropic SDK guide

Point the Anthropic Python SDK at TokenPak with ANTHROPIC_BASE_URL.

starter

Cline guide

Route Cline through TokenPak with a custom compatible base URL.

See savings in one command.

pip install tokenpak && tokenpak setup. No cloud component; credentials stay in your environment and provider flow.