Source of truth — matches actual proxy_v4.py endpoints. API runs on the proxy port (default 8766).
Base URL: http://localhost:8766
Most endpoints require no authentication for local use.
The dashboard can be protected with an admin token (enable dashboard.require_token in the config, or set TOKENPAK_DASHBOARD_AUTH=true). When required, pass the token in the X-Admin-Token header.
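When the token is enabled, admin requests simply carry that extra header. A minimal Python sketch (the token value is a placeholder, not a real credential):

```python
import urllib.request

# Placeholder token -- use the value you configured for the dashboard.
ADMIN_TOKEN = "example-admin-token"

# Build a dashboard request carrying the admin token header.
req = urllib.request.Request(
    "http://localhost:8766/dashboard",
    headers={"X-Admin-Token": ADMIN_TOKEN},
)

# urllib capitalizes stored header names, so query it as "X-admin-token".
print(req.get_header("X-admin-token"))  # example-admin-token
```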
POST /v1/*
Route an LLM request through TokenPak. Supports OpenAI-compatible and Anthropic-compatible clients.
curl -X POST http://localhost:8766/v1/messages \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $ANTHROPIC_API_KEY" \
-d '{"model":"claude-haiku-4-5","messages":[{"role":"user","content":"Hello"}],"max_tokens":100}'
TokenPak auto-detects the provider (Anthropic, OpenAI, Google) based on path, headers, and request body. The response is the standard provider response — no schema changes.
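The detection logic lives inside the proxy, but a simplified sketch of path- and header-based routing might look like the following. The function name and rules here are illustrative assumptions, not the actual implementation (which also inspects the request body):

```python
def detect_provider(path: str, headers: dict) -> str:
    """Guess the upstream provider from the request path and headers.

    Illustrative only -- the real proxy also inspects the request body.
    """
    if path.startswith("/v1/messages") or "x-api-key" in headers:
        return "anthropic"
    if path.startswith("/v1beta/"):
        return "google"
    if path.startswith("/v1/chat/completions"):
        return "openai"
    return "openai"  # sensible default for OpenAI-compatible clients

print(detect_provider("/v1/messages", {}))           # anthropic
print(detect_provider("/v1beta/models/gemini", {}))  # google
```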
GET /health
Liveness check. Returns system status, module availability, and session stats.
curl http://localhost:8766/health
Response:
{
  "status": "ok",
  "compilation_mode": "full",
  "vault_index": { "available": true, "blocks": 42, "path": "/..." },
  "router": { "enabled": true },
  "capsule_available": true,
  "canon": { "enabled": true, "session_hits": 12 },
  "skeleton": { "enabled": false },
  "budget": { "enabled": true, "total_tokens": 200000 },
  "strict_validation": true,
  "upstream_timeout_seconds": 60,
  "circuit_breakers": {
    "anthropic": { "open": false, "failures": 0 }
  },
  "stats": { "requests": 47, "tokens_saved": 12400 }
}
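A liveness probe can key off the status field plus the circuit-breaker flags. The readiness rule below is an assumption for illustration, not documented proxy behavior:

```python
import json

# A trimmed-down /health payload (only the fields the check uses).
health = json.loads("""{
  "status": "ok",
  "circuit_breakers": {"anthropic": {"open": false, "failures": 0}}
}""")

def is_healthy(payload: dict) -> bool:
    """Treat the proxy as healthy when status is ok and no breaker is open."""
    if payload.get("status") != "ok":
        return False
    breakers = payload.get("circuit_breakers", {})
    return not any(b.get("open") for b in breakers.values())

print(is_healthy(health))  # True
```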
GET /stats
Session statistics and cost tracking.
curl http://localhost:8766/stats
Response:
{
  "session": { "requests": 47, "tokens_saved": 12400, "cost_usd": 0.08 },
  "compilation_mode": "full",
  "vault_index": { "available": true, "blocks": 42 },
  "router": { "enabled": true },
  "today": { "requests": 47, "cost_usd": 0.08 },
  "by_model": { "claude-haiku-4-5": { "requests": 30, "cost": 0.04 } },
  "recent": [ ... ]
}
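A client can summarize the session block of this payload in one line. A minimal sketch (the helper name and output format are illustrative):

```python
def savings_summary(stats: dict) -> str:
    """Format session request count, tokens saved, and cost from a /stats payload."""
    s = stats["session"]
    return f"{s['requests']} requests, {s['tokens_saved']} tokens saved, ${s['cost_usd']:.2f}"

stats = {"session": {"requests": 47, "tokens_saved": 12400, "cost_usd": 0.08}}
print(savings_summary(stats))  # 47 requests, 12400 tokens saved, $0.08
```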
GET /stats/last
Per-request stats for the most recent request.
curl http://localhost:8766/stats/last
GET /stats/session
Current session-level stats (tokens in, tokens out, savings).
curl http://localhost:8766/stats/session
GET /cache-stats
Cache performance metrics (hit rate, miss count, TTL info).
curl http://localhost:8766/cache-stats
GET /recent
Last 50 requests with timing, model, token counts, and cost.
curl http://localhost:8766/recent
POST /ingest
Ingest a single request record into TokenPak’s telemetry.
curl -X POST http://localhost:8766/ingest \
-H "Content-Type: application/json" \
-d '{"model":"claude-haiku-4-5","tokens":1500,"cost":0.002}'
Required fields:
| Field | Type | Description |
|---|---|---|
| model | string | Model name (e.g. claude-haiku-4-5) |
| tokens | int ≥ 0 | Token count for this request |
| cost | float ≥ 0 | Cost in USD |
Optional fields: timestamp (ISO 8601), agent, session_id, tokens_saved, compression_rate
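Validating records client-side before posting avoids rejected ingests. The helper below mirrors the required-field rules above; the function itself is an illustrative sketch, not part of the TokenPak API:

```python
def validate_record(record: dict) -> list[str]:
    """Return a list of problems with an /ingest record (empty list = valid)."""
    errors = []
    if not isinstance(record.get("model"), str) or not record.get("model"):
        errors.append("model must be a non-empty string")
    tokens = record.get("tokens")
    if not isinstance(tokens, int) or isinstance(tokens, bool) or tokens < 0:
        errors.append("tokens must be an int >= 0")
    cost = record.get("cost")
    if not isinstance(cost, (int, float)) or isinstance(cost, bool) or cost < 0:
        errors.append("cost must be a float >= 0")
    return errors

print(validate_record({"model": "claude-haiku-4-5", "tokens": 1500, "cost": 0.002}))  # []
print(validate_record({"model": "", "tokens": -1}))
```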
POST /ingest/batch
Ingest multiple records at once.
curl -X POST http://localhost:8766/ingest/batch \
-H "Content-Type: application/json" \
-d '[{"model":"claude-haiku-4-5","tokens":1500,"cost":0.002}, ...]'
GET /dashboard
Opens the TokenPak web dashboard (HTML UI). Accessible in your browser at:
http://localhost:8766/dashboard
Shows token savings, cost trends, model usage, and session history.
/v1/* supports both streaming (SSE) and non-streaming requests.
/v1beta/* is also supported (for Google Gemini compatibility).
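Streaming responses arrive as server-sent events, i.e. `data: ` lines separated by blank lines. A minimal parser sketch (it ignores event/id fields and the `[DONE]` sentinel handling a real client would need):

```python
def parse_sse(raw: bytes) -> list[str]:
    """Extract the data payloads from a server-sent-events byte stream."""
    payloads = []
    for line in raw.decode("utf-8").splitlines():
        if line.startswith("data: "):
            payloads.append(line[len("data: "):])
    return payloads

# Hypothetical stream chunks for illustration.
raw = b'data: {"delta": "Hel"}\n\ndata: {"delta": "lo"}\n\ndata: [DONE]\n\n'
print(parse_sse(raw))  # ['{"delta": "Hel"}', '{"delta": "lo"}', '[DONE]']
```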