TokenPak Architecture v2 — Universal Content Compiler
Vision
One command turns any directory into LLM-ready context.
tokenpak index ~/company-data --budget 8000
TokenPak is not a text compressor. It’s a universal content compiler that processes any file type — documents, code, images, audio, video, datasets — into a versioned, compressed, budget-aware knowledge index that any LLM can consume.
The Problem (Enterprise)
Companies have:
- 50TB of docs, contracts, images, recordings, code, data
- $100K+/month in LLM API spend
- Engineers manually chunking PDFs, transcribing meetings, formatting context
- No provenance — can’t trace LLM answers back to source files
- No quality control — bad OCR and garbled transcriptions silently degrade output
Current solutions are point tools:
- QMD searches markdown. That’s it.
- LLMLingua compresses text. Nothing else.
- Whisper transcribes audio. One modality.
- Each requires separate integration, separate config, separate pipelines.
TokenPak: One pipeline that orchestrates all of them.
Architecture
┌─────────────────────────────────────────────────────────────┐
│ INPUT: Any Directory │
│ /contracts /recordings /code /images /data /docs │
└──────────────────────────┬──────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Layer 1: PROCESSORS (Multimodal) │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Text │ │ Images │ │ Audio │ │ Video │ │
│ │ Markdown │ │ Describe │ │ Whisper │ │Keyframes │ │
│ │ PDF text │ │ Embed │ │Transcribe│ │+Transcript│ │
│ │ Plaintext│ │ Dedupe │ │ Diarize │ │ Chapters │ │
│ └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘ │
│ │ │ │ │ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Code │ │ PDFs │ │ JSON/CSV │ │ Archives│ │
│ │Tree-sitter│ │OCR+Digital│ │ Schema │ │ Recursive│ │
│ │Signatures│ │Multi-engine│ │+Sampling│ │ Extract │ │
│ └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘ │
│ └──────┬─────┴──────┬─────┴──────┬─────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────────────────────────────────────────────┐ │
│ │ QUALITY SCORER (per output) │ │
│ │ Score 0.0-1.0 | Route: auto/review/reject │ │
│ └─────────────────────────┬───────────────────────┘ │
└────────────────────────────┼────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Layer 2: BLOCK REGISTRY (Novel IP) │
│ │
│ ┌──────────────────────────────────────────────────┐ │
│ │ Content-hash each processed output (SHA-256) │ │
│ │ Version tracking: v1, v2, v3... │ │
│ │ Change detection: only reprocess modified files │ │
│ │ Wire format: compact binary/text blocks (.tokpak) │ │
│ └──────────────────────────────────────────────────┘ │
│ │
│ Registry: { "contracts/Q4-report.pdf": { │
│ hash: "a3f8c...", version: 3, type: "pdf-digital", │
│ quality: 0.94, tokens: 2340, compressed: 580, │
│ lastProcessed: "2026-02-20T17:00:00Z" }} │
└──────────────────────────┬──────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Layer 3: RETRIEVAL (QMD Integration) │
│ │
│ ┌──────────────────────────────────────────────────┐ │
│ │ QMD indexes all text blocks (BM25 + vectors) │ │
│ │ Hybrid search: keyword + semantic + LLM reranking │ │
│ │ Query expansion for better recall │ │
│ │ Context tree (hierarchical metadata) │ │
│ └──────────────────────────────────────────────────┘ │
│ │
│ tokenpak search "Q4 revenue projections" │
│ → Returns top-k relevant blocks from ANY file type │
│ → PDF page 47, meeting transcript 14:32, Slack thread │
└──────────────────────────┬──────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Layer 4: COMPRESSION ENGINE │
│ │
│ ┌──────────────────────────────────────────────────┐ │
│ │ LLMLingua-2: token-level filtering (MIT) │ │
│ │ SelectiveContext: sentence-level filtering │ │
│ │ Code: signature extraction (keep API, drop bodies)│ │
│ │ JSON: schema + sampling (keys+types, not full data│ │
│ │ Force tokens: never remove critical keywords │ │
│ └──────────────────────────────────────────────────┘ │
└──────────────────────────┬──────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Layer 5: BUDGET ALLOCATOR (Novel IP) │
│ │
│ ┌──────────────────────────────────────────────────┐ │
│ │ QMD-weighted importance scoring per block │ │
│ │ Quadratic allocation (high-importance → more room) │ │
│ │ Minimum floor per category (prevent starvation) │ │
│ │ Dynamic redistribution (adapts to what's present) │ │
│ └──────────────────────────────────────────────────┘ │
│ │
│ Budget: 8000 tokens │
│ Block A (Q4 report, importance=10): 3,400 tokens │
│ Block B (meeting notes, importance=8): 2,200 tokens │
│ Block C (code ref, importance=5): 900 tokens │
│ Block D (old email, importance=2): 150 tokens │
│ Block E (tool schemas, importance=3): 350 tokens │
│ Overhead (wire format, refs): 1,000 tokens │
└──────────────────────────┬──────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Layer 6: OUTPUT (Wire Format) │
│ │
│ ┌──────────────────────────────────────────────────┐ │
│ │ TOKPAK:1 │ │
│ │ BUDGET: {max_in:8000, used:7950} │ │
│ │ BLOCKS: [ │ │
│ │ {ref:"Q4-report#v3", quality:0.94, tokens:3400},│ │
│ │ {ref:"meeting-jan15#v1", quality:0.91, ...}, │ │
│ │ ] │ │
│ │ PROVENANCE: every fact traceable to source file │ │
│ │ OUTPUT_CONTRACT: PATCH | PLAN | ACTIONS | QUESTIONS│ │
│ └──────────────────────────────────────────────────┘ │
└──────────────────────────┬──────────────────────────────────┘
│
▼
Any LLM Provider
(Anthropic, OpenAI, Google, local)
Component Stack
Open Source (Integrated, not built by us)
| Component |
Purpose |
License |
Why this one |
| QMD |
Text retrieval (BM25 + vectors + reranking) |
MIT |
Best local hybrid search, OpenClaw-native |
| LLMLingua-2 |
Token-level compression |
MIT |
SOTA 2-20x compression, <5% quality loss |
| SelectiveContext |
Sentence-level filtering |
MIT |
Preserves readability |
| Whisper |
Audio transcription |
MIT |
Industry standard, local |
| Tesseract |
OCR (primary) |
Apache 2.0 |
Most mature, widest language support |
| EasyOCR |
OCR (fallback) |
Apache 2.0 |
Better on handwriting |
| Surya |
OCR (fallback) |
GPL-3.0 |
Best on complex layouts |
| Tree-sitter |
Code parsing |
MIT |
Fast, accurate, 100+ languages |
| PaddleOCR |
OCR (CJK specialist) |
Apache 2.0 |
Best for Chinese/Japanese/Korean |
Novel IP (Built by us)
| Component |
Purpose |
Status |
| Block Registry |
Version, hash, track all processed content |
Spec complete |
| Wire Format |
Compact output format with provenance |
Spec complete |
| Budget Allocator |
Quadratic importance-weighted token distribution |
Designed |
| Quality Scorer |
Per-output confidence scoring with routing |
Designed |
| Review Queue |
Human-in-the-loop for low-confidence outputs |
Designed |
| Monitoring Proxy |
Real-time token tracking + cost estimation |
Working ✅ |
| Hybrid Calibration |
Static benchmark + bounded dynamic worker adjustment |
Working ✅ |
| Multi-engine Fallback |
Try Tesseract → EasyOCR → Surya → flag for review |
Designed |
Processing Pipeline (Per File Type)
Text (Markdown, plaintext, HTML)
File → detect encoding → extract text → QMD index → block registry
- Processing time: <1s
- Compression: 5-20x via LLMLingua-2
Code (Python, JS, TS, Go, Rust, etc.)
File → Tree-sitter parse → extract:
- imports/dependencies
- function/class signatures + docstrings
- type definitions
- drop function bodies
→ block registry
- Processing time: <1s
- Compression: 3-10x (keep API surface, drop implementation)
PDFs (Digital)
File → detect type (digital/scanned) → extract text → preserve tables/structure → QMD index → block registry
- Processing time: 1-5s
- Compression: 2-5x
PDFs (Scanned)
File → page images → OCR (multi-engine):
Engine 1 (Tesseract) → score
Engine 2 (EasyOCR) → score
Best score wins (or merge)
→ quality check:
>0.90: auto-accept
0.70-0.90: accept with warning
0.50-0.70: review queue
<0.50: reject / try next engine
→ QMD index → block registry
- Processing time: 5-30s per page
- Quality: tracked per-page with confidence scores
Images
File → resize to max 1024px → LLM describe (cache result) → dedupe check (perceptual hash) → block registry
- Processing time: 2-5s (LLM call for description)
- Compression: ∞ on repeats (serve cached description)
Audio
File → Whisper transcribe → speaker diarization → timestamp segments → QMD index → block registry
- Processing time: ~0.5x realtime (local Whisper)
- Compression: ∞ (process once, serve text forever)
Video
File → extract keyframes (1/scene) → Whisper on audio track → combine:
- Keyframe descriptions
- Timestamped transcript
- Chapter markers
→ QMD index → block registry
- Processing time: 1-2x realtime
- Compression: 50-100x (full video → text summary)
JSON/CSV
File → detect schema → extract:
- Column names + types
- First 5 rows (sampling)
- Row count, null rates
- For nested: keys at max depth 3
→ block registry
- Processing time: <1s
- Compression: 5-20x (schema, not data)
QMD Integration Architecture
TokenPak CLI
│
├── tokenpak index <dir>
│ │
│ ├── Walk directory
│ ├── Process each file (multimodal pipeline)
│ ├── Store processed blocks in registry
│ └── QMD indexes all text outputs
│ ├── qmd collection add <processed-output-dir>
│ ├── qmd embed (generate vectors)
│ └── qmd update (on file changes)
│
├── tokenpak search "query" --budget 8000
│ │
│ ├── QMD hybrid search (BM25 + vectors + reranking)
│ ├── Returns ranked blocks with scores
│ ├── Budget allocator distributes tokens
│ ├── Compression engine packs each block
│ └── Wire format output ready for LLM
│
└── tokenpak serve --port 8766
│
├── Transparent proxy mode (current, working)
├── Intercepts LLM requests
├── Injects relevant context from index
├── Tracks tokens, cost, latency
└── Syncs stats to dashboard
Budget Allocation: Quadratic Importance Weighting
Inspired by Quadratic Mean Diameter (QMD) from forestry — high-importance blocks get quadratically more space.
def allocate_budget(blocks, total_budget):
"""
Allocate token budget using quadratic importance weighting.
High-importance blocks get disproportionately more space.
"""
# Minimum floor: every block gets at least 3% of budget
floor = 0.03 * total_budget
remaining = total_budget - (floor * len(blocks))
# Square importance scores
squared = {b: b.importance ** 2 for b in blocks}
total_sq = sum(squared.values())
# Distribute remaining budget proportionally to squared importance
allocation = {}
for block, sq in squared.items():
allocation[block] = floor + (sq / total_sq) * remaining
return allocation
Importance Scoring
| Signal |
Weight |
Source |
| QMD relevance score |
0.40 |
QMD search result |
| Recency |
0.20 |
File modification date |
| Source quality |
0.15 |
Quality scorer output |
| User preference |
0.15 |
Explicit pins/priorities |
| File type authority |
0.10 |
Contracts > chat logs |
Competitive Positioning
What exists (point solutions)
| Tool |
What it does |
What it doesn’t do |
| QMD |
Text search (BM25 + vectors) |
Images, audio, video, code, PDFs, compression |
| LLMLingua |
Token compression |
Retrieval, multimodal, versioning, quality |
| Whisper |
Audio transcription |
Everything else |
| LlamaIndex |
RAG framework |
Local processing, compression, quality scoring |
| Anthropic cache |
Provider-specific caching |
Cross-provider, multimodal, retrieval |
What TokenPak does (unified pipeline)
All of the above, orchestrated:
- QMD for retrieval ✅
- LLMLingua for compression ✅
- Whisper for audio ✅
- Multi-engine OCR for PDFs ✅
- Tree-sitter for code ✅
- Quality scoring across all modalities ✅
- Block versioning (novel) ✅
- Budget allocation (novel) ✅
- Wire format with provenance (novel) ✅
- Cost monitoring (working today) ✅
The pitch: “One CLI that turns any directory into optimized LLM context. Local, private, cross-provider.”
Revenue Model
| Tier |
Price |
Includes |
| Open Source |
Free (MIT) |
CLI, text processing, basic compression, QMD integration |
| Pro |
$99/mo |
Multimodal processing, monitoring dashboard, priority support |
| Enterprise |
$999/mo+ |
Quality review queue, SLA, custom integrations, audit trails |
| Usage-based |
$0.10-0.50 per 1M tokens processed |
Alternative for high-volume users |
90-Day Revised Roadmap
Month 1: Core Pipeline (Weeks 1-4)
| Week |
Deliverable |
| 1 |
tokenpak index — directory walker + text processing + block registry |
| 2 |
QMD integration — auto-index, search, budget allocation |
| 3 |
tokenpak search — query → retrieve → compress → output |
| 4 |
Code processing (Tree-sitter) + JSON/CSV schema extraction |
Month 2: Multimodal + Quality (Weeks 5-8)
| Week |
Deliverable |
| 5 |
PDF processing (digital + OCR with multi-engine fallback) |
| 6 |
Quality scoring + review queue |
| 7 |
Audio (Whisper) + image (LLM describe + cache) |
| 8 |
Video (keyframes + transcript) |
Month 3: Launch (Weeks 9-12)
| Week |
Deliverable |
| 9 |
Monitoring dashboard (web UI) |
| 10 |
npm + PyPI package publishing |
| 11 |
Documentation, examples, benchmarks |
| 12 |
Open source launch (GitHub, Hacker News, Discord) |
Agent Support
| Agent |
Role |
| Kevin |
Architecture decisions, positioning, launch strategy |
| Cali |
Core implementation, multimodal processors, QMD integration |
| Trix |
Proxy/monitoring, benchmarks, documentation, DevOps |
| Sue |
Task management, QA, community prep |
Success Metrics
Phase 1 (Month 1) — Validation
Phase 2 (Month 2) — Multimodal
Phase 3 (Month 3) — Launch
Benchmark Data (Collected)
| Test |
Avg tokens/request |
Reduction |
Date |
| Baseline (no optimization) |
20,801 |
— |
2026-02-20 |
| QMD only |
6,136 |
70.5% |
2026-02-20 |
| TokenPak compression only |
TBD |
Pending A/B run |
Pending |
| QMD + TokenPak hybrid |
3,265 |
84.3% vs baseline (≈43% extra vs QMD-only) |
2026-02-21 |
- Indexing speedup: 55.27x vs baseline (572-file vault)
- Throughput: ~2,738 files/sec
- Token count cache speedup: 26.6x
- Search latency: ~22.7ms/query
Architecture v2: 2026-02-20
Authors: Trix 🐰 + Kevin