Frameworks, protocols, and tools for building intelligent systems. PDF documents for reference, HTML for interactive viewing.
Three formats dominate the quantized model landscape. GGUF packs everything in one self-describing binary. AWQ stores weights as safetensors for GPU serving. EXL2 adds per-column error maps for maximum quality per bit. This article opens all three.
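A taste of what "self-describing binary" means for GGUF: the public GGUF spec starts every file with a fixed little-endian preamble (4-byte magic `GGUF`, a uint32 version, then uint64 tensor and metadata counts). A minimal sketch, using a synthetic header rather than a real model file:

```python
import struct

def parse_gguf_header(buf: bytes) -> dict:
    # Fixed GGUF preamble per the public spec: 4-byte magic "GGUF",
    # uint32 version, uint64 tensor count, uint64 metadata k/v count,
    # all little-endian.
    magic, version, n_tensors, n_kv = struct.unpack_from("<4sIQQ", buf, 0)
    if magic != b"GGUF":
        raise ValueError("not a GGUF file")
    return {"version": version, "tensors": n_tensors, "metadata_kv": n_kv}

# Synthetic 24-byte header (version 3, 291 tensors, 24 metadata pairs).
header = struct.pack("<4sIQQ", b"GGUF", 3, 291, 24)
print(parse_gguf_header(header))  # {'version': 3, 'tensors': 291, 'metadata_kv': 24}
```

Everything after this preamble (metadata key/value pairs, tensor descriptors, tensor data) is likewise length-prefixed, which is why a single file can be loaded with no sidecar config.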
An end-to-end look at production LLM inference: request handling, tokenization, prefill, KV cache management, decode, schedulers, quantization, context windows, streaming APIs, and the serving stack behind real latency.
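The prefill/decode split that article walks through can be shown in a toy single-head attention sketch: prefill computes keys and values for the whole prompt once, and each decode step appends one row to the cache instead of recomputing it, so per-token cost grows linearly with context. All names and sizes here are illustrative, not from any serving stack:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                                          # toy head dimension
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

def attend(q, K, V):
    # Single-query scaled dot-product attention over cached keys/values.
    scores = q @ K.T / np.sqrt(d)
    w = np.exp(scores - scores.max())
    return (w / w.sum()) @ V

# Prefill: compute K/V for every prompt position in one pass.
prompt = rng.standard_normal((5, d))           # 5 "token" embeddings
K_cache, V_cache = prompt @ Wk, prompt @ Wv

# Decode: each step appends ONE new row to the cache, then attends
# over everything cached so far.
x = rng.standard_normal(d)                     # latest token embedding
for _ in range(3):
    K_cache = np.vstack([K_cache, x @ Wk])
    V_cache = np.vstack([V_cache, x @ Wv])
    x = attend(x @ Wq, K_cache, V_cache)       # toy "next token" state

print(K_cache.shape)  # (8, 8): 5 prompt rows + 3 decoded rows
```

Real engines add batching, paged cache allocation, and sampling on top, but the cache-append shape of the decode loop is the same.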
A systematic framework for LLM agents to tackle hard optimization and debugging problems. Covers theoretical floors, structured worklogs, agent architecture, bottleneck hierarchy, the wall protocol, optimization patterns, and SQLite-based memory systems.
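As a flavor of what a SQLite-based memory system can look like, here is a minimal sketch with a hypothetical `worklog` schema (the table and column names are assumptions for illustration, not the framework's actual schema): one table of attempts tagged by problem and outcome, so a later agent run can ask what has already been tried.

```python
import sqlite3

# Hypothetical schema: timestamped worklog entries, tagged by problem
# and outcome, queryable by later agent runs.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE worklog (
        id      INTEGER PRIMARY KEY,
        problem TEXT NOT NULL,
        attempt TEXT NOT NULL,
        outcome TEXT CHECK (outcome IN ('win', 'dead_end', 'open')),
        ts      TEXT DEFAULT CURRENT_TIMESTAMP
    )
""")
con.executemany(
    "INSERT INTO worklog (problem, attempt, outcome) VALUES (?, ?, ?)",
    [
        ("slow_join", "add covering index", "win"),
        ("slow_join", "rewrite as EXISTS", "dead_end"),
    ],
)

# A later run asks: what already failed for this problem?
dead_ends = con.execute(
    "SELECT attempt FROM worklog WHERE problem = ? AND outcome = 'dead_end'",
    ("slow_join",),
).fetchall()
print(dead_ends)  # [('rewrite as EXISTS',)]
```

The point of persisting this is that dead ends are as valuable as wins: an agent that can query past failures avoids re-walking them.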
A framework for LLM agents to systematically explore and understand complex systems, codebases, and problem spaces. Covers uncertainty mapping, exploration journals, agent roles (Surveyor, Diver, Tracer, Synthesizer, Challenger), and memory systems.
An autonomous scientific discovery engine that extracts, validates, and synthesizes claims from ML research papers. Builds a knowledge graph with regime-gated edges and mines it for testable hypotheses using anti-hype scoring.
A soccer prediction system combining Elo ratings, Bayesian calibration, and Monte Carlo simulation. Features walk-forward backtesting, multi-league support, season odds simulation, and evidence-based betting strategy analysis.
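The Elo core of such a system fits in a few lines. A minimal sketch: the standard Elo expected score with a home-advantage offset, a K-factor update, and a crude Monte Carlo loop over one fixture. The `home_adv`, `k`, and draw-probability values are illustrative assumptions, not the project's calibrated parameters:

```python
import random

def elo_expected(r_home, r_away, home_adv=60.0):
    # Standard Elo expected score; home_adv is an assumed rating offset.
    return 1.0 / (1.0 + 10 ** ((r_away - r_home - home_adv) / 400.0))

def elo_update(r_home, r_away, score_home, k=20.0):
    # score_home: 1 = home win, 0.5 = draw, 0 = home loss.
    delta = k * (score_home - elo_expected(r_home, r_away))
    return r_home + delta, r_away - delta

def simulate(r_home, r_away, n=10_000, p_draw=0.25, seed=1):
    # Toy draw model for illustration: carve a fixed draw probability
    # out of the Elo expectation, then sample the fixture n times.
    rng = random.Random(seed)
    p_home = elo_expected(r_home, r_away) * (1 - p_draw)
    wins = sum(rng.random() < p_home for _ in range(n))
    return wins / n

print(round(elo_expected(1600, 1500), 3))  # ~0.715 with the assumed home edge
print(simulate(1600, 1500))
```

A real system replaces the fixed draw split with a calibrated model (and walk-forward backtesting guards against fitting those parameters to the future), but the rating update itself is this simple.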