Chirag Patnaik
ML Explainers (33)
Architecture & Attention
Attention
Flash Attention
RoPE
Mamba / SSM
Mixture-of-Experts
Training
LoRA & QLoRA
FSDP / ZeRO
Gradient checkpointing
Muon
RLHF
Preference optimization
Constitutional AI
Inference & Serving
Prefill vs decode
vLLM
rvLLM
Speculative decoding
EAGLE & Medusa
Chunked prefill
Prefix caching
Sliding window & KV compression
Lookahead decoding
Disaggregated P/D
Reasoning & Search
Chain-of-thought
Self-consistency & Best-of-N
Tree of Thought
MCTS for LLMs
Test-time compute scaling
Representation & Generation
CLIP
Diffusion
t-SNE & UMAP
Other
RAG
EGGROLL
RLM