Awesome Agentic Evaluation: A Curated Guide to Benchmarking AI Agents
A walkthrough of the landscape of agentic evaluation: benchmarks, tooling, design patterns, and best practices for measuring how well AI agents actually work.