Blog

Notes on dev, creative code, research, and whatever else catches my attention.

What I learned making Pando
Nuances of interpretability evaluations - and why no single method tells the full story.
Apr 15, 2026 · 3 min read · ML, Interpretability, Python
Benchmark crisis is coming for all of us
Some thoughts on ML research in the age of agents - and why static benchmarks are dying.
Apr 10, 2026 · 3 min read · ML, Agents, Research