Blog
Notes on building reliable AI systems, platform engineering, and software architecture — written from inside real systems, not slideware.
Two Coding Agents, One Project Brief
I run Claude Code and Codex over the same repo, and each wants its own instructions file. Maintaining two by hand means they drift, and a stale project brief is worse than none. The fix is boring, and it's a symlink.
How the Creator of Claude Code Actually Uses It
Boris Cherny built Claude Code, and the way he uses it isn't about clever settings. It's about running it like a small team instead of a chat window. The handful of habits behind that, and the ones I've adopted.
Your AI Tools Start From Zero Every Time. They Don't Have To.
AI coding tools forget everything between sessions, so they repeat mistakes and relearn your conventions over and over. Here's the learnings loop that fixes it, and exactly how I wired it into Claude Code on this blog.
Putting AI in Front of a Platform: Lessons from Real Systems
What I've learned putting LLMs and agents on top of a Salesforce platform, where the data has rules you don't get to ignore and a wrong answer lands in the system the business runs on.
Designing Reliable AI Agents on Top of Enterprise Platforms
An agent that can change records on a system of record is powerful and risky in equal measure. The guardrails I rely on — acting as the user, idempotent actions, a narrow toolset, human checkpoints, and real logging — to let one run safely.
Building Reliable LLM Features: What Production Actually Demands
An LLM feature is easy to demo and hard to trust. These are the practical habits — validating output, measuring quality, versioning prompts, and planning for wrong answers — that I rely on to make one hold up with real users.
Non-Functional Requirements for AI Systems: What Staff Engineers Should Specify
Most teams spec what an AI feature should do and skip how well it has to do it. The non-functional requirements — accuracy, latency, cost, fallback, observability, governance — that decide whether it's actually production-ready.
How I Evaluate LLM Output Without a Ground-Truth Dataset
You almost never have labeled data when you ship an AI feature. A practical way to measure quality anyway — a small hand-built set, plain assertions, a checked model-as-judge, and the production signals you already have.