Field notes

The studio,
out loud.

Lessons from shipping AI products, building with small teams, and the occasional strong opinion about software.

Categories: All, Engineering, Strategy, AI, AI Development, Development, Healthcare, AI Tools, SaaS, Leadership

01
Engineering
The AI productivity playbook: a real engineer's day
May 15, 2026 · 8 min read
02
Engineering
Claude Code + PostHog: analytics-aware development
May 14, 2026 · 7 min read
03
Engineering
Claude Code + Sentry: incident debugging as conversation
May 13, 2026 · 7 min read
04
Engineering
Claude Code + Supabase: a working integration via MCP
May 12, 2026 · 7 min read
05
Engineering
Effective MCP patterns: keeping AI tools safe at scale
May 11, 2026 · 8 min read
06
Engineering
MCP fundamentals: connecting your AI tools to your team's stack
May 8, 2026 · 8 min read
07
Engineering
Claude Code vs. Codex: which to reach for
May 7, 2026 · 7 min read
08
Engineering
Getting started with Codex: install to first real task
May 6, 2026 · 7 min read
09
Engineering
Getting started with Claude Code: install to first real task
May 5, 2026 · 8 min read
10
Engineering
AI tools for software engineers: a practical orientation
May 4, 2026 · 7 min read
11
Engineering
Determinism harnesses for non-deterministic systems
Apr 30, 2026 · 2 min read
12
Engineering
Multi-agent orchestration: from kitchen brigade to opera
Apr 30, 2026 · 3 min read
13
Engineering
Retry strategies that don't compound errors
Apr 30, 2026 · 3 min read
14
Engineering
Tech lead: PR reviews deeper than 'lgtm'
Apr 30, 2026 · 4 min read
15
Engineering
Your first MCP server (Node)
Apr 29, 2026 · 8 min read
16
Engineering
MCP error handling: tell the model what went wrong
Apr 29, 2026 · 2 min read
17
Engineering
Security: threat-model first draft from architecture
Apr 29, 2026 · 4 min read
18
Engineering
What makes an eval good
Apr 29, 2026 · 7 min read
19
Strategy
Sales: discovery summariser that keeps the human
Apr 28, 2026 · 5 min read
20
Engineering
Data: pipeline DAG explainer + drift detector
Apr 28, 2026 · 5 min read
21
Engineering
MCP for CI/CD: build-system tools as agent inputs
Apr 28, 2026 · 2 min read
22
Engineering
Trend evals vs. threshold evals
Apr 28, 2026 · 2 min read
23
Engineering
Backend: API design + endpoint scaffolding
Apr 27, 2026 · 9 min read
24
Engineering
Data: SQL refactors and lineage maps
Apr 27, 2026 · 5 min read
25
Engineering
Fall-back chains: cheap → expensive → human
Apr 27, 2026 · 3 min read
26
Engineering
Integration tests for AI features: contract or behavioural?
Apr 27, 2026 · 3 min read
27
Engineering
CI strategy: smoke vs. full suite for LLM apps
Apr 24, 2026 · 2 min read
28
Engineering
Self-consistency: when N=3 beats a smarter prompt
Apr 24, 2026 · 3 min read
29
Engineering
SRE: postmortem first drafts that don't blame
Apr 24, 2026 · 5 min read
30
Engineering
Tech writer: doc audits that catch what humans miss
Apr 24, 2026 · 4 min read
31
AI
Agents in government: constituent services with public-records care
Apr 23, 2026 · 5 min read
32
Engineering
Cost guardrails: stop runaway agents before billing does
Apr 23, 2026 · 6 min read
33
Engineering
End-to-end tests for AI workflows: scope and survival
Apr 23, 2026 · 2 min read
34
Engineering
MCP for actioning tools (PR creator, ticket closer)
Apr 23, 2026 · 2 min read
35
Engineering
Frontend: accessibility passes that finally get done
Apr 22, 2026 · 4 min read
36
Engineering
MCP and the Claude Code workflow specifically
Apr 22, 2026 · 2 min read
37
Engineering
Pairwise judges: A/B agreement at scale
Apr 22, 2026 · 2 min read
38
Engineering
Pinning model versions through provider migrations
Apr 22, 2026 · 2 min read
39
Engineering
Drift catchers: detecting style shifts
Apr 21, 2026 · 2 min read
40
Engineering
Eval CI: the pass/fail gate that's actually useful
Apr 21, 2026 · 2 min read
41
Engineering
Prompt invariance: prompts that survive paraphrase
Apr 21, 2026 · 3 min read
42
Engineering
Tool failure modes: timeouts, retries, idempotency
Apr 21, 2026 · 4 min read
43
Engineering
Context engineering: what to load, what to defer
Apr 20, 2026 · 4 min read
44
Engineering
Output validation: pydantic, zod, and friends in production
Apr 20, 2026 · 2 min read
45
Engineering
Versioning model + prompt as a unit
Apr 20, 2026 · 3 min read
46
AI Development
Why AI-First Development Matters for Modern SaaS Products
Apr 20, 2026 · 2 min read
47
AI
Agents in hospitality: reservations + recovery
Apr 17, 2026 · 5 min read
48
AI
Agents in HR: recruiting agents and the bias receipts they leave behind
Apr 17, 2026 · 5 min read
49
Engineering
Backend: database migrations without fear
Apr 17, 2026 · 5 min read
50
Engineering
ML: feature-store query rewrites
Apr 17, 2026 · 4 min read
51
Engineering
Building agents that explain themselves
Apr 16, 2026 · 3 min read
52
Engineering
Constrained decoding: the underrated lever
Apr 16, 2026 · 3 min read
53
Engineering
Mobile (Android): Compose rollout audits
Apr 16, 2026 · 4 min read
54
Engineering
Safety guardrails: refusal patterns that don't make agents useless
Apr 16, 2026 · 3 min read
55
Strategy
Finance: variance commentary that reads like the CFO wrote it
Apr 15, 2026 · 5 min read
56
Engineering
Confidence calibration: when 'I don't know' is the answer
Apr 15, 2026 · 3 min read
57
Engineering
Counter-example mining
Apr 15, 2026 · 3 min read
58
Engineering
The post-launch test plan: what runs forever
Apr 15, 2026 · 3 min read
59
Engineering
SRE: runbook generation that captures the response
Apr 15, 2026 · 5 min read
60
AI
Agents in legal: contract review with receipts
Apr 14, 2026 · 5 min read
61
AI
Agents on the factory floor
Apr 14, 2026 · 5 min read
62
Strategy
The hand-off contract — turning an AI employee from prototype to permanent
Apr 14, 2026 · 5 min read
63
Engineering
LLM evals are restaurant health inspections
Apr 14, 2026 · 4 min read
64
Engineering
Retiring an agent
Apr 14, 2026 · 3 min read
65
Engineering
Long-horizon tasks: keeping an agent on rails for hours
Apr 13, 2026 · 4 min read
66
Engineering
MCP authorization: per-user permissions
Apr 13, 2026 · 2 min read
67
Engineering
MCP composition: when one server should call another
Apr 13, 2026 · 2 min read
68
Engineering
MCP server versioning: shipping breaking changes safely
Apr 13, 2026 · 2 min read
69
Engineering
MCP transport: stdio vs. HTTP vs. SSE
Apr 13, 2026 · 2 min read
70
Engineering
Deploying agents in CI: scoped, audited, repeatable
Apr 10, 2026 · 7 min read
71
AI
Agents in media: news summary with a corrections workflow
Apr 10, 2026 · 4 min read
72
Engineering
Caching deterministic prefixes
Apr 10, 2026 · 3 min read
73
Engineering
Eval result storage and versioning
Apr 10, 2026 · 2 min read
74
Engineering
Tests for retrieval pipelines
Apr 10, 2026 · 2 min read
75
Engineering
Beyond MCP: tool-use specs in major models
Apr 9, 2026 · 2 min read
76
Engineering
Cost tests: catching the prompt that doubled spend
Apr 9, 2026 · 2 min read
77
Engineering
The judge pattern for confidence
Apr 9, 2026 · 3 min read
78
Engineering
MCP in 10 minutes
Apr 9, 2026 · 6 min read
79
Engineering
QA: test-plan generation from acceptance criteria
Apr 9, 2026 · 5 min read
80
Engineering
Versioning agent behaviour: prompts as source code
Apr 8, 2026 · 3 min read
81
Strategy
Founder ops: board-deck content from raw metrics
Apr 8, 2026 · 5 min read
82
Strategy
HR: performance-review draft assistant
Apr 8, 2026 · 5 min read
83
Strategy
EM: PR reviewer that flags scope creep
Apr 8, 2026 · 5 min read
84
Engineering
UX tests for AI-generated content
Apr 8, 2026 · 2 min read
85
Engineering
Agent observability: traces that tell you what happened
Apr 7, 2026 · 6 min read
86
AI
Agents in construction: estimator copilots in margin-thin work
Apr 7, 2026 · 4 min read
87
AI
Agents in energy: grid monitoring with a safety case
Apr 7, 2026 · 4 min read
88
Strategy
An AI employee isn't a bot — it's a teammate with a desk
Apr 7, 2026 · 9 min read
89
Engineering
Eval anti-patterns: when evals make products worse
Apr 7, 2026 · 3 min read
90
AI
Agents for non-profits: donor research on a tight budget
Apr 6, 2026 · 5 min read
91
Engineering
Browsing agents: scraping vs. structured tools
Apr 6, 2026 · 3 min read
92
Engineering
Eval-driven prompt iteration
Apr 6, 2026 · 2 min read
93
Engineering
Tool-use evals: right tool, right order
Apr 6, 2026 · 2 min read
94
Engineering
Voice-first agents: the latency budget you live within
Apr 6, 2026 · 3 min read
95
Engineering
Agent memory: what to write down, what to forget
Apr 3, 2026 · 3 min read
96
Engineering
Hallucination checks: cite-or-it-didn't-happen
Apr 3, 2026 · 3 min read
97
Engineering
MCP server observability
Apr 3, 2026 · 2 min read
98
Engineering
Prompt evolution: how agents get worse without you noticing
Apr 3, 2026 · 3 min read
99
Engineering
Red-teaming your own prompt
Apr 3, 2026 · 3 min read
100
Development
Building Electron Apps That Scale: Lessons from Healthcare Software
Apr 2, 2026 · 3 min read
101
Engineering
EM: 1:1 prep + roadmap sanity check
Apr 2, 2026 · 4 min read
102
Engineering
Frontend: component scaffolding + state machines
Apr 2, 2026 · 4 min read
103
Engineering
Full-stack: a real feature in an afternoon
Apr 2, 2026 · 5 min read
104
Engineering
Tests for tool-using agents: trace assertions
Apr 2, 2026 · 3 min read
105
Strategy
Product: synthesising 1,000 tickets into 7 themes
Apr 1, 2026 · 5 min read
106
Strategy
Sales: pipeline reviewer + forecast challenger
Apr 1, 2026 · 5 min read
107
Engineering
MCP authentication: tokens, scopes, OAuth
Apr 1, 2026 · 2 min read
108
Engineering
MCP server rate limits: the polite-rejection pattern
Apr 1, 2026 · 2 min read
109
Engineering
Property-based testing for LLM features
Apr 1, 2026 · 2 min read
110
Engineering
Building your first eval set from scratch
Mar 31, 2026 · 8 min read
111
Engineering
Evals for agents: trajectory + outcome
Mar 31, 2026 · 7 min read
112
Engineering
MCP and secrets management
Mar 31, 2026 · 2 min read
113
Engineering
MCP server hosting: local, sidecar, remote
Mar 31, 2026 · 2 min read
114
Engineering
MCP tool naming: making tools discoverable
Mar 31, 2026 · 2 min read
115
AI
Agents in telecom: diagnostics that route faster than tier-1
Mar 30, 2026 · 5 min read
116
Engineering
LLM-as-judge: when to trust it, when not
Mar 30, 2026 · 7 min read
117
Engineering
MCP for data tools (Postgres, BigQuery, S3)
Mar 30, 2026 · 2 min read
118
Healthcare
What medieval scribes teach us about AI scribes
Mar 30, 2026 · 5 min read
119
Engineering
Structured output: JSON mode, schemas, why one beats the other
Mar 30, 2026 · 7 min read
120
AI
Agents in sales: SDR copilots that don't get you blocked
Mar 27, 2026 · 5 min read
121
Engineering
Idempotency keys for LLM calls
Mar 27, 2026 · 3 min read
122
Engineering
OSS maintainer: triage + contributor-guide updates
Mar 27, 2026 · 4 min read
123
Engineering
Prompts are recipes, not spells
Mar 27, 2026 · 4 min read
124
Engineering
Why we need MCP at all
Mar 27, 2026 · 2 min read
125
Engineering
Human eval workflows: instructions that don't vary
Mar 26, 2026 · 2 min read
126
Engineering
Judging open-ended output without a rubric
Mar 26, 2026 · 2 min read
127
AI
MCP servers are USB-C for AI
Mar 26, 2026 · 5 min read
128
Engineering
MCP tool schemas: arg shapes that help
Mar 26, 2026 · 2 min read
129
Engineering
Regression cohorts: catching what evals miss
Mar 26, 2026 · 3 min read
130
AI
Agents in support: tier-1 deflection without tier-1 backlash
Mar 25, 2026 · 5 min read
131
AI
Agents in insurance: claims processing, speed vs. accuracy
Mar 25, 2026 · 4 min read
132
Engineering
Code-writing agents: the test-first discipline
Mar 25, 2026 · 3 min read
133
Engineering
Drift tests vs. functional tests: separate lanes
Mar 25, 2026 · 3 min read
134
Engineering
Plan vs. act: the agent loop everyone gets wrong
Mar 25, 2026 · 6 min read
135
AI
Agents in retail: shoppers that don't feel creepy
Mar 24, 2026 · 5 min read
136
Strategy
Comms: crisis-comms first-draft drafter
Mar 24, 2026 · 5 min read
137
Engineering
Privacy tests: PII redaction assertions
Mar 24, 2026 · 2 min read
138
AI
RAG is a public library (and Dewey was right)
Mar 24, 2026 · 4 min read
139
Engineering
Sub-agents: when 1+1 actually equals 2
Mar 24, 2026 · 4 min read
140
AI
Your AI agent should plan like a kitchen brigade
Mar 23, 2026 · 5 min read
141
Engineering
Calibrating your judge: meta-evals
Mar 23, 2026 · 2 min read
142
Strategy
Product: PRD draft from a discovery transcript
Mar 23, 2026 · 5 min read
143
Engineering
Security: code-pattern audits and CVE sweeps
Mar 23, 2026 · 4 min read
144
Engineering
Tool design: write tools the way you write APIs
Mar 23, 2026 · 8 min read
145
AI
Agents in logistics: route planning with a human in the loop
Mar 20, 2026 · 5 min read
146
Strategy
Comms: internal newsletter that captures the actual week
Mar 20, 2026 · 5 min read
147
Engineering
Golden-set discipline
Mar 20, 2026 · 3 min read
148
Engineering
Why probabilistic systems still need deterministic contracts
Mar 20, 2026 · 7 min read
149
Engineering
Refusal grammars: predictable, not surprising
Mar 20, 2026 · 3 min read
150
AI
Agents for research synthesis
Mar 19, 2026 · 5 min read
151
Engineering
MCP for internal tools (Linear, Notion, Slack analogues)
Mar 19, 2026 · 2 min read
152
Engineering
ML: eval harness from a spec
Mar 19, 2026 · 4 min read
153
Engineering
Multimodal agents: when adding vision actually helps
Mar 19, 2026 · 4 min read
154
Engineering
Test-data management for AI: synthetic vs. real
Mar 19, 2026 · 2 min read
155
AI
Agents in healthcare: scribe yes, nurse no
Mar 18, 2026 · 8 min read
156
Engineering
Behavioural assertions: testing 'should-ness'
Mar 18, 2026 · 2 min read
157
Engineering
Eval taxonomy: golden, behavioural, drift, safety
Mar 18, 2026 · 3 min read
158
Engineering
Evals for retrieval: separating retrieval from synthesis
Mar 18, 2026 · 2 min read
159
Engineering
Your first MCP server (Python)
Mar 18, 2026 · 2 min read
160
Engineering
Agent A/B tests: comparing without confusing your users
Mar 17, 2026 · 3 min read
161
AI Tools
Your AI coding assistant is a midwife, not a genius
Mar 17, 2026 · 5 min read
162
Strategy
CS: onboarding playbook generator per customer
Mar 17, 2026 · 5 min read
163
Engineering
The deterministic-envelope pattern
Mar 17, 2026 · 3 min read
164
Engineering
MCP and prompt injection: ambient instructions
Mar 17, 2026 · 2 min read
165
AI
Agents in finance: compliance with an audit trail
Mar 16, 2026 · 4 min read
166
Strategy
Recruiting: JD writer + screening-question generator
Mar 16, 2026 · 5 min read
167
Engineering
Few-shot drift: why golden examples poison new versions
Mar 16, 2026 · 3 min read
168
Engineering
The judge pattern: agents that grade other agents
Mar 16, 2026 · 4 min read
169
Engineering
PII in test fixtures: the boring legal slope
Mar 16, 2026 · 3 min read
170
Engineering
Architect: vendor-comparison architecture doc
Mar 13, 2026 · 3 min read
171
Strategy
Finance: AP-invoice auditor on a small ops team
Mar 13, 2026 · 5 min read
172
Strategy
Legal-ops: internal policy answerer with citation discipline
Mar 13, 2026 · 5 min read
173
Engineering
A senior engineer's day with Claude Code
Mar 13, 2026 · 9 min read
174
Engineering
Skills files: recipes the model can call
Mar 13, 2026 · 4 min read
175
Strategy
Marketing: SEO audits as a Claude Code workflow
Mar 12, 2026 · 5 min read
176
Engineering
Evals that survive a model bump
Mar 12, 2026 · 3 min read
177
Engineering
Managed agents: when to reach for them
Mar 12, 2026 · 4 min read
178
Engineering
Mock LLMs in tests: when to fake, when to call
Mar 12, 2026 · 3 min read
179
Engineering
The red set: adversarial cases you're allowed to fail
Mar 12, 2026 · 2 min read
180
AI
Agents in agriculture: yield prediction with weather data
Mar 11, 2026 · 5 min read
181
AI
Agents in pharma: literature review with citation discipline
Mar 11, 2026 · 5 min read
182
Engineering
The new test pyramid for AI products
Mar 11, 2026 · 7 min read
183
Engineering
Per-feature evals vs. per-model evals
Mar 11, 2026 · 2 min read
184
Engineering
Sampling production traffic for eval
Mar 11, 2026 · 2 min read
185
AI
In-product agents that earn renewal
Mar 10, 2026 · 5 min read
186
Strategy
Marketing: the campaign-brief copilot
Mar 10, 2026 · 5 min read
187
Strategy
Operations: the SOP writer that updates itself
Mar 10, 2026 · 5 min read
188
Engineering
Security tests: prompt-injection regression suite
Mar 10, 2026 · 2 min read
189
Engineering
Temperature, top-p, and the production tradeoff
Mar 10, 2026 · 3 min read
190
Strategy
HR: onboarding-buddy automation
Mar 9, 2026 · 5 min read
191
Strategy
EM: sprint-planning copilot for managers
Mar 9, 2026 · 5 min read
192
Strategy
Operations: vendor-comparison briefs in 30 minutes
Mar 9, 2026 · 5 min read
193
SaaS
From MVP to Scale: A Practical Guide for SaaS Founders
Mar 9, 2026 · 4 min read
194
Engineering
QA: flaky test triage at scale
Mar 9, 2026 · 5 min read
195
Engineering
DevOps: CI pipeline diagnosis at 2am
Mar 6, 2026 · 4 min read
196
Engineering
DevOps: Terraform refactor with a watchful copilot
Mar 6, 2026 · 5 min read
197
Engineering
The future of MCP
Mar 6, 2026 · 2 min read
198
Engineering
MCP testing: harnesses, fixtures, regressions
Mar 6, 2026 · 2 min read
199
Engineering
Output post-processors that don't hide the truth
Mar 6, 2026 · 3 min read
200
AI
Agents in education: tutor agents and the assessment problem
Mar 5, 2026 · 5 min read
201
Engineering
Authoring eval cases
Mar 5, 2026 · 2 min read
202
Strategy
Founder ops: investor-update auto-drafter
Mar 5, 2026 · 5 min read
203
Engineering
Snapshot tests: where they help, where they trap
Mar 5, 2026 · 2 min read
204
Engineering
Tests for streaming responses
Mar 5, 2026 · 2 min read
205
Engineering
Agent rollback: kill switches on day one
Mar 4, 2026 · 3 min read
206
AI
Agents in marketing: campaign agents and the brand-voice problem
Mar 4, 2026 · 5 min read
207
Engineering
Determinism for tool calls: keys, ordering, side-effects
Mar 4, 2026 · 2 min read
208
Engineering
Output diffing in CI
Mar 4, 2026 · 3 min read
209
Engineering
Reading an eval dashboard
Mar 4, 2026 · 2 min read
210
Engineering
Accessibility tests for AI surfaces
Mar 3, 2026 · 2 min read
211
AI
Agents in real estate: lead qualification at speed
Mar 3, 2026 · 4 min read
212
Engineering
Eval-driven development
Mar 3, 2026 · 3 min read
213
Engineering
Eval ownership in an org: PM, eng, or QA?
Mar 3, 2026 · 2 min read
214
Engineering
Performance tests: token budgets and latency SLAs
Mar 3, 2026 · 2 min read
215
AI
The agent maturity curve
Mar 2, 2026 · 9 min read
216
Engineering
Auto-generated eval cases from production logs
Mar 2, 2026 · 2 min read
217
Strategy
CS: renewal-risk scoring that explains itself
Mar 2, 2026 · 5 min read
218
Strategy
Recruiting: sourcing brief from a single role description
Mar 2, 2026 · 5 min read
219
Engineering
Eval cost management
Mar 2, 2026 · 2 min read
220
Engineering
Mobile (iOS): UIKit-to-SwiftUI translation
Mar 2, 2026 · 4 min read
221
AI
Agents in fashion: stylist yes, designer no
Feb 27, 2026 · 4 min read
222
Engineering
AI-native debugging: the rubber duck got smarter
Feb 26, 2026 · 4 min read
223
Engineering
Claude Code + Jira: standups without the standing
Feb 25, 2026 · 3 min read
224
AI
Small models are underrated: a case for boring infrastructure
Feb 24, 2026 · 4 min read
225
AI
Agents in gaming: from live-ops support to dynamic NPCs
Feb 23, 2026 · 4 min read
226
Engineering
Multi-model routing: the dispatcher pattern for LLMs
Feb 20, 2026 · 4 min read
227
Engineering
Claude Code + Linear: where work lives, the agent lives
Feb 19, 2026 · 3 min read
228
AI
Agents in music: the producer's new intern
Feb 18, 2026 · 4 min read
229
Engineering
Semantic caching: why your top 1% of queries cost 60% of your bill
Feb 17, 2026 · 4 min read
230
Engineering
Claude Code + Notion: docs become structured data
Feb 16, 2026 · 4 min read
231
AI
Agents in podcasting: editor yes, host no
Feb 13, 2026 · 4 min read
232
Engineering
AI cost attribution: a chargeback model for LLM spend
Feb 12, 2026 · 4 min read
233
Engineering
Claude Code + Slack: standups, escalations, and the back-channel
Feb 11, 2026 · 3 min read
234
AI
Agents in publishing: from slush pile to acquisition
Feb 10, 2026 · 4 min read
235
Engineering
AI latency budgets: borrowing from network engineering
Feb 9, 2026 · 4 min read
236
AI
Agents in libraries: Dewey's heir is silicon
Feb 6, 2026 · 4 min read
237
Engineering
AI feature flags: a model rollout looks like a deployment
Feb 5, 2026 · 4 min read
238
Engineering
Claude Code + Datadog: 2 a.m. is for the agent now
Feb 4, 2026 · 4 min read
239
AI
Agents in museums: catalog, conservation, and the curator
Feb 3, 2026 · 3 min read
240
Engineering
AI canary deployments: 1% traffic, 100% paranoia
Feb 2, 2026 · 4 min read
241
AI
Agents in veterinary medicine: scribe for the species that can't talk
Jan 30, 2026 · 4 min read
242
Engineering
Embedding model selection: the 5-minute decision tree
Jan 29, 2026 · 4 min read
243
Engineering
Claude Code + Stripe: revenue-aware development
Jan 28, 2026 · 4 min read
244
AI
Agents in aviation: copilot in the literal sense
Jan 27, 2026 · 4 min read
245
Engineering
Vector DB architecture: pgvector, managed, or homemade
Jan 26, 2026 · 4 min read
246
AI
Agents in maritime: the wheelhouse and the warehouse
Jan 23, 2026 · 4 min read
247
Engineering
RAG vs. fine-tuning: a 90% decision tree
Jan 22, 2026 · 4 min read
248
Engineering
Claude Code + Figma: design handoff in one prompt
Jan 21, 2026 · 4 min read
249
AI
Agents in utilities: the boring grid is the perfect customer
Jan 20, 2026 · 3 min read
250
Engineering
Token economics: what your unit cost actually is
Jan 19, 2026 · 4 min read
251
AI
Agents in mental health: triage yes, therapy no
Jan 16, 2026 · 4 min read
252
Engineering
AI incident response: the postmortem template you'll wish you had
Jan 15, 2026 · 4 min read
253
AI
AI for product managers: the new PRD looks like an eval set
Jan 14, 2026 · 4 min read
254
AI
Agents in fitness: a coach who reads your data
Jan 13, 2026 · 4 min read
255
Leadership
AI team scaling: 1, 3, and 10 engineers
Jan 12, 2026 · 5 min read
256
AI
Agents in dental practice: chart, code, and call
Jan 9, 2026 · 4 min read
257
Engineering
An AI-aware pull request template
Jan 8, 2026 · 5 min read
258
AI
AI for designers: from mood board to motion principles
Jan 7, 2026 · 4 min read
259
AI
Agents in architecture firms: drafting is the first 10%
Jan 6, 2026 · 4 min read
260
Engineering
Self-healing pipelines: the night shift you don't have to pay
Jan 5, 2026 · 4 min read
261
Engineering
Agent supervision loops: the OODA loop, re-implemented
Jan 2, 2026 · 4 min read
262
AI
AI for data scientists: notebooks are the new IDE
Dec 31, 2025 · 4 min read
263
Engineering
EU AI Act: what changes in your engineering process
Dec 30, 2025 · 4 min read
264
Leadership
AI for the CTO: a sixty-minute audit
Dec 29, 2025 · 4 min read
265
Engineering
HIPAA and AI: the BAA is the first conversation
Dec 26, 2025 · 4 min read
266
Leadership
Twelve AI procurement questions every buyer should ask
Dec 24, 2025 · 5 min read
267
Leadership
AI for founders: the 'is this a product?' filter
Dec 23, 2025 · 5 min read
268
AI
Agents in sports analytics: the scout's new tablet
Dec 22, 2025 · 4 min read
269
AI
AI and the symphony conductor: orchestration is older than software
Dec 19, 2025 · 4 min read
270
AI
AI and air traffic control: a 70-year-old playbook for safe autonomy
Dec 18, 2025 · 4 min read
