Category · Engineering

Field notes in Engineering.

186 articles in this category — part of the Jaypore Labs journal.

01
Engineering
The AI productivity playbook: a real engineer's day
May 15, 2026 · 8 min read
02
Engineering
Claude Code + PostHog: analytics-aware development
May 14, 2026 · 7 min read
03
Engineering
Claude Code + Sentry: incident debugging as conversation
May 13, 2026 · 7 min read
04
Engineering
Claude Code + Supabase: a working integration via MCP
May 12, 2026 · 7 min read
05
Engineering
Effective MCP patterns: keeping AI tools safe at scale
May 11, 2026 · 8 min read
06
Engineering
MCP fundamentals: connecting your AI tools to your team's stack
May 8, 2026 · 8 min read
07
Engineering
Claude Code vs. Codex: which to reach for
May 7, 2026 · 7 min read
08
Engineering
Getting started with Codex: install to first real task
May 6, 2026 · 7 min read
09
Engineering
Getting started with Claude Code: install to first real task
May 5, 2026 · 8 min read
10
Engineering
AI tools for software engineers: a practical orientation
May 4, 2026 · 7 min read
11
Engineering
Determinism harnesses for non-deterministic systems
Apr 30, 2026 · 2 min read
12
Engineering
Multi-agent orchestration: from kitchen brigade to opera
Apr 30, 2026 · 3 min read
13
Engineering
Retry strategies that don't compound errors
Apr 30, 2026 · 3 min read
14
Engineering
Tech lead: PR reviews deeper than 'lgtm'
Apr 30, 2026 · 4 min read
15
Engineering
Your first MCP server (Node)
Apr 29, 2026 · 8 min read
16
Engineering
MCP error handling: tell the model what went wrong
Apr 29, 2026 · 2 min read
17
Engineering
Security: threat-model first draft from architecture
Apr 29, 2026 · 4 min read
18
Engineering
What makes an eval good
Apr 29, 2026 · 7 min read
19
Engineering
Data: pipeline DAG explainer + drift detector
Apr 28, 2026 · 5 min read
20
Engineering
MCP for CI/CD: build-system tools as agent inputs
Apr 28, 2026 · 2 min read
21
Engineering
Trend evals vs. threshold evals
Apr 28, 2026 · 2 min read
22
Engineering
Backend: API design + endpoint scaffolding
Apr 27, 2026 · 9 min read
23
Engineering
Data: SQL refactors and lineage maps
Apr 27, 2026 · 5 min read
24
Engineering
Fall-back chains: cheap → expensive → human
Apr 27, 2026 · 3 min read
25
Engineering
Integration tests for AI features: contract or behavioural?
Apr 27, 2026 · 3 min read
26
Engineering
CI strategy: smoke vs. full suite for LLM apps
Apr 24, 2026 · 2 min read
27
Engineering
Self-consistency: when N=3 beats a smarter prompt
Apr 24, 2026 · 3 min read
28
Engineering
SRE: postmortem first drafts that don't blame
Apr 24, 2026 · 5 min read
29
Engineering
Tech writer: doc audits that catch what humans miss
Apr 24, 2026 · 4 min read
30
Engineering
Cost guardrails: stop runaway agents before billing does
Apr 23, 2026 · 6 min read
31
Engineering
End-to-end tests for AI workflows: scope and survival
Apr 23, 2026 · 2 min read
32
Engineering
MCP for actioning tools (PR creator, ticket closer)
Apr 23, 2026 · 2 min read
33
Engineering
Frontend: accessibility passes that finally get done
Apr 22, 2026 · 4 min read
34
Engineering
MCP and the Claude Code workflow specifically
Apr 22, 2026 · 2 min read
35
Engineering
Pairwise judges: A/B agreement at scale
Apr 22, 2026 · 2 min read
36
Engineering
Pinning model versions through provider migrations
Apr 22, 2026 · 2 min read
37
Engineering
Drift catchers: detecting style shifts
Apr 21, 2026 · 2 min read
38
Engineering
Eval CI: the pass/fail gate that's actually useful
Apr 21, 2026 · 2 min read
39
Engineering
Prompt invariance: prompts that survive paraphrase
Apr 21, 2026 · 3 min read
40
Engineering
Tool failure modes: timeouts, retries, idempotency
Apr 21, 2026 · 4 min read
41
Engineering
Context engineering: what to load, what to defer
Apr 20, 2026 · 4 min read
42
Engineering
Output validation: pydantic, zod, and friends in production
Apr 20, 2026 · 2 min read
43
Engineering
Versioning model + prompt as a unit
Apr 20, 2026 · 3 min read
44
Engineering
Backend: database migrations without fear
Apr 17, 2026 · 5 min read
45
Engineering
ML: feature-store query rewrites
Apr 17, 2026 · 4 min read
46
Engineering
Building agents that explain themselves
Apr 16, 2026 · 3 min read
47
Engineering
Constrained decoding: the underrated lever
Apr 16, 2026 · 3 min read
48
Engineering
Mobile (Android): Compose rollout audits
Apr 16, 2026 · 4 min read
49
Engineering
Safety guardrails: refusal patterns that don't make agents useless
Apr 16, 2026 · 3 min read
50
Engineering
Confidence calibration: when 'I don't know' is the answer
Apr 15, 2026 · 3 min read
51
Engineering
Counter-example mining
Apr 15, 2026 · 3 min read
52
Engineering
The post-launch test plan: what runs forever
Apr 15, 2026 · 3 min read
53
Engineering
SRE: runbook generation that captures the response
Apr 15, 2026 · 5 min read
54
Engineering
LLM evals are restaurant health inspections
Apr 14, 2026 · 4 min read
55
Engineering
Retiring an agent
Apr 14, 2026 · 3 min read
56
Engineering
Long-horizon tasks: keeping an agent on rails for hours
Apr 13, 2026 · 4 min read
57
Engineering
MCP authorization: per-user permissions
Apr 13, 2026 · 2 min read
58
Engineering
MCP composition: when one server should call another
Apr 13, 2026 · 2 min read
59
Engineering
MCP server versioning: shipping breaking changes safely
Apr 13, 2026 · 2 min read
60
Engineering
MCP transport: stdio vs. HTTP vs. SSE
Apr 13, 2026 · 2 min read
61
Engineering
Deploying agents in CI: scoped, audited, repeatable
Apr 10, 2026 · 7 min read
62
Engineering
Caching deterministic prefixes
Apr 10, 2026 · 3 min read
63
Engineering
Eval result storage and versioning
Apr 10, 2026 · 2 min read
64
Engineering
Tests for retrieval pipelines
Apr 10, 2026 · 2 min read
65
Engineering
Beyond MCP: tool-use specs in major models
Apr 9, 2026 · 2 min read
66
Engineering
Cost tests: catching the prompt that doubled spend
Apr 9, 2026 · 2 min read
67
Engineering
The judge pattern for confidence
Apr 9, 2026 · 3 min read
68
Engineering
MCP in 10 minutes
Apr 9, 2026 · 6 min read
69
Engineering
QA: test-plan generation from acceptance criteria
Apr 9, 2026 · 5 min read
70
Engineering
Versioning agent behaviour: prompts as source code
Apr 8, 2026 · 3 min read
71
Engineering
UX tests for AI-generated content
Apr 8, 2026 · 2 min read
72
Engineering
Agent observability: traces that tell you what happened
Apr 7, 2026 · 6 min read
73
Engineering
Eval anti-patterns: when evals make products worse
Apr 7, 2026 · 3 min read
74
Engineering
Browsing agents: scraping vs. structured tools
Apr 6, 2026 · 3 min read
75
Engineering
Eval-driven prompt iteration
Apr 6, 2026 · 2 min read
76
Engineering
Tool-use evals: right tool, right order
Apr 6, 2026 · 2 min read
77
Engineering
Voice-first agents: the latency budget you live within
Apr 6, 2026 · 3 min read
78
Engineering
Agent memory: what to write down, what to forget
Apr 3, 2026 · 3 min read
79
Engineering
Hallucination checks: cite-or-it-didn't-happen
Apr 3, 2026 · 3 min read
80
Engineering
MCP server observability
Apr 3, 2026 · 2 min read
81
Engineering
Prompt evolution: how agents get worse without you noticing
Apr 3, 2026 · 3 min read
82
Engineering
Red-teaming your own prompt
Apr 3, 2026 · 3 min read
83
Engineering
EM: 1:1 prep + roadmap sanity check
Apr 2, 2026 · 4 min read
84
Engineering
Frontend: component scaffolding + state machines
Apr 2, 2026 · 4 min read
85
Engineering
Full-stack: a real feature in an afternoon
Apr 2, 2026 · 5 min read
86
Engineering
Tests for tool-using agents: trace assertions
Apr 2, 2026 · 3 min read
87
Engineering
MCP authentication: tokens, scopes, OAuth
Apr 1, 2026 · 2 min read
88
Engineering
MCP server rate limits: the polite-rejection pattern
Apr 1, 2026 · 2 min read
89
Engineering
Property-based testing for LLM features
Apr 1, 2026 · 2 min read
90
Engineering
Building your first eval set from scratch
Mar 31, 2026 · 8 min read
91
Engineering
Evals for agents: trajectory + outcome
Mar 31, 2026 · 7 min read
92
Engineering
MCP and secrets management
Mar 31, 2026 · 2 min read
93
Engineering
MCP server hosting: local, sidecar, remote
Mar 31, 2026 · 2 min read
94
Engineering
MCP tool naming: making tools discoverable
Mar 31, 2026 · 2 min read
95
Engineering
LLM-as-judge: when to trust it, when not
Mar 30, 2026 · 7 min read
96
Engineering
MCP for data tools (Postgres, BigQuery, S3)
Mar 30, 2026 · 2 min read
97
Engineering
Structured output: JSON mode, schemas, why one beats the other
Mar 30, 2026 · 7 min read
98
Engineering
Idempotency keys for LLM calls
Mar 27, 2026 · 3 min read
99
Engineering
OSS maintainer: triage + contributor-guide updates
Mar 27, 2026 · 4 min read
100
Engineering
Prompts are recipes, not spells
Mar 27, 2026 · 4 min read
101
Engineering
Why we need MCP at all
Mar 27, 2026 · 2 min read
102
Engineering
Human eval workflows: instructions that don't vary
Mar 26, 2026 · 2 min read
103
Engineering
Judging open-ended output without a rubric
Mar 26, 2026 · 2 min read
104
Engineering
MCP tool schemas: arg shapes that help
Mar 26, 2026 · 2 min read
105
Engineering
Regression cohorts: catching what evals miss
Mar 26, 2026 · 3 min read
106
Engineering
Code-writing agents: the test-first discipline
Mar 25, 2026 · 3 min read
107
Engineering
Drift tests vs. functional tests: separate lanes
Mar 25, 2026 · 3 min read
108
Engineering
Plan vs. act: the agent loop everyone gets wrong
Mar 25, 2026 · 6 min read
109
Engineering
Privacy tests: PII redaction assertions
Mar 24, 2026 · 2 min read
110
Engineering
Sub-agents: when 1+1 actually equals 2
Mar 24, 2026 · 4 min read
111
Engineering
Calibrating your judge: meta-evals
Mar 23, 2026 · 2 min read
112
Engineering
Security: code-pattern audits and CVE sweeps
Mar 23, 2026 · 4 min read
113
Engineering
Tool design: write tools the way you write APIs
Mar 23, 2026 · 8 min read
114
Engineering
Golden-set discipline
Mar 20, 2026 · 3 min read
115
Engineering
Why probabilistic systems still need deterministic contracts
Mar 20, 2026 · 7 min read
116
Engineering
Refusal grammars: predictable, not surprising
Mar 20, 2026 · 3 min read
117
Engineering
MCP for internal tools (Linear, Notion, Slack analogues)
Mar 19, 2026 · 2 min read
118
Engineering
ML: eval harness from a spec
Mar 19, 2026 · 4 min read
119
Engineering
Multimodal agents: when adding vision actually helps
Mar 19, 2026 · 4 min read
120
Engineering
Test-data management for AI: synthetic vs. real
Mar 19, 2026 · 2 min read
121
Engineering
Behavioural assertions: testing 'should-ness'
Mar 18, 2026 · 2 min read
122
Engineering
Eval taxonomy: golden, behavioural, drift, safety
Mar 18, 2026 · 3 min read
123
Engineering
Evals for retrieval: separating retrieval from synthesis
Mar 18, 2026 · 2 min read
124
Engineering
Your first MCP server (Python)
Mar 18, 2026 · 2 min read
125
Engineering
Agent A/B tests: comparing without confusing your users
Mar 17, 2026 · 3 min read
126
Engineering
The deterministic-envelope pattern
Mar 17, 2026 · 3 min read
127
Engineering
MCP and prompt injection: ambient instructions
Mar 17, 2026 · 2 min read
128
Engineering
Few-shot drift: why golden examples poison new versions
Mar 16, 2026 · 3 min read
129
Engineering
The judge pattern: agents that grade other agents
Mar 16, 2026 · 4 min read
130
Engineering
PII in test fixtures: the boring legal slope
Mar 16, 2026 · 3 min read
131
Engineering
Architect: vendor-comparison architecture doc
Mar 13, 2026 · 3 min read
132
Engineering
A senior engineer's day with Claude Code
Mar 13, 2026 · 9 min read
133
Engineering
Skills files: recipes the model can call
Mar 13, 2026 · 4 min read
134
Engineering
Evals that survive a model bump
Mar 12, 2026 · 3 min read
135
Engineering
Managed agents: when to reach for them
Mar 12, 2026 · 4 min read
136
Engineering
Mock LLMs in tests: when to fake, when to call
Mar 12, 2026 · 3 min read
137
Engineering
The red set: adversarial cases you're allowed to fail
Mar 12, 2026 · 2 min read
138
Engineering
The new test pyramid for AI products
Mar 11, 2026 · 7 min read
139
Engineering
Per-feature evals vs. per-model evals
Mar 11, 2026 · 2 min read
140
Engineering
Sampling production traffic for eval
Mar 11, 2026 · 2 min read
141
Engineering
Security tests: prompt-injection regression suite
Mar 10, 2026 · 2 min read
142
Engineering
Temperature, top-p, and the production tradeoff
Mar 10, 2026 · 3 min read
143
Engineering
QA: flaky test triage at scale
Mar 9, 2026 · 5 min read
144
Engineering
DevOps: CI pipeline diagnosis at 2am
Mar 6, 2026 · 4 min read
145
Engineering
DevOps: Terraform refactor with a watchful copilot
Mar 6, 2026 · 5 min read
146
Engineering
The future of MCP
Mar 6, 2026 · 2 min read
147
Engineering
MCP testing: harnesses, fixtures, regressions
Mar 6, 2026 · 2 min read
148
Engineering
Output post-processors that don't hide the truth
Mar 6, 2026 · 3 min read
149
Engineering
Authoring eval cases
Mar 5, 2026 · 2 min read
150
Engineering
Snapshot tests: where they help, where they trap
Mar 5, 2026 · 2 min read
151
Engineering
Tests for streaming responses
Mar 5, 2026 · 2 min read
152
Engineering
Agent rollback: kill switches on day one
Mar 4, 2026 · 3 min read
153
Engineering
Determinism for tool calls: keys, ordering, side-effects
Mar 4, 2026 · 2 min read
154
Engineering
Output diffing in CI
Mar 4, 2026 · 3 min read
155
Engineering
Reading an eval dashboard
Mar 4, 2026 · 2 min read
156
Engineering
Accessibility tests for AI surfaces
Mar 3, 2026 · 2 min read
157
Engineering
Eval-driven development
Mar 3, 2026 · 3 min read
158
Engineering
Eval ownership in an org: PM, eng, or QA?
Mar 3, 2026 · 2 min read
159
Engineering
Performance tests: token budgets and latency SLAs
Mar 3, 2026 · 2 min read
160
Engineering
Auto-generated eval cases from production logs
Mar 2, 2026 · 2 min read
161
Engineering
Eval cost management
Mar 2, 2026 · 2 min read
162
Engineering
Mobile (iOS): UIKit-to-SwiftUI translation
Mar 2, 2026 · 4 min read
163
Engineering
AI-native debugging: the rubber duck got smarter
Feb 26, 2026 · 4 min read
164
Engineering
Claude Code + Jira: standups without the standing
Feb 25, 2026 · 3 min read
165
Engineering
Multi-model routing: the dispatcher pattern for LLMs
Feb 20, 2026 · 4 min read
166
Engineering
Claude Code + Linear: where work lives, the agent lives
Feb 19, 2026 · 3 min read
167
Engineering
Semantic caching: why your top 1% of queries cost 60% of your bill
Feb 17, 2026 · 4 min read
168
Engineering
Claude Code + Notion: docs become structured data
Feb 16, 2026 · 4 min read
169
Engineering
AI cost attribution: a chargeback model for LLM spend
Feb 12, 2026 · 4 min read
170
Engineering
Claude Code + Slack: standups, escalations, and the back-channel
Feb 11, 2026 · 3 min read
171
Engineering
AI latency budgets: borrowing from network engineering
Feb 9, 2026 · 4 min read
172
Engineering
AI feature flags: a model rollout looks like a deployment
Feb 5, 2026 · 4 min read
173
Engineering
Claude Code + Datadog: 2 a.m. is for the agent now
Feb 4, 2026 · 4 min read
174
Engineering
AI canary deployments: 1% traffic, 100% paranoia
Feb 2, 2026 · 4 min read
175
Engineering
Embedding model selection: the 5-minute decision tree
Jan 29, 2026 · 4 min read
176
Engineering
Claude Code + Stripe: revenue-aware development
Jan 28, 2026 · 4 min read
177
Engineering
Vector DB architecture: pgvector, managed, or homemade
Jan 26, 2026 · 4 min read
178
Engineering
RAG vs. fine-tuning: a 90% decision tree
Jan 22, 2026 · 4 min read
179
Engineering
Claude Code + Figma: design handoff in one prompt
Jan 21, 2026 · 4 min read
180
Engineering
Token economics: what your unit cost actually is
Jan 19, 2026 · 4 min read
181
Engineering
AI incident response: the postmortem template you'll wish you had
Jan 15, 2026 · 4 min read
182
Engineering
An AI-aware pull request template
Jan 8, 2026 · 5 min read
183
Engineering
Self-healing pipelines: the night shift you don't have to pay
Jan 5, 2026 · 4 min read
184
Engineering
Agent supervision loops: the OODA loop, re-implemented
Jan 2, 2026 · 4 min read
185
Engineering
EU AI Act: what changes in your engineering process
Dec 30, 2025 · 4 min read
186
Engineering
HIPAA and AI: the BAA is the first conversation
Dec 26, 2025 · 4 min read
