Tagged · Engineering
Field notes, Engineering.
134 articles in this tag — part of the Jaypore Labs journal.
- 01 Engineering
Determinism harnesses for non-deterministic systems
Apr 30, 2026 · 2 min read
- 02 Engineering
Multi-agent orchestration: from kitchen brigade to opera
Apr 30, 2026 · 3 min read
- 03 Engineering
Retry strategies that don't compound errors
Apr 30, 2026 · 3 min read
- 04 Engineering
Your first MCP server (Node)
Apr 29, 2026 · 8 min read
- 05 Engineering
MCP error handling: tell the model what went wrong
Apr 29, 2026 · 2 min read
- 06 Engineering
What makes an eval good
Apr 29, 2026 · 7 min read
- 07 Engineering
MCP for CI/CD: build-system tools as agent inputs
Apr 28, 2026 · 2 min read
- 08 Engineering
Trend evals vs. threshold evals
Apr 28, 2026 · 2 min read
- 09 Engineering
Fall-back chains: cheap → expensive → human
Apr 27, 2026 · 3 min read
- 10 Engineering
Integration tests for AI features: contract or behavioural?
Apr 27, 2026 · 3 min read
- 11 Engineering
CI strategy: smoke vs. full suite for LLM apps
Apr 24, 2026 · 2 min read
- 12 Engineering
Self-consistency: when N=3 beats a smarter prompt
Apr 24, 2026 · 3 min read
- 13 Engineering
Cost guardrails: stop runaway agents before billing does
Apr 23, 2026 · 6 min read
- 14 Engineering
End-to-end tests for AI workflows: scope and survival
Apr 23, 2026 · 2 min read
- 15 Engineering
MCP for actioning tools (PR creator, ticket closer)
Apr 23, 2026 · 2 min read
- 16 Engineering
MCP and the Claude Code workflow specifically
Apr 22, 2026 · 2 min read
- 17 Engineering
Pairwise judges: A/B agreement at scale
Apr 22, 2026 · 2 min read
- 18 Engineering
Pinning model versions through provider migrations
Apr 22, 2026 · 2 min read
- 19 Engineering
Drift catchers: detecting style shifts
Apr 21, 2026 · 2 min read
- 20 Engineering
Eval CI: the pass/fail gate that's actually useful
Apr 21, 2026 · 2 min read
- 21 Engineering
Prompt invariance: prompts that survive paraphrase
Apr 21, 2026 · 3 min read
- 22 Engineering
Tool failure modes: timeouts, retries, idempotency
Apr 21, 2026 · 4 min read
- 23 Engineering
Context engineering: what to load, what to defer
Apr 20, 2026 · 4 min read
- 24 Engineering
Output validation: pydantic, zod, and friends in production
Apr 20, 2026 · 2 min read
- 25 Engineering
Versioning model + prompt as a unit
Apr 20, 2026 · 3 min read
- 26 Engineering
Building agents that explain themselves
Apr 16, 2026 · 3 min read
- 27 Engineering
Constrained decoding: the underrated lever
Apr 16, 2026 · 3 min read
- 28 Engineering
Safety guardrails: refusal patterns that don't make agents useless
Apr 16, 2026 · 3 min read
- 29 Engineering
Confidence calibration: when 'I don't know' is the answer
Apr 15, 2026 · 3 min read
- 30 Engineering
Counter-example mining
Apr 15, 2026 · 3 min read
- 31 Engineering
The post-launch test plan: what runs forever
Apr 15, 2026 · 3 min read
- 32 Engineering
Retiring an agent
Apr 14, 2026 · 3 min read
- 33 Engineering
Long-horizon tasks: keeping an agent on rails for hours
Apr 13, 2026 · 4 min read
- 34 Engineering
MCP authorization: per-user permissions
Apr 13, 2026 · 2 min read
- 35 Engineering
MCP composition: when one server should call another
Apr 13, 2026 · 2 min read
- 36 Engineering
MCP server versioning: shipping breaking changes safely
Apr 13, 2026 · 2 min read
- 37 Engineering
MCP transport: stdio vs. HTTP vs. SSE
Apr 13, 2026 · 2 min read
- 38 Engineering
Deploying agents in CI: scoped, audited, repeatable
Apr 10, 2026 · 7 min read
- 39 Engineering
Caching deterministic prefixes
Apr 10, 2026 · 3 min read
- 40 Engineering
Eval result storage and versioning
Apr 10, 2026 · 2 min read
- 41 Engineering
Tests for retrieval pipelines
Apr 10, 2026 · 2 min read
- 42 Engineering
Beyond MCP: tool-use specs in major models
Apr 9, 2026 · 2 min read
- 43 Engineering
Cost tests: catching the prompt that doubled spend
Apr 9, 2026 · 2 min read
- 44 Engineering
The judge pattern for confidence
Apr 9, 2026 · 3 min read
- 45 Engineering
MCP in 10 minutes
Apr 9, 2026 · 6 min read
- 46 Engineering
Versioning agent behaviour: prompts as source code
Apr 8, 2026 · 3 min read
- 47 Engineering
UX tests for AI-generated content
Apr 8, 2026 · 2 min read
- 48 Engineering
Agent observability: traces that tell you what happened
Apr 7, 2026 · 6 min read
- 49 Engineering
Eval anti-patterns: when evals make products worse
Apr 7, 2026 · 3 min read
- 50 Engineering
Browsing agents: scraping vs. structured tools
Apr 6, 2026 · 3 min read
- 51 Engineering
Eval-driven prompt iteration
Apr 6, 2026 · 2 min read
- 52 Engineering
Tool-use evals: right tool, right order
Apr 6, 2026 · 2 min read
- 53 Engineering
Voice-first agents: the latency budget you live within
Apr 6, 2026 · 3 min read
- 54 Engineering
Agent memory: what to write down, what to forget
Apr 3, 2026 · 3 min read
- 55 Engineering
Hallucination checks: cite-or-it-didn't-happen
Apr 3, 2026 · 3 min read
- 56 Engineering
MCP server observability
Apr 3, 2026 · 2 min read
- 57 Engineering
Prompt evolution: how agents get worse without you noticing
Apr 3, 2026 · 3 min read
- 58 Engineering
Red-teaming your own prompt
Apr 3, 2026 · 3 min read
- 59 Engineering
Tests for tool-using agents: trace assertions
Apr 2, 2026 · 3 min read
- 60 Engineering
MCP authentication: tokens, scopes, OAuth
Apr 1, 2026 · 2 min read
- 61 Engineering
MCP server rate limits: the polite-rejection pattern
Apr 1, 2026 · 2 min read
- 62 Engineering
Property-based testing for LLM features
Apr 1, 2026 · 2 min read
- 63 Engineering
Building your first eval set from scratch
Mar 31, 2026 · 8 min read
- 64 Engineering
Evals for agents: trajectory + outcome
Mar 31, 2026 · 7 min read
- 65 Engineering
MCP and secrets management
Mar 31, 2026 · 2 min read
- 66 Engineering
MCP server hosting: local, sidecar, remote
Mar 31, 2026 · 2 min read
- 67 Engineering
MCP tool naming: making tools discoverable
Mar 31, 2026 · 2 min read
- 68 Engineering
LLM-as-judge: when to trust it, when not
Mar 30, 2026 · 7 min read
- 69 Engineering
MCP for data tools (Postgres, BigQuery, S3)
Mar 30, 2026 · 2 min read
- 70 Engineering
Structured output: JSON mode, schemas, why one beats the other
Mar 30, 2026 · 7 min read
- 71 Engineering
Idempotency keys for LLM calls
Mar 27, 2026 · 3 min read
- 72 Engineering
Why we need MCP at all
Mar 27, 2026 · 2 min read
- 73 Engineering
Human eval workflows: instructions that don't vary
Mar 26, 2026 · 2 min read
- 74 Engineering
Judging open-ended output without a rubric
Mar 26, 2026 · 2 min read
- 75 AI
MCP servers are USB-C for AI
Mar 26, 2026 · 5 min read
- 76 Engineering
MCP tool schemas: arg shapes that help
Mar 26, 2026 · 2 min read
- 77 Engineering
Regression cohorts: catching what evals miss
Mar 26, 2026 · 3 min read
- 78 Engineering
Code-writing agents: the test-first discipline
Mar 25, 2026 · 3 min read
- 79 Engineering
Drift tests vs. functional tests: separate lanes
Mar 25, 2026 · 3 min read
- 80 Engineering
Plan vs. act: the agent loop everyone gets wrong
Mar 25, 2026 · 6 min read
- 81 Engineering
Privacy tests: PII redaction assertions
Mar 24, 2026 · 2 min read
- 82 Engineering
Sub-agents: when 1+1 actually equals 2
Mar 24, 2026 · 4 min read
- 83 Engineering
Calibrating your judge: meta-evals
Mar 23, 2026 · 2 min read
- 84 Engineering
Tool design: write tools the way you write APIs
Mar 23, 2026 · 8 min read
- 85 Engineering
Golden-set discipline
Mar 20, 2026 · 3 min read
- 86 Engineering
Why probabilistic systems still need deterministic contracts
Mar 20, 2026 · 7 min read
- 87 Engineering
Refusal grammars: predictable, not surprising
Mar 20, 2026 · 3 min read
- 88 Engineering
MCP for internal tools (Linear, Notion, Slack analogues)
Mar 19, 2026 · 2 min read
- 89 Engineering
Multimodal agents: when adding vision actually helps
Mar 19, 2026 · 4 min read
- 90 Engineering
Test-data management for AI: synthetic vs. real
Mar 19, 2026 · 2 min read
- 91 Engineering
Behavioural assertions: testing 'should-ness'
Mar 18, 2026 · 2 min read
- 92 Engineering
Eval taxonomy: golden, behavioural, drift, safety
Mar 18, 2026 · 3 min read
- 93 Engineering
Evals for retrieval: separating retrieval from synthesis
Mar 18, 2026 · 2 min read
- 94 Engineering
Your first MCP server (Python)
Mar 18, 2026 · 2 min read
- 95 Engineering
Agent A/B tests: comparing without confusing your users
Mar 17, 2026 · 3 min read
- 96 Engineering
The deterministic-envelope pattern
Mar 17, 2026 · 3 min read
- 97 Engineering
MCP and prompt injection: ambient instructions
Mar 17, 2026 · 2 min read
- 98 Engineering
Few-shot drift: why golden examples poison new versions
Mar 16, 2026 · 3 min read
- 99 Engineering
The judge pattern: agents that grade other agents
Mar 16, 2026 · 4 min read
- 100 Engineering
PII in test fixtures: the boring legal slope
Mar 16, 2026 · 3 min read
- 101 Engineering
Skills files: recipes the model can call
Mar 13, 2026 · 4 min read
- 102 Engineering
Evals that survive a model bump
Mar 12, 2026 · 3 min read
- 103 Engineering
Managed agents: when to reach for them
Mar 12, 2026 · 4 min read
- 104 Engineering
Mock LLMs in tests: when to fake, when to call
Mar 12, 2026 · 3 min read
- 105 Engineering
The red set: adversarial cases you're allowed to fail
Mar 12, 2026 · 2 min read
- 106 Engineering
The new test pyramid for AI products
Mar 11, 2026 · 7 min read
- 107 Engineering
Per-feature evals vs. per-model evals
Mar 11, 2026 · 2 min read
- 108 Engineering
Sampling production traffic for eval
Mar 11, 2026 · 2 min read
- 109 Engineering
Security tests: prompt-injection regression suite
Mar 10, 2026 · 2 min read
- 110 Engineering
Temperature, top-p, and the production tradeoff
Mar 10, 2026 · 3 min read
- 111 Engineering
The future of MCP
Mar 6, 2026 · 2 min read
- 112 Engineering
MCP testing: harnesses, fixtures, regressions
Mar 6, 2026 · 2 min read
- 113 Engineering
Output post-processors that don't hide the truth
Mar 6, 2026 · 3 min read
- 114 Engineering
Authoring eval cases
Mar 5, 2026 · 2 min read
- 115 Engineering
Snapshot tests: where they help, where they trap
Mar 5, 2026 · 2 min read
- 116 Engineering
Tests for streaming responses
Mar 5, 2026 · 2 min read
- 117 Engineering
Agent rollback: kill switches on day one
Mar 4, 2026 · 3 min read
- 118 Engineering
Determinism for tool calls: keys, ordering, side-effects
Mar 4, 2026 · 2 min read
- 119 Engineering
Output diffing in CI
Mar 4, 2026 · 3 min read
- 120 Engineering
Reading an eval dashboard
Mar 4, 2026 · 2 min read
- 121 Engineering
Accessibility tests for AI surfaces
Mar 3, 2026 · 2 min read
- 122 Engineering
Eval-driven development
Mar 3, 2026 · 3 min read
- 123 Engineering
Eval ownership in an org: PM, eng, or QA?
Mar 3, 2026 · 2 min read
- 124 Engineering
Performance tests: token budgets and latency SLAs
Mar 3, 2026 · 2 min read
- 125 AI
The agent maturity curve
Mar 2, 2026 · 9 min read
- 126 Engineering
Auto-generated eval cases from production logs
Mar 2, 2026 · 2 min read
- 127 Engineering
Eval cost management
Mar 2, 2026 · 2 min read
- 128 Engineering
AI-native debugging: the rubber duck got smarter
Feb 26, 2026 · 4 min read
- 129 Engineering
Semantic caching: why your top 1% of queries cost 60% of your bill
Feb 17, 2026 · 4 min read
- 130 Engineering
AI canary deployments: 1% traffic, 100% paranoia
Feb 2, 2026 · 4 min read
- 131 Engineering
AI incident response: the postmortem template you'll wish you had
Jan 15, 2026 · 4 min read
- 132 Engineering
HIPAA and AI: the BAA is the first conversation
Dec 26, 2025 · 4 min read
- 133 AI
AI and the symphony conductor: orchestration is older than software
Dec 19, 2025 · 4 min read
- 134 AI
AI and air traffic control: a 70-year-old playbook for safe autonomy
Dec 18, 2025 · 4 min read