the agile monkeys Lab

Jun 9, 2026

Our Take on AI SDLC ROI in Regulated Industries

Why regulated companies should focus less on baseline agent capability and more on the operating environment that makes enterprise work usable, governed, and measurable.

AI agents

AI SDLC

governance

Jun 2, 2026

The brand that survives AI

When AI produces the work but the brand still lives where only humans can read it, identity drifts one degree per cycle. Move the brand to where the AI reads.

AI strategy

brand

design systems

Mar 19, 2026

AI Agent Platform Architecture: What One Real Agent Taught Us

A control plane isn’t your starting point. It’s what remains after one real agent has forced you to solve retrieval, permissions, review, action visibility, and outcome tracking.

AI agents

architecture

AI strategy

Mar 19, 2026

RAG Retrieval Evaluation in the Ingestion Pipeline: What We Learned

Most retrieval regressions don’t begin when the user asks a question. They begin earlier, when new content is parsed, chunked, labeled, indexed, and quietly made available to the agent. Once we saw that clearly, we stopped treating retrieval QA as a chat problem and started moving it into the…

RAG

evaluation

data pipelines

Mar 19, 2026

LLM-Powered Codebase Analysis: Technical Due Diligence Before Software Estimation

This is the point where a supposedly small feature often reveals itself. Sometimes it’s a straightforward extension of existing flows. Sometimes it’s an integration into a partially built capability. And sometimes there’s a genuine architectural mismatch that justifies a rebuild.

LLM

architecture

Mar 4, 2026

Reducing LLM Cost and Latency in Production AI Agents: A Case Study

How we reduced token spend and latency in a production e-commerce agent flow, with before-and-after metrics.

AI agents

LLM

production

architecture

Mar 2, 2026

Legacy Modernization with AI: From 229 Vulnerabilities to Zero

A story about confronting technical debt, outsmarting false alarms, and building bridges across incompatible worlds — without rewriting everything from scratch.

legacy modernization

LLM

Feb 26, 2026

How to Build Products Out of AI Agents (Instead of Adding AI to Them)

Why product teams should stop adding AI as a helper and start composing features from agents, tools, prompts, and integrated evaluation.

AI agents

AI strategy

evaluation

Feb 11, 2026

Vector Search for Fashion E-commerce: Custom Embedding Lens Without Fine-Tuning

A step-by-step story of how we nudged our vector search toward beach-ready products without fine-tuning the whole model (or re-indexing eleven thousand SKUs).

RAG

e-commerce

LLM

Dec 11, 2025

AI Document Parsing in Production: Benchmarking GPT, Claude, AWS and Azure

64 controlled runs comparing GPT-5.1, Claude Sonnet 4.5, AWS Textract, Azure Document Intelligence, and Google DocAI across passports, certificates, and tax forms.

LLM

evaluation

production

Nov 24, 2025

AI-Powered E2E Testing: 70% Reduction in Test Execution Time with Stagehand

From ChatGPT experiments to production-ready E2E testing: how Stagehand, Browserbase, and Gemini 2.5 Flash enabled a 70% reduction in test execution time.

AI agents

testing

Nov 17, 2025

AI Evaluation in Production: Why Evals Are a Mindset, Not a Tool

Evals are not a UI, a dashboard, or a platform; they are about data. Understanding what evaluating really means and building a culture of continuous evaluation.

evaluation

AI strategy

production

Nov 13, 2025

AGI Is a Decade Away: Why Autonomous Agents and Practical AI Engineering Start Now

Highlights and reflections from The Agile Monkeys team discussion on Andrej Karpathy’s interview with Dwarkesh Patel — exploring cognitive cores, autonomous agents, and the practical path to AI integration.

AI agents

AI strategy

RAG

Jul 24, 2025

How to Migrate LLMs Safely in Production: GPT-4o to GPT-4.1 Case Study

How we migrated models in production while minimizing risks through a comprehensive multi-level evaluation system.

LLM

evaluation

production

Jul 24, 2025

Mega-Prompt vs Specialized Models vs Fine-Tuning: An E-commerce Search Benchmark

Putting the mega-prompt vs micro-models hypothesis to the test: can specialization really beat size?

LLM

e-commerce

evaluation

Jul 24, 2025

Improving Vector Search Recall by Fusing Multi-Modal Embeddings

Fusing embeddings from multiple modalities has demonstrated consistent improvements (14% over image embeddings alone) in fashion e-commerce retrieval.

RAG

e-commerce

Jul 18, 2025

Apache NiFi for AI Data Pipelines: Pros, Cons and Alternatives

Apache NiFi is a powerful visual-coding tool for building data pipelines. We explore its pros and cons and whether it’s the right tool for your AI data pipelines.

data pipelines

RAG

architecture

Apr 28, 2025

Feature Augmented Retrieval: How to Improve RAG Without Changing Your LLM

Feature Augmented Retrieval provides a structured, explicit, and adaptive framework that significantly enhances retrieval accuracy and relevance.

RAG

LLM