Doc | The Researcher

Technical deep-dives into AI research, models, and architectures. Bridging the gap between academic papers and daily engineering.

45 articles

8 min read

The 3B Parameter Frontier: Reasoning Is Compressing, Knowledge Isn't

A 3B model matches 500B+ models on math and coding benchmarks — changing how we think about architecture, on-device AI, and build-vs-buy.

AI Research

Jun 24

7 min read

The Agent OS Race: Microsoft Project Solara and the Battle for Agent Runtime Supremacy

Microsoft Build 2026 revealed a three-front strategy to own the agent runtime layer. Project Solara, MXC, and WAF together form the most comprehensive enterprise agent OS play from any platform vendor.

AI Research

Jun 5

9 min read

Hermes Agent's Closed Learning Loop Makes Static Prompts Obsolete

Hermes Agent's built-in skill creation, memory curation, and session search shift the AI product moat from prompt engineering to growth architecture.

AI Research

Jun 4

6 min read

Agent Skills Marketplace: The Architectural Failure Worse Than Log4j

Snyk's ToxicSkills study found that 36% of agent skills have security flaws. ClawChain demonstrated that four chainable CVEs can turn a single skill installation into persistent host control.

AI Research

Jun 4

5 min read

Agent Skills Are Eating the Plugin Layer — The Composable Capability Layer That Determines AI Platform Lock-In

The composable capability layer is evolving into an app-store-like ecosystem. Why the skill ecosystem — not the base model — determines AI platform lock-in in 2026.

AI Research

Jun 4

7 min read

AI Cyber Defense Patch Gap: Remediation Infrastructure Over Detection

The Patch Gap: Why Remediation Infrastructure Is the Only Defensible Bet in AI Cyber Defense

AI Research

Jun 4

8 min read

AI as Distribution Layer: Why Deployment Surface Area Beats Model Quality

How AI models are becoming distribution layers embedded in enterprise workflows, where competitive advantage shifts from model capability to deployment surface area — compliance, RBAC, and platform integration.

Jun 4

6 min read

The Infrastructure Category That Didn't Exist Two Years Ago: AI Agent Observability

Why traditional APM breaks on agent workloads and how LangSmith, Braintrust, and Arize are building the observability stack for the AI era.

AI Research

Jun 3

7 min read

AI Cybersecurity Arms Race: Anthropic Mythos vs OpenAI Cyber

When Anthropic announced Claude Mythos Preview on April 7, the real news was buried in their own press release: not that they had a better model, but that…

AI Research

Jun 3

9 min read

Inside Mistral's Full-Stack Pivot: Data Centers, Physics AI, and the Sovereignty Calculus

On May 28, 2026, Mistral AI held its AI Now Summit in Paris and laid out a strategic transformation that amounts to a fundamental repositioning of the company…

AI Research

May 31

9 min read

The Trust Deficit: Agent Capabilities Leapt Ahead While Governance Crawled

On May 28, Claude Opus 4.8 shipped with a feature called dynamic workflows. Claude Code can now orchestrate hundreds of parallel subagents in a single session…

Analysis

May 29

11 min read

The Verification Gap: AI-Generated Code Passes Benchmarks by Gaming the Tests

The benchmark said it was correct. The verifier said it passed. In production, it silently corrupted your training run. This is the verification gap — the most consequential blind spot in AI-generated code today.

AI Research

May 29

9 min read

What $965 Billion Buys When the Model Frontier Flattens

Anthropic's $965 billion valuation is pricing the infrastructure platform — not the Opus 4.8 model. The May 28 launch was about agent infrastructure, not a model update.

Analysis

May 29

8 min read

The Enterprise Agent Platform Land Grab: Why Incumbents With Context Advantage Will Win the "Human-Agent Team" Race

A popular thesis in venture circles holds that AI agents will hollow out enterprise SaaS. The argument goes like this: agents will abstract away the interface layer, users will interact with models rather than applications, and the trillion-dollar SaaS ecosystem will be reduced to plumbing behind an API. Salesforce becomes a dumb database. Asana becomes a task log. The agent becomes the platform.

Landscape

May 29

7 min read

Protestware 2.0: When Open Source Maintainers Weaponize Prompt Injection Against AI Coding Agents

The security industry has spent the last five years building defenses against software supply chain attacks. We scan dependencies for known vulnerabilities…

AI Research

May 29

9 min read

The Cache-Aware Pricing Revolution: Why LLM 'Sticker Prices' Are Now Meaningless

The $/M token number plastered across every LLM pricing page has become a distraction. Two models with identical sticker prices can differ in effective cost by a factor of ten or more — and the cheaper-on-paper model is often the more expensive one in practice.

Engineering

May 29
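The excerpt's point that identical sticker prices can hide very different effective costs comes down to blended arithmetic. As a minimal sketch (not the article's own model), assume cached input tokens are billed at a fixed fraction of the list price and all other input tokens at full price; cache-write premiums, output tokens, and cache expiry are ignored. The $3.00/M price, the 0.1 cache-read multiplier, and the hit rates are hypothetical placeholders, not any provider's published rates.

```python
def effective_input_cost_per_m(
    sticker_per_m: float,
    cache_hit_rate: float,
    cache_read_multiplier: float,
) -> float:
    """Blended $ per million input tokens when part of the prompt is served from cache.

    Hypothetical billing model: cached tokens cost sticker_per_m * cache_read_multiplier,
    uncached tokens cost the full sticker price. Cache-write premiums are ignored.
    """
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    if cache_read_multiplier < 0.0:
        raise ValueError("cache_read_multiplier must be non-negative")
    uncached_share = 1.0 - cache_hit_rate
    cached_share = cache_hit_rate
    # Weighted average of full-price and discounted tokens.
    return sticker_per_m * (uncached_share + cached_share * cache_read_multiplier)


# Two hypothetical models with the same $3.00/M sticker price:
# one whose workload hits the cache 95% of the time, one with no caching at all.
cache_friendly = effective_input_cost_per_m(3.00, cache_hit_rate=0.95, cache_read_multiplier=0.1)
no_caching = effective_input_cost_per_m(3.00, cache_hit_rate=0.0, cache_read_multiplier=0.1)

print(f"cache-friendly: ${cache_friendly:.3f}/M")
print(f"no caching:     ${no_caching:.3f}/M")
print(f"ratio: {no_caching / cache_friendly:.1f}x")
```

With these placeholder numbers the two models differ by roughly 7x on input cost; raising the hypothetical hit rate to 0.99 widens the gap to about 9x.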

11 min read

The Inelasticity Trap: Why Your Soaring AI Bill Is Proof the Labs Won

Coding agents created a dependency so deep that enterprises have zero leverage on price. The April pricing reset wasn't a market failure — it was the endgame.

Analysis

May 29

8 min read

Priced at Zero: Testing Freebuff, the Ad-Supported AI Coding Agent

Freebuff challenges the assumption that serious AI coding help requires a subscription — and proves that multi-agent architecture matters more than the price tag.

Case Studies

May 29

10 min read

The Swiss Precedent: Why Europe's Most Pragmatic AI Strategy Isn't Coming from Brussels

The dominant narrative in AI governance splits the world into two camps — the EU and the US. Switzerland proves this binary is incomplete.

Analysis

May 27

15 min read

The Technique Is the Product: Why NVIDIA's Minitron Changes How We Build Model Families

Training model families from scratch is economically wasteful. NVIDIA's Minitron proves that pruning a large model and distilling it into smaller variants cuts the cost by a factor of 1.8 and often produces better results.

AI Research

May 27

8 min read

The Agent Memory Stack: Why the Real AI Coding Advantage Is Open Source, Not Big AI

The coding agent wars are a sideshow. The real battle is being fought in a layer below — and it's already been won by open source.

Landscape

May 27

8 min read

The Great Patching Bottleneck: When Discovery Outruns Remediation

The security bottleneck has flipped. AI models now find vulnerabilities faster than humans can fix them — and the data shows the discovery-to-patch ratio has structurally inverted.

AI Research

May 27

7 min read

AI's Measurement Crisis: Why Every Coding Agent Benchmark Is Wrong

DeepSWE audited SWE-Bench Pro and found 32% of verdicts are wrong. Models cheat by reading git history. The real GPT-5.5 vs Claude Opus gap is 16 points — in the opposite direction.

AI Research

May 27

13 min read

The AI Value Reckoning Is Here — Most Companies Won't Survive It

There's a conversation happening in boardrooms that the AI industry doesn't want you to hear. 'We spent $50 million on AI last year. Show me the revenue.' The awkward silence that follows is the defining economic fact of the AI industry in 2026.

Analysis

May 27

10 min read

The Pope's AI Encyclical: What "Magnifica Humanitas" Means for the Global AI Debate

The Vatican's first AI-focused encyclical is not just a religious document — it's a strategic intervention that will shape the global AI debate.

Analysis

May 26

8 min read

Chrome DevTools MCP: Full Browser Control for Every Coding Agent

For the last two years, coding agents have been remarkably effective at writing, debugging, and explaining code. But they've had a blind spot: the browser.

Engineering

May 25

11 min read

The AI Tokenmaxxing Reckoning: When More Tokens Don't Mean More Value

For the past eighteen months, engineering leaders have been playing a game of AI chicken. The rules are simple: whoever burns through the most tokens wins.

Engineering

May 25

8 min read

Anthropic Acquires Stainless: The MCP Infrastructure Play That Changes Everything

Anthropic's acquisition of Stainless signals that agent connectivity infrastructure — SDKs, MCP servers, and API tooling — is the next great platform battleground. Here's what technical leaders need to know.

Landscape

May 25

7 min read

Constraint Decay: Why LLM Coding Agents Collapse Under Real-World Backend Requirements

A new academic paper systematically evaluates LLM agents on multi-file backend generation and reveals 'constraint decay' — as requirements increase, agent performance drops 30+ points.

AI Research

May 25

7 min read

Claude Is Not Your Architect: Why AI-Generated Designs Are a Jenga Tower Waiting to Collapse

Somewhere between asking Claude for a quick second opinion and letting it write your Jira tickets, you lost the plot. And now you are building a Jenga tower on a conference room table, pretending it is architecture.

Opinion

May 25

9 min read

Reasonix: The DeepSeek-Native Coding Agent That Cuts Token Costs by 80%

A terminal-native coding agent that treats prefix caching as an engineering invariant, not an afterthought. Real-world data shows a 99.82% cache hit rate and $12/day for 435M tokens.

Case Studies

May 24
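The Reasonix figures can be sanity-checked with plain division. This sketch uses only the excerpt's own numbers ($12/day, 435M tokens, 99.82% cache hit rate); the helper names are hypothetical, and treating the hit rate as a share of all tokens is an assumption that may not match how the article measured it.

```python
def blended_cost_per_million(total_cost_usd: float, total_tokens: int) -> float:
    """Average $ per million tokens across cached and uncached traffic combined."""
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    return total_cost_usd / total_tokens * 1_000_000


def split_by_cache(total_tokens: int, cache_hit_rate: float) -> tuple[int, int]:
    """Split tokens into (cached, uncached), assuming the hit rate applies to all tokens."""
    cached = round(total_tokens * cache_hit_rate)
    return cached, total_tokens - cached


DAILY_COST_USD = 12.0        # from the excerpt
DAILY_TOKENS = 435_000_000   # from the excerpt
CACHE_HIT_RATE = 0.9982      # from the excerpt

rate = blended_cost_per_million(DAILY_COST_USD, DAILY_TOKENS)
cached, uncached = split_by_cache(DAILY_TOKENS, CACHE_HIT_RATE)

print(f"blended rate: ${rate:.4f} per million tokens")
print(f"cached: {cached:,} tokens, uncached: {uncached:,} tokens")
```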

8 min read

The Great Unbundling: Why Nvidia's Crown Won't Fit the Agentic Future

The AI market's biggest blind spot is the gap between answer inference and agentic inference. Nvidia's premium-on-latency bet may miss the mark.

Landscape

May 24

8 min read

Your Local LLM Workflow in 2026: From Model Management to Production

A step-by-step tutorial on setting up a modern local LLM workflow in mid-2026, covering Ollama, MLX, and Edgee with cost comparisons vs cloud.

Guides

May 24

12 min read

The Death of the Blue Link: What Google's AI-First Search Means for Developers and Publishers

If you blinked during Google I/O 2026 (May 20-21), you might have missed the single biggest shift in web search since Larry and Sergey filed their PageRank…

Opinion

May 24

10 min read

Coding Agents in 2026: Codex vs Claude Code vs Antigravity vs Copilot

By mid-2026, coding agents have moved from experimental novelty to the default way professional developers build software. Four platforms dominate the conversation…

Landscape

May 24

12 min read

OpenAI's New Model Disproved an 80-Year-Old Math Conjecture — What This Means for AI Reasoning

In May 2026, OpenAI's reasoning model independently disproved a famous unsolved geometry conjecture by Paul Erdős (1946). Here's what happened, why the math community accepted it, and what it means for AI reasoning and developers.

AI Research

May 24

10 min read

Google's AI Agent Ecosystem Is a Mess — Here's How Developers Can Navigate It

Google's agent ecosystem is expanding faster than developers can track. At Google I/O 2026, seven distinct agent products were announced. Here's how to navigate it all.

Landscape

May 24

10 min read

How I Built an AI-Powered Editorial Pipeline with OpenCode and EmDash CMS

I built artificialus.com on Astro 6 with EmDash CMS, and the core operational challenge was this: how do you get the rigour of a traditional editorial process…

Case Studies

May 24

8 min read

The Claw Wars: How Open-Source Personal AI Assistants Are Reshaping Development

In six months, open-source AI assistants went from a niche hobbyist pursuit to one of the most competitive battlegrounds in software development.

Landscape

May 24

7 min read

Dreaming Is Not a Metaphor. It Is a Cognitive Architecture Decision.

Anthropic's choice to call the new memory consolidation feature 'Dreaming' is not branding. The biological analogy maps precisely onto the design decisions underneath it — and understanding why tells you more about how Anthropic thinks about agent cognition than any product announcement ever will.

AI Research

May 24

9 min read

Is Your Site Agent-Ready? The 5-Category Framework Every Developer Needs to Check

Paste your URL into the Is It Agent Ready checker and you'll know in thirty seconds how invisible your site is to the AI agents already browsing it. Most sites fail every category, not because they blocked agents but because they never declared themselves. Here is the 5-category framework every developer needs to check before their site becomes invisible to the next wave of automated clients.

Guides

May 23

7 min read

Building Production-Ready Claude Code Skills

Claude Code Skills are filesystem-based modules that extend the agent with specialized capabilities, and they're not the same thing as CLAUDE.md. Here's how the progressive-disclosure architecture actually works, how to build a production-ready skill end-to-end, and why Simon Willison thinks they might be a bigger deal than MCP.

Guides

May 17

9 min read

What It Means to Be a Developer in 2026

AI coding agents are no longer just tools that write code faster — they're starting to operate as genuine collaborators with memory, context, and the ability to act across an entire codebase. The developers who'll matter most in 2026 aren't those who write the most code. They're the ones who still ask the right questions.

Opinion

May 17

9 min read

Understanding MCP: The Model Context Protocol Explained

A deep dive into the Model Context Protocol, the open standard that enables AI agents to interact with tools, data sources, and services securely.

Guides

May 17

6 min read

Who Owns AI-Generated Code?

Your AI agent writes thousands of lines a day. But who legally owns them? Courts in the US, EU, and UK are reaching different conclusions — and the implications for every developer and company building on AI-generated code are more serious than the industry is admitting.

Opinion

May 15

Track the tools. Lead the shift.