Episodes

  • Issue #10: GPT-5.5 Reclaims the Agentic Crown
    Apr 29 2026
    Issue #10: GPT-5.5 reclaims the agentic crown with 82.7% on Terminal-Bench 2.0 and fewer tokens per task. Stanford's SWE-chat study reveals 44% of agent-produced code gets thrown away. ToolSimulator from Strands Evals SDK lets you test agents without live APIs. NVIDIA exposes AGENTS.md injection as a supply chain attack vector hiding in every coding agent. Plus: Bedrock AgentCore, Deep Research Max, context-mode, and the Agent Index. Subscribe to the newsletter: https://theagenticengineer.waltsoft.net YouTube: https://www.youtube.com/@theagenticengineerpod Twitter: https://x.com/natearcher_ai
    15 mins
  • Issue #9: Claude Opus 4.7 Ships Cyber Safeguards to Production
    Apr 22 2026
    Issue #9: Claude Opus 4.7 ships differential capability reduction as the first production cyber safeguard baked into model weights. Vercel breached through an AI tool's OAuth scope. Spring AI SDK for Bedrock AgentCore goes GA for Java. GTA-2 paper proves your agent harness matters more than your model. And CMU documents 6 million fake GitHub stars across the AI ecosystem.
    15 mins
  • Issue #8: Anthropic ships Managed Agents, UC Berkeley breaks every major AI benchmark, AWS Agent Registry launches in preview
    Apr 15 2026
    Issue #8: Anthropic ships Managed Agents, UC Berkeley breaks every major AI benchmark, AWS Agent Registry launches in preview. Plus Cursor 3, Copilot Rubber Duck, Cloudflare Agent Cloud, and the hot take on exploitable benchmarks.
    15 mins
  • Issue #7: Anthropic published the blueprint for multi-hour coding agents
    Apr 9 2026
    Anthropic published the blueprint for multi-hour coding agents. GitHub shipped /fleet for parallel multi-agent coding. Amazon Nova Act MCP gives your agent a browser with one install. Plus: Gemma 4 goes agentic on-device, Oh-My-Codex hits 17K stars, and LiteLLM fixes 3 CVEs post-breach.
    17 mins
  • Issue #6: JetBrains Central, ARC-AGI-3, Claude Mythos Leak, Copilot Ads in PRs
    Apr 1 2026
    This week: JetBrains Central launches an open control plane for coding agents. ARC-AGI-3 drops and frontier AI scores below 1%. Claude Mythos gets leaked via CMS misconfiguration. MolmoWeb beats GPT-4o at 8B parameters. AI Scientist v2 passes peer review. 177K MCP tools show agents shifted from reading to writing. AWS Labs ships Agent Plugins for Claude Code and Cursor. Microsoft merges Semantic Kernel and AutoGen. And Copilot literally put an ad in someone's pull request.
    16 mins
  • Issue #5: OpenCode 120K Stars, Claude Code Channels, Agent Memory Wars
    Mar 24 2026
    This week: OpenCode crosses 120K GitHub stars and 5M monthly devs. Claude Code ships Channels for event-driven coding agents. Hindsight hits #1 on LongMemEval for agent memory. Plus: Flash-MoE runs 397B params on a MacBook, NVIDIA open-sources NemoClaw, and our hot take on why memory is the real moat.
    15 mins