
In this episode we explore how companies should evaluate AI agents across multiple dimensions, including correctness, tool selection, multi-turn reasoning, and safety. The conversation covers building reliable evaluation frameworks, balancing automated and human-in-the-loop testing, and using observability to debug agent behavior in production.

Links from the Show
AgentCore Evaluation: https://github.com/awslabs/agentcore-samples/tree/main/01-tutorials/07-AgentCore-evaluations
Strands Evaluation: https://strandsagents.com/docs/user-guide/evals-sdk/quickstart/

AWS Hosts: Nolan Chen & Malini Chatterjee
Email Your Feedback: rethinkpodcast@amazon.com