Platform Engineering teams running Kubernetes-based IDPs have largely solved observability for modern workloads. Prometheus scrapes metrics, Loki aggregates logs, Tempo stores distributed traces, OpenTelemetry instruments everything, and Grafana ties it together. For microservices, APIs, and databases, that stack works: it handles scale, it handles failure, and it handles cardinality.

Then AI agents enter the platform as first-class workloads, and the assumptions underpinning that observability stack start to crack. Not because the stack is wrong, but because agents behave differently. A microservice executes deterministic logic and returns a result; an AI agent reasons across multiple steps, selects and calls tools dynamically, retrieves context from vector stores, chains LLM calls, and produces outputs that are non-deterministic by design. A span that shows 200 OK and 1.4 seconds tells you nothing about whether the agent reached the right conclusion, used the right context, or stayed within safe operational boundaries.

This talk walks through the concrete evolution of a production observability stack as AI agents are introduced to the platform. Starting from a standard observability setup, the session covers the new challenges AI agents introduce and how to address them.
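As a sketch of the gap the abstract describes: a conventional HTTP span records status and latency, while an agent-aware span would also carry reasoning, tool, and grounding signals. The snippet below is a minimal illustration using plain dictionaries in place of real spans; the attribute names are loosely modeled on the OpenTelemetry GenAI semantic conventions, and all values and helper names here are hypothetical.

```python
# Sketch: what a "200 OK, 1.4 s" span hides versus what an agent-aware
# span could expose. Plain dicts stand in for real OpenTelemetry spans;
# attribute names loosely follow the GenAI semantic conventions
# (illustrative assumptions, not a specific API).

http_span = {
    "name": "POST /agent/run",
    "http.status_code": 200,
    "duration_s": 1.4,
}

agent_span = {
    **http_span,
    # Which model was used and how much it processed
    "gen_ai.request.model": "gpt-4o",          # hypothetical value
    "gen_ai.usage.input_tokens": 1875,
    "gen_ai.usage.output_tokens": 412,
    # What the agent actually did across its steps
    "agent.steps": 4,
    "agent.tools_called": ["vector_search", "sql_query"],
    "agent.retrieved_chunks": 6,
    # Signals a status code alone can never carry
    "agent.guardrail.violations": 0,
    "agent.answer.grounded": True,             # e.g. from an eval check
}

def hidden_signals(plain: dict, enriched: dict) -> list[str]:
    """Attributes visible only in the agent-aware span."""
    return sorted(set(enriched) - set(plain))

print(hidden_signals(http_span, agent_span))
```

Everything `hidden_signals` returns is invisible to a stack that only records status codes and latency, which is exactly the gap the session addresses.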
LANGUAGE
English
LEVEL
Intermediate
FORMAT
Session