Agentic AI Projects: From Deterministic to Probabilistic

Article Agentic 01.06.2026
By David Guede

Key Takeaways

  • Traditional digital is deterministic: a test that passes once generally passes the next time. Generative AI systems are probabilistic: they can work 99 times and break on the 100th. This doesn’t just change the tooling; it changes the entire mindset.
  • Three elements define this shift: upgrading data governance for LLMs, running a massive volume of probabilistic tests, and pairing deterministic logs with LLM-based evaluations to monitor the agent.
  • The right mindset for decision-makers: explicitly accept, at the executive sponsor level, that you cannot guarantee 100% accuracy. This alignment is what makes these projects possible.

For two decades, digital teams have operated in a deterministic world. A tested web page works. A validated workflow does the exact same thing twice. A bug is a bug: it reproduces, you patch it, and it disappears. This predictability shaped our methods, tools, roles, and trade-offs.

Generative AI systems shatter this logic. Putting the exact same question to the exact same model can yield two different answers. An agent that nails a response in 99 cases might derail on the 100th. And this isn’t a bug you patch: it is a structural property of the technology.

Shifting from deterministic to probabilistic is not just an IT issue. It demands a fundamental mindset shift across the entire organization: engineering, business units, governance, and executive sponsors.

Upgrading Data Governance

This is the most underestimated topic, and likely the most critical one long-term. Data governance, across most large enterprises, was built for digital. Content is structured to be displayed on websites, read by human users, and indexed by search engines.

Tomorrow, these same datasets will be read, interpreted, and restated by LLMs. And what is legible to a human is not necessarily legible to a model. The example from the Orange team working on the “Sharlie” project is eye-opening: on certain questions regarding mobile plans covering Morocco or Switzerland, the agent failed half the time. The root cause wasn’t the model’s lack of intelligence. It was that the input data was formatted for a website: a large block of text that made perfect sense to a human but completely confused the agent.

The takeaway is broad. The entire data chain (APIs, knowledge bases, business content, product specs) must be progressively overhauled so its outputs are inherently readable by an LLM. That raises very concrete questions:

  • How do you structure a table?
  • How do you explicitly define an eligibility condition?
  • How do you format a pricing sheet hierarchy?
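One possible answer to the questions above, as a minimal sketch: keep each fact in a typed record and render it as short, self-contained lines instead of a web-style block of text. The schema (`MobilePlan`, `RoamingRule`), the `render_for_llm` helper, and the sample values are all hypothetical and invented for illustration; they do not describe the actual Sharlie data model or any real offer.

```python
from dataclasses import dataclass, field


@dataclass
class RoamingRule:
    """One explicit eligibility condition (hypothetical schema)."""
    country: str
    included: bool
    condition: str = ""  # stated in plain words, never implied by page layout


@dataclass
class MobilePlan:
    name: str
    monthly_price_eur: float
    roaming: list[RoamingRule] = field(default_factory=list)


def render_for_llm(plan: MobilePlan) -> str:
    """Flatten a plan into one fact per line, each line readable on its own."""
    lines = [
        f"Plan: {plan.name}",
        f"Monthly price (EUR): {plan.monthly_price_eur:.2f}",
    ]
    for rule in plan.roaming:
        status = "included" if rule.included else "not included"
        line = f"Roaming in {rule.country}: {status}"
        if rule.condition:
            line += f" (condition: {rule.condition})"
        lines.append(line)
    return "\n".join(lines)


# Invented sample data, for illustration only.
plan = MobilePlan(
    name="Example Plan",
    monthly_price_eur=19.99,
    roaming=[
        RoamingRule("Morocco", included=False),
        RoamingRule("Switzerland", included=True, condition="up to 10 GB per month"),
    ],
)
print(render_for_llm(plan))
```

The design choice here is that every country gets an explicit "included" or "not included" line, so the model never has to infer eligibility from where a sentence sits in a paragraph.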

Industrializing Validation Through Volume

In a deterministic world, you test a few representative edge cases. In a probabilistic world, that is no longer enough. If an error only occurs once every hundred or thousand times, manual QA will never catch it.
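The arithmetic behind that claim can be sketched quickly. Assuming each test conversation fails independently with the same probability (a simplification; real failures tend to cluster around specific topics), the chance of seeing at least one failure in n runs is 1 - (1 - p)^n. The function name below is an illustrative choice.

```python
def detection_probability(failure_rate: float, n_tests: int) -> float:
    """Probability of observing at least one failure in n independent tests."""
    if not 0.0 <= failure_rate <= 1.0:
        raise ValueError("failure_rate must be between 0 and 1")
    if n_tests < 0:
        raise ValueError("n_tests must be non-negative")
    return 1.0 - (1.0 - failure_rate) ** n_tests


# An error that occurs once every thousand conversations.
for n in (10, 100, 1000, 5000):
    print(f"{n} tests: {detection_probability(0.001, n):.1%} chance of seeing it")
```

Under these assumptions, ten manual checks surface a one-in-a-thousand error only about 1% of the time, and even 1,000 runs catch it only about 63% of the time, which is why the volume has to be automated.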

You have to industrialize the generation and evaluation of conversations. In practice, this requires two complementary layers: virtual customers that automatically replay massive volumes of diverse conversations, and an “LLM-as-a-Judge” that grades response quality against explicit criteria.
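The two layers described above can be sketched as a small harness. Everything here is an assumption made for illustration: the agent and the judge are plain callables, and the stubs stand in for a real voice agent and a real LLM call with a grading rubric. The question templates and the stub's weakness on one country are invented so the example runs offline.

```python
import random
from typing import Callable, Dict, List, Tuple

Agent = Callable[[str], str]        # question -> answer
Judge = Callable[[str, str], bool]  # (question, answer) -> meets the criteria?


def make_virtual_customers(templates: List[str], countries: List[str],
                           n: int, seed: int = 0) -> List[str]:
    """Generate n varied questions by filling templates at random (seeded)."""
    rng = random.Random(seed)
    return [rng.choice(templates).format(country=rng.choice(countries))
            for _ in range(n)]


def run_campaign(agent: Agent, judge: Judge, questions: List[str]) -> Dict:
    """Replay every question through the agent and grade each answer."""
    if not questions:
        raise ValueError("need at least one question")
    failures: List[Tuple[str, str]] = []
    for question in questions:
        answer = agent(question)
        if not judge(question, answer):
            failures.append((question, answer))
    pass_rate = 1 - len(failures) / len(questions)
    return {"total": len(questions), "pass_rate": pass_rate, "failures": failures}


# Stand-ins: a real setup would call the agent under test and an LLM judge.
def stub_agent(question: str) -> str:
    if "Switzerland" in question:
        return "I am not sure."  # simulated weak spot
    return "Yes, this plan covers that destination."


def stub_judge(question: str, answer: str) -> bool:
    return "not sure" not in answer.lower()  # one explicit criterion


questions = make_virtual_customers(
    ["Does my plan work in {country}?", "Can I use data in {country}?"],
    ["Morocco", "Switzerland"],
    n=200,
)
report = run_campaign(stub_agent, stub_judge, questions)
print(f"pass rate: {report['pass_rate']:.1%} over {report['total']} conversations")
```

Keeping the agent and judge as injected callables means the same harness can run against stubs in CI and against the real components in a nightly campaign.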

For a voice agent, this infrastructure becomes the equivalent of an A/B testing suite for a website: a continuous monitoring tool that flags deviations before they ever reach real customers.

The challenge isn’t just technical; it’s organizational:

  • Who defines the grading criteria?
  • Who interprets the results?
  • Who makes the call to halt a deployment when the quality score drops?

These responsibilities barely existed, and certainly not with this level of intensity, in legacy digital orgs.
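Whoever owns the decision to halt a deployment will need a rule that separates a real quality drop from sampling noise. One possible rule, sketched below, blocks a release unless the lower bound of a Wilson score interval on the pass rate stays above the agreed threshold. The threshold value and function names are hypothetical; the actual criteria are a governance decision.

```python
import math


def wilson_lower_bound(passes: int, total: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for a pass rate (z=1.96 ~ 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = passes / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    margin = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - margin


def should_halt(passes: int, total: int, min_pass_rate: float) -> bool:
    """Halt unless the data supports a pass rate above the agreed minimum."""
    return wilson_lower_bound(passes, total) < min_pass_rate


# Hypothetical threshold of 95% agreed with the sponsors.
print(should_halt(990, 1000, 0.95))  # large sample, high pass rate
print(should_halt(19, 20, 0.95))     # same ballpark rate, far too few runs
```

The second call illustrates the point about volume: 19 passes out of 20 looks fine on paper, but the sample is too small to support the claim, so this conservative gate halts.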

Monitoring by Combining Deterministic and Probabilistic Data

Observability is also changing fundamentally. On a traditional app, we rely on logs: structured, deterministic, and easy to query. You know an API call failed, latency crossed a threshold, or a specific error fired at a specific time.

For a voice agent, these logs still exist and remain critical. But they tell you absolutely nothing about the quality of the interaction. A call can be technically flawless (no errors, low latency, clean transcription) but a complete disaster for the customer (off-topic answer, inappropriate tone, factual hallucination).

Modern monitoring therefore blends two data sources:

  • Deterministic logs to track platform health (latency, error rates, tool calls, endless agent loops).
  • LLM evaluations to track qualitative health (answer relevance, brand tone adherence, factual accuracy, implicit customer satisfaction).

When you successfully cross-reference these two sources, you get a much clearer picture of what is actually happening. Latency spiking at the exact same time quality drops points to a systemic issue, not just an isolated hallucination. This combination is what makes observability truly actionable.
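A minimal sketch of that cross-referencing, assuming each call already carries a latency figure from the logs and a quality score from an LLM evaluation. The record fields, thresholds, and labels are illustrative assumptions, not a description of any specific monitoring product.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class CallRecord:
    window: str        # time bucket, e.g. "10:00"
    latency_ms: float  # from deterministic logs
    quality: float     # LLM-judge score between 0 and 1


def classify_windows(records: List[CallRecord], max_latency_ms: float,
                     min_quality: float) -> Dict[str, str]:
    """Label each time window by combining platform and quality signals."""
    by_window: Dict[str, List[CallRecord]] = {}
    for record in records:
        by_window.setdefault(record.window, []).append(record)

    labels: Dict[str, str] = {}
    for window, calls in sorted(by_window.items()):
        slow = mean(c.latency_ms for c in calls) > max_latency_ms
        poor = mean(c.quality for c in calls) < min_quality
        if slow and poor:
            labels[window] = "systemic"       # both signals degrade together
        elif poor:
            labels[window] = "quality-only"   # e.g. isolated hallucinations
        elif slow:
            labels[window] = "platform-only"
        else:
            labels[window] = "healthy"
    return labels


# Invented sample data.
records = [
    CallRecord("10:00", 400, 0.92), CallRecord("10:00", 450, 0.88),
    CallRecord("10:05", 1800, 0.55), CallRecord("10:05", 2100, 0.60),
    CallRecord("10:10", 420, 0.50), CallRecord("10:10", 390, 0.62),
]
print(classify_windows(records, max_latency_ms=1000, min_quality=0.8))
```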

The Decisive Role of Executive Sponsors

Everything above can be engineered. But it will fail if the project’s executive sponsors haven’t internalized this paradigm shift.

The classic trap: demanding the project team guarantee 100% accuracy. In a probabilistic world, that is impossible, and promising it is dishonest. Conversely, explicitly agreeing at the governance level that certain critical topics demand 100% accuracy (and must therefore bypass the AI or include a hard safety net), while other areas tolerate a managed margin of error, is the absolute prerequisite for moving forward.

This executive alignment is often the dividing line between projects that hit production and those that die as POCs. It’s not about perfect alignment, which is a pipe dream with tech this new, but about real alignment on the risk profile required to go live.
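Once governance has agreed which topics demand 100% accuracy, the split can be enforced in code rather than left to the model. The sketch below is one hypothetical way to do it: the topic names, the handlers, and the assumption that a topic label is available before routing are all illustrative.

```python
from typing import Callable

# Hypothetical list, owned and signed off at the governance level.
CRITICAL_TOPICS = {"contract_termination", "billing_dispute"}

Handler = Callable[[str], str]


def route(topic: str, question: str, ai_agent: Handler,
          deterministic_handler: Handler) -> str:
    """Send critical topics to a scripted path; let the agent handle the rest."""
    if topic in CRITICAL_TOPICS:
        return deterministic_handler(question)
    return ai_agent(question)


# Stand-in handlers for illustration.
def scripted_path(question: str) -> str:
    return "Transferring you to an advisor."


def agent_path(question: str) -> str:
    return "Generated answer."


print(route("billing_dispute", "I was charged twice.", agent_path, scripted_path))
print(route("roaming", "Does my plan work abroad?", agent_path, scripted_path))
```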


Moving from deterministic to probabilistic is not a problem you can dump solely on the IT team. It impacts data, validation methods, observability, and governance. It requires business units to embrace a degree of uncertainty, and engineering teams to make that uncertainty manageable through new tooling.

Teams that grasp the scale of this shift are building the right muscle memory today. They know that a successful voice agent project isn’t one where everything works perfectly on day one; it’s a project where you can instantly detect what is breaking and fix it even faster. That capability, far more than the specific foundation model you choose, is what will drive competitive advantage in the years ahead.

By David Guede

Partner, Data, AI and Agentic
