Why the limiting factor in production-grade AI isn’t the model, it’s the infrastructure

In Conversation with our VP of Platform Engineering on why platform decisions determine whether production AI succeeds or fails

Conversations around production-grade AI continue to gravitate toward models, applications, and use cases. The infrastructure that sits beneath all of it – the platform decisions that ultimately determine whether any of it works at scale – rarely gets the same attention.

In this instalment of our In Conversation series, we speak with Cao Hoang, Stelia’s VP of Platform Engineering, about why that gap exists, why it matters more than most organisations realise, and what the teams getting it right are doing differently from the start.


Our first question for Cao was this: there’s a tendency across the industry to treat infrastructure as an operational concern rather than a strategic one – do you think that’s changing, and what needs to shift for it to change faster?

It is changing, though often out of painful necessity rather than proactive strategy.

In the traditional SaaS era, infrastructure became highly commoditised – it was the “plumbing” you just needed to keep running. But with AI, infrastructure sets the product’s ceiling. Your compute constraints, network latency, and unit economics directly dictate what you can offer commercially.

Right now, an organisation will spend months debating its multi-model routing strategy or designing compound AI systems, but will treat the platform it all runs on as a secondary checklist item. For this to shift faster, technical leaders need to start translating infrastructure metrics directly into business outcomes – a practice increasingly known as AI FinOps. If a brilliant architecture drives your cost-per-token too high or introduces unacceptable latency, it’s commercially unviable. Once we explicitly tie infrastructure capabilities to gross margins and time to market, infrastructure becomes a strategic priority for the boardroom.
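To make the link between infrastructure metrics and gross margin concrete, here is a minimal Python sketch of the arithmetic. It is an illustration under stated assumptions, not Stelia's method: the `ServingProfile` fields, the function names, and every figure in the usage example are hypothetical, and a real cost model would also account for networking, storage, and staffing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServingProfile:
    """Hypothetical inputs describing one model-serving deployment."""
    gpu_hourly_cost_usd: float  # assumed all-in cost of one accelerator per hour
    gpu_count: int              # accelerators dedicated to this deployment
    tokens_per_second: float    # sustained throughput across the whole deployment
    utilisation: float          # fraction of each hour spent serving paid traffic (0-1)


def cost_per_million_tokens(profile: ServingProfile) -> float:
    """Infrastructure cost, in USD, of serving one million tokens."""
    hourly_cost = profile.gpu_hourly_cost_usd * profile.gpu_count
    tokens_per_hour = profile.tokens_per_second * 3600 * profile.utilisation
    if tokens_per_hour <= 0:
        raise ValueError("throughput and utilisation must be positive")
    return hourly_cost / tokens_per_hour * 1_000_000


def gross_margin(price_per_million_usd: float, cost_per_million_usd: float) -> float:
    """Gross margin as a fraction of price; negative means each token loses money."""
    if price_per_million_usd <= 0:
        raise ValueError("price must be positive")
    return (price_per_million_usd - cost_per_million_usd) / price_per_million_usd


if __name__ == "__main__":
    # Illustrative numbers only, not benchmarks or real prices.
    profile = ServingProfile(
        gpu_hourly_cost_usd=2.0,
        gpu_count=8,
        tokens_per_second=10_000.0,
        utilisation=0.5,
    )
    cost = cost_per_million_tokens(profile)
    print(f"cost per million tokens: ${cost:.2f}")
    print(f"gross margin at $2.00 per million: {gross_margin(2.0, cost):.0%}")
```

Even in a toy model like this, halving utilisation doubles the cost per token, which is the kind of relationship that turns a platform metric into a commercial one.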


How an organisation treats infrastructure is also one of the clearest indicators of how far along it actually is in its AI journey. What matters is not the sophistication of the models it’s evaluating, but whether the conversation about infrastructure is happening at a strategic level at all. The teams that have made that shift tend to move faster, not slower, because the platform beneath them is built to support speed rather than constrain it.

It is a framing shift with significant implications for how organisations approach their earliest platform decisions, and as Cao explains, those early decisions carry more weight than most teams anticipate.


Another important question for Cao was: when organisations are making infrastructure decisions early in their AI journey, what advice would you give them about the choices that will matter most later?

The key is to strictly distinguish between decisions that are reversible and those that compound. Swapping out a specific model provider or changing your agentic orchestration framework is a two-way door. But decisions around data architecture, network topology, and orchestration lock-in compound massively over time.

Early on, the “easy” option is tempting. You might choose a highly abstracted, proprietary managed service to get a prototype out by Friday. But data in AI has massive gravity, and it is increasingly subject to strict data sovereignty and residency requirements, as well as regulation such as the EU AI Act.

As your scale multiplies, moving petabytes of vector data to a more cost-effective or regionally compliant environment becomes a logistical nightmare.

Design for portability from day one. Invest in open standards, containerise your workloads and run them on widely adopted open-source platforms like Kubernetes or Ray, and own your orchestration layer. It takes slightly more effort upfront, but it prevents multi-year migrations down the line.
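A rough back-of-the-envelope calculation shows why moving data at petabyte scale is so hard to do late. The Python sketch below estimates transfer time alone. The function name, the default `efficiency` factor, and the link speed in the usage example are assumptions chosen for illustration, and the estimate ignores re-indexing, validation, and egress charges, all of which add to the real cost of a migration.

```python
PETABYTE = 10**15  # bytes, decimal definition


def transfer_days(data_bytes: float, link_gbps: float, efficiency: float = 0.7) -> float:
    """Estimate the days needed to copy data over a dedicated network link.

    efficiency is an assumed fraction of nominal bandwidth actually achieved
    after protocol overhead and contention; 0.7 is a placeholder, not a measurement.
    """
    if link_gbps <= 0:
        raise ValueError("link_gbps must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in the range (0, 1]")
    effective_bits_per_second = link_gbps * 1e9 * efficiency
    seconds = data_bytes * 8 / effective_bits_per_second
    return seconds / 86_400  # seconds in a day


if __name__ == "__main__":
    # Hypothetical scenario: datasets of growing size over a 10 Gbps link.
    for petabytes in (1, 5, 20):
        days = transfer_days(petabytes * PETABYTE, link_gbps=10)
        print(f"{petabytes} PB over 10 Gbps: about {days:.0f} days")
```

Even at a perfectly utilised 10 Gbps, a single petabyte takes a little over nine days to move, and the time grows linearly with the data. That is the compounding effect described above: the longer the decision is deferred, the larger the migration becomes.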


This is advice that runs counter to the instinct most teams have under pressure – to reach for the fastest available option and worry about the consequences later.

But the organisations that resist that instinct and invest in portability, open standards, and architectural ownership from the start are the ones that find themselves able to move with the market rather than being constrained by decisions made under earlier pressure.

This perspective sits at the heart of how Stelia approaches engineering the full AI stack – designing for longevity rather than convenience, and making the architectural choices that may take slightly more forethought upfront but create the conditions for AI to scale reliably and continuously.

The strategic case for investing in infrastructure foundations is clear, but as Cao explains in the next instalment, knowing what to prioritise is only part of the challenge. The harder question is what happens when the “move fast” culture meets the reality of building something that has to last.
