
What GTC 2026 told us about where AI infrastructure is really heading

From inference at scale to nation-state infrastructure – six things that stood out at NVIDIA GTC and what they mean for enterprises operationalising AI.

Known as one of the most significant gatherings in the AI calendar, this year’s NVIDIA GTC was, by any measure, a packed event: 30,000 attendees, four days, and a programme that ranged from hardware advancements to autonomous vehicles and enterprise software.

Having spent the week on the ground, I came away with an overall picture of a market maturing, moving from raw compute and brute-force scaling toward efficiency, integration, and accessibility, as conversations this year became more substantive and operationally grounded.

Here are six observations that stood out, and what I think they reflect about where the industry is heading.

1. Rack-scale thinking and hardware maturity

Notably, what emerged across the week was a clearer focus on integrated, rack-scale infrastructure: a shift away from the mentality of assembling commodity components and hoping the system holds together under load.

NVIDIA’s hardware announcements reflected some genuine design progress, particularly around power delivery at the PCB level, an area that has historically been an afterthought in GPU infrastructure. Vera Rubin’s positioning as a full-stack platform spanning multiple chips and rack-scale infrastructure serves, at the very least, as an acknowledgement that the system as a whole matters, not just the silicon it is built on.

Application-specific hardware was also more prominent – Groq’s LPU being the most visible example. Whether these approaches prove out at scale remains to be seen, but the direction of travel, toward purpose-built infrastructure designed around specific AI workload demands rather than generalised compute, is the right one.

2. Efficiency over brute force is becoming a design principle

For years, the dominant approach to AI infrastructure has followed a familiar pattern: more power, more silicon, more spend. What came through at GTC this year was a growing recognition that this approach has limits, both practically and economically. Efficiency is beginning to be taken seriously as a design principle rather than an afterthought, a shift as visible in conversations across the event as in the announcements themselves.

The hardware announcements reflected this. The performance-per-watt improvements in the latest generation of silicon are significant, and with inference having overtaken training as the dominant AI workload, running continuously at scale, the economics of brute force become increasingly difficult to justify.
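To make the economics concrete, here is a minimal back-of-the-envelope sketch, using entirely hypothetical throughput, power, and price figures (none are from the announcements): once a workload runs continuously, the electricity cost of serving tokens scales directly with performance per watt.

```python
# Illustrative only: hypothetical numbers, not real accelerator specs.
# Shows why performance-per-watt dominates the cost of continuous inference.

def energy_cost_per_million_tokens(tokens_per_sec, power_watts, price_per_kwh):
    """Electricity cost to serve one million tokens of inference."""
    seconds = 1_000_000 / tokens_per_sec        # time to serve 1M tokens
    kwh = power_watts * seconds / 3_600_000     # watt-seconds -> kWh
    return kwh * price_per_kwh

# Two hypothetical accelerators at the same power draw; the efficient one
# simply pushes more tokens through each watt.
baseline = energy_cost_per_million_tokens(
    tokens_per_sec=10_000, power_watts=1_000, price_per_kwh=0.12)
efficient = energy_cost_per_million_tokens(
    tokens_per_sec=25_000, power_watts=1_000, price_per_kwh=0.12)

print(f"baseline:  ${baseline:.4f} per 1M tokens")
print(f"efficient: ${efficient:.4f} per 1M tokens")
```

At these made-up figures the efficient part serves the same tokens for 2.5x less energy spend; multiplied across a fleet running around the clock, that gap, not peak throughput, is what drives the bill.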

That said, it would be premature to suggest the industry has moved on entirely. The appetite for raw compute remains significant. But the conversation around efficiency – at the silicon, system, and workload levels – was more prominent this year than it has been in the past, and that is a direction worth paying attention to.

3. Developer accessibility is being taken seriously

Something that came through clearly in Jensen’s keynote was a deliberate push to lower the barrier between developers and NVIDIA hardware, not just at the enterprise level, but earlier in the development cycle.

Jensen himself made the point that most of us were NVIDIA customers before we ever bought a product ourselves; our parents paid for NVIDIA in our gaming PCs. The implication is that the same logic applies now: get developers interfacing with the technology as early as possible, before purchasing decisions are made.

NemoClaw is the most concrete expression of that intent – a reference stack built on OpenClaw that allows developers to rapidly stand up an AI agent, without the overhead of enterprise-grade configuration from day one. This approach remains in its early days, but one thing is certain: getting capable tooling into developers’ hands early, before architectural decisions are locked in, is how ecosystems are built and sustained. NVIDIA is executing on this deliberately, and developer experience is rising in its priorities.

4. Beyond training: inference, physical AI, and robotics dominated the agenda

Perhaps the most material shift on show at GTC this year, compared with last year, was the extent to which the conversation has moved beyond training.

Inference, physical AI, and robotics took up a significant portion of the agenda, which broadly reflects where the real operational challenges now sit for enterprises attempting to run AI at scale.

On the inference side, the framing was unambiguous: inference has overtaken training as the dominant AI workload. The hardware and software announcements followed that logic. Vera Rubin’s architecture – designed around NVLink 6 bandwidth and token throughput at a scale that previous generations couldn’t approach – is explicitly built for the demands of continuous inference rather than periodic training runs, and speaks directly to where the real infrastructure demand now sits.

Physical AI was similarly prominent, with over 110 robots on the show floor, autonomous-vehicle partnerships announced with multiple manufacturers, and significant investment in digital-twin and simulation tooling. Some of it felt closer to production than others. But the architectural implications of running AI in real-world physical environments – the latency constraints, the edge requirements, the system complexity – are substantial, and it was encouraging to see that being taken seriously rather than treated as a future consideration.

5. The emergence of the AI grid

The growing demands of inference and physical AI are pulling a previously peripheral conversation into sharp focus: the framing of AI as nation-state infrastructure.

The analogy being drawn, increasingly explicitly, is to electricity and telecoms grids: distributed, interconnected systems designed to deliver a utility reliably at scale, rather than centralised facilities serving whoever can afford to access them.

NVIDIA introduced an AI Grid reference design at GTC, with major telcos including AT&T, T-Mobile, Comcast, and Spectrum announcing deployments: geographically distributed AI infrastructure embedded across existing network footprints to run inference closer to users, devices, and data.

With real-time AI applications demanding latency that centralised architectures cannot reliably deliver, and the economics of routing every inference request to a hyperscale facility becoming increasingly difficult to justify at scale, distributing compute across existing network infrastructure offers an effective solution to both problems.
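The latency argument can be sketched with simple physics. The figures below are illustrative assumptions (a 50 ms interactive budget, a rule-of-thumb two-thirds-of-c propagation speed in fibre), not measurements from any deployment, but they show why distance alone puts a hard floor under round-trip time before any queueing or inference work happens.

```python
# Illustrative latency budget: hypothetical figures, not measured data.
# Light in fibre covers roughly 200 km per millisecond (~2/3 of c),
# so distance sets a best-case floor on network round-trip time.

LIGHT_IN_FIBRE_KM_PER_MS = 200  # common rule-of-thumb approximation

def network_rtt_ms(distance_km):
    """Best-case fibre round trip to a facility distance_km away."""
    return 2 * distance_km / LIGHT_IN_FIBRE_KM_PER_MS

budget_ms = 50  # hypothetical end-to-end budget for an interactive app
for distance_km in (25, 3000):  # nearby edge site vs distant hyperscale region
    rtt = network_rtt_ms(distance_km)
    print(f"{distance_km:>5} km: {rtt:5.2f} ms RTT, "
          f"{budget_ms - rtt:5.2f} ms left for inference itself")
```

Real networks add routing, queueing, and last-mile delays on top of this floor, which only strengthens the case: compute embedded in the existing network footprint leaves nearly the whole budget for the model, while a distant facility gives a large slice of it away before the first token is produced.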

But more broadly, this momentum signals that AI infrastructure is beginning to be treated as critical national infrastructure, not just by technologists, but by governments and national operators making long-term capital commitments. This direction – toward distributed, sovereign, grid-scale AI infrastructure – will shape how enterprises think about where their AI workloads run and under what conditions.

6. Full-stack thinking is no longer a fringe argument

The thread running through all of the above – rack-scale design, efficiency, developer accessibility, and the shift beyond training toward distributed, national-scale AI infrastructure – is that the industry is beginning to realise AI infrastructure must be treated as a system, rather than a collection of components to be optimised in isolation.

The organisations making meaningful progress with AI are the ones thinking across the full stack, not just at the layer most visible to them.

For us at Stelia, this reflects precisely the challenge we have been working through with enterprise customers – that the gap between what AI promises and what production environments actually demand is rarely a model problem but a systems one. The infrastructure beneath the models – how data moves, how compute is orchestrated, and how inference is served at scale – determines whether AI capabilities translate into operational reality.

Seeing industry developments begin to reflect this more prominently is welcome progress. The harder work – helping enterprises architect for it in practice – is what comes next.

A market in motion

Each year, GTC acts as a useful moment to take stock of where the industry is and where it is heading. This year, the market is asking better questions than it was twelve months ago, as the conversation moves from what AI can do to what it takes to run it reliably at scale. The answers – hinging on the architectural decisions being made now across the full stack, and the infrastructure commitments that follow – will separate the organisations that operationalise AI successfully from those that remain stuck in experimentation.
