NVIDIA’s DGX Cloud is reshaping how enterprises and researchers tackle artificial intelligence, offering a fully managed, cloud-based platform that puts cutting-edge AI supercomputing within reach. With its latest expansions across major cloud providers and high-profile partnerships driving breakthroughs in fields like autonomous vehicles and biomedicine, DGX Cloud is proving itself a cornerstone of the AI revolution, one GPU cluster at a time.
Since its debut in March 2023, DGX Cloud has grown into a powerhouse, delivering serverless access to NVIDIA’s top-tier hardware, including H100 GPU clusters optimized for multi-node AI training. Priced at $36,999 per instance per month, it pairs this muscle with the NVIDIA AI Enterprise software suite and direct support from the company’s experts, all accessible via a browser interface. The result? A scalable, turnkey solution that slashes the time and complexity of building AI models from scratch.
Recent Developments
Recent moves underscore its momentum. In December 2024, NVIDIA brought DGX Cloud to AWS Marketplace Private Offers, joining existing integrations with Microsoft Azure, Google Cloud, and Oracle Cloud Infrastructure. That same month, the $700 million acquisition of Run:ai, a specialist in AI workload orchestration, signaled plans to streamline resource management for DGX Cloud users.
Early 2025 saw Uber tap DGX Cloud alongside NVIDIA’s Cosmos platform to accelerate its autonomous vehicle AI, while Japan’s SoftBank unveiled a Blackwell-powered DGX SuperPOD for local innovation.
The platform’s real-world impact is striking. Amgen, using NVIDIA’s BioNeMo on DGX Cloud, trained protein large language models in under a month, a feat that once took far longer. Denmark’s Gefion supercomputer, powered by NVIDIA tech, is fueling public AI research, and the U.S. National Science Foundation is leveraging DGX Cloud to build shared infrastructure for responsible AI development. “It’s about speeding up discovery,” NVIDIA stated. “From drug design to self-driving tech, we’re giving teams the tools to move faster.”
Enterprise Adoption
Flexibility is a big draw. Enterprises can rent DGX Cloud clusters monthly, scaling compute power as needed without the overhead of on-premises setups. The addition of Run:ai’s technology promises even smarter GPU allocation, boosting efficiency. Meanwhile, NVIDIA’s CES 2025 reveal of Project DIGITS, a compact $3,000 AI system, offers a local prototyping bridge to DGX Cloud’s cloud-scale training, broadening its appeal.
The multi-cloud approach is another win. By partnering with AWS, Azure, and others, NVIDIA ensures DGX Cloud slots into existing workflows, complementing rather than competing with hyperscalers’ offerings. “We’re guests in their ecosystems,” an NVIDIA spokesperson said, pointing to Project Ceiba—an AWS-hosted AI supercomputer driving NVIDIA’s own R&D—as a model of collaboration.
Still, DGX Cloud’s rise hasn’t gone unnoticed. Its deep integration across cloud platforms and dominance in AI hardware (NVIDIA GPUs power 80% of training workloads, per IDC) highlight its outsized role. Some wonder if smaller players can keep pace with its scale and support, though NVIDIA frames this as a tide lifting all boats. “Our goal is enabling breakthroughs, not gatekeeping them,” the company says.
For now, DGX Cloud is delivering on that promise. Whether it’s Cerence training automotive AI on Azure or Block Inc. tapping a DGX SuperPOD for enterprise solutions, the platform’s blend of power, simplicity, and expertise is accelerating AI’s march forward. As adoption grows, it’s clear NVIDIA’s cloud play is less about controversy and more about capability, unlocking a future where AI innovation knows fewer bounds.
DGX Cloud By The Numbers
- Over half of the Fortune 100 companies use DGX systems, across both on-premises and cloud deployments.
DGX Cloud Overview
Key Features
- Accelerated Computing Clusters: DGX Cloud delivers clusters of NVIDIA DGX systems, featuring the latest NVIDIA GPUs (like the H100 or A100), optimized for multi-node AI training workloads. This setup enables enterprises to handle complex models, including those with hundreds of billions of parameters.
- Cloud Flexibility: It operates on leading cloud providers such as Microsoft Azure, Google Cloud, Oracle Cloud Infrastructure (OCI), and, as of early 2025, Amazon Web Services (AWS). This multi-cloud approach allows users to choose their preferred provider and supports hybrid strategies.
- Fully Managed Service: NVIDIA handles the setup, configuration, and management of the infrastructure, ensuring “day-one productivity.” This reduces the time and effort enterprises spend on standing up AI systems, letting developers focus on modeling rather than IT logistics.
- NVIDIA AI Software: The platform includes the NVIDIA AI Enterprise suite, which provides optimized frameworks, pre-trained models, and tools like NVIDIA Base Command for workload management. This software layer accelerates data science and AI development pipelines.
- Expert Support: Users gain direct access to NVIDIA AI experts, offering guidance on everything from multi-node training to domain-specific optimizations (e.g., healthcare or automotive applications).
- Scalability and Flexibility: Enterprises can rent DGX Cloud clusters on a monthly basis, scaling resources up or down as needed. This eliminates the need for large upfront investments and provides near-limitless GPU access.
- User Interface: A simple browser-based interface allows users to schedule, monitor, and allocate computing resources efficiently.
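DGX Cloud’s actual scheduler is proprietary, but the core idea behind allocating a fixed pool of GPUs across competing jobs can be sketched in a few lines of Python. Everything below (the `Job` class, the greedy admission policy) is an illustrative toy model, not NVIDIA’s implementation — real schedulers also weigh priority, node topology, and multi-node placement:

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    gpus_needed: int


def allocate(jobs, total_gpus):
    """Greedily admit jobs (smallest first) while free GPUs remain; queue the rest.

    Toy model only: it tracks a single GPU count, ignoring priority,
    preemption, and cross-node interconnect considerations.
    """
    free = total_gpus
    running, queued = [], []
    for job in sorted(jobs, key=lambda j: j.gpus_needed):
        if job.gpus_needed <= free:
            running.append(job.name)
            free -= job.gpus_needed
        else:
            queued.append(job.name)
    return running, queued, free


# Example: an 8-GPU node serving three hypothetical training jobs
running, queued, free = allocate(
    [Job("llm-finetune", 8), Job("vision-train", 4), Job("embed-batch", 2)],
    total_gpus=8,
)
print(running, queued, free)  # ['embed-batch', 'vision-train'] ['llm-finetune'] 2
```

The smallest-first policy maximizes the number of admitted jobs here, at the cost of potentially starving large jobs — one reason production orchestrators (Run:ai’s specialty) are considerably more sophisticated.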
Use Cases
DGX Cloud is tailored for industries and organizations needing robust AI capabilities:
- Drug Discovery: Companies like Amgen use DGX Cloud with NVIDIA BioNeMo to accelerate biologics discovery, training protein large language models (LLMs) in under a month.
- Automotive: Cerence leverages DGX Cloud on Azure to train automotive-specific LLMs for next-gen in-car computing platforms.
- Research and Innovation: The U.S. National Science Foundation employs DGX Cloud to build a shared infrastructure for responsible AI development.
- Enterprise AI: ServiceNow combines DGX Cloud with on-premise DGX systems for hybrid-cloud AI research, including LLMs and code generation.
Benefits
- Speed: Optimized for faster training times, it reduces the time-to-insight for AI projects.
- Cost Efficiency: Higher GPU utilization and a managed service model aim to maximize return on investment compared to traditional infrastructure-as-a-service approaches.
- Accessibility: Enterprises of all sizes can access state-of-the-art AI supercomputing via a web browser, democratizing advanced AI development.
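The cost-efficiency point is easy to quantify. Using the $36,999-per-instance monthly price quoted above, and assuming eight GPUs per instance (the standard DGX configuration) and roughly 730 hours in a month, a short calculation shows how utilization drives the effective cost per productive GPU-hour:

```python
def effective_gpu_hour_cost(monthly_price, gpus_per_instance, utilization,
                            hours_per_month=730):
    """Cost per *productive* GPU-hour: idle time inflates the effective rate."""
    productive_hours = gpus_per_instance * hours_per_month * utilization
    return monthly_price / productive_hours


# DGX Cloud's quoted $36,999/month instance price; assumed 8 GPUs per instance
for util in (0.4, 0.7, 0.95):
    rate = effective_gpu_hour_cost(36999, 8, util)
    print(f"{util:.0%} utilization -> ${rate:.2f} per GPU-hour")
```

Halving utilization doubles the effective rate, which is why the managed-service pitch centers on keeping clusters busy rather than on the sticker price alone.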
Join Stelia at Nvidia GTC 2025
If you want to avoid the fate of companies that clung to siloed architectures and missed the hyperscale boat, discover how to build on the lessons hyperscalers learned and apply them to the AI revolution.
Attend our session, “Beyond Silos: Unlocking AI’s Full Potential with Petabit-Scale Data Mobility,” on Tuesday, March 18, 4:20–4:35 PM PDT, and learn how interconnected, elastic infrastructures are transforming AI at every level. We’ll dissect:
- Why traditional cloud computing creates bottlenecks for AI
- How a petabit-scale platform accelerates data mobility
- The blueprint for building an interconnected compute model