Follow

Keep up to date with the latest Stelia advancements

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Stelia co-hosts vLLM & llm-d AI meetup alongside Red Hat and NVIDIA

Dave Hughes, Stelia CTO, shares lessons from taking vLLM from early evaluation to deployment – and the infrastructure decisions that matter most at scale.

On 10th June, Stelia co-hosted the first London edition of vLLM & llm-d’s AI Community meetup alongside NVIDIA and Red Hat at Sustainable Ventures. Stelia CTO Dave Hughes shared his experience taking vLLM from early evaluation all the way through to production – and the lessons learned along the way.

In his session, Dave drew on his experience evaluating and deploying vLLM at Stelia – from early prototypes and internal testing through to the considerations that emerge at production scale. vLLM was a confident early choice: its OpenAI-compatible API, active open-source community, and rich feature set made it the clear foundation to build on. The talk explored what it looks like to take that foundation and transform it into a reliable, multi-tenant inference service across distributed infrastructure.

Dave also gave a candid account of navigating the broader ecosystem – evaluating tools such as AI Bricks and llm-d alongside vLLM – and shared the practical lessons Stelia picked up on authentication, observability, and knowing when to keep things simple.

The evening also featured talks from Michael Goin, Eldar Kurtić and Stuart Battersby from Red Hat alongside Ganesh Kudleppanavar from NVIDIA – covering everything from the latest vLLM updates and speculative decoding to model weight delivery and AI safety at scale.

Watch the full talk here:

The conversation covered:

  • vLLM compatibility with Stelia: key considerations for vLLM as an inference engine for Stelia – and what to assess when building a managed service on top of it.
  • Moving from prototype to production: the orchestration, multi-tenancy and scaling considerations that emerge as you progress from prototype to production.
  • Evaluating the landscape: how Stelia evaluated the wider inference ecosystem.
  • Practical lessons: focusing on authentication, observability and scoping – the things that seem obvious but are often overlooked.

Watch the full talk to hear Dave’s perspective on what it takes to build a managed vLLM service – from the early testing of getting something working, through the trade-offs of evaluating the broader ecosystem, to where Stelia is headed next.

Stelia AI OS