Webinars
Efficient and Scalable AI Inference: Navigating the Challenges of Model Deployment at Scale
17 September 2025 at 12:00 PM ET | Virtual Event
In machine learning, model deployment strategies are crucial for managing large-scale infrastructure where the goal is efficient, scalable, and cost-effective inference. This session covers the challenges of deploying models, both small and large, in heterogeneous environments where models consume varying amounts of resources such as GPUs. It explores the complexities of orchestrating such a system, emphasizing that efficient GPU utilization is a priority, preventing idling and wasted compute, while still serving inference requests quickly. Finally, it examines design approaches for navigating these complexities, equipping attendees with the knowledge to design infrastructure for deploying and managing models effectively.
Guest Speaker:
Bhala Ranganathan is a seasoned software engineer and technical leader specializing in cloud services and distributed systems, with a strong focus on data and AI infrastructure. He is currently a Principal Software Engineer and Tech Lead on the Azure OpenAI Service runtime team, where he works on large-scale AI inference. During his time at Microsoft, he has contributed to several impactful initiatives, including Azure Cosmos DB’s multi-master offering and core components of the Azure AI platform such as Feature Store and Model-as-a-Service. Beyond his engineering work, he authors technical articles on cloud services and infrastructure.
This webinar is organized by the IEEE AI Hardware & Infrastructure Working Group.