NVIDIA explains how NIMs will revolutionise generative AI deployment
At COMPUTEX 2024, NVIDIA founder and CEO Jensen Huang provided more details about NVIDIA NIMs, a new ecosystem of microservices designed to vastly improve AI model deployment. Now available to developers worldwide, the microservices aim to expedite the integration of AI into applications such as chatbots and digital assistants, cutting deployment times from weeks to minutes.
Demand for complex generative AI (GenAI) applications has surged, with multiple models required for tasks such as text, image, video, and speech generation. NVIDIA NIMs address this complexity by providing a simple, standardised way to embed AI into applications. This not only enhances developer productivity but also lets enterprises maximise their existing infrastructure investments. For instance, running Meta Llama 3-8B through NIM generates up to three times more AI tokens on the same accelerated infrastructure than running it without NIM, enabling more efficient use of computing resources.
"Every enterprise is looking to add generative AI to its operations, but not every enterprise has a dedicated team of AI researchers," said Jensen Huang. "Integrated into platforms everywhere, accessible to developers everywhere, running everywhere – NVIDIA NIM is helping the technology industry put generative AI within reach for every organisation."
The NIM microservices are pre-built to speed up model deployment for GPU-accelerated inference, incorporating NVIDIA software such as CUDA, Triton Inference Server, and TensorRT-LLM. Over 40 models from NVIDIA and the wider community, including Databricks DBRX, Google's open model Gemma, Meta Llama 3, Microsoft Phi-3, Mistral Large, Mixtral 8x22B, and Snowflake Arctic, are available as NIM endpoints, making it easier for developers to access and use these resources.
Developers can now access NVIDIA NIM microservices for Meta Llama 3 models via the Hugging Face AI platform. This allows for easy deployment and execution of Llama 3 NIM with just a few clicks, using NVIDIA GPUs on their preferred cloud infrastructure. Enterprises can leverage NIM for generating text, images, video, speech, and digital humans. Additionally, NVIDIA BioNeMo NIM microservices are available for digital biology applications, aiding researchers in accelerating drug discovery by building novel protein structures.
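As an illustration of what "deployment in minutes" looks like from a developer's seat, the sketch below calls a hosted NIM endpoint for Llama 3. It assumes the OpenAI-compatible chat-completions interface that NVIDIA's hosted NIM endpoints expose; the endpoint URL (`integrate.api.nvidia.com`), model identifier (`meta/llama3-8b-instruct`), and `NVIDIA_API_KEY` environment variable are assumptions for this example, so check NVIDIA's API catalogue for current values.

```python
# Minimal sketch of querying a NIM endpoint, assuming an
# OpenAI-compatible chat-completions API. URL, model name, and
# response schema are assumptions for illustration only.
import json
import os
import urllib.request

NIM_URL = "https://integrate.api.nvidia.com/v1/chat/completions"  # assumed


def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Assemble an OpenAI-style chat-completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def ask_nim(prompt: str) -> str:
    """POST the prompt to the NIM endpoint and return the reply text."""
    payload = build_chat_request("meta/llama3-8b-instruct", prompt)
    req = urllib.request.Request(
        NIM_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['NVIDIA_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # Assumed response layout mirrors the OpenAI chat schema.
    return body["choices"][0]["message"]["content"]


if os.environ.get("NVIDIA_API_KEY"):  # only call out if a key is configured
    print(ask_nim("Summarise what a NIM microservice is in one sentence."))
```

Because the interface follows the familiar chat-completions shape, existing OpenAI-client code can typically be pointed at a NIM endpoint by swapping the base URL and API key, which is much of what makes the "few clicks" deployment story credible.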
Broad partner support
Over 150 technology partners, including companies like Cadence, Cloudera, Cohesity, DataStax, NetApp, Scale AI, and Synopsys, are integrating NIMs into their platforms to speed up the deployment of generative AI in domain-specific applications. Hugging Face has also announced it will offer NIM, starting with Meta Llama 3.
Hundreds of AI infrastructure partners, including Canonical, Red Hat, Nutanix, VMware, Amazon SageMaker, Microsoft Azure AI, Dataiku, and others, are embedding NIM into their platforms, enabling developers to build and deploy domain-specific generative AI applications with optimised inference. Leading system integrators and service providers like Accenture, Deloitte, Infosys, Quantiphi, SoftServe, Tata Consultancy Services, and Wipro have developed NIM competencies to assist enterprises in swiftly developing and executing production AI strategies.
Customer usage
Numerous healthcare companies are already deploying NIMs to enhance a range of applications, such as surgical planning, digital assistants, drug discovery, and clinical trial optimisation. The new ACE NIM is also available for developers to create and manage interactive, lifelike digital humans for customer service, telehealth, education, gaming, and entertainment applications.
Outside of healthcare, a number of corporations are already leveraging NIM for generative AI applications. Electronics giant Foxconn is using NIM to develop domain-specific large language models (LLMs) for smart manufacturing, smart cities, and smart electric vehicles. Pegatron is utilising NIM for Project TaME, a local LLM development initiative. Amdocs, a provider of software and services to communication and media companies, is using NIM to enhance its customer billing LLM, achieving significant cost, accuracy, and latency improvements.
Retail giant Lowe's is employing NVIDIA NIM microservices to enhance customer and associate experiences with generative AI. ServiceNow is integrating NIM within its Now AI multimodal model to facilitate fast, scalable, and cost-effective LLM development for its clients, while Siemens is utilising NIM microservices for shop floor AI workloads and building an on-premises Industrial Copilot for Machine Operators.