NVIDIA launches Jarvis conversational AI framework
NVIDIA has announced the availability of its conversational AI framework, nicknamed Jarvis. The framework is a set of pre-trained deep learning models and software tools that help developers create interactive conversational AI services.
NVIDIA states the models can produce accurate speech recognition and language understanding, as well as language translation and text-to-speech capabilities.
These features run as an end-to-end speech pipeline that can complete in less than 100 milliseconds. The pipeline can be deployed in the cloud, in the data center, or at the edge.
The models have been trained on phone conversations, web meetings, and streaming video content. Training involved several million GPU hours on over 1 billion pages of text and 60,000 hours of speech data, spanning different languages, accents, environments and lingos.
NVIDIA envisions a future in which conversational AI enables new language-based applications, improving interactions between humans and machines.
The company states, "It opens the door to the creation of such services as digital nurses to help monitor patients around the clock, relieving overloaded medical staff; online assistants to understand what consumers are looking for and recommend the best products; and real-time translations to improve cross-border workplace collaboration and enable viewers to enjoy live content in their own language."
According to NVIDIA founder and CEO Jensen Huang, conversational AI is like the 'ultimate' AI.
"Deep learning breakthroughs in speech recognition, language understanding and speech synthesis have enabled engaging cloud services," he says.
He believes that NVIDIA Jarvis is now bringing state-of-the-art conversational AI to customers everywhere.
Developers can access Jarvis pre-trained models via NVIDIA's NGC catalogue. They can also customise the models using the NVIDIA Transfer Learning Toolkit and a 'few lines of code', without the need for dedicated AI expertise.
Further, new features will be released in the second quarter as part of the ongoing NVIDIA Jarvis open beta program.
This program has attracted customers including US telco provider T-Mobile, and Mozilla Common Voice. Mozilla Common Voice is an open source voice data pool, currently containing more than 9,000 hours of voice data in 60 languages, which is used to help train voice-enabled applications, devices, and services.
"We launched Common Voice to teach machines how real people speak in their unique languages, accents and speech patterns," explains Mozilla executive director Mark Surman.
"NVIDIA and Mozilla have a common vision of democratising voice technology and ensuring that it reflects the rich diversity of people and voices that make up the internet."