Master Speech AI APIs to enhance your applications! Hands-On-Lab, Mathieu Busquet & Eléa Petton, Tuesday 8th Oct
A presentation at Devoxx in October 2024 in Antwerp, Belgium by Eléa PETTON
SPEECH AI IS EVERYWHERE TODAY: “Hey Siri”, “Hey Cortana”, “Alexa”, “Ok Google”
ABOUT US: Eléa Petton, Machine Learning Engineer, AI Solutions Team, OVHcloud; Mathieu Busquet, Machine Learning Engineer, AI Solutions Team, OVHcloud
AGENDA
01 Introduction
02 Speech AI concepts
03 Hands-On-Lab
04 Challenges of Speech AI
05 Conclusion
06 References
01 INTRODUCTION
INTRODUCTION: Where does Speech AI fit in the AI ecosystem? (diagram: AI, Machine Learning, Deep Learning, NLP, Conversational AI, Speech AI)
HOW DOES IT WORK? Virtual assistants in practice: AUDIO REQUEST by a human → SPEECH AI SYSTEM (PREPROCESS audio, CONVERT audio into text) → DIALOG SYSTEM (SEND text data to a language model, GENERATE a text answer) → SPEECH AI SYSTEM (CONVERT text into audio) → AUDIO ANSWER by AI models
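What "PREPROCESS audio" involves depends on the system. A typical example is converting stereo input to mono and normalizing its level before sending it to ASR. The following is a minimal stdlib sketch, assuming 16-bit PCM samples already decoded into Python ints; the function names are illustrative and not part of any OVHcloud or NVIDIA API.

```python
from typing import List, Sequence

INT16_MIN, INT16_MAX = -32768, 32767


def downmix_to_mono(left: Sequence[int], right: Sequence[int]) -> List[int]:
    """Average two 16-bit PCM channels sample by sample (floor division)."""
    if len(left) != len(right):
        raise ValueError("both channels must have the same number of samples")
    return [(a + b) // 2 for a, b in zip(left, right)]


def peak_normalize(samples: Sequence[int], target_peak: int = INT16_MAX) -> List[int]:
    """Scale samples so the loudest one reaches target_peak, clipped to the int16 range."""
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return list(samples)  # pure silence: nothing to scale
    gain = target_peak / peak
    return [max(INT16_MIN, min(INT16_MAX, round(s * gain))) for s in samples]
```

For example, `peak_normalize(downmix_to_mono([100, -200, 300], [300, -400, 100]), 30000)` returns `[20000, -30000, 20000]`.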
02 SPEECH AI CONCEPTS
SPEECH AI CONCEPTS: 3 main tasks: ASR (Automatic Speech Recognition), NMT (Neural Machine Translation), TTS (Text To Speech)
AUTOMATIC SPEECH RECOGNITION: Transcribe human voice into text. Speech → Speech to Text model → Text (transcript)
ASR PIPELINE: Transcribe human voice into text. Speech → Feature Extraction → Spectrogram (Freq. vs Time) → Acoustic Model → phonemes /h/ /ə/ /l/ /əʊ/ /h/ /aʊ/ /ɑː/ /j/ /uː/ → Decoder + Language Model → candidate transcripts (“hello how are your”, “hello how are you”, “hello now are you”) → “hello how are you” → Punctuation & Capitalization Model → “Hello, how are you?”
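The decoder and language model steps can be illustrated with the slide's three candidates. This toy sketch rescores each hypothesis by adding its acoustic log-score to a weighted bigram language-model log-score. The acoustic scores and bigram probabilities are invented for illustration, and the rule-based punctuation function is only a stand-in for a learned Punctuation & Capitalization model.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical bigram log-probabilities; real LMs are trained on large corpora.
BIGRAM_LOGP: Dict[Tuple[str, str], float] = {
    ("<s>", "hello"): math.log(0.5),
    ("hello", "how"): math.log(0.4),
    ("hello", "now"): math.log(0.01),
    ("how", "are"): math.log(0.6),
    ("now", "are"): math.log(0.05),
    ("are", "you"): math.log(0.5),
    ("are", "your"): math.log(0.02),
    ("you", "</s>"): math.log(0.5),
    ("your", "</s>"): math.log(0.01),
}
UNSEEN_LOGP = math.log(1e-4)  # crude penalty for bigrams missing from the table

QUESTION_WORDS = {"how", "what", "why", "where", "when", "who"}
GREETINGS = {"hello", "hi"}


def lm_score(words: List[str]) -> float:
    """Sum of bigram log-probabilities, with sentence start/end markers."""
    tokens = ["<s>"] + words + ["</s>"]
    return sum(BIGRAM_LOGP.get(pair, UNSEEN_LOGP) for pair in zip(tokens, tokens[1:]))


def rescore(hypotheses: Dict[str, float], lm_weight: float = 1.0) -> str:
    """Pick the hypothesis maximizing acoustic log-score + lm_weight * LM log-score."""
    return max(hypotheses, key=lambda h: hypotheses[h] + lm_weight * lm_score(h.split()))


def punctuate_and_capitalize(text: str) -> str:
    """Toy rules: comma after a leading greeting, '?' if a question word appears."""
    words = text.split()
    if not words:
        return ""
    is_question = any(w in QUESTION_WORDS for w in words)
    if words[0] in GREETINGS and len(words) > 1:
        words[0] += ","
    sentence = " ".join(words)
    return sentence[0].upper() + sentence[1:] + ("?" if is_question else ".")
```

With acoustic scores `{"hello how are your": -4.0, "hello how are you": -4.2, "hello now are you": -3.9}`, the acoustic model alone would prefer "hello now are you". Once the language model is added, `rescore` picks "hello how are you", and `punctuate_and_capitalize` turns it into "Hello, how are you?".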
NEURAL MACHINE TRANSLATION: Translate text into another language. “Welcome to Devoxx Belgium!” (English text) → AI model → “Willkommen bei Devoxx Belgien!” (other-language equivalent)
NMT PIPELINE: Translate text into another language. “Welcome to Devoxx Belgium” (English text) → Tokenization: “Welcome”, “to”, “Devoxx”, “Belgium” → NMT Encoder → Embeddings (token representation in its specific context), e.g. [0.93, 0.2, 0.3] → NMT Decoder → “Willkommen”, “bei”, “Devoxx”, “Belgien” → Detokenization → “Willkommen bei Devoxx Belgien” (other-language equivalent)
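The tokenization and detokenization steps that wrap the encoder and decoder can be sketched as follows. The model itself is replaced by a hypothetical word lookup table (`toy_model`) so the example runs on its own. A real NMT model decodes a new sequence from contextual embeddings and is not a word-by-word lookup. Production models also typically use subword tokenizers such as BPE or SentencePiece instead of the simple regex used here.

```python
import re
from typing import Callable, List

TOKEN_RE = re.compile(r"\w+|[^\w\s]")  # a word, or a single punctuation mark


def tokenize(text: str) -> List[str]:
    return TOKEN_RE.findall(text)


def detokenize(tokens: List[str]) -> str:
    # Join with spaces, then drop the space before each punctuation mark
    # (a simplification: it also glues opening brackets to the previous word).
    return re.sub(r"\s+([^\w\s])", r"\1", " ".join(tokens))


def translate(text: str, model: Callable[[List[str]], List[str]]) -> str:
    """Tokenize -> (encoder/decoder model) -> detokenize."""
    return detokenize(model(tokenize(text)))


# Hypothetical stand-in for the NMT encoder/decoder: a toy EN -> DE lookup table.
TOY_EN_DE = {"Welcome": "Willkommen", "to": "bei", "Belgium": "Belgien"}


def toy_model(tokens: List[str]) -> List[str]:
    return [TOY_EN_DE.get(t, t) for t in tokens]
```

For example, `translate("Welcome to Devoxx Belgium!", toy_model)` returns "Willkommen bei Devoxx Belgien!".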
TEXT TO SPEECH: Convert text into spoken words. Text → Text to Speech model → Speech
TTS PIPELINE: Convert text into spoken words. Text → Text Preprocessing → Text Encoder → Pitch and Duration Predictor (for each phoneme) → Spectrogram Generator → Vocoder → Speech
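The pipeline does not show how the per-phoneme durations become spectrogram frames. In FastPitch-style models, a "length regulator" repeats each phoneme's encoder output for the number of frames the duration predictor assigns to it. The toy sketch below uses strings in place of encoder vectors. The hop length and sample rate passed to `frames_to_seconds` are example values, not the settings of any particular Speech AI Endpoint.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def length_regulate(encodings: Sequence[T], durations: Sequence[int]) -> List[T]:
    """Repeat each phoneme encoding for its predicted number of spectrogram frames."""
    if len(encodings) != len(durations):
        raise ValueError("exactly one duration per phoneme is required")
    frames: List[T] = []
    for encoding, duration in zip(encodings, durations):
        if duration < 0:
            raise ValueError("durations must be non-negative")
        frames.extend([encoding] * duration)
    return frames


def frames_to_seconds(n_frames: int, hop_length: int, sample_rate: int) -> float:
    """Audio length produced by the vocoder if each frame advances hop_length samples."""
    return n_frames * hop_length / sample_rate
```

For example, phonemes `["h", "ə", "l", "əʊ"]` with predicted durations `[2, 1, 3, 4]` expand to 10 frames. The spectrogram generator then works on this frame-level sequence (plus the predicted pitch) rather than on one vector per phoneme.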
03 HANDS-ON-LAB
TABLE OF CONTENTS: Hands-On-Lab schedule
1. Discover AI Endpoints
2. Get started with Speech AI
3. Focus on Speech AI key features
4. Develop Speech AI inference functions
5. Develop Gradio web app
6. Containerize web app
7. Deploy and test Video Translator app
TABLE OF CONTENTS: Hands-On-Lab schedule, step 1: Discover AI Endpoints
DISCOVER AI ENDPOINTS: OVHcloud AI Endpoints website
DISCOVER AI ENDPOINTS: Catalog of AI models. “A serverless platform providing access to advanced AI models, such as LLM, NLP, translation, speech recognition, or image recognition.” Categories: ASSISTANT, AUDIO ANALYSIS, COMPUTER VISION, EMBEDDINGS, NLP, TRANSLATION
DISCOVER AI ENDPOINTS: Democratizing AI. “Improve your applications with AI Endpoints.” DESIGNED FOR DEVELOPERS: complete documentation, simple APIs, and code examples. COMMITTED TO PRIVACY: we do not store and do not share your data during or after the use of the model. CURATED LIST OF AI MODELS: the latest models, optimized for maximum performance and accuracy. LOCK-IN FREE TECHNOLOGY: thanks to our transparency about the AI models used, clients can implement these models on their own infrastructure or on other cloud services.
DISCOVER AI ENDPOINTS: Concerning Speech AI Endpoints. NVIDIA: NVIDIA Riva models from NGC go through optimization and exportation. OVHcloud: the models are stored in Object Storage and served by a models server behind the AI Endpoints Speech AI APIs. Customer: the user sends a request over the internet, inference runs on the OVHcloud side, and the response is returned.
DISCOVER AI ENDPOINTS: Concerning Speech AI Endpoints
TABLE OF CONTENTS: Hands-On-Lab schedule, step 2: Get started with Speech AI
GET STARTED WITH SPEECH AI: How can you easily use the ASR, NMT and TTS APIs?
GET STARTED WITH SPEECH AI: Connect your Speech AI Endpoints to each other! Input audio → ASR endpoint → ASR result → NMT endpoint (source language fr-FR, target language en-US) → NMT result: the French transcription translated into English → TTS endpoint (voice gender Female, voice emotion Happy) → TTS result: audio generated in English with a happy female voice
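Chaining the three endpoints comes down to feeding each result into the next call. The sketch below shows only that orchestration and takes the three clients as plain callables. Their signatures are hypothetical, including the assumption that the ASR call receives the source language. The real AI Endpoints clients are not reproduced here.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical client signatures; the real endpoint clients may differ.
AsrFn = Callable[[bytes, str], str]            # (audio, language) -> transcript
NmtFn = Callable[[str, str, str], str]         # (text, source, target) -> translated text
TtsFn = Callable[[str, str, str, str], bytes]  # (text, language, gender, emotion) -> audio


@dataclass(frozen=True)
class ChainConfig:
    source_language: str = "fr-FR"
    target_language: str = "en-US"
    voice_gender: str = "Female"
    voice_emotion: str = "Happy"


def speech_to_translated_speech(audio: bytes, asr: AsrFn, nmt: NmtFn, tts: TtsFn,
                                config: Optional[ChainConfig] = None) -> bytes:
    cfg = config or ChainConfig()
    transcript = asr(audio, cfg.source_language)  # ASR result: French text
    translation = nmt(transcript, cfg.source_language, cfg.target_language)  # NMT result
    # TTS result: English audio in the requested voice
    return tts(translation, cfg.target_language, cfg.voice_gender, cfg.voice_emotion)
```

Keeping the clients as parameters makes the chain easy to test with fakes, and lets each real endpoint call stay isolated in its own module.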
TABLE OF CONTENTS: Hands-On-Lab schedule, step 3: Focus on Speech AI key features
FOCUS ON SPEECH AI KEY FEATURES: Enhance your Speech AI Endpoints with cutting-edge features: GENERATE an SRT file, KEEP SILENCE during translation, SUPERIMPOSE audio on video
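Generating an SRT file means writing numbered blocks, each with a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range and the subtitle text, separated by blank lines. This sketch assumes the ASR and NMT steps have already produced translated segments with start and end times in seconds. The `Segment` type is illustrative.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds
    text: str


def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: Iterable[Segment]) -> str:
    """Build SRT content: index, time range, text, blank line between blocks."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{format_timestamp(seg.start)} --> {format_timestamp(seg.end)}\n{seg.text}\n"
        )
    return "\n".join(blocks)
```

For example, `format_timestamp(3725.5)` gives "01:02:05,500". Writing `to_srt(...)` to a `.srt` file produces subtitles that players or ffmpeg can attach to the video.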
FOCUS ON SPEECH AI KEY FEATURES: Master the Speech AI API endpoints by developing key features
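One way to keep the original silences when dubbing is to place each synthesized sentence at the start time of its source segment and fill the gaps with silence. This sketch works on raw 16-bit mono PCM bytes. Both that sample format and the policy of appending a clip directly after the previous one when they would overlap are assumptions. The workshop code may handle this differently, for example by speeding up the speech.

```python
from typing import List, Tuple

BYTES_PER_SAMPLE = 2  # assumption: 16-bit mono PCM


def assemble_with_silences(clips: List[Tuple[float, bytes]], sample_rate: int) -> bytes:
    """Place each (start_seconds, pcm_clip) at its start time, padding gaps with silence.

    A clip that would start before the previous one has finished is appended
    right after it; this sketch does not time-stretch the translated speech.
    """
    track = bytearray()
    for start, pcm in sorted(clips, key=lambda clip: clip[0]):
        start_byte = round(start * sample_rate) * BYTES_PER_SAMPLE
        if start_byte > len(track):
            track.extend(b"\x00" * (start_byte - len(track)))  # zero samples = silence
        track.extend(pcm)
    return bytes(track)
```

The resulting bytes can then be wrapped in a WAV container (for example with Python's `wave` module) before being superimposed on the video.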
TABLE OF CONTENTS: Hands-On-Lab schedule, step 4: Develop Speech AI inference functions
DEVELOP SPEECH AI INFERENCE FUNCTIONS: Develop ASR, NMT and TTS scripts. AUDIO INPUT (wav file in FR) → asr.py: TRANSCRIBE audio into text → nmt.py: TRANSLATE text → tts.py: SYNTHESIZE text into spoken words → AUDIO OUTPUT (wav file in EN)
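Before sending the input WAV file to the ASR step, it is worth checking its format, because an ASR request generally has to declare the audio's sample rate and channel count. Below is a stdlib-only sketch using Python's `wave` module. The helper names are illustrative and do not come from the workshop repository.

```python
import io
import wave
from dataclasses import dataclass


@dataclass
class AudioInfo:
    sample_rate: int
    channels: int
    sample_width_bytes: int
    duration_s: float


def read_wav_info(data: bytes) -> AudioInfo:
    """Read the WAV header fields an ASR request typically needs."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        rate = wav.getframerate()
        return AudioInfo(rate, wav.getnchannels(), wav.getsampwidth(), wav.getnframes() / rate)


def make_silent_wav(seconds: float, sample_rate: int = 16000) -> bytes:
    """Create a mono 16-bit WAV file in memory (handy for tests)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * round(seconds * sample_rate))
    return buf.getvalue()
```

In practice you would call `read_wav_info(open("input.wav", "rb").read())` and pass the sample rate along with the audio bytes to the ASR call.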
DEVELOP SPEECH AI INFERENCE FUNCTIONS: Use Speech AI Endpoints in Python scripts
TABLE OF CONTENTS: Hands-On-Lab schedule, step 5: Develop Gradio web app
DEVELOP GRADIO WEB APP: Gradio web app overview
DEVELOP GRADIO WEB APP: Define the Gradio app features: upload a video; transcribe the audio track into text; subtitle the video in any language; dub the video in another language; choose the gender of the dubbing voice; download the resulting video
DEVELOP GRADIO WEB APP: Connect ASR, NMT and TTS inside a Gradio app. GRADIO INTERFACE (main.py): the USER provides video_input (.mp4) and chooses a translation_mode. COMMON FUNCTIONS (utils.py): video_to_audio_wav (.mp4 → .wav). asr_transcription → asr.py audio_to_text (.wav → text). nmt_translation → nmt.py text_to_text_translation (text → translated text). If translation_mode == "subtitles": generate_str_file. If translation_mode == "voice_dubbing": tts_transcription → tts.py (text → .wav), then superimpose_audio_on_video (.wav + .mp4 → .mp4). Result: video_output (.mp4).
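The branching on translation_mode can be sketched independently of Gradio. The step names below follow the diagram, but their signatures, which take and return file paths or text, are hypothetical. Bundling them in a dataclass lets the routing be tested with fakes.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Steps:
    # Hypothetical signatures for the functions named in the diagram.
    video_to_audio_wav: Callable[[str], str]               # .mp4 path -> .wav path
    audio_to_text: Callable[[str], str]                    # .wav path -> transcript
    text_to_text_translation: Callable[[str], str]         # transcript -> translated text
    generate_str_file: Callable[[str, str], str]           # (video, text) -> subtitled video path
    tts_transcription: Callable[[str], str]                # translated text -> dubbed .wav path
    superimpose_audio_on_video: Callable[[str, str], str]  # (video, .wav) -> dubbed video path


def translate_video(video_input: str, translation_mode: str, steps: Steps) -> str:
    """Run ASR and NMT, then either subtitle or dub the video."""
    audio = steps.video_to_audio_wav(video_input)
    translated = steps.text_to_text_translation(steps.audio_to_text(audio))
    if translation_mode == "subtitles":
        return steps.generate_str_file(video_input, translated)
    if translation_mode == "voice_dubbing":
        dubbed_audio = steps.tts_transcription(translated)
        return steps.superimpose_audio_on_video(video_input, dubbed_audio)
    raise ValueError(f"unknown translation_mode: {translation_mode!r}")
```

In the app, a function like this would be the callback that the Gradio interface invokes with the uploaded video and the selected mode. Its return value would be the path of the video to display and download.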
DEVELOP GRADIO WEB APP: Develop your Gradio app in Python
TABLE OF CONTENTS: Hands-On-Lab schedule, step 6: Containerize web app
CONTAINERIZE WEB APP: Create the Dockerfile

```dockerfile
FROM python:3.10
WORKDIR /workspace
# Copy the application code (main.py, asr.py, nmt.py, tts.py, utils.py, requirements.txt)
COPY . /workspace
# ffmpeg for audio/video processing, libsndfile for audio file I/O
RUN apt-get update && apt-get install -y ffmpeg libsndfile1-dev
RUN pip install -r requirements.txt
# Give ownership to UID/GID 42420, the user OVHcloud AI Deploy runs containers as
RUN chown -R 42420:42420 /workspace
ENV HOME=/workspace
# Exec form must be a JSON array with straight double quotes
CMD ["python3", "/workspace/main.py"]
```
TABLE OF CONTENTS: Hands-On-Lab schedule, step 7: Deploy and test Video Translator app
CONTAINERIZE WEB APP: Define your infrastructure requirements. COMPUTE RESOURCES: 4 CPUs. HIGH AVAILABILITY: API scalable on the fly, custom number of replicas. SECURE ACCESS: private mode, personal token access.
DEPLOY AND TEST VIDEO TRANSLATOR APP: Introduction to the OVHcloud AI Deploy solution
01 CONTAINER AS A SERVICE: the customer provides a Docker container through a Docker registry
02 COMPUTE RESOURCES: the container runs in the cloud on GPUs (or CPUs)
03 BILLING METHOD: the customer is billed per minute used
04 API/APP DEPLOYMENT: an industrial way of deploying stateless APIs
05 SCALING STRATEGY: scalable on the fly
DEPLOY AND TEST VIDEO TRANSLATOR APP: Choose OVHcloud AI Deploy to deploy the solution. Managed AI Endpoints (choose API endpoints based on your needs): ASR, NMT, TTS. Custom solution on AI Deploy (deploy a custom solution as a Gradio app in the cloud): EXTRACT AUDIO (.WAV), CONVERT SUBTITLES (.SRT), MERGE AUDIO AND VIDEO (.MP4). The USER uploads an input video. The app sends the input audio to ASR and gets back the translated text from NMT and the generated audio from TTS. The result is the video with accurately translated subtitles and/or voice dubbing in the target language.
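The EXTRACT AUDIO, CONVERT SUBTITLES and MERGE AUDIO AND VIDEO steps can all be delegated to ffmpeg, which the Dockerfile installs. This sketch builds the command lines as argument lists for `subprocess.run`. These are common ffmpeg options and not necessarily the exact commands used in the workshop. The 16 kHz mono PCM default for extraction is an assumption. Soft subtitles (`mov_text`) are only one option, since subtitles can also be burned into the image.

```python
import shlex
import subprocess
from typing import List


def extract_audio_cmd(video: str, wav: str, sample_rate: int = 16000) -> List[str]:
    """Drop the video stream and write mono 16-bit PCM WAV."""
    return ["ffmpeg", "-y", "-i", video, "-vn", "-ac", "1", "-ar", str(sample_rate),
            "-c:a", "pcm_s16le", wav]


def merge_audio_cmd(video: str, audio: str, output: str) -> List[str]:
    """Keep the original video stream, replace the audio track with the dubbed one."""
    return ["ffmpeg", "-y", "-i", video, "-i", audio,
            "-map", "0:v:0", "-map", "1:a:0", "-c:v", "copy", "-c:a", "aac",
            "-shortest", output]


def add_subtitles_cmd(video: str, srt: str, output: str) -> List[str]:
    """Mux an .srt file into an .mp4 as a soft subtitle track (mov_text)."""
    return ["ffmpeg", "-y", "-i", video, "-i", srt, "-map", "0", "-map", "1",
            "-c", "copy", "-c:s", "mov_text", output]


def run(cmd: List[str]) -> None:
    """Log the command, then run it and raise if ffmpeg exits with an error."""
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

Passing argument lists instead of a shell string avoids quoting problems with user-provided file names. With `-c:v copy` the video is not re-encoded, which keeps the merge fast.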
DEPLOY AND TEST VIDEO TRANSLATOR APP: Launch the app deployment in the cloud
DEPLOY AND TEST VIDEO TRANSLATOR APP: Live testing time! https://bit.ly/video-translator-devoxx-be
04 CHALLENGES OF SPEECH AI
CHALLENGES OF SPEECH AI: LANGUAGE AND ACCENT; NOISE, BACKGROUND SOUNDS AND MICROPHONE QUALITY; SPECIFIC VOCABULARY AND PERSONAL INFORMATION; EMOTIONS, AMBIGUITY, TONE; SPEECH OVERLAPS (ALSO FOR DIARIZATION); REAL-TIME TRANSCRIPTION
05 CONCLUSION
CONCLUSION: AI Endpoints and the Video Translator in summary
Speech AI models: speech and translation AI models can work together; real-time conversational AI; accurate models in several languages; transcribe any audio into text; transform text into spoken words.
OVHcloud AI Endpoints: cutting-edge GenAI and ML models; simple, secure and ready-to-use AI APIs; easy integration for next-generation solutions.
OVHcloud AI Deploy: turnkey serverless solution; Container As A Service platform; the best of GPUs (H100, A100, L4, L40S, V100S); per-minute billing.
Going further: build your own solution based on open-source models; create an audio virtual assistant in less than 100 lines of code; transcribe and summarize any meeting using diarization.
Be creative!
CONCLUSION: Find the Hands-On-Lab resources at https://github.com/eleapttn/workshop-mastering-speech-ai.git
06 REFERENCES
REFERENCES: AI Endpoints and the Video Translator in summary
AI Endpoints: https://endpoints.ai.cloud.ovh.net/
GitHub repositories:
https://github.com/eleapttn/workshop-mastering-speech-ai.git
https://github.com/ovh/public-cloud-examples/tree/main/ai/ai-endpoints
Blog articles:
https://blog.ovhcloud.com/master-speech-ai-and-build-your-own-video-translator-app-with-ai-endpoints/
https://blog.ovhcloud.com/how-to-build-a-speech-to-text-application-with-python-1-3/
https://blog.ovhcloud.com/build-a-powerful-audio-virtual-assistant-with-ai-endpoints/
https://blog.ovhcloud.com/create-audio-summarizer-assistant-with-ai-endpoints/
THANK YOU!