Master Speech AI APIs to enhance your applications!

A presentation at Devoxx in October 2024 in Antwerp, Belgium by Eléa Petton

Slide 1

Hands-On-Lab: Master Speech AI APIs to enhance your applications!
Mathieu Busquet & Eléa Petton
Tuesday, 8 October

Slide 2

SPEECH AI IS EVERYWHERE TODAY
“Hey Siri”, “Hey Cortana”, “Alexa”, “Ok Google”

Slide 3

ABOUT US
Eléa Petton, Machine Learning Engineer, AI Solutions Team, OVHcloud
Mathieu Busquet, Machine Learning Engineer, AI Solutions Team, OVHcloud

Slide 4

AGENDA
01 Introduction
02 Speech AI concepts
03 Hands-On-Lab
04 Challenges of Speech AI
05 Conclusion
06 References

Slide 5

01 INTRODUCTION

Slide 6

INTRODUCTION
Where does Speech AI fit in the AI ecosystem?
Diagram: AI, Machine Learning, Deep Learning, NLP, Conversational AI, Speech AI

Slide 7

HOW DOES IT WORK?
Virtual assistants in practice
1. AUDIO REQUEST by a human
2. SPEECH AI SYSTEM: PREPROCESS the audio, then CONVERT the audio into text
3. DIALOG SYSTEM: SEND the text data to a language model, which GENERATES a text answer
4. SPEECH AI SYSTEM: CONVERT the text answer into audio
5. AUDIO ANSWER produced by the AI models

Slide 8

02 SPEECH AI CONCEPTS

Slide 9

SPEECH AI CONCEPTS
3 main tasks:
ASR: Automatic Speech Recognition
NMT: Neural Machine Translation
TTS: Text To Speech

Slide 10

SPEECH AI CONCEPTS
3 main tasks:
ASR: Automatic Speech Recognition
NMT: Neural Machine Translation
TTS: Text To Speech

Slide 11

AUTOMATIC SPEECH RECOGNITION
Transcribe human voice into text
Speech → Speech-to-Text model → Text (transcript)
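A speech-to-text model rarely consumes raw audio samples directly; a common first step is to turn the waveform into a spectrogram (frequency vs. time). The sketch below is a minimal NumPy short-time Fourier transform, assuming 25 ms Hann-windowed frames with a 10 ms hop (typical values, not taken from the talk). The function name is hypothetical, and production ASR front ends usually go further (mel filter banks, log scaling).

```python
import numpy as np


def magnitude_spectrogram(signal, sample_rate, frame_ms=25.0, hop_ms=10.0):
    """Return |STFT| frames with shape (n_frames, frame_len // 2 + 1)."""
    frame_len = int(sample_rate * frame_ms / 1000)  # samples per analysis window
    hop = int(sample_rate * hop_ms / 1000)          # samples between window starts
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one analysis frame")
    window = np.hanning(frame_len)                  # taper frame edges to limit spectral leakage
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))      # magnitude of each frequency bin


# Usage: one second of a 440 Hz tone sampled at 16 kHz
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)
spec = magnitude_spectrogram(tone, sr)
bin_hz = sr / int(sr * 25.0 / 1000)                 # spacing between frequency bins
print(spec.shape, np.argmax(spec[0]) * bin_hz)
```

At 16 kHz each frame holds 400 samples, so bins are 40 Hz apart and the 440 Hz tone lands in bin 11.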

Slide 12

ASR PIPELINE
Transcribe human voice into text
Speech → Feature extraction → Spectrogram (frequency vs. time)
→ Acoustic model → phonemes: /h/ /ə/ /l/ /əʊ/ /h/ /aʊ/ /ɑː/ /j/ /uː/
→ Decoder + Language model → candidates: “hello how are your”, “hello how are you”, “hello now are you” → best candidate: “hello how are you”
→ Punctuation & Capitalization model → “Hello, how are you?”
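How the decoder uses a language model to choose between acoustically similar candidates can be illustrated with a toy rescoring step: add a weighted language-model log-probability to each hypothesis's acoustic score and keep the best. This is a minimal sketch, assuming a three-sentence training corpus, an add-one smoothed bigram language model and made-up acoustic scores; real ASR decoders run beam search over far larger models, and this is not how the Speech AI Endpoints are implemented.

```python
import math
from collections import Counter


def train_bigram_lm(corpus):
    """Train an add-one (Laplace) smoothed bigram model and return a log-prob scorer."""
    context_counts = Counter()
    bigram_counts = Counter()
    vocab = {"</s>"}
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens[1:])
        context_counts.update(tokens[:-1])
        bigram_counts.update(zip(tokens[:-1], tokens[1:]))
    vocab_size = len(vocab)

    def log_prob(sentence):
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        return sum(
            math.log((bigram_counts[(prev, word)] + 1) / (context_counts[prev] + vocab_size))
            for prev, word in zip(tokens[:-1], tokens[1:])
        )

    return log_prob


def rescore(acoustic_scores, lm_log_prob, lm_weight=1.0):
    """Pick the hypothesis maximizing acoustic log-score + lm_weight * LM log-prob."""
    return max(acoustic_scores, key=lambda h: acoustic_scores[h] + lm_weight * lm_log_prob(h))


corpus = ["hello how are you", "how are you today", "hello there"]
lm = train_bigram_lm(corpus)
# Illustrative (made-up) acoustic log-scores: acoustically, "your" wins narrowly
acoustic_scores = {
    "hello how are your": -4.1,
    "hello how are you": -4.3,
    "hello now are you": -4.2,
}
print(rescore(acoustic_scores, lm, lm_weight=0.0))  # acoustic model alone
print(rescore(acoustic_scores, lm))                 # acoustic model + language model
```

The acoustic scores alone favor "hello how are your"; the language model flips the decision because "are you" followed by the end of the sentence is well attested in the corpus.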

Slide 13

SPEECH AI CONCEPTS
3 main tasks:
ASR: Automatic Speech Recognition
NMT: Neural Machine Translation
TTS: Text To Speech

Slide 14

NEURAL MACHINE TRANSLATION
Translate text into another language
English text: “Welcome to Devoxx Belgium!” → AI model → Other-language equivalent: “Willkommen bei Devoxx Belgien!”

Slide 15

NMT PIPELINE
Translate text into another language
English text: Welcome to Devoxx Belgium
→ Tokenization: “Welcome”, “to”, “Devoxx”, “Belgium”
→ NMT encoder → Embeddings (each token's representation in its specific context), e.g. [0.93, 0.2, 0.3]
→ NMT decoder: “Willkommen”, “bei”, “Devoxx”, “Belgien”
→ Detokenization → Other-language equivalent: Willkommen bei Devoxx Belgien
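The tokenization and detokenization steps that wrap the encoder and decoder can be sketched in a few lines. This is a minimal word-level sketch with hypothetical function names; real NMT systems typically use subword tokenizers (such as BPE or SentencePiece), and the German tokens below are hand-written stand-ins for decoder output, not produced by a model.

```python
import re

TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")  # a word, or a single punctuation mark
WORD_START = re.compile(r"\w")


def tokenize(text):
    """Split text into word and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


def detokenize(tokens):
    """Join tokens back into text: words get a leading space, punctuation attaches."""
    text = ""
    for token in tokens:
        if text and WORD_START.match(token):
            text += " " + token
        else:
            text += token
    return text


source_tokens = tokenize("Welcome to Devoxx Belgium!")
# Stand-in for the NMT decoder output (hand-written, not produced by a model)
target_tokens = ["Willkommen", "bei", "Devoxx", "Belgien", "!"]
print(source_tokens)
print(detokenize(target_tokens))
```

Attaching every punctuation token to the previous word is a simplification; opening quotes or brackets would need their own rule.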

Slide 16

SPEECH AI CONCEPTS
3 main tasks:
ASR: Automatic Speech Recognition
NMT: Neural Machine Translation
TTS: Text To Speech

Slide 17

TEXT TO SPEECH
Convert text into spoken words
Text → Text-to-Speech model → Speech

Slide 18

TTS PIPELINE
Convert text into spoken words
Text → Text preprocessing → Text encoder → Pitch and duration predictor (for each phoneme) → Spectrogram generator → Vocoder → Speech
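Text preprocessing typically rewrites written forms that a TTS model cannot pronounce directly, such as digits and abbreviations, into plain words. A minimal sketch, assuming English text, a hypothetical two-entry abbreviation table and standalone integers from 0 to 99; real TTS front ends also handle dates, currencies, ordinals and context-dependent cases (for instance "St." as Saint vs. Street).

```python
import re

ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
    "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
    "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]

# Hypothetical abbreviation table, for illustration only
ABBREVIATIONS = {"Dr.": "Doctor", "St.": "Street"}


def number_to_words(n: int) -> str:
    """Spell out an integer from 0 to 99."""
    if not 0 <= n <= 99:
        raise ValueError("this sketch only handles 0-99")
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] + ("-" + ONES[ones] if ones else "")


def normalize(text: str) -> str:
    """Expand known abbreviations, then spell out standalone 1- or 2-digit numbers."""
    for abbreviation, expansion in ABBREVIATIONS.items():
        text = text.replace(abbreviation, expansion)
    # Longer digit runs (e.g. years such as 2024) are left untouched
    return re.sub(r"\b\d{1,2}\b", lambda m: number_to_words(int(m.group())), text)


print(normalize("Dr. Smith gives 2 talks at 42 Main St."))
```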

Slide 19

03 HANDS-ON-LAB

Slide 20

TABLE OF CONTENTS
Hands-On-Lab schedule:
Discover AI Endpoints
Get started with Speech AI
Focus on Speech AI key features
Develop Speech AI inference functions
Develop Gradio web app
Containerize web app
Deploy and test Video Translator app

Slide 21

TABLE OF CONTENTS
Hands-On-Lab schedule:
Discover AI Endpoints

Slide 22

DISCOVER AI ENDPOINTS
OVHcloud AI Endpoints website

Slide 23

DISCOVER AI ENDPOINTS
Catalog of AI models
“A serverless platform providing access to advanced AI models, such as LLM, NLP, translation, speech recognition, or image recognition.”
Categories: Assistant, Audio analysis, Computer vision, Embeddings, NLP, Translation

Slide 24

DISCOVER AI ENDPOINTS
Democratizing AI: “Improve your applications with AI Endpoints”
DESIGNED FOR DEVELOPERS: complete documentation, simple APIs, and code examples
COMMITTED TO PRIVACY: we do not store or share your data, during or after the use of the model
CURATED LIST OF AI MODELS: the latest models, optimized for maximum performance and accuracy
LOCK-IN FREE TECHNOLOGY: thanks to our transparency about the AI models used, clients can run these models on their own infrastructure or on other cloud services

Slide 25

DISCOVER AI ENDPOINTS
About the Speech AI Endpoints
NVIDIA: NVIDIA Riva models (optimization, exportation, storage on NGC)
OVHcloud: models stored in Object Storage and loaded by a models server, exposed as the AI Endpoints Speech AI APIs
Customer: the user sends a request over the internet to the Speech AI APIs, which run inference and return a response

Slide 26

DISCOVER AI ENDPOINTS
About the Speech AI Endpoints

Slide 27

TABLE OF CONTENTS
Hands-On-Lab schedule:
Discover AI Endpoints
Get started with Speech AI

Slide 28

GET STARTED WITH SPEECH AI
How can you use the ASR, NMT and TTS APIs easily?

Slide 29

GET STARTED WITH SPEECH AI
Connect your Speech AI Endpoints to each other!
Input audio (source language: fr-FR) → ASR endpoint → ASR result: French transcription
→ NMT endpoint (target language: en-US) → NMT result: French transcription translated into English
→ TTS endpoint (voice gender: female, voice emotion: happy) → TTS result: audio generated in English with a happy female voice
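The chaining above can be expressed as one function whose settings mirror the parameters on the slide (fr-FR, en-US, female, happy). A minimal sketch: the `asr`, `nmt` and `tts` callables are hypothetical placeholders for your own calls to the ASR, NMT and TTS endpoints, and their signatures are assumptions, not the AI Endpoints client API. Fake callables are passed in the usage example so the wiring can be checked without network access.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class SpeechTranslationSettings:
    source_language: str = "fr-FR"   # language spoken in the input audio (ASR)
    target_language: str = "en-US"   # language to translate into (NMT) and speak (TTS)
    voice_gender: str = "female"     # TTS voice options (hypothetical parameter names)
    voice_emotion: str = "happy"


def translate_speech(
    audio: bytes,
    settings: SpeechTranslationSettings,
    asr: Callable[[bytes, str], str],
    nmt: Callable[[str, str, str], str],
    tts: Callable[[str, str, str, str], bytes],
) -> bytes:
    """Feed the ASR result into NMT, then the NMT result into TTS."""
    transcript = asr(audio, settings.source_language)
    translation = nmt(transcript, settings.source_language, settings.target_language)
    return tts(translation, settings.target_language, settings.voice_gender, settings.voice_emotion)


# Usage with fake endpoints that only record what they received
fake_asr = lambda audio, lang: f"transcript[{lang}]"
fake_nmt = lambda text, src, tgt: f"{text}->{tgt}"
fake_tts = lambda text, lang, gender, emotion: f"{text}|{gender}|{emotion}".encode()
print(translate_speech(b"\x00\x00", SpeechTranslationSettings(), fake_asr, fake_nmt, fake_tts))
```

Injecting the three calls keeps the pipeline testable and lets you swap an endpoint (for example another target language model) without touching the chaining logic.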

Slide 30

TABLE OF CONTENTS
Hands-On-Lab schedule:
Discover AI Endpoints
Get started with Speech AI
Focus on Speech AI key features

Slide 31

FOCUS ON SPEECH AI KEY FEATURES
Enhance your Speech AI Endpoints with cutting-edge features:
GENERATE an SRT file
KEEP SILENCES during translation
SUPERIMPOSE audio on video
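An SRT file is plain text: numbered cues, each with a `start --> end` line using `HH:MM:SS,mmm` timestamps, then the subtitle text and a blank line. A minimal sketch, assuming the ASR and NMT steps already give you `(start_seconds, end_seconds, translated_text)` segments; the function names are hypothetical and not taken from the workshop repository.

```python
def format_srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, e.g. 3661.5 -> 01:01:01,500."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def build_srt(segments) -> str:
    """Turn (start, end, text) segments into the content of an .srt file."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{format_srt_timestamp(start)} --> {format_srt_timestamp(end)}\n{text}\n"
        )
    return "\n".join(blocks)  # blank line between cues


segments = [(0.0, 2.5, "Welcome to Devoxx Belgium!"), (2.5, 4.0, "Enjoy the lab.")]
print(build_srt(segments))
```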

Slide 32

FOCUS ON SPEECH AI KEY FEATURES
Master the Speech AI API endpoints by developing key features
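One way to keep the original pauses when dubbing is to synthesize each translated segment separately and place it at the start time of the matching source segment, filling the gaps with silence. A minimal NumPy sketch, assuming you have segment start times from the ASR step and one float audio array per segment from TTS, all at the same sample rate; the function name is hypothetical. If a translated segment is longer than the original gap, the next one simply starts later, so timing can drift.

```python
import numpy as np


def assemble_with_silences(segments, sample_rate_hz):
    """Place each synthesized segment at its original start time, padding gaps with silence.

    segments: iterable of (start_seconds, samples) pairs, samples being 1-D arrays.
    """
    pieces = []
    cursor = 0  # number of samples written so far
    for start_s, samples in sorted(segments, key=lambda seg: seg[0]):
        start = int(round(start_s * sample_rate_hz))
        if start > cursor:  # keep the original pause before this segment
            pieces.append(np.zeros(start - cursor, dtype=np.float32))
            cursor = start
        samples = np.asarray(samples, dtype=np.float32)
        pieces.append(samples)
        cursor += len(samples)
    return np.concatenate(pieces) if pieces else np.zeros(0, dtype=np.float32)


# Usage with a toy 10 Hz "sample rate" so the result is easy to read
track = assemble_with_silences([(0.5, np.ones(3)), (1.0, np.ones(2))], sample_rate_hz=10)
print(track)
```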

Slide 33

TABLE OF CONTENTS
Hands-On-Lab schedule:
Discover AI Endpoints
Get started with Speech AI
Focus on Speech AI key features
Develop Speech AI inference functions

Slide 34

DEVELOP SPEECH AI INFERENCE FUNCTIONS
Develop ASR, NMT and TTS scripts
AUDIO INPUT (WAV file in French)
→ asr.py: TRANSCRIBE the audio into text
→ nmt.py: TRANSLATE the text
→ tts.py: SYNTHESIZE the text into spoken words
→ AUDIO OUTPUT (WAV file in English)
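Before sending a WAV file to the ASR script, it can help to check its header. A minimal sketch using Python's standard `wave` module; the helper names are hypothetical, and the expected format (mono, 16-bit PCM, 16 kHz) is an assumption about what the ASR endpoint accepts, so check the endpoint documentation for the actual requirements.

```python
import io
import wave


def describe_wav(data: bytes) -> dict:
    """Read the header of an in-memory WAV file."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": wav.getframerate(),
            "duration_s": wav.getnframes() / wav.getframerate(),
        }


def looks_asr_ready(info: dict, expected_rate_hz: int = 16000) -> bool:
    # Assumption: mono, 16-bit PCM at 16 kHz; adjust to the endpoint's documented format
    return (
        info["channels"] == 1
        and info["sample_width_bytes"] == 2
        and info["sample_rate_hz"] == expected_rate_hz
    )


def silent_wav(seconds: float, rate_hz: int = 16000, channels: int = 1) -> bytes:
    """Build a silent 16-bit PCM WAV in memory (demo input)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)
        wav.setframerate(rate_hz)
        wav.writeframes(b"\x00\x00" * channels * int(seconds * rate_hz))
    return buffer.getvalue()


info = describe_wav(silent_wav(1.0))
print(info, looks_asr_ready(info))
```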

Slide 35

DEVELOP SPEECH AI INFERENCE FUNCTIONS
Use Speech AI Endpoints in Python scripts

Slide 36

TABLE OF CONTENTS
Hands-On-Lab schedule:
Discover AI Endpoints
Get started with Speech AI
Focus on Speech AI key features
Develop Speech AI inference functions
Develop Gradio web app

Slide 37

DEVELOP GRADIO WEB APP
Gradio web app overview

Slide 38

DEVELOP GRADIO WEB APP
Define the Gradio app features:
Upload a video
Transcribe the audio track into text
Subtitle the video in any language
Dub the video into another language
Choose the gender of the dubbing voice
Download the resulting video

Slide 39

DEVELOP GRADIO WEB APP
Connect ASR, NMT and TTS inside a Gradio app
Files: main.py (Gradio interface), asr.py, nmt.py, tts.py, utils.py (common functions)
Functions: video_to_audio_wav, asr_transcription / audio_to_text, nmt_translation / text_to_text_translation, tts_transcription, generate_str_file, superimpose_audio_on_video
Flow: the USER uploads video_input (.mp4) and picks a translation_mode
→ video_to_audio_wav (.mp4 → .wav) → ASR (.wav → text) → NMT (text → translated text)
→ if translation_mode == "subtitles": generate_str_file
→ if translation_mode == "voice_dubbing": TTS (text → .wav) → superimpose_audio_on_video
→ video_output (.mp4) returned to the user
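The routing on `translation_mode` can be isolated from the Gradio interface as a plain function that the interface callback calls. A minimal sketch: the step names follow the diagram, but their signatures here are assumptions (the workshop code may also pass target language and voice gender), and the lambdas in the usage example are fakes so the branching can be checked without any endpoint or video file.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VideoTranslatorSteps:
    # Field names mirror the diagram; the signatures are assumptions for this sketch
    video_to_audio_wav: Callable[[str], str]               # .mp4 path -> .wav path
    asr_transcription: Callable[[str], str]                # .wav path -> transcript
    nmt_translation: Callable[[str], str]                  # transcript -> translated text
    tts_transcription: Callable[[str], str]                # translated text -> .wav path
    generate_str_file: Callable[[str, str], str]           # video + text -> subtitled .mp4
    superimpose_audio_on_video: Callable[[str, str], str]  # video + .wav -> dubbed .mp4


def translate_video(video_input: str, translation_mode: str, steps: VideoTranslatorSteps) -> str:
    """Run the shared ASR + NMT steps, then branch on the requested output mode."""
    audio_wav = steps.video_to_audio_wav(video_input)
    transcript = steps.asr_transcription(audio_wav)
    translation = steps.nmt_translation(transcript)
    if translation_mode == "subtitles":
        return steps.generate_str_file(video_input, translation)
    if translation_mode == "voice_dubbing":
        dubbed_wav = steps.tts_transcription(translation)
        return steps.superimpose_audio_on_video(video_input, dubbed_wav)
    raise ValueError(f"unknown translation_mode: {translation_mode!r}")


# Usage with fake steps that just describe what they would do
steps = VideoTranslatorSteps(
    video_to_audio_wav=lambda video: video.replace(".mp4", ".wav"),
    asr_transcription=lambda wav: f"asr({wav})",
    nmt_translation=lambda text: f"nmt({text})",
    tts_transcription=lambda text: f"tts({text})",
    generate_str_file=lambda video, text: f"subtitled({video}, {text})",
    superimpose_audio_on_video=lambda video, wav: f"dubbed({video}, {wav})",
)
print(translate_video("talk.mp4", "subtitles", steps))
print(translate_video("talk.mp4", "voice_dubbing", steps))
```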

Slide 40

DEVELOP GRADIO WEB APP
Develop your Gradio app in Python

Slide 41

TABLE OF CONTENTS
Hands-On-Lab schedule:
Discover AI Endpoints
Get started with Speech AI
Focus on Speech AI key features
Develop Speech AI inference functions
Develop Gradio web app
Containerize web app

Slide 42

CONTAINERIZE WEB APP
Create the Dockerfile:

    FROM python:3.10
    WORKDIR /workspace
    # Copy the application code and requirements.txt into the image
    COPY . /workspace
    # ffmpeg handles the audio/video conversions; libsndfile1-dev provides audio file I/O
    RUN apt-get update && apt-get install -y ffmpeg libsndfile1-dev && rm -rf /var/lib/apt/lists/*
    RUN pip install --no-cache-dir -r requirements.txt
    # OVHcloud AI Deploy runs containers as user 42420, so give it ownership of the workspace
    RUN chown -R 42420:42420 /workspace
    ENV HOME=/workspace
    # Exec form: the JSON array requires straight double quotes
    CMD ["python3", "/workspace/main.py"]

Slide 43

TABLE OF CONTENTS
Hands-On-Lab schedule:
Discover AI Endpoints
Get started with Speech AI
Focus on Speech AI key features
Develop Speech AI inference functions
Develop Gradio web app
Containerize web app
Deploy and test Video Translator app

Slide 44

CONTAINERIZE WEB APP
Define your infrastructure requirements
COMPUTE RESOURCES: 4 CPUs
HIGH AVAILABILITY: API scalable on the fly, custom number of replicas
SECURE ACCESS: private mode, personal token access

Slide 45

DEPLOY AND TEST VIDEO TRANSLATOR APP
Introduction to the OVHcloud AI Deploy solution
01 CONTAINER AS A SERVICE: the customer provides a Docker container through a Docker registry
02 COMPUTE RESOURCES: the container runs in the cloud on GPUs (or CPUs)
03 BILLING METHOD: the customer is billed per minute used
04 API/APP DEPLOYMENT: an industrial way to deploy stateless APIs
05 SCALING STRATEGY: scalable on the fly

Slide 46

DEPLOY AND TEST VIDEO TRANSLATOR APP
Choose OVHcloud AI Deploy to deploy the solution
Managed AI Endpoints (choose API endpoints based on your needs): ASR, NMT, TTS
Custom solution on AI Deploy (deploy a custom solution as a Gradio app in the cloud): EXTRACT AUDIO (.wav), CONVERT SUBTITLES (.srt), MERGE AUDIO AND VIDEO (.mp4)
Flow: the USER uploads an input video; the app sends the input audio to the endpoints and gets back the translated text and the generated audio; the user receives the resulting video with accurately translated subtitles and/or voice dubbing in the target language
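The three custom steps (extract audio, add subtitles, merge audio and video) can all be delegated to ffmpeg, which the Dockerfile installs. A minimal sketch that only builds the ffmpeg argument lists; the helper names are hypothetical, and the workshop's utils.py may use a different tool (for example a Python video library). The `subtitles` filter requires an ffmpeg build with libass, and file names containing characters such as `:` or quotes need escaping inside the filter argument.

```python
import shlex
import subprocess


def extract_audio_cmd(video_path, wav_path, sample_rate_hz=16000):
    # -vn drops the video stream; output is mono 16-bit PCM WAV at the requested rate
    return ["ffmpeg", "-y", "-i", video_path, "-vn", "-ac", "1",
            "-ar", str(sample_rate_hz), "-acodec", "pcm_s16le", wav_path]


def burn_subtitles_cmd(video_path, srt_path, output_path):
    # The subtitles filter renders the .srt onto the frames; the audio is copied as-is
    return ["ffmpeg", "-y", "-i", video_path, "-vf", f"subtitles={srt_path}",
            "-c:a", "copy", output_path]


def merge_audio_video_cmd(video_path, audio_path, output_path):
    # Keep the original picture (no re-encoding) and replace the audio track
    return ["ffmpeg", "-y", "-i", video_path, "-i", audio_path,
            "-map", "0:v:0", "-map", "1:a:0", "-c:v", "copy", "-c:a", "aac",
            "-shortest", output_path]


def run(cmd):
    subprocess.run(cmd, check=True)  # raises CalledProcessError if ffmpeg fails


print(shlex.join(extract_audio_cmd("input.mp4", "input.wav")))
```

`-shortest` stops the output at the end of the shorter stream; drop it if the dubbed audio may be shorter than the video and you want to keep the full picture.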

Slide 47

DEPLOY AND TEST VIDEO TRANSLATOR APP
Launch the app deployment in the cloud

Slide 48

DEPLOY AND TEST VIDEO TRANSLATOR APP
Live testing time! https://bit.ly/video-translator-devoxx-be

Slide 49

04 CHALLENGES OF SPEECH AI

Slide 50

CHALLENGES OF SPEECH AI
Language and accent
Noise, background sounds and microphone quality
Specific vocabulary and personal information
Emotions, ambiguity, tone
Speech overlaps (also for diarization)
Real-time transcription

Slide 51

05 CONCLUSION

Slide 52

CONCLUSION
AI Endpoints and the Video Translator in summary
Speech AI models: transcribe any audio into text; transform text into spoken words; accurate models in several languages; speech and translation AI models can work together; real-time conversational AI
OVHcloud AI Endpoints: cutting-edge GenAI and ML models; simple, secure and ready-to-use AI APIs; easy integration for next-generation solutions
OVHcloud AI Deploy: turnkey serverless solution; Container as a Service platform; the best GPUs (H100, A100, L4, L40S, V100S); per-minute billing
Going further: build your own solution based on open-source models; create an audio virtual assistant in less than 100 lines of code; transcribe and summarize any meeting using diarization
Be creative!

Slide 53

CONCLUSION
Find the Hands-On-Lab resources: https://github.com/eleapttn/workshop-mastering-speech-ai.git

Slide 54

06 REFERENCES

Slide 55

REFERENCES
AI Endpoints and the Video Translator in summary
AI Endpoints
https://endpoints.ai.cloud.ovh.net/
GitHub repositories
https://github.com/eleapttn/workshop-mastering-speech-ai.git
https://github.com/ovh/public-cloud-examples/tree/main/ai/ai-endpoints
Blog articles
https://blog.ovhcloud.com/master-speech-ai-and-build-your-own-video-translator-app-with-ai-endpoints/
https://blog.ovhcloud.com/how-to-build-a-speech-to-text-application-with-python-1-3/
https://blog.ovhcloud.com/build-a-powerful-audio-virtual-assistant-with-ai-endpoints/
https://blog.ovhcloud.com/create-audio-summarizer-assistant-with-ai-endpoints/

Slide 56

THANK YOU!