Mojo

A presentation at Breizh Data Day in March 2024 in 22000 Saint-Brieuc, France by Jean-Luc Tromparent

Slide 1

Mojo 🔥 The future language of AI ?
Jean-Luc TROMPARENT, HELLOWORK
Version 20240325

Slide 2

AI in 2024 • Value Proposition • Use Case • CONCLUSION

Slide 3

What is the current language of AI ?

Slide 4

ChatGPT Advisor : AI, or artificial intelligence, can be developed and programmed in various programming languages. Some of the languages most commonly used to build AI systems include Python, Java, C++ and R, among others. Python is particularly popular in the AI field because of its simplicity, its flexibility, its wide range of AI-focused libraries and frameworks (such as TensorFlow, PyTorch, scikit-learn, etc.), and its active developer community.

Slide 5

StackOverflow Advisor

Slide 6

StackOverflow Advisor

Slide 7

StackOverflow Advisor

Slide 8

StackOverflow Advisor

Slide 9

StackOverflow Advisor

Slide 10

How to choose a framework ? https://www.youtube.com/watch?v=k4Tfg6-7cyQ

Slide 11

AI programming landscape : Model / System / Hardware (CUDA, OpenCL, ROCm)

Slide 12

NEW KiD in ToWN ! Mojo 🔥 02/05/2023

Slide 13

Value proposition https://www.modular.com/blog/the-future-of-ai-depends-on-modularity

Slide 14

Value proposition Modular Accelerated eXecution platform https://www.modular.com/blog/a-unified-extensible-platform-to-superpower-your-ai

Slide 15

Value proposition
• Member of the Python family (a superset of Python)
• Supports modern chip architectures (thanks to MLIR)
• Predictable low-level performance
https://www.modular.com/blog/a-unified-extensible-platform-to-superpower-your-ai

Slide 16

Mojo is born ! Chris Lattner :
2000 : beginning of the LLVM project
2003 : release of LLVM 1.0
2007 : beginning of the Clang project
2008 : Xcode 3.1
2011 : Clang replaces GCC on macOS
2014 : release of Swift 1.0
2018 : beginning of the MLIR project
2022 : creation of the Modular company
2023 : Mojo 🔥
https://www.nondot.org/sabre/

Slide 17

Mojo is blazing fast ! https://www.modular.com/blog/how-mojo-gets-a-35-000x-speedup-over-python-part-1

Slide 18

Mojo is blazing fast ! Changelog :
2022/01 : incorporation
2022/07 : seed round ($30M)
2023/05 : MAX & Mojo announced
2023/08 : Series B ($100M)
2023/09 : Mojo 0.2.1 released
2023/10 : Mojo 0.4.0 released
...
2024/01 : Mojo 0.7.0 released
2024/02 : MAX & Mojo 24.1 released
https://www.modular.com/blog/how-mojo-gets-a-35-000x-speedup-over-python-part-1

Slide 19

Mojo is blazing fast ! https://www.modular.com/blog/how-mojo-gets-a-35-000x-speedup-over-python-part-1

Slide 20

Performance matters !
Performance matters :
• for our users

Slide 21

Performance matters ! Your resume is being processed

Slide 22

Performance matters !
Performance matters :
• for our users
• for (artificial) intelligence

Slide 23

Performance matters !

Slide 24

Performance matters !
Performance matters :
• for our users
• for (artificial) intelligence
• for the planet

Slide 25

Performance matters ! https://haslab.github.io/SAFER/scp21.pdf

Slide 26

Meetup Python-Rennes https://www.meetup.com/fr-FR/python-rennes/ https://www.youtube.com/watch?v=gE6HUsmh554

Slide 27

Performance matters !
Performance matters :
• for our users
• for (artificial) intelligence
• for the planet

Slide 28

It’s demo time ! Laplacian filter (edge detection)

Slide 29

Edge Detection

Slide 30

Edge Detection kernel Convolve

Slide 31

Edge Detection 2D Convolution Animation — Michael Plotke, CC BY-SA 3.0 via Wikimedia Commons

Slide 32

Edge Detection 2D Convolution Animation — Michael Plotke, CC BY-SA 3.0 via Wikimedia Commons

Slide 33

Edge Detection 2D Convolution Animation — Michael Plotke, CC BY-SA 3.0 via Wikimedia Commons
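The animation above shows a 3x3 window sliding over the image. As a reference, here is a minimal plain-Python sketch of that convolution with the Laplacian kernel, clamped to the valid grey range as on the slides (illustrative only, not the benchmarked code):

```python
# Laplacian kernel: sum of the 4 neighbours minus 4x the centre pixel,
# so flat regions give 0 and edges give large values.
LAPLACIAN = [[0, 1, 0],
             [1, -4, 1],
             [0, 1, 0]]

def convolve3x3(img, kernel):
    """Convolve a 2D list of grey levels, skipping the 1-pixel border."""
    h, w = len(img), len(img[0])
    result = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            acc = 0.0
            for k in range(3):
                for l in range(3):
                    acc += img[y - 1 + k][x - 1 + l] * kernel[k][l]
            # Clamp to the valid grey range [0, 255]
            result[y][x] = min(255, max(0, acc))
    return result
```

On a flat image the interior result is all zeros; a vertical step edge produces a bright line, which is exactly the edge-detection effect demonstrated in the talk.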

Slide 34

Python implementation

Slide 35

Python implementation

Slide 36

Python implementation

Slide 37

Python implementation

Slide 38

Python implementation

Slide 39

Berkeley Segmentation Data Set 500 (BSDS500)

Slide 40

Python implementation

Slide 41

Python implementation

Slide 42

Python implementation (numpy)
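The slide's NumPy code is not reproduced in this export; a hypothetical sketch of the "numpy mul" idea replaces the two innermost loops with a single element-wise multiply over a 3x3 slice, which is where the roughly 2x speedup over the naïve version comes from:

```python
import numpy as np

# Hypothetical NumPy variant: the two innermost loops become one
# element-wise multiply + sum over a 3x3 view (no copy is made).
LAPLACIAN = np.array([[0, 1, 0],
                      [1, -4, 1],
                      [0, 1, 0]], dtype=np.float32)

def convolve_numpy(img: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    h, w = img.shape
    result = np.zeros((h, w), dtype=np.float32)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # 3x3 window * kernel, summed, then clamped to [0, 255]
            acc = (img[y - 1:y + 2, x - 1:x + 2] * kernel).sum()
            result[y, x] = min(255, max(0, acc))
    return result
```

The two outer pixel loops remain in Python, which is why this version still trails the Numba and OpenCV variants in the recap.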

Slide 43

Python implementation (numpy+numba)

Slide 44

Python implementation (numpy+numba)

Slide 45

Python implementation (numpy+numba)

Slide 46

Python implementation (opencv)

Slide 47

Python implementation (opencv)

Slide 48

Recap :
naïve version : 500 ms
numpy mul : 250 ms (x2)
numpy+numba : 50 ms (x10)
opencv : 0.5 ms (x1000)
And now in mojo ?

Slide 49

And now in mojo ! https://www.modular.com/blog/implementing-numpy-style-matrix-slicing-in-mojo

Slide 50

Mojo : let’s create a Matrix

Slide 51

Mojo : module

Slide 52

Mojo : naive.mojo

Slide 53

Mojo : naive.mojo

Slide 54

Mojo : interoperability with Python

Slide 55

Mojo : interoperability with Python

Slide 56

Mojo : interoperability with Python

Slide 57

Mojo : loading PGM picture

Slide 58

Mojo : loading PGM picture
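The slide's loader is not in this export; as an illustration of the format, here is a hypothetical minimal parser for the ASCII ("P2") PGM variant (the demo may well use the binary "P5" variant instead):

```python
def load_pgm_ascii(text: str):
    """Parse an ASCII ('P2') PGM image into (width, height, pixels).

    Minimal sketch: skips comment lines, assumes a well-formed file.
    """
    tokens = []
    for line in text.splitlines():
        line = line.split('#', 1)[0]  # strip '#' comments
        tokens.extend(line.split())
    assert tokens[0] == 'P2', 'only the ASCII PGM variant is handled here'
    # Header: magic number, width, height, maximum grey value
    width, height, maxval = int(tokens[1]), int(tokens[2]), int(tokens[3])
    values = [int(t) for t in tokens[4:4 + width * height]]
    # Reshape the flat value list into rows
    pixels = [values[r * width:(r + 1) * width] for r in range(height)]
    return width, height, pixels
```

PGM is a convenient demo format precisely because it is a plain grid of grey levels, i.e. it maps directly onto the Matrix type built on the previous slides.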

Slide 59

Mojo : naive.mojo

Slide 60

Mojo : naive.mojo

Slide 61

Mojo implementation

Slide 62

It’s demo time ! Let’s optimize !

Slide 63

SISD Architecture

Slide 64

SIMD Architecture

Slide 65

Algorithm vectorization

fn naive(img: Matrix[DType.float32], kernel: Matrix[DType.float32]) -> Matrix[DType.float32]:
    var result = Matrix[DType.float32](img.height, img.width)
    # Loop through each pixel in the image
    # but skip the outer edges of the image
    for y in range(1, img.height - 1):
        for x in range(1, img.width - 1):
            # For each pixel, compute the element-wise product
            var acc: Float32 = 0
            for k in range(3):
                for l in range(3):
                    acc += img[y - 1 + k, x - 1 + l] * kernel[k, l]
            # Normalize the result
            result[y, x] = min(255, max(0, acc))
    return result

Slide 66

Algorithm vectorization

alias nelts = simdwidthof[DType.float32]()

fn naive(img: Matrix[DType.float32], kernel: Matrix[DType.float32]) -> Matrix[DType.float32]:
    var result = Matrix[DType.float32](img.height, img.width)
    # Loop through each pixel in the image
    # but skip the outer edges of the image
    for y in range(1, img.height - 1):
        for x in range(1, img.width - 1):
            # For each pixel, compute the element-wise product
            var acc: Float32 = 0
            for k in range(3):
                for l in range(3):
                    acc += img[y - 1 + k, x - 1 + l] * kernel[k, l]
            # Normalize the result
            result[y, x] = min(255, max(0, acc))
    return result

Slide 67

Algorithm vectorization

alias nelts = simdwidthof[DType.float32]()

fn naive(img: Matrix[DType.float32], kernel: Matrix[DType.float32]) -> Matrix[DType.float32]:
    var result = Matrix[DType.float32](img.height, img.width)
    # Loop through each pixel in the image
    # but skip the outer edges of the image
    for y in range(1, img.height - 1):
        for x in range(1, img.width - 1, nelts):
            # For each pixel, compute the element-wise product
            var acc: Float32 = 0
            for k in range(3):
                for l in range(3):
                    acc += img[y - 1 + k, x - 1 + l] * kernel[k, l]
            # Normalize the result
            result[y, x] = min(255, max(0, acc))
    return result

Slide 68

Algorithm vectorization

alias nelts = simdwidthof[DType.float32]()

fn naive(img: Matrix[DType.float32], kernel: Matrix[DType.float32]) -> Matrix[DType.float32]:
    var result = Matrix[DType.float32](img.height, img.width)
    # Loop through each pixel in the image
    # but skip the outer edges of the image
    for y in range(1, img.height - 1):
        for x in range(1, img.width - 1, nelts):
            # For each pixel, compute the element-wise product
            # var acc: Float32 = 0
            var acc: SIMD[DType.float32, nelts] = 0
            for k in range(3):
                for l in range(3):
                    acc += img[y - 1 + k, x - 1 + l] * kernel[k, l]
            # Normalize the result
            result[y, x] = min(255, max(0, acc))
    return result

Slide 69

Algorithm vectorization

alias nelts = simdwidthof[DType.float32]()

fn naive(img: Matrix[DType.float32], kernel: Matrix[DType.float32]) -> Matrix[DType.float32]:
    var result = Matrix[DType.float32](img.height, img.width)
    # Loop through each pixel in the image
    # but skip the outer edges of the image
    for y in range(1, img.height - 1):
        for x in range(1, img.width - 1, nelts):
            # For each pixel, compute the element-wise product
            # var acc: Float32 = 0
            var acc: SIMD[DType.float32, nelts] = 0
            for k in range(3):
                for l in range(3):
                    # acc += img[y - 1 + k, x - 1 + l] * kernel[k, l]
                    acc += img.simd_load[nelts](y - 1 + k, x - 1 + l) * kernel[k, l]
            # Normalize the result
            # result[y, x] = min(255, max(0, acc))
            result.simd_store[nelts](y, x, min(255, max(0, acc)))
    return result

Slide 70

Algorithm vectorization

fn naive(img: Matrix[DType.float32], kernel: Matrix[DType.float32]) -> Matrix[DType.float32]:
    var result = Matrix[DType.float32](img.height, img.width)
    # Loop through each pixel in the image
    # but skip the outer edges of the image
    for y in range(1, img.height - 1):
        for x in range(1, img.width - 1, nelts):
            # For each pixel, compute the element-wise product
            # var acc: Float32 = 0
            var acc: SIMD[DType.float32, nelts] = 0
            for k in range(3):
                for l in range(3):
                    # acc += img[y - 1 + k, x - 1 + l] * kernel[k, l]
                    acc += img.simd_load[nelts](y - 1 + k, x - 1 + l) * kernel[k, l]
            # Normalize the result
            # result[y, x] = min(255, max(0, acc))
            result.simd_store[nelts](y, x, min(255, max(0, acc)))
        # Handle remaining elements with scalars
        for n in range(nelts * ((img.width - 1) // nelts), img.width - 1):
            var acc: Float32 = 0
            for k in range(3):
                for l in range(3):
                    acc += img[y - 1 + k, n - 1 + l] * kernel[k, l]
            result[y, n] = min(255, max(0, acc))
    return result

Slide 71

Algorithm vectorization

alias nelts = simdwidthof[DType.float32]()

fn naive(img: Matrix[DType.float32], kernel: Matrix[DType.float32]) -> Matrix[DType.float32]:
    var result = Matrix[DType.float32](img.height, img.width)
    # Loop through each pixel in the image
    # but skip the outer edges of the image
    for y in range(1, img.height - 1):

        @parameter
        fn dot[nelts: Int](x: Int):
            # For each pixel, compute the element-wise product
            var acc: SIMD[DType.float32, nelts] = 0
            for k in range(3):
                for l in range(3):
                    acc += img.simd_load[nelts](y - 1 + k, x - 1 + l) * kernel[k, l]
            # Normalize the result
            result.simd_store[nelts](y, x, min(255, max(0, acc)))

        vectorize[dot, nelts](img.width - 2)

    return result

Slide 72

Algorithm vectorization

alias nelts = simdwidthof[DType.float32]()

fn vectorized(img: Matrix[DType.float32], kernel: Matrix[DType.float32]) -> Matrix[DType.float32]:
    var result = Matrix[DType.float32](img.height, img.width)
    # Loop through each pixel in the image
    # but skip the outer edges of the image
    for y in range(1, img.height - 1):

        @parameter
        fn dot[nelts: Int](x: Int):
            # For each pixel, compute the element-wise product
            var acc: SIMD[DType.float32, nelts] = 0
            for k in range(3):
                for l in range(3):
                    acc += img.simd_load[nelts](y - 1 + k, x + l) * kernel[k, l]
            # Normalize the result
            result.simd_store[nelts](y, x + 1, min(255, max(0, acc)))

        vectorize[dot, nelts](img.width - 2)

    return result
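Mojo's `vectorize` helper applies `dot` across the row in SIMD-width chunks and then handles the scalar tail automatically, which is exactly the boilerplate the previous slide wrote by hand. A rough Python model of that control flow (hypothetical names, not Mojo's actual implementation):

```python
def vectorize(func, nelts, size):
    """Call func(width, start) over full chunks of nelts, then the tail.

    Rough Python model of the chunk/remainder control flow only.
    """
    x = 0
    while x + nelts <= size:
        func(nelts, x)   # full SIMD-width chunk
        x += nelts
    while x < size:
        func(1, x)       # leftover elements, one at a time
        x += 1
```

For example, with a SIMD width of 4 and a row of 10 elements, `func` is called on chunks starting at 0 and 4, then on the two leftover scalars at 8 and 9.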

Slide 73

Benchmark results

Slide 74

Recap
• Far from stable
• AOT or JIT compilation
• Python-friendly but not Python
• Dynamic Python vs static Mojo
• Python interoperability
• Predictable behavior with ownership semantics
• Low-level optimization
• Blazingly fast

Slide 75

Mojo 🔥 The future language of AI ?

Slide 76

Conclusion
• Python is not dead yet ! But it moves slowly
• This is a great team ! Will they be able to deploy their platform strategy ?
• Will they be able to unite a community ? To be open-source or not to be

Slide 77

Jean-Luc Tromparent, Principal Engineer @ HELLOWORK
https://linkedin.com/in/jltromparent
THANK YOU !
https://github.com/jiel/laplacian_filters_benchmark
https://noti.st/jlt/5Ym6LX/mojo
👉 Feedback at slido.com #1245 954