A presentation at Remote Python Pizza by Ong Chin Hwee
Just-in-Time with Numba
Presented by: Ong Chin Hwee (@ongchinhwee)
25 April 2020
Remote Python Pizza
About me
Ong Chin Hwee 王敬惠
● Data Engineer @ ST Engineering
● Background in aerospace engineering + computational modelling
● Contributor to pandas 1.0 release
● Mentor team at BigDataX
@ongchinhwee
Bottlenecks in a data science project
● Lack of data / Poor quality data
● Data Preprocessing
  ○ The 80/20 data science dilemma
    ■ In reality, it's closer to 90/10
  ○ Slow processing speeds in Python!
    ■ Python code is run by an interpreter rather than compiled ahead of time to machine code
Compiled vs Interpreted Languages
Compiled: Written Code → Compiler → Compiled Code in Target Language → Linker → Machine Code (executable) → Loader → Execution
Interpreted: Written Code → Compiler → Lower-level bytecode → Virtual Machine → Execution
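CPython follows the interpreted path: it compiles source to bytecode and then runs that bytecode on its virtual machine. The standard-library dis module makes the intermediate step visible. add_one is a hypothetical example function, and the exact opcode names printed differ between Python versions.

```python
import dis


def add_one(x):
    """Hypothetical example function."""
    return x + 1


# The `def` statement above already compiled the body to CPython bytecode;
# the raw bytes are stored on the function's code object.
bytecode = add_one.__code__.co_code
print(f"{len(bytecode)} bytes of bytecode")

# dis.Bytecode decodes those bytes into the instructions that the CPython
# virtual machine executes one by one at runtime (names vary by version).
for instr in dis.Bytecode(add_one):
    print(instr.opname, instr.argrepr)
```

Nothing here is turned into native machine code; every instruction is dispatched by the interpreter loop, which is the overhead a JIT compiler removes.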
What is Just-in-Time?
Just-In-Time (JIT) compilation
● Converts source code (or intermediate bytecode) into native machine code at runtime
● Is why Java, which runs on the Java Virtual Machine (JVM), can reach performance comparable to ahead-of-time compiled languages (e.g. C/C++, Go)
Just-in-Time with Numba
numba module
● Just-in-Time (JIT) compiler for Python that converts Python functions into machine code
● Used by applying a decorator (a wrapper) to a function, which instructs numba to compile it
● Two modes of execution:
  ○ njit (nopython mode: compiles Numba-compatible code entirely to machine code)
  ○ jit (object mode compilation with "loop-lifting"; in Numba 0.59 and later, @jit defaults to nopython mode and object mode must be requested with forceobj=True)
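The decorator workflow can be sketched as follows, assuming Numba is installed. The try/except stand-in is an assumption added only so the snippet also runs (uncompiled) without Numba, and sum_of_squares is a hypothetical function. With Numba, the first call includes type inference and compilation, so it is typically much slower than the second; actual timings depend on the machine.

```python
import time

import numpy as np

try:
    from numba import njit
    NUMBA_AVAILABLE = True
except ImportError:
    # Assumption: a no-op stand-in so the sketch still runs (uncompiled)
    # when Numba is not installed.
    NUMBA_AVAILABLE = False

    def njit(func):
        return func


@njit  # nopython mode; equivalent to @jit(nopython=True)
def sum_of_squares(arr):
    """Hypothetical example: an explicit loop that nopython mode can compile."""
    total = 0.0
    for x in arr:
        total += x * x
    return total


data = np.arange(100_000, dtype=np.float64)

start = time.perf_counter()
sum_of_squares(data)  # first call: Numba infers argument types and compiles
first_call = time.perf_counter() - start

start = time.perf_counter()
result = sum_of_squares(data)  # later calls reuse the compiled machine code
second_call = time.perf_counter() - start

print(f"Numba available: {NUMBA_AVAILABLE}")
print(f"first call:  {first_call:.5f} s")
print(f"second call: {second_call:.5f} s")
```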
Numba Compiler Architecture
Numba frontend: Lower-level bytecode → Numba interpreter → Numba IR → Type inference → Typed Numba IR
Numba backend: Typed Numba IR → Lowering (codegen) → LLVM IR → LLVM JIT Compiler → Machine Code (executable)
IR: Intermediate Representation
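Both halves of this pipeline can be inspected on a compiled function. A sketch, assuming Numba is installed: the dispatcher returned by njit exposes signatures (the argument types the frontend inferred), inspect_types() (the typed Numba IR, annotated against the source) and inspect_llvm() (the LLVM IR produced by lowering, keyed by signature). add_arrays is a hypothetical example function.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    njit = None  # Numba missing: fall back to plain Python below


def add_arrays(a, b):
    """Hypothetical example: element-wise sum written as an explicit loop."""
    out = np.empty_like(a)
    for i in range(a.shape[0]):
        out[i] = a[i] + b[i]
    return out


x = np.arange(5, dtype=np.float64)

if njit is None:
    print("Numba not installed; running the plain Python version")
    result = add_arrays(x, x)
else:
    compiled = njit(add_arrays)
    result = compiled(x, x)            # triggers type inference + compilation
    print(compiled.signatures)         # frontend: inferred argument types
    compiled.inspect_types()           # frontend: typed Numba IR, printed to stdout
    llvm_ir = compiled.inspect_llvm()  # backend: {signature: LLVM IR text}
    for sig, ir_text in llvm_ir.items():
        print(sig)
        print(ir_text[:300])           # first few lines of the LLVM IR
```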
Practical Implementation
Initialize File List in Directory

    import numpy as np
    import os
    import sys
    import time

    DIR = './chest_xray/train/NORMAL/'
    train_normal = [DIR + name for name in os.listdir(DIR)
                    if os.path.isfile(os.path.join(DIR, name))]

No. of images in 'train/NORMAL': 1431
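With numba

    import numpy as np
    from PIL import Image
    from numba import jit

    # Code runs in object mode: PIL images are Python objects that nopython
    # mode cannot compile. The slide used a bare @jit, which fell back to object
    # mode in Numba releases of the time; forceobj=True requests it explicitly.
    @jit(forceobj=True)
    def image_proc(image_path):
        '''Convert + resize image (takes a file path from train_normal)'''
        im = Image.open(image_path)
        im = im.convert("RGB")
        im_resized = np.array(im.resize((64, 64)))
        return im_resized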
With numba

    start_cpu_time = time.process_time()  # time.clock() was removed in Python 3.8
    listcomp_output = np.array([image_proc(x) for x in train_normal])
    end_cpu_time = time.process_time()
    total_tpe_time = end_cpu_time - start_cpu_time
    sys.stdout.write('List comprehension completed in {} seconds.\n'.format(
        total_tpe_time))

Python-only: 218.1 seconds
After compilation: 169.6 seconds
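Object mode exists because nopython mode can only compile types Numba understands. A sketch, assuming Numba is installed; Reading and scaled are hypothetical stand-ins for an untyped Python object such as a PIL image. With forceobj=True the function compiles in object mode, where operations on the object still go through the Python object machinery, while njit rejects the same function with a typing error at its first call. This is consistent with the modest gain measured for the image pipeline.

```python
try:
    from numba import jit, njit
    NUMBA_AVAILABLE = True
except ImportError:
    NUMBA_AVAILABLE = False


class Reading:
    """Hypothetical plain Python class; Numba cannot type its instances."""

    def __init__(self, value):
        self.value = value


def scaled(reading, factor):
    return reading.value * factor


r = Reading(3.0)

if NUMBA_AVAILABLE:
    # Object mode: compiles, but attribute access on the Python object
    # still goes through the CPython object machinery, so gains are limited.
    scaled_object_mode = jit(forceobj=True)(scaled)
    print("object mode:", scaled_object_mode(r, 2.0))

    # nopython mode: compilation happens at the first call and fails,
    # because `Reading` has no Numba type.
    scaled_nopython = njit(scaled)
    try:
        scaled_nopython(r, 2.0)
    except Exception as exc:  # a numba TypingError in current releases
        print("nopython mode failed with", type(exc).__name__)
else:
    print("Numba not installed; plain Python result:", scaled(r, 2.0))
```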
With numba

    import numpy as np
    from numba import njit

    @njit  # runs in no-Python/native machine mode (@njit or @jit(nopython=True))
    def square(a_list):
        '''Calculate square of number in a_list'''
        squared_list = []
        for x in a_list:
            squared_list.append(np.square(x))
        return squared_list
With numba

    a_list = np.arange(1, 100000)  # same values as np.array([i for i in range(1, 100000)])

    square(a_list)  # warm-up: the first call triggers compilation and is not timed

    start_cpu_time = time.time()
    listcomp_array_output = square(a_list)
    end_cpu_time = time.time()
    total_tpe_time = end_cpu_time - start_cpu_time
    sys.stdout.write(
        'Elapsed (after compilation) {} seconds.\n'.format(total_tpe_time))

Python-only: 0.51544 seconds
After compilation: 0.00585 seconds
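The square loop can be compared against two alternatives. A sketch, assuming Numba is installed (with a no-op stand-in otherwise): square_list mirrors the square function above, while square_prealloc (writing into a preallocated NumPy array) and a plain np.square call are alternatives added for comparison, not part of the talk. Each candidate is called once before timing so that compilation is excluded, matching the "after compilation" measurement; no timings are claimed since they depend on the machine.

```python
import time

import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption: no-op stand-in so the sketch runs without Numba
    def njit(func):
        return func


@njit
def square_list(a_list):
    """Same approach as the slide: append each square to a list."""
    squared_list = []
    for x in a_list:
        squared_list.append(np.square(x))
    return squared_list


@njit
def square_prealloc(arr):
    """Variant: write into a preallocated NumPy array instead of a list."""
    out = np.empty_like(arr)
    for i in range(arr.shape[0]):
        out[i] = arr[i] * arr[i]
    return out


def timed(func, arg):
    """Call once to warm up (compile), then time a second call."""
    func(arg)
    start = time.perf_counter()
    result = func(arg)
    return result, time.perf_counter() - start


a = np.arange(1, 100000)

for name, func in [("list append", square_list),
                   ("preallocated array", square_prealloc),
                   ("NumPy vectorized", np.square)]:
    _, elapsed = timed(func, a)
    print(f"{name}: {elapsed:.5f} s")
```

Preallocating avoids growing a list one element at a time and returns a NumPy array directly. For an operation this simple, the vectorized np.square is often competitive; Numba tends to pay off most when the loop body has no single NumPy equivalent.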
Key Takeaways
Just-in-Time with numba
● Just-in-Time (JIT) compilation with numba
  ○ converts Python functions into native machine code at runtime
  ○ may not work for some functions/modules: code that nopython mode cannot compile (e.g. functions working on PIL image objects) runs in object mode, through the Python interpreter, with a smaller speedup
  ○ gives the largest speedups on numerical code: 0.51544 s → 0.00585 s for the NumPy square loop, versus 218.1 s → 169.6 s for the image pipeline
Reach out to me!
ongchinhwee
@ongchinhwee
hweecat
https://ongchinhwee.me
And check out my slides on: hweecat/talk_jit-numba
Facing slow processing times in your numerical codes? In this lightning talk, we will explore JIT compilation in Numba.
Here’s what was said about this presentation on social media.
#JIT, Just-in-Time, with #Numba🐍😎🚀!!!
Thanks Ong @ongchinhwee, one of the first talk @pythonpizzaconf 🍕 #remotepythonpizza #python pic.twitter.com/jxwarOAtkI
— Enrica (@enricapq) April 25, 2020
We had @ongchinhwee explaining how JIT compilation works in Numba. Numba is one of these tools I bump into every now and then and never really looked into it in details, maybe now it's the time 😅
— Ania Kapuścińska (@lambdanis) April 25, 2020
Ong Ching Hwee @ongchinhwee joined #remotepythonpizza from Singapore to talk about Numba and JIT compilation 👏🏼 pic.twitter.com/6vHMgvxTZG
— Alessia Marcolini (@viperale) April 25, 2020
Kudos to @ongchinhwee for explaining the non-trivial topics of Numba, JIT, LLVM, and more, in 10 minutes. Really good talk! #remotepythonpizza
— 「Cristián」 (@cmaureir) April 25, 2020
I just loved @ongchinhwee talk at #remotepythonpizza I've hear about #numba in #Python before, now I'm super curious to play with it!
— Claudia Millán (@cheshireminima) April 25, 2020
I am really digging this talk by @ongchinhwee ! @pythonpizzaconf
#remotepythonpizza pic.twitter.com/zzTAKwAFcL
— Laurent 🏄🏻‍♂️ Swift (@wrmultitudes) April 25, 2020
Great talks this morning @pythonpizzaconf. Particularly enjoyed @numba_jit talk by @ongchinhwee, @olgamatoula on deps, @hackebrot on comp vs. inheritance, Norbert's ice crystal analysis, @eumiro getting home & @hendorf on @spacy_io 🙌🏼 Looking forward to the afternoon ahead!
— Ben 👨🏻💻 (@hum_annoyed) April 25, 2020