Modern PWA, a winning combo for the best client experience

A presentation at PWA Summit in October 2022 by Kenneth Rohde Christiansen

Slide 1

Modern PWA: A winning combo for the best client experience
Kenneth Christiansen, Anssi Kostiainen

Slide 2

The web is an unquestioned Key Application Platform
The unique qualities of the web, strengthened
• Offer the core building blocks for application development
• Remain cross platform to work across all desktop operating systems
• Achieve great, near-native performance
• Offer capabilities and common features available to native apps
• Be more responsive: load instantly, be responsive from the get-go
• Meet and exceed the user's needs for a safe environment

Slide 3

The Web is becoming an unrivaled Application Development Platform
• Solid application building blocks
  • Web App Manifest and Service Workers
  • CSS Grid, Flexbox, Container Queries
  • Design Systems powered by Web Components
  • URLPattern, Navigation API and Shared Element Transitions
• Relentless focus on performance
  • WebAssembly, SIMD optimizations
  • MediaPipe with native support for framing, background blurring etc.
  • Native support for machine learning
• Innovating on capabilities via Project Fugu, e.g., File System Access
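
The first building block on this slide, a service worker shipped next to a Web App Manifest, might be wired up roughly as follows. This is a minimal sketch, not code from the talk: the `registerServiceWorker` helper and the `/sw.js` path are illustrative assumptions, while `navigator.serviceWorker.register()` is the standard call.

```javascript
// Hypothetical helper: registers a service worker when the browser supports it.
// `nav` defaults to the global navigator; it is a parameter so the function
// can also be exercised outside a browser. "/sw.js" is an assumed example path.
async function registerServiceWorker(nav = globalThis.navigator, url = "/sw.js") {
  if (!nav || !("serviceWorker" in nav)) {
    return null; // No support: the page simply keeps working without offline features.
  }
  // Standard API: resolves with a ServiceWorkerRegistration.
  return nav.serviceWorker.register(url);
}

// In a page this would typically be called once the page has loaded:
// window.addEventListener("load", () => registerServiceWorker());
```

Returning `null` instead of throwing keeps the service worker a progressive enhancement rather than a hard requirement.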

Slide 4

• Platforms/ecosystems are only as important as the experience they bring
• The cost of creating high-quality apps for the target audience is often a balancing act
• Today it is often mobile first, with a web-based (or hybrid) solution for the desktop

Slide 5

270%   31.9%
• Cross platform with great support on desktop OSes like Windows and ChromeOS
• Most browsers are built around open-source projects which we can contribute to
• Web features are being standardized in the open by interested parties
• Interoperability and backwards compatibility ensure the platform keeps evolving
[1] Source: Slashdata – Dev Nation ’22
[2] Source: Intel DCA, Microsoft
[3] https://firt.dev/pwa-2021, https://www.emergenresearch.com/industry-report/progressive-web-application-market

Slide 6

† Source: Intel DCA, July 2022 * Other names and brands may be claimed as the property of others.

Slide 7

Performance and capabilities + an enjoyable user experience
• Core capabilities for desktop apps are now built in
• Seamless copy and paste
• Frictionless access to local files
• Safe access to external hardware for education, hobbyists, and enterprise usage
Project Fugu effort

Slide 8

Made possible with File System Access API
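
For readers unfamiliar with the File System Access API, opening a local file could look roughly like this. It is a sketch under assumptions: the `readPickedTextFile` helper is hypothetical and not from the talk; `showOpenFilePicker()`, `getFile()` and `text()` are the standard calls. The picker can only be opened in response to a user gesture such as a click.

```javascript
// Hypothetical helper: lets the user pick one file and reads it as text.
// `pick` defaults to window.showOpenFilePicker (File System Access API);
// it is injectable so the logic can be exercised without a browser.
async function readPickedTextFile(pick = globalThis.showOpenFilePicker) {
  if (typeof pick !== "function") {
    throw new Error("File System Access API is not available");
  }
  // showOpenFilePicker() resolves with an array of FileSystemFileHandle objects.
  const [handle] = await pick();
  const file = await handle.getFile(); // A regular File object.
  return { name: file.name, text: await file.text() };
}

// Example wiring in a page (the button is an assumed element):
// button.addEventListener("click", async () => {
//   const { name, text } = await readPickedTextFile();
//   console.log(name, text.length);
// });
```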

Slide 9

Made possible with a set of new APIs
• High-performance storage: the Origin Private File System provides highly optimized in-place and exclusive write access
• WebAssembly to bring existing C++ code to the web
• Dynamic multithreading support for WebAssembly
• SIMD: Halide is essential to Adobe’s performance and it provides a 3-4× speedup on average and in some cases an 80-160× speedup
• P3 colorspace support for canvas
• Web Components
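
The "in-place and exclusive write access" of the Origin Private File System can be sketched as below. This is an illustration under assumptions, not code from the talk: the `writeToPrivateFile` helper and the file name are hypothetical, and `createSyncAccessHandle()` is only exposed inside dedicated workers.

```javascript
// Hypothetical helper, intended to run in a dedicated worker.
// `root` is the directory returned by navigator.storage.getDirectory().
async function writeToPrivateFile(root, name, bytes) {
  const fileHandle = await root.getFileHandle(name, { create: true });
  // The sync access handle takes an exclusive lock on the file.
  const access = await fileHandle.createSyncAccessHandle();
  try {
    access.truncate(0); // Start from an empty file.
    const written = access.write(bytes, { at: 0 }); // Synchronous, in-place write.
    access.flush(); // Persist the changes to disk.
    return written; // Number of bytes written.
  } finally {
    access.close(); // Always release the lock.
  }
}

// Example use inside a module worker ("scratch.bin" is an assumed name):
// const root = await navigator.storage.getDirectory();
// await writeToPrivateFile(root, "scratch.bin", new Uint8Array([1, 2, 3]));
```

Closing the handle in `finally` matters because the lock otherwise blocks any later attempt to open the same file.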

Slide 10

Slide 11

• Native apps offer “fake safety”, but users use just a few apps
• Signing, app store approval, yet often own installers (Windows)
• Very powerful direct access to many native APIs without user approval
• Users install a limited set of apps, but browse many web sites/apps per month
• Web APIs are designed with safety and privacy in mind
• Hard problems, but we are constantly innovating and refining our approaches

Slide 12

Capabilities
• Bridge the gap between the web and native
• No silicon left behind as the desktop moves to the web
Performance
• Achieve near-native performance
• Optimize the use of Intel silicon

Slide 13

• WebAssembly: Fast CPU execution, SIMD + MT support
• WebGL: Legacy GPU execution, 97.6% device support
• WebGPU: Modern GPU execution, 3.7x faster than WebGL*
• Web Neural Network: CPU/GPU/VPU execution, near-native performance
* Geomean of TensorFlow.js Model Benchmark (https://tensorflow.github.io/tfjs/e2e/benchmarks/local-benchmark/index.html), Machine config
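
An app that wants to choose among these execution paths first has to find out which ones the browser exposes. The sketch below is an assumption-laden illustration, not code from the talk: `detectComputePaths` is a hypothetical helper, and it only checks for the standard entry points (`WebAssembly`, `navigator.gpu` for WebGPU, `navigator.ml` for WebNN). WebGL detection needs a canvas element and is left out.

```javascript
// Hypothetical helper: reports which compute entry points are exposed.
// `env` defaults to the global scope; it is a parameter to allow testing.
function detectComputePaths(env = globalThis) {
  const nav = env.navigator ?? {};
  // The 8-byte header (magic number + version 1) is the smallest valid Wasm module.
  const emptyModule = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00]);
  return {
    wasm: typeof env.WebAssembly === "object" && env.WebAssembly.validate(emptyModule),
    webgpu: "gpu" in nav, // WebGPU entry point
    webnn: "ml" in nav, // Web Neural Network entry point
  };
}

// Example: pick a backend name from the result (the names are illustrative).
// const paths = detectComputePaths();
// const backend = paths.webnn ? "webnn" : paths.webgpu ? "webgpu" : "wasm";
```

Presence of an entry point does not guarantee a usable device, so real code would still need to handle failures when requesting a GPU adapter or ML context.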

Slide 14

Convert 128-bit Wasm SIMD instructions into 256-bit IA instructions dynamically
[Diagram: 128-bit SIMD (SSE: 128 bit) mapped to 256-bit SIMD (AVX: 256 bit)]
* XNNPACK end2end Benchmark (https://github.com/google/XNNPACK/blob/master/bench/end2end.cc)
Machine Config: TGL-RVP, Ubuntu 20.04.1 LTS

Slide 15

Enables hardware encoders/decoders via the WebCodecs API
[Diagram: Stream; Demuxer (JavaScript/Wasm); WebCodecs VideoDecoder; video render (WebGL/WebGPU/Canvas); WebCodecs AudioDecoder; AudioWorklet]
• Video formats: AV1, AVC1, VP8, VP9, HEVC
• Audio formats: MP3, MP4a, Opus, Vorbis, ULAW, ALAW
• Supports HDR (High Dynamic Range) on Intel
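
Requesting a hardware-backed decoder through WebCodecs might look like the following. This is a hedged sketch, not code from the talk: `createHardwareDecoder` is a hypothetical helper and the VP9 codec string is an assumed example; `VideoDecoder.isConfigSupported()`, `configure()` and the `hardwareAcceleration` hint are part of the WebCodecs API.

```javascript
// Hypothetical helper: creates a VideoDecoder that prefers hardware decoding.
// `Decoder` defaults to the global VideoDecoder; it is injectable for testing.
async function createHardwareDecoder(onFrame, onError, Decoder = globalThis.VideoDecoder) {
  if (typeof Decoder !== "function") return null; // WebCodecs not available.
  const config = {
    codec: "vp09.00.10.08", // Assumed example: VP9, profile 0, 8-bit.
    hardwareAcceleration: "prefer-hardware", // A hint, not a guarantee.
  };
  const { supported } = await Decoder.isConfigSupported(config);
  if (!supported) return null;
  const decoder = new Decoder({ output: onFrame, error: onError });
  decoder.configure(config);
  return decoder;
}

// Example: each VideoFrame must be closed once it has been rendered.
// const decoder = await createHardwareDecoder(
//   (frame) => { /* draw the frame to a canvas */ frame.close(); },
//   (err) => console.error(err),
// );
```

The demuxer shown in the diagram would then feed `EncodedVideoChunk` objects to `decoder.decode()`.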

Slide 16

Reduce PWA power by tagging threads/tasks with their roles instead of priority
[Diagram: Chromium processes/threads use the ThreadType API, which maps to the Thread QoS API and Thread Priority API (Windows scheduler), the QoS Class API (macOS scheduler), and nice, cgroup and uclamp (Linux/CrOS scheduler). Below the operating system: HGS+ (for Intel platform) with dynamic core/frequency scheduling, SoC pcode/Punit and HFI, then the P-cluster and E-cluster. Legend: Chromium existing, Chromium new, operating system, SoC, HW components]

Slide 17

• Assign threads with different ThreadTypes
  • ThreadPriority: desired task starting time
  • ThreadQoS: desired task completion time
• Assign WebRTC signal/network/worker threads with the ResourceEfficient type
  • Scheduled frequently
  • Less sensitive to latency
  • Less computation heavy
• ResourceEfficient threads will be scheduled to E-cores whenever possible
[Diagram: Encoder, Decoder, Signal, Network and Worker threads placed on P-cores and E-cores, each with L2 (MLC) caches and a shared L3 (LLC)]
100+ mW power savings achieved for video call on Windows (w/ 12th Gen Intel Core)

Slide 18

IPC Reduction
[Diagram: video playback pipeline before and after IPC reduction. Components: Media Player, Browser Process, Compositor, Decoder, Renderer, GPU process, MF Utility Process, GPU Driver/Hardware. Labels: 1 sample vs. 20 samples; 16.67 ms vs. 333 ms; single compressed frame (CPU mem); decoded frame (GPU mem); decode IPCs, composition IPCs, driver invocation]
10% SoC power saving for video playback on Windows (w/ 12th Gen Intel Core)

Slide 19

Performance
[Chart: TensorFlow-Lite Web CPU Performance § (higher is better). Axis: Speedup (times). Series: XNNPACK Delegate (Wasm SIMD), XNNPACK Delegate (Wasm SIMD+Threads), WebNN Delegate (XNNPACK backend), Native TFLite XNNPACK Delegate]
[Chart: ONNXRuntime Web GPU Performance £ (higher is better). Axis: Speedup (times). Series: WebGL EP, WebNN EP (DirectML backend), Native ONNXRuntime DirectML EP]
Models shown: MobileNetV2, ResNet50V2, SqueezeNetV1.1, Emotion FerPlus, Yolo
EP: Execution Provider
WebNN delivers near-native Power and Perf characteristics thanks to efficient paths to the HW capabilities & features
* Other names and brands may be claimed as the property of others.
† All models are FP32, batch size 1. Tested on ADL Laptop DELL Vostro 5620, CPU: 12th Gen Intel(R) Core(TM) i7-1260P 4 p-cores / 4.70 GHz, 8 e-cores / 3.40 GHz, OS: Windows 11 Professional latest (21H2).
§ Performance data was tested on Chromium WebNN/XNNPACK Prototype by running TensorFlow.js end-2-end model benchmark.
£ Performance data was tested on Electron.js with WebNN-native node.js add-on by running ONNXRuntime Web Demo.

Slide 20

PWA technology enables modern Zoom experiences on ChromeOS and on any browser
• Leverages WebCodecs for full control over media processing, incl. hardware acceleration
• New web capabilities led by Intel to enhance these experiences:
  • Background blurring
  • Face detection / auto-framing
  • Eye gaze correction
  • Lighting correction
  • Noise suppression
  • Compute Pressure
https://pwa.zoom.us/wc

Slide 21

Slide 22

Gain insights into different kinds of system pressure, starting with CPU
• Adjust number of video feeds
• Adjust video resolution and frames per second
• Skip feed filters and non-essentials like WebRTC noise suppression
• Turn quality-vs-speed and size-vs-speed towards "speed" in WebCodecs
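
One way to turn these adjustments into code is a small policy function that maps a pressure state to call settings. Everything concrete here is an illustrative assumption, not from the talk: the `settingsForPressure` name, the field names and the numbers. Only the state names ("nominal", "fair", "serious", "critical") come from the Compute Pressure API, and `latencyMode` mirrors the WebCodecs encoder option of the same name.

```javascript
// Hypothetical policy: maps a Compute Pressure state to video-call settings.
// All numbers and field names are illustrative assumptions.
function settingsForPressure(state) {
  switch (state) {
    case "critical":
      // Shed as much work as possible: one feed, low resolution, no filters.
      return { maxFeeds: 1, height: 360, fps: 15, filters: false, latencyMode: "realtime" };
    case "serious":
      // Hold steady: fewer feeds, no filters, favor speed over quality.
      return { maxFeeds: 4, height: 480, fps: 24, filters: false, latencyMode: "realtime" };
    case "fair":
      return { maxFeeds: 9, height: 720, fps: 30, filters: true, latencyMode: "quality" };
    default:
      // "nominal" (and any unknown state): full experience.
      return { maxFeeds: 16, height: 1080, fps: 30, filters: true, latencyMode: "quality" };
  }
}

// Example: react to the most recent sample in a PressureObserver callback.
// new PressureObserver((records) => apply(settingsForPressure(records.at(-1).state)));
// `apply` is a hypothetical function that reconfigures the call.
```

Keeping the policy a pure function separates the "what to change" decision from the observer wiring, which makes it easy to tune and test.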

Slide 23

Abstract CPU stalls, temperature, other factors

Slide 24

function pressureChange(records, observer) {
  // For simplicity, only look at the last sample.
  const record = records.at(-1);
  switch (record.state) {
    case "critical":
      // Stop all unnecessary work.
      break;
    case "serious":
      // We are OK for now, but don't do additional work.
      break;
    default:
      // We are fine!
  }
}

const observer = new PressureObserver(pressureChange);
observer.observe("cpu");

Slide 25

Intel collaborates with the web ecosystem. Help us build the APIs you need:
• Tell us how we can make your web experience better and what your pain points are; we listen
• Adopt the new APIs to benefit from Intel hardware* capabilities and to leverage hybrid-core architecture efficiently
* CPU, GPU and other accelerators

Slide 26
