What can we learn from 15 million websites?

A presentation at DevFest, Malta 2022 in November 2022 by Kevin Farrugia

Slide 1

What can we learn from 15 million websites? Kevin Farrugia DevFest 2022 - Malta

Slide 2

A brief intro… Hi, I’m Kevin Farrugia ● Consultant on Web Performance & Frontend Architecture. ● HTTP Archive & Web Almanac contributor. ● Author of the Resource Hints chapter in 2021 Web Almanac. @imkevdev | @kevinfarrugia@webperf.social | imkev.dev

Slide 3

HTTP Archive “We periodically crawl the top sites on the web and record detailed information about fetched resources, used web platform APIs and features, and execution traces of each page.” Source: https://httparchive.org/

Slide 4

HTTP Archive “We periodically crawl the top sites on the web and record detailed information about fetched resources, used web platform APIs and features, and execution traces of each page.”

Slide 5

CrUX “We periodically crawl the top sites on the web and record detailed information about fetched resources, used web platform APIs and features, and execution traces of each page.”

Slide 6

Chrome User Experience Report Collected from real-world Chrome users. ● BigQuery ● Dashboard ○ E.g. https://timesofmalta.com ● API ○ curl -s --request POST "https://chromeuxreport.googleapis.com/v1/records:queryRecord?key=${CRUX_API_KEY}" --header 'Accept: application/json' --header 'Content-Type: application/json' --data '{"formFactor":"PHONE","origin":"https://timesofmalta.com","metrics":["largest_contentful_paint"]}'
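
The same CrUX API call can be sketched in Python with only the standard library. This is a minimal sketch rather than an official client: the helper name `build_crux_request` and the placeholder key are assumptions, while the endpoint and body fields are the ones shown on the slide. The request is only constructed here, not sent.

```python
import json
import urllib.request

CRUX_ENDPOINT = "https://chromeuxreport.googleapis.com/v1/records:queryRecord"


def build_crux_request(origin: str, api_key: str, form_factor: str = "PHONE") -> urllib.request.Request:
    """Build (but do not send) a CrUX API request for an origin's LCP data."""
    body = {
        "formFactor": form_factor,
        "origin": origin,
        "metrics": ["largest_contentful_paint"],
    }
    return urllib.request.Request(
        url=f"{CRUX_ENDPOINT}?key={api_key}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )


# "YOUR_API_KEY" is a placeholder; a real key is needed before sending.
request = build_crux_request("https://timesofmalta.com", api_key="YOUR_API_KEY")
print(request.get_method(), request.full_url)
```

To actually send it, pass `request` to `urllib.request.urlopen` and decode the JSON response.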

Slide 7

WPT “We periodically crawl the top sites on the web and record detailed information about fetched resources, used web platform APIs and features, and execution traces of each page.”

Slide 8

WebPageTest ● Private instance of WebPageTest ○ E.g. https://timesofmalta.com ● Data is augmented using Wappalyzer, Lighthouse, custom metrics and other tools.

Slide 9

BigQuery “We periodically crawl the top sites on the web and record detailed information about fetched resources, used web platform APIs and features, and execution traces of each page.”

Slide 10

BigQuery SELECT COUNT(*) FROM httparchive.urls.latest_crux_mobile LIMIT 1

Slide 11

BigQuery SELECT COUNT(*) FROM httparchive.urls.latest_crux_mobile LIMIT 1 Result: 16,784,417
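
A count like this could also be run from Python. The sketch below assumes the `google-cloud-bigquery` package is installed and credentials are configured; the helper names `count_query` and `run_count` are hypothetical. It drops the redundant `LIMIT 1` and adds a `total` alias so the value can be read by name. Only the query string is built when the snippet loads; nothing is sent to BigQuery unless `run_count` is called.

```python
def count_query(table: str = "httparchive.urls.latest_crux_mobile") -> str:
    """Return a row-count query for the table named on the slide."""
    return f"SELECT COUNT(*) AS total FROM `{table}`"


def run_count(sql: str) -> int:
    """Run the query on BigQuery; needs google-cloud-bigquery and credentials."""
    # Imported lazily so the sketch can be loaded without the package installed.
    from google.cloud import bigquery

    client = bigquery.Client()
    rows = client.query(sql).result()
    return next(iter(rows))["total"]


sql = count_query()
print(sql)
```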

Slide 12

Slide 13

Queries ● Usage: ○ Which JavaScript technology is the most popular? ● Comparison: ○ Which websites have a better LCP - those built using React or those built using Svelte? * ● Correlation: ○ How does the number of preload hints correlate with good LCP? *

* Correlation does not imply causation.

Slide 14

Hypothesis ● Lighthouse Audits ● Opportunities: new ideas, directives or frameworks ● Recommendations ● The unusual

Slide 15

Hypothesis - Preload LCP image ● Preload Largest Contentful Paint image ● Query ○ https://www.anandfurnishers.in/ ■ PageSpeed Insights ■ WebPageTest ■ Experiment
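
To make the hypothesis concrete, here is a minimal sketch of checking whether a page's HTML preloads a given image. It is not the query used in the talk: the class `PreloadFinder`, the function `preloads_image`, and the sample markup are all hypothetical, and the LCP image URL is assumed to be known in advance.

```python
from html.parser import HTMLParser


class PreloadFinder(HTMLParser):
    """Collect href values of <link rel="preload" as="image"> elements."""

    def __init__(self) -> None:
        super().__init__()
        self.image_preloads: list[str] = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        rel = (a.get("rel") or "").lower().split()
        if tag == "link" and "preload" in rel and (a.get("as") or "").lower() == "image":
            self.image_preloads.append(a.get("href") or "")


def preloads_image(html: str, image_url: str) -> bool:
    """Return True if the document preloads image_url as an image."""
    finder = PreloadFinder()
    finder.feed(html)
    return image_url in finder.image_preloads


# Hypothetical page markup used only to exercise the check.
sample = (
    '<head><link rel="preload" as="image" href="/hero.jpg"></head>'
    '<body><img src="/hero.jpg"></body>'
)
print(preloads_image(sample, "/hero.jpg"))
```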

Slide 16

Hypothesis - fetchpriority ● Demo ○ Render-blocking scripts ○ fetchpriority="high" ○ Opportunity: when there is more than one high priority in-flight request AND render-blocking scripts ● Query ○ https://greenenergy.nus.edu.sg/ ○ WebPageTest ○ Experiment
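
The opportunity above depends on what is in flight on the network, which static HTML cannot show. As a rough static proxy only, the sketch below flags pages that have render-blocking scripts in the head but no image marked fetchpriority="high". The names `PriorityScan` and `is_candidate` and the sample markup are hypothetical, and this is not the query from the talk.

```python
from html.parser import HTMLParser


class PriorityScan(HTMLParser):
    """Count render-blocking head scripts and high-priority images."""

    def __init__(self) -> None:
        super().__init__()
        self.in_head = False
        self.blocking_scripts = 0
        self.high_priority_images = 0

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "head":
            self.in_head = True
        elif tag == "script" and self.in_head and "src" in a:
            # async, defer and module scripts do not block rendering.
            deferred = "async" in a or "defer" in a or (a.get("type") or "").lower() == "module"
            if not deferred:
                self.blocking_scripts += 1
        elif tag == "img" and (a.get("fetchpriority") or "").lower() == "high":
            self.high_priority_images += 1

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def is_candidate(html: str) -> bool:
    """True when the page has render-blocking scripts but no high-priority image."""
    scan = PriorityScan()
    scan.feed(html)
    return scan.blocking_scripts > 0 and scan.high_priority_images == 0


# Hypothetical markup: one blocking script, one deferred script, a plain image.
page = (
    '<head><script src="/a.js"></script><script src="/b.js" defer></script></head>'
    '<body><img src="/hero.jpg"></body>'
)
print(is_candidate(page))
```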

Slide 17

Hypothesis - WebP vs JPG Source: @rick_viscomi

Slide 18

Hypothesis - WebP vs JPG ● Query

Slide 19

Hypothesis - Unusual ● Websites downloading React and AngularJS ● Query ○ https://www.goneforarun.com/ ○ App (AngularJS) ○ ZenDesk’s Web Widget (React)

Slide 20

Performance is Accessibility ● “The mission of web performance is to expand access to information and services on the web.” Source: Alex Russell

Slide 21

Contribute ● HTTP Archive Forums ● Web Almanac ● Web Performance Calendar

Slide 22

Resources ● DevFest 2022 ● HTTP Archive ● 2022 Web Almanac ● CrUX documentation ● GitHub - kevinfarrugia/crux_csv ● GitHub - kevinfarrugia/bq-query