Matt Hobbs Head of Frontend, Lead Developer Government Digital Service @TheRealNooshu Hello everyone! Thanks: Andy, Andrew, and Simon for organising and allowing me to speak. I’m Matt Hobbs…
A presentation at London Web Performance Meetup in April 2020 in London, UK by Matt Hobbs
I work at the Government Digital Service. I’m the Head of Frontend at Government Digital Service (GDS).
Bringing HTTP/2 to GOV.UK “Bringing HTTP/2 to GOV.UK”
Who are GDS? GDS: we are a central government department that has created and maintains a number of government services.
GOV.UK Building and maintaining GOV.UK. GOV.UK is the website for the UK government. It’s the best place to find policy, announcements, information about the government, and guidance for citizens. Since 2012 it has replaced 1,884 government websites with just one, to become the home of all central government’s online content and services. And it’s what the rest of this talk is about. All this work was conducted way before the current coronavirus outbreak.
What is HTTP/2? What is HTTP/2? HTTP/2 is the latest stable version of the HTTP protocol. Improvements over HTTP/1.1….
● HPACK header compression
● Multiplexing streams
● Prioritisation
● Server push†
†: May or may not be an improvement, but it’s in the specifications.
● Minimise protocol overhead via the compression of headers using HPACK.
● Reduce network latency with the use of request and response multiplexing streams over a single TCP connection.
● Much more control over the prioritisation of assets and the order in which they are downloaded.
● Ability to use server push (e.g. push an asset to the browser without it having to request it).
Server push is a controversial topic, and depending on who you speak to, it may or may not offer any perf improvements. It’s a whole talk in itself. I’m including it as it is in the H2 spec.
Why enable it? So where did it all begin? What prompted us to enable it in the first place? Well there are many articles on the web that talk about the performance improvement HTTP/2 can bring to a website, all you need to do is “enable it”. A magic bullet to solve all performance problems maybe…
And if you happen to use Google Lighthouse (v5) for auditing your site’s performance…
…you may have seen something similar to this under ‘Best Practices’. 14 passed audits gets you a score of 93%. To add the missing 7% and reach the magic 100% score, just enable HTTP/2.
You can see where the best practice scores come from by looking at the ‘Lighthouse v5 Score Weighting’ spreadsheet. Here’s the missing 7%.
10 page report on HTTP/2. To make a case for enabling HTTP/2 on GOV.UK, I wrote a 10 page report on what it is, its advantages, and how we could enable it and roll it out.
On examining all the evidence I cannot see any downsides to enabling this protocol on our Fastly CDN layer. Matt Hobbs - 8th October 2018 GDS Very last sentence in the report I write this: On examining all the evidence I cannot see any downsides to enabling this protocol on our Fastly CDN layer. Ignorance is bliss.
Initial trial Positivity of the report, and everything I’d been reading about it. Contacted Fastly support to enable it Eagerly awaited the results Tools: WebPageTest, SpeedCurve, SiteSpeed.io, and Lighthouse.
● 5 page types selected, different content / templates
● Tested on:
  ○ Chrome Desktop - Native (Sitespeed.io)
  ○ Chrome Mobile - 3G & 3G slow (Sitespeed.io)
  ○ Firefox Desktop - Native (Sitespeed.io)
  ○ Firefox Mobile - 3G & 3G slow (Sitespeed.io)
  ○ Nexus 5 - Chrome - 3G (WebPageTest)
  ○ iPhone 5C - 4G (WebPageTest)
  ○ Nexus 5X - 3G Fast (Lighthouse)
Selected 5 pages to test - slightly different content and templates. Tested both on simulated devices and real devices via WebPageTest. Connection speed ranged from Native down to 3G Slow.
The results weren’t at all what I expected. Explanation of the graph (Nexus 5, 3G): ● diff between the HTTP/1.1 results and HTTP/2 ● Any bar above the x-axis is worse. Example: On the ‘homepage’ the first visual change was 149 ms slower on HTTP/2 than HTTP/1.
Pattern repeated itself across different setups and different tools. Example where HTTP/2 was actually quicker for one of the pages (bar charts below the x-axis).
Tried comparing on a ‘warm cache’ (moving to the page via the homepage). Many of the results were the same: ● H2 starts well, ● Stagnates against HTTP/1.1 for 2 seconds (3G connection).
HTTP/2 - Initial trial. Summary page made for disappointing reading. Many test cases: HTTP/1.1 actually performed better than HTTP/2 under the synthetic tests I’d chosen. Only for users with an iPhone 5C on a 4G connection did performance actually improve across all pages.
Investigation What exactly was happening? Decided to leave HTTP/2 enabled for 1 month so we could investigate the issue.
Could see H2 was enabled, browser seeing it: ● reduced number of connection ID’s ● fewer TCP connections were being opened.
HAR files: multiplexing of files over the single TCP connection. HTTP/1.1 (left): files requested one after the other creates a stepped slope down the graph. HTTP/2 (right): vertical line showing all files requested at the same time.
HPACK header compression was working as expected. Compression seen after ‘space savings’: 0% for HTTP/1.1, 68% for HTTP/2.
Issue I had a theory about what the issue was.
Domain Sharding ● ‘www.gov.uk’ ○ Used for HTML only ● ‘assets.publishing.service.gov.uk’ ○ Used for all other assets GDS Had (and still have) a shard domain. This is a throwback to a ‘best practice’ for improving performance with HTTP/1.1. Only HTML is served from the origin. All other assets (CSS, JavaScript, Images, Fonts etc) all loaded from the assets domain (shard).
Via WebPageTest waterfall graph see the second TCP connection being established (highlighted in red). All CSS, JS and images waiting on this connection to establish. Was this the issue? How to reduce the time for 2nd connection to establish?
Possible solutions 3 possible solutions I could see to fix the issue:
‘preconnect’ hint header. Browser can connect to the assets domain earlier, so it won’t be waiting as long when assets are needed since the connection has already been negotiated.
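A minimal sketch of what such a hint could look like when sent as an HTTP `Link` response header. The helper name `preconnect_link_header` is hypothetical, and this is not the GOV.UK configuration, only an illustration of the header syntax. The optional `crossorigin` parameter asks the browser to warm up the connection used for requests made without credentials.

```python
def preconnect_link_header(origin: str, anonymous: bool = False) -> str:
    """Build the value of a `Link` response header carrying a preconnect hint."""
    value = f"<{origin}>; rel=preconnect"
    if anonymous:
        # `crossorigin` targets the connection used for credential-less requests.
        value += "; crossorigin"
    return value


ASSETS_ORIGIN = "https://assets.publishing.service.gov.uk"

# One hint for the credentialed connection, one for the anonymous connection.
print("Link:", preconnect_link_header(ASSETS_ORIGIN))
print("Link:", preconnect_link_header(ASSETS_ORIGIN, anonymous=True))
```

Whether one or both hints are useful depends on which kind of connection the page's assets end up needing.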
HTTP/2 connection coalescing. Allows a browser to use the same TCP connection to transfer data from multiple domains (when they share properties like IP address and SSL certificate). If working properly there’d be no need for the 2nd TCP connection at all. I read that post by Daniel Stenberg so many times…
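The two conditions mentioned above (a shared IP address and a certificate that covers both hostnames) can be sketched as a simple check. This is a simplified illustration, not how any particular browser implements it: browsers differ in exactly how they apply these rules. The function names, the IP addresses, and the certificate name list are all made up for the example.

```python
from typing import Iterable, Set


def cert_covers(hostname: str, cert_names: Iterable[str]) -> bool:
    """Return True if any certificate name matches the hostname.

    Simplified matching: "*" is only honoured as the entire left-most label.
    """
    host = hostname.lower().split(".")
    for name in cert_names:
        pattern = name.lower().split(".")
        if len(pattern) != len(host):
            continue
        if pattern[0] == "*":
            if pattern[1:] == host[1:]:
                return True
        elif pattern == host:
            return True
    return False


def can_coalesce(new_host: str, new_host_ips: Set[str],
                 connection_ip: str, connection_cert_names: Iterable[str]) -> bool:
    """Reuse is possible when the new host resolves to the connection's IP
    and the connection's certificate is also valid for the new host."""
    return connection_ip in new_host_ips and cert_covers(new_host, connection_cert_names)


# Hypothetical values (192.0.2.0/24 and 198.51.100.0/24 are documentation ranges).
cert_names = ["www.gov.uk", "assets.publishing.service.gov.uk"]
print(can_coalesce("assets.publishing.service.gov.uk", {"192.0.2.10"}, "192.0.2.10", cert_names))
print(can_coalesce("assets.publishing.service.gov.uk", {"198.51.100.7"}, "192.0.2.10", cert_names))
```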
Domain Sharding ● ‘www.gov.uk’ ○ Used for HTML, CSS, JavaScript, and images ● ‘assets.publishing.service.gov.uk’ ○ Used for all other assets GDS Lastly: remove the need for the assets domain for static assets. Serve everything from the origin, no wait time as connection already established.
GDS I even asked Pat Meenan at LondonWebPerf in December 2018. As you can see in a screenshot from the video.
HTTP/2 → HTTP/1.1 Trying to fix, number of weeks, no success. Decided to disable HTTP/2. Knowing it was negatively impacting many users, especially slow mobile connections. It was the correct thing to do.
Left it at that for a while. Few things happening in government for a couple of years. H2 wasn’t a top priority.
The rogue image. December 12th (my birthday), I received a question from my ‘How to read a WebPageTest’ blog post. Yew-li-a Lacoban (who may actually be watching) asked a question about an image download happening in one of the waterfall charts.
GDS It was this image. Fairly unremarkable at first sight.
GDS It’s request number 3 that really stands out. A single image that looks to be out of place compared to other images.
Full waterfall with other assets: it gets even weirder. Image (labelled 1) actually downloading from the ‘assets’ domain, before the connection to the ‘assets’ domain has been negotiated (labelled 2). How is that possible?
Answer: HTTP/2 connection coalescing. I was certain it wasn’t happening in the initial trial. It turns out it was happening. Connection view: on connection number 1 you may just be able to make out 2 URLs: www.gov.uk and the assets domain. This signifies that the two domains have coalesced over a single TCP connection. Once the connection is established, the browser is downloading a single image from the assets domain.
GDS Another pattern connection view shows: ● Only HTML and images are downloaded on connection 1 ● CSS, JavaScript and fonts are only downloaded on connection 2. That’s unusual. So what was happening here?
Subresource Integrity (SRI) Using Subresource Integrity on both our JavaScript and our CSS.
Security feature: stops third-party code that has been modified from executing on your site. ‘integrity’ attribute with a file hash (as seen in the code). If the hash in the attribute and the file hash of the asset downloaded don’t match, the file won’t execute.
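A minimal sketch of how an integrity value is derived and compared, assuming the usual `<algorithm>-<base64 digest>` format. This is not the GOV.UK build code; the function names and the sample stylesheet bytes are made up for illustration.

```python
import base64
import hashlib

ALLOWED_ALGORITHMS = ("sha256", "sha384", "sha512")


def sri_value(content: bytes, algorithm: str = "sha384") -> str:
    """Return an integrity value such as 'sha384-<base64 digest>' for the given bytes."""
    if algorithm not in ALLOWED_ALGORITHMS:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    digest = hashlib.new(algorithm, content).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


def integrity_matches(content: bytes, integrity: str) -> bool:
    """Mimic the browser check: recompute the hash of the downloaded bytes and compare."""
    algorithm, _, _ = integrity.partition("-")
    if algorithm not in ALLOWED_ALGORITHMS:
        return False
    return sri_value(content, algorithm) == integrity


stylesheet = b"body { margin: 0 }"  # made-up asset contents
integrity = sri_value(stylesheet)
print(integrity)
print(integrity_matches(stylesheet, integrity))           # unmodified file
print(integrity_matches(stylesheet + b" /* x */", integrity))  # modified file
```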
A requirement of SRI is that the ‘crossorigin’ attribute must be used (as seen in the code). The attribute provides support for Cross-Origin Resource Sharing (CORS). Setting this attribute to ‘anonymous’ was forcing both the CSS and the JavaScript to be downloaded on the second TCP connection (as we saw in the WPT connection view). An ‘anonymous’ connection means that there will be no exchange of user credentials unless on the same origin: ● via cookies, ● client-side SSL certificates or ● HTTP authentication
A second ‘anonymous’ connection needs to be established before anything can be downloaded. All our CSS (which is render blocking) is waiting on this connection to be established. Example: if the CSS and JS were allowed to use a ‘credentialed’ connection (the one to the origin), it would bring the download forward by 750 ms (in this example).
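A rough sketch of why the ‘crossorigin’ attribute changes which connection an asset uses. It assumes the mapping described in the HTML and Fetch specifications: no attribute means credentials are included, ‘anonymous’ means credentials are only sent same-origin, and ‘use-credentials’ means they are always included. It also assumes that browsers keep credentialed and credential-less requests on separate connections. All function names are hypothetical.

```python
from typing import Optional
from urllib.parse import urlsplit


def origin_of(url: str) -> str:
    """Scheme plus host (and port, if present) of a URL."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}"


def credentials_mode(crossorigin: Optional[str]) -> str:
    """Map the crossorigin attribute to a Fetch credentials mode."""
    if crossorigin is None:
        return "include"  # attribute absent
    if crossorigin.strip().lower() == "use-credentials":
        return "include"
    return "same-origin"  # "anonymous", empty, or an unrecognised value


def connection_kind(document_url: str, asset_url: str, crossorigin: Optional[str]) -> str:
    """Return which kind of connection the asset request would use."""
    if credentials_mode(crossorigin) == "include":
        return "credentialed"
    same_origin = origin_of(document_url) == origin_of(asset_url)
    return "credentialed" if same_origin else "anonymous"


page = "https://www.gov.uk/"
asset = "https://assets.publishing.service.gov.uk/app.css"  # made-up path
print(connection_kind(page, asset, None))
print(connection_kind(page, asset, "anonymous"))
print(connection_kind(page, asset, "use-credentials"))
```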
Change ‘anonymous’ to ‘use-credentials’?
Rather than removing SRI completely, is there an alternative to ‘anonymous’? Looking at the MDN documentation on the web, there is: ‘use-credentials’, which allows the requests for the asset to include credentialed information.
RFC-114. Following our RFC process for changes to GOV.UK, wrote an RFC and fed back on a few comments. Proceeded with the change. Note: all RFCs are publicly available and can be found on GitHub.
GDS Tested this on a single application on our integration server. All the CSS / JS on the page failed to load. Console shows a CORS issue.
When it comes to CORS, it always pays to read the fine print. Closer look at the Fetch specification under ‘CORS protocol and credentials’. Row 5 states that if credentials mode is set to “include” (or ‘use-credentials’), then Access-Control-Allow-Origin cannot be *.
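The rule can be sketched as a small function that follows the steps of the CORS check in the Fetch specification. This is a simplified illustration (header parsing and the preflight steps are left out), and the function and parameter names are my own.

```python
from typing import Optional


def cors_check(request_origin: str,
               credentials_mode: str,
               allow_origin: Optional[str],
               allow_credentials: Optional[str] = None) -> bool:
    """Return True if the response passes a simplified CORS check.

    allow_origin and allow_credentials are the values of the
    Access-Control-Allow-Origin and Access-Control-Allow-Credentials headers.
    """
    if allow_origin is None:
        return False
    # The wildcard is only accepted when credentials are not included.
    if credentials_mode != "include" and allow_origin == "*":
        return True
    if allow_origin != request_origin:
        return False
    if credentials_mode != "include":
        return True
    return allow_credentials == "true"


page_origin = "https://www.gov.uk"
print(cors_check(page_origin, "same-origin", "*"))               # crossorigin="anonymous"
print(cors_check(page_origin, "include", "*"))                   # crossorigin="use-credentials"
print(cors_check(page_origin, "include", page_origin, "true"))   # explicit origin plus credentials
```

The second call is the situation we hit on the integration server: a credentialed request against a wildcard header fails the check.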
Access-Control-Allow-Origin and web fonts
The Access-Control-Allow-Origin header is used to tell a browser where a cross-origin resource being requested can be used. If an asset is being requested cross-origin from a domain where this header is neither “*” nor the requesting domain, you will get a CORS error. In our case: the “Access-Control-Allow-Origin” header was added to allow our web fonts to be viewed correctly in all browsers when served from the (cross-origin) assets domain. It is not only served with fonts, it is served for all assets (issue now logged to fix).
● Access-Control-Allow-Origin: *
● crossorigin="use-credentials"
You can see why it is written in the spec that way:
● Access-Control-Allow-Origin “*” is allowing an asset to be fetched cross-origin and executed from any domain.
● crossorigin="use-credentials" is saying: allow this fetch to happen on a connection that can transfer credentialed information about the domain.
That doesn’t sound very secure…
Subresource Integrity (SRI). Next and easiest step would be to remove SRI from our CSS and JS. We weren’t using it in the way it was intended (for scripts hosted on a third-party domain outside our control), so it was also a safe, low-impact change.
RFC-115 GDS Now a different proposal, previous RFC was closed and a new one created, explaining all the details and learnings. Then waited a week for comments and feedback.
Nine small PR’s GDS No blockers so 9 small PR’s were raised to remove SRI from the relevant GOV.UK applications.
Results So let’s take a look at a few results.
HTTP/1.1 (SRI) to HTTP/1.1 (no-SRI) Interested to see the difference between the two setups (SRI to no-SRI) in terms of performance. SpeedCurve was a fantastic tool to visualise this.
Homepage - slow mobile (Samsung S3, 2G) GDS Graph of the homepage on a slow mobile (Samsung S3, 2G connection). Visually complete: dropped from almost 28 seconds to 18 seconds, a 36% improvement.
Answers page - medium mobile (Samsung S4, 3G) GDS Some instances we actually saw an increase in visually complete when removing SRI. Here’s an answer page on a Samsung S4 on a 3G connection. An increase of just under 1 second.
HTTP/1.1 with SRI
Examining this: it is due to late-loading fonts and the impact this has on the visually complete metric. With SRI: the browser opens 6 ‘anonymous’ TCP connections. Fonts need to be downloaded via an anonymous TCP connection, so fonts have 6 connections to be downloaded on.
HTTP/1.1 without SRI
SRI removed: browser has no need to establish all the ‘anonymous’ TCP connections. All assets can download via a credentialed connection. Upon font download: only one ‘anonymous’ connection established. Browser has to open another very late to download the other font, extending the visually complete metric.
HTTP/1.1 with SRI
Where we really start to see improvements is on the WebPageTest connection view graphs. In the example with SRI we have 13 connections: ● 5 ‘anonymous’ connections ● 6 ‘credentialed’ connections ● 1 third-party connection (GA). Note: big orange space after the font loading. Inefficient use of domain sharding. Extra connections opened by the browser aren’t being fully utilised.
HTTP/1.1 without SRI
Compare it to SRI disabled. Here we have 9 connections: ● 2 ‘anonymous’ connections for the fonts ● 7 ‘credentialed’ connections for all other assets. Note: a much smaller gap is visible within the connections, showing the open connections are being used more efficiently by the browser.
HTTP/1.1 (no-SRI) to HTTP/2 It’s looking better, but it can still be improved. So what about finally switching on HTTP/2?
Homepage - slow mobile (Samsung S3, 2G) GDS Graph of the homepage on a slow mobile (Samsung S3, 2G connection). Initial drop from the SRI change (first line), Additional drop due to enabling HTTP/2 (second line). Visually complete: 28 seconds at the start of January, down to 14 seconds now. A 50% improvement.
Answers page - medium mobile (Samsung S4, 3G). We see the 1 second uplift we noticed from the SRI switch on the answers page fall right back down. Visually complete drops from the peak of 5.7 seconds down to around 4.4 seconds. 23% improvement.
Start page - Chrome - Cable GDS We’ve seen this dip all over our SpeedCurve graphs, even on fast devices in modern browsers. Page load and fully loaded time drop by 100 ms, even on a very simple page like a start page. May not sound like much, page is loading in around 1 second anyway, 10% improvement on an already quick page!
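The percentage figures quoted for the SpeedCurve graphs can be reproduced with simple arithmetic. The inputs below are the rounded values read off the graphs in the talk ("almost 28 seconds", "around 1 second"), so treat the results as approximate; the function name is my own.

```python
def percent_reduction(before: float, after: float) -> float:
    """Percentage by which a metric dropped, relative to its starting value."""
    return (before - after) / before * 100


# (description, before, after) using the rounded figures quoted in the talk.
measurements = [
    ("Homepage, 2G: SRI removed (seconds)", 28.0, 18.0),
    ("Homepage, 2G: SRI removed and HTTP/2 enabled (seconds)", 28.0, 14.0),
    ("Answers page, 3G: HTTP/2 enabled (seconds)", 5.7, 4.4),
    ("Start page, cable: HTTP/2 enabled (milliseconds)", 1000.0, 900.0),
]

for label, before, after in measurements:
    print(f"{label}: {round(percent_reduction(before, after))}% improvement")
```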
HTTP/2
My favorite graph is the connection view from a WebPageTest. We’ve gone from 13 TCP connections down to 2: ● HTTP/2 coalescing can be seen on connection 1 ● ‘anonymous’ TCP connection for the fonts on connection 2. Note: hardly any empty space on the first connection, meaning it is being fully utilised. The observant among you will notice the impact of the ‘preconnect’ header on the 2nd connection. The connection is negotiated way before it is required by the fonts.
HTTP/1.1 with SRI enabled Lastly let’s relook at our summary table. One from earlier (initial trial). Unhealthy looking tests where HTTP/1.1 performed better than HTTP/2.
HTTP/1.1 with SRI enabled Updated table: Much healthier looking. Few instances and page setups where h1 performs better in some metrics so I judged them to be performing slightly better. Overall it is much improved. Couldn’t repeat the tests: iPhone 5C, and Nexus 5, having a few WPT issues at the time I compiled this table.
What’s next for GOV.UK? So what’s next for performance on GOV.UK?
● Access-Control-Allow-Origin: * ● Remove assets domain (for static assets)
Couple of issues left to fix in the RFC: reducing the scope of the CORS headers (basic cleanup), and ‘removal’ of the assets domain for our static resources. Serve all CSS, JS, images, and fonts off www.gov.uk. Browsers that have flakey HTTP/2 coalescing will then get the full benefits of HTTP/2. Second TCP connection for the font can then be removed (if fonts come from the document origin, they won’t use a separate connection). Single connection for all page assets, so the server can have complete control over H2 asset priorities.
TLSv1.3 (+ 0-RTT?). Fastly started rolling out TLSv1.3 to POPs across the globe. Could see some TLS negotiation performance improvements when this happens in the UK. Investigate the use of 0-RTT session resumption too. This allows users who visit the site on multiple occasions to reuse a previous TLS negotiation, which could remove a chunk of time on initial page load (assuming the browser supports it, that is).
Brotli compression. Brotli is a new compression algorithm that is now supported by 92% of browsers globally (caniuse). From research I’ve done for GOV.UK (I’ve written a report), it improves file compression over the network by around 20% compared to our current GZip implementation. This is something Fastly are working on. Beta program now being trialed. Could be a huge benefit to many GOV.UK users.
New webfont. Incredibly close to switching all apps over to our new web font, which reduces the data required by 47% for both font weights we use.
JS improvements GDS GOV.UK team are unpicking and removing our dependencies on jQuery. Soon be able to remove another 33KB of minified and compressed JavaScript.
Summary So there you have the story of how HTTP/2 was enabled on GOV.UK. It wasn’t as simple as just “turning it on”, but it was worth the time and investment. I’ve learnt a fair bit in the process which is always good. Couple of quick thank you’s: Thanks Andy Davies and Barry Pollard (HTTP/2 in Action). You would not believe the number of questions I’ve fired across to them both over the past 18 months. And finally thanks to the whole GOV.UK team. I feel very lucky to be able to work with such an incredible bunch of people who are always very patient with me when I propose changes!
Thanks for listening! Matt Hobbs Twitter: @TheRealNooshu Thanks for listening!