The intersection of Performance & Accessibility, #PerfMatters 2019
So, let’s set the stage
We’re at a conference about performance,
With performance being the manner in which a mechanism behaves
I was conducting some accessibility auditing last year
Which is the process of performing manual and automated checks to see if a website can work with assistive technology
The client was a medical education app used to train caregivers
It was built using a single page application framework, using Google’s Material Design library to build the user interface
When I learned that, I thought:
Google made it? Oh sweet. I don’t have to worry as much.
So I fire up VoiceOver, macOS’s screen reader and start testing.
Things are going pretty well, and then VoiceOver crashes
I try restarting Safari, I try clearing the cache, I try closing other tabs. Heck, I even try rebooting
Same result every time
So, I ask my boss what’s going on, and he says, “Oh, it’s probably problems with the Accessibility Tree.”
The what now?
Before I get more into this, here’s a question for you:
How do you describe an interface?
I’m not saying, “tell me what you see or hear after you turn the computer on.”
I’m saying how do you speak computer to a computer
The way we do that is with the Accessibility Tree
Fundamentally, it is a collection of objects, all the atomic bits that make up a user interface
People who rely on assistive technology use the Accessibility Tree to navigate and take action on user interfaces
Without a functioning Accessibility Tree there is no way to describe the current state of what an interface is
And consequently no way for some people to be able to use a computer or smartphone
So suffice to say it’s a really important piece of technology!
So, how do you build an Accessibility Tree?
First, you have to give each individual object in an interface a Name
Names are short identifiers used to describe an object’s purpose.
Then you give it a Role
Roles are predefined values that tell someone what they can do to the interface object
For example, a role of button means that someone can press it to activate some predefined action
Then we have Properties, which are the attributes of the object
Examples of this are its position in an interface, its state, or the interactions that are allowed
For a button, that could mean a disabled state to prevent it from being clicked on
Finally, we have a Description which can provide further information about the object, if needed
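To make this concrete in web terms, here’s my own sketch of how those four pieces might map onto a single HTML button (the attribute choices here are mine, for illustration):

```html
<!-- Name: the visible text, “Save” -->
<!-- Role: “button”, which comes for free from the element itself -->
<!-- Property: the disabled attribute, exposed as a disabled state -->
<!-- Description: the text aria-describedby points to -->
<button disabled aria-describedby="save-hint">Save</button>
<p id="save-hint">Saves the changes made to the current document.</p>
```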
These accessible objects are then programmatically grouped together to form more complicated interface components
So, how do you build these interface components?
Let’s say an operating system alert dialog, that’s a pretty commonplace thing for UI
Starting from the top down, we have a title bar
That’s the anchor that we’re going to attach everything to
And then we add a title to it, so we know what the dialog is all about, and what contents we can expect to find
If you’re using a screen reader, titles are really helpful, as they save you from having to dig around inside the component to figure out what it’s for
We might also need to close this dialog window once we learn what it’s for, so we include a close button in the title bar
Then we add the body of the dialog, the place where we can place the bulk of the dialog content
In the dialog’s body there’s another title, which asks us if we want to save the changes we made to the thing we were working on
We then add text to the body, which will provide more information about what happens if you choose not to save
Then we add a save button, to allow us to commit to saving the changes we made
And an accompanying cancel button, in case we don’t
And presto! We have a dialog component!
Because these collected objects have accessible names and are therefore present in the Accessibility Tree, we can speak computer to computer to get what we want
I can programmatically say, “go to the dialog,
then its body,
and activate the save button”
and it works!
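For the web version of this component, the same structure might be sketched like this (my own rough markup, not a spec-blessed pattern; the native dialog element could also do some of this lifting):

```html
<!-- The dialog itself, named by its title -->
<div role="dialog" aria-labelledby="dialog-title">
  <!-- The bar across the top: a title plus a close button -->
  <h2 id="dialog-title">Save changes?</h2>
  <button type="button" aria-label="Close">×</button>
  <!-- The body: supporting text and the two actions -->
  <p>Your changes will be lost if you don’t save them.</p>
  <button type="button">Save</button>
  <button type="button">Cancel</button>
</div>
```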
So, this might seem a little pedantic, but please bear with me for a bit
We do have this dialog component, but there’s more to it than just that
Interface components have to have a place to live, and that place is an operating system
Operating systems also include components that allow us to navigate to, and access, the stuff we store on them
It’s also the place we install and run programs
And all the little doodads and widgets we can’t live without
And the browser, which is rapidly becoming an operating system in its own right
The browser contains the mechanisms we use to access the web
And the web is sort of the Wild West in that you can write more or less whatever you want and it’ll usually work, which is both a blessing and a curse
But regardless of how you write what you write, you can’t escape the fact that you have to ultimately create HTML
Again, there are many ways to go about doing this, but CSS and JavaScript are the other two requisite parts that make up the whole that is a website or web app
Browsers then read the DOM, and all the information contained within it to draw an interface
Which is then shown to a user
The user can then take actions, which updates the DOM, which in turn updates what is visually rendered, allowing us to make our websites dynamic
Running in parallel is the Accessibility Tree, which is sampled from the generated DOM
I say sampled because the Accessibility Tree will use specialized heuristics to only surface things it deems necessary
Modern versions of the Accessibility Tree are generated after styles are computed, as certain CSS properties such as display and pseudo element content will affect it
This sampled version of the DOM is then read by various kinds of assistive technology, including, but not limited to screen readers
There are two things I’d like to point out here:
First, using a visually rendered interface and assistive technology aren’t mutually exclusive
Not all screen reader users are blind
Many people choose to augment their visual browsing with devices that rely on the Accessibility Tree
Secondly, the Accessibility Tree relies on the user interacting with the DOM to update
It’s effectively a “read only” technology, in that we can’t directly work with it
Another thing you should be aware of is that it’s more an accessibility forest than an Accessibility Tree
There are different implementations of the Accessibility Tree, each depending on what operating system you’re using, and what version of that operating system is running
This is due to the different ways companies have built and updated their operating system’s underlying code
Just so we’re clear:
The DOM can be a part of the Accessibility Tree, but the Accessibility Tree is larger, and not limited to just the contents of your browser
Because of this, the Accessibility Tree is more brittle
It has more to pay attention to, and its technical architecture was developed before this whole internet thing really took off
Meaning it didn’t anticipate the sheer amount of information we’d be throwing at it.
Crashing the Accessibility Tree, and therefore breaking assistive tech support is bad, yes
But a large amount of information present in the DOM means that there’s an accompanying large amount of work the Accessibility Tree needs to do to figure out what’s what and report on it
Which slows. it. down.
This can create a lack of synchronization between the current state of the page and what is being reported by assistive tech
Meaning that a user may not be working with an accurate model of the current screen, and may be taking action on interactive components that are no longer present
This can create the confusing experience of activating the wrong control, or navigating to a place they didn’t intend to
So, how do we help prevent this?
Start with writing semantic HTML
Using things like the button element for buttons, instead of ARIA slapped on a div, helps lessen the amount of guesswork the Accessibility Tree has to do when it runs its heuristics to slot meaningful content into its description of the current state of things.
This serves to both speed up its calculation and make what it reports more reliable
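Here’s my own small illustration of that difference (the save() handler is a hypothetical placeholder):

```html
<!-- ARIA on a div: the role is bolted on, and focusability and
     keyboard activation still have to be wired up by hand -->
<div role="button" tabindex="0" onclick="save()">Save</div>

<!-- Semantic HTML: name, role, focus, and keyboard support all
     come for free, with less for the Accessibility Tree to infer -->
<button type="button" onclick="save()">Save</button>
```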
And speaking of semantic HTML
Here’s the raw, living DOM tree of an actual website
In fact, it’s the website for this conference! Estelle, sorry not sorry
Even a static, performant website contains a lot of information a computer has to chew through
Again, utilizing semantic HTML, like the markup present here, helps reduce the effort it takes to generate the Accessibility Tree
And unfortunately, using semantic HTML is kind of a rare thing these days, which is partly why I’m here giving this talk
Okay, I think you get the gist of it
Of the code I just showed you, five slides’ worth,
We’ve only covered the top quarter of just the homepage
Again, it’s a lot of stuff to process
Keeping that DOM tree example in mind, we now have a simple form
It’s a pair of unstyled radio buttons asking you if you want the chicken or the fish as your meal preference
Here’s how that form might be translated into the Accessibility Tree
If we use semantic HTML, the names, roles, and properties are automatically filled in for us without any additional effort
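As a sketch of what that pair of radio buttons could look like in semantic HTML (my markup, not the slide’s):

```html
<fieldset>
  <legend>Meal preference</legend>
  <!-- The “radio” role and checked state come from the input type;
       the accessible name comes from the wrapping label text -->
  <label><input type="radio" name="meal" value="chicken" checked> Chicken</label>
  <label><input type="radio" name="meal" value="fish"> Fish</label>
</fieldset>
```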
When I was speaking about how semantic HTML slots in, this is a more granular example of what I’m getting at
Here’s a higher-fidelity example, sampled again from the PerfMatters homepage
I’m using Firefox’s Accessibility Inspector to dig into a link
There’s a ton more information exposed, including attributes, states, a child element count, relations, and available actions
All of this data is used by the Accessibility Tree to help it describe the shape of things
And here’s an even higher-fidelity example, a raw text dump of the Accessibility Tree for this same page
This is an example of what the language might look like when we’re speaking machine directly to machine.
We don’t speak machine directly, though
Before we had Firefox’s Accessibility Inspector, we had to rely on more specialized tools to be the translators
The Paciello Group’s aViewer and Apple’s Accessibility Inspector are the two go-to resources
And you’ll still need them if you want to inspect anything other than websites, or do some really serious digging
So, let’s make the abstract immediate
What’s going on here, and why should you care?
With my auditing project, I narrowed the problem down to issues with Material Design’s radio inputs
Here’s how they appear visually
And here’s how they appear in code
To make a Material Design radio input, you need
Six HTML elements containing nine attributes, with a DOM depth of 3
You also need 66 CSS selectors containing 141 properties, which weighs in at 10k when minified
All that will get us a radio input
But you need more than one radio input to be able to use it properly
And oftentimes there’s a few options to select from
Sometimes there’s more than just a few options
And sometimes there’s even more than that
In the case of my audit, we had a ton of radio inputs conditionally being injected into the page
This all adds up
Google’s Lighthouse project, an open source tool that analyzes your website for performance problems, recommends that an optimal website DOM tree have the following:
Less than 1,500 nodes
Max depth of 32 nodes
No parent node with more than 60 child nodes
What I’d like to call attention to is the max depth of 32 nodes bit
That might seem like a lot of wiggle room at first, but take a moment and think about the templates of the sites you’ve worked on.
There’s the frameset, wrapper divs, landmarks, component wrappers, and, if you’re doing it right, fieldsets on your forms
Each digging a little bit more inward
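If you want a rough sense of where your own pages land against those numbers, you can walk the tree yourself. This is a sketch of my own (the helper names are mine, not Lighthouse’s); in a browser, you’d call each one on document.documentElement:

```javascript
// Walk a DOM-like tree (anything with a `children` collection)
// and report the numbers Lighthouse's guidance talks about.

// Deepest nesting level, counting the starting node as level 1
function maxDepth(node) {
  let deepest = 0;
  for (const child of node.children) {
    deepest = Math.max(deepest, maxDepth(child));
  }
  return deepest + 1;
}

// Total number of element nodes in the tree
function totalNodes(node) {
  let count = 1;
  for (const child of node.children) {
    count += totalNodes(child);
  }
  return count;
}

// The largest number of children any single parent has
function widestParent(node) {
  let widest = node.children.length;
  for (const child of node.children) {
    widest = Math.max(widest, widestParent(child));
  }
  return widest;
}
```

You’d then compare the results to the guidance above: fewer than 1,500 total nodes, a max depth under 32, and no parent with more than 60 children.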
Another part of accessibility auditing is providing a fix to the problems you uncover
It’s tough work, but nobody likes paying people to tell them all the things that are wrong while offering no solutions
We wound up recommending a radio input pattern that utilized three HTML elements
It’s a pattern developed by my friend Scott, taken from his excellent a11y_styled_form_controls project
It’s worth saying that Scott puts these patterns through their paces, and tests them with an incredibly robust set of assistive technologies
So a nice side benefit is knowing I can recommend them with confidence
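I can’t show the client’s code, but a three-element styled radio pattern generally looks something like this sketch (the class names are my own, and the exact structure in Scott’s project may differ):

```html
<!-- Three elements total: a real input, a span the CSS restyles
     into the custom-looking radio, and the wrapping label -->
<label class="radio">
  <input type="radio" name="meal" value="chicken" checked>
  <span class="radio__control">Chicken</span>
</label>
```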
Visually, it was completely indistinguishable from the original version
To compare the two solutions
We have a 50% reduction in HTML elements, and we cut the DOM depth down by a third
There’s also a 30% reduction in CSS selectors and properties, resulting in the CSS payload for this pattern being reduced by 90%
We’ve also completely removed 30k of JavaScript, which is 30k less blocking resource being served
Cobbling together a rough prototype, we were able to create an environment that mimicked the conditions of the site we were auditing, only using the new radio input code and guess what?
It worked! VoiceOver didn’t crash!
So why is this our problem?
Why should we care about the fragility of other people’s software?
Well, prioritizing developer ergonomics without considering the generated HTML can lead to bad things happening
And regardless of our setup, we tend to pile even more things on to our base experience
Things like ads
Social marketing engagement tools
And development integration utilities
And then there’s also the cold war raging between websites and their users
Here is Facebook splitting a single word into 11 DOM elements to avoid ad blockers
When we throw out what the browser gives us for free, we oftentimes don’t realize there are very real, very important things we sacrifice in doing so
Here’s what Marco Zehe, Mozilla’s senior Accessibility QA Engineer has to say about all this:
Nibbling away at those milliseconds it takes for information to reach assistive technologies when requested. My current task is to reduce the number of accessible objects we create for html:div elements. This is to reduce the size of the accessibility tree.
This reduces the amount of data being transferred across process boundaries. Part of what makes Firefox accessibility slower on Windows since Firefox Quantum is the fact that assistive technologies now no longer have direct access to the original accessible tree of the document.
A lot of it needs to be fetched across a process boundary. So the amount of data that needs to be fetched actually matters a lot now. Reducing that by filtering out objects we don’t need is a good way to reduce the time it takes for information to reach assistive technologies.
This is particularly true in dynamically updating environments like Gmail, Twitter, or Mastodon. My current patch, slated to land early in the Firefox 67 cycle, shaves off about 10 to 20% of the time it takes for certain calls to return.
Note that all this optimization is only for one browser, Firefox
And that there are a lot of browsers out there
It’s also not all about browsers, either
Here’s a refreshable braille display, one of many other kinds of assistive technology that interfaces with the accessibility tree
“Someone should do something!” The battle cry of bystanders everywhere
Let’s unpack this some, and figure out our available options
For screen readers, the main ones are JAWS and NVDA
Both are Windows screen readers
VoiceOver for macOS and iOS
And TalkBack for Android
These are the big four screen readers you’re going to hear about
It’s sort of analogous to Chrome, Safari, Firefox, and IE, in that there’s more screen readers out there, but these cover the main use cases you’re most likely to deal with
While not all assistive technologies are screen readers, if your site works well with screen readers, chances are good it’ll work well with other assistive tech
Of them, all but one have an open issue tracker
But half have closed source code
This means that while we can file issues, we aren’t really empowered to do much more about it on the assistive technology layer
It’s also completely unrealistic to expect people to submit and follow issues or code pull requests across all trackers in addition to working a full-time job
Our only other realistic option is to keep the DOM trees on our sites nice and shallow
There are actual, tangible benefits to doing this
Here’s Marco again weighing in again on his optimization efforts:
Reducing the number of milliseconds it takes for a screen reader to start speaking after a key press from about 140 to 100 here, or from 120 to 100 there, doesn’t matter much on a fast machine. On a slow machine, that reduction is from about 230 to 250 down to 200 or 190.
Let’s talk about what slow machines means
If you are disabled and/or underserved, you face significant barriers to entering and staying in the workforce
This means you may have less access to current technology,
especially the more expensive, higher quality versions of it
Another factor is some people who rely on assistive technology are reluctant to upgrade it, for a very justified fear of breaking the way they use to interact with the world
Slow machines may also not mean an ancient computer
Inexpensive Android smartphones are a common entry point for emerging markets, and with Android comes TalkBack
A slow machine might also come from a place you may not be expecting
By which I mean you might have a state-of-the-art computer and are doing state-of-the-art things on it
This requires a ton of computational resources, which makes fast things slow
With fewer computational resources to go around, we may see unintended consequences, possibly re-creating a situation similar to our too-many-radio-inputs problem
Think Electron apps, desktop programs that are built with web technologies
Accessibility auditing isn’t something people normally do out of the goodness of their own heart
It’s typically performed after a lawsuit has been levied against an organization
Let me be clear: when you create inaccessible experiences, you are denying people their civil rights
The Americans With Disabilities Act guarantees that people cannot be discriminated against based on their disability conditions, and this extends to both private organizations and the digital space
This is a good thing, as it guarantees protections as more and more of the services necessary to living life go online
You need to work with what you have, not what you hope will be
Part of this means understanding that when we want things to be better, these kinds of changes are really technically complicated under the hood, and spread across multiple concerns
On top of that, accessibility fixes are often viewed as unglamorous and deprioritized compared to other features
Don’t believe me?
Here’s the support matrix for the title attribute, incorporated into the W3C’s HTML spec in 1993.
It’s 26 years later and we still have a ton of interoperability problems.
I don’t mean to bum you out, and I don’t expect you to become accessibility experts
However, as people who are interested in more than just the surface level of things, you are uniquely empowered to effect positive change
What I ask of you is to at least incorporate basic accessibility testing into your workflow
If anything, just check to see if a screen reader crashes:
A bad assistive technology experience is better than none at all
The slide background is yellow now!
This is what we call a great segue
I’m a designer by trade,
Part of that job means coming up with alternate strategies for allowing people to accomplish their goals
because sometimes the most effective solution isn’t necessarily the most obvious one
With that in mind, we’re going to talk about another definition of performance, which is the ability to actually accomplish something
All of these little surgical tweaks and optimizations don’t mean squat if people don’t understand how to operate the really fast thing you give them
Another aspect of dynamically injecting a ton of radio inputs into a page is that it adds more things for a person to think about
This is called cognitive load, and it’s a huge problem
It affects our ability to accomplish tasks, and accomplish tasks accurately
Namely it inhibits our memory
Reading comprehension level
Math comprehension level
And shockingly, our ability to actually understand what we see
It’s such an important problem that NASA developed the Task Load Index, an assessment tool to help quantify it
This isn’t warm fuzzy feelings about empathy, this is a serious attempt by a government agency to refine efficiencies and prevent errors
One of the most interesting things about the existence of the Index is that it’s an admission that disability conditions can be conditional
Think about it: when’s the last time you were tired, distracted, sick, or learning a new skill?
Heck, when’s the last time you were drunk?
Cognitive load is an important thing to track for building rockets, sure
But it also translates to other complicated things like making and using software
One of the things we can do to lessen cognitive load is to lessen the amount of information someone has to parse at any given moment
For the medical education app, we could have added an additional step into the user flow that asks a high level segmenting question
It’s a little more friction, sure
but it’s being used strategically as a focused choice, to keep the cognitive load lower
This would help filter down the results you get
Which makes it both easier for the person and the browser to parse
The other big-picture question we need to ask is whether all this work is even necessary
The most performant page is the one you never have to use, by which I mean how often can we side-step these issues by using other resources and information made available to us?
If you’re interested in this sort of thing, the 2018 Design in Tech report is a must-read piece
One of the things it revealed is that, surprisingly, very few companies conduct qualitative user research
Which is the practice of asking people how they’d use a feature, and how they feel while they do it
I’m not a numbers person, but there does seem to be a trend going on here
Another interesting thing is barely any companies conduct qualitative testing for features after launching them
So we’re all just throwing features into the larger, ever-evolving ecosystem that is your product and not checking to see how it’ll affect things
We also need to remember that we’re only seeing those companies that beat the odds
The market is built on top of the corpses of failed businesses who poured all their cash and resources into the wrong features
The second big ask from this talk is really just repeating the first one
It’s one thing to read about something and believe it to be true, but it’s another thing entirely to put it out into the world without verifying it works
The web, and more importantly, the people who use it, are too important not to
Thank you to our captioner,
Michelle and the other staff
Mina for being a great emcee
Estelle for organizing such a wonderful conference
And most importantly, you, our audience
The slides for this talk are available online on Notist
And I’ll be both tweeting out a link to it and linking to it on my personal website