A presentation at Coder Cruise in in New Orleans, LA, USA by Leon Fayer
Developing Applications for Performance by Leon Fayer @papa_fire
{me} ‣ ‣ ‣ ‣ developer ++ i drink and i fix code vp @ OmniTI make slow go fast @papa_fire
disclaimers @papa_fire
disclaimers 1. performance is technology agnostic @papa_fire
disclaimers 1. performance is technology agnostic 2. performance != scalability @papa_fire
disclaimers 1. performance is technology agnostic 2. performance != scalability 3. there is more to performance @papa_fire
disclaimers 1. performance is technology agnostic 2. performance != scalability 3. there is more to performance 4. it’s (almost) all about web @papa_fire
why performance? @papa_fire
it improves user experience @papa_fire
(and as a result) it effects business @papa_fire
“ Just a 100-millisecond delay in load time hurt conversion rate by 7% https://www.soasta.com/wp-content/uploads/2017/04/State-of-Online-Retail-Performance-Spring-2017.pdf @papa_fire
it saves money (in production) @papa_fire
50 users/sec x served by 2 web servers x costing $300/month $600/month @papa_fire
1000 users/sec x served by 40 web servers x costing $300/month $12,000/month @papa_fire
this is why cloud doesn’t solve all your problems @papa_fire
only slide on scalability ‣ don’t scale instead of improving performance ‣ keep scaling in mind during design @papa_fire
what can you optimize? @papa_fire
everything! @papa_fire
<front end> @papa_fire
High Performance Web Sites by Steve Souders @papa_fire
SPOF (single point of failure) @papa_fire
(in the world of web) service integrations @papa_fire
ngm.com @papa_fire
codercruise.com http://requestmap.webperf.tools @papa_fire
what happens when they fail? @papa_fire
http://www.webpagetest.org/ @papa_fire
3rd party = no control @papa_fire
one failure is enough @papa_fire
PSA services can be your own @papa_fire
tips @papa_fire
tips only connect to the services you need @papa_fire
tips only connect to the services you need avoid external connection on critical path @papa_fire
tips only connect to the services you need avoid external connection on critical path make external calls asynchronously @papa_fire
tips only connect to the services you need avoid external connection on critical path make external calls asynchronously trap errors and timeouts @papa_fire
tips only connect to the services you need avoid external connection on critical path make external calls asynchronously trap errors and timeouts have a fallback plan @papa_fire
</front end> @papa_fire
<database> @papa_fire
http://use-the-index-luke.com Markus Winand @papa_fire
n+1 @papa_fire
$blog = new Blog(‘leon’); $posts = $blog->get_all_posts(); foreach ($posts as $post) { $post->comments = $post->get_comments(); } @papa_fire
$dbh = new DB(…); $sth = $dbh->prepare(“select * from posts where author = ?”); $sth->execute(‘leon’); $posts = array(); foreach ($sth->fetchAll() as $p) { $post = new Post(); $post->id = $p->[‘id’]; $post->title = $p->[‘title’]; $post->author = $p->[‘author’]; $post->body = $p->[‘body’]; array_push($posts, $post); } foreach ($posts as $post) { $dbh2 = new DB(…); $sth_comm = $dbh2->prepare(“select * from post_comments where post_id = ?”); $sth_comm->execute($post->post_id); foreach(sth_comm->fetchAll() as $c) { $comment = new Comment(); $comment->id = $c->[‘id’]; $comment->author = $c->[‘author’]; $comment->body = $c->[‘body’]; array_push($post->comments, $comment); } } @papa_fire
$sth $dbh = new DB(…); $sth = $dbh->prepare(“select * from blogsposts where username = ?”); = $dbh->prepare(“select * from where author $sth->execute(‘leon’); $posts = array(); = ?”); foreach ($sth->fetchAll() as $p) { $post = new Post(); $post->id = $p->[‘id’]; $post->title = $p->[‘title’]; $post->author = $p->[‘author’]; $post->body = $p->[‘body’]; array_push($posts, $post); } foreach ($posts as $post) { $dbh2 = new DB(…); $sth_comm = $dbh2->prepare(“select * from post_comments where post_id = ?”); $sth_comm->execute($post->post_id); foreach(sth_comm->fetchAll() as $c) { $comment = new Comment(); $comment->id = $c->[‘id’]; $comment->author = $c->[‘author’]; $comment->body = $c->[‘body’]; array_push($post->comments, $comment); } } @papa_fire
$dbh = new DB(…); $sth = $dbh->prepare(“select * from posts where author = ?”); $sth->execute(‘leon’); $posts = array(); foreach ($sth->fetchAll() as $p) { $post = new Post(); $post->id = $p->[‘id’]; $post->title = $p->[‘title’]; $post->author = $p->[‘author’]; $post->body = $p->[‘body’]; array_push($posts, $post); } foreach ($posts as $post) { $dbh2 = new DB(…); = $dbh2->prepare(“select ** from from post_comments where where post_id post_id = ?”); $sth_comm$sth_comm = $dbh2->prepare(“select post_comments $sth_comm->execute($post->post_id); foreach(sth_comm->fetchAll() as $c) { $comment = new Comment(); $comment->id = $c->[‘id’]; $comment->author = $c->[‘author’]; $comment->body = $c->[‘body’]; array_push($post->comments, $comment); } } = ?”); @papa_fire
$dbh = new DB(…); $sth = $dbh->prepare(“select * from posts where author = ?”); $sth->execute(‘leon’); $posts = array(); foreach ($sth->fetchAll() as $p) { $post = new Post(); $post->id = $p->[‘id’]; $post->title = $p->[‘title’]; $post->author = $p->[‘author’]; $post->body = $p->[‘body’]; array_push($posts, $post); } foreach ($posts $post) { foreach ($posts as as $post) { $dbh2 = new DB(…); = $dbh2->prepare(“select ** from from post_comments where where post_id post_id = ?”); $sth_comm$sth_comm = $dbh2->prepare(“select post_comments $sth_comm->execute($post->post_id); foreach(sth_comm->fetchAll() as $c) { $comment = new Comment(); $comment->id = $c->[‘id’]; $comment->author = $c->[‘author’]; $comment->body = $c->[‘body’]; array_push($post->comments, $comment); } } = ?”); @papa_fire
1 post = 2 queries 10 posts = 11 queries 100 posts = 101 queries … @papa_fire
but that’s not all @papa_fire
$dbh == new $dbh newDB(…); DB(…); $sth = $dbh->prepare(“select * from posts where author = ?”); $sth->execute(‘leon’); $posts = array(); foreach ($sth->fetchAll() as $p) { $post = new Post(); $post->id = $p->[‘id’]; $post->title = $p->[‘title’]; $post->author = $p->[‘author’]; $post->body = $p->[‘body’]; array_push($posts, $post); } foreach ($posts as $post) { $dbh2 = newDB(…); DB(…); $dbh2 = new $sth_comm = $dbh2->prepare(“select * from post_comments where post_id = ?”); $sth_comm->execute($post->post_id); foreach(sth_comm->fetchAll() as $c) { $comment = new Comment(); $comment->id = $c->[‘id’]; $comment->author = $c->[‘author’]; $comment->body = $c->[‘body’]; array_push($post->comments, $comment); } } @papa_fire
@papa_fire
@papa_fire
the right way @papa_fire
$dbh = new DB(…); $sth = $dbh->prepare(“select [list columns you need] from posts p, post_comments c where p.id = c.post_id and p.username = ? order by p.id DESC”); $sth->execute(‘leon’); $posts = array(); $post = new Post(); $active_pid = 0; foreach ($sth->fetchAll() as $p) { if ($post->id != $active_pid) { // if this row is for the post we’re working on if ($post->id) { array_push($posts, $post); } $active_pid = $post->id; // getting comments for this post $post = new Post(); $post->id = $p->[‘id’]; $post->title = $p->[‘title’]; $post->author = $p->[‘author’]; $post->body = $p->[‘body’]; } $comment = new Comment(); $comment->id = $c->[‘id’]; $comment->author = $c->[‘author’]; $comment->body = $c->[‘body’]; array_push($post->comments, $comment); } @papa_fire
$dbh newnew DB(…); $dbh= = DB(…); $sth = $dbh->prepare(“select [list columns you need] from posts p, post_comments c $sth = $dbh->prepare(“select [list columns you need] where p.id = c.post_id and p.username = ? from posts p, post_comments c order by p.id DESC”); where p.id = c.post_id $sth->execute(‘leon’); and p.username = ? $posts = array(); $post = new Post(); order by p.id DESC”); $active_pid = 0; foreach ($sth->fetchAll() as $p) { if ($post->id != $active_pid) { // if this row is for the post we’re working on if ($post->id) { array_push($posts, $post); } $active_pid = $post->id; // getting comments for this post $post = new Post(); $post->id = $p->[‘id’]; $post->title = $p->[‘title’]; $post->author = $p->[‘author’]; $post->body = $p->[‘body’]; } $comment = new Comment(); $comment->id = $c->[‘id’]; $comment->author = $c->[‘author’]; $comment->body = $c->[‘body’]; array_push($post->comments, $comment); } @papa_fire
1 post = 1 query 10 posts = 1 query 100 posts = 1 query … @papa_fire
PSA Common Table Expressions (CTE) to further reduce a number of queries http://omniti.com/seeds/writable-ctes-improve-performance @papa_fire
$blog = new Blog(‘leon’); $posts = $blog->get_all_posts_with_comments(); @papa_fire
$dbh = new DB(…); $sth = $dbh->prepare(“select [list columns you need] from posts p, post_comments c where p.id = c.post_id and p.username = ? order by p.id DESC”); $sth->execute(‘leon’); $posts = array(); $post = new Post(); $active_pid = 0; $posts = $blog->get_all_posts_with_comments(); foreach ($sth->fetchAll() as $p) { if ($post->id != $active_pid) { // if this row is for the post we’re if ($post->id) { array_push($posts, $post); } $active_pid = $post->id; // getting comments for this post $post = new Post(); $post->id = $p->[‘id’]; $post->title = $p->[‘title’]; $post->author = $p->[‘author’]; $post->body = $p->[‘body’]; } $comment = new Comment(); $comment->id = $c->[‘id’]; $comment->author = $c->[‘author’]; $comment->body = $c->[‘body’]; array_push($post->comments, $comment); } @papa_fire
that’s awesome! (right?) @papa_fire
do you know what’s under the hood? @papa_fire
ORMs are evil @papa_fire
you have no idea how they work @papa_fire
“ think about ORM as the most junior developer you’ve ever worked with @papa_fire
select * from ( select ( bannerid, caption, client_url, image_file, sponsorid, weight from select V.bannerid, V.impressions, B.caption, B.client_url, B.image_file, s.sponsorid, s.weight, row_number() over (partition by s.sponsorid order by s.weight desc) ranking FROM ( — This level gives me a list of banners sorted by least seen,and then by highest weight select valid.bannerid, valid.totalweight, count(I.timestamp) as impressions FROM ( — This level gets me a list of banners that are valid for display select b.bannerid, — Add up the weight from 4 sources. Banner weight, and weight for each data item they match decode( decode(bitand(u.STATE_BM1,b.STATE_BM1),0,0,1) + decode(bitand(u.STATE_BM2,b.STATE_BM2),0,0,1) + decode(bitand(u.STATE_BM3,b.STATE_BM3),0,0,1), 0,0,b.STATE_WT ) + decode(bitand(u.AGE_BM,b.AGE_BM),0,0,b.AGE_WT)+ decode(bitand(u.GENDER_BM,b.GENDER_BM),0,0,b.GENDER_WT)+ b.weight as totalweight from tgif.tbl_users u, tgif.tbl_banners b, tgif.tbl_bannerstats bs where — I only care about ME! u.userid= 1 — Don’t show inactive banners and b.inactive != 1 — Only show banners that are currently running and sysdate < b.end_date and sysdate >=b.start_date — Only get the type of banner i’m looking for and b.type= 3 — Join on the total stats, and only display banners that haven’t reached their per banner maximums and b.bannerid = bs.bannerid and ( b.max_impressions IS NULL OR bs.total_impressions < b.max_impressions ) and ( b.max_clicks IS NULL OR bs.total_clicks < b.max_clicks ) and ( b.max_conversions IS NULL OR bs.total_conversions < b.max_conversions ) — Ignore any banners that don’t match their demographics (ie, male banner won’t go to females) and ( b.AGE_BM IS NULL OR b.AGE_BM = 0 OR bitand(u.AGE_BM, b.AGE_BM) != 0 ) and ( b.GENDER_BM IS NULL OR b.GENDER_BM =0 OR bitand(u.GENDER_BM, b.GENDER_BM) != 0 ) and ( b.STATE_BM1 IS NULL OR b.STATE_BM1 =0 OR bitand(u.STATE_BM1, b.STATE_BM1) != 0 ) and ( b.STATE_BM2 IS NULL OR b.STATE_BM2 =0 OR bitand(u.STATE_BM2, b.STATE_BM2) != 0 ) and ( b.STATE_BM3 IS NULL OR b.STATE_BM3 =0 OR bitand(u.STATE_BM3, b.STATE_BM3) != 0 ) — But don’t show me any banners that I have already signed up and b.bannerid NOT IN ( SELECT B.bannerid FROM tgif.tbl_bannerconversions C, tgif.tbl_banners B, tgif.tbl_sponsors sp WHERE C.USERID=1 AND C.bannerid=B.bannerid AND B.sponsorid=sp.sponsorid — unless they have a conversion interval, and that interval has expired AND ( sp.conversion_interval = 0 OR sysdate > C.timestamp+sp.conversion_interval ) ) — Don’t show me any banners that have SPONSORS that have reached their maximums and b.sponsorid NOT IN ( — I believe this would be better done using HAVING clauses, but I can’t figure it out — Take the banners for a sponsor in the bannerstats table, and get the totals per sponsor — return anything that has reached it’s maximum select sponsorid FROM ( SELECT S.sponsorid, S.max_impressions, S.max_conversions, S.max_clicks, sum(total_impressions) as imps, sum(total_conversions) as convs, sum(total_clicks) as clicks FROM tgif.tbl_sponsors S, tgif.tbl_banners B, tgif.tbl_bannerstats bs WHERE S.sponsorid=B.sponsorid AND B.bannerid=bs.bannerid GROUP BY S.Sponsorid, S.max_impressions, S.max_conversions, S.max_clicks ) exclude WHERE ( imps > max_impressions OR convs >= max_conversions OR clicks > max_clicks ) ) ) valid, tgif.tbl_bannerimpressions I where valid.bannerid=I.bannerid(+) and I.userid(+)=1 group by valid.bannerid, valid.totalweight — I want to see banners I haven’t seen yet, sorted by highest weight, so we sort by number — of times that this user has seen this particular banner, then we sort by weight order by impressions, totalweight DESC ) V, tgif.tbl_banners B, tgif.tbl_sponsors S where B.bannerid=V.bannerid and B.sponsorid=S.sponsorid and S.inactive != 1 and s.sponsorid not in ( ) valid, tgif.tbl_bannerimpressions I where valid.bannerid=I.bannerid(+) and I.userid(+)=1 group by valid.bannerid, valid.totalweight — I want to see banners I haven’t seen yet, sorted by highest weight, so we sort by number — of times that this user has seen this particular banner, then we sort by weight order by impressions, totalweight DESC ) V, tgif.tbl_banners B, tgif.tbl_sponsors S where B.bannerid=V.bannerid and B.sponsorid=S.sponsorid and S.inactive != 1 and s.sponsorid not in ( — Check the user impression cap to make sure it hasn’t been passed by the user select s.sponsorid from tgif.tbl_banners b, tgif.tbl_sponsors s, tgif.TBL_BANNERIMPRESSIONS i where s.sponsorid = b.sponsorid and b.bannerid = i.bannerid and i.timestamp >= sysdate - nvl(user_impression_cap_days,100) and userid = 1 group by s.sponsorid having count(*) >= max(nvl(user_impression_cap,1000000000)) ) — Make sure the sponsor is still in the valid table. This table is updated hourly — and contains the sponsors that have not gone over their sponsor level frequencies for — impressions/conversions/clicks and s.sponsorid in (select sponsorid from tgif.tbl_active_sponsors) ) where ranking=1 —Order the banners by sponsor weight, which is handled by the ranking —order by S.weight order by impressions, weight desc would you trust a junior with this query? ) where rownum <= 10; @papa_fire
object construction is very expensive @papa_fire
METHOD Base ORM REAL USER SYS PCPU 6.330 5.771 0.212 94.51 @papa_fire
METHOD REAL USER SYS PCPU Base ORM 6.330 5.771 0.212 94.51 SQL without objects 0.664 0.274 0.120 59.35 @papa_fire
METHOD REAL USER SYS PCPU Base ORM 6.330 5.771 0.212 94.51 SQL without objects 0.664 0.274 0.120 59.35 SQL with ORM objects 6.354 5.797 0.197 94.34 @papa_fire
get_all_* are the worst @papa_fire
@papa_fire
ORMs are bad for high-traffic production for replacing SQL skills @papa_fire
“ I have yet to find a tool built to avoid using SQL that doesn’t become more complicated to use than just learning SQL John Simone (@j_simone) @papa_fire
ORMs are not bad for prototyping for low-traffic systems for non-public traffic @papa_fire
PSA lazy loading doesn’t solve all your problems @papa_fire
</database> @papa_fire
if you can’t optimize the problem hide the problem @papa_fire
“ there are only two hard things in Computer Science: cache invalidation and naming things Phil Karlton @papa_fire
is stale content > slow content? @papa_fire
stale is relative @papa_fire
how valuable is 1 minute? @papa_fire
100 request/sec x served by 2 web servers x 1 minute cache (60 seconds) 2 “expensive” reqs/min 11,998 “cheap” reqs/min @papa_fire
① browser ② on-disk ③ in-memory @papa_fire
① browser ② on-disk ③ in-memory @papa_fire
remote @papa_fire
cache-control header @papa_fire
.htaccess # cache all pages for one month Header set Cache-Control “max-age=2628000, public” @papa_fire
.htaccess # cache static assets for one month <filesMatch “.(jpg|jpeg|png|gif|js|ico)$”> Header set Cache-Control “max-age=2628000, public” </filesMatch> @papa_fire
php # cache this page for one minute <? header(“Cache-Control: max-age=60”); ?> @papa_fire
you don’t control browsers @papa_fire
progressive web applications (service workers) https://www.youtube.com/watch?v=3ol3-kvCNeQ Patrick Meenan, Velocity @papa_fire
① browser ② on-disk ③ in-memory @papa_fire
local, persistant @papa_fire
read content from file @papa_fire
read content from file instead of … … external source (API, database) … processing content again @papa_fire
returnContent() { return generateContent(); } @papa_fire
returnContent() { if (has_cache($cache_file, $cache_ttl)) { return _read_cache($cache_file); } else { return _write_cache($cache_file); } } @papa_fire
@papa_fire
have to manage your own ttl @papa_fire
sub has_cache { my $cache_file = shift; my $cache_ttl = shift; #in seconds if (-f $cache_file) { # in seconds my $last_updated = (stat($cache_file))[9]; my $now = int (gettimeofday); my $diff = $now - $last_updated; return 1 if ($now - $last_updated <= $cache_ttl); } return 0; } @papa_fire
① browser ② on-disk ③ in-memory @papa_fire
memcache, redis @papa_fire
local or distributed @papa_fire
function returnContent($article_id) { $memcache = new Memcache; $cacheAvailable = $memcache->connect(MEMCACHED_HOST, MEMCACHED_PORT); if ($cacheAvailable) { $cached_article = $memcache->get(‘article-’.$article_id); if (!$cached_article) { $article = getArticle($article_id); $memcache->set(‘article-‘.$article_id, $article); return $article; } else { return $cached_article; } } } @papa_fire
function returnContent($article_id) { $memcache new Memcache; $memcache = =new Memcache; $cacheAvailable = $memcache->connect(MEMCACHED_HOST, MEMCACHED_PORT); if ($cacheAvailable) { $cached_article = $memcache->get(‘article-’.$article_id); $cached_article = $memcache->get(‘article-’.$article_id); (!$cached_article) { if if (!$cached_article) { $article = getArticle($article_id); $memcache->set(‘article-‘.$article_id, $article); $memcache->set(‘article-‘.$p->[‘article_id’], $article); return $article; } else { return $cached_article; } } } @papa_fire
returnContent() { $content = read_cache($cache_key); if (!$content) { $content = generateContent(); write_cache($cache_key, $content); } return $content; } @papa_fire
remember /get/articles/all ? (multi-nested ORM function) @papa_fire
METHOD Base call BASE LOAD 20x TRAFFIC SPIKE 5.782 s 12.127 s @papa_fire
BASE LOAD 20x TRAFFIC SPIKE Base call 5.782 s 12.127 s Base call with memcache 0.867 s 0.922 s METHOD @papa_fire
not just for db results @papa_fire
$memcache->set($uri, $html); @papa_fire
doesn’t have to be 100% static @papa_fire
don’t forget to purge @papa_fire
careful what you cache @papa_fire
“pretty” urls @papa_fire
“pretty” urls /my/profile @papa_fire
“ugly” urls @papa_fire
“ugly” urls /prodlist.php?SID=a3e9590a-6598-11e7-907b-a6006ad3dba0 @papa_fire
external cache @papa_fire
reverse HTTP proxy @papa_fire
Apache Traffic Server, Varnish @papa_fire
CDN Akamai, Fastly @papa_fire
rule-based controls @papa_fire
use as many or as few @papa_fire
remember the part about critical path? @papa_fire
anything not related to the immediate outcome should not be in the critical path @papa_fire
router.post(‘/register’, function(req, res){ var user = new User(req.body); user.save( function(err) { if (!err) { user.savePreferences(req.pref); user.sendWelcomeEmail(); res.render(‘first_time_welcome’, { title: ‘Welcome’ + user.name, errors: [err] }); } }); }); @papa_fire
router.post(‘/register’, function(req, res){ var user = new User(req.body); user.save( function(err) { if (!err) { user.savePreferences(req.pref); user.sendWelcomeEmail(); user.sendWelcomeEmail(); res.render(‘first_time_welcome’, { title: ‘Welcome’ + user.name, errors: [err] }); } }); }); @papa_fire
User.sendWelcomeEmail = function(callback){ if (!do_not_email(user.email) { var email = new Email(); email.subject = “Welcome ” + user.firstname; email.cta = getDealoftheDay(); email.secondary = getOffers(user); email.generateBody(user, function() { email.send(); }); } }; @papa_fire
does it have to be done now? @papa_fire
router.post(‘/register’, function(req, res){ var user = new User(req.body); user.save( function(err) { if (!err) { user.savePreferences(req.pref); sendToQueue(‘welcome email’,user); sendToQueue(‘welcome email’,user); res.render(‘first_time_welcome’, { title: ‘Welcome’ + user.name, errors: [err] }); } }); }); @papa_fire
queueing (v.) - putting things somewhere else to deal with later @papa_fire
where? 1. in-memory (RabbitMQ, SQS) 2. persistant (database) @papa_fire
so … @papa_fire
optimize everything (right?) @papa_fire
“ premature optimization is the root of all evil Donald Knuth @papa_fire
improvements should be … 1. needed @papa_fire
improvements should be … 1. needed 2. incremental @papa_fire
improvements should be … 1. needed 2. incremental 3. measurable @papa_fire
that said @papa_fire
always think about performance @papa_fire
parting tip there is an exception to every rule @papa_fire
thank you questions? @papa_fire
Scalability != performance. In fact, having to scale your architecture significantly with growth may be a symptom of a poor application performance. Yet, with the rise of cloud and the abundance of automation and container tools that simplify scalability aspect of your system, performance considerations are often pushed to the back row. Building systems for high performance is not easy. It requires a lot of considerations - from technology selection to design decisions. And “the cloud” does not magically solve those problems for you. In this talk I’ll discuss common performance pitfalls across the stack and talk about useful techniques and examples that every application could benefit from.
The following resources were mentioned during the presentation or are useful additional information.
Developer tips across the full stack for building web applications for high performance
Here’s what was said about this presentation on social media.
One of the best books for devs to learn about databases according to @papa_fire (“Not a joke!”) #ApplicationPerformance #phpworld #UseTheIndexLuke :-) pic.twitter.com/LHTtst1vhz
— Lisa McCray (@cotterpinx) November 16, 2017
To @papa_fire 's point -- ORMs are a great way to quickly get your product/feature to market, at which point, if it's successful, you've inherited all of the performance problems that come along with that choice -- plus now have to try and scale #phpworld
— Neal Anders (@nanderoo) November 16, 2017