Developing Applications for Performance by Leon Fayer @papa_fire
A presentation at Coder Cruise in November 2016 in New Orleans, LA, USA by Leon Fayer
Developing Applications for Performance by Leon Fayer @papa_fire
{me} ‣ ‣ ‣ ‣ developer ++ i drink and i fix code vp @ OmniTI make slow go fast @papa_fire
disclaimers @papa_fire
disclaimers 1. performance is technology agnostic @papa_fire
disclaimers 1. performance is technology agnostic 2. performance != scalability @papa_fire
disclaimers 1. performance is technology agnostic 2. performance != scalability 3. there is more to performance @papa_fire
disclaimers 1. performance is technology agnostic 2. performance != scalability 3. there is more to performance 4. it’s (almost) all about web @papa_fire
why performance? @papa_fire
it improves user experience @papa_fire
(and as a result) it effects business @papa_fire
“ Just a 100-millisecond delay in load time hurt conversion rate by 7% https://www.soasta.com/wp-content/uploads/2017/04/State-of-Online-Retail-Performance-Spring-2017.pdf @papa_fire
it saves money (in production) @papa_fire
50 users/sec x served by 2 web servers x costing $300/month $600/month @papa_fire
1000 users/sec x served by 40 web servers x costing $300/month $12,000/month @papa_fire
this is why cloud doesn’t solve all your problems @papa_fire
only slide on scalability ‣ don’t scale instead of improving performance ‣ keep scaling in mind during design @papa_fire
what can you optimize? @papa_fire
everything! @papa_fire
<front end> @papa_fire
High Performance Web Sites by Steve Souders @papa_fire
SPOF (single point of failure) @papa_fire
(in the world of web) service integrations @papa_fire
ngm.com @papa_fire
codercruise.com http://requestmap.webperf.tools @papa_fire
what happens when they fail? @papa_fire
http://www.webpagetest.org/ @papa_fire
3rd party = no control @papa_fire
one failure is enough @papa_fire
PSA services can be your own @papa_fire
tips @papa_fire
tips only connect to the services you need @papa_fire
tips only connect to the services you need avoid external connection on critical path @papa_fire
tips only connect to the services you need avoid external connection on critical path make external calls asynchronously @papa_fire
tips only connect to the services you need avoid external connection on critical path make external calls asynchronously trap errors and timeouts @papa_fire
tips only connect to the services you need avoid external connection on critical path make external calls asynchronously trap errors and timeouts have a fallback plan @papa_fire
</front end> @papa_fire
<database> @papa_fire
http://use-the-index-luke.com Markus Winand @papa_fire
n+1 @papa_fire
$blog = new Blog(‘leon’); $posts = $blog->get_all_posts(); foreach ($posts as $post) { $post->comments = $post->get_comments(); } @papa_fire
$dbh = new DB(…); $sth = $dbh->prepare(“select * from posts where author = ?”); $sth->execute(‘leon’); $posts = array(); foreach ($sth->fetchAll() as $p) { $post = new Post(); $post->id = $p->[‘id’]; $post->title = $p->[‘title’]; $post->author = $p->[‘author’]; $post->body = $p->[‘body’]; array_push($posts, $post); } foreach ($posts as $post) { $dbh2 = new DB(…); $sth_comm = $dbh2->prepare(“select * from post_comments where post_id = ?”); $sth_comm->execute($post->post_id); foreach(sth_comm->fetchAll() as $c) { $comment = new Comment(); $comment->id = $c->[‘id’]; $comment->author = $c->[‘author’]; $comment->body = $c->[‘body’]; array_push($post->comments, $comment); } } @papa_fire
$sth $dbh = new DB(…); $sth = $dbh->prepare(“select * from blogsposts where username = ?”); = $dbh->prepare(“select * from where author $sth->execute(‘leon’); $posts = array(); = ?”); foreach ($sth->fetchAll() as $p) { $post = new Post(); $post->id = $p->[‘id’]; $post->title = $p->[‘title’]; $post->author = $p->[‘author’]; $post->body = $p->[‘body’]; array_push($posts, $post); } foreach ($posts as $post) { $dbh2 = new DB(…); $sth_comm = $dbh2->prepare(“select * from post_comments where post_id = ?”); $sth_comm->execute($post->post_id); foreach(sth_comm->fetchAll() as $c) { $comment = new Comment(); $comment->id = $c->[‘id’]; $comment->author = $c->[‘author’]; $comment->body = $c->[‘body’]; array_push($post->comments, $comment); } } @papa_fire
$dbh = new DB(…); $sth = $dbh->prepare(“select * from posts where author = ?”); $sth->execute(‘leon’); $posts = array(); foreach ($sth->fetchAll() as $p) { $post = new Post(); $post->id = $p->[‘id’]; $post->title = $p->[‘title’]; $post->author = $p->[‘author’]; $post->body = $p->[‘body’]; array_push($posts, $post); } foreach ($posts as $post) { $dbh2 = new DB(…); = $dbh2->prepare(“select ** from from post_comments where where post_id post_id = ?”); $sth_comm$sth_comm = $dbh2->prepare(“select post_comments $sth_comm->execute($post->post_id); foreach(sth_comm->fetchAll() as $c) { $comment = new Comment(); $comment->id = $c->[‘id’]; $comment->author = $c->[‘author’]; $comment->body = $c->[‘body’]; array_push($post->comments, $comment); } } = ?”); @papa_fire
$dbh = new DB(…); $sth = $dbh->prepare(“select * from posts where author = ?”); $sth->execute(‘leon’); $posts = array(); foreach ($sth->fetchAll() as $p) { $post = new Post(); $post->id = $p->[‘id’]; $post->title = $p->[‘title’]; $post->author = $p->[‘author’]; $post->body = $p->[‘body’]; array_push($posts, $post); } foreach ($posts $post) { foreach ($posts as as $post) { $dbh2 = new DB(…); = $dbh2->prepare(“select ** from from post_comments where where post_id post_id = ?”); $sth_comm$sth_comm = $dbh2->prepare(“select post_comments $sth_comm->execute($post->post_id); foreach(sth_comm->fetchAll() as $c) { $comment = new Comment(); $comment->id = $c->[‘id’]; $comment->author = $c->[‘author’]; $comment->body = $c->[‘body’]; array_push($post->comments, $comment); } } = ?”); @papa_fire
1 post = 2 queries 10 posts = 11 queries 100 posts = 101 queries … @papa_fire
but that’s not all @papa_fire
$dbh == new $dbh newDB(…); DB(…); $sth = $dbh->prepare(“select * from posts where author = ?”); $sth->execute(‘leon’); $posts = array(); foreach ($sth->fetchAll() as $p) { $post = new Post(); $post->id = $p->[‘id’]; $post->title = $p->[‘title’]; $post->author = $p->[‘author’]; $post->body = $p->[‘body’]; array_push($posts, $post); } foreach ($posts as $post) { $dbh2 = newDB(…); DB(…); $dbh2 = new $sth_comm = $dbh2->prepare(“select * from post_comments where post_id = ?”); $sth_comm->execute($post->post_id); foreach(sth_comm->fetchAll() as $c) { $comment = new Comment(); $comment->id = $c->[‘id’]; $comment->author = $c->[‘author’]; $comment->body = $c->[‘body’]; array_push($post->comments, $comment); } } @papa_fire
@papa_fire
@papa_fire
the right way @papa_fire
$dbh = new DB(…); $sth = $dbh->prepare(“select [list columns you need] from posts p, post_comments c where p.id = c.post_id and p.username = ? order by p.id DESC”); $sth->execute(‘leon’); $posts = array(); $post = new Post(); $active_pid = 0; foreach ($sth->fetchAll() as $p) { if ($post->id != $active_pid) { // if this row is for the post we’re working on if ($post->id) { array_push($posts, $post); } $active_pid = $post->id; // getting comments for this post $post = new Post(); $post->id = $p->[‘id’]; $post->title = $p->[‘title’]; $post->author = $p->[‘author’]; $post->body = $p->[‘body’]; } $comment = new Comment(); $comment->id = $c->[‘id’]; $comment->author = $c->[‘author’]; $comment->body = $c->[‘body’]; array_push($post->comments, $comment); } @papa_fire
$dbh newnew DB(…); $dbh= = DB(…); $sth = $dbh->prepare(“select [list columns you need] from posts p, post_comments c $sth = $dbh->prepare(“select [list columns you need] where p.id = c.post_id and p.username = ? from posts p, post_comments c order by p.id DESC”); where p.id = c.post_id $sth->execute(‘leon’); and p.username = ? $posts = array(); $post = new Post(); order by p.id DESC”); $active_pid = 0; foreach ($sth->fetchAll() as $p) { if ($post->id != $active_pid) { // if this row is for the post we’re working on if ($post->id) { array_push($posts, $post); } $active_pid = $post->id; // getting comments for this post $post = new Post(); $post->id = $p->[‘id’]; $post->title = $p->[‘title’]; $post->author = $p->[‘author’]; $post->body = $p->[‘body’]; } $comment = new Comment(); $comment->id = $c->[‘id’]; $comment->author = $c->[‘author’]; $comment->body = $c->[‘body’]; array_push($post->comments, $comment); } @papa_fire
1 post = 1 query 10 posts = 1 query 100 posts = 1 query … @papa_fire
PSA Common Table Expressions (CTE) to further reduce a number of queries http://omniti.com/seeds/writable-ctes-improve-performance @papa_fire
$blog = new Blog(‘leon’); $posts = $blog->get_all_posts_with_comments(); @papa_fire
$dbh = new DB(…); $sth = $dbh->prepare(“select [list columns you need] from posts p, post_comments c where p.id = c.post_id and p.username = ? order by p.id DESC”); $sth->execute(‘leon’); $posts = array(); $post = new Post(); $active_pid = 0; $posts = $blog->get_all_posts_with_comments(); foreach ($sth->fetchAll() as $p) { if ($post->id != $active_pid) { // if this row is for the post we’re if ($post->id) { array_push($posts, $post); } $active_pid = $post->id; // getting comments for this post $post = new Post(); $post->id = $p->[‘id’]; $post->title = $p->[‘title’]; $post->author = $p->[‘author’]; $post->body = $p->[‘body’]; } $comment = new Comment(); $comment->id = $c->[‘id’]; $comment->author = $c->[‘author’]; $comment->body = $c->[‘body’]; array_push($post->comments, $comment); } @papa_fire
that’s awesome! (right?) @papa_fire
do you know what’s under the hood? @papa_fire
ORMs are evil @papa_fire
you have no idea how they work @papa_fire
“ think about ORM as the most junior developer you’ve ever worked with @papa_fire
select * from ( select ( bannerid, caption, client_url, image_file, sponsorid, weight from select V.bannerid, V.impressions, B.caption, B.client_url, B.image_file, s.sponsorid, s.weight, row_number() over (partition by s.sponsorid order by s.weight desc) ranking FROM ( — This level gives me a list of banners sorted by least seen,and then by highest weight select valid.bannerid, valid.totalweight, count(I.timestamp) as impressions FROM ( — This level gets me a list of banners that are valid for display select b.bannerid, — Add up the weight from 4 sources. Banner weight, and weight for each data item they match decode( decode(bitand(u.STATE_BM1,b.STATE_BM1),0,0,1) + decode(bitand(u.STATE_BM2,b.STATE_BM2),0,0,1) + decode(bitand(u.STATE_BM3,b.STATE_BM3),0,0,1), 0,0,b.STATE_WT ) + decode(bitand(u.AGE_BM,b.AGE_BM),0,0,b.AGE_WT)+ decode(bitand(u.GENDER_BM,b.GENDER_BM),0,0,b.GENDER_WT)+ b.weight as totalweight from tgif.tbl_users u, tgif.tbl_banners b, tgif.tbl_bannerstats bs where — I only care about ME! u.userid= 1 — Don’t show inactive banners and b.inactive != 1 — Only show banners that are currently running and sysdate < b.end_date and sysdate >=b.start_date — Only get the type of banner i’m looking for and b.type= 3 — Join on the total stats, and only display banners that haven’t reached their per banner maximums and b.bannerid = bs.bannerid and ( b.max_impressions IS NULL OR bs.total_impressions < b.max_impressions ) and ( b.max_clicks IS NULL OR bs.total_clicks < b.max_clicks ) and ( b.max_conversions IS NULL OR bs.total_conversions < b.max_conversions ) — Ignore any banners that don’t match their demographics (ie, male banner won’t go to females) and ( b.AGE_BM IS NULL OR b.AGE_BM = 0 OR bitand(u.AGE_BM, b.AGE_BM) != 0 ) and ( b.GENDER_BM IS NULL OR b.GENDER_BM =0 OR bitand(u.GENDER_BM, b.GENDER_BM) != 0 ) and ( b.STATE_BM1 IS NULL OR b.STATE_BM1 =0 OR bitand(u.STATE_BM1, b.STATE_BM1) != 0 ) and ( b.STATE_BM2 IS NULL OR b.STATE_BM2 =0 OR bitand(u.STATE_BM2, b.STATE_BM2) != 0 ) and ( b.STATE_BM3 IS NULL OR b.STATE_BM3 =0 OR bitand(u.STATE_BM3, b.STATE_BM3) != 0 ) — But don’t show me any banners that I have already signed up and b.bannerid NOT IN ( SELECT B.bannerid FROM tgif.tbl_bannerconversions C, tgif.tbl_banners B, tgif.tbl_sponsors sp WHERE C.USERID=1 AND C.bannerid=B.bannerid AND B.sponsorid=sp.sponsorid — unless they have a conversion interval, and that interval has expired AND ( sp.conversion_interval = 0 OR sysdate > C.timestamp+sp.conversion_interval ) ) — Don’t show me any banners that have SPONSORS that have reached their maximums and b.sponsorid NOT IN ( — I believe this would be better done using HAVING clauses, but I can’t figure it out — Take the banners for a sponsor in the bannerstats table, and get the totals per sponsor — return anything that has reached it’s maximum select sponsorid FROM ( SELECT S.sponsorid, S.max_impressions, S.max_conversions, S.max_clicks, sum(total_impressions) as imps, sum(total_conversions) as convs, sum(total_clicks) as clicks FROM tgif.tbl_sponsors S, tgif.tbl_banners B, tgif.tbl_bannerstats bs WHERE S.sponsorid=B.sponsorid AND B.bannerid=bs.bannerid GROUP BY S.Sponsorid, S.max_impressions, S.max_conversions, S.max_clicks ) exclude WHERE ( imps > max_impressions OR convs >= max_conversions OR clicks > max_clicks ) ) ) valid, tgif.tbl_bannerimpressions I where valid.bannerid=I.bannerid(+) and I.userid(+)=1 group by valid.bannerid, valid.totalweight — I want to see banners I haven’t seen yet, sorted by highest weight, so we sort by number — of times that this user has seen this particular banner, then we sort by weight order by impressions, totalweight DESC ) V, tgif.tbl_banners B, tgif.tbl_sponsors S where B.bannerid=V.bannerid and B.sponsorid=S.sponsorid and S.inactive != 1 and s.sponsorid not in ( ) valid, tgif.tbl_bannerimpressions I where valid.bannerid=I.bannerid(+) and I.userid(+)=1 group by valid.bannerid, valid.totalweight — I want to see banners I haven’t seen yet, sorted by highest weight, so we sort by number — of times that this user has seen this particular banner, then we sort by weight order by impressions, totalweight DESC ) V, tgif.tbl_banners B, tgif.tbl_sponsors S where B.bannerid=V.bannerid and B.sponsorid=S.sponsorid and S.inactive != 1 and s.sponsorid not in ( — Check the user impression cap to make sure it hasn’t been passed by the user select s.sponsorid from tgif.tbl_banners b, tgif.tbl_sponsors s, tgif.TBL_BANNERIMPRESSIONS i where s.sponsorid = b.sponsorid and b.bannerid = i.bannerid and i.timestamp >= sysdate - nvl(user_impression_cap_days,100) and userid = 1 group by s.sponsorid having count(*) >= max(nvl(user_impression_cap,1000000000)) ) — Make sure the sponsor is still in the valid table. This table is updated hourly — and contains the sponsors that have not gone over their sponsor level frequencies for — impressions/conversions/clicks and s.sponsorid in (select sponsorid from tgif.tbl_active_sponsors) ) where ranking=1 —Order the banners by sponsor weight, which is handled by the ranking —order by S.weight order by impressions, weight desc would you trust a junior with this query? ) where rownum <= 10; @papa_fire
object construction is very expensive @papa_fire
METHOD Base ORM REAL USER SYS PCPU 6.330 5.771 0.212 94.51 @papa_fire
METHOD REAL USER SYS PCPU Base ORM 6.330 5.771 0.212 94.51 SQL without objects 0.664 0.274 0.120 59.35 @papa_fire
METHOD REAL USER SYS PCPU Base ORM 6.330 5.771 0.212 94.51 SQL without objects 0.664 0.274 0.120 59.35 SQL with ORM objects 6.354 5.797 0.197 94.34 @papa_fire
get_all_* are the worst @papa_fire
@papa_fire
ORMs are bad for high-traffic production for replacing SQL skills @papa_fire
“ I have yet to find a tool built to avoid using SQL that doesn’t become more complicated to use than just learning SQL John Simone (@j_simone) @papa_fire
ORMs are not bad for prototyping for low-traffic systems for non-public traffic @papa_fire
PSA lazy loading doesn’t solve all your problems @papa_fire
</database> @papa_fire
if you can’t optimize the problem hide the problem @papa_fire
“ there are only two hard things in Computer Science: cache invalidation and naming things Phil Karlton @papa_fire
is stale content > slow content? @papa_fire
stale is relative @papa_fire
how valuable is 1 minute? @papa_fire
100 request/sec x served by 2 web servers x 1 minute cache (60 seconds) 2 “expensive” reqs/min 11,998 “cheap” reqs/min @papa_fire
① browser ② on-disk ③ in-memory @papa_fire
① browser ② on-disk ③ in-memory @papa_fire
remote @papa_fire
cache-control header @papa_fire
.htaccess # cache all pages for one month Header set Cache-Control “max-age=2628000, public” @papa_fire
.htaccess # cache static assets for one month <filesMatch “.(jpg|jpeg|png|gif|js|ico)$”> Header set Cache-Control “max-age=2628000, public” </filesMatch> @papa_fire
php # cache this page for one minute <? header(“Cache-Control: max-age=60”); ?> @papa_fire
you don’t control browsers @papa_fire
progressive web applications (service workers) https://www.youtube.com/watch?v=3ol3-kvCNeQ Patrick Meenan, Velocity @papa_fire
① browser ② on-disk ③ in-memory @papa_fire
local, persistant @papa_fire
read content from file @papa_fire
read content from file instead of … … external source (API, database) … processing content again @papa_fire
returnContent() { return generateContent(); } @papa_fire
returnContent() { if (has_cache($cache_file, $cache_ttl)) { return _read_cache($cache_file); } else { return _write_cache($cache_file); } } @papa_fire
@papa_fire
have to manage your own ttl @papa_fire
sub has_cache { my $cache_file = shift; my $cache_ttl = shift; #in seconds if (-f $cache_file) { # in seconds my $last_updated = (stat($cache_file))[9]; my $now = int (gettimeofday); my $diff = $now - $last_updated; return 1 if ($now - $last_updated <= $cache_ttl); } return 0; } @papa_fire
① browser ② on-disk ③ in-memory @papa_fire
memcache, redis @papa_fire
local or distributed @papa_fire
function returnContent($article_id) { $memcache = new Memcache; $cacheAvailable = $memcache->connect(MEMCACHED_HOST, MEMCACHED_PORT); if ($cacheAvailable) { $cached_article = $memcache->get(‘article-’.$article_id); if (!$cached_article) { $article = getArticle($article_id); $memcache->set(‘article-‘.$article_id, $article); return $article; } else { return $cached_article; } } } @papa_fire
function returnContent($article_id) { $memcache new Memcache; $memcache = =new Memcache; $cacheAvailable = $memcache->connect(MEMCACHED_HOST, MEMCACHED_PORT); if ($cacheAvailable) { $cached_article = $memcache->get(‘article-’.$article_id); $cached_article = $memcache->get(‘article-’.$article_id); (!$cached_article) { if if (!$cached_article) { $article = getArticle($article_id); $memcache->set(‘article-‘.$article_id, $article); $memcache->set(‘article-‘.$p->[‘article_id’], $article); return $article; } else { return $cached_article; } } } @papa_fire
returnContent() { $content = read_cache($cache_key); if (!$content) { $content = generateContent(); write_cache($cache_key, $content); } return $content; } @papa_fire
remember /get/articles/all ? (multi-nested ORM function) @papa_fire
METHOD Base call BASE LOAD 20x TRAFFIC SPIKE 5.782 s 12.127 s @papa_fire
BASE LOAD 20x TRAFFIC SPIKE Base call 5.782 s 12.127 s Base call with memcache 0.867 s 0.922 s METHOD @papa_fire
not just for db results @papa_fire
$memcache->set($uri, $html); @papa_fire
doesn’t have to be 100% static @papa_fire
don’t forget to purge @papa_fire
careful what you cache @papa_fire
“pretty” urls @papa_fire
“pretty” urls /my/profile @papa_fire
“ugly” urls @papa_fire
“ugly” urls /prodlist.php?SID=a3e9590a-6598-11e7-907b-a6006ad3dba0 @papa_fire
external cache @papa_fire
reverse HTTP proxy @papa_fire
Apache Traffic Server, Varnish @papa_fire
CDN Akamai, Fastly @papa_fire
rule-based controls @papa_fire
use as many or as few @papa_fire
remember the part about critical path? @papa_fire
anything not related to the immediate outcome should not be in the critical path @papa_fire
router.post(‘/register’, function(req, res){ var user = new User(req.body); user.save( function(err) { if (!err) { user.savePreferences(req.pref); user.sendWelcomeEmail(); res.render(‘first_time_welcome’, { title: ‘Welcome’ + user.name, errors: [err] }); } }); }); @papa_fire
router.post(‘/register’, function(req, res){ var user = new User(req.body); user.save( function(err) { if (!err) { user.savePreferences(req.pref); user.sendWelcomeEmail(); user.sendWelcomeEmail(); res.render(‘first_time_welcome’, { title: ‘Welcome’ + user.name, errors: [err] }); } }); }); @papa_fire
User.sendWelcomeEmail = function(callback){ if (!do_not_email(user.email) { var email = new Email(); email.subject = “Welcome ” + user.firstname; email.cta = getDealoftheDay(); email.secondary = getOffers(user); email.generateBody(user, function() { email.send(); }); } }; @papa_fire
does it have to be done now? @papa_fire
router.post(‘/register’, function(req, res){ var user = new User(req.body); user.save( function(err) { if (!err) { user.savePreferences(req.pref); sendToQueue(‘welcome email’,user); sendToQueue(‘welcome email’,user); res.render(‘first_time_welcome’, { title: ‘Welcome’ + user.name, errors: [err] }); } }); }); @papa_fire
queueing (v.) - putting things somewhere else to deal with later @papa_fire
where? 1. in-memory (RabbitMQ, SQS) 2. persistant (database) @papa_fire
so … @papa_fire
optimize everything (right?) @papa_fire
“ premature optimization is the root of all evil Donald Knuth @papa_fire
improvements should be … 1. needed @papa_fire
improvements should be … 1. needed 2. incremental @papa_fire
improvements should be … 1. needed 2. incremental 3. measurable @papa_fire
that said @papa_fire
always think about performance @papa_fire
parting tip there is an exception to every rule @papa_fire
thank you questions? @papa_fire