Bad Evidence

Paul Verbeek-Mast

workingatbooking.com

None of the A/B tests or data shown in this presentation are based on real experiments. They were made up for this presentation.

Base: 5234 searches

Variant: 6252 searches (+19.45%)
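The +19.45% on the slide is the relative change in searches from base to variant. A minimal sketch of that arithmetic, using the two search counts from the slide (the function name `relative_change` is only illustrative):

```python
def relative_change(base: float, variant: float) -> float:
    """Return the change of variant relative to base, as a percentage."""
    if base == 0:
        raise ValueError("base must be non-zero")
    return (variant - base) / base * 100.0


base_searches = 5234     # base count from the slide
variant_searches = 6252  # variant count from the slide

uplift = relative_change(base_searches, variant_searches)
print(f"{uplift:+.2f}%")  # prints "+19.45%"
```

The same function applies to any metric pair, which matters later: a lift in searches says nothing about what happened to bookings.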

Making the search box hotpink will result in more searches

6252 searches (+19.45%), 242 bookings (-4.7%)

Making the search box hotpink will result in more searches?

How much do you want to create “Bad Evidence”?

You don’t want to do something if it is going to go against your theory of the case.

Rather than trying to get to the truth, what you’re trying to do is build your case, and make it the strongest case possible.

What does verification bias cause you to do? Ignore it and push it to the side.

Bad Evidence

Verification bias

Because of (why) we believe that changing (what) for (who) will result in (outcome)

Why: objective and based on data

Because of (why) we believe

Bad examples:
• Because of a gut feeling, we believe (…)
• Because I like it better, we believe (…)
• Because I saw it on another website, we believe (…)

Good examples:
• Because of research described in article (…), we believe (…)
• After doing user research, we believe (…)
• Based on a previous experiment doing (…), we believe (…)

What: an accurate, short description of your change

Because of (why) we believe that changing (what)

Bad examples:
• changing it to pink
• changing the title that is on the top of the first block on the home page to 16px Arial #FF0000

Good examples:
• changing the background of the search box to pink
• opening pictures on the search page in a lightbox when they are clicked

Who: a realistic, accurate description of your target group

Because of (why) we believe that changing (what) for (who)

Bad examples:
• everyone
• users booking a hotel in Novosibirsk, named Paul, from Amsterdam, with a big beard

Good examples:
• users visiting the home page
• users searching for a property in Novosibirsk
• users who are logged in

Outcome: measurable, expected changes

Because of (why) we believe that changing (what) for (who) will result in (outcome)

Bad examples:
• users feeling better
• the site looking prettier
• an increase in loyalty

Good examples:
• an increase in earnings
• a decrease in returned products
• an increase in sign-ups

Because of (why) we believe that changing (what) for (who) will result in (outcome)

Because of user research we believe that changing the background of the search box to pink for users that visit the homepage will result in an increase in bookings
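One way to keep every experiment on the same template is to store the four parts as separate fields and render the sentence from them. The sketch below is only an illustration; the `Hypothesis` class and its field names are hypothetical and do not come from any existing tool:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    """The four parts of the hypothesis template (hypothetical helper)."""
    why: str      # objective, data-based reason
    what: str     # accurate, short description of the change
    who: str      # realistic, accurate target group
    outcome: str  # measurable, expected change

    def __post_init__(self) -> None:
        # Refuse a hypothesis with a missing part.
        for name in ("why", "what", "who", "outcome"):
            if not getattr(self, name).strip():
                raise ValueError(f"'{name}' must not be empty")

    def render(self) -> str:
        return (
            f"Because of {self.why} we believe that changing {self.what} "
            f"for {self.who} will result in {self.outcome}"
        )


example = Hypothesis(
    why="user research",
    what="the background of the search box to pink",
    who="users that visit the homepage",
    outcome="an increase in bookings",
)
print(example.render())
```

Because the instance is frozen, the hypothesis cannot be quietly edited after the experiment has started.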

You can never be 100% confident that your test is correct

The more you measure, the higher the chance that some of your results are incorrect
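A rough way to see why: if each metric has some chance of showing a false positive, the chance that at least one of them does grows with every metric added. The sketch below assumes a 5% false-positive rate per metric and assumes the metrics are independent. Neither assumption comes from the talk, and real metrics are usually correlated, so treat the numbers as an illustration only:

```python
def chance_of_any_false_positive(num_metrics: int, alpha: float = 0.05) -> float:
    """Probability that at least one of num_metrics independent metrics
    shows a false positive, given a per-metric false-positive rate alpha."""
    if num_metrics < 0:
        raise ValueError("num_metrics must be non-negative")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    # P(no false positive on any metric) = (1 - alpha) ** num_metrics
    return 1.0 - (1.0 - alpha) ** num_metrics


# 18 is the number of metrics listed on the slides; 1 and 5 are for comparison.
for count in (1, 5, 18):
    print(count, f"{chance_of_any_false_positive(count):.1%}")
```

Under these assumptions, tracking 18 metrics makes it more likely than not that at least one of them moves by chance alone.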

[Slide sequence: the 18 metrics measured in the experiment are clicks on button, hover over button, bookings, visits on page, scrolled to button, bookings from IE8, bookings from Malaysia, users going to search results, logins, sign ups, clicks on logo, time on page, returning visitors, price of booking, number of rooms booked, language changes, calls to customer service, and buys with credit card. The measured changes range from -3.1% to +4.7%, and bookings changed by -1.8%. The final slide highlights bookings, price of booking, and calls to customer service.]

Focus on your defined metrics, but also keep an eye on your health metrics

Be honest with yourself

Metrics that are not in the hypothesis

“price is going up, so it must be doing well” vs. “price is going down, so it must be a false negative”

Newly implemented metrics

“this new metric is positive, it’s working great!” vs. “this new metric is negative, there must be a bug”

Sample size

“it’s positive after 5 days, let’s put it in production” vs. “it’s negative after 5 days, let’s run it for another few days”

How long should you run your A/B test?

• Number of visitors
• How big of a change you want to measure
• How confident you want to be that your test is correct
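These three inputs can be turned into a rough run length before the test starts. The sketch below uses the common normal-approximation formula for comparing two conversion rates. The 80% power default, the function names, and all example numbers are assumptions for illustration and do not come from the talk:

```python
import math
from statistics import NormalDist


def visitors_per_variant(base_rate: float, relative_lift: float,
                         alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate visitors needed in each variant to detect relative_lift
    on base_rate with a two-sided test (normal approximation)."""
    if not 0.0 < base_rate < 1.0:
        raise ValueError("base_rate must be between 0 and 1")
    if relative_lift == 0:
        raise ValueError("relative_lift must be non-zero")
    p1 = base_rate
    p2 = base_rate * (1.0 + relative_lift)
    if not 0.0 < p2 < 1.0:
        raise ValueError("base_rate * (1 + relative_lift) must be between 0 and 1")
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # how confident
    z_power = NormalDist().inv_cdf(power)              # chance to detect a real effect
    variance = p1 * (1.0 - p1) + p2 * (1.0 - p2)
    return math.ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)


def days_to_run(daily_visitors: int, base_rate: float, relative_lift: float,
                alpha: float = 0.05, power: float = 0.8) -> int:
    """Days needed when daily_visitors are split evenly over base and variant."""
    if daily_visitors <= 0:
        raise ValueError("daily_visitors must be positive")
    needed = 2 * visitors_per_variant(base_rate, relative_lift, alpha, power)
    return math.ceil(needed / daily_visitors)


# Hypothetical inputs: 2,000 visitors a day, 5% conversion, 20% relative lift.
print(days_to_run(daily_visitors=2000, base_rate=0.05, relative_lift=0.20))
```

Fixing the run length up front is what protects against stopping after five days because the numbers happen to look good.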

Create a solid hypothesis, and stick to it

Make your decision based on data

There is no such thing as bad evidence, just a bad hypothesis

So stop building cases, and find the truth

Paul Verbeek-Mast

@_paulverbeek verbeek.p@gmail.com

Questions?