A presentation at CodeFest 2017 in Novosibirsk, Novosibirsk Oblast, Russia by Jayne Mast
Bad Evidence
Paul Verbeek-Mast
workingatbooking.com
None of the A/B tests or data shown in this presentation are based on real experiments. They were made up just for this presentation.
Base: 5234 searches
Variant: 6252 searches (+19.45%)
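The +19.45% is the relative change of the variant over the base. A minimal Python sketch of that arithmetic, using the made-up search counts from the slides (`relative_change` is a hypothetical helper, not something from the talk):

```python
def relative_change(base: float, variant: float) -> float:
    """Return the relative change of variant over base, in percent."""
    if base == 0:
        raise ValueError("base must be non-zero")
    return (variant - base) / base * 100


# Search counts from the slides (made up for this presentation).
lift = relative_change(5234, 6252)
print(f"{lift:+.2f}%")  # +19.45%
```

The same calculation produces the -4.7% for bookings, given the base bookings count, which the slides do not show.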
Hypothesis: “Making the search box hot pink will result in more searches.”
Variant: 6252 searches (+19.45%), but 242 bookings (-4.7%)
So is the hypothesis right?
How much do you want to create “bad evidence”?
You don’t want to do something if it is going to go against your theory of the case.
Rather than trying to get to the truth, what you’re trying to do is build your case and make it the strongest case possible.
What does verification bias cause you to do? Ignore it and push it to the side.
Bad Evidence
Verification bias
Because of (why), we believe that changing (what) for (who) will result in (outcome).
Why: objective and based on data
Bad examples:
• Because of a gut feeling, we believe (…)
• Because I like it better, we believe (…)
• Because I saw it on another website, we believe (…)
Good examples:
• Because of research described in article (…), we believe (…)
• After doing user research, we believe (…)
• Based on a previous experiment doing (…), we believe (…)
What: an accurate, short description of your change
Bad examples:
• changing it to pink
• changing the title that is on the top of the first block on the home page to 16px Arial #FF0000
Good examples:
• changing the background of the search box to pink
• opening pictures on the search page in a lightbox when clicking on them
Who: a realistic, accurate description of your target group
Bad examples:
• everyone
• users booking a hotel in Novosibirsk, named Paul, from Amsterdam, with a big beard
Good examples:
• users visiting the home page
• users searching for a property in Novosibirsk
• users who are logged in
Outcome: measurable, expected changes
Bad examples:
• users feeling better
• the site looking prettier
• an increase in loyalty
Good examples:
• an increase in earnings
• a decrease in returned products
• an increase in sign-ups
Because of user research, we believe that changing the background of the search box to pink for users who visit the homepage will result in an increase in bookings.
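One way to make sure no part of the template is skipped is to treat a hypothesis as a record with four required fields. A minimal Python sketch; the `Hypothesis` class is hypothetical and only checks that each part is filled in, not whether the reason is objective or the outcome measurable:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    """The four parts of the hypothesis template."""
    why: str      # objective, data-based reason
    what: str     # accurate, short description of the change
    who: str      # realistic description of the target group
    outcome: str  # measurable, expected change

    def __post_init__(self) -> None:
        # Refuse a hypothesis with an empty part of the template.
        for name in ("why", "what", "who", "outcome"):
            if not getattr(self, name).strip():
                raise ValueError(f"hypothesis is missing '{name}'")

    def render(self) -> str:
        return (f"Because of {self.why}, we believe that changing {self.what} "
                f"for {self.who} will result in {self.outcome}.")


h = Hypothesis(
    why="user research",
    what="the background of the search box to pink",
    who="users who visit the homepage",
    outcome="an increase in bookings",
)
print(h.render())
```

Writing the hypothesis down before the test starts is what makes it possible to “stick to it” later.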
You can never be 100% confident that your test is correct
The more you measure, the higher the chance that some results are incorrect.
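A rough way to quantify this, assuming every metric is tested independently at the same significance level α: the chance that at least one metric shows a false positive is 1 - (1 - α)^m for m metrics. The Bonferroni correction shown here is one standard, conservative fix; it is an assumption for illustration, not something the talk prescribes:

```python
def family_wise_error_rate(alpha: float, num_metrics: int) -> float:
    """Chance of at least one false positive among independent tests."""
    return 1 - (1 - alpha) ** num_metrics


def bonferroni_alpha(alpha: float, num_metrics: int) -> float:
    """Per-metric significance level that keeps the family-wise rate at most alpha."""
    return alpha / num_metrics


for m in (1, 5, 18):
    print(m, round(family_wise_error_rate(0.05, m), 3))
# 1 0.05
# 5 0.226
# 18 0.603
```

With 18 metrics, the odds are better than even that at least one of them looks “significant” by chance alone. Real metrics such as bookings and bookings from IE8 are correlated, so the independence assumption makes this an approximation.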
[Slides: 18 metrics measured in the same experiment: clicks on button, hover over button, bookings, visits on page, scrolled to button, bookings from IE8, bookings from Malaysia, users going to search results, logins, sign-ups, clicks on logo, time on page, returning visitors, price of booking, number of rooms booked, language changes, calls to customer service, buys with credit card. Each shows a small change, between -3.1% and +4.7%. The slides then single out bookings (-1.8%), followed by price of booking and calls to customer service.]
Focus on your defined metrics, but also keep an eye on your health metrics.
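A minimal sketch of keeping those kinds of metrics apart, assuming each health metric is tagged with the direction that counts as “better”. The function name, the 1% tolerance and the numbers in the example are hypothetical and illustrative only:

```python
from typing import Dict, List, Set


def review_experiment(
    results: Dict[str, float],
    hypothesis_metrics: Set[str],
    health_metrics: Dict[str, int],
    tolerance: float = 1.0,
) -> Dict[str, List[str]]:
    """Split results into metrics to decide on, health alerts and everything else.

    results: metric name -> relative change in percent.
    health_metrics: metric name -> +1 if higher is better, -1 if lower is better.
    tolerance: hypothetical threshold (percent) beyond which a health metric
        counts as harmed.
    """
    report: Dict[str, List[str]] = {"decide_on": [], "health_alerts": [], "other": []}
    for metric, change in sorted(results.items()):
        if metric in hypothesis_metrics:
            # Only metrics named in the hypothesis drive the decision.
            report["decide_on"].append(metric)
        elif metric in health_metrics:
            # Multiply by the direction so that a negative value always means "worse".
            if change * health_metrics[metric] < -tolerance:
                report["health_alerts"].append(metric)
        else:
            # Not in the hypothesis: look at it, but don't base the decision on it.
            report["other"].append(metric)
    return report


# Illustrative numbers only.
report = review_experiment(
    results={"bookings": -1.8, "calls to customer service": 4.3, "clicks on logo": 0.3},
    hypothesis_metrics={"bookings"},
    health_metrics={"calls to customer service": -1},
)
print(report)
```

Deciding which metrics belong in which bucket before the experiment runs is what keeps the later reading of the results honest.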
Be honest with yourself
Metrics that are not in the hypothesis
“Price is going up, so it must be doing well” vs. “Price is going down, so it must be a false negative”
Newly implemented metrics
“This new metric is positive, it’s working great!” vs. “This new metric is negative, it must have a bug”
Sample size
“It’s positive after 5 days, let’s put it in production” vs. “It’s negative after 5 days, let’s run it for another few days”
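The trap here is a stopping rule that depends on the direction of the result. A minimal sketch of the usual fix, a fixed-horizon rule: the sample size is decided upfront, and nothing is acted on before it is reached, whichever way the result points. `decide` and its parameters are hypothetical:

```python
def decide(visitors_so_far: int, planned_visitors: int,
           p_value: float, alpha: float = 0.05) -> str:
    """Fixed-horizon decision rule: no verdict until the planned sample is reached."""
    if visitors_so_far < planned_visitors:
        # Same answer whether the result currently looks positive or negative.
        return "keep running"
    return "significant" if p_value < alpha else "not significant"


print(decide(5_000, 20_000, p_value=0.01))   # keep running
print(decide(20_000, 20_000, p_value=0.01))  # significant
```

If you really need to look at results early, sequential testing methods exist, but they use adjusted thresholds rather than the plain α.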
How long should you run your A/B test? It depends on:
• the number of visitors
• how big a change you want to measure
• how confident you want to be that your test is correct
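These three factors map onto the inputs of a standard sample-size calculation. A minimal sketch using the normal approximation for comparing two conversion rates (Python 3.8+ for `statistics.NormalDist`); the talk gives no formula, and the function names and inputs below are assumptions:

```python
import math
from statistics import NormalDist


def visitors_per_variant(baseline_rate: float, relative_mde: float,
                         alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate visitors needed per variant for a two-sided two-proportion z-test."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_mde)          # smallest change worth detecting
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)    # confidence (false-positive rate)
    z_beta = NormalDist().inv_cdf(power)             # chance of detecting a real change
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


def days_to_run(daily_visitors: int, baseline_rate: float, relative_mde: float,
                alpha: float = 0.05, power: float = 0.8, variants: int = 2) -> int:
    """Days needed to reach the sample size when traffic is split evenly."""
    n = visitors_per_variant(baseline_rate, relative_mde, alpha, power)
    return math.ceil(n * variants / daily_visitors)


# Hypothetical inputs: 5% conversion, detect a 5% relative change, 10,000 visitors a day.
n = visitors_per_variant(0.05, 0.05)
print(n, days_to_run(10_000, 0.05, 0.05))  # about 122,000 per variant, 25 days
```

Daily visitors stand for the number of visitors, `relative_mde` for how big a change you want to measure, and `alpha` and `power` for how confident you want to be. Halving the change you want to detect roughly quadruples the required sample.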
Create a solid hypothesis, and stick to it
Make your decision based on data
There is no such thing as bad evidence, just a bad hypothesis
So stop building cases, and find the truth
Paul Verbeek-Mast
@_paulverbeek verbeek.p@gmail.com
Questions?
Criminal investigators have something called ‘bad evidence’, or confirmation bias. When they have a theory about a case, they sometimes avoid evidence that goes against that theory, unconsciously but also consciously.
We have the same problem when testing product hypotheses. We tend to ignore data that goes against our theory. And when we do have “bad data” we can’t get around, we run the test a bit longer until it disappears.
Why do we do this, and how do we deal with it? What other common pitfalls are there? And is hypothesis testing really worth the time?