A presentation at Refresh Conf in Groningen, Netherlands, by Job van Achterberg
THE METHODS AND ETHICS OF AUTOMATED TURING TESTS: HUMAN DETERMINATION
INTRODUCTION JOB VAN ACHTERBERG Tech lead, tenon.io @detonite Inclusive Design and Accessibility
THERE’S PLENTY OF ROOM AT THE BOTTOM (DECEMBER 29, 1959) LET ME REMIND YOU OF SOME OF THE PROBLEMS OF COMPUTING MACHINES. Richard P. Feynman
THERE’S PLENTY OF ROOM AT THE BOTTOM (DECEMBER 29, 1959) ▸“If I look at your face I immediately recognize that I have seen it before.” ▸“Yet there is no machine which can take a picture of a face and say even that it is a man.”
THERE’S PLENTY OF ROOM AT THE BOTTOM (DECEMBER 29, 1959) ▸“If the face is changed; if I am closer to the face; if I am further from the face; if the light changes – I recognize it anyway.” ▸“Now, this little computer I carry in my head is easily able to do that. The computers that we build are not able to do that.”
HARD AI PROBLEMS
COMPUTATIONAL DIFFICULTY HARD AI PROBLEMS ▸ “A problem of which AI researchers currently agree that it is (AI-) hard” ▸ Image recognition ▸ Integer factorisation ▸ Abstract reasoning ▸ “Common sense”
IMAGE RECOGNITION Neural network What makes a wolf? Distinguish dogs from wolves What makes a dog?
IMAGE RECOGNITION What makes a wolf?
IMAGE RECOGNITION What exists?
IMAGE RECOGNITION
INTEGER FACTORISATION 588 / 2 = 294, 294 / 2 = 147, 147 / 3 = 49, 49 / 7 = 7, 7 / 7 = 1 ▸ 588 = 2 * 2 * 3 * 7 * 7 = 2^2 * 3 * 7^2
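The ladder above is trial division: divide by the smallest factor that still divides the number, and repeat until 1 is left. A minimal Python sketch of that procedure follows; the function names are ours, for illustration. It is instant for 588, but the number of candidate divisors grows with √n, so this naive method becomes hopeless for numbers with hundreds of digits, and no known classical algorithm factors integers in polynomial time.

```python
from collections import Counter


def prime_factors(n: int) -> list[int]:
    """Factor n > 1 by trial division, smallest divisor first."""
    if n < 2:
        raise ValueError("n must be an integer greater than 1")
    factors = []
    divisor = 2
    # Only divisors up to sqrt(n) need checking; whatever remains above 1 is prime.
    while divisor * divisor <= n:
        while n % divisor == 0:
            factors.append(divisor)
            n //= divisor
        divisor += 1
    if n > 1:
        factors.append(n)
    return factors


def exponent_form(factors: list[int]) -> str:
    """Render e.g. [2, 2, 3, 7, 7] as '2^2 * 3 * 7^2'."""
    counts = Counter(factors)
    return " * ".join(
        f"{p}^{k}" if k > 1 else str(p) for p, k in sorted(counts.items())
    )


print(prime_factors(588))                 # [2, 2, 3, 7, 7]
print(exponent_form(prime_factors(588)))  # 2^2 * 3 * 7^2
```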
COMPUTATIONAL DIFFICULTY BONGARD PROBLEMS ▸ Pattern recognition puzzle ▸ Find the common factor ▸ Why is this AI-hard?
COMPUTATIONAL DIFFICULTY CLOZE TEST ‣ The head of the colony is called the queen bee. She is larger than the ___ of the bees. (Answer: rest) ‣ Hal was walking his dog one morning. A cat ran across their path. Hal’s dog strained so hard, the leash broke! He chased the cat for several minutes. Finally, Hal lured him back to his side.
COMPUTATIONAL DIFFICULTY WINOGRAD SCHEMA The trophy would not fit in the brown suitcase because it was too big. What was too big? Answer 0: the trophy Answer 1: the suitcase
COMPUTATIONAL DIFFICULTY SEQUENCE PREDICTION ▸ Raven’s Matrices ▸ Abstract Reasoning ▸ Pattern recognition
HARD PROBLEMS OR HARD AI PROBLEMS? ▸ Problems that are nontrivial to solve by AI ▸ Problems that are trivial to solve by humans AI EASY HUMAN EASY AI HARD HUMAN HARD
WHAT IS A HUMAN? Are you human? Prove it. (by completing this simple exercise)
WHAT IS A HUMAN? “Valid” Human?
WHAT IS A HUMAN? Voting literacy tests ▸ USA ▸ 1890s to the 1960s
WHAT IS A HUMAN? Plato “A man is a featherless biped”
WHAT IS A HUMAN? Diogenes
WHAT IS A HUMAN? “Behold! I’ve brought you a man”
WHAT IS A HUMAN? Plato “…”
WHAT IS A HUMAN? Plato “A man is a featherless biped… with broad, flat nails”
WHAT IS A HUMAN? “A man is a featherless biped… with broad, flat nails”
VERIFICATION OF A HUMAN IN THE LOOP. MONI NAOR, 1996 “We propose using a ‘Turing Test’ in order to verify that a human is the one making a query to a service over the web.” ▸ Answer a “human-in-the-loop challenge” ▸ Combat spam, “junk mail”
CAPTCHA: USING HARD AI PROBLEMS FOR SECURITY LUIS VON AHN, MANUEL BLUM, NICHOLAS J. HOPPER, AND JOHN LANGFORD, 2003 ▸ Completely Automated Public Turing test to tell Computers and Humans Apart
THE IMITATION GAME. ALAN TURING, 1950 “I propose to consider the question, ‘Can machines think?’” ▸ “Turing Test” ▸ “Game” using Interrogator, Human & Machine ▸ Can we distinguish by interrogation?
CAPTCHA: USING HARD AI PROBLEMS FOR SECURITY LUIS VON AHN, MANUEL BLUM, NICHOLAS J. HOPPER, AND JOHN LANGFORD, 2003 “A CAPTCHA is a program that can generate and grade tests that: (A) most humans can pass, but (B) current computer programs can’t pass.”
WHAT IS A CAPTCHA? “A CAPTCHA is a program that can generate and grade tests that it itself cannot pass (much like some professors)”
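The definition quoted above splits a CAPTCHA into two halves: generating a challenge whose answer the program already knows, and grading a response against that answer. The Python sketch below covers only that bookkeeping; turning the secret text into a distorted image (the part that is supposed to be AI-hard) is left out, and the class and method names are hypothetical, not the API of any real CAPTCHA service.

```python
import secrets
import string
import time


class CaptchaStore:
    """Hypothetical generate-and-grade bookkeeping for a text CAPTCHA."""

    def __init__(self, length: int = 6, ttl_seconds: float = 120.0):
        self.length = length
        self.ttl_seconds = ttl_seconds
        # challenge_id -> (expected answer, time the challenge was issued)
        self._pending: dict[str, tuple[str, float]] = {}

    def generate(self) -> tuple[str, str]:
        """Create a challenge and return (challenge_id, answer).

        A real system would send the user only a distorted rendering of the
        answer and keep the answer itself on the server.
        """
        alphabet = string.ascii_uppercase + string.digits
        answer = "".join(secrets.choice(alphabet) for _ in range(self.length))
        challenge_id = secrets.token_urlsafe(16)
        self._pending[challenge_id] = (answer, time.monotonic())
        return challenge_id, answer

    def grade(self, challenge_id: str, response: str) -> bool:
        """Grade a response; every challenge can be graded only once."""
        entry = self._pending.pop(challenge_id, None)
        if entry is None:
            return False  # unknown or already used challenge
        answer, issued_at = entry
        if time.monotonic() - issued_at > self.ttl_seconds:
            return False  # challenge expired
        # Lenient comparison: ignore surrounding whitespace and letter case.
        return response.strip().upper() == answer
```

Generating is cheap because the program starts from the answer; going from the distorted image back to the text is exactly the direction it cannot handle itself. Consuming each challenge on first use stops a bot from retrying a single challenge until it guesses right.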
CAPTCHA: USING HARD AI PROBLEMS FOR SECURITY LUIS VON AHN, MANUEL BLUM, NICHOLAS J. HOPPER, AND JOHN LANGFORD, 2003 ‣ Protect online polls ‣ Protect free e-mail services ‣ Keep out search engine bots ‣ Prevent worms / spam ‣ Prevent dictionary attacks
PATENT NO.: US 6,195,698 B1, FEB. 27, 2001 “It is well known on the Internet that many agents are intentionally designed to behave in a malicious, destructive, or otherwise annoying ‘anti-social’ manner. Therefore, service providers would like to deny access by agents.” ▸ First patent, invented 1997, issued 2001 ▸ AltaVista add-URL abuse prevention
PATENT NO.: US 6,195,698 B1, FEB. 27, 2001 ▸ Mentions audible version ▸ Accounts for OCR breaking ▸ “Relaxed checking” “The riddle does not necessarily need to be presented to the user as an image on a display device.”
ALTAVISTA CAPTCHA IMPLEMENTATION (NO LONGER EXISTS) https://www.semanticscholar.org/paper/A-New-Anti-Spam-Protocol-Using-CAPTCHA-Shirali-Shahreza-Movaghar-Rahimabadi/4c99afe69bd9ad5ee5f24ca4efe4df0a67b56de3
VARIATIONS ON THE THEME
VARIATIONS ON THE THEME Random variables to defeat automated solving ‣ Segmentation ‣ Rotation / Flipping ‣ Font weight / style ‣ Font size ‣ Noise
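Each bullet above is a parameter that can be drawn at random for every challenge, or even every character, so that a solver cannot rely on a fixed layout. The sketch below only samples such parameters; the value ranges, font names and field names are assumptions picked for illustration, and the actual drawing step is omitted.

```python
import random
from dataclasses import dataclass


@dataclass
class GlyphStyle:
    char: str
    font: str           # font style
    bold: bool          # font weight
    size_px: int        # font size
    rotation_deg: float
    flipped: bool
    x_offset_px: int    # negative values overlap neighbouring glyphs, hampering segmentation


FONTS = ["serif", "sans-serif", "monospace"]  # hypothetical font choices


def sample_styles(text: str, rng: random.Random) -> tuple[list[GlyphStyle], float]:
    """Pick random rendering parameters for each character plus a noise level."""
    styles = [
        GlyphStyle(
            char=ch,
            font=rng.choice(FONTS),
            bold=rng.random() < 0.5,
            size_px=rng.randint(28, 44),
            rotation_deg=rng.uniform(-30.0, 30.0),
            flipped=rng.random() < 0.1,
            x_offset_px=rng.randint(-6, 2),
        )
        for ch in text
    ]
    noise_density = rng.uniform(0.05, 0.2)  # assumed fraction of pixels to perturb
    return styles, noise_density


styles, noise = sample_styles("W7KQ", random.Random(42))
```

The seeded `random.Random` keeps the example reproducible; a deployment would want unpredictable values, for instance from `secrets.SystemRandom()`. Every knob that hurts a segmenter or OCR engine also tends to hurt human readers.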
RECAPTCHA: CROWDSOURCING BOOK DIGITISATION ‣ Digitised books ‣ One known word, one unknown word ‣ Humans read the words where OCR fails https://web.archive.org/web/20100611210703/http://recaptcha.net/
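The one-known, one-unknown scheme can be modelled as follows: only the control word, whose answer is already known, decides whether the user passes, and a correct control answer turns the user’s reading of the unknown word into one vote for its transcription. This is a simplified sketch; the class name and the three-vote threshold are assumptions, not reCAPTCHA’s actual parameters.

```python
from collections import Counter, defaultdict


class TwoWordChallenge:
    """Illustrative model of 'one known word, one unknown word'.

    Only the known (control) word is graded. If the user gets it right,
    their reading of the unknown word counts as one vote towards its
    transcription. The vote threshold is an assumption.
    """

    def __init__(self, votes_needed: int = 3):
        self.votes_needed = votes_needed
        self.votes: dict[str, Counter] = defaultdict(Counter)
        self.transcribed: dict[str, str] = {}

    def submit(self, known_answer: str, known_response: str,
               unknown_word_id: str, unknown_response: str) -> bool:
        passed = known_response.strip().lower() == known_answer.lower()
        # Readings of the unknown word only count when the control word was right.
        if passed and unknown_word_id not in self.transcribed:
            tally = self.votes[unknown_word_id]
            tally[unknown_response.strip().lower()] += 1
            reading, count = tally.most_common(1)[0]
            if count >= self.votes_needed:
                self.transcribed[unknown_word_id] = reading
        return passed  # the user only ever sees the verdict on the known word
```

The user cannot tell which of the two words is the control, so answering both honestly is the only reliable way to pass.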
CULT OF THE CAPTCHA reddit.com/r/inglip
CAPTCHA HUMOR
CAPTCHA ART https://rachelmasseyart.wordpress.com/2013/04/09/new-artwork-submission-captcha/
VARIATIONS ON THE THEME
WHAAAAAAT
NOOOOO
WHAT DOES IT MEAN?
I SAW THE SIGN
WHAT DOES IT MEAN?
PATENT PENDING Turing Test via Reaction to Test Modifications - Amazon
PATENT PENDING System and Method for Monitoring Human Interaction - Infosys
PATENT PENDING Client-Side Captcha Ceremony for User Verification - Microsoft
PATENT PENDING Image Based Turing Test - Uni Student
PATENT PENDING Method for Generating a Human Likeness Score - Ayah Inc.
PATENT PENDING Advertisement-based Human Interactive Proof - Microsoft
WHAT DOES IT MEAN? https://ashnavabi.files.wordpress.com/2015/03/motte-and-bailey.jpg
WHAT DOES IT MEAN?
PROSPECTS FOR COMPUTATIONAL HUMOUR, GRAEME RITCHIE “The overall message is that endeavouring to develop computational models of humour is a worthwhile enterprise both for artificial intelligence and for those interested in humour, but we are starting from a very meager foundation, and the challenges are significant.”
WHAT DOES IT MEAN?
RECAPTCHA V2 reCAPTCHA v2 ▸ Pattern recognition puzzle ▸ “what is like the other?” ▸ Labels training data for neural networks, as reCAPTCHA v1 did for OCR ▸ Subjective ▸ Cognitive ▸ Meaning
LISTEN CLOSELY Audio CAPTCHAs
LISTEN CLOSELY Audio CAPTCHAs ‣ BotDetect
LISTEN CLOSELY Audio CAPTCHAs ‣ reCAPTCHA v2
WHY WE SEE SO WELL Lynne A. Isbell, The Fruit, the Tree, and the Serpent ▸ Snake Detection Theory
WHY WE SEE SO WELL Snake detection
WHY WE SEE SO WELL Stereoscopic Vision
WHY WE SEE SO WELL Fovea
WHY WE SEE SO WELL Edge detection
WHY WE SEE SO WELL Cat Cortex https://www.youtube.com/watch?v=IOHayh06LJ4
WHY WE SEE SO WELL Edge detection
WHY WE SEE SO WELL Esotropia
WHY WE SEE SO WELL Eye Patching
WHY WE SEE SO WELL Pulvinar (Thalamus)
PATTERN MATCHING Face recognition Fusiform area Pareidolia
CHARACTERS OR FACES: A USER STUDY ON EASE OF USE FOR HIPS YONG RUI, ZICHENG LIU, SHANNON KALLIN, GAVIN JANKE, AND CEM PAYA, 2005 “Study results show that the users are almost equally divided in evaluating their overall ease of use.”
BEFORE A COMPUTER CAN DRAW, IT MUST FIRST LEARN TO SEE DERRALL HEATH AND DAN VENTURA, BRIGHAM YOUNG UNIVERSITY, 2016 Pareidolia in Neural Networks
MACHINE PAREIDOLIA: HELLO LITTLE FELLA MEETS FACETRACKER HTTP://URBANHONKING.COM/IDEASFORDOZENS/2012/01/14/MACHINEPAREIDOLIA-HELLO-LITTLE-FELLA-MEETS-FACETRACKER/ Pareidolia in Neural Networks
RECOGNISING FACES Prosopagnosia (face blindness)
HOW THE MIND CREATES LANGUAGE Steven Pinker, The Language Instinct ▸ Language development
PIDGIN AND CREOLE Nicaraguan Sign Language Pidgin → Creole Critical Period; Genie case
MATH SENSE Numerical sense in babies
MEASURING ABSTRACT REASONING IN NEURAL NETWORKS DAVID G.T. BARRETT, FELIX HILL, ADAM SANTORO, ARI S. MORCOS, TIMOTHY LILLICRAP, 2018 “Whether neural networks can learn abstract reasoning or whether they merely rely on superficial statistics is a topic of recent debate.”
MEASURING ABSTRACT REASONING IN NEURAL NETWORKS DAVID G.T. BARRETT, FELIX HILL, ADAM SANTORO, ARI S. MORCOS, TIMOTHY LILLICRAP, 2018 “With important caveats, neural networks can indeed learn to infer and apply abstract reasoning principles […] and apply these principles to never-before observed stimuli.”
BY THE NUMBERS How good are humans at these anyway?
DESIGNING HUMAN FRIENDLY HUMAN INTERACTION PROOFS KUMAR CHELLAPILLA, KEVIN LARSON, PATRICE SIMARD AND MARY CZERWINSKI, MICROSOFT “To be effective, a HIP must be difficult enough to discourage script attacks by raising the computation and/or development cost of breaking the HIP to an unprofitable level.”
DESIGNING HUMAN FRIENDLY HUMAN INTERACTION PROOFS KUMAR CHELLAPILLA, KEVIN LARSON, PATRICE SIMARD AND MARY CZERWINSKI, MICROSOFT “The sweet spot will decrease in size over time as computers get faster, attackers get more sophisticated, and HIPs are specifically targeted. Unfortunately, humans are unlikely to get better at solving HIPs in the same timeframe”
DESIGNING HUMAN FRIENDLY HUMAN INTERACTION PROOFS KUMAR CHELLAPILLA, KEVIN LARSON, PATRICE SIMARD AND MARY CZERWINSKI, MICROSOFT “HIPs with thick foreground arcs are easily recognized at certain levels for humans, and yet these conditions remain extremely difficult for computer hackers to solve.”
HOW GOOD ARE HUMANS AT SOLVING CAPTCHAS? A LARGE SCALE EVALUATION ELIE BURSZTEIN, STEVEN BETHARD, CELINE FABRY, JOHN C. MITCHELL, DAN JURAFSKY, 2010 “[humans] agreed only 71% of the time on average.” “[…] agreement by three humans only 31% of the time for audio captchas. [On] 33.6% everyone had a different answer”
HOW GOOD ARE HUMANS AT SOLVING CAPTCHAS? A LARGE SCALE EVALUATION ELIE BURSZTEIN, STEVEN BETHARD, CELINE FABRY, JOHN C. MITCHELL, DAN JURAFSKY, 2010 “Non-native speakers of English take longer to solve captchas, and are less accurate on captchas that include English words.” “Ph.D.s are the best at solving audio captchas”
HOW GOOD ARE HUMANS AT SOLVING CAPTCHAS? A LARGE SCALE EVALUATION ELIE BURSZTEIN, STEVEN BETHARD, CELINE FABRY, JOHN C. MITCHELL, DAN JURAFSKY, 2010 “[…] it is more effective for an attacker to use Mechanical Turk to solve captchas than an underground service.”
BREAK IT DOWN Breaking or subverting CAPTCHAs ▸ Solving Services ▸ Trained Neural Networks
UNDERSTANDING CAPTCHA-SOLVING SERVICES IN AN ECONOMIC CONTEXT MARTI MOTOYAMA, KIRILL LEVCHENKO, CHRIS KANICH, DAMON MCCOY, GEOFFREY M. VOELKER AND STEFAN SAVAGE “the use of human labor to solve captchas effectively side-steps their design point” “To this day, no completely general means of solving captchas has emerged, nor is the cat-and-mouse game of creating automated solvers viable as a business model. In this regard, then, captchas have succeeded.”
UNDERSTANDING CAPTCHA-SOLVING SERVICES IN AN ECONOMIC CONTEXT MARTI MOTOYAMA, KIRILL LEVCHENKO, CHRIS KANICH, DAMON MCCOY, GEOFFREY M. VOELKER AND STEFAN SAVAGE “To this day, no completely general means of solving captchas has emerged”
BREAK IT https://www.youtube.com/watch?v=fsF7enQY8uI
USING MACHINE LEARNING TO BREAK VISUAL HUMAN INTERACTION PROOFS KUMAR CHELLAPILLA, PATRICE Y. SIMARD, MICROSOFT
I’M NOT A HUMAN: BREAKING THE GOOGLE RECAPTCHA SUPHANNEE SIVAKORN, JASON POLAKIS, AND ANGELOS D. KEROMYTIS, COLUMBIA UNIVERSITY, 2016 “We ran our captcha-breaking system against 2,235 captchas, and obtained a 70.78% accuracy”
A GENERATIVE VISION MODEL THAT TRAINS WITH HIGH DATA EFFICIENCY AND BREAKS TEXT-BASED CAPTCHAS DILEEP GEORGE, WOLFGANG LEHRACH, KEN KANSKY, MIGUEL LÁZARO-GREDILLA, CHRISTOPHER LAAN, BHASKARA MARTHI, XINGHUA LOU, ZHAOSHI MENG, YI LIU, HUAYAN WANG, ALEX LAVIN, D. SCOTT PHOENIX, 2017 “RCN was effective in breaking a wide variety of CAPTCHAs with very little training data and without using CAPTCHA-specific heuristics”
UNCAPTCHA: A LOW-RESOURCE DEFEAT OF RECAPTCHA’S AUDIO CHALLENGE KEVIN BOCK, DAVEN PATEL, GEORGE HUGHEY, DAVE LEVIN, UNIVERSITY OF MARYLAND, 2017 “We evaluate unCaptcha using over 450 re-Captcha challenges from live websites, and show that it can solve them with 85.15% accuracy in 5.42 seconds, on average.”
UNCAPTCHA: A LOW-RESOURCE DEFEAT OF RECAPTCHA’S AUDIO CHALLENGE KEVIN BOCK, DAVEN PATEL, GEORGE HUGHEY, DAVE LEVIN, UNIVERSITY OF MARYLAND, 2017 (12/28/2018) “After we informed Google about unCaptcha, they updated their audio challenges to issue phrases instead of digits. unCaptcha v2 now breaks these new challenges, with even higher accuracy (around 90%) than before.”
IN (CYBER)SPACE BOTS CAN HEAR YOU SPEAK SAUMYA SOLANKI, GAUTAM KRISHNAN, VARSHINI SAMPATH, JASON POLAKIS, UNIVERSITY OF ILLINOIS AT CHICAGO, 2017 “[…] our AudioBreaker system is able to break all captcha schemes, achieving accuracies of up to 98.3% against Google’s Re-Captcha.”
RECAPTCHA V3 reCAPTCHA v3 ▸ Magic black box ▸ Passive analysis ▸ “Bot score”
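With v3 the site gets a score instead of a puzzle, and deciding what to do with it becomes the site’s job. The sketch below assumes the JSON response from Google’s `siteverify` endpoint has already been fetched and parsed; `success`, `score`, `action` and `hostname` are fields of that response, while the two thresholds and the three-way outcome are illustrative choices of ours.

```python
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"
    STEP_UP = "step_up"  # e.g. e-mail confirmation or manual moderation
    BLOCK = "block"


def decide(siteverify: dict, expected_action: str, expected_host: str,
           allow_at: float = 0.7, block_below: float = 0.3) -> Verdict:
    """Map a parsed reCAPTCHA v3 siteverify result to an action.

    The thresholds are assumptions, not values recommended by Google.
    """
    if not siteverify.get("success"):
        return Verdict.BLOCK  # token invalid, expired or already used
    if siteverify.get("action") != expected_action:
        return Verdict.BLOCK  # token was issued for a different page action
    if siteverify.get("hostname") != expected_host:
        return Verdict.BLOCK  # token was issued for a different site
    # 1.0 means "very likely a human", 0.0 "very likely a bot".
    score = float(siteverify.get("score", 0.0))
    if score >= allow_at:
        return Verdict.ALLOW
    if score < block_below:
        return Verdict.BLOCK
    return Verdict.STEP_UP
```

Routing the middle band to an extra step, rather than blocking outright, limits the damage when a real person is scored as a bot.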
HACKING GOOGLE RECAPTCHA V3 USING REINFORCEMENT LEARNING ISMAIL AKROUT, AMAL FERIANI, MOHAMED AKROUT, ANKOR AI, 2019 “Experiment results show that the [Reinforcement Learning] agent passes the reCAPTCHA test with 97.4 accuracy. To our knowledge, this is the first attempt to defeat the reCAPTCHA v3 using RL”
BIOMETRICS rtCaptcha ▸ Biometrics (audio + video) ▸ Solve the captcha aloud, checked with speech-to-text
BIOMETRICS What is a human, apart from a face?
KEEP RETREATING https://ashnavabi.files.wordpress.com/2015/03/motte-and-bailey.jpg
HOT TAKE CAPTCHA is the WRONG ANSWER to the WRONG QUESTION
HOT TAKE ACTION over ACTOR
ETHICALLY SPEAKING Are captchas ethical?
HI THERE ARISTOTLE VIRTUE ETHICS ‣ Treat like cases like ‣ Agency, not Action ‣ Are captchas ethical? ‣ What about abuse?
TELOS IS GREEK TO ME CONSEQUENTIALISM TELOS ‣ Action, not Agency ‣ Utilitarianism ‣ Egoism ‣ Altruism
IT KANT BE DEONTOLOGY ‣ “Rightness” ‣ Duty ‣ Kant’s categorical imperative ‣ Never treat people merely as a means to an end ‣ Golden rule
I DON’T LIKE SPAM We’re dealing with an ABUSE problem.
WHAT TO DO? ‣ Post-hoc analysis ‣ Whitelist / blacklist ‣ Source filtering (VPN) ‣ Honeypot ‣ Timer ‣ Confirmation link ‣ 2-step confirmation ‣ reCAPTCHA v3?
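Two of the cheaper measures listed above, a honeypot field and a timer, can be combined in one server-side check. This is a minimal sketch under stated assumptions: the field names `website` and `form_token`, the secret key and the time limits are hypothetical, and the render time is signed with an HMAC so a bot cannot simply submit a fake timestamp.

```python
import hashlib
import hmac
import time
from typing import Optional

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical server-side secret
MIN_SECONDS = 3.0     # assumed: faster submissions are treated as automated
MAX_SECONDS = 3600.0  # assumed: stale forms are rejected as well


def _sign(timestamp: str) -> str:
    return hmac.new(SECRET_KEY, timestamp.encode(), hashlib.sha256).hexdigest()


def issue_form_token(now: Optional[float] = None) -> str:
    """Value for a hidden form_token field, set when the form is rendered."""
    issued_at = time.time() if now is None else now
    timestamp = f"{issued_at:.3f}"
    return f"{timestamp}.{_sign(timestamp)}"


def looks_automated(form: dict, now: Optional[float] = None) -> bool:
    """Honeypot plus timer check on a submitted form."""
    # Honeypot: the 'website' field is hidden from people, but naive bots
    # tend to fill in every input they find.
    if form.get("website", "").strip():
        return True
    try:
        timestamp, signature = form["form_token"].rsplit(".", 1)
        issued_at = float(timestamp)
    except (KeyError, ValueError, AttributeError):
        return True  # token missing or malformed
    if not hmac.compare_digest(signature.encode(), _sign(timestamp).encode()):
        return True  # timestamp was forged or altered
    elapsed = (time.time() if now is None else now) - issued_at
    return not (MIN_SECONDS <= elapsed <= MAX_SECONDS)
```

Hide the honeypot in a way that also hides it from assistive technology (for example `display: none`), otherwise a screen reader user may fill it in and be flagged as a bot. Neither check proves anything about the visitor; they only make bulk abuse somewhat more expensive.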
HARSH TRUTH It’s a numbers game.
TICKET SCALPING
FAKE REVIEWS
WHAT ABOUT… Google Duplex?
WHAT ABOUT… Anonymous polls? Anonymous comments?
WHAT ABOUT… Shop scanners?
WRAP IT UP CONCLUSION
You’re familiar with CAPTCHAs getting in your way. But why are they such a ubiquitous security measure to begin with? Why are there different implementations, and which problem are they attempting to solve? During this talk you’ll learn how these “Human Interactive Proofs” came to be, how they’re still evolving, and why they are a bad solution to the wrong problem.
Here’s what was said about this presentation on Twitter.
Also the first time I have the honor to see @detonite give a talk. And he sooo kills it! Job is such a great, energetic and funny narrator. Learning sommuch about CAPTCHAs and machines 🤩 #refreshcon pic.twitter.com/o3Y5ygdxs4
— Christian Schaefer (@derSchepp) November 8, 2019
Down to our second Job and ninth speaker of the day. @detonite is doing a great job storytelling on stage, taking us along the methods and ethics of automated Turing tests. #refreshcon pic.twitter.com/BNYBPK8AXw
— Refresh Conference (@refreshcon) November 8, 2019