projet-oral-anglais/slides.md
2023-12-14 16:08:30 +01:00


---
marp: true
author: Laurent Fainsin, Clément Broutin
title: CAPTCHA
---
<style>
hidden {
visibility: hidden;
}
</style>
<style scoped>
video {
position: absolute;
right: 0;
top: 0;
width: 100%;
}
section {
color: #fdfdfd;
}
h1 {
margin: 0 auto;
z-index: 10;
color: white;
font-size: 5rem;
}
</style>
<video src="figs/background.mp4" autoplay loop></video>
# CAPTCHAs
<!-- Welcome dear fellow humans to our scientific presentation on CAPTCHAs -->
<!-- Say lots of bonus things by clicking live on the (blue) links in the slides -->
<!-- Or by fooling around with the captcha tests -->
<!-- Keep the audience in doubt throughout about whether Clément is really a human -->
---
<header>
# What is a CAPTCHA?
</header>
Definition:
* **C**ompletely **A**utomated **P**ublic [**T**uring](https://en.wikipedia.org/wiki/Alan_Turing) test to tell **C**omputers and **H**umans **A**part
* commonly, third-party software embedded in web pages
* /kæp.tʃə/
<hidden>
A bit of history:
- Introduced in 1997 by [AltaVista](https://fr.wikipedia.org/wiki/AltaVista)
- Term was coined in 2003 by [Luis von Ahn](https://en.wikipedia.org/wiki/Luis_von_Ahn), [Manuel Blum](https://en.wikipedia.org/wiki/Manuel_Blum), [Nicholas J. Hopper](https://www-users.cse.umn.edu/~hoppernj/), and [John Langford](https://www.microsoft.com/en-us/research/people/jcl/).
- Based on a [Reverse Turing test](https://en.wikipedia.org/wiki/Reverse_Turing_test)
- Created from open problems in AI.
</hidden>
<!-- So first of all, what is a captcha? -->
<!-- By definition, CAPTCHAs are a completely automated... -->
<!-- So they are simply a tool for categorizing humans and non-humans -->
<!-- Turing was a brilliant and famous mathematician of the last century; he is well known as a founder of modern computing (Turing machine...) -->
<!-- Nowadays CAPTCHAs are mostly found in your web browser (pretty much the only place where you encounter them). -->
<!-- They are what's called third-party software, meaning that they are almost always developed not by the owner of the site but by another organisation. This is due to the requirements that such a tool has. We'll talk a bit more about that in a few seconds! -->
<!-- And they are pronounced /kæp.tʃə/. -->
---
<header>
# What is a CAPTCHA?
</header>
Definition:
- **C**ompletely **A**utomated **P**ublic [**T**uring](https://en.wikipedia.org/wiki/Alan_Turing) test to tell **C**omputers and **H**umans **A**part.
- commonly, third-party software embedded in web pages.
- /kæp.tʃə/
A bit of history:
* Introduced in 1997 by [AltaVista](https://fr.wikipedia.org/wiki/AltaVista).
* Term was coined in 2003 by [Luis von Ahn](https://en.wikipedia.org/wiki/Luis_von_Ahn), [Manuel Blum](https://en.wikipedia.org/wiki/Manuel_Blum), [Nicholas J. Hopper](https://www-users.cse.umn.edu/~hoppernj/) and [John Langford](https://en.wikipedia.org/wiki/John_Langford_(computer_scientist)).
* Based on a [Reverse Turing test](https://en.wikipedia.org/wiki/Reverse_Turing_test).
* Created from [open problems in AI](https://ai-forum.com/opinion/unsolved-problems-in-ai/).
<!-- Let's see where captchas come from -->
<!-- Introduced by AltaVista, a search engine company, when they wanted to prevent nefarious users from adding unwanted entries to their search engine. At the time, if you wanted your website to be referenced in a search engine so that it could be found easily, you had to add it manually to their system. -->
<!-- At the time, this preventive system was unnamed. The term captcha was coined in 2003 by four mathematicians / computer scientists, namely Luis... -->
<!-- It's based on a reverse Turing test! First of all, a Turing test is a method for determining whether a computer is capable of human-like thinking. So a reverse Turing test is a method for testing whether or not something is a human. -->
<!-- They are designed to be practically impossible for current computers to solve, yet easy enough for real humans. -->
---
<header>
# What are CAPTCHAs for?
</header>
## They filter out the non-humans!
What is a non-human?
* [Bots](https://en.wikipedia.org/wiki/Internet_bot) 🤖
* [Crawlers](https://en.wikipedia.org/wiki/Web_crawler) 🕷️
* [Scrapers](https://en.wikipedia.org/wiki/Web_scraping) 🐀
* Dogs 🐕 / Cats 🐈
* [Spammers](https://en.wikipedia.org/wiki/Spamming) 📨
* [Hackers](https://en.wikipedia.org/wiki/Hacker) 🏴‍☠️
* Clément ? 👨‍🦰
<!-- So captchas filter out non-humans; this includes: -->
<!-- Bots: software applications that run automated tasks (scripts), usually with the intent to emulate human activity. They are fairly easy to code and generally astonishingly cheap. Precisely who we want to restrict. -->
<!-- Crawlers: internet bots that browse the World Wide Web for the purpose of web indexing. They are mostly used by search engines to improve their search results and mostly look at page metadata (title, date, author, thumbnail, description, language, icons...), but they can also be used for more nefarious purposes, combined with scrapers for example. -->
<!-- Scrapers: the automated extraction of data from websites via bots and crawlers. Not just metadata anymore: they are designed to gather a lot more data, such as phone numbers, emails, passwords (?), addresses, any valuable info. They are generally frowned upon since they tend to cause a lot of traffic on sites. -->
<!-- Dogs/cats KEKW -->
<!-- Spammers: you don't want your contact form to be unprotected, or you'll soon receive emails for special pills... -->
<!-- Hackers: they actually are humans, but they generally use all the tools above (except cats/dogs), and you want to at least slow them down. -->
<!-- Clément? 😳 -->
---
<header>
# Why are CAPTCHAs needed?
</header>
![](https://www.imperva.com/blog/wp-content/uploads/sites/9/2021/04/Bad-Bod-Report-Fig-1-1024x466.png.webp)
Source: [Imperva](https://www.imperva.com/blog/bad-bot-report-2021-the-pandemic-of-the-internet/)
<!-- Why all the trouble, are bots really that common? Yes. -->
<!-- A 2020 study from Imperva estimates human traffic at only about 60%; some other studies are even more pessimistic (sometimes less than 45%). -->
<!-- Good bots: search engines, monitoring bots, commercial crawlers, feed fetchers... -->
<!-- Bad bots: all the tools that we saw before, hackers, state spies... -->
<!-- You may understand why one may want to protect some areas of their website -->
---
<header>
# Some CAPTCHA examples
</header>
<iframe
id=recaptcha_iframe
src="https://democaptcha.com/demo-form-eng/recaptcha-2.html"
scrolling="no"
frameborder="0"
height="100%"
width="100%"
></iframe>
<!-- In a way, this type of challenge is relatively easy for computers nowadays; the difficulty of this captcha comes from the fact that attackers don't have the dataset that Google has (if you didn't know, these images come from Google Street View). -->
---
<header>
# Some CAPTCHA examples
</header>
<iframe
id=recaptcha_iframe
src="https://democaptcha.com/demo-form-eng/hcaptcha.html"
frameborder="0"
height="100%"
width="100%"
></iframe>
<!-- The dataset comes from companies or individuals who need data to be classified: if you give them 100 million images, they will classify them for you (at a price). -->
---
<header>
# Some CAPTCHA examples
</header>
<iframe
id=recaptcha_iframe
src="https://democaptcha.com/demo-form-eng/math-image.html"
frameborder="0"
height="100%"
width="100%"
></iframe>
<!-- A simpler test; it can still be effective, but it will be defeated very easily. -->
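---

A minimal sketch of the logic behind such a math test, assuming the server keeps the expected answer and compares it to what the visitor submits. The names `make_challenge` and `check_answer` are hypothetical, and the demo site's real implementation is not known to us. A real deployment would render the question as a distorted image and use an unpredictable random source; the fixed seed here is only for reproducibility.

```python
import random


def make_challenge(rng):
    """Return (question, expected_answer) for a small addition."""
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    return f"What is {a} + {b}?", str(a + b)


def check_answer(expected, submitted):
    """Accept the submission only if it matches the stored answer."""
    # Ignore stray whitespace typed around the number.
    return submitted.strip() == expected


# Hypothetical usage: the seed is fixed only to make the demo repeatable.
rng = random.Random(42)
question, expected = make_challenge(rng)
print(question)
print(check_answer(expected, expected))        # the correct answer is accepted
print(check_answer(expected, "not a number"))  # anything else is rejected
```

Once the question is readable as plain text, a bot only needs to parse two numbers, which is why this kind of test is so easily defeated.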
---
<header>
# Some CAPTCHA examples
</header>
<iframe
id=recaptcha_iframe
src="https://democaptcha.com/demo-form-eng/image.html"
frameborder="0"
height="100%"
width="100%"
></iframe>
<!-- Same here, a simpler test. -->
---
<header>
# Some exotic CAPTCHA examples
</header>
![90% bg](https://www.ionos.fr/digitalguide/fileadmin/DigitalGuide/Screenshots/EN-Captcha-Spamschutz-9.png)
<!-- These types of captchas are uncommon, but generally very effective at stopping bots. They are cheap to create, manage and evolve. -->
<!-- However, they aren't well suited to any platform other than a desktop computer. I don't want to solve that on my phone. -->
---
<header>
# Some exotic CAPTCHA examples
</header>
![44% bg](https://www.ionos.fr/digitalguide/fileadmin/DigitalGuide/Screenshots/EN-Captcha-Spamschutz-10.png)
<!-- These types of captchas are uncommon, but generally very effective at stopping bots. They are cheap to create, manage and evolve. -->
<!-- However, they aren't well suited to any platform other than a desktop computer. I don't want to solve that on my phone. -->
---
<header>
# Some exotic CAPTCHA examples
</header>
![70% bg](https://www.ionos.fr/digitalguide/fileadmin/DigitalGuide/Screenshots/EN-Captcha-Spamschutz-11.png)
<!-- These types of captchas are uncommon, but generally very effective at stopping bots. They are cheap to create, manage and evolve. -->
<!-- However, they aren't well suited to any platform other than a desktop computer. I don't want to solve that on my phone. -->
---
<header>
# Some exotic CAPTCHA examples
</header>
![100% bg](https://www.ionos.fr/digitalguide/fileadmin/DigitalGuide/Screenshots/EN-Captcha-Spamschutz-1.png)
<!-- Audio captchas are interesting for blind and visually impaired people -->
---
<header>
# Possible attacks on CAPTCHAs?
</header>
Quite difficult and costly:
* [Human Farms](https://www.netacea.com/blog/what-are-captcha-farms/)
* [Flying under the radar](https://github.com/ultrafunkamsterdam/undetected-chromedriver)
* Praying 🙏 ?
* [It's an arms race](https://github.com/dessant/buster)
* Man In The Middle Attack
<!-- Human farms: sounds like The Matrix... but you can actually pay people in developing countries to solve your captchas. -->
<!-- Flying under the radar: you can try to tune your techniques to look as unsuspicious as possible; you'll get a bit further. -->
<!-- Praying? -->
<!-- It's an arms race: people are building deep learning models to try to solve these captchas. -->
<!-- MITM: simply infecting ordinary people's machines and making internet requests on their behalf, basically a botnet. -->
---
<header>
# Alternatives to CAPTCHAs?
</header>
Not much:
* [Honeypot](https://en.wikipedia.org/wiki/Honeypot_(computing))
* [SMS/email verification](https://en.wikipedia.org/wiki/Multi-factor_authentication)
* [Centralized sign-on](https://en.wikipedia.org/wiki/Central_Authentication_Service)
* Forced human interaction
* Motion-tracking
<!-- Honeypot: not a real alternative but more of a mindset; you want to trick bots into doing useless work. -->
<!-- Two-factor authentication: your bank, for example, doesn't want you to be a robot. -->
<!-- Centralized sign-on: the famous "Sign in with Google/Facebook/FranceConnect" button. This way you don't do the filtering yourself, but trust a third party to filter out the bots for you (spoiler: not that effective). -->
<!-- Forced human interaction: for example, proxy votes during the presidential elections. -->
<!-- Motion tracking: captchas are actually observing you even when you are not actively solving them. They look at your mouse movements and keystrokes, and categorize you. For example, when you first click "I'm not a robot", the algorithm observes this click and compares it to previous clicks to detect a pattern (did you click perfectly in the center each time?). -->
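---

A rough sketch of the "did you click perfectly in the center each time?" idea, assuming we have recorded each click as a pixel offset from the checkbox centre. The function name `looks_scripted` and both thresholds are hypothetical; real CAPTCHA vendors do not publish their signals, so this only illustrates the principle.

```python
from statistics import pstdev


def looks_scripted(click_offsets, min_clicks=5, min_spread_px=1.0):
    """Flag a click history whose positions barely vary.

    click_offsets: list of (dx, dy) pixel offsets from the checkbox centre.
    Both thresholds are illustrative guesses, not values from a real product.
    """
    if len(click_offsets) < min_clicks:
        return False  # not enough history to judge
    spread_x = pstdev([dx for dx, _ in click_offsets])
    spread_y = pstdev([dy for _, dy in click_offsets])
    # Humans rarely hit the exact same pixel over and over.
    return spread_x < min_spread_px and spread_y < min_spread_px


bot_clicks = [(0, 0)] * 6
human_clicks = [(3, -2), (-5, 4), (1, 7), (-2, -6), (6, 1), (-4, 3)]
print(looks_scripted(bot_clicks))
print(looks_scripted(human_clicks))
```

A bot can of course add random jitter to its clicks, which is one reason this signal would only ever be one input among many.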
---
<header>
# Drawbacks?
</header>
* Annoying
* Accessibility
* Privacy
---
<script src="https://cdn.jsdelivr.net/npm/party-js@latest/bundle/party.min.js"></script>
<div class="yay" style="margin:0 auto;" onmousedown="party.confetti(this)">
🎉 Thank you for your attention 🎉
</div>