TABLE OF CONTENTS
Acknowledgments
Abstract
CHAPTER 1:- INTRODUCTION
1.1 What is Captcha
1.2 Captcha and Turing Test
CHAPTER 2:- Usage of Captcha
CHAPTER 3:- Working of captcha
3.1 How Captcha Works
3.2 Live Example of Website with Captcha
CHAPTER 4:- More About Captcha’s
4.1 Necessity of Captcha
4.2 Creation of Captcha
4.3 Breaking of Captcha
CHAPTER 5:- Types of Captcha’s
5.1 Text Captcha’s
5.1.1 Gimpy
5.1.2 Ez-Gimpy
5.1.3 Baffle Text
5.1.4 MSN Captcha
5.2 Graphics Captcha
5.2.1 BONGO
5.2.2 PIX
5.2.3 Audio CAPTCHAs
CHAPTER 6:- Applications
6.1 Preventing Comment Spam in Blogs
6.2 Protecting Website Registration
6.3 Online polls
6.4 Preventing Dictionary Attacks
6.5 Search Engine Bots
6.6 Worms and Spam
6.7 E-Ticketing
CHAPTER 7:-Advantages and Difficulties with Captcha
7.1 Advantages
7.2 Difficulties
CHAPTER 8 CONCLUSION
CHAPTER 9 REFERENCE
CHAPTER 1
INTRODUCTION
The term "CAPTCHA", based upon the word "capture", was coined in 2000 by Luis von Ahn, Manuel Blum, Nicholas J. Hopper, and John Langford. The word CAPTCHA is an acronym and stands for
Completely Automated Public Turing test to tell Computers and Humans Apart
The full form well defines the purpose of CAPTCHA. It is used as a simple puzzle hurdle that stops automated programs from signing up for e-mail accounts, cracking passwords, sending spam, violating privacy, and so on. A CAPTCHA challenges any automated program that tries to access a private zone, and so helps prevent unauthorized automated spamming programs from reaching personal mail accounts.
1.1 WHAT IS CAPTCHA
A CAPTCHA is a type of challenge-response test used in computing to ensure that the response is not generated by a computer. The process usually involves one computer (a server) asking a user to complete a simple test which the computer is able to generate and grade. Because other computers are unable to solve the CAPTCHA, any user entering a correct solution is presumed to be human. Thus, it is sometimes described as a reverse Turing test, because it is administered by a machine and targeted to a human, in contrast to the standard Turing test that is typically administered by a human and targeted to a machine. A common type of CAPTCHA requires that the user type letters or digits from a distorted image that appears on the screen.
CAPTCHA (initiated by researchers at Carnegie Mellon University and IBM in 2000) works by studying how OCR (optical character recognition) methods operate and displaying text that defeats them. Of course, this is a game two can play, so OCR designers can modify their methods to read the distorted text.
While CAPTCHAs started with text, they have begun using other images that may be easy for a human to recognize but baffle computers. Also, some of the new tests are not public, so the technology is now called HIP, which stands for
Human Interactive Proof
1.2 Captcha and the Turing Test
CAPTCHA technology has its foundation in an experiment called the Turing Test. Alan Turing, sometimes called the father of modern computing, proposed the test as a way to examine whether or not machines can think -- or appear to think -- like humans. The classic test is a game of imitation. In this game, an interrogator asks two participants a series of questions. One of the participants is a machine and the other is a human. The interrogator can't see or hear the participants and has no way of knowing which is which. If the interrogator is unable to figure out which participant is a machine based on the responses, the machine passes the Turing Test.
Of course, with a CAPTCHA, the goal is to create a test that humans can pass easily but machines can't. It's also important that the CAPTCHA application is able to present different CAPTCHAs to different users. If a visual CAPTCHA presented a static image that was the same for every user, it wouldn't take long before a spammer spotted the form, deciphered the letters, and programmed an application to type in the correct answer automatically.
Most, but not all, CAPTCHAs rely on a visual test. Computers lack the sophistication that human beings have when it comes to processing visual data. We can look at an image and pick out patterns more easily than a computer. Ever see a shape in the clouds or a face on the moon? That's your brain trying to associate random information into patterns and shapes.
But not all CAPTCHAs rely on visual patterns. In fact, it's important to have an alternative to a visual CAPTCHA. Otherwise, the Web site administrator runs the risk of disenfranchising any Web user who has a visual impairment. One alternative to a visual test is an audible one. An audio CAPTCHA usually presents the user with a series of spoken letters or numbers. It's not unusual for the program to distort the speaker's voice, and it's also common for the program to include background noise in the recording. This helps thwart voice recognition programs.
Another option is to create a CAPTCHA that asks the reader to interpret a short passage of text. A contextual CAPTCHA quizzes the reader and tests comprehension skills. While computer programs can pick out key words in text passages, they aren't very good at understanding what those words actually mean.
CHAPTER 2
Usage of Captcha
Registration forms on Web sites often use CAPTCHAs. For example, free Web-based e-mail services like Hotmail, Yahoo! Mail or Gmail allow people to create an e-mail account free of charge. Usually, users must provide some personal information when creating an account, but the services typically don't verify this information. They use CAPTCHAs to try to prevent spammers from using bots to generate hundreds of spam mail accounts.
Yahoo! uses alphanumeric strings rather than words as CAPTCHAs when you sign up for a Yahoo! account. Ticket brokers like Ticketmaster also use CAPTCHA applications. These applications help prevent ticket scalpers from bombarding the service with massive ticket purchases for big events. Without some sort of filter, it's possible for a scalper to use a bot to place hundreds or thousands of ticket orders in a matter of seconds. Legitimate customers become victims as events sell out minutes after tickets become available. Scalpers then try to sell the tickets above face value. While CAPTCHA applications don't prevent scalping, they do make it more difficult to scalp tickets on a large scale.
The most common form of CAPTCHA requires visitors to type in a word or series of letters and numbers that the application has distorted in some way. Some CAPTCHA creators came up with a way to increase the value of such an application: digitizing books. An application called reCAPTCHA harnesses users' responses in CAPTCHA fields to verify the contents of a scanned piece of paper. Because computers aren't always able to identify words from a digital scan, humans have to verify what a printed page says. Then it's possible for search engines to search and index the contents of a scanned document.
CHAPTER 3
Working of Captcha
CAPTCHA is a simple verification system made up of the captcha image and the code that generates it, a form field into which the code one sees in the captcha image must be inserted, and separate server-side code which checks that the code inserted manually into the field was correct.
We use captcha mechanisms on various forms and applications in an attempt to weed out form spam attacks which are becoming more of a problem over time. The captcha approach does present usability problems for non-graphical browser users so the audience of the page using captcha should be considered before employing it.
3.1 How captcha works
Basically CAPTCHA works in the following manner:
1. Create a random value: A random string is generated. Random values are hard to guess or predict.
2. Generate an image: Images are used because they are generally much harder for computers to read while remaining readable to humans. This is also the most important step, because plain text in an image can be read (and the CAPTCHA cracked) quite easily. To make that difficult, developers employ different techniques so that the text in the image becomes hard for computers to read. Some draw zig-zag lines in the background, while others twist and turn individual characters in the image. The possibilities are many, and new techniques are developed all the time because crackers keep finding ways to break them.
3. Store it: The random string (the same one shown in the image) is stored so that it can be matched against the user's input. The easiest way to do this is to use session variables.
4. Matching: After the above steps, the CAPTCHA image is generated and shown on the form we want to protect from abuse. The user fills in the form along with the CAPTCHA text and submits it. Now we have the following:
a. All submitted form data.
b. The CAPTCHA string entered by the user (from the form).
c. The real CAPTCHA string, generated by us (from the session variable). A session variable is generally used because it keeps stored values across page requests. Here, we need to preserve the value from one page (the form page) to another (the action page that receives the form data).
5. If both strings match, the submission is accepted. Otherwise, we can tell the user that the CAPTCHA they entered was wrong and that the form could not be submitted, and ask them to try again.
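The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `session` dictionary stands in for a real server-side session store, the names `create_captcha` and `check_captcha` are hypothetical, and the image-rendering step is left out because it depends on the imaging library in use.

```python
import hmac
import secrets
import string

# Characters the challenge is drawn from (an assumption; real systems
# often leave out look-alikes such as 0/O and 1/I).
ALPHABET = string.ascii_uppercase + string.digits


def create_captcha(session, length=6):
    """Steps 1 and 3: generate a random string and store it in the session."""
    code = "".join(secrets.choice(ALPHABET) for _ in range(length))
    session["captcha"] = code
    return code  # Step 2 would render this string into a distorted image.


def check_captcha(session, user_input):
    """Steps 4 and 5: compare the submitted text with the stored value."""
    expected = session.pop("captcha", None)  # each challenge is usable once
    if expected is None:
        return False
    # Ignoring case is a usability choice; compare_digest avoids timing leaks.
    typed = user_input.strip().upper()
    return hmac.compare_digest(expected.encode(), typed.encode())


session = {}  # stands in for a per-user server-side session
code = create_captcha(session)
print(check_captcha(session, code))  # True: the strings match
print(check_captcha(session, code))  # False: the challenge was already used
```

Removing the stored value after one attempt means a wrong answer forces a new CAPTCHA, which matches the behaviour described in step 5.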
3.2 Live Example of a CAPTCHA
Shown above is the official results website of Gujarat Technological University. A user who wants to check a result must type the captcha code shown into the field to its right. If the code entered matches, the query is processed; otherwise the user is presented with another captcha.
CHAPTER 4
More about CAPTCHAs
4.1 Necessity of captcha
When a human user reaches a captcha-protected page and sees a code-generated image (usually numbers and letters), he must type the same numbers or letters into a field. If he has copied them correctly, he is allowed to click the send button and either send the form or proceed to other web pages. The human visual system is sophisticated enough to recognize complex images and still extract the numbers and letters they contain.
The bot, on the other hand, does not have such sophistication, and is mostly programmed to extract e-mail addresses from the HTML code of a web page.
But bots have become more sophisticated, and some of them have image recognition algorithms and can read even images containing numbers and letters. This is why captcha images are usually also a bit distorted or have dots and lines along with the characters, to confuse the bot as much as possible.
This basically means that a human can pass through a captcha-protected page or form, but a bot will have a harder task. This is because a bot has difficulty reading and then typing in what it saw in a graphical image. Please note that the key phrase here is 'harder task'.
The bot can figure out where the form fields are from the page body and source code, but it has a harder task figuring out an image, which is not part of the code. Hence the captcha serves to fool or impede the bot: if the bot has figured out all the form fields and sends the form without the correct captcha code, an error is generated and the form is not sent.
But, and this is a big but, a web form created in pure HTML does NOT prevent the bot from getting to the form page's HTML source and finding any e-mail address in it. Implemented this way, a captcha can prevent the bot from sending the form BUT NOT from harvesting e-mail addresses contained in the page source.
4.2 Creation of captcha
The first step to creating a CAPTCHA is to look at the different ways humans and machines process information. Machines follow sets of instructions. If something falls outside the realm of those instructions, the machine isn't able to compensate. A CAPTCHA designer has to take this into account when creating a test. For example, it's easy to build a program that looks at metadata -- the information on the Web that's invisible to humans but machines can read. If you create a visual CAPTCHA and the image's metadata includes the solution, your CAPTCHA will be broken in no time.
Similarly, it's unwise to build a CAPTCHA that doesn't distort letters and numbers in some way. An undistorted series of characters isn't very secure. Many computer programs can scan an image and recognize simple shapes like letters and numbers.
Installing a CAPTCHA on your Web site is as easy as copying a few lines of code into your site's HTML page. And it won't even cost you a dime -- many CAPTCHA applications are free.
One way to create a CAPTCHA is to pre-determine the images and solutions it will use. This approach requires a database that includes all the CAPTCHA solutions, which can compromise the reliability of the test. According to Microsoft Research experts Kumar Chellapilla and Patrice Simard, humans should have an 80 percent success rate at solving any particular CAPTCHA, but machines should only have a 0.01 percent success rate [source: Chellapilla and Simard]. If a spammer managed to find a list of all CAPTCHA solutions, he or she could create an application that bombards the CAPTCHA with every possible answer in a brute force attack. The database would need more than 10,000 possible CAPTCHAs to meet the qualifications of a good CAPTCHA.
Other CAPTCHA applications create random strings of letters and numbers. You aren't likely to ever get the same series twice. Using randomization eliminates the possibility of a brute force attack -- the odds of a bot entering the correct series of random letters are very low. The longer the string of characters, the less likely a bot will get lucky.
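The effect of string length on a bot's luck can be worked out directly. The sketch below assumes every character is chosen independently and uniformly from a fixed alphabet, and it considers only blind guessing, not image recognition. The function names are hypothetical, and the 1-in-10,000 target is the 0.01 percent machine success rate quoted above.

```python
def search_space(alphabet_size, length):
    """Number of different strings of the given length."""
    return alphabet_size ** length


def min_length(alphabet_size, one_in):
    """Shortest length at which a blind guess succeeds at most 1 time in `one_in`."""
    length = 1
    while search_space(alphabet_size, length) < one_in:
        length += 1
    return length


# 0.01 percent corresponds to one success in 10,000 attempts.
for size, name in ((10, "digits"), (26, "letters"), (36, "letters and digits")):
    print(name, min_length(size, 10_000))
# digits 4
# letters 3
# letters and digits 3
```

Under these assumptions even a short random string already meets the blind-guessing target; in practice the distortion of the image, not the length of the string, is what has to hold off recognition software.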
Can You Hear Me Now?
In many ways, audible CAPTCHAs are similar to visual ones. In a database approach, the CAPTCHA creator must pre-record a person or computer speaking every series of characters and then match them with the right solution. With a randomized approach, the creator pre-records each character individually and the application strings the characters together randomly to create CAPTCHAs.
CAPTCHAs take different approaches to distorting words. Some stretch and bend letters in weird ways, as if you're looking at the word through melted glass. Others put the word behind a crosshatched pattern of bars to break up the shape of the letters. A few use different colors or a field of dots to achieve the same effect. In the end, the goal is the same: to make it really hard for a computer to figure out what's in the CAPTCHA.
Designers can also create puzzles or problems that are easy for humans to solve. Some CAPTCHAs rely on pattern recognition and extrapolation. For example, a CAPTCHA might include a series of shapes and ask the user which shape among several choices would logically come next. The problem with this approach is that not all humans are good with these kinds of problems and the success rate for a human user can drop below 80 percent.
Next, we'll take a look at how computers can break CAPTCHAs.
4.3 Breaking of captcha
The challenge in breaking a CAPTCHA isn't figuring out what a message says -- after all, humans should have at least an 80 percent success rate. The really hard task is teaching a computer how to process information in a way similar to how humans think. In many cases, people who break CAPTCHAs concentrate not on making computers smarter, but on reducing the complexity of the problem posed by the CAPTCHA.
Let's assume you've protected an online form using a CAPTCHA that displays English words. The application warps the font slightly, stretching and bending the letters in unpredictable ways. In addition, the CAPTCHA includes a randomly generated background behind the word.
A programmer wishing to break this CAPTCHA could approach the problem in phases. He or she would need to write an algorithm -- a set of instructions that directs a machine to follow a certain series of steps. In this scenario, one step might be to convert the image to grayscale. That means the application removes all the color from the image, taking away one of the levels of obfuscation the CAPTCHA employs.
Next, the algorithm might tell the computer to detect patterns in the black and white image. The program compares each pattern to a normal letter, looking for matches. If the program can only match a few of the letters, it might cross reference those letters with a database of English words. Then it would plug in likely candidates into the submit field. This approach can be surprisingly effective. It might not work 100 percent of the time, but it can work often enough to be worthwhile to spammers.
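The phases described above can be illustrated with a small, self-contained sketch. It is not a working CAPTCHA breaker: the letter-recognition phase is omitted entirely, the image is a tiny hand-made grid of (R, G, B) tuples, and the function names and the '?' placeholder for unread letters are assumptions made for this example. The grayscale weights are the commonly used luma coefficients.

```python
import re


def to_grayscale(pixels):
    """Convert rows of (R, G, B) tuples to 0-255 brightness values."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
            for row in pixels]


def to_black_and_white(gray, threshold=128):
    """Mark dark pixels as 1 (ink) and light pixels as 0 (background)."""
    return [[1 if value < threshold else 0 for value in row] for row in gray]


def candidate_words(partial, dictionary):
    """Return the dictionary words that fit a partial reading.

    `partial` uses '?' for every letter the recognizer could not match.
    """
    pattern = re.compile(
        "".join("." if ch == "?" else re.escape(ch) for ch in partial))
    return [word for word in dictionary if pattern.fullmatch(word)]


image = [[(255, 255, 255), (20, 20, 120)],
         [(200, 30, 30), (250, 250, 210)]]
print(to_black_and_white(to_grayscale(image)))  # [[0, 1], [1, 0]]

words = ["house", "horse", "mouse", "table"]
print(candidate_words("?o?se", words))  # ['house', 'horse', 'mouse']
```

Even with two of five letters unread, the dictionary narrows the answer to three candidates, which is why a CAPTCHA built from real words gives an attacker a head start.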
The spammer's bot will analyze both the body and source of this page, come to some sort of conclusion and decide if and how and with what to fill out some of the fields and even maybe hijack the form and send out thousands of spam e-mails.
CHAPTER 5
Types of CAPTCHAs
CAPTCHAs are classified based on what is distorted and presented as a challenge to the user. They are:
5.1 Text CAPTCHAs
These are simple to implement. The simplest yet novel approach is to present the user with some questions which only a human user can solve. Examples of such questions are:
- What is twenty minus three?
- What is the third letter in UNIVERSITY?
- Which of Yellow, Thursday and Richard is a colour?
- If yesterday was a Sunday, what is today?
Such questions are very easy for a human user to solve, but it’s very difficult to program a computer to solve them. These are also friendly to people with visual disability – such as those with color blindness.
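A question-based CAPTCHA of this kind could be generated as in the sketch below. The function names and the choice to accept an answer either as a digit or as a word are assumptions made for the example. Questions built from one fixed template like this are also easier for a program to parse than freely worded ones, so a real deployment would need far more variety.

```python
import random

NUMBER_WORDS = ("zero one two three four five six seven eight nine ten eleven "
                "twelve thirteen fourteen fifteen sixteen seventeen eighteen "
                "nineteen twenty").split()
ORDINALS = "first second third fourth fifth".split()


def subtraction_question(a, b):
    """Return the question text and the accepted answers (0 <= b <= a <= 20)."""
    result = a - b
    question = f"What is {NUMBER_WORDS[a]} minus {NUMBER_WORDS[b]}?"
    return question, {str(result), NUMBER_WORDS[result]}


def letter_question(word, index):
    """Ask for the letter at a 0-based index (index below 5 in this sketch)."""
    question = f"What is the {ORDINALS[index]} letter in {word.upper()}?"
    return question, {word[index].lower()}


def random_question(rng=random):
    """Pick the operands at random so that visitors see different questions."""
    a = rng.randint(0, 20)
    b = rng.randint(0, a)
    return subtraction_question(a, b)


def is_correct(answer, accepted):
    """Compare the visitor's answer with the accepted ones, ignoring case."""
    return answer.strip().lower() in accepted


question, accepted = subtraction_question(20, 3)
print(question)                        # What is twenty minus three?
print(is_correct("17", accepted))      # True
print(is_correct("Sixteen", accepted)) # False
```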
Other text CAPTCHAs involve text distortions, and the user is asked to identify the hidden text. The various implementations are:
5.1.1 Gimpy:
Gimpy is a very reliable text CAPTCHA built by CMU in collaboration with Yahoo for their Messenger service. Gimpy is based on the human ability to read extremely distorted text and the inability of computer programs to do the same. Gimpy works by choosing ten words randomly from a dictionary, and displaying them in a distorted and overlapped manner. Gimpy then asks the users to enter a subset of the words in the image. The human user is capable of identifying the words correctly, whereas a computer program cannot.
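The word-selection and grading logic of such a test could look like the sketch below. It leaves out the distortion and overlapping of the rendered words, which is where the real difficulty lies. The function names are hypothetical, and the requirement of three correct words is an assumption, since the description above only says "a subset".

```python
import random


def new_challenge(dictionary, rng=random, shown=10):
    """Pick the words that would be drawn, distorted and overlapped, in the image."""
    return rng.sample(dictionary, shown)


def passes(shown_words, typed_words, required=3):
    """Accept if at least `required` distinct typed words appear in the image."""
    shown_set = {word.lower() for word in shown_words}
    typed_set = {word.strip().lower() for word in typed_words}
    return len(shown_set & typed_set) >= required


dictionary = ["apple", "bread", "chair", "dance", "eagle", "flame",
              "grape", "house", "ivory", "joker", "kite", "lemon"]
shown = new_challenge(dictionary)
print(passes(shown, shown[:3]))          # True: three words from the image
print(passes(shown, ["zebra", "quilt"])) # False: none of the words match
```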
5.1.2 Ez – Gimpy:
This is a simplified version of the Gimpy CAPTCHA, adopted by Yahoo in their signup page. Ez – Gimpy randomly picks a single word from a dictionary and applies distortion to the text. The user is then asked to identify the text correctly.
5.1.3 BaffleText:
This was developed by Henry Baird at University of California at Berkeley. It is a variation of Gimpy. It doesn’t use dictionary words; instead it picks random letters to create a nonsense but pronounceable word. Distortions are then added to this text and the user is challenged to guess the right word. This technique overcomes a drawback of the Gimpy CAPTCHA: because Gimpy uses dictionary words, clever bots could be designed to search the dictionary for the matching word by brute force.
[Figure: example BaffleText challenge reading "finansourses"]
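One simple way to produce nonsense but pronounceable text is to alternate consonants and vowels, as sketched below. This is a deliberate simplification chosen for illustration and is not claimed to be the generator BaffleText itself uses. The function names and letter pools are assumptions.

```python
import random

CONSONANTS = "bcdfghjklmnprstvw"
VOWELS = "aeiou"


def nonsense_word(length=8, rng=random):
    """Alternate consonants and vowels so that the result can be pronounced."""
    letters = []
    for i in range(length):
        pool = CONSONANTS if i % 2 == 0 else VOWELS
        letters.append(rng.choice(pool))
    return "".join(letters)


def fresh_challenge(dictionary, rng=random, length=8):
    """Draw nonsense words until one is not a real dictionary word."""
    known = {word.lower() for word in dictionary}
    while True:
        word = nonsense_word(length, rng)
        if word not in known:
            return word


print(fresh_challenge(["banana", "tomato"], length=6))
```

Rejecting real words keeps the dictionary attack described above from working, because the answer is never in the attacker's word list.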
5.1.4 MSN Captcha:
Microsoft uses a different CAPTCHA for services provided under the MSN umbrella. These are popularly called MSN Passport CAPTCHAs. They use eight characters (upper-case letters and digits). The foreground is dark blue and the background is grey. Warping is used to distort the characters and produce a ripple effect, which makes computer recognition very difficult.
[Figure: example MSN CAPTCHA images; extracted text "XTNM5YREL9D28229B"]
5.2 Graphic CAPTCHAs:
Graphic CAPTCHAs are challenges that involve pictures or objects that have some sort of similarity that the users have to guess. They are visual puzzles, similar to Mensa tests. The computer generates the puzzles and grades the answers, but is itself unable to solve them.
5.2.1 Bongo:
Another example of a CAPTCHA is the program we call BONGO [2]. BONGO is named after M.M. Bongard, who published a book of pattern recognition problems in the 1970s [3]. BONGO asks the user to solve a visual pattern recognition problem. It displays two series of blocks, the left and the right. The blocks in the left series differ from those in the right, and the user must find the characteristic that sets them apart. A possible left and right series is shown in Figure 2.5.
These two sets are different because everything on the left is drawn with thick lines, while everything on the right is drawn with thin lines. After seeing the two series, the user is presented with a set of four single blocks and is asked to determine which series each block belongs to. The user passes the test if s/he assigns the blocks to the correct series. We have to take care that the user is not confused by a large number of choices.
5.2.2 PIX:
PIX is a program that has a large database of labeled images. All of these images are pictures of concrete objects (a horse, a table, a house, a flower). The program picks an object at random, finds six images of that object in its database, presents them to the user and then asks the question “what are these pictures of?” Current computer programs should not be able to answer this question, so PIX should be a CAPTCHA. However, PIX, as stated, is not a CAPTCHA: it is very easy to write a program that can answer the question “what are these pictures of?” Remember that all the code and data of a CAPTCHA should be publicly available; in particular, the image database that PIX uses should be public. Hence, writing a program that can answer the question is easy: search the database for the images presented and find their label. Fortunately, this can be fixed. One way for PIX to become a CAPTCHA is to randomly distort the images before presenting them to the user, so that computer programs cannot easily search the database for the undistorted image.
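The weakness described above can be made concrete with a toy model. In this sketch the public image database is a dictionary from hypothetical file names to labels, and an image is identified by its file name. A real attack would compare pixel data instead, which is exactly what random distortion is meant to prevent. All names here are assumptions made for the example.

```python
import random

LABELS = ("horse", "table", "house", "flower")
# Toy stand-in for the public database: eight images per object.
IMAGE_DB = {f"{label}_{i}.jpg": label for label in LABELS for i in range(8)}


def pix_challenge(db, rng=random, count=6):
    """Pick an object at random and return `count` of its images plus the answer."""
    label = rng.choice(sorted(set(db.values())))
    matching = [name for name, lab in db.items() if lab == label]
    return rng.sample(matching, count), label


def lookup_attack(db, shown_images):
    """Answer the challenge by looking the images up in the public database."""
    labels = {db.get(name) for name in shown_images}
    return labels.pop() if len(labels) == 1 else None


shown, answer = pix_challenge(IMAGE_DB)
print(lookup_attack(IMAGE_DB, shown) == answer)     # True: undistorted images are found
print(lookup_attack(IMAGE_DB, ["distorted_1.jpg"])) # None: no exact match in the database
```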
5.2.3 Audio CAPTCHAs:
The final example we offer is based on sound. The program picks a word or a sequence of numbers at random, renders the word or the numbers into a sound clip and distorts the sound clip; it then presents the distorted sound clip to the user and asks users to enter its contents. This CAPTCHA is based on the difference in ability between humans and computers in recognizing spoken language. Nancy Chan of the City University in Hong Kong was the first to implement a sound-based system of this type. The idea is that a human is able to efficiently disregard the distortion and interpret the characters being read out while software would struggle with the distortion being applied, and need to be effective at speech to text translation in order to be successful. This is a crude way to filter humans and it is not so popular because the user has to understand the language and the accent in which the sound clip is recorded.
CHAPTER 6
Applications Of Captcha’s
CAPTCHAs have several applications for practical security, including:
6.1 Preventing Comment Spam in Blogs:-
Most bloggers are familiar with programs that submit bogus comments, usually for the purpose of raising search engine ranks of some website (e.g., "buy penny stocks here"). This is called comment spam. By using a CAPTCHA, only humans can enter comments on a blog. There is no need to make users sign up before they enter a comment, and no legitimate comments are ever lost!
6.2 Protecting Website Registration:-
Several companies (Yahoo!, Microsoft, etc.) offer free email services. Up until a few years ago, most of these services suffered from a specific type of attack: "bots" that would sign up for thousands of email accounts every minute. The solution to this problem was to use CAPTCHAs to ensure that only humans obtain free accounts. In general, free services should be protected with a CAPTCHA in order to prevent abuse by automated programs.
6.3 Online Polls:-
In November 1999, http://www.slashdot.org released an online poll asking which was the best graduate school in computer science (a dangerous question to ask over the web!). As is the case with most online polls, IP addresses of voters were recorded in order to prevent single users from voting more than once. However, students at Carnegie Mellon found a way to stuff the ballots using programs that voted for CMU thousands of times. CMU's score started growing rapidly. The next day, students at MIT wrote their own program and the poll became a contest between voting "bots." MIT finished with 21,156 votes, Carnegie Mellon with 21,032 and every other school with less than 1,000. Can the result of any online poll be trusted? Not unless the poll ensures that only humans can vote.
6.4 Preventing Dictionary Attacks:-
CAPTCHAs can also be used to prevent dictionary attacks in password systems. The idea is simple: prevent a computer from being able to iterate through the entire space of passwords by requiring it to solve a CAPTCHA after a certain number of unsuccessful logins.
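A minimal sketch of this idea is shown below. The threshold of three failed logins is an assumption, since the text only says "a certain number". The in-memory counter and the `check_password` callback are also stand-ins for whatever storage and credential check a real system uses.

```python
MAX_FREE_ATTEMPTS = 3  # assumed threshold

failed_logins = {}  # username -> consecutive failed attempts


def captcha_required(username):
    """True once the account has used up its CAPTCHA-free attempts."""
    return failed_logins.get(username, 0) >= MAX_FREE_ATTEMPTS


def attempt_login(username, password, check_password, captcha_solved=False):
    """Return 'ok', 'denied' or 'captcha' for one login attempt."""
    if captcha_required(username) and not captcha_solved:
        return "captcha"  # the password is not even checked
    if check_password(username, password):
        failed_logins.pop(username, None)  # reset the counter on success
        return "ok"
    failed_logins[username] = failed_logins.get(username, 0) + 1
    return "denied"
```

After the third failure every further guess costs the attacker one solved CAPTCHA, so a program can no longer iterate through the password space unattended, while a legitimate user who mistypes a password a few times is only mildly inconvenienced.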
6.5 Search Engine Bots:-
It is sometimes desirable to keep webpages unindexed to prevent others from finding them easily. There is an html tag to prevent search engine bots from reading web pages. The tag, however, doesn't guarantee that bots won't read a web page; it only serves to say "no bots, please." Search engine bots, since they usually belong to large companies, respect web pages that don't want to allow them in. However, in order to truly guarantee that bots won't enter a web site, CAPTCHAs are needed.
6.6 Worms and Spam:-
CAPTCHAs also offer a plausible solution against email worms and spam: "I will only accept an email if I know there is a human behind the other computer." A few companies are already marketing this idea.
6.7 E-Ticketing:-
Ticket brokers like Ticketmaster also use CAPTCHA applications. These applications help prevent ticket scalpers from bombarding the service with massive ticket purchases for big events. Without some sort of filter, it's possible for a scalper to use a bot to place hundreds or thousands of ticket orders in a matter of seconds. Legitimate customers become victims as events sell out minutes after tickets become available. Scalpers then try to sell the tickets above face value. While CAPTCHA applications don't prevent scalping, they do make it more difficult to scalp tickets on a large scale.
CHAPTER 7
Advantages And Difficulties with Captcha
7.1 ADVANTAGES
- The CAPTCHA text is part of an image, so it cannot be copied and pasted into the input box.
- Internet bots cannot read CAPTCHAs, so protected pages are shielded from spammers to a great extent.
- CAPTCHAs prevent automated link spam on wikis and blogs.
7.2 Difficulties with CAPTCHA
CAPTCHAs face a few limitations.
1. A person with impaired sight will not be able to respond to visual CAPTCHA challenges efficiently.
2. Moreover, CAPTCHA is not fully safe against spammers, who can take advantage of simple puzzles to get spam and viruses through. Spammers can also develop programs that learn the patterns of the puzzles produced by a particular CAPTCHA system, and so get past the CAPTCHA hurdle.
3. Also, some CAPTCHA images are very difficult even for humans to interpret. Such a complex system can turn away important clients or users of a CAPTCHA-protected access point.
CHAPTER 8
CONCLUSION
Bots, and the damage they cause, are not the fault or responsibility of individual users, and it's totally unfair to expect them to take the responsibility. They're not the fault of site owners either, but like it or not they are our responsibility -- it's we who suffer from them, we who benefit from their eradication, and therefore we who should shoulder the burden. And using interactive authentication systems such as CAPTCHA effectively cheers and motivates us and our users.
Developers will try to come up with new and better tests, and spammers will continue to find ways of cracking them; it's very much a vicious circle. Perhaps, at some point in the future, somebody will come up with a test that is truly reliable and uncrackable -- something that identifies humans in a way that cannot be faked. Maybe biometric data such as fingerprints or retina scans could factor into that somewhere; perhaps we'll have direct neural interfaces that identify the presence of brain activity.
CHAPTER 9
References & Bibliography