CAPTCHA That Form Before It Gets Away!

This article was first published in the “Tips & Tricks” column in php|architect magazine.

Abuzz with discussions, arguments, and numerous opinions on solutions to the problem, the PHP community has been focused, lately, on how to prevent weblog comment spam and how to protect one’s forms in general—be they comment forms, e-mail forms, etc. The topic has graced the pages of blogs, and threads on the subject have adorned more than one mailing list. Some say it’s a PHP security problem; others blame the developers. But one thing is certain: it’s just plain annoying.

How can we combat comment spam or verify that those using our forms are actually doing so from our pages and not some remote script out there? I don’t pretend to have the definitive answer, and, in fact, this month’s Tips & Tricks column doesn’t attempt to provide a concrete solution, but I will point out a few erroneous practices, show how they leave forms vulnerable by providing examples of scripts that can misuse your forms, and provide a few “best practices” for securing your forms.

There are several popular methods out there for protecting Web forms. Almost all of them, however, aim to accomplish the same result, which is to determine the difference between a human and a computer (or automated script). Some scripts embed a token of some sort in the form and set a cookie or session variable. Others provide the user with a CAPTCHA (Completely Automated Turing test to tell Computers and Humans Apart) image of a word or phrase that the user must enter. Some check the Referer header. Still others implement some variant of each of these methods.

The problem is that any script can simulate a valid user (read “human”) interaction with a form, and some feel that, as long as the script is properly simulating a user session, it’s okay. Yet, if your forms are set up improperly, these user-simulating scripts can continually access your script using the same session, potentially flooding you with spam. This month’s Tips & Tricks examines three popular methods of “securing” forms and shows how to keep external scripts from posting to them.

The Embedded Token Method

The simplest and perhaps most user-friendly method of “securing” a Web form is what I’m referring to as the “embedded token” method.

The embedded token method is simple because it only requires a few lines of code to implement, and it’s user-friendly because it does not require any additional action from the user to validate their human identity (there is no word or phrase to type). It simply relies on the presence of a user agent (Web browser) visiting the form. The server either sets a session variable or asks the browser to set a cookie that is then checked against a hidden form field when the user submits the form. Listing 1 illustrates a very basic implementation of this method.

Listing 1.
<?php
// Embedded Token Method
session_start();

// Process the post only when the submitted token matches the one in the session.
if (isset($_POST['message'], $_POST['token'], $_SESSION['token'])) {
    if ($_POST['token'] === $_SESSION['token']) {
        $message = htmlentities($_POST['message']);
        echo '<h1>' . $message . '</h1>';
    }
}

// Issue a fresh token for the form rendered below.
$token = md5(uniqid(rand()));
$_SESSION['token'] = $token;
?>
<form method="POST">
    <input type="hidden" name="token" value="<?php echo $token; ?>" />
    Message: <input type="text" name="message" /><br />
    <input type="submit" />
</form>

The problem with the embedded token method in its most basic form is the assumption that only a Web browser can set a cookie or make use of sessions. This could not be further from the truth, since the Web server will send a Set-Cookie header to the user agent, which doesn’t necessarily have to be a browser. As long as something can parse and read HTTP response headers and send valid HTTP requests, it is a user agent—even if it’s a script (PHP or otherwise).

Listing 2 illustrates a script that uses the PEAR package HTTP_Request to send and receive valid HTTP headers, including the ability to capture the Set-Cookie header and send it back to the server in a valid request. For all intents and purposes, this script is a valid user, and the form treats it as such.

Listing 2.
<?php
require_once 'HTTP/Request.php';

$req = new HTTP_Request('listing1-embedded_token.php');

/* Simulate a valid user and get a session */
$req->setMethod(HTTP_REQUEST_METHOD_GET);
$response = $req->sendRequest();

// Pull the token value out of the hidden form field.
$regex = '/<input type="hidden" name="token" value="([^"]*)" \/>/';
if (preg_match($regex, $req->getResponseBody(), $matches)) {
    $token = $matches[1];
}

// Find the session cookie set by the server.
$cookies = $req->getResponseCookies();
foreach ($cookies as $cookie) {
    if (strcmp($cookie['name'], 'PHPSESSID') == 0) {
        $session_id = $cookie['value'];
    }
}

/* POST to the form with the session */
$req->setMethod(HTTP_REQUEST_METHOD_POST);
$req->addCookie('PHPSESSID', $session_id);
$req->addPostData('message', 'I simulated a user!');
$req->addPostData('token', $token);
$response = $req->sendRequest();
echo $req->getResponseBody();
?>

This script, of course, assumes the presence of the “token” form field and assumes that this form field will never change in any way: the name will always be “token,” and the field will always exist in its present form. It uses a regular expression to grab the actual token from the form field and send it back in a subsequent POST. The regular expression could be made much more complex to accommodate changing parameters within the form field, so that it depends less on the exact markup it must find.

However, as I see it, there must be a constant in order for this type of simulated form post to work: the field name. If the field name of the token is always constant, then this external post to the form will always work. If you work out a way to randomize the field name, then you can block external scripts from making use of yours. Randomizing the field name may seem like a superfluous extra step to block others from using your scripts, but it could save you from unnecessary spam, flooding, or even being used as a spam e-mail relay.
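One way to randomize the field name might look like the following sketch. The helper names issue_form_token() and verify_form_token() and the 'form_token' session key are hypothetical, and the session array is passed in explicitly (in a real page it would be $_SESSION, after session_start()). A script that parses every hidden field could still adapt, so treat this as raising the bar for scripts that hard-code the field name rather than as a guarantee.

```php
<?php
// Hypothetical helper: create a token under a random field name and
// remember both in the session so the next request can be checked.
function issue_form_token(array &$session): array
{
    $field = 'f_' . bin2hex(random_bytes(8));   // random field name
    $token = bin2hex(random_bytes(16));         // random token value
    $session['form_token'] = ['field' => $field, 'value' => $token];
    return [$field, $token];
}

// Hypothetical helper: accept the post only if the token arrived under
// the expected field name. The stored token is discarded on every
// attempt, so it can be used at most once.
function verify_form_token(array &$session, array $post): bool
{
    $expected = $session['form_token'] ?? null;
    unset($session['form_token']);
    if (!is_array($expected)) {
        return false;
    }
    $submitted = $post[$expected['field']] ?? null;
    return is_string($submitted)
        && hash_equals($expected['value'], $submitted);
}
```

On the form page, call issue_form_token($_SESSION) and print the returned name and value into the hidden field; when the form is submitted, call verify_form_token($_SESSION, $_POST) before processing the message.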

The Referrer Check Method

Another common approach to blocking scripts from using your forms is to check the Referer header using $_SERVER['HTTP_REFERER']. This method is often suggested, and many believe it will completely block external scripts from using your forms. However, just about every server-side scripting language can set the HTTP Referer header on the requests it sends, and I’m told even some proxies will change it as well.

Let’s take our example in Listing 1. It’s simple to modify the code to check the Referer. Just modify the if statement checking for the posted “message” field to include a second check against the Referer header, as shown here:

if (isset($_POST['message'])
    && preg_match('/^http:\/\/benramsey\.com/', $_SERVER['HTTP_REFERER'])) {

Now, the script will only process the form if the Referer matches any page from http://benramsey.com. Of course, if the Referer were from http://www.benramsey.com, it would fail, but the regular expression used here is simple; it can be made more complex to allow for other variations of domain names.
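A slightly more forgiving check could compare the host name instead of pattern-matching the start of the URL. This is only a sketch: referer_is_from() is a hypothetical helper, it accepts just the bare domain and its “www” variant, and the header it inspects is still supplied by the client. Comparing the whole host also avoids a quirk of prefix matching, where a host such as benramsey.com.example.org would pass.

```php
<?php
// Hypothetical helper: true when the Referer's host is the given domain
// or its "www." variant. The value is client-supplied, so this is a
// convenience check, not proof of where the request came from.
function referer_is_from(string $referer, string $domain): bool
{
    $host = parse_url($referer, PHP_URL_HOST);
    if (!is_string($host)) {
        return false;   // missing or unparsable Referer
    }
    $host = strtolower($host);
    return $host === $domain || $host === 'www.' . $domain;
}
```

In the form script this could be called as referer_is_from($_SERVER['HTTP_REFERER'] ?? '', 'benramsey.com').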

Just as the Referer check method is easy to implement, it’s similarly easy to fake a Referer header with PEAR::HTTP_Request. Adding the following line of code to the POST request in Listing 2 will trick the form into thinking that the POST it’s receiving is being sent from http://benramsey.com when, in reality, it could be sent from anywhere on the Web.

$req->addHeader('Referer', 'http://benramsey.com');

The Referer header is not a good safeguard for your scripts. It’s too easy to manipulate, and this is not a fault of PHP—almost every scripting language can do this.

The CAPTCHA

CAPTCHAs are quickly becoming a preferred method of determining whether a form post is from a valid user or a script. Their popularity has also led to great annoyances caused by unfriendly user experiences due to the terrible readability of most CAPTCHA images. Nevertheless, the CAPTCHA seems here to stay.

For the most part, the CAPTCHA image is an effective means of blocking external scripts from using your forms. However, I have seen several implementations that leave much to be desired.

For example, I have seen scripts that simply embed the actual CAPTCHA phrase in a hidden field. In this case, a script such as the one shown in Listing 2 can easily grab the phrase and return it in a post to the form. This form of security does nothing to hinder external scripts from using your forms. It merely gives the appearance of tighter control while aggravating your real users, who must squint to guess at the CAPTCHA phrases. Never store your CAPTCHA phrase in a hidden field. If you must do so, use md5() and a salt kept secret on the server to disguise the word or phrase.

Listing 3 uses PEAR::Text_CAPTCHA to create a simple CAPTCHA test. Much like the example from Listing 1, it stores the phrase in a session variable for checking against the posted user input. Instead of the phrase being placed in a hidden form field like the token, however, the user is required to enter it. Already, the security is increased because external scripts cannot request this page and grab the phrase from the markup the way Listing 2 grabs the token. However, not everything is perfect here.

Listing 3.
<?php
session_start();
if (isset($_POST['phrase'])
    && isset($_SESSION['phrase'])
    && strcmp($_POST['phrase'], $_SESSION['phrase']) == 0
) {
    echo '<h1>They match!</h1>';
} else {
    require_once 'Text/CAPTCHA.php';
    $captcha = Text_CAPTCHA::factory('Image');
    $captcha->init(200, 80);
    $_SESSION['phrase'] = $captcha->getPhrase();
?>
<form method="POST">
    <img src="data:image/png;base64, <?php echo chunk_split(base64_encode($captcha->getCAPTCHAAsPNG())); ?>" /><br />
    <input type="text" name="phrase" /> <input type="submit" />
</form>
<?php
}
?>

If a malicious user is feeling rather, well, malicious, he can manually access this form on your site through a Web browser and grab the session ID, which is automatically saved to a cookie on his machine. He can also make note of the CAPTCHA phrase and then leave your site without otherwise touching the form. Now, armed with a session ID and phrase, he can use the code in Listing 4 to simulate a normal user posting to your form and entering a proper CAPTCHA phrase. As long as the session ID remains active on the server, the CAPTCHA phrase will work.

Listing 4.
<?php
require_once 'HTTP/Request.php';

// Replay a session ID and CAPTCHA phrase noted earlier in a browser.
$req = new HTTP_Request('listing3-CAPTCHA.php');
$req->setMethod(HTTP_REQUEST_METHOD_POST);
$req->addCookie('PHPSESSID', '1nkh91unrmh8d6fr4brri4tli1');
$req->addPostData('phrase', 'traiwrou');
$response = $req->sendRequest();
echo $req->getResponseBody();
?>

This may not seem like a big deal since it’s a lot of work for someone to go through simply to flood your site with posts, but it is an opportunity that you will want to close to outside scripts, and this is easy to do. All you must do is unset the session variable after processing the form.

unset($_SESSION['phrase']);

An external script will now be able to fool your CAPTCHA exactly once, but the phrase will no longer be valid in the session after its use, so the script cannot continue to post to your form.
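Folded into a single check, the idea might look like the sketch below. The function name captcha_passes_once() is hypothetical, and the session array is passed in explicitly so the logic can be exercised outside a Web request (in the form script it would be $_SESSION). This version discards the phrase on every attempt, including failed ones, which is a design choice rather than something the fix strictly requires.

```php
<?php
// Hypothetical helper: compare the posted phrase with the one stored in
// the session, then throw the stored phrase away so it cannot be replayed.
function captcha_passes_once(array &$session, array $post): bool
{
    $expected = $session['phrase'] ?? null;
    unset($session['phrase']);   // the phrase is valid for one attempt only

    $given = $post['phrase'] ?? null;
    return is_string($expected)
        && is_string($given)
        && hash_equals($expected, $given);
}
```

With this in place, replaying a saved session ID and phrase works the first time and fails on every later request.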

This seems like a no-brainer, but I’m amazed at how often I see this simple step left out of code examples and actual production code. It’s not a hard thing to do, and it doesn’t take rocket science, but it’s an often-overlooked practice.

The Security Question

Throughout this column, I’ve been referring to these examples as being “insecure” and giving you tips on how to “secure” the code. In reality, these are not true security concerns: leaving them unaddressed will not open your server or database to attack. However, your Web site forms may be open to spamming and flooding, and you could potentially be used as an e-mail relay, depending on how your forms are set up.

In general, and as a related aside, you should never use a “form mail” script that requires a hidden form field for a To address. Even if the script checks the Referer header, you are vulnerable as a spam relay; it’s happened to me. Instead, always set the To address on the server side, within the actual PHP code.
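As a rough illustration of keeping the To address in the PHP code, consider the sketch below. The constant CONTACT_TO, its example.com address, the helper build_contact_mail(), and the field names “subject” and “message” are all assumptions made for this example. Stripping line breaks from the subject is an extra precaution added here, because the subject ends up in a mail header.

```php
<?php
// Hypothetical recipient, fixed in code and never read from the form.
const CONTACT_TO = 'webmaster@example.com';

// Hypothetical helper: build the arguments for mail() from posted data.
// Any "to" field in the post is simply ignored.
function build_contact_mail(array $post): array
{
    $subject = is_string($post['subject'] ?? null) ? $post['subject'] : 'Contact form';
    // Remove line breaks so the subject cannot smuggle in extra headers.
    $subject = str_replace(["\r", "\n"], ' ', $subject);
    $body = is_string($post['message'] ?? null) ? $post['message'] : '';

    return [CONTACT_TO, $subject, $body];
}
```

A form handler could then call [$to, $subject, $body] = build_contact_mail($_POST); and pass those three values to mail().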

Smart Programming

In this column—my debut effort for Tips & Tricks—I’ve given several examples of how external scripts can use your forms even when you’re sure they can’t. Plus, I’ve shown you how to use PEAR::HTTP_Request to simulate a valid user and act as a user agent. I’ve shown more tricks than I have tips, but in the end, being a smart programmer is the key. It is my hope that you’ll take these few tips and expound upon them as you program applications. Being a smart programmer means thinking through the problem and even considering how others may abuse your application. Only then will you be able to tackle real security problems head on.

Until next time, be sure to practice safe coding!