PECL Input Filter

Lately, there has been a good deal of discussion on php-general about filtering input. Richard Lynch even tossed out a few ideas about a $_CLEAN superglobal variable that would serve merely as a reminder to programmers (through its constant use in the PHP manual) to filter input as a “best practice” (see here and here). Furthermore, in a comment on Chris Shiflett’s blog, Richard argues that “[s]urely our base solution for minimal Security should be a fundamental part of the PHP language, not some add-on second thought.”

I tend to agree with Richard, and that’s why I’ve been paying attention to the PECL Input Filter extension.

Back in October, Derick Rethans and Rasmus Lerdorf made the initial release of the PECL Input Filter extension. Since then, I’ve taken some time to play with it, hack on it, and report a few bugs, which have since been fixed in HEAD. I’m proud to say that they even used some of my patches. Still, I’m going to keep tinkering with this extension to see what else I can break, because I think it will be a good tool for promoting best practices among PHP programmers, and the more it’s tested, the better it will be.

Now, on to Richard’s point about security tools being “a fundamental part of the PHP language.” Right now, the Input Filter extension is just that: an extension. Recently (15 Nov), however, I noticed that Jani Taskinen (a.k.a. “sniper”) checked in some revisions with the comment “Prepare for including in PHP core.” This got me thinking, so I asked Derick, who confirmed that the Input Filter extension will be part of the PHP core in versions 5.1.1 and 6.0. So there’s one of your built-in security tools right there.

Now, let’s take a look at some code. Assume that we have a form with four fields: name, age, email, and list. These are fairly self-explanatory: for name, we expect a string; for age, an integer; for email, an e-mail address; and for list, a value that says whether the user wants to be on the mailing list. The list field is a radio button, and, for the sake of argument, let’s assume that its values are “yes” and “no,” though they could just as well be 1, 0, true, false, on, or off; any of these will filter as a BOOLEAN value.

Our processing script might start out like this:

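$clean = array();
// name: letters, digits, underscores, and spaces (a PCRE pattern, so it needs delimiters)
$clean['name'] = input_get(INPUT_POST, 'name', FL_REGEXP, array('regexp' => '/^[\w ]+$/'));
// age: an integer
$clean['age'] = input_get(INPUT_POST, 'age', FL_INT);
// email: a well-formed e-mail address
$clean['email'] = input_get(INPUT_POST, 'email', FL_EMAIL);
// list: yes/no, 1/0, true/false, or on/off
$clean['list'] = input_get(INPUT_POST, 'list', FL_BOOLEAN);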

The constant passed as the third argument determines the type of filtering. If the input variable matches the filter, input_get() returns the raw, unchanged value; if it doesn’t match, it returns NULL. So, at worst, an element of $clean (in this implementation) will contain NULL rather than unfiltered input.
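Taking that NULL-on-failure behavior at face value, the natural next step is to find out which fields failed so the user can be asked to re-enter them. Here is a minimal sketch in plain PHP; the helper name fields_to_reenter() is hypothetical, and the only assumption it makes is that a failed filter leaves NULL in $clean, as described above.

```php
<?php
// Hypothetical helper: given the $clean array built from input_get() calls,
// return the names of the fields whose filter failed (i.e. came back NULL).
function fields_to_reenter(array $clean)
{
    // Keep only entries that are strictly NULL. A loose test such as empty()
    // would also flag legitimate values like false or 0.
    return array_keys(array_filter($clean, 'is_null'));
}
```

After the input_get() calls, a non-empty result from fields_to_reenter($clean) tells you which fields to redisplay. The strict NULL check matters: empty() would also flag an age of 0, or a valid “no” on the list field, assuming FL_BOOLEAN returns false for it.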

You can also filter ordinary script variables, not just request input, and even perform some sanitizing. The following example strips the HTML tags from $name and stores the value “Ben Ramsey” in $clean['name'].

$clean = array();
$name = '<b>Ben Ramsey</b>';
$clean['name'] = filter_data($name, FS_STRING); // FS_STRING sanitizes; here it strips the <b> tags
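For comparison, the same result can be had from PHP’s core strip_tags() function, with no extension at all. This is only a sketch of the equivalent idea; it should not be read as a description of how FS_STRING is implemented or as a claim that the two behave identically on every input.

```php
<?php
// Plain-PHP counterpart to the sanitizing example: remove the HTML tags
// from $name using the core strip_tags() function.
$name = '<b>Ben Ramsey</b>';
$clean = array();
$clean['name'] = strip_tags($name); // "Ben Ramsey"
```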

While I am not a big fan of sanitizing functions (I believe that programmers should take a whitelist approach: filter input for valid data and, when the data is invalid, require the user to enter valid data), I can definitely see the advantage of including these filtering functions in the core to promote best practices. It should be noted that it is just as easy to filter input without these built-in functions, but perhaps their inclusion will encourage more programmers to start filtering data properly.
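To make the whitelist approach concrete, here is a sketch of how the example form might be filtered in plain PHP, without the extension. It rests on assumptions of its own: the function names filter_signup() and signup_field() are hypothetical, the e-mail check is a deliberately loose shape test rather than full address validation, and list is assumed to accept only “yes” and “no.”

```php
<?php
// Hypothetical helper: fetch a field as a string. Missing or non-string
// values (e.g. name[]=x in the query) are treated as an empty string.
function signup_field(array $input, $key)
{
    return (isset($input[$key]) && is_string($input[$key])) ? $input[$key] : '';
}

// Hypothetical whitelist filter for the example form. Valid values go into
// $clean; names of invalid fields go into $errors so the form can be redisplayed.
function filter_signup(array $input)
{
    $clean = array();
    $errors = array();

    // name: letters, digits, underscores, and spaces only.
    // \A and \z anchor the whole string (a trailing newline is rejected).
    $name = signup_field($input, 'name');
    if (preg_match('/\A[\w ]+\z/', $name) === 1) {
        $clean['name'] = $name;
    } else {
        $errors[] = 'name';
    }

    // age: digits only (no sign, no decimal point), stored as an integer.
    $age = signup_field($input, 'age');
    if (ctype_digit($age)) {
        $clean['age'] = (int) $age;
    } else {
        $errors[] = 'age';
    }

    // email: a loose "something@something.something" shape check.
    $email = signup_field($input, 'email');
    if (preg_match('/\A[^@\s]+@[^@\s]+\.[^@\s]+\z/', $email) === 1) {
        $clean['email'] = $email;
    } else {
        $errors[] = 'email';
    }

    // list: only the two radio-button values the form can send.
    $list = signup_field($input, 'list');
    if ($list === 'yes' || $list === 'no') {
        $clean['list'] = ($list === 'yes');
    } else {
        $errors[] = 'list';
    }

    return array($clean, $errors);
}
```

On a real request you would call list($clean, $errors) = filter_signup($_POST); and, if $errors is not empty, redisplay the form for those fields rather than trying to repair the data. Only values that pass a check ever reach $clean, which is the point of the whitelist approach.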

Finally, I’d like to point out that the Input Filter extension is still in “beta” and should not be used in production environments. There are still bugs to fix and functionality to finish before it is safe for production use.

UPDATE (19 Nov): Version 0.9.3, which includes several bug fixes, was released yesterday.