Sat, 4 Mar 2006 19:40 UTC
A preview release of the Zend Framework is now available, and, so far, I must say that all looks well.
The one thing that I’m a bit curious about is the apparent removal of the Active Record implementation, Zend_Db_DataObject. The documentation for this object exists in the Programmer’s Reference Guide, but it’s nowhere to be found in the API. I wonder whether the implementation exists in a different form in Zend_Db or whether it was scrapped altogether.
The Active Record implementation aside, one of the other features I was looking forward to was the Zend_InputFilter framework. I know that Chris will undoubtedly write much more about this, but I wanted to point out one very cool feature: the strict mode.
The strict mode works like this: you pass an array of tainted data (let’s say the $_POST array) to Zend_InputFilter to create a new object to access the data in a safe manner, and, then, by default, $_POST is set to NULL for the remainder of the script—you simply cannot access the raw, tainted data from $_POST. Here’s an example:
<?php
$filterPost = new Zend_InputFilter($_POST);
$username = $filterPost->isAlpha('username');
var_dump($username);
var_dump($_POST); // NULL in strict mode: the raw data is no longer reachable here
?>
This strict mode could be very useful in an environment with a team of application developers. Just set auto_prepend_file in php.ini to load up a script that grabs all autoglobal variables ($_POST, $_GET, $_COOKIE, etc.) and stores them to Zend_InputFilter objects, and you never have to worry about your team accessing raw data—they must always use the Zend_InputFilter object to get to the data. (There is a getRaw() method of this object, but I’ll let Chris discuss it in more detail.)
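To make the mechanism concrete, here is a minimal sketch of how a strict-mode wrapper could work. It is not the Zend_InputFilter source: the class name StrictInput and everything inside it are hypothetical, and only the method names isAlpha() and getRaw() are borrowed from the post. The sketch assumes that isAlpha() hands back the value when it is purely alphabetic and FALSE otherwise, and it uses a stand-in array instead of $_POST so it can run from the command line.

```php
<?php
// Hypothetical illustration of "strict mode"; NOT the actual Zend_InputFilter code.
final class StrictInput
{
    /** @var array Private copy of the tainted data. */
    private $data;

    // The source array is taken by reference so the caller's copy can be wiped.
    public function __construct(array &$source, $strict = true)
    {
        $this->data = $source;
        if ($strict) {
            $source = null; // raw data is now reachable only through this object
        }
    }

    // Assumed behavior: return the value if it is purely alphabetic, FALSE otherwise.
    public function isAlpha($key)
    {
        $value = isset($this->data[$key]) ? $this->data[$key] : null;
        return (is_string($value) && ctype_alpha($value)) ? $value : false;
    }

    // Deliberate escape hatch: return the unfiltered value, or NULL if absent.
    public function getRaw($key)
    {
        return isset($this->data[$key]) ? $this->data[$key] : null;
    }
}

// Stand-in for $_POST (hypothetical sample data).
$tainted = array('username' => 'ben', 'comment' => '<b>hi</b>');
$filter  = new StrictInput($tainted);

var_dump($tainted);                      // NULL
var_dump($filter->isAlpha('username'));  // string(3) "ben"
var_dump($filter->isAlpha('comment'));   // bool(false)
```

A script loaded through auto_prepend_file could build one such object per autoglobal, leaving application code with nothing but the filtered accessors.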
Finally, lots of folks are already talking about this. Here are some links:
Tags: filter-input, framework, php, security, zend, zend-framework
Thu, 17 Nov 2005 21:02 UTC
Lately, there has been a good deal of discussion on php-general concerning filtering input. Richard Lynch even tossed out a few of his ideas concerning the use of a $_CLEAN superglobal variable that would merely serve as a reminder to programmers (through its constant use in the PHP manual) to filter input as a “best practice” (see here and here). Furthermore, on Chris Shiflett’s blog, Richard comments that “[s]urely our base solution for minimal Security should be a fundamental part of the PHP language, not some add-on second thought.”
I tend to agree with Richard, and that’s why I’ve been paying attention to the PECL Input Filter extension.
Back in October, Derick Rethans and Rasmus Lerdorf made their initial release of the PECL Input Filter extension. Since then, I’ve taken some time to play around with it, hack around it, and report a few of the bugs I’ve found, which have since been corrected in HEAD. I’m proud to say that they even used some of my patches. Nevertheless, I’m going to continue to tinker around with this extension to see what else I can break because I think it will be a good tool for promoting best practices to PHP programmers, and the more it’s tested, the better it will be.
Now, on to Richard’s point about security tools being “a fundamental part of the PHP language.” The Input Filter extension right now is only just that: an extension. Yet, recently (15 Nov), I noticed that Jani Taskinen (a.k.a. “sniper”) checked in some revisions with the comment “Prepare for including in PHP core.” This got me thinking, so I asked Derick, and Derick confirmed that the Input Filter extension will be a part of the PHP core in versions 5.1.1 and 6.0. So, there’s one of your built-in security tools right there.
So, now, let’s take a look at some code. Let’s assume that we have a form. On that form are four fields: name, age, email, and list. These are fairly self-explanatory. With name, we expect a string; with age, a number; email, an e-mail address; and with list a value of either 1, 0, yes, or no to determine whether you want to be on the mailing list (it’s a radio button, and, for the sake of argument, let’s assume that the values are “yes” and “no,” but they could be 1, 0, true, false, on, or off; any of these will filter as a BOOLEAN value).
Our processing form might start out like this:
<?php
$clean = array();
$clean['name'] = input_get(INPUT_POST, 'name', FL_REGEXP, array('regexp' => '^[\w ]+$'));
$clean['age'] = input_get(INPUT_POST, 'age', FL_INT);
$clean['email'] = input_get(INPUT_POST, 'email', FL_EMAIL);
$clean['list'] = input_get(INPUT_POST, 'list', FL_BOOLEAN);
?>
The constants passed to the function determine the type of filtering, and if the input variable matches the filter, then it returns the raw and unchanged value. If it doesn’t match, then it returns NULL. So, at worst, $clean (in this implementation) will contain a NULL value.
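For comparison, the function and constant names above belong to the 2005 beta. In the filter extension as bundled with later PHP releases (5.2 and up), the same checks are spelled filter_input() or filter_var() with FILTER_VALIDATE_* constants. The sketch below assumes that later API and uses a hypothetical $input array in place of $_POST so that it runs outside a web request. Note one difference from the description above: on failure these functions return FALSE rather than NULL, unless FILTER_NULL_ON_FAILURE is passed, which matters most for the boolean filter, where FALSE is also a legitimate result.

```php
<?php
// Hypothetical stand-in for $_POST; in a request handler,
// filter_input(INPUT_POST, 'name', ...) would read the request directly.
$input = array(
    'name'  => 'Ben Ramsey',
    'age'   => '27',
    'email' => 'ben@example.org',
    'list'  => 'yes',
);

$clean = array();

// Whitelist the name with a regular expression (delimiters are required here).
$clean['name'] = filter_var(
    $input['name'],
    FILTER_VALIDATE_REGEXP,
    array('options' => array('regexp' => '/^[\w ]+$/'))
);

// Returns an integer on success, FALSE on failure.
$clean['age'] = filter_var($input['age'], FILTER_VALIDATE_INT);

// Returns the address unchanged on success, FALSE on failure.
$clean['email'] = filter_var($input['email'], FILTER_VALIDATE_EMAIL);

// TRUE for 1/true/on/yes, FALSE for 0/false/off/no, NULL for anything else.
$clean['list'] = filter_var(
    $input['list'],
    FILTER_VALIDATE_BOOLEAN,
    FILTER_NULL_ON_FAILURE
);

var_dump($clean);
```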
You may also filter script variables and even perform some sanitizing. The following example will strip the HTML tags from $name and store the value “Ben Ramsey” to $clean['name'].
<?php
$clean = array();
$name = '<b>Ben Ramsey</b>';
$clean['name'] = filter_data($name, FS_STRING);
?>
While I am not a big fan of sanitizing functions (I believe that programmers should use a whitelist approach and simply filter input for valid data and, on invalid data, require the user to enter valid data), I can definitely see the advantages of including these filtering functions in the core to promote best practices. It should be noted that it is just as easy to filter input without these built-in functions, but, perhaps, with the inclusion of these functions, it will encourage others to start properly filtering data.
Finally, I’d like to point out that the Input Filter extension is still in “beta” and should not be used in production environments. There are still some bugs and functionality to work out before it can be safe for production use.
UPDATE (19 Nov): Version 0.9.3, which includes several bug fixes, was released yesterday.
Tags: filter-input, pecl, php, security
Sun, 11 Sep 2005 0:49 UTC
From the introduction:
“You’ve heard a lot of buzz about security in PHP, lately, but you’re still confused about this ‘input filtering’ thing? Ben Ramsey lends a helping hand in part 2 of his mini-series on this technique.”
Tags: articles, filter-input, php, php-architect, security, tips-and-tricks
Fri, 19 Aug 2005 21:42 UTC
So, I decided to have a little bit of fun, and I was feeling creative one day. Thus, inspired by George Schlossnagle, who’s made his own PHP T-shirts, and a picture I took of Chris Shiflett during his talk at OSCON, I decided to create PHP’s very own security mantra T-shirt bearing none other than the likeness of Chris Shiflett. I hope I don’t owe him money for this . . . maybe just a beer, but no money, Chris.
You can get your very own at CafePress.
Tags: filter-input, humor, php, security
Thu, 21 Jul 2005 21:27 UTC
From the introduction:
“This year has seen an increased focus on PHP security, and this is good for the language, developers, and business community. One phrase that comes to mind when discussing secure coding practices is Chris Shiflett’s mantra of ‘filter input, escape output.’ While we know what this means in a general sense, practical examples elude us. Ben Ramsey provides part one of his input filtering series, chock full of code examples.”
Tags: articles, filter-input, php, php-architect, security, tips-and-tricks
Tue, 2 Mar 2004 23:20 UTC
When relying on your end-user to supply information in the proper format, you’re S.O.L. when it comes to doing anything with that data. It’s best to use some proper PHP tools to validate your data and make sure it’s in a format your application can read before passing it along to other places. Follow me as I take a look at validating user-entered data in this first installment of what I hope to be a continuing series.
Most of the applications I work with have a Web-based front-end where users submit data. Other applications, also Web-based, involve data that has been imported from some other source, such as a text file. In both situations, the data that the application gathers is unreliable at best. I simply cannot trust a person, or a text file of data that was entered by a person, to supply data in the format that I require. Having worked with text files compiled from data entered by customer service reps, I’ve come to find that no two reps enter data in the same way. In fact, most don’t enter data the same way twice!
Humans are unreliable. So, what is a programmer to do?
She must validate her data.
There are two places to validate data: the client-side and server-side. The client-side is a good choice since you’re able to check the data even as the user enters it, but you cannot rely on the user’s browser to work correctly. In fact, you cannot trust that there is a browser at all. Client-side validation may work for users who are following the proper means to enter data, but malicious hackers may send POST data through non-traditional means (not a browser). Thus, performing validation on the server-side adds an extra layer of security to your application. In short, don’t trust the user.
In this segment, I want to take a look at validating phone number data, and to validate phone numbers, we’ll be using regular expressions (more specifically, PCRE). There is another form of regular expression that PHP supports (POSIX extended), but I’m not going to touch on that at all. Consider POSIX extended regular expressions the weaker cousin of PCRE (I’ll likely be flamed for this). I feel that PCRE is very useful for many tasks, and you’ll see it turn up often when I talk about data validation.
The regular expression I’m using to validate phone numbers will take a US phone number and validate it according to almost any format in current use. This includes (555) 321-1234, 555-321-1234, 555 321-1234, 5553211234, 321-1234, and many more (even including formats with extensions). The expression looks like this:
/^[\(]?(\d{0,3})[\)]?[\s]?[\-]?(\d{3})[\s]?[\-]?(\d{4})[\s]?[x]?(\d*)$/
Wow! That’s Greek to me, says you. Indeed! Without turning this into a tutorial on the syntax of regular expressions, allow me to briefly examine each element of this expression.
/^ - asserts the beginning of data
[\(]? - 0 or 1 ('s
(\d{0,3}) - 0 to 3 numbers, the parentheses capture the numbers [area code]
[\)]? - 0 or 1 )'s
[\s]? - 0 or 1 white space
[\-]? - 0 or 1 dash
(\d{3}) - exactly 3 numbers, captured [exchange]
[\s]? - 0 or 1 white space
[\-]? - 0 or 1 dash
(\d{4}) - exactly 4 numbers, captured [number]
[\s]? - 0 or 1 white space
[x]? - 0 or 1 "x"
(\d*) - 0 or more numbers, captured [extension]
$/ - asserts the end of data
Now that we have an expression that will work some magic and validate a phone number, let’s see how to put it into action.
Before we do that, though, let me digress and explain what’s going to happen in this validation process. We don’t just want to validate user input to ensure a user has entered a phone number in a proper format. If we wanted to do that, we could create three (or four) form fields (one for each element) and restrict the length of those fields. Then, on the server-side, we’d piece them together. The following method may still be used for that approach, but it is a powerful tool intended to validate phone numbers from sources that are well beyond control.
Take a text file of customer data from an electric utility, for example. I have no control over the phone number formats in the file, but I want to read the phone numbers, validate them against my regular expression, and break them up into their individual components (area code, exchange, number, and extension). Once that’s done, I can put them back together in a unified format for entry into a database.
Here’s how to use the regular expression to do just that:
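<?php
$phone_number = '(555) 321-1234';
$pattern =
    '/^[\(]?(\d{0,3})[\)]?[\s]?[\-]?(\d{3})[\s]?[\-]?(\d{4})[\s]?[x]?(\d*)$/';

// The pattern is anchored at both ends, so the whole string must fit it
if (preg_match($pattern, $phone_number, $matches))
{
    $phone_number = $matches[0]; // the full matched string
    $area_code    = $matches[1]; // empty string if no area code was given
    $exchange     = $matches[2];
    $number       = $matches[3];
    $extension    = $matches[4]; // empty string if no extension was given
}
?>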
Since $phone_number matches the pattern, preg_match() returns 1, which PHP treats as true in the if condition. On a match, it places the full matched string in $matches[0] and the text captured by each pair of parentheses in $matches[1] onward. The code listing shows what each element of the array stores. In the example, we would have the following result:
$phone_number = "(555) 321-1234"
$area_code = "555"
$exchange = "321"
$number = "1234"
$extension = empty
If $phone_number is “321-1234” without the area code, then $area_code will be blank. If the number is “(555) 321-1234x4321,” then $extension will be “4321.” Go ahead and try the code with any US formatted number you can think of. I’m fairly certain it will work. If not, let me know; I’m always willing to revise my code.
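One quick way to try it is to run the pattern over a list of samples. The sketch below uses the formats named earlier in the post plus one deliberately invalid string of my own; the variable names $samples and $results are just illustrative.

```php
<?php
$pattern =
    '/^[\(]?(\d{0,3})[\)]?[\s]?[\-]?(\d{3})[\s]?[\-]?(\d{4})[\s]?[x]?(\d*)$/';

$samples = array(
    '(555) 321-1234',
    '555-321-1234',
    '555 321-1234',
    '5553211234',
    '321-1234',
    '(555) 321-1234x4321',
    '555-CALL-NOW', // hypothetical invalid input, added for contrast
);

$results = array();
foreach ($samples as $sample) {
    // preg_match() returns 1 on a match and 0 otherwise.
    // array_slice() drops $m[0] (the full match) and keeps the four captures.
    $results[$sample] = preg_match($pattern, $sample, $m) === 1
        ? array_slice($m, 1)
        : null;
}

foreach ($results as $sample => $parts) {
    echo str_pad($sample, 22),
         $parts === null ? 'no match' : implode(' | ', $parts),
         "\n";
}
```

Each matching line prints the area code, exchange, number, and extension in that order, with empty strings where a part was absent.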
So, the pattern matches certain elements in the phone number and separates them. You can now take each of these elements and format the number as you wish so that all the numbers in the database will follow the same pattern. Likewise, you could have a free form field—rather than three separate fields—on a Web page for users to enter a phone number. This code could validate that number and parse it for your application to use.
Validating information is an often overlooked, but essential, part of Web development. It is crucial to the integrity of data and to the security of an application to validate any input. I believe that most developers skip validation out of laziness, but, as you can see, validating a phone number only adds a few extra lines of code to your application. Wrap this up in a reusable function, and you have something in your PHP toolkit that can be used over and over again with ease.
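As one possible shape for that reusable function, here is a minimal sketch. The name normalize_us_phone() and the output format “(555) 321-1234 x4321” are my own illustrative choices, not a standard; the function returns NULL when the input does not match so the caller can ask the user to re-enter the number.

```php
<?php
// Hypothetical helper: validates a US phone number with the pattern from
// this post and returns it in one unified format, or NULL if it is invalid.
function normalize_us_phone($raw)
{
    $pattern =
        '/^[\(]?(\d{0,3})[\)]?[\s]?[\-]?(\d{3})[\s]?[\-]?(\d{4})[\s]?[x]?(\d*)$/';

    if (!is_string($raw) || preg_match($pattern, trim($raw), $m) !== 1) {
        return null;
    }

    // Skip $m[0] (the full match) and name the four captured parts.
    list(, $area, $exchange, $number, $extension) = $m;

    $formatted = ($area !== '' ? "($area) " : '') . "$exchange-$number";
    if ($extension !== '') {
        $formatted .= " x$extension";
    }

    return $formatted;
}

echo normalize_us_phone('5553211234'), "\n";          // (555) 321-1234
echo normalize_us_phone('555-321-1234x4321'), "\n";   // (555) 321-1234 x4321
var_dump(normalize_us_phone('not a number'));         // NULL
```

Because the pattern allows an area code of zero to three digits, a one- or two-digit area code also passes; a stricter caller may want to reject those after the match.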
Related Web sites:
Tags: filter-input, pcre, phone-numbers, php, regular-expressions