Making It Valid: Telephone Numbers

When you rely on your end users to supply information in the proper format, you’re S.O.L. when it comes to doing anything with that data. It’s best to use proper PHP tools to validate your data and make sure it’s in a format your application can read before passing it along to other places. Follow me as I take a look at validating user-entered data in this first installment of what I hope will be a continuing series.

Most of the applications I work with have a Web-based front-end where users submit data. Other applications, also Web-based, involve data that has been imported from some other source, such as a text file. In both situations, the data that the application gathers is unreliable at best. I simply cannot trust a person, or a text file of data that was entered by a person, to supply data in the format that I require. Having worked with text files compiled from data entered by customer service reps, I’ve come to find that no two customer service reps enter data in the same way. In fact, most don’t enter data the same way twice!

Humans are unreliable. So, what is a programmer to do?

She must validate her data.

There are two places to validate data: on the client side and on the server side. The client side is a good choice since you’re able to check the data even as the user enters it, but you cannot rely on the user’s browser to work correctly. In fact, you cannot trust that there is a browser at all. Client-side validation may work for users who follow the proper means to enter data, but malicious users may send POST data through non-traditional means (not a browser). Thus, performing validation on the server side adds an extra layer of security to your application. In short, don’t trust the user.

In this segment, I want to take a look at validating phone number data, and to validate phone numbers, we’ll be using regular expressions or, more specifically, PCRE (the preg functions). PHP used to support another form of regular expression, POSIX extended (the ereg functions), but those functions were deprecated in PHP 5.3 and removed in PHP 7, so I’m not going to touch on them at all. Consider POSIX extended regular expressions the weaker cousin of PCRE (I’ll likely be flamed for this). I feel that PCRE is very useful for many tasks, and you’ll see it turn up often when I talk about data validation.

The regular expression I’m using to validate phone numbers will take a US phone number and validate it according to almost any format in current use. This includes (555) 321-1234, 555-321-1234, 555 321-1234, 5553211234, 321-1234, and many more (even including formats with extensions). The expression looks like this:

/^[\(]?(\d{0,3})[\)]?[\s]?[\-]?(\d{3})[\s]?[\-]?(\d{4})[\s]?[x]?(\d*)$/

Wow! That’s Greek to me, says you. Indeed! Without turning this into a tutorial on the syntax of regular expressions, allow me to briefly examine each element of this expression.

/^ - asserts the beginning of the data
[\(]? - 0 or 1 opening parenthesis
(\d{0,3}) - 0 to 3 digits, captured by the parentheses [area code]
[\)]? - 0 or 1 closing parenthesis
[\s]? - 0 or 1 whitespace character
[\-]? - 0 or 1 dash
(\d{3}) - exactly 3 digits, captured [exchange]
[\s]? - 0 or 1 whitespace character
[\-]? - 0 or 1 dash
(\d{4}) - exactly 4 digits, captured [number]
[\s]? - 0 or 1 whitespace character
[x]? - 0 or 1 "x"
(\d*) - 0 or more digits, captured [extension]
$/ - asserts the end of the data
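If you would rather keep those notes inside the code, PCRE’s x (extended) modifier lets a pattern span several lines and carry its own comments. The following is a sketch of the same expression written that way; the layout and comment wording are mine, and the variable names are only for illustration.

```php
<?php
// Same pattern as above, rewritten with the x (extended) modifier so that
// whitespace and #-comments inside the pattern are ignored by PCRE.
$pattern = '/^
    [\(]?      # optional opening parenthesis
    (\d{0,3})  # area code: 0 to 3 digits (captured)
    [\)]?      # optional closing parenthesis
    [\s]?      # optional whitespace character
    [\-]?      # optional dash
    (\d{3})    # exchange: exactly 3 digits (captured)
    [\s]?      # optional whitespace character
    [\-]?      # optional dash
    (\d{4})    # number: exactly 4 digits (captured)
    [\s]?      # optional whitespace character
    [x]?       # optional "x" before an extension
    (\d*)      # extension: 0 or more digits (captured)
$/x';

// Quick check that the commented pattern still captures the four parts.
if (preg_match($pattern, '(555) 321-1234 x4321', $m) === 1) {
    printf("area=%s exchange=%s number=%s ext=%s\n", $m[1], $m[2], $m[3], $m[4]);
}
```

One caveat with this style: the delimiter (here a forward slash) may not appear unescaped anywhere in the pattern, comments included.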

Now that we have an expression that will work some magic and validate a phone number, let’s see how to put it into action.

Before we do that, though, let me digress and explain what’s going to happen in this validation process. We don’t just want to validate user input to ensure a user has entered a phone number in a proper format. If that were all we wanted, we could create three (or four) form fields (one for each element) and restrict the length of those fields. Then, on the server side, we’d piece them together. The following method may still be used for that approach, but it is a more powerful tool, intended to validate phone numbers from sources that are well beyond your control.

Take a text file of customer data from an electric utility, for example. I have no control over the phone number formats in the file, but I want to read the phone numbers, validate them against my regular expression, and break them up into their individual components (area code, exchange, number, and extension). Once that’s done, I can put them back together in a unified format for entry into a database.

Here’s how to use the regular expression to do just that:

<?php
$phone_number = '(555) 321-1234';
$pattern = '/^[\(]?(\d{0,3})[\)]?[\s]?[\-]?(\d{3})[\s]?[\-]?(\d{4})[\s]?[x]?(\d*)$/';

if (preg_match($pattern, $phone_number, $matches))
{
    // we have a match; preg_match() has filled $matches with the sub-patterns
    $phone_number = $matches[0]; // full matched text (the original number)
    $area_code    = $matches[1]; // area code (empty if none was given)
    $exchange     = $matches[2]; // 3-digit exchange
    $number       = $matches[3]; // 4-digit number
    $extension    = $matches[4]; // extension (empty if none was given)
}
?>

Since $phone_number matches the pattern, preg_match() returns 1 (it returns 0 when there is no match and false on error). On a match, it places the full matched text in $matches[0] and the text captured by each pair of parentheses in the elements that follow, so $matches is an array of the sub-pattern matches. The code listing shows what each element of the array stores. In the example, we would have the following result:

$phone_number = "(555) 321-1234"
$area_code = "555"
$exchange = "321"
$number = "1234"
$extension = "" (empty string)

If $phone_number is “321-1234” without the area code, then $area_code will be blank. If the number is “(555) 321-1234 x4321,” then $extension will be “4321.” Go ahead and try the code with any US formatted number you can think of. I’m fairly certain it will work. If not, let me know; I’m always willing to revise my code.
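A quick way to try a batch of numbers is to loop over them and print whether each one matches. The sample list below is my own, chosen for illustration. It includes a few inputs the pattern, as written, does not accept (dots as separators, a leading 1, and "ext." in place of "x"), as well as one it accepts even though the area code is too short, because the area-code group allows anywhere from 0 to 3 digits.

```php
<?php
$pattern = '/^[\(]?(\d{0,3})[\)]?[\s]?[\-]?(\d{3})[\s]?[\-]?(\d{4})[\s]?[x]?(\d*)$/';

// Illustrative inputs only; none of these come from real data.
$samples = [
    '(555) 321-1234',
    '555-321-1234',
    '555 321-1234',
    '5553211234',
    '321-1234',
    '(555) 321-1234 x4321',
    '55-321-1234',            // 2-digit area code: accepted by \d{0,3}
    '555.321.1234',           // dots are not handled
    '1-555-321-1234',         // a leading 1 is not handled
    '555-321-1234 ext. 4321', // only a bare "x" introduces an extension
];

$results = [];
foreach ($samples as $sample) {
    // preg_match() returns 1 on a match and 0 otherwise.
    $results[$sample] = preg_match($pattern, $sample) === 1;
    printf("%-24s %s\n", $sample, $results[$sample] ? 'match' : 'no match');
}
```

If your data contains the rejected formats, the pattern would need extra optional pieces for them, and the area-code group could be tightened if partial area codes should be refused.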

So, the pattern matches certain elements in the phone number and separates them. You can now take each of these elements and format the number as you wish so that all the numbers in the database will follow the same pattern. Likewise, you could have a single free-form field, rather than three separate fields, on a Web page for users to enter a phone number. This code could validate that number and parse it for your application to use.
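One way to put the captured pieces back together in a single format is a small helper function. This is a minimal sketch, not a definitive implementation: the function name normalize_us_phone(), the trim() call, and the output format (555-321-1234 x4321) are all my assumptions, so swap in whatever format your database expects.

```php
<?php
/**
 * Hypothetical helper: validate a US phone number with the pattern from
 * this article and return it in one format, or null if it does not match.
 * The output format is an arbitrary choice made for this sketch.
 */
function normalize_us_phone(string $raw): ?string
{
    $pattern = '/^[\(]?(\d{0,3})[\)]?[\s]?[\-]?(\d{3})[\s]?[\-]?(\d{4})[\s]?[x]?(\d*)$/';

    // trim() is an addition here, to forgive stray spaces around the input.
    if (preg_match($pattern, trim($raw), $matches) !== 1) {
        return null; // not a phone number the pattern recognizes
    }

    // Skip $matches[0] (the full match) and name the four captured parts.
    list(, $area_code, $exchange, $number, $extension) = $matches;

    $formatted = $exchange . '-' . $number;
    if ($area_code !== '') {
        $formatted = $area_code . '-' . $formatted;
    }
    if ($extension !== '') {
        $formatted .= ' x' . $extension;
    }
    return $formatted;
}

echo normalize_us_phone('(555) 321-1234 x4321'), "\n"; // 555-321-1234 x4321
echo normalize_us_phone('5553211234'), "\n";           // 555-321-1234
var_dump(normalize_us_phone('not a number'));          // NULL
```

Returning null for anything that fails the pattern lets the calling code decide whether to reject the record, log it, or ask the user to try again.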

Validating information is an often overlooked, but essential, part of Web development. Validating any input is crucial to the integrity of your data and to the security of your application. I believe that most developers skip validation out of laziness, but, as you can see, validating a phone number only adds a few extra lines of code to your application. Wrap this up in a reusable function, and you have something in your PHP toolkit that can be used over and over again with ease.
