All web developers should be familiar with the GET and
POST methods. These are the primary methods used in everyday development on the Web. Even if you know nothing about HTTP, you’ve at least seen
form examples using either get or
post as the value of the
method attribute. All too often, though, I find that those who build web applications know far too little about the protocol that powers the Web: HTTP. I think all web developers should have at least a rudimentary understanding of the technology that earns their bacon.
Checking Out What’s Under the Hood
When you plug a URL into your browser’s address bar, what it really does is make an HTTP request, and often a series of them. So, let’s forget the browser for just a moment. Open up a command prompt and launch a telnet session to connect to
phpadvent.org. The command looks something like this:
$ telnet phpadvent.org 80
Trying 22.214.171.124...
Connected to phpadvent.org.
Escape character is '^]'.
This should leave your cursor sitting there, awaiting input. So, type the following:
GET / HTTP/1.1
Host: phpadvent.org
When you reach the end of that last line, press Return twice; the blank line tells the server that the request headers are complete. You should see a response that comes back looking like this:
HTTP/1.1 302 Found
Date: Thu, 18 Dec 2008 04:23:29 GMT
Server: Apache/2.2.6 (Unix)
X-Powered-By: PHP/5.2.5
Location: http://phpadvent.org/2008
Content-Length: 0
Connection: close
Content-Type: text/html
Without going into too much detail about this response, I want to point out the status code and the
Location header. The status code in this case is
302 Found, which means that the resource requested—in this example, it’s
/—exists temporarily at another location, specified by the
Location header. The browser knows what to do with this, so it makes a second request for
http://phpadvent.org/2008. Through telnet, we can make the raw request like this:
GET /2008 HTTP/1.1
Host: phpadvent.org
What we get back is a
200 OK response with a full HTML body.
This is standard redirection. I make a
GET request, and the server tells me to request from a different location. The server itself does not perform the redirection. That is up to the client (browser).
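To make the client's role concrete, here is a minimal Python sketch of what a browser does with a 302 response like the one shown earlier: read the status code, find the Location header, and build the follow-up request. The function names parse_response_head and follow_up_request are illustrative assumptions, not part of any library, and the sketch ignores details a real client has to handle (relative locations, redirect limits, other 3xx codes).

```python
from urllib.parse import urlsplit


def parse_response_head(raw):
    """Split a raw HTTP response head into (status_code, reason, headers)."""
    lines = raw.strip().splitlines()
    # The status line looks like: HTTP/1.1 302 Found
    _version, code, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        if not line.strip():
            break  # a blank line ends the headers
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers


def follow_up_request(raw):
    """Return the raw GET request a client would send next, or None."""
    code, _reason, headers = parse_response_head(raw)
    if code != 302 or "location" not in headers:
        return None
    target = urlsplit(headers["location"])
    path = target.path or "/"
    return f"GET {path} HTTP/1.1\r\nHost: {target.netloc}\r\n\r\n"


# A shortened copy of the 302 response from the telnet session.
RESPONSE = (
    "HTTP/1.1 302 Found\r\n"
    "Date: Thu, 18 Dec 2008 04:23:29 GMT\r\n"
    "Location: http://phpadvent.org/2008\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)

print(follow_up_request(RESPONSE))
```

The server only supplies the new location; deciding to send the second request is entirely the client's job, which is why the sketch lives on the client side.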
I won’t spend too much time talking about what GET and
POST mean. You already know these methods. The
GET method is used for retrieval, while the
POST method is used to indicate a resource on the server that should take care of processing some data that we’re sending to it.
What’s important to note here is that the HTTP specification clearly states that
GET “SHOULD NOT have the significance of taking an action other than retrieval.” (Steps up on the soap box.) Web developers violate this every time we create a link on a page that a user clicks on to rate something, increase a counter, purchase a book, etc. The fact of the matter is this: if it uses a
GET request to take any action other than retrieval, then it’s wrong.
“But why is it wrong?” you ask.
It’s not wrong because someone sat in their ivory tower and mandated that it is so. The HTTP designers designed
GET as a “safe” method, allowing browsers to represent
POST in a special way to make the user aware that they are requesting a potentially unsafe action. This does not mean that
GET cannot have side effects on the server, but it does mean that the user may not be aware that the request for those side effects was made and cannot be held responsible for it.
If web developers forced the use of
POST for these kinds of actions rather than using
GET requests, then the browser could at least notify the user that some action is about to be made, and they could confirm or cancel the request.
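One way a server can hold up its end of this bargain is to refuse state-changing actions that arrive over GET. The sketch below is only an illustration: the /article and /rate paths, the ALLOWED_METHODS table, and the dispatch function are all hypothetical, and answering with 405 Method Not Allowed plus an Allow header is my choice here, not something this article prescribes.

```python
# Hypothetical routing table: which methods each path accepts.
# "/rate" changes state on the server, so it only accepts POST.
ALLOWED_METHODS = {
    "/article": {"GET"},
    "/rate": {"POST"},
}


def dispatch(method, path):
    """Return (status_line, headers) for a request, without running any action."""
    allowed = ALLOWED_METHODS.get(path)
    if allowed is None:
        return "404 Not Found", {}
    if method not in allowed:
        # Tell the client which methods this resource does accept.
        return "405 Method Not Allowed", {"Allow": ", ".join(sorted(allowed))}
    return "200 OK", {}


print(dispatch("GET", "/rate"))   # a link or prefetch cannot trigger a rating
print(dispatch("POST", "/rate"))  # a form submission can
```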
Idempotence: Not a Sexual Dysfunction
This leads to the concept of idempotence. Pronounce it at your own risk.
HTTP methods that are said to be idempotent are so termed because “(aside from error or expiration issues) the side effects of
N > 0 identical requests is the same as for a single request.” In layman’s terms, what this means is: when I make a request ten times, the side effects of that request are exactly the same on the tenth time as they are on the first time.
GET is considered idempotent, as are HEAD, PUT, and
DELETE, which I won’t be discussing in this post.
POST is not idempotent.
Because GET is considered “safe” and for retrieval only, it is by nature idempotent. Every time I request a resource with
GET, it will always retrieve that resource with no side effects. We break this property of
GET when we attempt to make
GET do more than simple retrieval.
Consider the URL
http://example.org/count. Making a request to this URL should increment a counter and return the new value. The first time I request this URL, the value returned might be “1,” but the tenth time I make this request, the value would be “10.” The property of idempotence dictates that the value returned on the tenth request should be the same as that returned on the first request, provided the request is identical. Therefore, if I use a
GET request to retrieve the value stored at
http://example.org/count, it should always return the same value, provided no one makes a
POST request in the meantime to increment the counter and change its state, but that’s what
POST is for.
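The counter example can be sketched in a few lines of Python. This is a toy model rather than a real web application: the Counter class and the state_after helper are hypothetical names, and the get and post methods simply stand in for GET and POST requests to http://example.org/count.

```python
class Counter:
    """A toy stand-in for the resource at /count (names are illustrative)."""

    def __init__(self):
        self.value = 0

    def get(self):
        # Retrieval only: reading the value never changes it.
        return self.value

    def post(self):
        # Processing: each call changes the state of the resource.
        self.value += 1
        return self.value


def state_after(method_name, times):
    """Apply the same request `times` times to a fresh counter; return its state."""
    counter = Counter()
    for _ in range(times):
        getattr(counter, method_name)()
    return counter.value


# Idempotent: ten identical GETs leave the same state as one GET.
print(state_after("get", 1) == state_after("get", 10))
# Not idempotent: ten identical POSTs leave a different state than one POST.
print(state_after("post", 1) == state_after("post", 10))
```

Comparing the state after one request with the state after ten is a direct reading of the “N > 0 identical requests” definition quoted earlier.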
Practice Safe Web
Following the HTTP specification precisely and respecting the “safe” and idempotent nature of
GET, while using
POST to manipulate data, will make your web applications safer. Please don’t misunderstand, though: it’s not a security measure that protects your sites against attacks. What your application gains in safety is the browser’s ability to notify the user that potentially unsafe actions are about to occur, while limiting an attacker’s ability to manipulate data through GET requests.
Furthermore, you’ll feel better about yourself because you’re doing the Right Thing™ by following the standard.