Portable Document Markup Language (PDML) is a language used for creating PDF documents. Best of all, it’s implemented entirely in PHP, and it’s extremely simple to use. All a user must do is create a document with markup similar to HTML, include one line of PHP at the top of the file, and then the file will magically render a PDF document when called from a Web browser.
PDML is a remarkably lightweight package. It only requires that the user describe a PDF in a simple markup language. After glancing at PDML, one finds that other PDF-creation packages written in PHP seem to introduce needless complexity into the process of creating a PDF on the fly. For example, Listing 1 shows a very simple “Hello, World” document using PDML. So, what makes it work?
The magic behind PDML: output buffering.
What is Output Buffering?
Normally, when output is echoed or printed, it is sent immediately to PHP’s output buffer. It cannot be retrieved or changed once this occurs, and all document headers must be set before echoing or printing output. This is not the case when using output buffering.
Output buffering, put simply, is the process of delaying the transmission of output to the client. During this delay, the script may access or modify the contents of the buffer before it is sent. What’s more, the script can send the buffer all at once or in chunks, which I’ll explain later on.
In the PDML example, the markup is never sent to the client. Instead, PDML uses ob_start() to start buffering output, passing ob_start() a callback function: the custom function ob_pdml(). Now, when the output is flushed to the client (in this case, when the script finishes processing), it first passes through ob_pdml(). What comes out is a PDF document.
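PDML’s own ob_pdml() is not reproduced here, but the mechanism can be sketched with a much smaller callback. The function name render_toy_markup() and its [b]...[/b] / [i]...[/i] syntax are invented for illustration; the point is only that PHP hands the callback the buffered string and sends whatever the callback returns.

```php
<?php
// Hypothetical callback: converts a tiny made-up markup into HTML.
// PHP passes the buffered output as the first argument and sends the
// callback's return value to the client in its place.
function render_toy_markup(string $buffer): string
{
    return strtr($buffer, [
        '[b]'  => '<strong>',
        '[/b]' => '</strong>',
        '[i]'  => '<em>',
        '[/i]' => '</em>',
    ]);
}

ob_start('render_toy_markup');

echo "[b]Hello[/b], [i]World[/i]!\n";

// No explicit flush is needed: when the script ends, PHP flushes the
// buffer, which runs it through render_toy_markup() first.
```

A real markup language would need a proper parser rather than string replacement, but the plumbing through ob_start() is the same.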
I hope it is evident how this technique can be useful for any number of applications.
As mentioned, to start buffering content, one must place a call to ob_start(). Any output echoed or printed before the ob_start() call will not be stored in the internal buffer. That is, it has already been passed along for delivery to the client, even though the actual transmission may still be delayed until the script has finished running (or PHP’s output buffer is full). All output after the ob_start() call will be in the script’s local output buffer.
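A minimal sketch of that boundary: only what is echoed after ob_start() ends up in the buffer. It uses ob_get_clean(), a standard function that returns the buffer contents and then discards the buffer, equivalent to ob_get_contents() followed by ob_end_clean().

```php
<?php
// Output produced before ob_start() is not captured by the new buffer.
echo "sent before buffering\n";

ob_start();                 // everything echoed from here on is buffered
echo "captured by the buffer";
$captured = ob_get_clean(); // read the buffer, then discard it

// Only the text echoed after ob_start() was stored.
var_dump($captured);
```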
Aside from starting the output buffer, ob_start() also accepts a callback function as a parameter, as mentioned earlier. With a custom callback function, output buffering can be used to create one’s own markup language (as is the case with PDML), perform customized content rewriting before sending the output to a client (e.g., URL rewriting, output escaping), or implement a custom templating engine.
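As a sketch of the templating case, assuming hypothetical {{placeholder}} syntax and hard-coded values: any PHP callable can serve as the callback, including a closure that carries its own data. This one escapes each value for HTML and substitutes it into everything the script echoes.

```php
<?php
// Hypothetical template values. The & shows that each value is escaped.
$values = [
    '{{title}}' => 'Output Buffering & Callbacks',
    '{{user}}'  => 'Alice',
];

// A closure used as the output callback: it HTML-escapes each value and
// replaces the matching placeholder in the buffered output.
$renderTemplate = function (string $buffer) use ($values): string {
    $escaped = array_map(
        fn (string $value): string => htmlspecialchars($value, ENT_QUOTES, 'UTF-8'),
        $values
    );
    return strtr($buffer, $escaped);
};

ob_start($renderTemplate);

echo '<h1>{{title}}</h1>', PHP_EOL;
echo '<p>Welcome back, {{user}}.</p>', PHP_EOL;

// When the script ends, the buffer is flushed through $renderTemplate.
```

Because the substitution happens in one place at flush time, the code that produces the page does not need to know the template values at all.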
Sending Compressed Content
PHP comes with a built-in output buffering callback function, ob_gzhandler(), that can be used along with ob_start() to send gzip-compressed data to browsers. It even checks whether the browser accepts compressed content and decides accordingly whether to send the data compressed or uncompressed.
For example, my browser (Mozilla Firefox) sends an Accept-Encoding header with most requests, the value of which is “gzip,deflate”. This tells the Web server that it can compress content before sending it to the browser, which saves bandwidth and cuts down load times. Placing the following at the top of a script will make PHP handle the compression, which can be helpful, especially if your Web server doesn’t compress responses:
<?php ob_start('ob_gzhandler'); ?>
Now, the response will include a Content-Encoding header with a value of “gzip”. Please note, however, that this works only for browsers that request (and can read) compressed content. All other browsers will receive uncompressed content.
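One hedged refinement of the one-liner above: ob_gzhandler() requires PHP’s zlib extension, and the PHP manual notes that it cannot be combined with the zlib.output_compression setting. The sketch below installs it only when both conditions allow and otherwise falls back to a plain buffer, so the rest of the script can rely on buffering either way.

```php
<?php
// Use ob_gzhandler only if zlib is available and PHP is not already
// compressing output via zlib.output_compression (the two cannot be combined).
$canUseGzHandler = extension_loaded('zlib')
    && !ini_get('zlib.output_compression');

// ob_start() returns false if the handler could not be installed;
// in that case, fall back to an ordinary, uncompressed buffer.
if (!$canUseGzHandler || !ob_start('ob_gzhandler')) {
    ob_start();
}

// ob_gzhandler still decides per request, based on Accept-Encoding,
// whether this output is actually compressed.
echo 'This response may be gzip-compressed.';
```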
Accessing the Buffer
All data stored in the buffer may be easily accessed, provided the buffer has not yet been flushed. To get the contents of the buffer at any given time, simply use ob_get_contents() or ob_get_flush(). Both of these functions return a string representing all current output in the buffer. However, ob_get_flush() returns the buffer string, flushes the buffer, and turns off output buffering, while ob_get_contents() leaves the buffer unchanged.
Take, for example, the following:
<?php
ob_start();
echo 'Hello, World!';
$output = ob_get_contents();
ob_end_clean();
?>
This code, when run, will not output anything. Because I turned on output buffering with ob_start() and then cleared the buffer and turned off buffering with ob_end_clean(), the echo never sends anything to the client. Instead, the variable $output contains the value “Hello, World!” I simply captured the contents of the buffer with ob_get_contents().
Had I used ob_get_flush() instead, the contents of the buffer would also have been sent to the client (and the ob_end_clean() call would no longer be needed, since ob_get_flush() also turns off buffering). While “Hello, World!” would have been displayed in the client output, the script would still have had a chance to act on all of the data stored in $output, which, in this case, is only “Hello, World!”
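A short sketch contrasting the two functions on the same buffer: ob_get_contents() only copies, while ob_get_flush() copies, sends, and ends buffering.

```php
<?php
ob_start();
echo 'Hello, World!';

// ob_get_contents() copies the buffer but leaves it, and buffering, intact.
$copy = ob_get_contents();

// ob_get_flush() returns the same string, sends it on toward the client,
// and turns output buffering off.
$flushed = ob_get_flush();

// Both variables hold the same text, and it has now also been output.
var_dump($copy === $flushed);
```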
Using this technique, it is possible to control all output from an application, running it through any number of functions and processing routines. At the top of the script, call ob_start(); at the bottom, get the contents with ob_get_contents() and then clear and close the buffer with ob_end_clean(). Now, we can modify everything the script intended to output before sending it.
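The top-and-bottom pattern just described might look like the following sketch. The collapse_whitespace() routine is a hypothetical stand-in for whatever processing an application actually needs.

```php
<?php
// Hypothetical post-processing routine: collapses runs of spaces and tabs.
function collapse_whitespace(string $html): string
{
    return preg_replace('/[ \t]+/', ' ', $html) ?? $html;
}

// Top of the script: start capturing everything.
ob_start();

// ... the rest of the script produces its output as usual ...
echo "<p>Some   page   content</p>\n";
echo "<p>More content</p>\n";

// Bottom of the script: take the captured output, then clear and close the buffer.
$html = ob_get_contents();
ob_end_clean();

// Modify the output however you like, then send it.
echo collapse_whitespace($html);
```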
It is also important to note that, when using this technique, document headers may be set at any time until the buffer is flushed, which, depending on the methods used, may not be until the very end of the script. In the example above, it is possible to place a call to header() after the echo. The headers themselves, however, are not stored in the output buffer. PHP keeps them separately and sends them just before the first piece of output actually leaves PHP; until then, a header can still be replaced by calling header() again or removed with header_remove(), but once it has been sent, it can no longer be changed. As in all cases, headers must be sent before any output. With output buffering, though, output is being delayed. This is why it is possible to set headers after output has been echoed or printed.
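A minimal sketch of setting headers after an echo while buffering is active. X-Example is a made-up header name used only to show that an unsent header can still be replaced or removed.

```php
<?php
ob_start();

echo 'Hello, World!';

// The echo above is still in the buffer, so no output has left PHP yet
// and headers can be set without a "headers already sent" warning.
header('Content-Type: text/plain; charset=UTF-8');

// A header that has not been sent yet can still be changed.
header('X-Example: first');
header('X-Example: second'); // replaces the previous X-Example header
header_remove('X-Example');  // or removes it altogether

ob_end_flush(); // headers go out first, followed by the buffered body
```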
Sending Chunked Responses
A chunked response is one that is broken up into smaller pieces and sent separately rather than all at once. In a typical process, all output is sent to PHP’s output buffer, which usually waits to send the data to the client until the script finishes. Then, when it is sent to the client, it includes a Content-Length header specifying the exact length of the content.
Sometimes, however, it is necessary to send data to the client before the script finishes. This is especially the case when processing large amounts of data could lead to very long page load times. Output buffering can solve this problem by providing the means to immediately flush the contents of the buffer to the Web server itself, encouraging it to send the contents immediately. I say that it “encourages” the Web server because the Web server may not always do this, as is the case when Apache is using mod_gzip or when using certain Web servers on the Microsoft Windows platform.
Nevertheless, when using a standard Apache installation without mod_gzip, the ability to send chunked responses can greatly improve usability and decrease load times. Listing 2 shows an example that might be used in a real-world scenario.
For the sake of argument, let’s say the fictional table “foo” contains 20,000 records. Iterating over these records may take some time. Meanwhile, without a chunked response, the user waits on this data with no real feedback that the request is being processed. However, the example in Listing 2 uses output buffering to send a chunked response.
According to the PHP manual, flush() flushes “the output buffers of PHP and whatever backend PHP is using (CGI, a web server, etc).” Thus, it “effectively tries to push all the output so far to the user’s browser.” As mentioned earlier, however, this is not always the case.
So, in Listing 2, the buffer is being explicitly flushed to the client after every 100 records. Thus, the user receives some feedback that the request is being processed and can begin viewing records while the remainder of the script continues to process and send more data to the client.
Note that the response now contains a Transfer-Encoding header with a value of “chunked” in lieu of the Content-Length header.
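Listing 2 is not reproduced here, but the same idea can be sketched under a few assumptions: the database query is replaced by a hypothetical fake_rows() generator, and the helper names send_in_chunks() and push_output() are invented for illustration. Output is pushed toward the client after every 100 rows, plus once more for any remainder. ob_flush() is called only when a user-level buffer exists, since calling it with no active buffer raises a notice.

```php
<?php
// Hypothetical helper: empty PHP's user-level buffer (if any), then ask
// PHP and the Web server to send what they have. The server may still hold it.
function push_output(): void
{
    if (ob_get_level() > 0) {
        ob_flush();
    }
    flush();
}

// Echo each row and push output every $chunkSize rows.
// Returns the number of times output was pushed.
function send_in_chunks(iterable $rows, int $chunkSize = 100): int
{
    $pushes = 0;
    $pending = 0;

    foreach ($rows as $row) {
        echo htmlspecialchars($row, ENT_QUOTES, 'UTF-8'), "<br>\n";

        if (++$pending === $chunkSize) {
            push_output();
            $pushes++;
            $pending = 0;
        }
    }

    if ($pending > 0) { // send whatever is left after the last full chunk
        push_output();
        $pushes++;
    }

    return $pushes;
}

// Hypothetical data source standing in for the records of table "foo".
function fake_rows(int $count): Generator
{
    for ($i = 1; $i <= $count; $i++) {
        yield "Record #$i";
    }
}

send_in_chunks(fake_rows(250));
```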
Not to be confused with Apache’s mod_rewrite, PHP’s output buffering functionality allows users to “rewrite” URLs by dynamically appending query string values to URLs and adding hidden form fields to output. This works in much the same way as the session ID does when session.use_trans_sid is enabled.
For example, consider the following HTML:
<a href="foo.php">Link</a>
<form action="bar.php" method="POST">
    <input type="text" name="baz" />
</form>
Now, consider that a persistent variable of some sort (perhaps an authentication token) needs to exist throughout the script in all links and forms. Simply add the following at the top of the script (or above the content where the variable should be appended):
<?php output_add_rewrite_var('token', 'abc123'); ?>
Now, the link and form will be rewritten as such:
<a href="foo.php?token=abc123">Link</a>
<form action="bar.php" method="POST">
    <input type="hidden" name="token" value="abc123" />
    <input type="text" name="baz" />
</form>
To clear the variable(s) that you set with
output_add_rewrite_var(), ensuring that they are not appended in later parts of the script, use
The behavior of this functionality is controlled by
Content Length and Fin
Finally, it is possible to get the length of the content in the buffer with ob_get_length() for times when it is necessary to explicitly set the Content-Length header at the script level, among other things.
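A minimal sketch of setting Content-Length from the buffer. It assumes nothing later in the output chain (such as ob_gzhandler or zlib.output_compression) changes the body, since compression would make the stated length wrong.

```php
<?php
ob_start();

echo "<p>Hello, World!</p>\n";

// ob_get_length() returns the number of bytes currently in the buffer
// (or false if no buffer is active).
$length = ob_get_length();
header('Content-Length: ' . $length);

ob_end_flush(); // the header is sent first, then exactly $length bytes
```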
Output buffering is a surefire way to take control of your output. Implementing these techniques in your scripts can help improve the performance and, in some cases, usability of your applications.
Until next time, happy coding!