Output Buffering

This article was first published in the “Tips & Tricks” column in php|architect magazine.

Output is generally sent from calls to echo or print, or from outside PHP code blocks, and once it’s sent, it’s gone. However, using PHP’s output buffering functionality, it is possible to capture this output and further manipulate it before sending to the client. In this month’s Tips & Tricks, I’ll show you why and how to control output with output buffering.

Portable Document Markup Language (PDML) is a language used for creating PDF documents. What’s best: it’s implemented entirely in PHP, and it’s extremely simple to use. All a user must do is create a document with markup similar to HTML, include one line of PHP at the top of the file, and then the file will magically render a PDF document when called from a Web browser.

PDML is a remarkably lightweight package. It only requires that the user create a PDF using a simple markup language. After glancing at PDML, other PDF-creation packages written in PHP seem to introduce needless complexity to the process of creating a PDF on the fly. For example, Listing 1 shows a very simple “Hello, World” document using PDML. So, what makes it work?

The magic behind PDML: output buffering.

Listing 1.
<?php require_once 'pdml.php'; ?>
<pdml>
<body>
<font face="Arial" size="16pt">Hello, World!</font>
</body>
</pdml>

What is Output Buffering?

Normally, when output is echoed or printed, it is sent immediately to PHP’s output buffer. It cannot be retrieved or changed once this occurs, and all document headers must be set before echoing or printing output. This is not the case when using output buffering.

Output buffering, put simply, is the process of delaying the transmission of output to the client. During this delay, the script may access or modify the contents of the buffer before it is sent. What’s more, the script can send the buffer all at once or in chunks, which I’ll explain later on.

In the PDML example, the markup is never sent to the client. Instead, PDML uses ob_start() to start buffering output. Meanwhile, it passes a callback function to ob_start()—the custom function ob_pdml(). Now, when the output is flushed to the client—in this case, when the script is finished processing—it will first pass through ob_pdml(). What comes out is a PDF document.

I hope it is evident how this technique can be useful for any number of applications.

Start Buffering

As mentioned, to start buffering content, place a call to ob_start(). Any output echoed or printed before the ob_start() call is not stored in the buffer. As far as the script is concerned, that output has already been sent to the client, even though its actual transmission may be delayed until the script has finished running (or the server’s own buffer is full). All output after the ob_start() call is held in the script’s local output buffer.

Aside from starting the output buffer, ob_start() also accepts a callback function parameter, also mentioned earlier. Using a custom callback function, one can use output buffering to create one’s own markup language (as is the case with PDML), perform customized content rewriting before sending the output to a client (e.g. URL rewriting, output escaping), or implement a custom templating engine.
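To make the callback idea concrete, here is a minimal sketch. The <shout> tag and the render_shout_tags() function are invented for this illustration; they are not part of PHP or PDML. The callback receives the buffered output as a string and returns whatever should be sent in its place.

```php
<?php
// Hypothetical callback for illustration: expands a made-up <shout> tag.
function render_shout_tags(string $buffer): string
{
    $result = preg_replace_callback(
        '#<shout>(.*?)</shout>#s',
        function (array $match): string {
            return '<strong>' . strtoupper($match[1]) . '</strong>';
        },
        $buffer
    );

    // preg_replace_callback() returns null on error; fall back to the raw buffer.
    return $result ?? $buffer;
}

ob_start('render_shout_tags');
echo 'Please <shout>read this</shout> first.';
ob_end_flush(); // the buffer passes through render_shout_tags() on its way out
```

The client receives the rewritten markup, never the <shout> tag itself. PDML presumably does something similar on a much larger scale, turning a whole document into PDF data inside its callback.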

Sending Compressed Content

PHP comes with a built-in output buffering callback function, ob_gzhandler(), that can be used along with ob_start() to send gzip-compressed data to browsers. It checks whether the browser accepts compressed content and sends the data compressed or uncompressed accordingly.

For example, my browser (Mozilla Firefox) sends an Accept-Encoding header with most requests, the value of which is “gzip,deflate”. This tells the Web server that it can compress content before sending it to the browser, which saves on bandwidth and cuts down load times. Placing the following at the top of a script will force PHP to handle the compression, which can be helpful, especially if your Web server doesn’t compress responses:

<?php
ob_start('ob_gzhandler');
?>

Now, the response will include a Content-Encoding header with a value of “gzip”. Please note, however, that this works only for browsers that request (and can read) compressed content. All other browsers will receive uncompressed content.

Accessing the Buffer

All data stored in the buffer may be easily accessed, provided the buffer has not yet been flushed. To get the contents of the buffer at any given time, simply use ob_get_contents() or ob_get_flush(). Both of these functions return a string representing all current output in the buffer. However, ob_get_flush() returns the buffer string, flushes the buffer, and turns output buffering off, while ob_get_contents() leaves the buffer unchanged.

Take, for example, the following:

<?php
ob_start();
echo 'Hello, World!';
$output = ob_get_contents();
ob_end_clean();
?>

This code, when run, will not output anything. I turned on output buffering with ob_start(), and ob_end_clean() then discarded the buffer’s contents and turned buffering off, so the echo never sends anything to the client. Instead, the variable $output contains the value “Hello, World!”, which I captured with ob_get_contents() before the buffer was cleared.

Had I used ob_get_flush(), the contents of the buffer would have also been sent to the client. While “Hello, World!” would have displayed in the client output, the script would still have a chance to take action on all of the data stored in $output, which, in this case, is only “Hello, World!”
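For comparison, here is the same example as a small sketch using ob_get_flush(). The $wordCount variable is only there to show that the script can still work with the captured text after it has been sent.

```php
<?php
ob_start();
echo 'Hello, World!';

// ob_get_flush() returns the buffer, sends it on, and turns buffering off.
$output = ob_get_flush();

// The text has already been sent, but the script still holds a copy.
$wordCount = str_word_count($output);
```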

Using this technique, it is possible to control all output from an application, running it through any number of functions and processing routines. At the top of the script, call ob_start(); at the bottom, get the contents with ob_get_contents(), then clear and close the buffer with ob_end_clean(). Now, we can modify everything the script intended to output.

For example, regular expression matching with the Perl-Compatible Regular Expression (PCRE) library is often used on buffered data to replace certain content, such as HTML or JavaScript in output. For that matter, the full content of $output may be passed through htmlentities() or htmlspecialchars().
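The capture-then-process pattern might be sketched as follows. The capture_output() helper and the sample markup are assumptions made for this example, and the regular expression is only an illustration of PCRE on buffered data, not a reliable way to sanitize untrusted HTML.

```php
<?php
// Hypothetical helper: runs $render and returns what it printed.
function capture_output(callable $render): string
{
    ob_start();
    try {
        $render();
        return (string) ob_get_contents();
    } finally {
        ob_end_clean(); // always clear and close the buffer
    }
}

$raw = capture_output(function (): void {
    echo '<p>Tom & Jerry</p><script>alert(1)</script>';
});

// Remove script elements with PCRE, then escape what is left.
$withoutScripts = preg_replace('#<script\b[^>]*>.*?</script>#is', '', $raw);
$escaped = htmlspecialchars($withoutScripts, ENT_QUOTES, 'UTF-8');

echo $escaped;
```

Wrapping the buffer calls in a function with a finally block keeps the buffer from being left open if the rendering code throws an exception.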

It is also important to note that, when using this technique, document headers may be sent at any time until the buffer is flushed, which, depending on the methods used, may not be until the very end of the script. In the example above, it is possible to place a call to header() after the echo. Headers set with header() are not themselves stored in the output buffer, however, so they cannot be captured or rewritten through it. As in all cases, headers must be sent before any output. With output buffering, though, output is being delayed. This is why it is possible to set headers after calls to echo or print.
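A short sketch of setting a header after an echo follows. The X-Report-Status header name is made up for this example, and the headers_sent() check is a precaution for environments where output has already left PHP.

```php
<?php
ob_start();
echo '<p>Report generated.</p>'; // held in the buffer, not yet sent

// Because nothing has left the buffer, headers can normally still be set here.
$headersStillOpen = !headers_sent();
if ($headersStillOpen) {
    header('X-Report-Status: complete'); // hypothetical header name
}

ob_end_flush(); // headers go out first, then the buffered body
```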

Sending Chunked Responses

A chunked response is one that is broken up into smaller pieces and sent separately rather than all at once. In a typical process, all output is sent to PHP’s output buffer, which usually waits to send the data to the client until the script finishes. Then, when it is sent to the client, it includes a Content-Length header specifying the exact length of the content.

Sometimes, however, it is necessary to send data to the client before the script finishes. This is especially the case when processing large amounts of data could lead to very long page load times. Output buffering can solve this problem by providing the means to immediately flush the contents of the buffer to the Web server itself, encouraging it to send the contents immediately. I say that it “encourages” the Web server because the Web server may not always do this, as is the case when Apache is using mod_gzip or when using certain Web servers on the Microsoft Windows platform.

Nevertheless, when using a standard Apache installation without mod_gzip, the ability to send chunked responses can greatly improve usability and decrease load times. Listing 2 shows an example that might be used in a real-world scenario.

For the sake of argument, let’s say the fictional table “foo” contains 20,000 records. Iterating over these records may take some time. Meanwhile, without a chunked response, the user waits on this data with no real feedback that the request is being processed. However, the example in Listing 2 uses output buffering to send a chunked response using flush().

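According to the PHP manual, flush() flushes “the output buffers of PHP and whatever backend PHP is using (CGI, a web server, etc).” Thus, it “effectively tries to push all the output so far to the user’s browser.” As mentioned earlier, this is not always the case, however. In particular, flush() does not empty a buffer started with ob_start(); while such a buffer is active, call ob_flush() first and then flush().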

So, in Listing 2, the buffer is being explicitly flushed to the client after every 100 records. Thus, the user receives some feedback that the request is being processed and can begin viewing records while the remainder of the script continues to process and send more data to the client.
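The same flushing pattern can be tried without a database. In the sketch below, push_to_client() and stream_rows() are hypothetical helpers written for this illustration, and a small array stands in for the query result. The chunk size is assumed to be a positive integer.

```php
<?php
// Hypothetical helper: empties PHP's buffer (if one is active), then asks
// the Web server to send what it has.
function push_to_client(): void
{
    if (ob_get_level() > 0) {
        ob_flush();
    }
    flush();
}

// Hypothetical helper: prints each row and flushes after every $chunkSize
// rows. Returns the number of flushes performed, including the final one.
function stream_rows(iterable $rows, int $chunkSize): int
{
    $count = 0;
    $flushes = 0;
    foreach ($rows as $row) {
        echo implode(', ', $row), "\n";
        if (++$count % $chunkSize === 0) {
            push_to_client();
            $flushes++;
        }
    }
    push_to_client(); // send whatever remains
    return $flushes + 1;
}

ob_start();
$rows = [['1', 'alpha'], ['2', 'beta'], ['3', 'gamma'], ['4', 'delta'], ['5', 'epsilon']];
$flushes = stream_rows($rows, 2);
ob_end_flush();
```

With five rows and a chunk size of two, the buffer is flushed after the second and fourth rows and once more at the end.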

Note that the response now contains a Transfer-Encoding header with a value of “chunked” in lieu of the Content-Length header.

Listing 2.
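<?php
// $user and $pass are assumed to hold the database credentials.
ob_start();
try {
    $i = 0;
    $dbh = new PDO('mysql:host=localhost;dbname=test', $user, $pass);
    foreach ($dbh->query('SELECT * FROM foo') as $row) {
        print_r($row);
        $i++;
        // Every 100 records, empty PHP's buffer, then push it to the client.
        if ($i % 100 == 0) {
            ob_flush();
            flush();
        }
    }
    // Send whatever remains after the last full batch.
    ob_flush();
    flush();
    $dbh = null; // close the connection
} catch (PDOException $e) {
    print 'Error: ' . $e->getMessage();
}
?>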

URL Rewriting

Not to be confused with Apache’s mod_rewrite, PHP’s output buffering functionality allows users to “rewrite” URLs by dynamically appending querystring values to URLs and adding hidden form fields in output. This works in much the same way as the session ID with session.use_trans_sid set in php.ini.

For example, consider the following HTML:

<a href="foo.php">Link</a>
<form action="bar.php" method="POST">
    <input type="text" name="baz" />
</form>

Now, consider that a persistent variable of some sort—perhaps an authentication token—needs to exist throughout the script in all links and forms. Simply add the following at the top of the script (or above the content where the variable should be appended):

<?php
output_add_rewrite_var('token', 'abc123');
?>

Now, the link and form will be rewritten as such:

<a href="foo.php?token=abc123">Link</a>
<form action="bar.php" method="POST">
    <input type="hidden" name="token" value="abc123" />
    <input type="text" name="baz" />
</form>

To clear the variable(s) that you set with output_add_rewrite_var(), ensuring that they are not appended in later parts of the script, use output_reset_rewrite_vars().

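The behavior of this functionality is controlled by url_rewriter.tags in php.ini, which lists the tags and attributes to rewrite. Check its value if links are left unchanged: in recent PHP versions the default covers only forms, so a=href must be added for links such as the one above to be rewritten.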

Content Length and Fin

Finally, it is possible to get the length of the content in the buffer with ob_get_length() for times when it is necessary to explicitly set the Content-Length header at the script level, among other things.
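As a minimal sketch, the length can be read just before the buffer is flushed and used to set the header. The headers_sent() guard is an added precaution, not something ob_get_length() requires.

```php
<?php
ob_start();
echo 'Hello, World!';

$length = ob_get_length(); // bytes currently held in the buffer

// Guarded because headers can no longer be set once output has been sent.
if (!headers_sent()) {
    header('Content-Length: ' . $length);
}

ob_end_flush();
```

Note that this length describes the buffer as it stands; if a callback such as ob_gzhandler later changes the output, the value will no longer match what is sent.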

Output buffering is a surefire way to take control of your output. Implementing these techniques in your scripts can help improve the performance and, in some cases, usability of your applications.

Until next time, happy coding!