Introducing ramsey/uuid


It seems quite absurd for me to introduce ramsey/uuid, a library that saw its 1.0.0 release on July 19, 2012, and is now at version 3.4.1, having had 35 releases since its first, but what’s even more ludicrous is that I haven’t once blogged about this library. I mention it only in passing in my “Dates Are Hard” post. So, allow me to introduce you to perhaps a familiar face, an old friend, the ramsey/uuid library for PHP.

Beginnings

I’ve been asked on more than one occasion why I created ramsey/uuid. Why was it needed? Why did I open source it?

It all began with Composer. In 2012, Composer was taking off, and there was a lot of excitement around creating userland PHP packages and distributing them for others to use. I had contributed a number of times to open source projects, but I had never maintained one of my own. So, it began as an experiment. I wanted to experience what it was like to manage an open source project and accept pull requests, feedback, and bug reports from others.

Once I had resolved to create a package for this little experiment, I needed something to work on that presented a problem I felt others in the PHP community had not yet sufficiently solved. I also looked to other programming language communities to see what problems they had solved that PHP could benefit from.

At some point, I stumbled across the Java and Python UUID implementations, both of which provide rich interfaces for generating UUIDs. Aside from the PECL uuid package and a handful of small libraries generating UUIDs with mt_rand(), I couldn’t find a PHP userland implementation providing functionality similar to that of the Java and Python libraries.

I had found a problem in need of a PHP userland solution! I set to work right away, quickly releasing a 1.0.0 version. Little did I know this marked the beginning of a long road for a small package that would become popular and widely used.

What is a UUID?

UUID is an acronym for universally unique identifier. A UUID is a 128-bit integer with some special formatting rules based on its variant and version. When presented as a string, a UUID looks something like this:

379dae82-5a2b-4c4b-8193-b8e7749a3495

A UUID aims to be practically unique such that information may be uniquely identified across distributed systems, without central coordination of identifiers. There are 16^32 (that is, 2^128) possible UUIDs, so it’s highly unlikely that there will be a duplicate. According to Wikipedia, for randomly generated UUIDs, “only after generating 1 billion UUIDs every second for the next 100 years, the probability of creating just one duplicate would be about 50%.”
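To make the 128-bit claim concrete, here is a minimal sketch that picks apart the sample string above using only PHP's built-in string functions. The variable names are illustrative and not part of ramsey/uuid.

```php
<?php
// Illustrative only: inspect the sample UUID with plain PHP string functions.
$uuid = '379dae82-5a2b-4c4b-8193-b8e7749a3495';

// Strip the hyphens to get the raw hexadecimal digits.
$hex = str_replace('-', '', $uuid);

// Each hex digit encodes 4 bits, so 32 digits make 128 bits.
$bits = strlen($hex) * 4;

// The first digit of the third group carries the version number.
$version = hexdec($hex[12]);

// The top two bits of the first digit of the fourth group carry the variant;
// binary 10 (decimal 2) marks the RFC 4122 variant.
$variant = hexdec($hex[16]) >> 2;

printf("bits=%d version=%d variant=%d\n", $bits, $version, $variant);
// prints "bits=128 version=4 variant=2"
```

The version and variant fields are the "special formatting rules" mentioned earlier; everything else in a version 4 UUID is random.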

RFC 4122 defines a specific variant of UUIDs having five versions:

  • Version 1 is constructed from the current timestamp and local machine MAC address
  • Version 2 is the DCE Security version; it is similar to version 1, but RFC 4122 does not explicitly define it, so it is left out of most implementations
  • Version 3 is constructed from a namespace and an MD5 hash of a name; given the same namespace and name, the UUID generated will always be the same
  • Version 4 is randomly generated and is probably the most common version used
  • Version 5 is the same as version 3, but it uses SHA-1 hashing instead; it is the preferred version for name-based UUIDs
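As a rough illustration of how versions 3, 4, and 5 are put together, the sketch below builds them by hand with PHP built-ins. It is not how ramsey/uuid implements them: the function names are hypothetical, and random_bytes() assumes PHP 7 or a polyfill such as paragonie/random_compat.

```php
<?php
// Illustrative only: hand-rolled RFC 4122 version 3, 4, and 5 UUIDs.
// All function names are hypothetical and not part of ramsey/uuid.

function formatUuid(string $bytes): string
{
    // Render 16 raw bytes as 8-4-4-4-12 hexadecimal groups.
    $hex = bin2hex($bytes);
    return sprintf(
        '%s-%s-%s-%s-%s',
        substr($hex, 0, 8),
        substr($hex, 8, 4),
        substr($hex, 12, 4),
        substr($hex, 16, 4),
        substr($hex, 20, 12)
    );
}

function stampVersionAndVariant(string $bytes, int $version): string
{
    // The high nibble of byte 6 holds the version.
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | ($version << 4));
    // The top two bits of byte 8 are "10" for the RFC 4122 variant.
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80);
    return $bytes;
}

function sketchUuid4(): string
{
    // Version 4: 16 random bytes with the version and variant stamped in.
    return formatUuid(stampVersionAndVariant(random_bytes(16), 4));
}

function sketchNameBasedUuid(string $namespaceUuid, string $name, int $version): string
{
    // Version 3 hashes with MD5, version 5 with SHA-1.
    $algorithm = $version === 3 ? 'md5' : 'sha1';
    $namespaceBytes = hex2bin(str_replace('-', '', $namespaceUuid));
    $hash = hash($algorithm, $namespaceBytes . $name, true);
    // SHA-1 yields 20 bytes; only the first 16 are kept.
    return formatUuid(stampVersionAndVariant(substr($hash, 0, 16), $version));
}

// The DNS namespace UUID defined in RFC 4122.
$dnsNamespace = '6ba7b810-9dad-11d1-80b4-00c04fd430c8';

echo sketchUuid4(), PHP_EOL;                                      // differs on every run
echo sketchNameBasedUuid($dnsNamespace, 'example.com', 3), PHP_EOL; // same on every run
echo sketchNameBasedUuid($dnsNamespace, 'example.com', 5), PHP_EOL; // same on every run
```

Running the name-based calls twice with the same namespace and name gives the same UUID, which is the property that makes versions 3 and 5 useful for deterministic identifiers.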

What’s In a Name?

What’s in a name? That which we call a rose
By any other name would smell as sweet.

Juliet, Romeo and Juliet

If you used ramsey/uuid before the 3.x series, you’ll recall that this library began its life with the vendor name Rhumsaa. There were several problems with this name. It was too close to Ramsey, so many people assumed that’s what it was. As a result, in conversation, it was referred to as Ramsey UUID, and folks searching for it would use the search terms “Ramsey UUID.” It became very confusing, and those who knew it was Rhumsaa and not Ramsey didn’t know how to pronounce it.

I received complaints about not remembering how to spell it. I heard from others who didn’t realize I was the package maintainer. One developer even encountered a problem and assumed it was a result of my “GitHub username change;” they thought I had changed my GitHub username from rhumsaa to ramsey, breaking the location of the package in Packagist.

So, what does rhumsaa mean, after all? As I wrote in my “Wild Garlic” post, “[T]he word rhumsaa is a Manx word derived from the Old Norse hrams-á, meaning ‘wild garlic river.’ In English, this word is ramsey.”

I was attempting to be clever with my vendor name, and it caused a lot of confusion.

As we were deep in the middle of development on the 3.x series, I decided a vendor name change from 2.x to 3.x might be a good idea and mitigate a lot of this confusion. I asked on Twitter and opened Issue #48 to solicit feedback from the community for the name change. In the end, I made the decision to change the vendor name to Ramsey. I first updated my other Rhumsaa packages (e.g., ramsey/twig-codeblock and ramsey/vnderror) and then I changed the name of ramsey/uuid for the 3.x series.

I tried to make the transition as easy as possible. I’m sure there are better ways to handle changes like this, and in retrospect, I probably should have forked the package to allow projects to use both rhumsaa/uuid and ramsey/uuid together, much as the Guzzle project handled its own namespace and package name change. Nevertheless, I’ve only heard from a handful of people who encountered problems with the upgrade or couldn’t upgrade yet due to other packages using the older 2.x series.

When UUIDs Collide

Shortly before the GA release of 3.0.0, I received a troublesome bug report. Issue #80 purported to show that version 4 UUID collisions were occurring on a regular basis even in small-scale tests, which, as I mentioned earlier, should be highly improbable. After several more corroborating reports, we were faced with a conundrum.

Since I couldn’t reproduce the issue, and no one could produce a sufficient reproducible script, the issue sat around for a long time. Every couple of weeks or so, someone would chime in to ask the status or confirm they had seen collisions. It began to scare people, and I was worried that community confidence in the library was degrading. I was actually stressed by the whole situation; I wanted my library to be useful and dependable.

Finally, after many months attempting to identify the culprit—I was certain it wasn’t inside the library’s code, since ramsey/uuid relies on external random generators—I had a conversation with Willem-Jan Zijderveld and Anthony Ferrara in the #phpc channel on Freenode IRC. Willem-Jan pointed us to the OpenSSL random fork-safety issue, where the OpenSSL project explains:

Since the UNIX fork() system call duplicates the entire process state, a random number generator which does not take this issue into account will produce the same sequence of random numbers in both the parent and the child (or in multiple children), leading to cryptographic disaster (i. e. people being able to read your communications).

OpenSSL’s default random number generator mixes in the PID, which provides a certain degree of fork safety. However, once the PIDs wrap, new children will start to produce the same random sequence as previous children which had the same PID.

They go on to say:

OpenSSL cannot fix the fork-safety problem because it’s not in a position to do so.

OpenSSL was the culprit. More specifically, the problem was the use of openssl_random_pseudo_bytes() when running PHP in forked child processes, as is the case when using PHP with Apache or PHP-FPM. The process IDs were wrapping, so new children would produce the same random sequences as previous children with the same process IDs.
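The failure mode is easier to see in a toy model. The sketch below is not OpenSSL's algorithm; it assumes a deliberately naive generator whose state depends only on the process ID, and toyRandomHex() is a hypothetical helper written for this illustration.

```php
<?php
// Toy model only: a deterministic generator whose state depends solely on the
// PID. This is NOT how OpenSSL works internally; it only mimics the symptom.
function toyRandomHex(int $pid, int $length = 16): string
{
    mt_srand($pid); // the state is fully determined by the PID (the modeled flaw)
    $out = '';
    for ($i = 0; $i < $length; $i++) {
        $out .= sprintf('%02x', mt_rand(0, 255));
    }
    return $out;
}

$firstChild = toyRandomHex(4242); // a child that once ran with PID 4242
$otherChild = toyRandomHex(4243); // a sibling with a different PID
$reusedPid  = toyRandomHex(4242); // PIDs wrapped; a new child gets 4242 again

echo $firstChild, PHP_EOL;
echo $otherChild, PHP_EOL;
echo $reusedPid, PHP_EOL;

var_dump($firstChild === $reusedPid); // bool(true): the "random" bytes repeat
```

In a long-running Apache or PHP-FPM pool, workers are recycled constantly, so a reused PID is an ordinary event rather than an edge case.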

Discovering this launched discussions on what to do about OpenSSL for the paragonie/random_compat library. After that project decided to drop the use of OpenSSL as a fallback for generating random bytes, I decided to require paragonie/random_compat as a dependency and use random_bytes() as the default random generator for ramsey/uuid. I then released versions 2.9.0 and 3.3.0 to provide versions in both 2.x and 3.x to solve this problem.

It’s interesting to note that @SwiftOnSecurity picked up on the issue and posted about it.

3.4.1 and Beyond

ramsey/uuid has undergone many changes since its 1.0.0 release. That very first release had some severe limitations placed on it, due to the math involved. It also had some grievous bugs because of that math. I required that everyone using the library must use a 64-bit system, and I failed to factor in the unsignedness of the integers. Since all PHP integers are signed, this led to some serious problems in generating UUIDs.

The 2.x series of the library followed about seven months later, supporting both 64-bit and 32-bit systems and accounting for the unsignedness of UUID integers through the use of a BC math wrapper library, moontoast/math. We—for it really was a community effort—made many improvements and enhancements over the course of the 2.x series, but it was clear that more flexibility was desired, and this led to the changes in the 3.x series.
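The signedness problem can be reproduced with nothing but built-ins. The sketch below shows a 64-bit half of a UUID overflowing PHP's signed integers, then sidesteps it with string arithmetic. hexToDecimalString() is a hypothetical helper written for this illustration; the library itself relied on moontoast/math, as described above.

```php
<?php
// Illustrative only: why the 64-bit halves of a UUID do not fit PHP integers.

// hexdec() silently falls back to a float once the value exceeds PHP_INT_MAX...
$asFloat = hexdec('ffffffffffffffff');
var_dump(is_float($asFloat)); // bool(true)

// ...and a float cannot tell neighbouring 64-bit values apart.
var_dump(hexdec('ffffffffffffffff') === hexdec('fffffffffffffffe')); // bool(true)

// Hypothetical helper: convert hex to a decimal string one digit at a time,
// so no intermediate value ever exceeds a small integer.
function hexToDecimalString(string $hex): string
{
    $digits = [0]; // decimal digits, least significant first
    foreach (str_split(strtolower($hex)) as $char) {
        // Multiply the running total by 16 and add the new hex digit.
        $carry = hexdec($char);
        foreach ($digits as $i => $digit) {
            $value = $digit * 16 + $carry;
            $digits[$i] = $value % 10;
            $carry = intdiv($value, 10);
        }
        // Append whatever carry is left as new high-order digits.
        while ($carry > 0) {
            $digits[] = $carry % 10;
            $carry = intdiv($carry, 10);
        }
    }
    return implode('', array_reverse($digits));
}

echo hexToDecimalString('ffffffffffffffff'), PHP_EOL; // 18446744073709551615
echo hexToDecimalString('fffffffffffffffe'), PHP_EOL; // 18446744073709551614
```

Keeping the value as a string costs some speed, but it behaves the same on 32-bit and 64-bit builds, which is the trade-off an arbitrary-precision wrapper makes.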

The 3.x series ushered in a great deal of flexibility through interfaces and dependency injection. While the standard public API was left unchanged, all the guts of the library were completely transformed to allow anyone to use their own random generator, time provider, MAC address provider, and more.

Now, as the library matures beyond the 3.4.1 version, I’m looking ahead to the 4.x series, and how it will further improve the library with more flexibility and closer adherence to RFC 4122, while providing some facilities to optimize UUIDs in databases, and more.

A handful of the issues I’m considering for 4.0.0 are on the ramsey/uuid GitHub issues page, where you can read more and submit your own.

How To Use It

The ramsey/uuid library provides a static interface to create immutable UUID objects for RFC 4122 variant version 1, 3, 4, and 5 UUIDs. The preferred installation method is Composer:

composer require ramsey/uuid

After installation, simply require Composer’s autoloader (or use your own, or one provided by your framework of choice) and begin using the library right away, without any setup.

$uuid = \Ramsey\Uuid\Uuid::uuid4();
echo $uuid->toString();

The library will inspect your environment and pick sensible defaults for generating random or time-based UUIDs, but these are configurable. For example, if you wish to use Anthony Ferrara’s RandomLib library as the random generator, you may configure the library to do so:

$factory = new \Ramsey\Uuid\UuidFactory();
$factory->setRandomGenerator(new \Ramsey\Uuid\Generator\RandomLibAdapter());
\Ramsey\Uuid\Uuid::setFactory($factory);
$uuid = \Ramsey\Uuid\Uuid::uuid4();

If you wish to provide your own random generator, you may do so by implementing Ramsey\Uuid\Generator\RandomGeneratorInterface and setting your object as the random generator to use.
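A custom generator might look like the following sketch. It assumes the 3.x series installed through Composer, where the interface declares a single untyped generate($length) method; the class name RandomBytesSketchGenerator is hypothetical, and the autoloader path is an assumption about your project layout.

```php
<?php
// Sketch for the 3.x series; assumes ramsey/uuid 3.x was installed via Composer
// and that vendor/ sits next to this script.
require 'vendor/autoload.php';

use Ramsey\Uuid\Generator\RandomGeneratorInterface;
use Ramsey\Uuid\Uuid;
use Ramsey\Uuid\UuidFactory;

// Hypothetical class name; it simply delegates to PHP's random_bytes().
class RandomBytesSketchGenerator implements RandomGeneratorInterface
{
    // Returns $length bytes of raw binary data, as the interface expects.
    public function generate($length)
    {
        return random_bytes($length);
    }
}

$factory = new UuidFactory();
$factory->setRandomGenerator(new RandomBytesSketchGenerator());
Uuid::setFactory($factory);

echo Uuid::uuid4()->toString(), PHP_EOL;
```

Because the factory is set globally through Uuid::setFactory(), every later call to Uuid::uuid4() in the same process uses the custom generator.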

Likewise, the library supports the ability to configure the time provider. If you’d like to use the PECL uuid package, for example, to generate time-based UUIDs, this is possible.

$factory = new \Ramsey\Uuid\UuidFactory();
$factory->setTimeGenerator(new \Ramsey\Uuid\Generator\PeclUuidTimeGenerator());
\Ramsey\Uuid\Uuid::setFactory($factory);
$uuid = \Ramsey\Uuid\Uuid::uuid1();

There are a variety of other ways to configure ramsey/uuid. This example configures the library to generate a version 4 COMB sequential UUID with the timestamp as the first 48 bits.

$factory = new \Ramsey\Uuid\UuidFactory();
$generator = new \Ramsey\Uuid\Generator\CombGenerator($factory->getRandomGenerator(), $factory->getNumberConverter());
$codec = new \Ramsey\Uuid\Codec\TimestampFirstCombCodec($factory->getUuidBuilder());
$factory->setRandomGenerator($generator);
$factory->setCodec($codec);
\Ramsey\Uuid\Uuid::setFactory($factory);
$uuid = \Ramsey\Uuid\Uuid::uuid4();

Thanks

I couldn’t wrap up this post without thanking a few key project contributors. Were it not for the efforts of these folks, ramsey/uuid would not be the great library it is today.

I want to first thank Marijn Huizendveld. Marijn submitted the first pull requests to ramsey/uuid and contributed the Doctrine ORM integration that I later split out into the separate ramsey/uuid-doctrine library. It was Marijn’s participation that got me excited about collaborating on an open source project and continuing the work.

I owe a debt of gratitude to Thibaud Fabre for his instrumental work in taking ramsey/uuid to version 3. He set out to re-architect the library, providing the interfaces and structure for codecs, generators, providers, and more. I’ve learned a lot about organizing software, object-oriented programming, and dependency injection from his involvement.

Most recently, Jessica Mauerhan has been a force for improving our test suite, raising overall test coverage and adding tests for internal bits that were covered but not fully tested. I’ve learned a great deal about testing from her contributions.

Last but definitely not least, there are many more without whose contributions ramsey/uuid would be a lesser library. I am grateful to you all for your hard work and help in making ramsey/uuid an awesome library.

Cheers!