A rev="canonical" Rebuttal

There’s a lot being said about rev="canonical". Others have already explained what it is and stated the arguments for it, so I won’t go into all of that. Instead, I would like to offer a rebuttal – to play devil’s advocate, so to speak – in hopes that we’ll all slow down and think about what we’re doing before we jump all-in and implement something that may not be a good standard for the Web and may lead to more problems down the road.

First, let’s look at rev="canonical" from the perspective of a purist. HTML 5 does not include the rev attribute on the link or a elements. It was dropped after a lot of discussion, so reintroducing it at this point looks like a fruitless effort. The community has already decided against it. Why bring it back to the table?

Furthermore, I thought we had moved beyond encouraging people to break the standards. Rather, we want to encourage people to follow standards and not make their own. Creating your own standards leads to differentiation and specialization in client applications (browsers), and some browsers will end up supporting the new features, while others will not. The frustrations faced by client-side developers attempting to program for multiple clients have taught us that this is not desirable.

That said, if the microformats and HTML 5 communities are interested in revisiting this and considering rev for inclusion in HTML 5, then this is solved, but I still think there are some grievous pragmatic problems with rev="canonical".

The first is this: rev is too damned confusing to understand. If it takes a two-hour conversation on IRC to explain what rev="canonical" means, then something is wrong. Developers should be able to understand the semantic meaning of rev="canonical" at first sight, without digging through reams of documentation and blog posts to grok the concept of rev.

I think this confusion will ultimately lead to problems that render rev="canonical" meaningless to clients and search engines. What rev="canonical" really means is this: “I (the URL of the current document) am the canonical URL for that URL over there (the one specified in the href of the link tag).” What I think will happen, though, is that people will misunderstand this, as previous usage of rev has shown. This misunderstanding could lead to the following improper implementations.

Let’s say, for example, that the current document’s canonical URL is http://example.org/2009/04/10/a-rebuttal-for-rev-canonical. A shortened form might be http://example.org/revcanonical. Knowing this, the correct implementation for rev="canonical" would be:

<link rev="canonical" href="http://example.org/revcanonical" />

Thus, when a client reads http://example.org/2009/04/10/a-rebuttal-for-rev-canonical, it sees this link and understands that the current URL is the canonical URL for http://example.org/revcanonical.

I foresee that implementers could easily misunderstand this and implement it like this:

<link rev="canonical" href="http://example.org/2009/04/10/a-rebuttal-for-rev-canonical" />

In this case, the link is self-referential, and the value of rev="canonical" is lost, since no short form is specified. However, this won’t lead to any problems, since the default URL for a document not containing rev="canonical" is the original URL, according to RevCanonical.

What will lead to problems is when people misunderstand rev, thinking it to mean rel – after all, rev isn’t in the HTML 5 spec, so maybe rel will work (or so the thinking may go) – so they implement it like this:

<link rel="canonical" href="http://example.org/revcanonical" />

This means something entirely different. This tells Google (and maybe other search engines) that the canonical URL of the current document is the value of the href. It is the inverse of rev="canonical". This might not lead to a problem quite as drastic as the linkrot apocalypse, but it might lead to inaccurate URLs being stored in search engines and could negatively affect your SEO.
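
To make the inversion concrete, here is a minimal sketch of how a client might read the two attributes. It uses Python's standard html.parser module; the class and function names are my own invention for illustration, not part of RevCanonical or any search engine's actual behavior.

```python
from html.parser import HTMLParser


class CanonicalLinkCollector(HTMLParser):
    """Collects (attribute_name, href) pairs for <link> tags marked canonical."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        for name in ("rel", "rev"):
            # Link types are space-separated and compared case-insensitively.
            tokens = (attr_map.get(name) or "").lower().split()
            if "canonical" in tokens:
                self.links.append((name, href))


def describe_canonical_links(html, document_url):
    """Hypothetical helper: spell out what each canonical link asserts."""
    collector = CanonicalLinkCollector()
    collector.feed(html)
    statements = []
    for name, href in collector.links:
        if name == "rev":
            # rev: the current document is canonical for the href.
            statements.append(f"{document_url} is canonical for {href}")
        else:
            # rel: the href is canonical for the current document.
            statements.append(f"{href} is canonical for {document_url}")
    return statements
```

Fed the same href, the two attributes produce opposite statements, which is exactly the mistake a one-letter typo would introduce.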

Finally, earlier I said that rev="canonical" means “I am the canonical URL for that URL over there.” In this case, “that URL over there” just happens to be a shorter one, but semantically, that’s not what rev="canonical" means, and here is another problem with this approach.

A canonical URL is just that: canonical. It is the primary URL used to refer to the resource. All other URLs referring to the resource are secondary and unimportant (except insofar as they direct us to the primary URL). In fact, there could be infinitely many secondary URLs that direct clients to the canonical one. By specifying rev="canonical", you assign importance to the link identified by the href, but you don’t express why it is important, except to say that it is another URL that points to this canonical one. In fact, you could have hundreds of rev="canonical" links for any particular document. How would an implementer choose the proper one to use as the short URL?
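
The ambiguity is easy to see in a sketch. The code below collects every rev="canonical" href from a document and then has to guess which one is the short URL. The "pick the shortest string" rule is purely my assumption for illustration; nothing in rev="canonical" itself says which link is the short form, and the helper names are hypothetical.

```python
from html.parser import HTMLParser


class RevCanonicalCollector(HTMLParser):
    """Collects the href of every <link> whose rev attribute includes canonical."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr_map = dict(attrs)
        rev_tokens = (attr_map.get("rev") or "").lower().split()
        href = attr_map.get("href")
        if "canonical" in rev_tokens and href:
            self.hrefs.append(href)


def rev_canonical_candidates(html):
    """Return all URLs the document claims to be canonical for, in order."""
    collector = RevCanonicalCollector()
    collector.feed(html)
    return collector.hrefs


def guess_short_url(candidates):
    """Assumed heuristic: take the shortest string, or None if there are none."""
    return min(candidates, key=len) if candidates else None
```

All of the candidates are equally valid rev="canonical" targets, so any selection rule a client applies is a guess layered on top of the markup, and two clients may well guess differently.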

This is why I think better semantics are necessary. I see no need to specify that “this is the canonical URL for that URL over there.” If things are set up properly, then “that URL over there” will properly tell search engines and clients that it isn’t the canonical URL by responding with a 301 or 302 redirect. Instead, the canonical URL should tell clients that there is a preferred shorter form of the URL that may be used if desired, and I think the best way to do that is with a rel attribute, specifying an alternate URL form for the current document. The RevCanonical folks also identify this form:

<link rel="alternate shorter" href="http://example.org/revcanonical" />

All of the aforementioned problems are solved with this usage: it doesn’t break the HTML 5 standard, it isn’t confusing and can be understood by developers without the need for long discussions, and it doesn’t imply that the current document is attempting to identify all URLs for which it is the canonical one.
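
For comparison, here is a rough sketch of a client looking for the “shorter” rel type. It assumes that rel is treated as a space-separated list of tokens and that a client only needs to find the shorter token; the fallback to the original URL mirrors the default behavior described earlier. The names are hypothetical and this is not RevCanonical's own code.

```python
from html.parser import HTMLParser


class ShorterLinkFinder(HTMLParser):
    """Collects the href of every <link> whose rel attribute includes shorter."""

    def __init__(self):
        super().__init__()
        self.short_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr_map = dict(attrs)
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        href = attr_map.get("href")
        if "shorter" in rel_tokens and href:
            self.short_urls.append(href)


def find_short_url(html, document_url):
    """Return the first advertised shorter URL, else the document's own URL."""
    finder = ShorterLinkFinder()
    finder.feed(html)
    return finder.short_urls[0] if finder.short_urls else document_url
```

Here the link type itself says why the href matters, so the client does not have to infer intent from a list of URLs that merely point back at the document.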

So, why does RevCanonical specify two forms that serve the same purpose, and why do they advocate for the one that violates the HTML 5 spec and is confusing as hell to explain when there is already a form (suggested by themselves) that doesn’t violate the standard and is easy to understand?

Next steps: I’ve already added the “shorter” rel attribute to the WHATWG wiki, and I’ve mentioned it on the #microformats IRC channel. It will be a big uphill battle to get them to reconsider adding rev back to the HTML 5 spec, but I think the low-hanging fruit is in getting the “shorter” rel type added, and I think there’s a good case for adding it. The danger now is in how many early adopters implement rev="canonical" in the meantime. It looks like people are starting to add it, and that worries me a bit.