
Specifying rev=”canonical” With HTTP

Sun, 12 Apr 2009 6:15 UTC

It looks like there’s a lot of momentum behind rev=”canonical” now, all of it built up within about forty-eight hours. I still disagree with “canonical” for semantic reasons, and with rev because it invites widespread misunderstanding and improper implementation, but I’ll bite the bullet on this one for now. Time will tell what the community ultimately decides.

So, I’ve decided to eat my own dog food. Like Simon Willison, I bought a short domain, and every blog post on my site now includes a link tag that covers all the bases, like this one:

<link rev="canonical" rel="alternate shorter" href="http://brtny.me/382" />

I did all of this without any special WordPress plugin. For the ID, I’m simply using the WordPress post ID, which means that if I ever move to another blogging tool, I will need to preserve those IDs so the short URLs keep working. Note that I’ve implemented both the popular rev=”canonical” and my preferred rel=”alternate shorter”.
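Here is a rough sketch of the idea in Python, not the PHP in my actual WordPress theme. The `SHORT_BASE` constant and the helper names are hypothetical; the only assumption carried over from the example above is that a post’s numeric ID is appended to the short domain.

```python
# Hypothetical short domain, modeled on the brtny.me example above.
SHORT_BASE = "http://brtny.me/"


def short_url(post_id: int) -> str:
    """Map a numeric post ID onto the short domain (assumed scheme)."""
    if post_id <= 0:
        raise ValueError("post IDs are positive integers")
    return f"{SHORT_BASE}{post_id}"


def short_link_tag(post_id: int) -> str:
    """Emit one <link> element carrying both rev="canonical" and rel="alternate shorter"."""
    return f'<link rev="canonical" rel="alternate shorter" href="{short_url(post_id)}" />'


print(short_link_tag(382))
# <link rev="canonical" rel="alternate shorter" href="http://brtny.me/382" />
```

Because the short path is just the post ID, the mapping needs no extra storage, which is exactly why switching blogging tools would mean carrying those IDs along.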

Chris Shiflett has posted about the need for an HTTP header, and I agree, for the same reasons he gives. Chris originally recommended an X-Rev-Canonical header, but Stephen Paul Weber points to the Link header. I think the Link header is the right way to go, since it is an HTTP analogue of the HTML link element.

The Link header is specified in an IETF Internet-Draft by Mark Nottingham. It is still working its way through the IETF standardization process, but the draft is current, which is a good sign. If the community chooses to use it, though, it’s worth noting that the draft states:

Applications that don’t merit a registered relation type may use an extension relation type. An extension relation type is a URI that, when dereferenced, SHOULD yield a document describing that relation type.

This means that one should not simply use canonical as the value of rev. Instead, it would be more proper to use something like rev=”http://revcanonical.appspot.com/#canonical” until “canonical” is accepted as a registered IANA relation type. I wonder whether the same is technically true of the rel and rev attributes in HTML.

For now, I’ve covered all the bases in my HTTP headers as well. You can see this by making a HEAD request to any blog post on my website:

HEAD /archives/summarizing-my-revcanonical-argument/ HTTP/1.1
Host: benramsey.com

HTTP/1.1 200 OK
Date: Sun, 12 Apr 2009 05:24:34 GMT
Link: <http://brtny.me/382>; rev="http://revcanonical.appspot.com/#canonical"; rel="alternate http://revcanonical.appspot.com/#shorter"
X-Rev-Canonical: http://brtny.me/382
Content-Type: text/html; charset=utf-8
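On the consuming side, a client would issue a HEAD request and inspect the Link header. This Python sketch assumes the header value has already been fetched; the function names are hypothetical, and the parser only handles the simple quoted and unquoted parameters shown above, not every corner of the Internet-Draft’s grammar (escaped quotes, for instance).

```python
import re

# Extension relation types used in the header above.
REV_CANONICAL = "http://revcanonical.appspot.com/#canonical"
SHORTER = "http://revcanonical.appspot.com/#shorter"

# One link-value: <target> followed by zero or more ;param=value pairs.
_LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[\w*-]+\s*=\s*(?:"[^"]*"|[^;,\s]+))*)')
_PARAM_RE = re.compile(r';\s*([\w*-]+)\s*=\s*("[^"]*"|[^;,\s]+)')


def parse_link_header(value):
    """Return a list of (target, {param: value}) pairs from a Link header value."""
    links = []
    for match in _LINK_RE.finditer(value):
        target, raw_params = match.group(1), match.group(2)
        params = {name.lower(): val.strip('"') for name, val in _PARAM_RE.findall(raw_params)}
        links.append((target, params))
    return links


def find_short_url(links):
    """Prefer an explicit 'shorter' relation; fall back to the rev canonical form."""
    for target, params in links:
        if SHORTER in params.get("rel", "").split():
            return target
    for target, params in links:
        if REV_CANONICAL in params.get("rev", "").split():
            return target
    return None


header = ('<http://brtny.me/382>; rev="http://revcanonical.appspot.com/#canonical"; '
          'rel="alternate http://revcanonical.appspot.com/#shorter"')
print(find_short_url(parse_link_header(header)))  # http://brtny.me/382
```

The sketch checks for the explicit shorter relation first and only falls back to the rev form, reflecting my preference for saying what the link is rather than implying it.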


Summarizing My rev=”canonical” Argument

Sat, 11 Apr 2009 17:49 UTC

I think my central argument against rev=”canonical” got lost in my previous post because the post was so long, so I’ll try to summarize my points concisely.

Let’s get away from the argument about rev not being in HTML 5. That’s not the point. If a case can be made to add it back into HTML 5, fine. I still think it’s a confusing attribute, and perhaps an unnecessary one, but that is not what I am arguing. I am arguing that rev=”canonical” is not the best way to indicate to clients a shorter form of the current URL. Rather, I think rel=”alternate shorter” is a better way.

What I would like to focus on is why people seem to think rev=”canonical” is better than rel=”alternate shorter”. Is it? If so, why?

Consider this obviously fictional and tongue-in-cheek dialogue between a client and a requested resource:

“Hi, Resource! How’s it shakin’ today? I like what you have to offer, but what’s this rev="canonical" link you have? Am I not already requesting your canonical URL?”

“Oh. Hi, Client. Yes, this is my canonical URL, but that’s another URL that refers to my canonical URL.”

“Why would I want to know what other URLs refer to your canonical one? Is not the point of having a canonical URL that the other URLs don’t matter? Besides, they all point here anyway.”

“True, but that’s a shorter URL that you might want to use instead of my canonical one.”

“Well, how was I supposed to know that? All it tells me is that it’s another URL that points to your canonical one.”

“It’s implied.”

“Implied?”

“Yeah. A community agreed that rev="canonical" would mean that the URL identified by the href is a shorter form for my canonical one.”

“Why didn’t they just make it explicit with something like a rel="alternate shorter" attribute instead of being ambiguous about it?”

And that’s the crux of my argument.

A couple of other notes…

On my previous post, Matt Cutts makes an excellent point about the danger of allowing documents to claim canonical-ness over other URLs. But if we follow Google’s lead with rel=”canonical” and restrict rev=”canonical” to URLs in the same domain as the canonical one, then we lose the ability to specify shorter domains, which short URLs often rely on. In fact, in Simon Willison’s post about his own rev=”canonical” implementation, he mentions that he bought a new domain name for his links.
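To make the same-domain concern concrete, here is a hypothetical host check in Python. It is only an assumption about what such a restriction might look like, not Google’s actual rule, and the benramsey.com/382 URL is made up for illustration.

```python
from urllib.parse import urlsplit


def same_host(canonical_url: str, other_url: str) -> bool:
    """Hypothetical same-domain rule: accept the link only if both URLs share a host."""
    return urlsplit(canonical_url).hostname == urlsplit(other_url).hostname


canonical = "http://benramsey.com/archives/summarizing-my-revcanonical-argument/"
print(same_host(canonical, "http://benramsey.com/382"))  # True
print(same_host(canonical, "http://brtny.me/382"))       # False: the short domain is rejected
```

Under a rule like this, the very thing that makes a short URL short (a separate, shorter domain) is what disqualifies it.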

Bradley Holt makes an interesting point on Twitter about the use of the alternate keyword. Alternate is supposed to refer to another representation of the current document with the same content. Can it also refer to another representation of the URL that points to the current document?


A rev=”canonical” Rebuttal

Sat, 11 Apr 2009 4:23 UTC

There’s a lot being said about rev=”canonical”. Others have already explained what it is and made the arguments for it, so I won’t go into all of that, but I would like to offer a rebuttal (to play devil’s advocate, so to speak) in the hope that we’ll all slow down and think about what we’re doing before we jump all-in and implement something that may not be a good standard for the Web and could lead to more problems down the road.

First, let’s look at rev=”canonical” from the perspective of a purist. HTML 5 does not include the rev attribute on the link or a elements. It was dropped after a lot of discussion, so trying to reintroduce it at this point is a fruitless effort. The community has already decided against it. Why bring it back to the table?

Furthermore, I thought we had moved beyond encouraging people to break the standards. Rather, we want to encourage people to follow standards and not make their own. Creating your own standards leads to differentiation and specialization in client applications (browsers), and some browsers will end up supporting the new features, while others will not. The frustrations faced by client-side developers attempting to program for multiple clients have taught us that this is not desirable.

That said, if the microformats and HTML 5 communities are interested in revisiting this and considering rev for inclusion in HTML 5, then that objection goes away, but I still think there are some grievous pragmatic problems with rev=”canonical”.

The first is this: rev is too damned confusing to understand. If it takes a two-hour conversation on IRC to explain what rev=”canonical” means, then something is wrong. Developers should be able to understand the semantic meaning of rev=”canonical” at first sight and without the need to dig through multitudes of documentation and blog posts to grok the concept of rev.

I think this confusion will ultimately render rev=”canonical” meaningless to clients and search engines. What rev=”canonical” really means is this: “I (the URL of the current document) am the canonical URL for that URL over there (the one specified in the href of the link tag).” What I think will happen, though, is that people will misunderstand this, as previous usage of rev has shown. This misunderstanding could lead to the following improper implementations.

Let’s say, for example, that the current document’s canonical URL is http://example.org/2009/04/10/a-rebuttal-for-rev-canonical. A shortened form might be http://example.org/revcanonical. Knowing this, the correct implementation for rev=”canonical” would be:

<link rev="canonical" href="http://example.org/revcanonical" />

Thus, when a client reads http://example.org/2009/04/10/a-rebuttal-for-rev-canonical, it sees this link and understands that the current URL is the canonical URL for http://example.org/revcanonical.

I foresee that implementers could easily misunderstand this and implement it like this:

<link rev="canonical" href="http://example.org/2009/04/10/a-rebuttal-for-rev-canonical" />

In this case, the link is self-referential, and the value of rev=”canonical” is lost, since no short form is specified. However, this won’t lead to any problems, since the default URL for a document not containing rev=”canonical” is the original URL, according to RevCanonical.

What will lead to problems is people mistaking rev for rel (after all, rev isn’t in the HTML 5 spec, so maybe rel will work, or so the thinking may go) and implementing it like this:

<link rel="canonical" href="http://example.org/revcanonical" />

This means something entirely different. This tells Google (and maybe other search engines) that the canonical URL of the current document is the value of the href. It is the inverse of rev=”canonical”. This might not lead to a problem quite as drastic as the linkrot apocalypse, but it might lead to inaccurate URLs being stored in search engines and could negatively affect your SEO.
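To see how far apart these readings are, here is a Python sketch of a client that collects link elements with the standard library’s html.parser and reports what each one claims, using the three snippets above. The `interpret` function and its messages are hypothetical; real clients and search engines may treat these links differently.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the rel, rev, and href attributes of every <link> element."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <link ... /> tags also arrive here via handle_startendtag.
        if tag == "link":
            a = dict(attrs)
            self.links.append({
                "rel": (a.get("rel") or "").lower().split(),
                "rev": (a.get("rev") or "").lower().split(),
                "href": a.get("href"),
            })


def interpret(html: str, page_url: str):
    """Report what a client might conclude from each link element on page_url."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    findings = []
    for link in parser.links:
        if "canonical" in link["rev"]:
            if link["href"] == page_url:
                findings.append("rev=canonical is self-referential: no short URL")
            else:
                findings.append(f"{page_url} is canonical for {link['href']}")
        if "canonical" in link["rel"]:
            findings.append(f"{link['href']} is canonical for {page_url}")
    return findings


page = "http://example.org/2009/04/10/a-rebuttal-for-rev-canonical"
for snippet in (
    '<link rev="canonical" href="http://example.org/revcanonical" />',
    f'<link rev="canonical" href="{page}" />',
    '<link rel="canonical" href="http://example.org/revcanonical" />',
):
    print(interpret(snippet, page))
```

The first and third snippets differ by a single attribute name, yet the client draws inverse conclusions from them. And if a page carried many rev=”canonical” links, `interpret` would return one finding per link, with nothing to indicate which one, if any, is the short form.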

Finally, earlier I said that rev=”canonical” means “I am the canonical URL for that URL over there.” In this case, “that URL over there” just happens to be a shorter one, but semantically, rev=”canonical” says nothing about it being shorter, and that is another problem with this approach.

A canonical URL is just that: canonical. It is the primary URL used to refer to the resource. All other URLs referring to the resource are secondary and unimportant (except insofar as they direct us to the primary URL). In fact, there could be any number of secondary URLs that direct clients to the canonical one. By specifying rev=”canonical”, you assign importance to the link identified by the href, but you don’t express why it is important, except to say that it is another URL that points to this canonical one. A document could carry hundreds of rev=”canonical” links. How would an implementer choose the proper one to use as the short URL?

This is why I think better semantics are necessary. I see no need to specify that “this is the canonical URL for that URL over there.” If things are set up correctly, “that URL over there” will tell search engines and clients that it isn’t the canonical URL by responding with a 301 or 302 redirect. Instead, the canonical URL should tell clients that there is a preferred shorter form of the URL that may be used if desired, and I think the best way to do that is with a rel attribute specifying an alternate URL form for the current document. The RevCanonical folks also identify this form:

<link rel="alternate shorter" href="http://example.org/revcanonical" />

All of the aforementioned problems are solved with this usage: it doesn’t break the HTML 5 standard, it isn’t confusing and can be understood by developers without the need for long discussions, and it doesn’t imply that the current document is attempting to identify all URLs for which it is the canonical one.
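For completeness, here is a minimal Python sketch of the other half: the short URL answers with a 301 pointing at the canonical URL, so it never claims canonical status for itself, while the canonical page advertises the short form with rel=”alternate shorter”. The `CANONICAL` mapping, `resolve` function, and handler class are hypothetical; in practice the mapping might be keyed by post ID.

```python
from http.server import BaseHTTPRequestHandler

# Hypothetical mapping from short paths to canonical URLs.
CANONICAL = {
    "/revcanonical": "http://example.org/2009/04/10/a-rebuttal-for-rev-canonical",
}


def resolve(path: str):
    """Return (status, headers) for a request to the short URL."""
    target = CANONICAL.get(path)
    if target is None:
        return 404, {}
    # 301 Moved Permanently tells clients and search engines where the canonical URL lives.
    return 301, {"Location": target}


class ShortURLHandler(BaseHTTPRequestHandler):
    """Serve resolve() over HTTP for GET and HEAD requests."""

    def do_GET(self):
        status, headers = resolve(self.path)
        self.send_response(status)
        for name, value in headers.items():
            self.send_header(name, value)
        self.end_headers()

    do_HEAD = do_GET
```

To serve it, one could pass `ShortURLHandler` to `http.server.HTTPServer`; the routing lives in a plain function so it can be checked without a running server.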

So, why does RevCanonical specify two forms that serve the same purpose, and why do they advocate for the one that violates the HTML 5 spec and is confusing as hell to explain when there is already a form (suggested by themselves) that doesn’t violate the standard and is easy to understand?

Next steps: I’ve already added the “shorter” rel value to the WHATWG wiki, and I’ve mentioned it on the #microformats IRC channel. It will be a big uphill battle to get them to reconsider adding rev back to the HTML 5 spec, but I think the low-hanging fruit is getting the “shorter” rel type added, and I think there’s a good case for adding it. The danger now is how many early adopters implement rev=”canonical” in the meantime. It looks like people are starting to add it, and that worries me a bit.
