Tuesday, 13 October 2015

Are hashbang URLs a recommended practice?

No, hashbang URLs are not a recommended practice. But it's important to be absolutely clear about what this means, so read on.

http://example.com/#foo is a hash URL. The #foo portion of the URL is never sent to the server, and will cause the page to automatically scroll to the first element with an id of "foo" (or the first <a> element with a name of "foo"). This is a perfectly good thing, and is a recommended practice for linking to specific sections within a page.
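If you want to see that split for yourself, here's a small sketch using the standard URL class (available in browsers and in Node.js). The helper name requestTarget is made up for this example; it returns roughly what ends up in the HTTP request line, which is everything except the hash.

```javascript
// Hypothetical helper: the part of a URL that actually travels to the server.
function requestTarget(href) {
  const parsed = new URL(href);
  // pathname + search go into the HTTP request; parsed.hash never leaves the browser.
  return parsed.pathname + parsed.search;
}

const url = new URL("http://example.com/#foo");
console.log(url.hash);                                  // "#foo"
console.log(requestTarget("http://example.com/#foo"));  // "/"
console.log(requestTarget("http://example.com/"));      // "/" (the same request)
```

As far as the server is concerned, both URLs are the same request.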

However, http://example.com/#foo could also be interpreted by JavaScript to indicate a particular state. Perhaps the "foo" state means "make an Ajax request to get the dictionary definition of the word 'foo' so we can display it to the user".
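As a rough illustration, that kind of hash-driven state might look like the sketch below. Everything here is an assumption made for the example: the stateFromHash helper, the shape of the state object, and the /define/ endpoint are all invented.

```javascript
// Hypothetical: turn a location.hash value into an application state.
function stateFromHash(hash) {
  // "#foo" -> "foo"; an empty hash means there is no state to show
  const word = decodeURIComponent(hash.replace(/^#/, ""));
  return word === "" ? null : { action: "define", word };
}

// Browser-only wiring; skipped when there is no window (e.g. in Node.js).
if (typeof window !== "undefined") {
  const render = () => {
    const state = stateFromHash(window.location.hash);
    if (!state) return;
    // "/define/" is an invented endpoint for this sketch.
    fetch("/define/" + encodeURIComponent(state.word))
      .then((res) => res.text())
      .then((text) => { document.body.textContent = text; });
  };
  window.addEventListener("hashchange", render);
  render(); // handle the hash that was present when the page loaded
}
```

Note that every line of this runs in the browser. The server never gets a say.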

Now you've got a problem, because while this logic will work just fine as long as the JS runs, nothing at all will happen if the JS doesn't run. Furthermore, if you ever decide to change your URL structure to use a real URL like http://example.com/foo instead, then you'll either have to break all existing hash URLs that link to your site, or you'll need to keep that hash-handling JavaScript on your page forever so it can redirect users to the new URL.
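To make the "keep that JavaScript forever" part concrete, here's a minimal sketch of such a redirect shim. It assumes the simplest possible mapping (http://example.com/#foo becomes http://example.com/foo) and that every hash on the page is an application state rather than an in-page anchor. The function name realUrlForLegacyHash is made up.

```javascript
// Hypothetical mapping: http://example.com/#foo -> http://example.com/foo
function realUrlForLegacyHash(href) {
  const url = new URL(href);
  const fragment = url.hash.replace(/^#/, "");
  if (fragment === "") return null; // no hash, nothing to redirect
  url.hash = "";                    // drop the old hash state
  url.pathname = "/" + fragment;    // promote it to a real path
  return url.href;
}

// Browser-only: this has to stay on the old page for as long as old links exist.
if (typeof window !== "undefined") {
  const target = realUrlForLegacyHash(window.location.href);
  if (target) window.location.replace(target);
}
```

The server can't do this redirect for you, because it never sees the #foo part in the first place.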

In addition, search bots ignore the hash portion of a URL when indexing a page, with exactly one exception: the GoogleBot (and only the GoogleBot) has some special and convoluted logic that allows it to recognize hashbang URLs like http://example.com/#!foo.

But there's a massive, massive caveat: the hashbang itself doesn't do a damn thing for you. All it does is tell the GoogleBot "hey, this website claims to support the Google Ajax Crawling Scheme". But the Ajax Crawling Scheme requires some pretty complicated server-side logic as well. You don't get it for free just by changing a # to a #!. So unless you've actually implemented the server-side logic necessary to support the Ajax Crawling Scheme, that #! in your URL is quite possibly damaging your Google rankings instead of helping them.
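For a feel of what that server-side work involves, here's a simplified sketch of the URL mapping as I understand the published scheme: the crawler rewrites the #! part into an _escaped_fragment_ query parameter, requests that URL, and expects your server to answer with an HTML snapshot of that state. The function names are invented, the escaping is my approximation of the spec's rules, and the snapshot rendering itself (the hard part) is left out entirely.

```javascript
// Crawler side (sketch): "#!state" becomes "?_escaped_fragment_=state".
function crawlerUrlFor(href) {
  const i = href.indexOf("#!");
  if (i === -1) return null; // plain hash URLs are not part of the scheme
  const base = href.slice(0, i);
  const fragment = href.slice(i + 2);
  // Approximation: escape control characters, space, #, %, &, + and non-ASCII.
  const escaped = fragment.replace(/[\x00-\x20#%&+\x7f-\u{10ffff}]/gu, encodeURIComponent);
  const separator = base.includes("?") ? "&" : "?";
  return base + separator + "_escaped_fragment_=" + escaped;
}

// Server side (sketch): detect a crawler request and recover the state.
function snapshotStateFor(requestUrl) {
  // The base is only needed so relative request URLs can be parsed.
  const url = new URL(requestUrl, "http://example.com");
  // null for ordinary visitors; otherwise the state to render as static HTML.
  return url.searchParams.get("_escaped_fragment_");
}

console.log(crawlerUrlFor("http://example.com/#!foo"));
// "http://example.com/?_escaped_fragment_=foo"
console.log(snapshotStateFor("/?_escaped_fragment_=foo")); // "foo"
```

If your server doesn't handle the second half, the crawler just gets your empty page shell back.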

Furthermore, since Google is the only search engine that currently supports the Google Ajax Crawling Scheme (hint: the "Google" in the name of the scheme means "we just came up with this nonsense ourselves and didn't bother actually asking any other search engines if they thought it was a good idea"), your hashbang URLs will only be indexed by Google, even if you jump through all the hoops required to make this work properly.

So, to sum up:

  • "Hash" URLs and "hashbang" URLs aren't the same thing, although "hashbang" has unfortunately become the generic name for "hash URLs that trigger JavaScript-based logic".
  • The hash portion of a URL is never sent to the server, so hash URLs are useless without JavaScript if you depend on them to trigger application logic.
  • Since hash URLs require JS, you're doomed to either break all existing URLs or maintain a JS URL handler forever if you ever decide to change your URL scheme.
  • "Hashbang" URLs don't automatically make your page indexable by search engines. You still need to do a lot of server-side work to make that happen. Even if you support the full Google Ajax Crawling Scheme correctly, that only helps you with Google. You're still screwed with the other search engines.

In short: relying on hash URLs for application logic should be an absolute last resort. If at all possible, you should avoid it.

Hey, if there's anybody I can help out there, just let me know. Thanks ;)
