Canonicalization. What is that?  Sounds pretty scary.  Right?

It can be if your website is not set up correctly. The good news is that this is a relatively easy fix once you understand what it is that you’re fixing. :)

The main idea of this post is that there should be only one URL for each page on your site. You may not realize this, but even though both URLs below take you to the home page of SavvySite.com, if the canonical formatting is not set up correctly, search engines may actually interpret the two URLs as representing completely different websites.

Why is Improper Canonicalization Bad For Your Website?

It may not seem like a big deal. However, not having your domain set up correctly can dilute the equity of your site and can lead the search engines to believe that your site has duplicate content … from your own site!

Dilution of Link Authority

First, you lose link authority. If visitor 1 comes to ‘www.savvysite.com’ and links to that page, visitor 2 lands on ‘http://savvysite.com’ and links to that URL, and visitor 3 lands on ‘http://www.savvysite.com/index.html’ and links to that page, Googlebot sees three links to three different pages, and applies 1 ‘vote’ to each one.

These three links could have sent three authoritative signals to Googlebot for my site’s home page. Instead, they’re split into three weaker individual votes for three different pages. It’s as if Ross Perot or Ralph Nader were sitting in front of my site, siphoning off votes. It’s link love mayhem.

If SavvySite.com were set up so that its home page ‘lived’ at one unique URL, ‘http://SavvySite.com’, then all 3 visitors would have linked to that page, and Googlebot would instead apply all three votes to a single page.
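Here is a rough sketch of that vote-splitting arithmetic in Python. It assumes the home page should live at ‘http://savvysite.com/’, and the `canonical` helper is a made-up illustration that only handles the three variants from the example. It is not how Googlebot actually counts links.

```python
from collections import Counter
from urllib.parse import urlsplit


def canonical(url: str) -> str:
    """Collapse the home page variants from the example into one URL.

    Hypothetical helper: it only strips a leading 'www.' and a trailing
    '/index.html', which is enough for the three links in the post.
    """
    if "://" not in url:
        url = "http://" + url  # bare 'www.savvysite.com' style links
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    path = parts.path
    if path in ("", "/index.html"):
        path = "/"
    return f"http://{host}{path}"


# The three links from visitors 1, 2 and 3.
inbound_links = [
    "www.savvysite.com",
    "http://savvysite.com",
    "http://www.savvysite.com/index.html",
]

# Counted as written: three different pages with one vote each.
raw_votes = Counter(inbound_links)

# Counted after canonicalization: one page with all three votes.
merged_votes = Counter(canonical(url) for url in inbound_links)

print(raw_votes)
print(merged_votes)
```

The first counter has three entries with one vote apiece; the second has a single entry holding all three.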

Don’t Make it Hard for Search Engines to Crawl Your Website

Search engines frequently crawl your site, looking for new content. However, if they find many pages that each have multiple URLs, visiting search bots waste time tracking down all of those different versions. Search engines allocate resources for each crawl, and no one knows exactly how, but it’s safe to say Googlebot won’t just wander around your site until it’s found every page. At some point, it gives up and leaves. The time spent on duplicate versions is time it could have spent crawling other unique pages instead.

So fewer unique pages of my site end up in the search index, and I have fewer chances to rank.

Duplicate Content

Having one page available via multiple URLs can also mislead search engines by making them believe that you are stealing someone else’s content when it is actually your own!

Let’s say that you are a real estate agent and you just posted an article on ways that a seller can improve their chances of selling their home. If your website’s canonical formatting is not set up correctly, then the search engines will see the URLs below as two different sites with the same content!

You worked hard for your keyword-rich, relevant content. The last thing that you need is to have a search engine think that it is duplicate content and penalize you for it.

  • Set your preferred domain
  • Specify the canonical link for each version of a page
  • Indicate your canonical (preferred) URLs by including them in a Sitemap
  • Indicate how you would like Google to handle dynamic parameters

Fix That Canonicalization!

You can avoid the heartbreak of bad canonicalization, or at least minimize it, by doing a few simple things:

– Use 301 redirection to ensure that your home page is only found at one URL.

– Link consistently to your home page from within your own site. Use a single URL for your home page. Don’t mix in instances of ‘http://savvysite.com/index.html’ with ‘http://savvysite.com’. If you aren’t doing this properly right now, a quick change may have a big impact on SEO.

– Don’t use tracking IDs in internal site navigation. A lot of sites add stuff like ‘?source=blog’ in their navigation. That lets them use their analytics reports to track user movement within, to and from their site. Instead, learn to use your web analytics referrer and navigation path reports. If you must use tracking IDs, change your software to use a hash mark (a ‘#’ sign) instead of a question mark. Search engines ignore everything after the hash, so you’ll avoid confusion.

– Don’t use tracking IDs in organic links from other sites. If you get a link on another site, and want it to help with your SEO, don’t put a tracking ID in that, either.

– Be careful with pagination. Many sites have pagination, where visitors can click a 1, 2, 3 etc. to jump to later pages in search results, product lists or articles. That’s fine, but make sure that each page has a single URL. For example, if page 1 of the article is ‘http://savvysite.com/article.html’ when I click the article link from the home page, make sure that the number ‘1’ in the pagination takes me there, too, instead of to ‘http://savvysite.com/article.html?page=1’.

– Set up preventative redirects. Make sure that ‘http://Savvysite.com’ 301 redirects to ‘http://www.savvysite.com’.

– Exclude ‘e-mail a friend’ pages. Most content management systems that have ‘e-mail a friend’ options direct the user to a unique page that has the same form and content. But every instance of that page has a unique URL like ‘ID=123’, to tell the server which product or article to forward. It’s canonical higgledy-piggledy. Use robots.txt and the meta robots tag to exclude these from search engine crawls.
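To make the redirect, tracking ID and pagination tips above a little more concrete, here is a minimal Python sketch of the decision behind a 301: given a requested URL, work out the single URL it should live at. Everything named here (`PREFERRED_HOST`, `TRACKING_PARAMS`, `canonical_url`, `redirect_for`) is made up for illustration, and it assumes the ‘www’ form is the preferred one. On a real site you would normally set this up in your web server or CMS rather than in a script like this.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

PREFERRED_HOST = "www.savvysite.com"  # assumption: the 'www' form is preferred
BARE_HOST = "savvysite.com"
TRACKING_PARAMS = {"source"}          # assumption: internal tracking IDs to drop


def canonical_url(url: str) -> str:
    """Return the one URL a page should live at (illustrative rules only)."""
    parts = urlsplit(url)

    # Host names are case-insensitive; send the bare domain to the www form.
    host = parts.netloc.lower()
    if host == BARE_HOST:
        host = PREFERRED_HOST

    # Treat '/index.html' as the directory it belongs to.
    path = parts.path or "/"
    if path.endswith("/index.html"):
        path = path[: -len("index.html")]

    # Drop tracking IDs, and drop 'page=1' because page 1 is the plain URL.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_PARAMS and not (key == "page" and value == "1")
    ]

    # The fragment (anything after '#') never reaches the server, so omit it.
    return urlunsplit((parts.scheme, host, path, urlencode(query), ""))


def redirect_for(url: str):
    """Return (301, target) if the URL is not canonical, otherwise None."""
    target = canonical_url(url)
    return None if target == url else (301, target)


for requested in (
    "http://Savvysite.com",
    "http://www.savvysite.com/index.html?source=blog",
    "http://www.savvysite.com/article.html?page=1",
    "http://www.savvysite.com/article.html?page=2",
):
    print(requested, "->", redirect_for(requested))
```

The first three requests get a 301 to a clean URL. The last one is already canonical, so no redirect is needed.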

What about rel=canonical?

The canonical tag is a neat little gadget that’s supposed to let you tell search engines the correct URL for any page. So, by adding <link rel="canonical" href="http://www.savvysite.com"> to any page, I could tell visiting search bots to index just that version, and to direct all link authority to that one URL. It sounds ideal.

It’s not. First, Yahoo! and Bing don’t yet have confirmed support for it. Second, you can’t rely on tags of this nature, as search engines may change their minds later. Google’s done it. So don’t stake your SEO strategy on it. Third, why not do it right the first time? In addition to SEO benefits, a canonically clean site should:

  • run faster
  • present fewer maintenance headaches
  • place less load on server and bandwidth resources
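
If you do have rel=canonical tags on your pages, it is worth knowing what they actually say. The sketch below uses Python’s built-in html.parser module to pull the canonical URL out of a page’s HTML. The `CanonicalFinder` class, the `find_canonical` function and the sample page are all made up for illustration; treat it as a starting point for an audit, not a finished tool.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> tag seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        # rel can hold several space-separated values, in any letter case.
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attributes.get("href")


def find_canonical(html: str):
    """Return the declared canonical URL, or None if the page has none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Hypothetical page source, using the tag from the section above.
page = (
    "<html><head>"
    '<link rel="canonical" href="http://www.savvysite.com">'
    "</head><body>Welcome</body></html>"
)

print(find_canonical(page))
```

Run it over each URL variant of a page: if the variants declare different canonical URLs, or none at all, the tag is not doing the job you hoped it would.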