If you are a company with a multi-language and multi-country approach (what I call the global reach, local touch model), then localizing websites which share a common language presents a particular set of challenges. These challenges have come into sharp focus since Google announced new markup in early December, the rel-alternate-hreflang tag, so that it can more easily serve users what it might otherwise deem duplicate content, in the right language and for the right country.
Duplication is an issue for same-language country websites because much of their content is the same throughout. Depending on where the user is, Google will serve up only one version of that duplicated content; the challenge lies in serving the right version, for instance, the .com/uk/ product page in the UK rather than the same product page that sits on .com.
Now, Google has long used several methods to determine which content to serve to users in different countries. You’re pretty safe if your content is on a country code Top Level Domain (ccTLD); for Google, that’s the strongest indication that your content is targeted to users of that country. You’re also pretty safe if your content has been translated (and not machine translated – that would be considered spam) into another language. And Google does look at whether the content itself has been localized: for spelling, local currencies, local contacts and addresses, prices, etc. There are other factors as well: IP addresses, and the geotargeting tools in Webmaster Central, for instance.
It can get complicated
But what if you don’t have a ccTLD (many companies don’t)? Or if you have a very large number of sub-directories or folders – and little translated content? Or, quite commonly, your .com site dwarfs all other sites in terms of traffic and inbound links? In these instances, it’s quite possible that Google will not serve the page you intended for your specific language and country.
The rel-alternate-hreflang tag seems designed to address these issues. As Pierre Far, Webmaster Trends Analyst at Google, explained in a blog post by Gemma Birch on the Multilingual Search blog:
The aim of rel-alternate-hreflang is to help us show the most relevant page on your site to our searchers based on language and, optionally, country. Forming page-level relationships like this goes well beyond simply geotargeting whole sites to specific countries, and allows for smarter handling.
The jury is still out as to how widely adopted this markup will be. Apparently, few companies have implemented it so far (3M has – check the comments of the Google Webmaster post. I tested a couple of their pages and it does seem to work, though their sites are on their own ccTLDs).
One issue seems to be that, although the solution does work, it is also a lot of work, since it involves adding a lot of code to an awful lot of pages. Understandably, many companies are still in wait-and-see mode, especially as Google is still refining its approach. But it’s definitely something that companies who are having trouble getting the right content to users in the right markets should look into.
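To give a sense of what "a lot of code on a lot of pages" means in practice, here is a minimal sketch in Python that generates the rel-alternate-hreflang link elements for one page. The function name hreflang_links, the example.com URLs and the two locale codes are my own illustrative assumptions, not something taken from Google's announcement; treat it as a rough idea of the markup rather than a reference implementation.

```python
from html import escape


def hreflang_links(variants):
    """Return one <link> element per language/country variant of a page.

    `variants` maps an hreflang value (for example "en-gb") to the URL of
    that variant. The name and shape of this helper are hypothetical.
    """
    lines = []
    for code, url in sorted(variants.items()):
        lines.append(
            '<link rel="alternate" hreflang="{}" href="{}" />'.format(
                escape(code, quote=True), escape(url, quote=True)
            )
        )
    return "\n".join(lines)


# Hypothetical product page that exists on .com and in a /uk/ folder.
product_page = {
    "en-us": "http://www.example.com/products/widget",
    "en-gb": "http://www.example.com/uk/products/widget",
}

print(hreflang_links(product_page))
```

The resulting elements would go in the head of each variant of the page, which is why the effort grows quickly: every localized page needs its own set of links, one per language and country version.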
A few quick tips
Use country-code top-level domain names. Yes, you need to purchase them, and it does require more infrastructure, but then again, if you plan to expand globally, do you really want other companies to squat your domain name?
Use geotargeting settings. You can specify a country using the Webmaster Central tools, but these are inappropriate if you are providing your site in multiple languages without specifically targeting countries.
Make it easy to change country/language throughout the site if you think there’s a chance people may end up on the wrong page.
Do not machine translate your content and expect Google to find it. Google will interpret this as spam. Google suggests instead offering your users the ability to machine translate the content themselves; that way, things are clear and users are not misled as to the expected quality of the content.
Localize your content. Use geographically specific information on your pages: contacts, prices, currencies, spelling, different keywords... Your content should be 30% different.
I’ve been reading up quite a bit on this subject recently, because it’s been an issue on recent assignments, and I’m not an expert. If you want to take it further, here are some resources you might want to look at:
Videos from Matt Cutts, Head of Google’s Webspam team
Does translated content cause a duplicate content issue? No, unless you just throw it into Google Translate. Auto-generated translations are likely to be considered as spam.
Is the same content posted under different TLDs a problem? If you’re on different ccTLDs, no, but if you’re using folders or directories, it may or may not be. It depends on whether you have exactly the same content in 45 folders or in 4 or 5, and on whether you’ve localized the content somewhat to account for different spelling, currencies, contacts, prices, etc.
Contenu dupliqué : Définition, types de contenus dupliqués, solutions ("Duplicate content: definition, types of duplicate content, solutions"; in French, use Google Translate if necessary)
For really technical discussion on implementation, I found this Google+ discussion hub hosted by Google’s Pierre Far.