Google’s original Anti-Spam Hero Matt Cutts stated in a webmaster video that 25-30 percent of the Web is duplicate content. Clearly, this is quite a challenge when indexing the Web, even for the world’s largest search engine. Are you making Google’s job of indexing and ranking your web pages easier by ensuring you only have single URL paths to each page of content on your website? Consider your tracking URLs, filtered/faceted URLs, etc.
Duplicate URL paths are often the cause of duplicate content, which was discussed above. This section focuses specifically on the URL-path part of the problem (as opposed to manufacturer product descriptions and other duplicate content issues).
How to Assess
Dan Kern wrote a fantastic guide to identifying and fixing “thin” and duplicate content on eCommerce sites. Check it out for more detailed assessment instructions. For now, you just need to get a general idea of whether or not this is a problem for you. Check out the areas below. If anything concerns you, give your team a few example URLs and ask what they’re doing about them. Some of them may already be fixed in various ways, including rel canonical tags, robots.txt disallows and robots noindex meta tags.
If product category pages, blog archives and other sections of your site have multiple paginated pages of results, you should look into whether there is a duplicate content issue and how it is being handled. Is there a rel canonical tag on the paginated pages? Are they using rel next and rel prev tags to indicate the relationship between pages? If these pages are blocked in the robots.txt file, you may be keeping Google from crawling your site efficiently, which can cause a variety of problems. The same applies if they’re using the “nofollow” robots meta tag on those pages. Note: Use of “noindex” would be fine, as long as it’s not on the main page (i.e., the first page in the series).
Look into this as much as you feel comfortable doing. All you really need are a few example URLs for paginated pages that you can send to the development team to ask how they are being handled.
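If you or your developers want a quick way to see what a paginated page is telling search engines, here is a minimal sketch in Python. It only reads HTML you paste into it (it does not crawl anything), and the sample markup and store URLs are made up for illustration.

```python
from html.parser import HTMLParser


class HeadSignals(HTMLParser):
    """Collects the duplicate-content signals found in a page's HTML."""

    def __init__(self):
        super().__init__()
        self.canonical = None  # href of <link rel="canonical">
        self.rel_next = None   # href of <link rel="next">
        self.rel_prev = None   # href of <link rel="prev">
        self.robots = []       # directives from <meta name="robots">

    def handle_starttag(self, tag, attrs):
        attrs = {name: (value or "") for name, value in attrs}
        if tag == "link":
            rel = attrs.get("rel", "").lower()
            href = attrs.get("href")
            if rel == "canonical":
                self.canonical = href
            elif rel == "next":
                self.rel_next = href
            elif rel == "prev":
                self.rel_prev = href
        elif tag == "meta" and attrs.get("name", "").lower() == "robots":
            content = attrs.get("content", "")
            self.robots = [d.strip().lower() for d in content.split(",") if d.strip()]


def inspect_page(html):
    """Parse an HTML string and return the signals it contains."""
    parser = HeadSignals()
    parser.feed(html)
    return parser


# Made-up markup for page 2 of a paginated category.
sample = """
<html><head>
<link rel="prev" href="http://www.store.com/widgets/">
<link rel="next" href="http://www.store.com/widgets/?page=3">
<meta name="robots" content="noindex, follow">
</head><body></body></html>
"""

signals = inspect_page(sample)
print("canonical:", signals.canonical)
print("prev/next:", signals.rel_prev, signals.rel_next)
print("robots:", signals.robots)
```

A page that prints no canonical, no prev/next and no robots directives is a good example URL to send to the development team.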
Sorting and Filtering
This is a group of features that have similar purposes and cause similar issues. It encompasses everything from dynamic, faceted navigation to re-sorting the results based on ratings or price. It is likely your store incorporates one or more of these features. While the topic is too large to go into detail here, we’ve provided links to a couple of resources below.
Though it was published in 2011, this post by Mike Pantoliano on the Moz Blog remains one of the best resources on the Internet for getting a solid understanding of the issues and potential solutions when building faceted navigation (that doesn’t suck).
Google: Faceted Navigation Best (and 5 of the worst) Practices
This post from Feb. 12, 2014, on the Google Webmaster Central blog is written with SEOs and developers in mind. It would be a good resource to send your team just to make sure you’re following Google’s guidelines unless there is a good reason not to.
To sum up the issue of sorting and filtering category/search results on an eCommerce site, you’ll want to make sure that Google has a clear, accessible path to all of your products (not including sitemaps). At the same time, you need to make sure Google hasn’t indexed a bunch of different URLs for each category that all carry the same basic content, just with the results sorted or filtered differently.
When in doubt, express your concerns to the SEO and development team and seek their input.
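To get a rough feel for how many sorted, filtered or tracking URLs collapse into the same category page, something like the following sketch could be run over a list of URLs from a crawl or analytics export. The parameter names in IGNORED_PARAMS are assumptions for illustration; your store will use its own.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names; replace with the ones your store actually uses.
IGNORED_PARAMS = {"sort", "order", "color", "size",
                  "utm_source", "utm_medium", "utm_campaign"}


def strip_ignored_params(url):
    """Return the URL without sorting, filtering or tracking parameters."""
    parts = urlsplit(url)
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key.lower() not in IGNORED_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def group_variants(urls):
    """Map each base URL to its variants, keeping only bases with more than one."""
    groups = defaultdict(list)
    for url in urls:
        groups[strip_ignored_params(url)].append(url)
    return {base: found for base, found in groups.items() if len(found) > 1}


urls = [
    "http://www.store.com/widgets/",
    "http://www.store.com/widgets/?sort=price",
    "http://www.store.com/widgets/?color=red&utm_source=mail",
    "http://www.store.com/gadgets/?page=2",
]

for base, found in group_variants(urls).items():
    print(base, "has", len(found), "URL variants")
```

Any base URL that shows up with several variants is a candidate to raise with the SEO and development team.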
Products in multiple categories (or directories)
How content management systems (CMS) handle URL structures where products are placed in multiple categories of a taxonomy can get tricky. For example, if a product is placed in both category A and category B, and if category directories are used within the URL structure of product pages, then the CMS could potentially create two different URLs for the same product.
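As a rough illustration, the sketch below groups a list of URLs by product and reports any product reachable at more than one address. It assumes the last path segment identifies the product, which may not hold for every CMS; the store URLs are made up.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def product_slug(url):
    """Assumption: the last path segment identifies the product."""
    path = urlsplit(url).path.rstrip("/")
    return path.rsplit("/", 1)[-1]


def products_with_multiple_urls(urls):
    """Return {slug: [urls]} for every product found at more than one URL."""
    by_slug = defaultdict(set)
    for url in urls:
        by_slug[product_slug(url)].add(url)
    return {slug: sorted(found) for slug, found in by_slug.items() if len(found) > 1}


crawled = [
    "http://www.store.com/category-a/blue-widget/",
    "http://www.store.com/category-b/blue-widget/",
    "http://www.store.com/category-a/red-widget/",
]

for slug, found in products_with_multiple_urls(crawled).items():
    print(slug, "is reachable at", len(found), "URLs")
```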
As one can imagine, this can lead to devastating duplicate content problems for product pages, which are typically the highest converting pages on an eCommerce website. Common approaches to fixing this are:
Use root-level product page URLs
This type of URL structure would look like http://www.store.com/product-page/ or http://www.store.com/product-page.html. The lack of a clear directory structure makes the site’s organization more difficult for search engines to understand. The flatter site architecture does provide a shorter click-path to each product page, but we recommend at least separating them from the root by one directory (see next section).
With some eCommerce systems (e.g. Shopify) this is going to be the best option without incurring significant costs (which wouldn’t be worth it). Shopify is a very strong eCommerce system, one of three that we often recommend (along with Magento and Bigcommerce).
You just need to get a little more creative in how the “product” section of your website can be segmented in your analytics packages, including Google Search Console, Bing Webmaster Tools, Google Analytics, Omniture, etc. This is because it has no folder separating it from the rest of the site.
Shopify does give store owners the option of putting the category structure in the URL, but we don’t recommend that for reasons stated above. Here’s a discussion about it on Shopify’s site if you want to learn more about your options for that system.
Use /product/ URL directories for all products
This format lets you track all products as a group in analytics software. Troubleshooting and SEO analysis become easier with the use of advanced Google searches like site:YourStore.com inurl:product to see how many product pages are indexed. Is that more or fewer than your total SKU count?
It also makes the most logical sense, and search engines are all about logic.
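To show why a dedicated directory makes segmentation easier, here is a small sketch that counts URLs by their first directory and compares the product count with a SKU count. The URL list and the SKU figure are invented for illustration; in practice they would come from a crawl export and your catalog.

```python
from collections import Counter
from urllib.parse import urlsplit


def first_directory(url):
    """Return the first directory of a URL, or "(root)" for root-level pages."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    # A single segment (e.g. /about-us/) is treated as a root-level page.
    return segments[0] if len(segments) > 1 else "(root)"


def count_by_section(urls):
    """Count how many URLs fall under each first-level directory."""
    return Counter(first_directory(url) for url in urls)


crawled = [
    "http://www.store.com/product/blue-widget/",
    "http://www.store.com/product/red-widget/",
    "http://www.store.com/blog/launch-post/",
    "http://www.store.com/about-us/",
]
sku_count = 2  # hypothetical total from the product catalog

sections = count_by_section(crawled)
print("product URLs:", sections["product"], "SKUs:", sku_count)
```

With root-level product URLs, every product would land in the "(root)" bucket along with pages like /about-us/, which is the segmentation headache described in the previous section.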
Pro Tip: Use product URLs built upon category URL structures, but ensure each product page URL has a single, designated canonical URL.
It may not be possible for you to change the product URL structure without a significant investment. One option may be to use rel canonical tags to indicate which version of the multiple types of product URLs available you would like Google to display in their search results.
Even when there are easy checkboxes to click in the eCommerce system’s back-end, existing websites need to be sure that all the old URLs will be “301 redirected” to the new version. By all means, new websites should get it right the first time. But existing eCommerce websites that have been around for more than a few months should weigh the costs and benefits of rewriting all of the URLs.
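Before any URL rewrite, it helps to have the full old-to-new mapping written down so nothing is left without a 301. The sketch below builds such a map for a hypothetical move from category-based product URLs to a /product/ directory. It assumes the last path segment is the product slug, and it only produces the mapping; how the redirects are actually served depends on your platform.

```python
from urllib.parse import urlsplit


def new_product_path(old_url):
    """Assumption: the last path segment is the product slug."""
    slug = urlsplit(old_url).path.rstrip("/").rsplit("/", 1)[-1]
    return f"/product/{slug}/"


def build_redirect_map(old_urls):
    """Return {old_path: new_path} for every URL whose path would change."""
    redirects = {}
    for old_url in old_urls:
        old_path = urlsplit(old_url).path
        new_path = new_product_path(old_url)
        if old_path != new_path:
            redirects[old_path] = new_path
    return redirects


old_urls = [
    "http://www.store.com/category-a/blue-widget/",
    "http://www.store.com/category-b/blue-widget/",
    "http://www.store.com/product/red-widget/",
]

for old_path, new_path in build_redirect_map(old_urls).items():
    print("301", old_path, "->", new_path)
```

The size of this map is also a reasonable first input when weighing the cost of a rewrite against its benefit.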
http://website.com vs. http://www.website.com
Technically, your developers may say that “www” is a superfluous subdomain. Let’s not get into this debate. For now, just know that you need to choose one or the other. That choice should be represented in your website’s navigation, internal links, branding (e.g., product packaging, ads, etc.), redirects and/or rel canonical tags.
If you’re not sure how this is being handled and can access both versions of the URL in your browser, bring it up with your developers and seek input.
HTTP vs. HTTPS
As with www vs. non-www, whether you choose to use the secure HTTPS or the standard HTTP protocol is up to you and your unique circumstances. It certainly does seem that the Web is moving toward HTTPS for a variety of good reasons. However, moving an entire website to HTTPS does come with the risks associated with any website migration.
To make a long story short, your main site should only be accessible from one or the other. This excludes all shopping cart or checkout pages, which should always be accessible only over HTTPS.
Trailing slash / vs. non-trailing slash
For now, just know that you need to choose one or the other. That choice should be represented in your website’s navigation, internal links, branding (e.g., product packaging, ads, etc.), redirects and/or rel canonical tags.
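The three either/or choices above (www vs. non-www, HTTP vs. HTTPS, trailing slash vs. none) can be written down as a single rule that turns any variant into your preferred URL. The sketch below assumes HTTPS, www and a trailing slash purely for illustration; the point is that whichever options you pick, every variant should resolve to exactly one address.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed site-wide choices for illustration; pick your own and apply them everywhere.
PREFERRED_SCHEME = "https"
USE_WWW = True
USE_TRAILING_SLASH = True


def preferred_url(url):
    """Rewrite a URL variant into the single preferred form."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    bare = host[4:] if host.startswith("www.") else host
    host = "www." + bare if USE_WWW else bare

    path = parts.path or "/"
    last_segment = path.rsplit("/", 1)[-1]
    if "." not in last_segment:  # leave file-like paths such as /page.html alone
        if USE_TRAILING_SLASH and not path.endswith("/"):
            path += "/"
        elif not USE_TRAILING_SLASH and path != "/":
            path = path.rstrip("/")
    return urlunsplit((PREFERRED_SCHEME, host, path, parts.query, ""))


variants = [
    "http://website.com/widgets",
    "http://www.website.com/widgets/",
    "https://website.com/widgets",
    "https://www.website.com/widgets/",
]

print({preferred_url(url) for url in variants})
```

If the set printed at the end contains more than one URL for what is really one page, redirects or rel canonical tags are not yet consistent with the choice you made.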
Rel canonical tags
This tag can solve a lot of problems, but it can create some big ones too if not used properly. The big thing to avoid is a “self-referencing rel canonical” on every version of a page, duplicates included. In other words, if URL A and URL B show the same content and each has a rel canonical tag that says it’s the canonical (primary, main, true, etc.) version of the page, Google gets very confused by the conflicting statements.
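One way to spot this kind of conflict is to take a group of URLs you know show the same content and check whether they all declare the same canonical. The sketch below assumes you have already collected each URL's canonical into a dictionary (canonical_of is a made-up name, as are the URLs).

```python
def conflicting_canonicals(duplicate_group, canonical_of):
    """Return the distinct canonical targets if the group declares more than one."""
    targets = {canonical_of.get(url) for url in duplicate_group}
    targets.discard(None)  # URLs with no canonical tag declare nothing
    return sorted(targets) if len(targets) > 1 else []


group = [
    "http://www.store.com/widgets/",
    "http://www.store.com/widgets/?sort=price",
]

# Every duplicate points at itself: two conflicting claims.
self_referencing = {url: url for url in group}

# Every duplicate points at the same primary URL: one consistent claim.
consistent = {url: "http://www.store.com/widgets/" for url in group}

print(conflicting_canonicals(group, self_referencing))
print(conflicting_canonicals(group, consistent))
```

The first call lists both URLs as competing canonicals, while the second returns an empty list because the group agrees on one primary version.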