Editor’s Note: This post was originally published in November of 2013. While a lot of the original content still stands, algorithms and strategies are always changing. So our team has updated this post for 2018 and we hope that it will continue to be a helpful resource.
For eCommerce SEO professionals, duplicate content and thin or low-quality content can spell disaster in the search engine rankings. As Google, Bing and other search engines become more sophisticated, they reward websites that present only unique, quality content to their search bots for indexation. In this guide, we dig into a wide variety of duplicate content scenarios commonly found on eCommerce websites.
What is Duplicate/Thin Content & Why Does it Matter?
Google began taking duplicate, scraped and thin content very seriously on February 24th, 2011 when they launched their first Panda algorithm update. According to their Content Guidelines, Google defines duplicate content as:
Duplicate content generally refers to substantive blocks of content within or across domains that either completely match other content or are appreciably similar. Mostly, this is not deceptive in origin. Examples of non-malicious duplicate content could include:
- Discussion forums that can generate both regular and stripped-down pages targeted at mobile devices
- Store items shown or linked via multiple distinct URLs
- Printer-only versions of web pages
According to their Affiliate Programs page, Google offers insight as to what they consider “thin content” within the context of affiliate websites, which can also be applied to eCommerce websites:
Google believes that pure, or “thin,” affiliate websites do not provide additional value for web users, especially if they are part of a program that distributes its content to several hundred affiliates. These sites generally appear to be cookie-cutter sites or templates with no original content. Because a search results page could return several of these sites, all with the same content, thin affiliates create a frustrating user experience.
Some examples of thin affiliates include:
- Pages with product affiliate links on which the product descriptions and reviews are copied directly from the original merchant without any original content or added value.
“Wait a second,” you might ask. “Didn’t you just create duplicate content by copying and pasting this text from Google’s own web pages?”
Not so fast. Let me explain. Duplicating portions of content is a natural part of the web. Whether a journalist is block-quoting text from another article (like I did above) or an eCommerce site is using the same product name as hundreds of other eCommerce websites, a small bit of duplicate content is inevitable.
What we should be worried about is having a large number of web pages on our websites that are mostly duplicate content, or product pages with such short product descriptions that the content can be deemed thin, and thus not valuable to either Google or the reader.
Our job as website publishers and content managers is to ensure that we are providing the most robust information possible to our readers. When we take this approach, we are rewarded by Google since this meets their quality guidelines.
But not all duplicate content is editorially created. There is a wide range of technical situations which can lead to duplicate content issues for which Google is likely to penalize your website. We’ll dive into many of these situations within this chapter so that you’re fully prepared to avoid duplicate content across your entire website.
Internal Duplicate Content (On-Site)
Duplicate content can exist internally on an eCommerce site in a plethora of ways, due to both technical and editorial causes. We’ll dive into some of the more common instances where internal duplicate content can rear its ugly head.
Internal “Technical” Duplicate & Low-Quality Content
Canonical URLs help search engines understand that only a single version of a page’s URL should be indexed, no matter what other URL versions are rendered in the browser, linked to from external websites, etc. Canonical URLs are extremely important in the case of tracking URLs, where tracking code (e.g., affiliate tracking or social media source tracking) is appended to the end of a URL on the site (e.g., ?a_aid=, ?utm_source=). They are also very helpful in fine-tuning indexation of category page URLs on eCommerce websites where sorting, functional and filtering parameters are added to the end of the base category URL to change the ordering or selection of products on a category page (e.g., ?dir=asc, ?price=10-). Ensuring that the canonical URL (in the <head> of the source code) is the same as the base category URL helps prevent search engines from indexing these duplicate URLs.
| URL/Page Type | Visible URL | Canonical URL |
|---|---|---|
| Base Category URL | http://www.domain.com/page-slug | http://www.domain.com/page-slug |
| Social Tracking URL | http://www.domain.com/page-slug?utm_source=twitter | http://www.domain.com/page-slug |
| Affiliate Tracking URL | http://www.domain.com/page-slug?a_aid=123456 | http://www.domain.com/page-slug |
| Sorted Category URL | http://www.domain.com/page-slug?dir=asc&order=price | http://www.domain.com/page-slug |
| Filtered Category URL | http://www.domain.com/page-slug?price=-10 | http://www.domain.com/page-slug |
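To make the table concrete, here is a minimal Python sketch of how a canonical URL could be derived from any visible URL. The parameter list and the function names are our own assumptions for illustration; your platform will have its own set of tracking, sorting and filtering parameters.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: parameters that only track, sort or filter. Adapt per site.
NON_CANONICAL_PARAMS = {
    "utm_source", "utm_medium", "utm_campaign",  # social/campaign tracking
    "a_aid",                                     # affiliate tracking
    "dir", "order", "price",                     # sorting and filtering
}


def canonical_url(visible_url: str) -> str:
    """Return the URL with tracking, sorting and filtering parameters removed."""
    parts = urlsplit(visible_url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in NON_CANONICAL_PARAMS
    ]
    # Drop the fragment as well; it never reaches the server.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_link_tag(visible_url: str) -> str:
    """Build the <link rel="canonical"> element for the page's <head>."""
    return f'<link rel="canonical" href="{escape(canonical_url(visible_url), quote=True)}">'
```

Every row in the table above maps to http://www.domain.com/page-slug with this sketch, while a parameter that changes the content (for example a hypothetical ?page=2) is left in place.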
It might also be beneficial to disallow crawling of the commonly used URL parameters via the /robots.txt file, in order to maximize crawl budget. Example:
User-agent: *
Disallow: *?dir=*
Disallow: *&order=*
Disallow: *?price=*

User-agent: *
Disallow: *?sid=*
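Before deploying rules like these, it helps to check which URLs they actually catch. The sketch below is a simplified matcher of our own, assuming the wildcard behavior Google documents for robots.txt (`*` matches any run of characters, a trailing `$` anchors the end, and rules match from the start of the path). It ignores Allow rules and rule precedence, so treat it as a rough test harness rather than a full parser.

```python
import re

# The three category-parameter rules from the example above.
DISALLOW_PATTERNS = ["*?dir=*", "*&order=*", "*?price=*"]


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regular expression."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape everything literally, then turn each '*' into '.*'.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(path_and_query: str, patterns=DISALLOW_PATTERNS) -> bool:
    """True if any pattern matches from the start of the path and query string."""
    return any(pattern_to_regex(p).match(path_and_query) for p in patterns)
```

Note that these patterns depend on parameter position: /page-slug?dir=asc&order=price is blocked, but /page-slug?order=price&dir=asc is not, because `order` follows a `?` and `dir` follows an `&`. If your platform can emit parameters in any order, you may need a rule for both the `?` and `&` forms of each parameter.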
Shopping Cart Pages
When users add products to their cart on your eCommerce website and view their cart, most CMS systems use URL structures that are specific to the shopping cart experience. These URLs might have “cart,” “basket,” or some other word as the unique identifier. These are not the types of pages that search engines wish to index, so identify them and set them to “noindex,nofollow” via a meta robots tag or X-Robots-Tag header to keep this low-quality content out of the index. Disallowing them in the /robots.txt file also saves crawl budget, but a crawler that is blocked from a page cannot see its noindex directive, so add the disallow rule only once the pages are out of the index or before any of them have been indexed.
Internal Search Results
Internal search result pages are produced when someone conducts a search using an eCommerce website’s internal search feature. They have no unique content, only repurposed snippets of content from other pages on your eCommerce website. Google’s own Matt Cutts has clearly stated that Google does not want to send users from its search results to your search results (source). Instead, it wants to send users to true content pages (product pages, category pages, static site pages, blog posts and articles).
This is an extremely common issue with eCommerce websites. Many CMS systems do not set internal search result pages to “noindex,follow” by default, so a developer will need to apply this rule. It’s also recommended to disallow search bots from crawling internal search result pages in the /robots.txt file, either once all of your internal search result pages have been removed from the index or before any of them get indexed. The fix is easy yet important: too many internal search results in Google’s index can lead to ranking penalties under Google’s Panda algorithm.
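The cart and internal search rules can be expressed as one small lookup that a template or response hook calls for every request. This is a rough Python sketch under our own assumptions: the path segments (`cart`, `basket`, `search`) and the function names are hypothetical and will differ by platform.

```python
from typing import Dict, Optional
from urllib.parse import urlsplit

# Hypothetical first path segments; real carts and search pages vary by CMS.
CART_SEGMENTS = {"cart", "basket"}
SEARCH_SEGMENTS = {"search"}


def first_segment(path: str) -> str:
    """Return the first path segment, lowercased, ignoring any query string."""
    return urlsplit(path).path.strip("/").split("/", 1)[0].lower()


def robots_directive(path: str) -> Optional[str]:
    """Pick the robots directive for a URL path, or None for indexable pages."""
    segment = first_segment(path)
    if segment in CART_SEGMENTS:
        return "noindex,nofollow"   # cart pages: keep out, don't follow links
    if segment in SEARCH_SEGMENTS:
        return "noindex,follow"     # search results: keep out, still follow links
    return None


def robots_meta_tag(path: str) -> str:
    """Render the meta robots tag for the <head>, or an empty string."""
    directive = robots_directive(path)
    return f'<meta name="robots" content="{directive}">' if directive else ""


def robots_header(path: str) -> Dict[str, str]:
    """Return the equivalent X-Robots-Tag response header, if any."""
    directive = robots_directive(path)
    return {"X-Robots-Tag": directive} if directive else {}
```

Matching on the whole first segment, rather than on a string prefix, avoids accidentally noindexing a category such as /cartoons/.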
Duplicate URL Paths
How CMS systems handle URL structures where products are placed in multiple categories of a taxonomy can get tricky. For example, if a product is placed in both category A and category B, and if category directories are used within the URL structure of product pages, then the CMS could potentially create two different URLs for the same product.
As one can imagine, this can lead to devastating duplicate content problems for product pages, which are typically the highest converting pages on an eCommerce website. Common approaches to fix this are:
- Use root-level product page URLs (unfortunately this removes keyword-rich, category-level URL structure benefits and also limits trackability in Analytics software).
- Use /product/ URL directories for all products (which at least offers grouped trackability of all products in Analytics software).
- Use product URLs built upon category URL structures, but ensure that each product page URL has a single, designated canonical URL.
In some instances, this situation can also arise with sub-category URLs where the products displayed might be exactly the same, or close to it. For example, a “Flashlights” sub-category might be placed under both /tools/flashlights/ and /emergency/flashlights/ on an Emergency Preparedness eCommerce website, and have mostly the same products. Taxonomy opinions aside, the same approaches used for product pages can be applied in these situations. Also, ensuring that robust intro descriptions exist atop the category pages would help ensure that each similar sub-category page has unique content.
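The third approach above (category-based product URLs with one designated canonical) can be sketched as follows. The `Product` structure, the `primary_category` field and the base URL are assumptions made for illustration, not how any particular CMS stores this.

```python
from dataclasses import dataclass
from typing import List

BASE_URL = "http://www.domain.com"  # placeholder domain used throughout this guide


@dataclass
class Product:
    slug: str
    category_paths: List[str]  # every category path the product is listed under
    primary_category: str      # the one path designated as canonical


def product_urls(product: Product) -> List[str]:
    """All URLs the CMS could render for this product, one per category."""
    return [f"{BASE_URL}/{path.strip('/')}/{product.slug}" for path in product.category_paths]


def canonical_product_url(product: Product) -> str:
    """The single URL that every variant should declare as its canonical."""
    if product.primary_category not in product.category_paths:
        raise ValueError("primary_category must be one of category_paths")
    return f"{BASE_URL}/{product.primary_category.strip('/')}/{product.slug}"
```

For a hypothetical product listed under both tools/flashlights and emergency/flashlights, both URLs still render, but each page would point its canonical at the tools/flashlights version.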
Product Review Pages
Many CMS systems come with built-in review functionality. Oftentimes, separate “review pages” are created to host all reviews for particular products, yet some (if not all) of the reviews are also placed on the product pages themselves. This can create duplicate content between the product pages and the corresponding review pages. These “review pages” should either be canonicalized to the main product page or set to “noindex,follow” via a meta robots tag or X-Robots-Tag header. The canonicalization method is preferred: if an external website links to a “review page,” the canonical passes that link equity to the product page.
It’s also critical to ensure that review content is not duplicated on external sites when using 3rd party product review vendors. For a deep dive into this topic, please read Product Review Vendors—Solutions to Fit Your eCommerce SEO Needs.
WWW vs. Non-WWW URLs & Uppercase vs. Lowercase URLs
Just as the Post Office would consider 123 Race Avenue and 123 Race Street different home addresses, search engines consider http://www.domain.com and http://domain.com different web addresses. Therefore, it’s critical that one version of URLs is chosen for every page on the eCommerce website. 301 redirecting the non-preferred version to the preferred version is the recommended solution to avoid these technically created duplicate URLs, per Google.
Tip: Google also allows webmasters to set up both the www and non-www version of domains within Webmaster Tools, and to set the preferred domain.
Uppercase and lowercase URLs need to be handled in the same manner. If both render separately, then search engines can consider them different URLs. It’s important to choose one format and 301 redirect the other version to it. We have a helpful article that offers instruction on how to do this: How to Redirect Uppercase URLs to Lowercase URLs Using Htaccess.
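As a rough illustration of the redirect logic (independent of whether it ends up in .htaccess or application code), the Python sketch below computes where a request should be 301 redirected. The preferred host and the alias list are assumptions; here we assume the www, lowercase version is the preferred one.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

PREFERRED_HOST = "www.domain.com"               # assumption: www is preferred
HOST_ALIASES = {"domain.com", "www.domain.com"}  # hosts that serve this site


def redirect_target(url: str) -> Optional[str]:
    """Return the URL to 301 redirect to, or None if the URL is already preferred."""
    parts = urlsplit(url)
    if parts.netloc.lower() not in HOST_ALIASES:
        return None  # not one of our hosts, leave it alone
    # Lowercase only the path; query values may legitimately be case-sensitive.
    preferred = urlunsplit(
        (parts.scheme, PREFERRED_HOST, parts.path.lower(), parts.query, parts.fragment)
    )
    return preferred if preferred != url else None
```

A request for http://domain.com/Page-Slug would be redirected to http://www.domain.com/page-slug in a single hop, which avoids chaining one redirect for the host and another for the case.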
Trailing Slashes on URLs
Similar to www and non-www URLs, search engines consider a URL with a trailing slash and the same URL without one to be different URLs. As an example, duplicate URLs are created when URLs such as /page/ and /page/index.html, or /page and /page.html, render the same content. It is especially problematic when /page and /page/ show the same content since, technically speaking, these two pages aren’t even in the same directory. Common approaches to fixing this problem are to either canonicalize both to a single version or 301 redirect one version to the other. Google’s John Mueller summarized the rule on Twitter:
I noticed there was some confusion around trailing slashes on URLs, so I hope this helps. tl;dr: slash on root/hostname=doesn’t matter; slash elsewhere=does matter (they’re different URLs)
John Mueller (@JohnMu), December 19, 2017
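To find these variants in a crawl export, you can group URLs by a key that ignores the trailing slash and the index.html or .html suffix. This is a small audit sketch of our own; it only flags candidates, and each group still needs a manual check that the URLs really render the same content.

```python
from collections import defaultdict
from typing import Dict, Iterable, List
from urllib.parse import urlsplit


def slash_key(url: str) -> str:
    """Collapse /page, /page/, /page.html and /page/index.html to one key."""
    parts = urlsplit(url)
    path = parts.path
    if path.endswith("/index.html"):
        path = path[: -len("index.html")]
    elif path.endswith(".html"):
        path = path[: -len(".html")]
    path = path.rstrip("/") or "/"
    return f"{parts.netloc.lower()}{path}"


def find_slash_duplicates(urls: Iterable[str]) -> Dict[str, List[str]]:
    """Group crawled URLs by key and keep only groups with more than one URL."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for url in urls:
        groups[slash_key(url)].append(url)
    return {key: found for key, found in groups.items() if len(found) > 1}
```

Each group returned is a set of URLs to either canonicalize to one version or 301 redirect to it.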
HTTPS URLs: Relative vs. Absolute Path
HTTPS (secure) URLs are typically created after a user has logged into an eCommerce website. Most of the time, search engines have no way of finding these URLs. However, there are instances where this is possible, such as when a logged-in Administrator is updating content and navigational links. In this scenario, it’s common for the Administrator not to realize that the embedded URLs use HTTPS instead of HTTP. When relative path URLs (which exclude the “http://www.domain.com” portion) are also used on the site, either in content or navigational links, a crawler that lands on one HTTPS page resolves every relative link to HTTPS as well, so search engines can quickly crawl hundreds, if not thousands, of HTTPS URLs that are technically duplicates of the HTTP versions.
The most common fix is to use absolute path URLs (which include the “http://www.domain.com” portion) and to ensure that canonical URLs always use the HTTP version. Using 301 redirects in these cases could easily break the user-login functionality, as the HTTPS URLs could no longer be rendered.
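The reason relative links spread HTTPS so quickly is that a crawler resolves them against the URL of the page it is currently on. The short Python sketch below uses the standard library’s `urljoin` to show this; the page URLs and the helper name are placeholders of our own.

```python
from typing import List
from urllib.parse import urljoin


def resolve_links(page_url: str, hrefs: List[str]) -> List[str]:
    """Resolve each href the way a crawler would, relative to the current page."""
    return [urljoin(page_url, href) for href in hrefs]


# One relative link and one absolute link, as they might appear in site navigation.
NAV_LINKS = ["/page-slug", "http://www.domain.com/other-page"]

# Crawled from an HTTP page, both links stay on HTTP.
from_http = resolve_links("http://www.domain.com/", NAV_LINKS)

# Crawled from an HTTPS page, the relative link inherits HTTPS,
# while the absolute link keeps the scheme it was written with.
from_https = resolve_links("https://www.domain.com/account/", NAV_LINKS)
```

In `from_https`, the relative link becomes https://www.domain.com/page-slug while the absolute link remains http://www.domain.com/other-page, which is why absolute paths contain the problem.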
Internal “Editorial” Duplicate Content
Shared Content Between Products
It’s easy to take shortcuts with product descriptions on eCommerce websites, especially with similar products. However, consider that Google judges the content of eCommerce websites much as it judges regular content sites. That alone should be enough to make a professional SEO realize that product page descriptions should be unique, compelling and robust, especially for mid-tier eCommerce websites that don’t have enough Domain Authority to compete with bigger competitors. Every little bit counts. Sharing short paragraphs, specifications and other content between product pages increases the likelihood that search engines will lower their perception of a product page’s content quality and, subsequently, its ranking position.
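One way to spot shared content between product pages is to score how much two descriptions overlap. The sketch below uses Jaccard similarity over three-word shingles, a common near-duplicate heuristic. To be clear, this is our own illustration and not how Google measures duplication, and any cut-off you apply to the score is a judgment call.

```python
import re
from typing import Set, Tuple


def shingles(text: str, size: int = 3) -> Set[Tuple[str, ...]]:
    """Split text into overlapping word sequences of the given size."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(first: str, second: str, size: int = 3) -> float:
    """Jaccard similarity of the two texts' shingle sets, from 0.0 to 1.0."""
    a, b = shingles(first, size), shingles(second, size)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)
```

Running `similarity` across every pair of descriptions in a category gives a ranked list of the pages most in need of a rewrite; identical descriptions score 1.0 and fully distinct ones score 0.0.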
Category pages on eCommerce websites typically include only a title and a product grid, which means there is no unique content on these pages. The common solution is to add a unique description at the top of each category page (not the bottom, where content is given less weight by search engines) that describes what types of products are featured within the category. There is no magic number of words or characters to use; however, the more robust the content is, the better the chance that the page will maximize traffic from organic search results (due to long-tail keyword traffic). A benchmark of 100-300 words is common. It’s important to understand the screen resolutions of your visitors and ensure that the description does not push the product grid below the fold in their browsers, which could limit discoverability of the product grid when users land on the category page.
Tip: Intro descriptions on category pages offer a great opportunity to build deep links to related sub-category pages, related article content that may exist on the site, and popular products that deserve attention and link equity.
Home Page Duplicate Content
Every SEO should know that home pages typically have the most incoming link equity, and thus serve as highly rankable pages in search engines. What many SEOs forget is that a home page should be treated like any other page on an eCommerce website, content-wise. Always ensure that unique content fills the majority of the home page body, as a home page consisting merely of duplicated product blurbs gives search engines little contextual value with which to rank it for target keywords.
Tip: Online marketers also commonly use the homepage’s descriptive content in directory submissions and other business listings on external websites. Ensure that unique content is provided to these external websites instead. If this has already been done to a large extent, rewriting the home page descriptive content is the easiest way to fix the preexisting issue.
External Duplicate eCommerce Content (Off-Site)
Duplicate content that exists between an eCommerce website and other eCommerce websites (and potentially even content websites) has become a real pain point in recent years. As Google clearly moves towards weighting inbound link signals more heavily in its rankings (signals that third-party metrics such as Domain Authority try to approximate), websites with less inbound link equity are finding it extremely difficult to rank well in search engines when external duplicate content exists. Let’s dive into some of the most common forms of external (off-site) duplicate content that prevent eCommerce websites from ranking as well as they could in organic search.
Manufacturer Product Descriptions
When eCommerce websites copy product descriptions supplied by the product manufacturer and place them on their own product pages, they are put at an immediate disadvantage. In the search engines’ algorithmic analysis, these websites aren’t offering any unique value to users, so the search engines choose instead to rank the big brand websites, which have more robust, higher-quality inbound link profiles and may be using the same product descriptions. The only way to fix this is to embark upon the extensive task of rewriting existing product descriptions, in addition to ensuring that any new products are launched with completely unique descriptions. In our experience, we’ve seen lower-tier eCommerce websites increase organic search traffic by as much as 50-100% by simply rewriting product descriptions for half of the website’s product pages, with no manual link building efforts.
For eCommerce websites whose products are very time-sensitive, meaning they come in and out of stock as newer models are released, a better approach can be to simply ensure that new product pages are only launched with completely unique descriptions. This ensures that internal staff time is used most wisely, and the highest ROI is received from these efforts. Rewriting the description for a product that is going to be removed from the website in the near future typically provides less return on investment than ensuring new products have unique descriptions for their full lifespan on the website. These are important considerations to take into account when planning out a product rewrite project.
Other ways of filling product pages with unique content include multiple photos (preferably unique photos, if possible), enhanced descriptions that offer more detailed insight into product benefits, product demonstration videos (users love videos), schema markup (to enhance SERP listings) and user-generated reviews.
Staging, Development or Sandbox Websites
Time and time again, Development teams forget, give little consideration to, or simply don’t realize that testing sites can be discovered and indexed by search engines, oftentimes creating exact duplicates of a live eCommerce website. Luckily, these situations can be easily fixed through different approaches:
- Adding a “noindex,nofollow” meta robots or X-robots tag to every page on the test site.
- Blocking search engine crawlers from crawling the sites via a “Disallow: /” command in the /robots.txt file on the test site (don’t use this if your “duplicate” content has already been indexed).
- Password-protecting the test site, to prevent search engines from crawling it.
- Setting up these test sites separately within Webmaster Tools and using the “Remove URLs” tool in Google Search Console, or the “Block URLs” tool in Bing Webmaster Tools, to quickly get the entire test site out of Google and Bing’s index.
SEO Tip: want to update, (eg. noindex) a set of pages quickly? Create a HTML list with all URLs, Fetch as Google in Search Console -> Crawl this URL and its direct links #seo
Jan-Willem Bobbink (@jbobbink), December 5, 2017
When search engines already have a test website indexed, using a combination of these approaches can yield the best results. One approach is to add the “noindex,nofollow” meta robots or X-robots tag, remove the entire site from search engines’ indexes via Webmaster Tools, and then add a “Disallow: /” command in the /robots.txt file once the content has been removed from the index.
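Before adding the “Disallow: /” rule, it is worth confirming that every test-site page really does carry the noindex directive, since crawlers can no longer read it once crawling is blocked. The Python sketch below checks a page’s HTML and response headers using only the standard library. Fetching the pages is left out, and the function and class names are our own.

```python
from html.parser import HTMLParser
from typing import Dict, Optional, Set


class RobotsMetaParser(HTMLParser):
    """Collects the directives from every <meta name="robots"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: Set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            for directive in attributes.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def is_noindexed(html: str, headers: Optional[Dict[str, str]] = None) -> bool:
    """True if the page is noindexed via meta robots or an X-Robots-Tag header."""
    parser = RobotsMetaParser()
    parser.feed(html)
    header_directives: Set[str] = set()
    for name, value in (headers or {}).items():
        if name.lower() == "x-robots-tag":
            header_directives.update(d.strip().lower() for d in value.split(","))
    return "noindex" in parser.directives or "noindex" in header_directives
```

Run this against a sample of test-site URLs. Any page where `is_noindexed` returns False should be fixed before the site is removed from the index and crawling is disallowed.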
For good reason, eCommerce websites see the value in feeding their products onto 3rd party shopping websites in order to extend their potential sales reach. What many eCommerce marketing managers don’t realize is that this creates duplicate content across these external domains. Oftentimes, an eCommerce website’s products on 3rd party websites will end up outranking its own product pages when those 3rd party websites have more authoritative inbound link profiles.
Consider the popular scenario where a product manufacturer, with its own eCommerce website (to sell its own products direct to consumers), feeds its products to Amazon to greatly increase sales. This scenario is highly plausible for revenue reasons. From an SEO perspective, serious problems have just been created, as Amazon is one of the most authoritative websites in the world and the product pages on Amazon are almost guaranteed to outrank the product pages on the manufacturer’s eCommerce website. Some may view this as revenue displacement, but it clearly is going to put an in-house SEO’s job, or an SEO agency’s contract, in jeopardy when organic search traffic (and resulting revenue) plummets for the eComme