What Google says about indexable site search results:
“Use robots.txt to prevent crawling of search results pages or other auto-generated pages that don’t add much value for users coming from search engines.”
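In robots.txt terms, that advice usually amounts to a Disallow rule matching your search URL pattern. A minimal sketch, assuming a Magento-style `/catalogsearch/` path (substitute whatever pattern your platform uses):

```
User-agent: *
Disallow: /catalogsearch/
```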
A lot can change in ten years when it comes to SEO. A lot can change in a week. So should you allow your internal search results to get indexed? Going back to these guidelines in 2017, we see possible gray areas with regard to Automatically Generated Content and pages with Little or No Original Content. When it comes to their guidelines on Doorway Pages, however, the answer is unequivocal.
Here are some examples of doorways:
- Having multiple domain names or pages targeted at specific regions or cities that funnel users to one page
- Pages generated to funnel visitors into the actual usable or relevant portion of your site(s)
- Substantially similar pages that are closer to search results than a clearly defined, browseable hierarchy
Google Quality Guidelines
But Those Pages Make Money!
What if, like many eCommerce sites, you check analytics to find that those internal site search results pages account for a big chunk of revenue?
You’d be fired for blocking these URLs in the Robots.txt file, right?
We’ve heard this more than once, and most of the time further analysis uncovered a more nuanced truth.
First of all, what percentage of total revenue on the site is coming from people finding site search pages directly from search results?
Using last-click attribution (i.e., the user did a search and clicked through to your page), drill into Organic Search traffic by Landing Page and apply a filter to show only internal site search URLs. What is the total revenue coming from those pages? Is it even 1% of total revenue from organic search? 2%? If it's much higher than that, you may have a content quality problem on product pages and/or a crawlability problem on standard taxonomy/category pages.
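That percentage is simple arithmetic once you have the landing-page export. A minimal Python sketch, using made-up rows and a hypothetical `/catalogsearch` URL pattern:

```python
# Sketch: estimate what share of organic-search revenue lands on internal
# site search URLs. Assumes you've exported Landing Page / Revenue rows
# from your analytics tool; the sample rows below are invented.

SEARCH_PATH_MARKER = "/catalogsearch"  # adjust to your site's search URL pattern

# (landing_page, organic_revenue) -- hypothetical export rows
rows = [
    ("/widgets/blue-widget", 12000.00),
    ("/catalogsearch/result/?q=blue+widgets", 150.00),
    ("/category/widgets", 8000.00),
    ("/catalogsearch/result/?q=gadgets", 90.00),
]

total = sum(rev for _, rev in rows)
search_rev = sum(rev for path, rev in rows if SEARCH_PATH_MARKER in path)
share = 100 * search_rev / total

print(f"Site search landing pages: ${search_rev:,.2f} "
      f"of ${total:,.2f} organic revenue ({share:.1f}%)")
```

With these invented numbers the answer lands just over 1%, which in this framework would be nothing to panic over.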
Now that we know what percentage of revenue for the site as a whole is comprised of Organic Search Traffic landing on these internal site search URLs, we know the absolute worst case scenario.
The thing is: In nearly every case you’ll find organic search traffic and revenue will increase for many of the other pages on the site. Not only that, but they’ll probably convert better.
Here’s an exercise based on a simplified example site:
You have 1,000 internal search result URLs indexed (e.g. site:domain.com inurl:catalogsearch).
You have 100 internal site search result URLs receiving at least 1 Transaction as a Landing Page from Organic Search.
The other 900 indexed site search URLs either show 0 Transactions as a Landing Page from Organic Search, or do not show up in Analytics at all.
Now calculate what percentage of those indexed internal search result URLs are actually earning that revenue. In this example we find that:
10% of Internal Site Search URLs account for 100% of the revenue from organic search into this page type.
Although the example is simple, the results are similar to what we find in the real world. Often it will end up being around 80% – 99% of traffic and revenue going into 1% – 20% of indexed internal search result pages.
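The exercise above boils down to a single division. A trivial sketch using the example's numbers:

```python
# The example's numbers: 1,000 indexed internal search URLs,
# only 100 of which ever convert as organic landing pages.
indexed_search_urls = 1000
converting_search_urls = 100  # at least 1 transaction from organic search

converting_share = 100 * converting_search_urls / indexed_search_urls
print(f"{converting_share:.0f}% of indexed internal search URLs "
      "account for 100% of this page type's organic revenue")
```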
This means you can scale a custom solution. You don’t have to lose that revenue!
You could 301 redirect these to a “real” category URL within the site taxonomy, or to a curated, optimized product grid page. Or you could keep them live in the unlikely event that internal site search results convert better for organic search traffic than taxonomy pages, or even custom landing pages.
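If you go the redirect route, one hypothetical implementation looks like this (Apache mod_rewrite; the search path and target category are invented, and the query string needs its own condition because RewriteRule matches only the path):

```
# Hypothetical: send one popular internal search to its matching category page.
RewriteEngine On
RewriteCond %{QUERY_STRING} ^q=blue\+widgets$
RewriteRule ^catalogsearch/result/?$ /blue-widgets/? [R=301,L]
```

The trailing `?` on the target drops the original query string so the redirect lands on the clean category URL.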
Just because you’re getting traffic and revenue out of these low-quality pages that Google has advised webmasters not to index doesn’t mean you should leave them alone. Do the research and take your findings to the powers that be. Let them come to the same conclusion.
There are technical SEO specifics to getting these pages out of the index, which depend on other issues like crawl budget and the navigation paths used to discover products. For example, if you’re going to redirect some of them or leave them live, you’ll have to “Allow” them in the robots.txt file, or at least wait until Google has seen the redirects before blocking them along with the others.
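As a sketch of that carve-out (the specific URL is invented; Google applies the longest matching rule, so the Allow wins for that one URL):

```
User-agent: *
# Temporary: let Google crawl the redirected URL before the blanket block applies
Allow: /catalogsearch/result/?q=blue+widgets
Disallow: /catalogsearch/
```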
Get in touch if this sounds like a project you’d like to explore. We’re here to help.