Fixing Index Bloat: Why Deleting Website Pages Is Great for SEO in 2021


42 Comments
  1. Hi Chris, I have been exploring the idea of pruning useless pages and reducing index bloat.

    I had a question. Once you have identified such pages, do you simply add a noindex tag to them? Are these pages deleted? Redirected?

    BTW, really nice article- I’ll be using the Cruft Finder tool 😉

    • Hey Gurbir… Good question. There are lots of ways to “prune” a page once you’ve identified that it isn’t a good current target for your SEO efforts. Our older article on Moz about pruning eCommerce sites is a good place to start. It also links to an overview of our content audit process, which is again a bit older but still relevant. We plan to update both these articles soon.

  2. It’s extremely good and very helpful for me. Thanks for sharing this great post.

  3. Chris, Thanks for the article. I had a question about low-value pages and whether they should be kept. Some argue that the larger a website, the better. With user engagement now a ranking factor, it seems logical that poor performing pages be removed.

    • Hi John,
      As with many things SEO, I’d say it depends on a number of factors. Also ‘removing’ a page can mean a couple different things. For example, eCommerce sites wouldn’t want to 404 product pages if the product was still in stock. Low performing product pages are typically candidates for a noindex tag. There could also be the case of near-duplicate product pages if a product were available in multiple sizes and/or colors, and the CMS didn’t have the proper functionality to combine multiple URLs into one. In this case we might use canonical tags if there weren’t sufficient search volume to justify different pages for every size/color.

      If the poor performing page is strategic content (blog post or similar), we’ve had success by using the “Remove, Improve or Consolidate” strategy. If there are multiple pieces of content on the same or similar topic, they can sometimes be consolidated into a single, more authoritative post. We also often improve posts, which includes things like improving keyword targeting and expanding the length of the content (don’t forget to update the DateModified meta tag). Finally, we have rolled out a strategy to 404 pages that just aren’t performing (little to no traffic over a period of time) – for one client we 404’d 90% of their blog posts that received little traffic and saw a boost in rankings, traffic and revenue to their foundational content (category & product pages).

      Hope this helps!

      • Hello Chris, about your 404 strategy, I thought that one should 410 these pages with no traffic, to basically tell Googlebot not to visit them again? I’m about to do some much needed pruning on my website, with a mix of noindex and 4xx errors. My website has 800 pages and I estimate I should delete 200 pages and noindex 100 more. Thank you!

  4. I am not sure whether to use a redirect or not, but I am going to consolidate some pages on my website.

    • Hi Janice,
      It’s usually a good idea to 301-redirect URLs that have been consolidated to the URL to which you consolidated – especially if those pages were indexed and/or received any traffic. Don’t forget to also update your internal links so they don’t point to redirected URLs!
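A rough sketch of that internal-link cleanup, for anyone who wants to script it. The redirect map and URLs below are made-up examples, and the regex only handles double-quoted href attributes, so treat this as a starting point rather than a finished tool:

```python
import re

# Hypothetical map of consolidated URLs to the page they were merged into.
REDIRECTS = {
    "/blog/old-post-a": "/blog/complete-guide",
    "/blog/old-post-b": "/blog/old-post-a",  # a chained redirect
}


def final_target(url, redirects):
    """Follow the map until reaching a URL that is not redirected."""
    seen = set()
    while url in redirects and url not in seen:  # 'seen' guards against loops
        seen.add(url)
        url = redirects[url]
    return url


def update_internal_links(html, redirects):
    """Point each double-quoted href at its final destination."""
    def swap(match):
        return f'href="{final_target(match.group(1), redirects)}"'
    return re.sub(r'href="([^"]*)"', swap, html)
```

Links that aren't in the map are left untouched, and chained redirects collapse to the final URL so visitors and crawlers don't hop through more than one redirect.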

  5. Hi there,

    I have a client with a few thousand blog pages of thin content, 300 words or less. According to GA the vast majority do not get any traffic (or something like one visit a year).

    The client is an agency with only about 50 substantive pages, and thousands of “deadwood” blog posts, some dating back 10 years.

    About 1% of these thin blog pages get minuscule traffic and yet another 1% have a small handful of links pointing at them, but none of any quality

    So I’m just curious if you think it makes more sense to noindex all of these thin pages or to remove and 410 them.

    Might be nice to retain a few of the pages that have minimal activity but I don’t think it will really help or hurt much.

    Would you foresee any issues of taking this agency site from a few thousand indexed pages down to about 100 or less?

    Greatly appreciate your thoughts here and keep up the quality content!

    Thanks!

    • Hi John, good question. Typically we pull a year’s worth of GA data AND backlink data from Ahrefs or some other backlink tool. It’s important to ensure any pages you’re going to prune – whether 404/410 or using the noindex tag – don’t have backlinks pointing to them and haven’t driven (much) traffic / revenue.

      A few thousand low/thin quality content pages sounds ripe for the pruning. There may be value in a couple other options…

      – Republishing posts with enhanced content and better keyword targeting
      – Consolidating similar posts into a more authoritative post
      – Combine multiple ‘date based’ URLs into more of an evergreen page, for example, if there’s a post about the XYZ conference from 2015/2016/2017 etc. making it more evergreen could build the strength of the page(s) over time
      – Remove (404/410/noindex); noindex is a popular option for eCommerce sites where there are many products that aren’t receiving traffic, or maybe have the manufacturer’s description and need to be rewritten – but the product is still for sale. If the page doesn’t have links and isn’t driving traffic, noindex doesn’t make as much sense. Just get rid of it.

      But to answer your overarching question – these pages that aren’t driving any traffic aren’t providing the site with much value, and very well could be holding the ~100 quality pages back.

      I agree, it could be worth keeping some pages that drive traffic – and I would make sure those visits aren’t leading to leads/revenue – if so you’d want to keep them… it just depends on how much traffic you’re willing to give up. Hope this helps!
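To make the options above a bit more concrete, here is a minimal Python sketch of the decision. The `Page` fields, the `min_sessions` threshold and the action labels are illustrative assumptions, not a fixed rule; tune them to how much traffic you're willing to give up:

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    sessions_12mo: int            # sessions over the last year (e.g. from GA)
    revenue_12mo: float           # revenue attributed to the page in that period
    backlinks: int                # links reported by a backlink tool
    still_for_sale: bool = False  # True for live product pages


def suggest_action(page, min_sessions=10):
    """Suggest a pruning action; the threshold is an assumed example value."""
    has_value = (
        page.sessions_12mo >= min_sessions
        or page.revenue_12mo > 0
        or page.backlinks > 0
    )
    if has_value:
        # Traffic, revenue or links: don't prune, work on the content instead.
        return "keep (improve or consolidate)"
    if page.still_for_sale:
        # The product page has to stay live, so keep it out of the index.
        return "noindex"
    # No traffic, no revenue, no links: just get rid of it.
    return "remove (404/410)"
```

For example, `suggest_action(Page("/blog/old-news", 0, 0.0, 0))` suggests removal, while the same page with even one backlink is flagged to keep.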

  6. Hi Chris
    I run a site where we sell articles written by our users. We have around 26,000 papers online now, and of course I see that Google does not like ALL of them. I would say that about 3000-4000 do not receive visits from Google. We look for unique content and run a plagiarism check on everything, of course, but some content is just not very interesting in Google’s eyes (sometimes not interesting just now, but later).
    Do you think I should simply NOINDEX pages that for example did not get a view in the last 3 months?

    I sometimes see that a page that had no interest/visitors for, let’s say, 2-3 years suddenly receives a lot more visitors from Google (let’s say the topic becomes interesting again, like an article about the “2004 hurricane”), so I am nervous about “DELETING” these articles completely. I see the internet also as an “archive” for older things, not just an archive about the last 2-4 years. (The website is 12 years old now.)

    Any suggestion what to do with pages that have unique content but receive no visitors for several months/years? I could NOINDEX the article page, but could put them together on a “group of more articles”, where 40-50 of them are put together (but that overview page does not make sense at all, just overview pages of noindexed articles)

    • Hi Bodo,
      Here are some thoughts and questions…

      – How are you making money, from advertising? Do you sell products / services? Have these 3-4k pages that do not receive any traffic generated you any profit?
      – What guidelines are users given for these self-written articles? Is keyword targeting considered at all? Can the keyword targeting on the posts be improved?
      – Not knowing what the site and articles are about, or how the site is structured, one possibility could be to “topic bucket” the articles, putting them into categories and sub-categories. Then, build out pages with unique content that target these topics and link to all the related articles.

      (copied from above reply to John)
      A few thousand low/thin quality content pages sounds ripe for the pruning. There may be value in a couple other options…

      – Republishing posts with enhanced content and better keyword targeting
      – Consolidating similar posts into a more authoritative post
      – Combine multiple ‘date based’ URLs into more of an evergreen page, for example, if there’s a post about the XYZ conference from 2015/2016/2017 etc. making it more evergreen could build the strength of the page(s) over time
      – Remove (404/410/noindex); noindex is a popular option for eCommerce sites where there are many products that aren’t receiving traffic, or maybe have the manufacturer’s description and need to be rewritten – but the product is still for sale. If the page doesn’t have links and isn’t driving traffic, noindex doesn’t make as much sense. Just get rid of it.

      Let me know what other questions you have. Hope this helps!
      Chris

  7. Hi Chris,

    Does this tool work only on PHP-based ecommerce sites? I am trying to run it against Java-based ecommerce sites (on platforms like IBM WebSphere, Hybris, ATG) and it doesn’t seem to work as expected. Thoughts?

    Thanks,
    NP

  8. Hi, this post is great and the comments have given me some actionable tasks to start cleaning up old content on my site, GameSkinny.com. We are a news and reviews site for video games. As such, we have a wide swath of content types: news, reviews, lists, opinions, and features. We have staff writers and freelance writers and have been around since 2012.

    Some of the content, as has been suggested, could be consolidated, such as X best horror games for X year, etc. However, some pieces, such as some news pieces for example, can’t really be updated as they were, say, on the release of a game or the closing of a studio, etc. Some have very few total views, say in the 100s. Should these be culled completely or updated as best they can be?

    Over the past two months, we’ve updated our interlinking strategy as well, writing content and copy with interlinking to other articles in mind. I think it is too soon to say if the strategy has been working, but it seems logical.

    I suppose my main question is this: as a media site that has an extensive archive, what would be the best way to cull underperforming articles so Google doesn’t penalize the site? Is it worth culling old, out of date news pieces, for example? What would you suggest? Thanks!

    • Hi Jonathan,
      There’s a lot here! And without more information it’s tough to give one specific recommendation.

      The pages you mention certainly could be worth pruning. There is sometimes a benefit to older news style pieces – if people are searching for that type of historical information – if they aren’t, they are probably good candidates for pruning. You could also try to breathe new life into the pages by updating them, adding content, consolidating similar pages into one and updating the LastUpdated meta tag when you do. You could also “topic bucket” the pages – perhaps by game name – and build ‘hub’ style pages for specific games or groups of games. This would help the pages become more crawl-able, and in conjunction with things mentioned above, could help drive more traffic to those pages. But if the articles are old, out of date and nobody is searching for the content they contain, yes they are likely worth pruning.

      Be sure the pages don’t have links before you prune! Hope this helps.
      Chris

  9. My question is: when I deleted my post, Search Console showed a 404 technical issue. How do I overcome this issue? Can I disavow all the links and remove it? I can’t find any option. Can you please answer this broadly?

    Your post was really helpful.

    • Hi Jemes,
      It’s OK to have 404s showing in Search Console; this won’t raise any red flags. No need to disavow anything. You should, though, make sure the URLs returning a 404 weren’t receiving much traffic and don’t have any inbound links. If they do, 301-redirect them to the closest related page.

  10. Does Google penalize your website if you simply delete posts without deindexing first?

    • Not at all. If you deleted posts without redirecting, meaning the user would receive a 404 error, that is totally normal and acceptable, assuming it was on purpose.

  11. Great post, exactly what I was looking for. I’m in the position where I followed that advice of writing a blog post every week or so for an e-commerce site and years later have lots of similar posts on the topic that aren’t great. So I’m just culling out all the pages that don’t get traffic or have external links. My question is should I stagger out removing the pages over a period of time or should I just ‘unpublish’ them all at once? Would be approx 60-70 pages out of 200 total. Thanks!

    • Hi Mike,
      If the pages aren’t getting any traffic or conversions there shouldn’t be too much risk in pruning the pages at once.
      Hope that helps!
      Mike

  12. I have a “deal blog” that was running 20-25 deals per day of very thin, time sensitive deals. It’s a wonder I rank for anything on Google, actually.

    Of the 24k posts I am guessing that 23,500 need to be deleted.

    Is there a tool that can help me make a list of which posts to delete that you know of?

    I have a developer that can maybe even do it in one fell swoop if I can tell him which ones I want gone.

    Any ideas, greatly appreciated!

    • Hi Kate,
      It sounds like what you are looking for is more of a content audit tool than a CRUFT finder tool. Your coupon pages we probably wouldn’t consider true CRUFT but if they weren’t getting traffic and have no incoming links they need to be pruned. I’d recommend conducting a site crawl that pulls in GA, GSC, and link data all into one spreadsheet. Then I would set a threshold for traffic and links and prune any page that is under that threshold. You can find our basic process outlined here: https://moz.com/blog/content-audit that should help.
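If a spreadsheet gets unwieldy at 24k posts, the same join-and-threshold idea can be sketched in a few lines of Python. The column names (`url`, `sessions`, `clicks`, `backlinks`) and the threshold values are assumptions for illustration; real GA, GSC and backlink exports will use their own headers and need their URLs normalized to match the crawl:

```python
import csv
import io


def load_metric(csv_text, value_column):
    """Read a CSV export that has a 'url' column into {url: value}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["url"]: float(row[value_column]) for row in reader}


def build_audit(crawl_urls, ga, gsc, links):
    """Join the sources on URL; a URL missing from an export counts as 0."""
    return [
        {
            "url": url,
            "sessions": ga.get(url, 0.0),
            "clicks": gsc.get(url, 0.0),
            "backlinks": links.get(url, 0.0),
        }
        for url in crawl_urls
    ]


def prune_list(audit, max_sessions=5, max_clicks=5, max_backlinks=0):
    """URLs at or under every threshold are candidates for pruning."""
    return [
        row["url"]
        for row in audit
        if row["sessions"] <= max_sessions
        and row["clicks"] <= max_clicks
        and row["backlinks"] <= max_backlinks
    ]
```

The output of `prune_list` is a plain list of URLs, which is the kind of list you could hand to a developer. It's worth spot-checking it by hand before deleting anything.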

      Thanks!
      Mike

  13. Hello Chris. This is my first time on your blog. I was looking especially for this topic.
    Here is my situation: I have an affiliate website with around 300 pages. I think at least 30% of those pages never received search engine traffic. I am thinking about deleting all of them or merging them with other pages.

    What would you do in my case? In either way I plan to 301 redirect them. However I see you 404d the pages of your client. Why choose 404 instead of 301? I see only disadvantages doing 404.

  14. I’d like to mention that my pages are all above 700 words long. I don’t think this counts as thin content.

  15. Also I have this weird problem: My site is authoritative enough. Whenever I write about a topic, the article will appear in the top 50 at least. For any keyword. Even for the hardest two-word keyword. However, I have an article that doesn’t rank for its keyword. Which is really weird because content-wise it’s the highest-quality post of all. It’s long enough. I simply cannot find the reason why my article doesn’t rank. The keyword difficulty is easy enough.

    What is even weirder: another one of my pages ranks at #44. But that page doesn’t even have the keyword in it, only the company name. What do you think the problem could be? Should I completely rewrite the article?

    Also there is no duplicate content or noindex or anything else. Page speed is also fine. Page is in sitemap and indexed in search console. I am only linking internally to the page that is not ranking.

    Sorry for offtopic

  16. Hi Chris, you talk about low quality pages negatively affecting ranking. So is it the case that low quality pages with a very low page authority, due to no traffic or backlinks and poor content, negatively affect the overall domain authority? Would de-indexing low quality pages with a low PA help to increase the overall DA, therefore increasing ranking in Google searches?

    • Hi Phil,
      If a page has no traffic or backlinks AND it has poor content it is a good candidate for removal from the index. Doing this does not increase your DA, but it does help improve your overall site quality and crawl budget.

  17. I have just removed lots of pages from the index which are no longer relevant (which I have deleted from the site, old category pages and paginated versions).

    no other big changes have happened to the site in months.

    However, since doing this, within a couple of weeks we have just had a 30% reduction in organic traffic. Is it possible that fewer pages in the index has triggered a threshold for an algorithmic penalty? for example, we always had spammy links arrive at our site (like everyone does), but never disavowed because there was never an impact on rankings.

    Perhaps now we have fewer pages in the index, the spammy links make up a larger proportion in ratio to the pages in the index and therefore an algorithmic penalty has occurred?

    Any help on this appreciated

    • Hi Robin,
      I don’t think that is quite how it works re: spammy links making up a larger proportion. If you removed paginated pages from the index it is possible Google no longer has a click path to your products. In addition, if you deleted pages that had good links and/or traffic, that could also lead to a decline in traffic.

  18. Hi Mike,

    Great article. I really enjoyed reading this. Which brings me to a question…

    My website shows offers (in my case, it’s a cottages website). The business has seasonal and off-season prices, and each package has an individual page. For example: Winter deals – 20% off.

    Since it’s seasonal, does it mean the page should be unpublished when the season has passed or still keep it on the site all the time?

    Thank you!

    • Hi Karan,
      Ideally you wouldn’t need two different pages for this. A single “evergreen” page where the content changes would probably be the best approach. If that won’t work, you’ll probably want to leave both pages published and indexable year round.

  19. Hi Mike

    Great article, thank you! I have identified thin content posts on my website going back 10 years. I use WordPress and I have switched the posts to draft. Is that a good idea, or not?

    Many thanks, Julien

  20. Hi Rob,
    We typically choose 404 when there isn’t any valuable content on the page to merge with another page. Otherwise it probably makes sense to merge pages and 301 redirect.


About The Author

Chris Hickey (Alumni)

With 10 years of SEO experience under his belt, as well as experience in software engineering, Chris is uniquely equipped to wear a lot of hats and to give honest, knowledgeable advice on SEO tactics and their ramifications.
