Inflow: eCommerce Marketing Agency

Do You Really Need a Fresh Crawl of ALL 3,435,198 Pages?

Posted By Everett Sizemore on March 3, 2015

Is it necessary to crawl an entire enterprise site?

TL;DR. Not usually.

Tell me if this sounds familiar. You’ve adjusted the settings on your crawler tool to throttle the crawl speed, respect robots.txt and robots meta tag directives, and so on. You’ve saved the crawl and restarted it, rebooted your machine, confirmed with IT that your user-agent isn’t being throttled, performed something like a rain dance to please the crawl gods, used a proxy, done a shot to calm your nerves…

And you just can’t get the &*d D#m! thing to finish crawling the site so you can start your audit!

So you begin to ask yourself, “Do I really need to crawl the entire site, or do I have the data I need already?”

Choose your weapon: axe or scalpel.
Past a certain point, it’s easier to take the hatchet approach: fix the major technical issues first, then do the content audit or technical SEO audit page by page with a scalpel.

In statistics, there is the idea of reaching a level of “statistical confidence” beyond which you gain very little additional certainty by increasing the sample size (e.g., polling more people). The same is true of major technical issues on a website, such as indexable internal search results and account access pages, or endless faceted navigation URLs: once you’ve seen enough examples of a problem, crawling more of them tells you little that is new.
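
To make the diminishing-returns idea concrete, here is a minimal Python sketch of the standard margin-of-error formula for a sample proportion. It assumes simple random sampling at roughly 95 percent confidence (z = 1.96) and the worst case p = 0.5. A crawl is not a random sample (crawlers follow links in a fixed order), so treat this as an analogy rather than a model of your crawl; the function name and defaults are illustrative.

```python
import math


def margin_of_error(n, p=0.5, z=1.96, population=None):
    """Approximate margin of error for a sample proportion.

    Assumes simple random sampling; z=1.96 is roughly 95% confidence and
    p=0.5 is the worst case. If population is given, the finite population
    correction is applied.
    """
    if n <= 0:
        raise ValueError("sample size must be positive")
    moe = z * math.sqrt(p * (1 - p) / n)
    if population is not None:
        if n > population:
            raise ValueError("sample cannot be larger than the population")
        if population > 1:
            # Finite population correction: shrinks the margin as n approaches N.
            moe *= math.sqrt((population - n) / (population - 1))
    return moe


if __name__ == "__main__":
    total_pages = 3_435_198  # page count from this post's title
    for n in (100, 1_000, 10_000, 100_000):
        moe = margin_of_error(n, population=total_pages)
        print(f"{n:>7,} pages sampled: +/- {moe:.2%}")
```

Each tenfold increase in sample size shrinks the margin of error only by a factor of about 3.16 (the square root of 10), which is exactly the kind of diminishing return the statistical-confidence analogy points to.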

Even when the problem is one of scale rather than a technical flaw, the law of diminishing returns comes into play at some point. Take a site with millions of unique product pages. Most likely, not all of them should be indexed, in which case the patterns behind the ones that shouldn’t be will show up within the first thousand pages crawled. Outliers can be caught in round two, once you’ve fixed the 1 percent of issues causing 80 percent of the superfluous URLs, or whatever the ratio happens to be.
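
One way to surface those patterns in a partial crawl is to collapse each URL into a coarse pattern and count them. The sketch below assumes you have exported the crawled URLs as a plain list; the grouping rule (first path segment plus sorted query-parameter names) and the function names are hypothetical and should be adapted to the site’s URL structure.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlsplit


def url_pattern(url):
    """Collapse a URL into a coarse pattern: first path segment + sorted query keys.

    Hypothetical grouping rule; adapt it to the site's URL structure.
    """
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    pattern = "/" + segments[0] if segments else "/"
    keys = sorted({key for key, _ in parse_qsl(parts.query, keep_blank_values=True)})
    if keys:
        pattern += "?" + "&".join(keys)
    return pattern


def top_patterns(urls, limit=5):
    """Return (pattern, count, share) tuples for the most common URL patterns."""
    counts = Counter(url_pattern(u) for u in urls)
    total = sum(counts.values())
    return [(p, c, c / total) for p, c in counts.most_common(limit)]


if __name__ == "__main__":
    crawled = [
        "https://www.example.com/search?q=red+shoes",
        "https://www.example.com/search?q=boots",
        "https://www.example.com/shoes?color=red&size=9",
        "https://www.example.com/shoes?size=10&color=blue",
        "https://www.example.com/shoes/trail-runner",
    ]
    for pattern, count, share in top_patterns(crawled):
        print(f"{share:6.1%}  {count:>5}  {pattern}")
```

If one or two patterns, such as internal search results or color and size facets, account for most of the URLs, those are the first candidates for a fix.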

Assuming the site really does have millions of unique product pages that should be indexed (again, that’s doubtful), you still wouldn’t need to crawl every one of them. The top product pages tend to be linked to most often and appear higher up on category pages, which means they are among the first product pages to be crawled. After a certain point, each new product page URL the crawler picks up is about 99 percent likely to have no external links, no sales or traffic from organic search, no rankings to speak of, no social shares, and so on. You don’t need to look at each one individually. You just need to fix the problem that creates them in the first place, which should be easy to identify by then.
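
The claim that top product pages are among the first crawled holds for a breadth-first crawler, which visits every page at one click depth before moving deeper. The minimal sketch below assumes an in-memory link graph (a hypothetical stand-in for fetching and parsing real pages) and a hard page budget. It shows how shallow, frequently linked URLs are reached first while deep, long-tail URLs fall outside the budget.

```python
from collections import deque


def bfs_crawl(links, start, max_pages):
    """Breadth-first crawl over an in-memory link graph, stopping at max_pages.

    links maps each URL to the URLs it links to (a hypothetical stand-in for
    fetching and parsing real pages). Returns (url, depth) pairs in crawl order.
    """
    seen = {start}
    queue = deque([(start, 0)])
    crawled = []
    while queue and len(crawled) < max_pages:
        url, depth = queue.popleft()
        crawled.append((url, depth))
        for target in links.get(url, []):
            if target not in seen:  # queue each URL only once
                seen.add(target)
                queue.append((target, depth + 1))
    return crawled


if __name__ == "__main__":
    # Hypothetical store: top products are linked from the home page or near
    # the top of a category; long-tail products only from page 2 of a category.
    site = {
        "/": ["/shoes", "/boots", "/shoes/bestseller"],
        "/shoes": ["/shoes/bestseller", "/shoes/popular", "/shoes?page=2"],
        "/boots": ["/boots/popular"],
        "/shoes?page=2": ["/shoes/long-tail-1", "/shoes/long-tail-2"],
    }
    for url, depth in bfs_crawl(site, "/", max_pages=5):
        print(depth, url)
```

With a budget of five pages, the best-selling and popular shoe pages are crawled, while the long-tail products linked only from page 2 of the category are not.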

So What Does the Process Look Like?

Diagnose the Big Problem (URL Bloat): When a Screaming Frog crawl is taking too long to finish, pause it, save it, and dig through the different Screaming Frog tabs for big issues that simple code changes can address, such as adding a robots noindex meta tag.
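
When you recommend a robots noindex meta tag for a URL pattern, it helps to confirm which templates already carry one. Here is a minimal sketch using Python’s standard-library `html.parser`. It only reads `<meta name="robots">` tags in the HTML you pass in; it does not check an `X-Robots-Tag` HTTP header or crawler-specific meta names such as `googlebot`. The class and function names are illustrative.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values keep their case.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").strip().lower() != "robots":
            return
        for token in attr_map.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def has_noindex(html):
    """Return True if the HTML carries a robots meta noindex (or "none") directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    # "none" is shorthand for "noindex, nofollow".
    return bool({"noindex", "none"} & parser.directives)


if __name__ == "__main__":
    page = '<head><meta name="robots" content="noindex, follow"></head>'
    print(has_noindex(page))
```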

Screaming Frog Exclude list configuration: 1) Identify the issue(s), 2) Record and recommend, 3) Filter them out and re-crawl.

Record and Recommend: Record those issues and your recommendations for fixing them somewhere, whether that’s Evernote, email, a text file or a deliverable document. Then go into Configuration > Exclude and write the expressions needed to keep those pages from being crawled. In other words, to save time, assume the issue will be fixed once your recommendations are implemented, so you have no reason to crawl the rest of those pages.
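
Before pasting expressions into Configuration > Exclude, you can sanity-check them against a sample of crawled URLs. This Python sketch uses hypothetical patterns for internal search results, account pages and a color facet. Wrapping each pattern in a leading and trailing `.*` is meant to make it match whether the tool applies expressions as partial matches or against the whole URL; confirm the exact matching rules in your crawler’s documentation, and note that Python’s `re` syntax may differ from your tool’s regex flavor in edge cases.

```python
import re

# Hypothetical exclude expressions, one per issue recorded in the previous step.
EXCLUDE_PATTERNS = [
    r".*/search\?.*",       # internal site-search results
    r".*/account/.*",       # account access pages
    r".*\?(.*&)?color=.*",  # faceted navigation by color
]


def compile_excludes(patterns):
    """Compile once so each URL check is cheap."""
    return [re.compile(p) for p in patterns]


def is_excluded(url, compiled):
    # re.search = partial matching; the .* wrappers make re.fullmatch agree too.
    return any(rx.search(url) for rx in compiled)


def split_urls(urls, patterns):
    """Split URLs into (kept, excluded) lists using the exclude patterns."""
    compiled = compile_excludes(patterns)
    kept, excluded = [], []
    for url in urls:
        if is_excluded(url, compiled):
            excluded.append(url)
        else:
            kept.append(url)
    return kept, excluded


if __name__ == "__main__":
    sample = [
        "https://www.example.com/search?q=shoes",
        "https://www.example.com/account/login",
        "https://www.example.com/shoes?size=9&color=red",
        "https://www.example.com/shoes/trail-runner",
        "https://www.example.com/shoes?size=9",
    ]
    kept, excluded = split_urls(sample, EXCLUDE_PATTERNS)
    print(f"kept {len(kept)}, excluded {len(excluded)}")
    for url in excluded:
        print("excluded:", url)
```

If a URL you still need to analyze lands in the excluded list, tighten that pattern before adding it to the crawler.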

Recrawl: Either start from scratch or resume your crawl, now limited to the URLs you actually need to analyze.
