What is Log File Analysis for SEO?

Log file analysis has many applications outside of SEO, such as site security. For search engine optimization, the process usually involves downloading the log file from your server and importing it into a log file analysis tool, where every “hit” on the site (whether from a bot or a human) can be examined to inform SEO decisions and surface previously unknown issues.

Log file analysis is an arduous process, but it frequently uncovers critical technical SEO problems that could not be found any other way. Log files contain incredibly accurate data that allows a brand to better understand how search engines are crawling their site and what kind of information they are finding.

Log file data includes a record of the URL/resource that was requested, action taken, time and date, IP of the machine it originated from, user agent/browser type, and other pieces of information.

[Image: log file example]

A new server log entry like the one above will be created each time a resource is requested from your website.
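As a minimal sketch, an entry like the one above can be parsed into its component fields with a short script. The sample line and values below are illustrative, and the regular expression assumes the widely used Apache/Nginx combined log format; the exact layout depends on your server configuration.

```python
import re

# A sample entry in combined log format (illustrative values):
LINE = ('66.249.66.1 - - [12/Mar/2024:06:25:19 +0000] '
        '"GET /blog/seo-tips HTTP/1.1" 200 5124 "-" '
        '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"')

# Combined log format: IP, identity, user, timestamp, request line,
# status code, response size, referrer, and user agent.
PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) \S+" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

match = PATTERN.match(LINE)
entry = match.groupdict()
print(entry['ip'], entry['url'], entry['status'], entry['agent'])
```

Once each line is split into named fields like this, the filtering and aggregation steps described later become straightforward.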


One benefit of log file analysis for SEO is to audit where a site’s crawl budget is spent. The higher the authority of the domain, the higher the crawl budget that is allocated by the search engines. While log file analysis can’t impact the crawl budget your site receives from Google, it can help to optimize how that budget is used. Some of the ways this is done include:

  • Identifying which URLs are crawled the most frequently and optimizing toward those
  • Identifying client and server errors and implementing fixes
  • Identifying large/slow URLs and making efforts to increase speed
  • Identifying orphaned pages that would not show up in a site crawl
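The first two bullets above can be sketched in a few lines. This assumes the log has already been parsed into (URL, status code, user agent) tuples, as shown earlier; the values are illustrative, not real log data.

```python
from collections import Counter

# Hypothetical pre-parsed log entries: (url, status, user_agent) tuples.
hits = [
    ('/', 200, 'Googlebot'),
    ('/blog/seo-tips', 200, 'Googlebot'),
    ('/blog/seo-tips', 200, 'Googlebot'),
    ('/old-page', 404, 'Googlebot'),
    ('/checkout', 200, 'Mozilla/5.0'),  # human visit, excluded below
]

# Keep only search engine bot requests.
bot_hits = [h for h in hits if 'Googlebot' in h[2]]

# Most frequently crawled URLs -- where crawl budget is actually going.
crawl_counts = Counter(url for url, status, agent in bot_hits)
print(crawl_counts.most_common(3))

# Client/server errors the bot encountered -- candidates for fixes.
errors = [(url, status) for url, status, agent in bot_hits if status >= 400]
print(errors)
```

In practice you would also verify that requests claiming to be Googlebot really come from Google, since the user-agent string can be spoofed.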


Log File Analysis Steps

Because log files are so large, analyzing them has always been difficult. There are products that make it easier, such as Screaming Frog’s Log File Analyser, Logz.io and Google’s BigQuery, but it is still a lengthy project.

The general process is below, with steps 3 and 4 being the most time intensive:

  1. Export log entries in common log file format from the server
  2. Filter by user-agent to identify crawl bots
  3. If interested in larger trends, pull multiple days – this will almost certainly require a third-party tool such as Splunk or Logstash.
    • If you prefer to work in Excel, convert the file to a .csv, take a random sample of ~100,000 rows of data, use text-to-columns to format the data appropriately, and label the columns for ease of use
  4. Pull a single day’s worth of data if only interested in a snapshot in time
  5. Begin analysis of the data
    • Filter by request type to find the most and least crawled URLs
    • Filter by HTTP response codes to find status errors
    • Evaluate crawl waste using a combination of all columns
    • Etc.
  6. Optional: Expand the database
    • Import data from Analytics, site crawls, Google Search Console and potentially the CMS
  7. Determine issues
    • Look for errors to fix, redirect chains to shorten, orphaned pages that should be linked to or deleted, and rogue bots not obeying the robots.txt file
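One issue from the final step, orphaned pages, can be found by comparing the URLs bots request in the logs against the URLs a site crawl discovers through internal links. A minimal sketch, with illustrative URL sets standing in for real log and crawl exports:

```python
# URLs Googlebot requested, extracted from the logs (illustrative values).
log_urls = {'/', '/blog/seo-tips', '/old-landing-page', '/promo-2019'}

# URLs found by a site crawl, i.e. reachable via internal links.
crawled_urls = {'/', '/blog/seo-tips', '/contact'}

# Orphaned pages: bots still request them, but nothing on the site links to them.
orphaned = log_urls - crawled_urls
print(sorted(orphaned))  # candidates to link internally or retire

# Linked pages the bot never requested -- possible crawl budget gaps.
uncrawled = crawled_urls - log_urls
print(sorted(uncrawled))
```

The same set arithmetic works against data imported from Analytics or Google Search Console in the optional database-expansion step.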


Do You Need to Analyze Your Log Files?


The importance of a log file analysis depends on how mature a website’s SEO efforts are. As previously stated, the level of effort involved is very high; the work required of the person or team doing the analysis is comparable to a content audit. There are often easier, more pressing issues to address than those that require a log file analysis.

If, however, most of the basic SEO best practices are in place and you’re concerned about site speed, crawl budget and the depth or frequency of various search engine crawlers, log file analysis will be an illuminating project worth performing.