A slow leak isn’t a runaway train. It won’t cause an overnight meltdown. Instead, it does its damage over time, hiding in plain sight because it’s nearly imperceptible. But there it is: a silent killer buried deep within historical trends, year-over-year performance charts and other big-picture snapshots. This is exactly how an overzealous robots.txt file can drain your site of organic traffic in perpetuity, unless you know what to look for and how to stop it.
The robots.txt file lives in the root directory of your site. It regulates access to your domain for search engines and other crawlers. In simpler terms, it tells these important visitors where they should, and shouldn’t, go on your website. Clearly mapping these boundaries is essential to your website’s health.
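To make this concrete, here is a minimal sketch of what a robots.txt file might look like. The paths and sitemap URL are hypothetical, not a recommendation for any particular site:

```
# Apply these rules to all crawlers
User-agent: *

# Keep crawlers out of a private area (hypothetical path)
Disallow: /admin/

# Point crawlers to the XML sitemap (hypothetical URL)
Sitemap: https://www.example.com/sitemap.xml
```

Each `User-agent` group can carry its own `Disallow` (and `Allow`) rules, so you can grant different boundaries to different crawlers.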
The important thing to note when adding directories to your robots.txt file, and in turn closing off specific areas of your site from being crawled, is that you may inadvertently block areas you do want crawled, because certain content is only reachable through a blocked directory. Google stops short at a door it can’t open, even when the content behind that door would be valuable to visitors (and, in turn, to your organic performance).
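As a hypothetical illustration of this pitfall, suppose every product image on a site is served only from a `/media/` directory, and the robots.txt file blocks that directory wholesale:

```
User-agent: *
# Intended to hide raw uploads, but product photos live here too
Disallow: /media/
```

Every file under `/media/` is now off-limits to crawlers, including the product photos embedded in pages you very much want ranked.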
The same risk applies to resources (images, documents, etc.) that are hidden or externally referenced through URLs or parameters blocked by the robots.txt file. These files are never crawled, and as a consequence you lose the benefit of crawlers accessing them. That’s why, whenever you set up or tweak a robots.txt file, you should test the configuration afterward to confirm that the correct content is being accessed and processed, so you’re not robbed of any valuable organic traffic.
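One quick way to sanity-check a configuration is Python’s built-in `urllib.robotparser`, which evaluates a rule set the same way a well-behaved crawler would. The rules and URLs below are hypothetical, continuing the `/media/` example:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules mirroring the pitfall above: /media/ is blocked wholesale
rules = """
User-agent: *
Disallow: /media/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# A product photo that lives only under /media/ is invisible to crawlers
print(parser.can_fetch("*", "https://www.example.com/media/product-photo.jpg"))  # False

# A page outside the blocked directory remains crawlable
print(parser.can_fetch("*", "https://www.example.com/products/widget"))  # True
```

Running URLs you care about through a check like this, before deploying the file, catches overbroad rules while they are still harmless.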
For a case study that details the risks of a misconfigured robots.txt file, read this feature by Glenn Gabe over at Search Engine Land.