We came across an interesting article for clients considering a significant site-wide change. Written by Glenn Gabe, it covers ways to crawl a staging server before a major rollout. If you've been through a major launch before (or have one coming), you'll recognize the value of this from a debugging perspective.
It's important to complete a thorough site crawl before a major change goes live, because a crawl can surface SEO issues while they are still cheap to fix. The catch is that staging sites commonly block external entities like crawlers, which can make crawling them difficult or impossible.
That’s why we will cover four solutions for crawling staging servers, ranging from basic authentication to creating a custom user agent.
Method 1: Basic Authentication
Staging servers are often protected by basic authentication, a server-level username-and-password prompt. Most commercial crawling tools, such as Screaming Frog and DeepCrawl, handle this easily: request the credentials from whoever hosts the staging server, then enter the username and password in the crawler when prompted.
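For context, here is a minimal sketch of what that server-side protection can look like, assuming the staging site runs on nginx; the hostname and file paths are illustrative, not from the article:

```nginx
# Protect the entire staging site with HTTP basic authentication.
server {
    listen 80;
    server_name staging.example.com;  # hypothetical staging hostname

    auth_basic "Staging - authorized users only";
    # Password file created with, e.g.: htpasswd -c /etc/nginx/.htpasswd crawler
    auth_basic_user_file /etc/nginx/.htpasswd;

    root /var/www/staging;
}
```

Any crawler that supports basic authentication can then get through: when the server answers with a 401 challenge, the tool supplies the same username and password a browser would.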
Method 2: VPN Access (Virtual Private Network)
Another roadblock emerges when the staging site sits behind a protected firewall. The fix is to have the client provide VPN access so the staging server can be crawled from inside the network. One downside is that you may not be able to use enterprise-level crawlers this way, since cloud-based tools crawl from their own servers rather than from your VPN-connected machine.
Method 3: Whitelist an IP Address
Some clients redirect all visitors to a common login page, which then forwards the user to the specific staging server. These redirects can throw off crawling tools. To solve this, the client can whitelist a specific IP address for a finite amount of time, granting that address direct access to the staging server so a crawl can be completed.
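A sketch of what such an IP whitelist might look like, again assuming an nginx staging server; the IP address is a placeholder from the documentation range, not a real one:

```nginx
# Allow only the crawling machine's IP address; everyone else gets 403.
server {
    listen 80;
    server_name staging.example.com;  # hypothetical staging hostname

    allow 203.0.113.42;  # public IP of the machine running the crawl (placeholder)
    deny all;            # block all other visitors

    root /var/www/staging;
}
```

Since the whitelist is meant to be temporary, the allow line should be removed (and the config reloaded) once the crawl is finished.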
Method 4: Create a Custom User Agent
Top crawling tools let you set a custom user-agent string. Once that user agent (GSQiBot, in Gabe's example) is set up, the staging server can whitelist that specific user agent while blocking all other access. Voila!
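The server side of this arrangement could look like the following nginx sketch (the hostname is illustrative; GSQiBot is the user-agent name from the article):

```nginx
# Block every request except those whose User-Agent header contains "GSQiBot".
server {
    listen 80;
    server_name staging.example.com;  # hypothetical staging hostname

    # Case-insensitive regex match against the User-Agent header.
    if ($http_user_agent !~* "GSQiBot") {
        return 403;
    }

    root /var/www/staging;
}
```

Note that user-agent strings are trivially spoofable, so this hides the staging site from casual visitors and well-behaved bots rather than providing real security; it works best combined with one of the other methods.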
The takeaway from this article is the importance of crawling a staging site before launch. Once you gain access (and you may have to be flexible to get it), perform both enterprise-level and surgical-level crawls. After each crawl, report any errors to the client's development team so they can be fixed, then crawl the site again. Be diligent with these crawls: it is always better to catch a problem in the development stage than on the day the new site rolls out.