Originally published at https://rnye.tech
Hi all, today's post details an interesting problem that faced a website thanks to undocumented Google crawl behaviour that hit us suddenly. The website used Azure Front Door for global CDN/WAF capability but only had one origin - hosted in Azure in the UK South region. This should have been fine given it's a UK-centric site that receives very little global traffic - that is until Google starts crawling you from the West Coast of the US suddenly. Let's dive in.
The problem: what Google Search Console was telling us
All was well with the site from the UK, cache hit ratios were in the 80%+ range, response times were generally rapid even if cache was missed. Google typically crawled the site thousands of times a day. Then suddenly a ticket came in detailing a drastic drop-off in mid-April. Average response times were never great according to Google (700ms) but they'd suddenly jumped to almost double that number (1.3s) with seemingly no explanation. There was absolutely no denying the correlation between crawl requests and average response time, and indeed this is documented behaviour - if response times increase, Google backs off. They claim it's to prevent overloading the site, and I believe that, but I also feel it's likely to ensure they're not wasting their crawl compute resources on long-loading pages. Either way, Google Search Console offered zero explanation as to why.










