
Crawl Budget

Advanced SEO Technique

When you publish a new page on your site, will the search engine recognize it and show it in the search results? Not necessarily.

Research suggests that Google misses about half of the pages on large websites.

This is because Google has a quota for scanning pages on the web, which is what we call the crawl budget.

If you remember the earlier lesson about how search engines work, crawling is the entry point for a site's pages to appear in Google's search results.

It doesn't matter how much content you publish every day: if Google limits its crawl budget on your site, your content won't show up in the search results.

In this lesson, you're going to learn exactly what crawl budget is, how it works, how to maximize its usage, and how to get Google to increase its crawl budget on your site.


What is Crawl Budget?

Crawl budget is the number of pages that Googlebot crawls and indexes on a website within a given timeframe.

Billions of new pages appear on the internet every day, and it is impossible for Google to scan every single one of them within that timeframe.

As a result, Google sets a limit on how much time its spider bot can spend crawling each individual site.


How is Crawl Budget Determined?

Crawl budget is determined by two elements: Crawl Capacity Limit and Crawl Demand.


Crawl Limit

Googlebot is designed to crawl websites without overloading their servers. To do that, Google sets a crawl capacity limit that lets it prioritize the important content without overwhelming the server.

The crawl capacity limit can go up or down based on the following factors:

  • Crawl Health: if the site's server responds quickly, the crawl limit goes up. Conversely, if the server is too slow, the crawl limit goes down. Controllability: Yes
  • Limit Set: webmasters can optionally reduce Googlebot's crawling rate on their site. However, setting a higher limit won't increase the crawl rate. Controllability: Yes
  • Googlebot Capacity: there are too many pages on the web, and Googlebot's resources are not infinite, so Google has to prioritize what to crawl. Controllability: No

Crawl Demand

Google typically spends as much time as necessary crawling a site, given its size, update frequency, page quality, and relevance, compared to other sites.

The crawl demand can go up or down based on the following factors:

  • Perceived Inventory: Googlebot will try to crawl all or most of the URLs it knows about on a site. If a site has many duplicate URLs, this wastes the site's crawl budget. Controllability: Yes
  • Popularity: pages that are more popular on the internet tend to be crawled more often by Googlebot. Controllability: Yes
  • Staleness: because Google wants to keep information fresh for its users, Googlebot will attempt to recrawl pages and pick up any changes. Controllability: No

When Should You Worry About and Optimize for Crawl Budget?

When a website is still small, Google has no issue scanning all of its pages.

However, as the website grows and accumulates many pages, the search engine will have trouble finding them all.

So, if you run a large site, it is important to regularly check your site's crawl health and look at the ratio of pages crawled by Googlebot versus those that aren't.

The way to do it is by using Google Search Console.

Click “Coverage” and then go to the “Excluded” section.


From here, you'll see how many of your site's pages have not been crawled by Googlebot.

Pay attention to the crawl ratio of your site.

The ideal crawl ratio should be at least 60% of the total pages.

You can get the number by dividing the number of pages crawled by Googlebot by the total number of pages on your site.

If your crawl ratio is less than that, you should start working on optimizing your crawl budget.
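As a quick arithmetic sketch, the check looks like this in Python (the page counts below are made-up figures purely for illustration; substitute the numbers from your own Search Console Coverage report):

```python
# Hypothetical figures for illustration only.
total_pages = 10_000      # total pages on the site
crawled_pages = 7_200     # pages Googlebot has crawled

crawl_ratio = crawled_pages / total_pages
print(f"Crawl ratio: {crawl_ratio:.0%}")  # Crawl ratio: 72%

# Flag the site if it falls under the 60% guideline mentioned above.
if crawl_ratio < 0.60:
    print("Consider optimizing your crawl budget.")
else:
    print("Crawl ratio looks healthy.")
```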


How to Optimize for Crawl Budget?

As you can see above, most of the crawl budget factors are controllable.

That means there are things you can actually work on to optimize your crawl budget: using your current crawl budget to the fullest while, at the same time, increasing the crawl budget on your site.

Please note that we won't explain each point in detail, since you have already learned them in earlier lessons of our SEO academy.

1. Improve Site Speed

Google says that “Making a site faster improves the users’ experience while also increasing crawl rate.”

As mentioned earlier, Google sets a crawl time for each website on the internet.

If your site loads slowly, it wastes the valuable time Googlebot spends on your site, which means fewer pages will be crawled.
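To see why speed matters, here is a back-of-the-envelope sketch. The time window and response times below are invented numbers purely for illustration; Google does not publish these figures:

```python
# Assumed numbers: if Googlebot allots a fixed time window to a site,
# average page response time directly caps how many pages it can fetch.
crawl_time_budget_ms = 600_000   # hypothetical 10-minute crawl window
slow_page_ms = 2_000             # slow site: 2 s per page
fast_page_ms = 500               # optimized site: 0.5 s per page

pages_slow = crawl_time_budget_ms // slow_page_ms   # 300 pages
pages_fast = crawl_time_budget_ms // fast_page_ms   # 1200 pages

print(f"Slow site: ~{pages_slow} pages per crawl window")
print(f"Fast site: ~{pages_fast} pages per crawl window")
```

Under these assumed numbers, a 4x speedup means 4x as many pages crawled in the same window.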

If you want to refresh your knowledge about how to improve site speed, check out our previous module here.

2. Internal Linking

Googlebot prioritizes any page that has a lot of internal and external links pointing to it.

However, if you have just published a new piece of content on your site, getting a backlink can take some time.

Internal linking helps send Googlebot to different pages on your site, including your new content.

If you want to refresh your knowledge about how to do internal linking properly, check out our previous module here.

3. Boost Page Popularity

Google states: “URLs that are more popular on the Internet tend to be crawled more often to keep them fresher in our index.”

This explains why a new page sometimes takes a long time to be crawled by Googlebot: its page views are simply not as high as those of the existing content on your site.

Therefore, here are some ways to boost your new page's popularity right away:

  • Share it on social media.
  • Activate a newsletter or RSS feed for your site so that your subscribers are notified whenever new content is published.

4. Block Unnecessary Part of Your Site

Crawling unnecessary pages on your site can eat up Googlebot's crawl budget.

Therefore, block them using robots.txt to make the most of your crawl budget limit.
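As a sketch, a robots.txt like the following tells all crawlers to skip sections with no search value. The paths here are placeholders; use the ones that apply to your own site:

```text
# Placeholder paths for illustration; adjust to your own site.
User-agent: *
Disallow: /cart/
Disallow: /search/
Disallow: /tag/

Sitemap: https://www.example.com/sitemap.xml
```

Note that robots.txt blocks crawling, not indexing: a blocked URL can still be indexed if other pages link to it.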

If you want to refresh your knowledge about what you should block in robots.txt, check out our previous module here.

5. Reduce Low-Value URLs on Your Site

Having many low-value URLs can negatively affect the site's crawling rate.

Here are some examples of low-value URLs:

  • Duplicate content
  • Low quality and spam content

Remove all duplicate content, very thin content, and pages that do not provide any value.
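One common source of low-value URLs is the same page being reachable under many query-string variants (tracking parameters, sort orders, and so on). A minimal Python sketch, using made-up example URLs, that groups such duplicates from a URL list:

```python
# Group URLs that differ only by query string, a common source of
# duplicate-content URLs that waste crawl budget.
# The example URLs below are made up for illustration.
from collections import defaultdict
from urllib.parse import urlsplit

urls = [
    "https://example.com/shoes",
    "https://example.com/shoes?utm_source=newsletter",
    "https://example.com/shoes?sort=price",
    "https://example.com/about",
]

groups = defaultdict(list)
for url in urls:
    parts = urlsplit(url)
    # Key on scheme + host + path, ignoring query string and fragment.
    groups[(parts.scheme, parts.netloc, parts.path)].append(url)

duplicates = {key: variants for key, variants in groups.items()
              if len(variants) > 1}
for (scheme, host, path), variants in duplicates.items():
    print(f"{scheme}://{host}{path} has {len(variants)} variants")
```

In practice you would feed this a URL list exported from your crawler or server logs, then consolidate the variants with canonical tags or redirects.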

6. Increase Site’s Authority

There’s a strong correlation between a site’s authority and its crawl budget.

When a website has high authority, it is basically telling the search engine that its content is credible enough to be shown to users.

As a result, the search engine will increase the crawl budget on a high-authority site.

If you want to refresh your knowledge about website authority, check out our previous module here.


Final Thoughts

Crawl budget optimization is not needed if you follow all the SEO practices in our academy.

Basically, if you maintain your site's SEO health correctly, there should be no issues with its crawl budget.