Every search engine reserves a certain crawl budget for each website. In other words, the crawl budget defines the maximum number of pages that Google and other search engines can or will crawl on your domain.
Usually, the bigger your site, the bigger your crawl budget, but it will never be unlimited.
You can find a detailed article about crawl budget on the Google Webmaster Central Blog. In general, the deeper your website structure, the more budget is consumed. Every bot will only follow internal site links down to a certain number of levels. There are a few things that negatively affect the crawl budget for your website:
- Faceted navigation and session identifiers
- On-site duplicate content
- Soft error pages
- Hacked pages
- Infinite spaces and proxies
- Low quality and spam content
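To make the first two points more concrete, here is a minimal sketch of how you could spot URL variants that only differ by session or facet parameters in a list of crawled URLs. The parameter names in `IGNORED_PARAMS` and the helper names `normalize_url` and `group_duplicates` are assumptions for illustration only; you would have to replace the parameter names with the ones your own site really uses.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical parameter names (assumption); adjust them to your own site.
IGNORED_PARAMS = {"sessionid", "sid", "phpsessid", "sort", "color", "size"}


def normalize_url(url: str) -> str:
    """Drop session and facet parameters so URL variants collapse to one key."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    ]
    kept.sort()  # parameter order should not create a "new" URL
    # The fragment is dropped because it does not reach the server anyway.
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), "")
    )


def group_duplicates(urls):
    """Return only those normalized URLs that more than one variant maps to."""
    groups = {}
    for url in urls:
        groups.setdefault(normalize_url(url), []).append(url)
    return {key: variants for key, variants in groups.items() if len(variants) > 1}


crawled = [
    "https://example.com/shoes?color=red&sid=abc",
    "https://example.com/shoes?sort=price",
    "https://example.com/shoes",
    "https://example.com/hats?page=2",
]
duplicates = group_duplicates(crawled)
```

In this example, `duplicates` contains a single key, `https://example.com/shoes`, with the three shoe URLs behind it. Every group like this is a set of pages a bot may crawl separately although they show more or less the same content.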
These are all things you should keep in mind anyway. The better your site performs, the better for your crawl budget, and I would like to add the following points to that:
- Site/page speed (because this will give you positive user signals)
- Wasted server resources (which make your site slower)
- The popularity of your website (the more users you have, the more often your site will most likely be crawled)
If your site is getting bigger, it makes sense to add rel="nofollow" to links that point to certain parts of your website, such as privacy pages, contact forms, archives, and author pages (which are usually only a duplicate of existing content), and so on.
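As a rough sketch of how this could look in a template helper, the following function adds rel="nofollow" to a link whenever its path starts with one of a few prefixes. The prefixes in `NOFOLLOW_PREFIXES` and the function name `render_link` are made up for this example; your own URL structure will most likely look different.

```python
from html import escape
from urllib.parse import urlsplit

# Hypothetical path prefixes (assumption); use the sections of your own site.
NOFOLLOW_PREFIXES = ("/privacy", "/contact", "/archive", "/author")


def render_link(href: str, text: str) -> str:
    """Render an <a> tag and add rel="nofollow" for low-value sections."""
    path = urlsplit(href).path
    # str.startswith accepts a tuple and matches if any prefix fits.
    rel = ' rel="nofollow"' if path.startswith(NOFOLLOW_PREFIXES) else ""
    return f'<a href="{escape(href, quote=True)}"{rel}>{escape(text)}</a>'


privacy_link = render_link("/privacy-policy", "Privacy")
article_link = render_link("/blog/crawl-budget", "Crawl budget")
```

Here `privacy_link` gets the rel="nofollow" attribute, while `article_link` is rendered as a normal link. Keeping the prefixes in one place makes it easy to extend the list as your site grows.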