On the Search Off the Record podcast, Google employees discussed crawl budget and what affects content indexing.
According to Google's Gary Illyes, the term "crawl budget" was coined by the specialist community. At Google itself, "there was nothing that could mean a crawl budget on its own."
"Since people were talking about it, we tried to come up with something, some definition. We worked with two, three, or four teams and tried to come up with at least a few internal metrics that could be combined into what users called the 'crawl budget,'" says Illyes.
According to Illyes, part of the crawl budget calculation is based on practical considerations, such as how many URLs on a single site Googlebot can crawl without overloading the server.
As Illyes noted during the discussion, over 90% of sites don't need to worry about crawl budget. The problem is rare, but for some reason it is still discussed in SEO courses and at conferences.
The Google employees then talked briefly about how the search engine indexes content. In particular, when deciding whether to index new content on a site, the algorithms rely on the information already available about that site. For example, if a company launches a blog on its main site, Google can decide how much of the blog's content it needs based on the site as a whole.
An important factor in whether content gets indexed is the main site's quality signals. One such signal is user interest: Google treats a site as interesting if other sites link to it and users discuss it.