Your website. Indeed, search engines crawl, or scan, the web, and on each page they follow the links present on it, which allows them to move from one site to another. But before browsing a web page, search engine robots, like Google's, will read the robots.txt file to find out whether they are authorized to visit it.

Where is the robots.txt file? The robots.txt file for your site must be located at the root of your website. For example, for the Web Horspiste site, https://www.webhorspiste.com, it is located here: https://www.webhorspiste.com/robots.txt. Your file must be named exactly robots.txt. Careful, the name is case sensitive, so don't put capital letters in it, so
no Robots.txt, ROBOTS.TXT, or anything else. If a crawler does not find a robots.txt file at the root of your site, it will assume that the site does not have one and will proceed to crawl it.

The advantages and disadvantages of the robots.txt file.

Advantages: spiders, the crawling robots, have a predefined allocation of time or resources for each website. This is what we call the crawl budget. If your site has crawl budget problems, then to avoid wasting that budget on unimportant pages, such as the login page, the thank-you page, the shopping cart of an e-commerce site, or others, you can tell search engines not to explore these pages. It can also keep your site's internal search results pages from being crawled and, in most cases, from being indexed or appearing in search results. It can also prevent duplicate content from being detected by Google.

Disadvantages: the robots.txt file cannot
prevent indexing. In fact, Google can index a page without having crawled it: if a significant number of links point to the URL, Google will include it in the index and will simply ignore the content of the page. It will build a title from the link anchors, and in place of the meta description it will display a notice that the page is disallowed by robots.txt.

If you want to block the indexing of a page, prefer the robots meta tag and remove the Disallow rule, because otherwise Google will not be able to crawl the page and so will not know that it is set to noindex.

To block all crawlers: <meta name="robots" content="noindex">
To block Googlebot: <meta name="googlebot" content=