Generate robots.txt files to control how search engines crawl your website.
robots.txt is a plain-text file placed at the root of a website that tells search engine crawlers which paths they may and may not crawl. Well-behaved crawlers fetch it before crawling your site. Note that it is advisory, not access control: it blocks crawling, not indexing, so a disallowed page can still appear in search results if other sites link to it.
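A minimal example (the `/admin/` path and sitemap URL are placeholders, not requirements):

```
User-agent: *
Disallow: /admin/

Sitemap: https://yourdomain.com/sitemap.xml
```

`User-agent: *` applies the rules to all crawlers; the optional `Sitemap` line points crawlers at your sitemap.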
Place it at the root of your domain: https://yourdomain.com/robots.txt. It must be accessible at this exact URL for search engines to find it, and each subdomain needs its own robots.txt (rules at yourdomain.com do not apply to blog.yourdomain.com).
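You can sanity-check your rules before deploying with Python's standard-library `urllib.robotparser`. This sketch parses a hypothetical rule set locally; in practice you would point it at the live file with `set_url()` and `read()`:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration; normally fetched from
# https://yourdomain.com/robots.txt via rp.set_url(...) + rp.read()
rules = """\
User-agent: *
Disallow: /admin/
Allow: /
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# Ask whether a given user agent may fetch a given URL
print(rp.can_fetch("*", "https://yourdomain.com/admin/login"))  # False
print(rp.can_fetch("*", "https://yourdomain.com/blog/post"))    # True
```

This is handy in a deploy check or test suite, so a typo in the file doesn't silently block your whole site.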
If you don't want your content used for AI training, you can block GPTBot (OpenAI), CCBot (Common Crawl), and Google-Extended. This won't affect regular search indexing.
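The AI crawlers above can be blocked with per-agent groups, for example:

```
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```

Because these rules target only the named user agents, Googlebot, Bingbot, and other search crawlers continue to crawl normally.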