Free SEO Tool

Robots.txt generator

Pick a preset or build your own. Block AI training crawlers, allow search engines, add your sitemap — download a ready-to-deploy robots.txt in seconds.

Adding your sitemap URL helps search engines discover your pages faster.

Save this as robots.txt at the root of your site — e.g. yoursite.com/robots.txt.

Need an SEO strategy, not just config?

pseo pro generates up to 300 ranked page specs from your URL. $19 one-time.

What robots.txt actually controls

robots.txt is the first file most crawlers check when they visit your site. It's a plain text file at yoursite.com/robots.txt that tells bots which URLs they can fetch. It's a hint, not an enforcement mechanism — well-behaved crawlers like Googlebot respect it, but anyone can choose to ignore it.

Two things robots.txt is great for: (1) keeping crawlers out of areas that waste crawl budget (admin, cart, checkout), and (2) signaling which bots you allow. It's not a good tool for blocking indexing — use a noindex meta tag for that. Blocking a page in robots.txt can actually cause it to be indexed without content, because Google sees the link but can't read the page.
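A minimal file covering the crawl-budget case looks like this (the paths and sitemap URL are placeholders for your own):

```
User-agent: *
Disallow: /admin/
Disallow: /cart
Disallow: /checkout

Sitemap: https://yoursite.com/sitemap.xml
```

Each `Disallow` line blocks one URL path prefix for the crawlers named in the `User-agent` line above it.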

When to use each preset

Allow all

Your default if you want maximum discoverability. Every crawler (search engines, AI training bots, scrapers) can access every URL. Best for new sites that need indexing as fast as possible.
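An allow-all robots.txt is as simple as:

```
User-agent: *
Allow: /
```

An empty `Disallow:` line under `User-agent: *` is equivalent — both mean no URL is off-limits.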

Standard SEO

Allows all crawlers but blocks common private paths: /admin/, /api/, /cart, /checkout, /account/, /_next/. Safe default for most SaaS and ecommerce sites.
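In robots.txt form, that policy looks like:

```
User-agent: *
Disallow: /admin/
Disallow: /api/
Disallow: /cart
Disallow: /checkout
Disallow: /account/
Disallow: /_next/
```

Everything not listed stays crawlable, so your marketing and content pages are unaffected.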

Block AI crawlers

Allows search engines but blocks LLM training bots (GPTBot, ClaudeBot, Google-Extended, CCBot, PerplexityBot, etc.). Use if you don't want your content used for AI training — but note this also reduces your visibility in AI search answers.
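A sketch of this policy (listing a few of the bots the preset covers) looks like:

```
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
```

Crawlers follow the most specific group that matches their name, so GPTBot obeys its own `Disallow: /` block while Googlebot falls through to the permissive `User-agent: *` group.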

Block all

Asks every crawler to stay out of your entire site. Useful for staging environments, private beta sites, or admin-only domains. Never use this on production if you want traffic.
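The entire file for this preset is two lines:

```
User-agent: *
Disallow: /
```

A bare `/` as the disallowed path matches every URL on the domain.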

Frequently asked questions

What is a robots.txt file?

A plain text file at the root of your domain (yoursite.com/robots.txt) that tells search engine crawlers which pages they can and cannot access. It's the first file most crawlers request when visiting your site.

Where do I put robots.txt on my site?

robots.txt must live at the root of your domain, accessible at yoursite.com/robots.txt. It must be served with Content-Type text/plain. On Next.js, place it in the public folder or generate it dynamically with app/robots.ts.
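For the dynamic option, Next.js (App Router) lets you export a robots object from app/robots.ts and serves the generated file at /robots.txt for you. A minimal sketch, with yoursite.com as a placeholder domain:

```typescript
import type { MetadataRoute } from 'next'

// Next.js serves the returned object as /robots.txt (Content-Type: text/plain)
export default function robots(): MetadataRoute.Robots {
  return {
    rules: {
      userAgent: '*',
      allow: '/',
      disallow: '/admin/',
    },
    sitemap: 'https://yoursite.com/sitemap.xml',
  }
}
```

The static public/robots.txt approach works just as well; the dynamic file is mainly useful when the rules depend on the environment (e.g. disallow everything on staging).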

Does robots.txt block Google from indexing a page?

No — robots.txt blocks crawling, not indexing. If other sites link to a blocked page, Google can still index it without content. To truly prevent indexing, use a noindex meta tag or HTTP header on the page itself (and don't block it in robots.txt, or Google can't see the noindex).
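To keep a page out of the index, add the robots meta tag to the page itself:

```html
<!-- In the page's <head> -->
<meta name="robots" content="noindex">
```

For non-HTML resources like PDFs, the equivalent is the `X-Robots-Tag: noindex` HTTP response header. Either way, the page must remain crawlable so Google can see the directive.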

Should I block AI crawlers like GPTBot and ClaudeBot?

It depends on your goal. If you don't want your content used to train LLMs, block them. But note: blocking AI crawlers also means your content is less likely to be cited in AI search results like ChatGPT, Perplexity, or Claude. Many sites choose to allow AI crawlers specifically to be discoverable in AI answers. The 'Block AI crawlers' preset here covers the major training bots (GPTBot, ClaudeBot, Google-Extended, CCBot, etc.).

What does 'User-agent: *' mean?

The asterisk (*) is a wildcard meaning 'all crawlers.' Rules under 'User-agent: *' apply to every bot unless that bot has a more specific set of rules elsewhere in the file. You can target a specific crawler by name (e.g., 'User-agent: Googlebot').
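For example (the /drafts/ path is illustrative):

```
# Applies to every crawler without a more specific group
User-agent: *
Disallow: /drafts/

# Googlebot matches this group instead, so the rule above
# does not apply to it — the empty Disallow allows everything
User-agent: Googlebot
Disallow:
```

Note that groups don't combine: a bot follows only the single most specific group that matches its name.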

Do I need to include Sitemap: in robots.txt?

It's not required but highly recommended. Adding a Sitemap: line helps search engines discover your sitemap without needing to find it via Google Search Console. You can include multiple Sitemap: lines if you have more than one.
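For example, a site with a main sitemap and a blog sitemap would list both:

```
Sitemap: https://yoursite.com/sitemap.xml
Sitemap: https://yoursite.com/blog/sitemap.xml
```

Sitemap URLs must be absolute (including the https:// scheme), and the lines can appear anywhere in the file — they aren't tied to any User-agent group.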

Ready to rank your site at scale?

Generate up to 300 SEO page specs tailored to your product for $19. One-time payment, no subscription.