Where do I put the robots.txt file?

It must live at the root of your domain, reachable at https://example.com/robots.txt. A file in a subfolder is ignored. One robots.txt governs one host, so subdomains and the www / non-www versions each need their own.
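
As a rough illustration of the one-file-per-host rule, the sketch below derives the robots.txt location that would govern a given page URL. The helper name robots_url is hypothetical and not part of this tool; it relies only on Python's standard urllib.parse module.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the host that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host (netloc also carries any port),
    # replace the path, and drop the query string and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


for url in (
    "https://example.com/blog/post?id=1",
    "https://www.example.com/blog/post",
    "https://shop.example.com/cart",
):
    print(url, "->", robots_url(url))
```

Each of the three example URLs maps to a different robots.txt, which is why the bare domain, the www version and each subdomain need their own file.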

What is the difference between Allow and Disallow when they overlap?

When two rules match the same URL, the most specific one — the longest matching path — wins. If an Allow and a Disallow match with the same length, Allow takes precedence. That is how you can disallow a whole folder but re-allow one file inside it, and the tester here reflects exactly that logic.

Should I block AI bots like GPTBot and Google-Extended?

That is a content and business decision. Block them if you do not want your pages used to train AI models or cited in their answers, or if you want to save crawl budget. Leave them unblocked if AI-driven referrals and visibility matter to you. This tool gives you per-bot toggles plus a one-click block-all so you can choose deliberately.


Robots.txt Generator

Build a correct robots.txt with Allow/Disallow rules, a sitemap and AI-bot presets.




What robots.txt is for

robots.txt is a plain text file at the root of your site that tells automated crawlers which parts of it they may request. Before a well-behaved bot fetches your pages, it reads https://yourdomain.com/robots.txt and obeys the rules it finds. It is the oldest and most widely supported part of the Robots Exclusion Protocol, and almost every major search engine and AI company respects it.

Crucially, robots.txt is advisory, not enforcing. It is a publicly visible set of instructions that compliant crawlers follow voluntarily. It does not password-protect anything, and a crawler that chooses to ignore it still can. Think of it as a sign on the door, not a lock.

This generator builds a valid file from a simple form and lets you test paths against it — all in your browser, with no upload and no account.

How robots.txt rules work

A robots.txt file is a series of groups. Each group starts with one or more User-agent lines naming the crawlers it targets, followed by the rules that apply to them:

User-agent — the crawler the rules apply to. * matches every bot that doesn't have its own group.
Disallow — a path prefix the crawler should not request. Disallow: /admin/ blocks everything under /admin/. An empty Disallow: means "nothing is blocked."
Allow — a path prefix that is permitted, used to carve exceptions out of a broader Disallow.
Crawl-delay — an optional hint asking the crawler to wait a number of seconds between requests. Honoured by some engines, ignored by Google.
Sitemap — an absolute URL to your XML sitemap. It is independent of any user-agent and usually sits at the bottom of the file.

A crawler reads the group that best matches its own name, ignoring the others. So a specific User-agent: GPTBot group fully overrides the User-agent: * group for that bot — it does not inherit the wildcard rules.
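
The group-selection behaviour described above can be sketched as follows. This is a simplified illustration, not the code this tool runs: the function name select_group and the dictionary layout are assumptions, and the match is an exact, case-insensitive comparison on the crawler's name, whereas real crawlers may match their product token more loosely.

```python
Rule = tuple[str, str]  # (directive, path prefix), e.g. ("Disallow", "/admin/")


def select_group(groups: dict[str, list[Rule]], crawler: str) -> list[Rule]:
    """Return the rules of the group that applies to the named crawler."""
    wanted = crawler.lower()
    for agent, rules in groups.items():
        # A named group wins outright; wildcard rules are not merged in.
        if agent != "*" and agent.lower() == wanted:
            return rules
    # No named group matched: fall back to the wildcard group, if present.
    return groups.get("*", [])


groups = {
    "*": [("Disallow", "/admin/")],
    "GPTBot": [("Disallow", "/")],
}
print(select_group(groups, "gptbot"))     # only the GPTBot group's rules
print(select_group(groups, "Googlebot"))  # falls back to the * group
```

Note that GPTBot receives only its own rule here; the /admin/ rule from the wildcard group is not inherited.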

Allow vs Disallow precedence

When more than one rule could apply to the same URL, the most specific rule wins — and specificity means the length of the matching path. The classic pattern:

User-agent: *
Disallow: /downloads/
Allow: /downloads/free-guide.pdf

Here /downloads/ is blocked, but the longer, more specific Allow re-opens the single file. When an Allow and a Disallow match with the same length, Allow wins. The built-in tester applies exactly this longest-prefix logic: paste a path, pick a user-agent, and it reports Allowed or Disallowed along with which rule decided.
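
A minimal sketch of that longest-prefix decision, assuming plain prefix matching with no wildcard support. The function name decide and the rule representation are hypothetical and are not taken from the tester's actual source.

```python
from typing import Optional

Rule = tuple[str, str]  # (directive, path prefix)


def decide(rules: list[Rule], path: str) -> tuple[bool, Optional[Rule]]:
    """Return (allowed, deciding_rule) for path under one group's rules."""
    best: Optional[Rule] = None
    for directive, prefix in rules:
        # An empty prefix (e.g. a bare "Disallow:") blocks nothing.
        if not prefix or not path.startswith(prefix):
            continue
        if best is None or len(prefix) > len(best[1]):
            best = (directive, prefix)  # a longer match is more specific
        elif len(prefix) == len(best[1]) and directive.lower() == "allow":
            best = (directive, prefix)  # on a tie, Allow beats Disallow
    if best is None:
        return True, None  # no rule matched, so the path is allowed
    return best[0].lower() == "allow", best


rules = [
    ("Disallow", "/downloads/"),
    ("Allow", "/downloads/free-guide.pdf"),
]
print(decide(rules, "/downloads/free-guide.pdf"))  # allowed by the longer Allow
print(decide(rules, "/downloads/paid.zip"))        # blocked by Disallow: /downloads/
```

Returning the deciding rule alongside the verdict mirrors what the tester reports, and makes it easier to see why a path was allowed or blocked.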

How to use this generator

Add user-agent groups. Start with * for all crawlers, then add specific groups (for example Googlebot) if some bots need different rules.
List allow and disallow paths, one per line. Leave Disallow empty to allow everything for that agent.
Toggle AI-crawler presets. Pick individual bots or hit Block all AI bots to append a blanket Disallow: / for each.
Add your sitemap URL and, if you want, a crawl-delay.
Watch the file render live, then Copy or Download it and upload it to your site's root.
Test paths in the tester to confirm the rules behave the way you expect before you ship.
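
For readers who want to script the same steps, here is a rough sketch of how a file could be assembled from form-like input. The function build_robots_txt and the keys agent, allow, disallow and crawl_delay are illustrative assumptions, not this tool's internal format; placing Crawl-delay inside each group is also an assumption.

```python
from typing import Optional


def build_robots_txt(groups: list[dict], sitemap: Optional[str] = None) -> str:
    """Render user-agent groups and an optional sitemap as robots.txt text."""
    lines: list[str] = []
    for group in groups:
        lines.append(f"User-agent: {group['agent']}")
        for path in group.get("allow", []):
            lines.append(f"Allow: {path}")
        disallow = group.get("disallow", [])
        if disallow:
            for path in disallow:
                lines.append(f"Disallow: {path}")
        else:
            # An empty Disallow means nothing is blocked for this agent.
            lines.append("Disallow:")
        if group.get("crawl_delay") is not None:
            lines.append(f"Crawl-delay: {group['crawl_delay']}")
        lines.append("")  # blank line between groups
    if sitemap:
        # The sitemap is independent of any user-agent, so it goes last.
        lines.append(f"Sitemap: {sitemap}")
    return "\n".join(lines).rstrip("\n") + "\n"


print(build_robots_txt(
    [{"agent": "*", "allow": ["/admin/help/"], "disallow": ["/admin/"]}],
    sitemap="https://example.com/sitemap.xml",
))
```

Calling it with a single group for * and no paths yields the two-line "allow everything" file: User-agent: * followed by an empty Disallow:.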

The AI-crawler landscape — and why people block them

Beyond the familiar search bots, a newer class of crawlers gathers content to train large language models or to answer questions in AI products. The presets here cover the most common ones:

GPTBot — OpenAI's crawler for training data.
ClaudeBot — Anthropic's crawler for Claude.
Google-Extended — Google's token to opt out of Gemini and Vertex AI training without affecting Search.
PerplexityBot — Perplexity's answer-engine crawler.
CCBot — Common Crawl, whose open dataset feeds many AI models.
Bytespider — ByteDance's crawler.

Site owners increasingly block some or all of these to keep their work out of training corpora, to retain control over how content is reused, or simply to conserve crawl budget. Others leave them enabled because being cited by AI assistants drives visibility and referrals. There is no universally correct answer, which is why this tool makes the choice per-bot and reversible rather than all-or-nothing. Note again that these directives only stop crawlers that honour robots.txt.
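
As an illustration of what a per-bot preset amounts to, the sketch below emits a blanket Disallow: / group for each selected bot. The bot names come from the list above; the function name block_ai_bots is hypothetical.

```python
AI_BOTS = [
    "GPTBot", "ClaudeBot", "Google-Extended",
    "PerplexityBot", "CCBot", "Bytespider",
]


def block_ai_bots(selected: list[str]) -> str:
    """Return one 'Disallow: /' group per selected bot, as robots.txt text."""
    blocks = [f"User-agent: {bot}\nDisallow: /" for bot in selected]
    # Groups are separated by a blank line; an empty selection yields "".
    return "\n\n".join(blocks) + ("\n" if blocks else "")


print(block_ai_bots(["GPTBot", "CCBot"]))  # block two bots only
print(block_ai_bots(AI_BOTS))              # the "block all" case
```

Because each bot gets its own group, removing one block later leaves the rules for every other crawler untouched.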

Best practices and common mistakes

Don't block CSS and JS. If crawlers can't fetch the assets that render your pages, your rankings can suffer. Disallow data and admin paths, not the resources pages need to display.
Don't rely on robots.txt for secrecy. Listing /secret-admin/ in a public file advertises its existence. Protect sensitive areas with authentication.
One file per host. www, the bare domain, and every subdomain each need their own robots.txt at their own root.
Disallow blocks crawling, not indexing. A disallowed URL can still appear in search results if other sites link to it. To keep a page out of the index, allow crawling and use a noindex meta tag instead: a crawler has to fetch the page before it can see the tag.
Validate before you ship. A stray rule can hide your whole site. Use the path tester here to sanity-check the important URLs.
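
One quick pre-publish check for the "stray rule" problem is to look for groups that contain a bare Disallow: /. The sketch below is a deliberately small parser written for illustration; the name find_site_wide_blocks is hypothetical, and it only flags candidates, since a site-wide block may be intentional for AI bots and a longer Allow in the same group can still re-open specific paths.

```python
def find_site_wide_blocks(robots_txt: str) -> list[str]:
    """Return the user-agents whose group contains a bare 'Disallow: /'."""
    flagged: list[str] = []
    agents: list[str] = []  # user-agents of the group currently being read
    in_rules = False        # True once the current group has rule lines
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                agents, in_rules = [], False
            agents.append(value)
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value == "/":
                flagged.extend(a for a in agents if a not in flagged)
    return flagged


draft = "User-agent: *\nDisallow: /\n\nUser-agent: GPTBot\nDisallow: /private/\n"
print(find_site_wide_blocks(draft))  # the * group blocks the whole site
```

If * shows up in the result and you did not mean to hide the site from every crawler, fix the rule before uploading.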

Why this generator runs in your browser

Everything — building the file and testing paths — happens locally in your browser. Nothing about your site structure, your blocked sections or your draft rules is uploaded, logged or stored, and there's no sign-up. You get the AI-crawler presets that most free generators still lack, an instant path tester most of them don't offer at all, and the confidence that your configuration stayed on your own machine until you chose to publish it.

Frequently asked questions

Robots.txt is a request, not a wall. Well-behaved crawlers — including Google's, OpenAI's GPTBot and Anthropic's ClaudeBot — honour it, so adding a disallow for them is the standard, supported way to opt out. But the file is publicly readable and not enforced, so a crawler that ignores the rules can still access pages. For pages that must stay private, use authentication or server-side blocking, not robots.txt alone.
