
Robots.txt Generator

Build a correct robots.txt with allow/disallow rules, a sitemap and AI-bot presets.

  • Free forever
  • No sign-up
  • Runs in your browser


What robots.txt is for

robots.txt is a plain text file at the root of your site that tells automated crawlers which parts of it they may request. Before a well-behaved bot fetches your pages, it reads https://yourdomain.com/robots.txt and obeys the rules it finds. It is the oldest and most widely supported part of the Robots Exclusion Protocol (formalized as RFC 9309 in 2022), and the major search engines and most large AI crawler operators state that they honour it.

Crucially, robots.txt is advisory, not enforcing. It is a publicly visible set of instructions that compliant crawlers follow voluntarily. It does not password-protect anything, and a crawler that chooses to ignore it still can. Think of it as a sign on the door, not a lock.

This generator builds a valid file from a simple form and lets you test paths against it — all in your browser, with no upload and no account.

How robots.txt rules work

A robots.txt file is a series of groups. Each group starts with one or more User-agent lines naming the crawlers it targets, followed by the rules that apply to them:

  • User-agent — the crawler the rules apply to. * matches every bot that doesn't have its own group.
  • Disallow — a path prefix the crawler should not request. Disallow: /admin/ blocks everything under /admin/. An empty Disallow: means "nothing is blocked."
  • Allow — a path prefix that is permitted, used to carve exceptions out of a broader Disallow.
  • Crawl-delay — an optional hint asking the crawler to wait a number of seconds between requests. Honoured by some engines, ignored by Google.
  • Sitemap — an absolute URL to your XML sitemap. It is independent of any user-agent and usually sits at the bottom of the file.

A crawler reads the group that best matches its own name, ignoring the others. So a specific User-agent: GPTBot group fully overrides the User-agent: * group for that bot — it does not inherit the wildcard rules.
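To make the "no inheritance" point concrete, here is a minimal Python sketch of group selection. It is an illustration, not this tool's code: the names parse_groups and select_group are hypothetical, user-agent names are compared as exact case-insensitive tokens, and only Allow and Disallow lines are kept.

```python
def parse_groups(text):
    """Parse robots.txt text into a list of (agents, rules) groups.

    agents is a list of lowercase user-agent tokens; rules is a list of
    (directive, path) pairs in file order. Other lines are skipped.
    """
    groups = []
    agents, rules = [], []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if rules:  # a User-agent line after rules starts a new group
                groups.append((agents, rules))
                agents, rules = [], []
            agents.append(value.lower())
        elif field in ("allow", "disallow") and agents:
            rules.append((field, value))
    if agents:
        groups.append((agents, rules))
    return groups


def select_group(groups, user_agent):
    """Return the rules of the named group, else the * group, else []."""
    wanted = user_agent.lower()
    wildcard = None
    for agents, rules in groups:
        if wanted in agents:
            return rules  # a specific group replaces the wildcard entirely
        if "*" in agents and wildcard is None:
            wildcard = rules
    return wildcard if wildcard is not None else []


ROBOTS = """\
User-agent: *
Disallow: /admin/

User-agent: GPTBot
Disallow: /
"""

groups = parse_groups(ROBOTS)
gptbot_rules = select_group(groups, "GPTBot")
other_rules = select_group(groups, "Bingbot")
print(gptbot_rules)  # only the GPTBot group, without the /admin/ rule
print(other_rules)   # falls back to the * group
```

Note that the GPTBot result contains only its own Disallow: / rule. If you want a named bot to keep the wildcard restrictions as well, repeat them inside its group.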

Allow vs Disallow precedence

When more than one rule could apply to the same URL, the most specific rule wins — and specificity means the length of the matching path. The classic pattern:

User-agent: *
Disallow: /downloads/
Allow: /downloads/free-guide.pdf

Here /downloads/ is blocked, but the longer, more specific Allow re-opens the single file. When an Allow and a Disallow match with the same length, Allow wins. The built-in tester applies exactly this longest-prefix logic: paste a path, pick a user-agent, and it reports Allowed or Disallowed along with which rule decided.
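The precedence rule can be sketched in a few lines of Python. This is an assumption-laden illustration rather than the tester's actual implementation: the function name decide is hypothetical, rules are plain path prefixes, and the * and $ wildcards that some crawlers support are not handled.

```python
def decide(rules, path):
    """Return (allowed, winning_rule) for path under longest-prefix matching.

    rules is a list of (directive, prefix) pairs such as
    ("disallow", "/downloads/"). winning_rule is None when nothing matched.
    """
    best = None  # (prefix_length, is_allow, rule)
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # an empty pattern matches nothing
        candidate = (len(prefix), directive == "allow", (directive, prefix))
        # Longer prefix wins; on equal length, allow (True) beats disallow (False)
        if best is None or candidate[:2] > best[:2]:
            best = candidate
    if best is None:
        return True, None  # no rule matched, so the path is allowed by default
    return best[1], best[2]


rules = [
    ("disallow", "/downloads/"),
    ("allow", "/downloads/free-guide.pdf"),
]

print(decide(rules, "/downloads/free-guide.pdf"))  # allowed by the longer Allow
print(decide(rules, "/downloads/paid.zip"))        # blocked by /downloads/
print(decide(rules, "/blog/"))                     # no match, allowed
```

Be aware that not every parser works this way. Some libraries apply the first matching rule in file order instead, so check which behaviour a given crawler or library documents before relying on it.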

How to use this generator

  1. Add user-agent groups. Start with * for all crawlers, then add specific groups (for example Googlebot) if some bots need different rules.
  2. List allow and disallow paths, one per line. Leave Disallow empty to allow everything for that agent.
  3. Toggle AI-crawler presets. Pick individual bots or hit Block all AI bots to append a blanket Disallow: / for each.
  4. Add your sitemap URL and, if you want, a crawl-delay.
  5. Watch the file render live, then Copy or Download it and upload it to your site's root.
  6. Test paths in the tester to confirm the rules behave the way you expect before you ship.
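If you would rather script the same steps, the sketch below shows one way to render a file from structured input. The function name render_robots and the dictionary keys (agent, disallow, allow, crawl_delay) are hypothetical choices for this example, and https://example.com/sitemap.xml is a placeholder URL.

```python
def render_robots(groups, sitemap=None):
    """Render robots.txt text from a list of group dictionaries.

    Each group needs an "agent" key and may have "disallow" and "allow"
    lists of paths plus a "crawl_delay" number of seconds.
    """
    lines = []
    for group in groups:
        lines.append(f"User-agent: {group['agent']}")
        # No disallow paths means an empty "Disallow:" (nothing is blocked)
        for path in group.get("disallow") or [""]:
            lines.append(f"Disallow: {path}".rstrip())
        for path in group.get("allow", []):
            lines.append(f"Allow: {path}")
        if group.get("crawl_delay") is not None:
            lines.append(f"Crawl-delay: {group['crawl_delay']}")
        lines.append("")  # blank line between groups
    if sitemap:
        lines.append(f"Sitemap: {sitemap}")
    return "\n".join(lines).rstrip("\n") + "\n"


example = render_robots(
    [
        {"agent": "*", "disallow": ["/admin/"], "allow": ["/admin/help/"]},
        {"agent": "GPTBot", "disallow": ["/"]},
    ],
    sitemap="https://example.com/sitemap.xml",
)
print(example)
```

Save the result as robots.txt at the root of the host it applies to, then test the important paths before you publish.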

The AI-crawler landscape — and why people block them

Beyond the familiar search bots, a newer class of crawlers gathers content to train large language models or to answer questions in AI products. The presets here cover the most common ones:

  • GPTBot — OpenAI's crawler for training data.
  • ClaudeBot — Anthropic's crawler for Claude.
  • Google-Extended — Google's token to opt out of Gemini and Vertex AI training without affecting Search.
  • PerplexityBot — Perplexity's answer-engine crawler.
  • CCBot — Common Crawl, whose open dataset feeds many AI models.
  • Bytespider — ByteDance's crawler.

Site owners increasingly block some or all of these to keep their work out of training corpora, to retain control over how content is reused, or simply to reduce the server load that heavy crawling causes. Others leave them enabled because being cited by AI assistants drives visibility and referrals. There is no universally correct answer, which is why this tool makes the choice per-bot and reversible rather than all-or-nothing. Note again that these directives only stop crawlers that honour robots.txt.
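As a rough illustration of what a per-bot preset amounts to, the Python sketch below appends one blanket Disallow: / group for each selected bot. The bot names come from the list above; the function name append_ai_blocks is hypothetical and this is not the tool's own code.

```python
AI_BOTS = [
    "GPTBot",
    "ClaudeBot",
    "Google-Extended",
    "PerplexityBot",
    "CCBot",
    "Bytespider",
]


def append_ai_blocks(robots_txt, blocked):
    """Append a blanket 'Disallow: /' group for each selected bot."""
    parts = [robots_txt.rstrip("\n")]
    for bot in blocked:
        if bot not in AI_BOTS:
            raise ValueError(f"unknown preset: {bot}")
        parts.append(f"User-agent: {bot}\nDisallow: /")
    return "\n\n".join(parts) + "\n"


base = "User-agent: *\nDisallow:\n"
print(append_ai_blocks(base, ["GPTBot", "CCBot"]))
```

Because each bot gets its own group, removing a block later is a matter of deleting that one group; the rules for every other crawler stay untouched.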

Best practices and common mistakes

  • Don't block CSS and JS. If crawlers can't fetch the assets that render your pages, your rankings can suffer. Disallow data and admin paths, not the resources pages need to display.
  • Don't rely on robots.txt for secrecy. Listing /secret-admin/ in a public file advertises its existence. Protect sensitive areas with authentication.
  • One file per host. www, the bare domain, and every subdomain each need their own robots.txt at their own root.
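  • Disallow blocks crawling, not indexing. A disallowed URL can still appear in search results if other sites link to it. To keep a page out of the index, allow crawling and use a noindex meta tag (or an X-Robots-Tag: noindex response header) instead; the crawler has to fetch the page to see that instruction.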
  • Validate before you ship. A stray rule can hide your whole site. Use the path tester here to sanity-check the important URLs.
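The "one file per host" rule is easy to see in code. The short Python sketch below derives the robots.txt location that governs a given page; robots_url is a hypothetical helper name and the example.com addresses are placeholders.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the robots.txt URL for the host that serves page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host (with any port); replace path, query and fragment
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


for page in (
    "https://example.com/pricing",
    "https://www.example.com/blog/post?id=1",
    "https://shop.example.com/cart",
):
    print(page, "->", robots_url(page))
```

Each of the three pages resolves to a different robots.txt, so rules published on one host say nothing about the others.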

Why this generator runs in your browser

Everything — building the file and testing paths — happens locally in your browser. Nothing about your site structure, your blocked sections or your draft rules is uploaded, logged or stored, and there's no sign-up. You get the AI-crawler presets that most free generators still lack, an instant path tester most of them don't offer at all, and the confidence that your configuration stayed on your own machine until you chose to publish it.
