Robots.txt is a 30-year-old text file that still controls billions of dollars in organic traffic. Get it wrong, and Google stops crawling your most valuable pages. Get it right, and you direct crawl budget exactly where it matters most.
What is Robots.txt?
Robots.txt is a plain text file placed at the root of your website (yourdomain.com/robots.txt) that tells search engine crawlers which pages or sections they can and cannot access. It uses a simple directive syntax recognized by all major search engines.
Critical distinction: robots.txt controls crawling, not indexing. A disallowed URL can still appear in search results if other sites link to it. To prevent indexing, use the noindex meta tag or X-Robots-Tag HTTP header.
Basic Syntax & Directives
User-agent: *
Disallow: /admin/
Allow: /admin/public/
Sitemap: https://yourdomain.com/sitemap.xml
Core Directives Explained
- User-agent: Specifies which crawler the rules apply to. * means all crawlers; Googlebot targets only Google.
- Disallow: Paths the crawler should NOT access. Disallow: / blocks everything; Disallow: /admin/ blocks the admin directory.
- Allow: Explicitly permits access to a path within a broader Disallow rule. Useful for exceptions.
- Sitemap: Tells crawlers where to find your XML sitemap. Must be an absolute URL (include https://).
- Crawl-delay: Specifies wait time (in seconds) between requests. Google ignores this directive; Bing and some other crawlers honor it.
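The directives above can be checked programmatically with Python's standard-library parser. A minimal sketch (note one quirk: the stdlib parser applies rules first-match rather than Google's longest-match, so the Allow exception is listed before the broader Disallow):

```python
from urllib.robotparser import RobotFileParser

# Rules matching the example above. The Allow exception comes first
# because urllib.robotparser evaluates rules in order, first match wins.
rules = """\
User-agent: *
Allow: /admin/public/
Disallow: /admin/
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

print(rp.can_fetch("*", "https://yourdomain.com/admin/secret.html"))  # False
print(rp.can_fetch("*", "https://yourdomain.com/admin/public/page"))  # True
print(rp.can_fetch("*", "https://yourdomain.com/blog/post"))          # True
```

This is handy for unit-testing a draft robots.txt before it ever reaches production.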
Templates for Popular Platforms
WordPress Robots.txt
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Allow: /wp-admin/admin-ajax.php
Allow: /*.css$
Allow: /*.js$
Sitemap: https://yourdomain.com/sitemap.xml
Next.js Robots.txt
User-agent: *
Allow: /
Disallow: /api/
Allow: /_next/static/
Sitemap: https://yourdomain.com/sitemap.xml
eCommerce Robots.txt
User-agent: *
Disallow: /*?sort=
Disallow: /*?filter=
Disallow: /cart
Disallow: /checkout
Disallow: /account
Sitemap: https://yourdomain.com/sitemap.xml
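The eCommerce template relies on wildcards, which Python's stdlib parser treats literally. A hypothetical helper, `robots_path_matches`, sketches Google-style matching (per RFC 9309: rules are prefix matches, `*` matches any run of characters, a trailing `$` anchors to the end of the path) so you can sanity-check patterns like /*?sort=:

```python
import re

def robots_path_matches(pattern: str, path: str) -> bool:
    """Google-style robots.txt matching: prefix match by default,
    '*' matches any run of characters, trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in pattern)
    regex = "^" + regex + ("$" if anchored else "")
    return re.match(regex, path) is not None

# Faceted-navigation URLs are caught by the wildcard rules:
print(robots_path_matches("/*?sort=", "/shoes?sort=price"))  # True
print(robots_path_matches("/*?sort=", "/shoes"))             # False
# Plain rules are prefix matches, so /cart also covers deeper paths:
print(robots_path_matches("/cart", "/cart/items"))           # True
```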
7 Critical Robots.txt Mistakes
Blocking CSS/JS Files
Google needs CSS and JavaScript to render pages properly. Blocking /assets/ or /*.js$ prevents rendering and can hurt rankings. Always Allow CSS/JS.
Using Disallow: / by Accident
A single typo or misconfiguration can block your entire site. One wrong character means zero organic traffic. Always test before deploying.
Blocking the Sitemap
Disallow: /sitemap.xml prevents Google from discovering your content. Sitemaps should NEVER be blocked; declare them with the Sitemap: directive instead.
Using Robots.txt Instead of Noindex
Disallowed URLs can still be indexed via external links. To truly prevent indexing, use <meta name="robots" content="noindex"> in the page HTML.
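For non-HTML resources (PDFs, images) where a meta tag is impossible, the X-Robots-Tag response header does the same job. A minimal sketch using Python's built-in HTTP server (server setup and paths are illustrative, not a production configuration):

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class NoIndexHandler(BaseHTTPRequestHandler):
    """Attach X-Robots-Tag: noindex to every response, which blocks
    indexing even for file types that cannot carry a meta tag."""
    def do_GET(self):
        self.send_response(200)
        self.send_header("X-Robots-Tag", "noindex")
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"hello")

    def log_message(self, *args):  # silence request logging
        pass

server = HTTPServer(("127.0.0.1", 0), NoIndexHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

with urllib.request.urlopen(f"http://127.0.0.1:{server.server_port}/doc.pdf") as resp:
    tag = resp.headers["X-Robots-Tag"]
print(tag)  # noindex

server.shutdown()
```

In practice you would set this header in your web server or framework config rather than a hand-rolled server.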
Forgetting Trailing Slashes
Robots.txt rules are prefix matches: Disallow: /admin blocks /admin/, /admin.html, and even /administrator — often far more than intended. Use Disallow: /admin/ (with the trailing slash) when you mean only the directory.
Blocking Entire /blog/ Directory
Usually a leftover from staging-to-production migrations, where developers blocked /blog/ (or the whole site) during testing and forgot to remove the rule. Always audit robots.txt before launch.
No Sitemap Directive
Without Sitemap: directive, Google must discover your sitemap manually. Add it to speed up indexing and ensure discoverability.
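Several of the mistakes above can be caught mechanically. A hypothetical `lint_robots` helper (a heuristic sketch, not a full RFC 9309 parser) that flags them:

```python
def lint_robots(text: str) -> list[str]:
    """Flag common robots.txt mistakes. Heuristic sketch only."""
    warnings = []
    agent = None
    # Strip comments and surrounding whitespace from each line.
    lines = [line.split("#", 1)[0].strip() for line in text.splitlines()]
    for line in lines:
        if not line:
            continue
        key, _, value = line.partition(":")
        key, value = key.strip().lower(), value.strip()
        if key == "user-agent":
            agent = value
        elif key == "disallow":
            if value == "/" and agent == "*":
                warnings.append("Disallow: / under User-agent: * blocks the entire site")
            if "sitemap" in value.lower():
                warnings.append(f"sitemap blocked by rule: {value}")
            if value.endswith((".css", ".js", ".css$", ".js$")):
                warnings.append(f"CSS/JS blocked by rule: {value}")
        elif key == "crawl-delay":
            warnings.append("Crawl-delay is ignored by Google")
    if not any(l.lower().startswith("sitemap:") for l in lines):
        warnings.append("no Sitemap: directive")
    return warnings

bad = """\
User-agent: *
Disallow: /
Disallow: /*.js$
Crawl-delay: 5
"""
for w in lint_robots(bad):
    print("WARN:", w)
```

Running it on the broken file above reports all four problems: the site-wide block, the blocked JavaScript, the ignored Crawl-delay, and the missing Sitemap directive.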
Testing Robots.txt in Google Search Console
Before deploying robots.txt to production, test it thoroughly:
- Open the robots.txt report in Google Search Console (Settings → robots.txt); it replaced the standalone robots.txt Tester, which Google retired in late 2023
- Review the fetched file for the syntax errors and warnings the report flags
- Spot-check specific URLs with the URL Inspection tool to confirm they're allowed or blocked as intended
- After deploying changes, use the report's option to request a recrawl so Google picks up the new file
You can also use our free robots.txt generator and validator for real-time syntax checking and templates.
Advanced: User-Agent Specificity
Different crawlers can have different rules. More specific user-agent rules take precedence over general ones:
User-agent: Googlebot
Disallow: /private/
User-agent: Bingbot
Crawl-delay: 10
Disallow: /private/
User-agent: *
Disallow: /
This blocks all other crawlers from the entire site, while Googlebot and Bingbot are kept out of /private/ only — and Bingbot is additionally asked to wait 10 seconds between requests.
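Group selection like this can be verified with the stdlib parser as well. A sketch using the full three-group configuration:

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: Googlebot
Disallow: /private/

User-agent: Bingbot
Crawl-delay: 10
Disallow: /private/

User-agent: *
Disallow: /
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# Googlebot matches its own group, so only /private/ is off-limits.
print(rp.can_fetch("Googlebot", "https://yourdomain.com/page"))       # True
print(rp.can_fetch("Googlebot", "https://yourdomain.com/private/x"))  # False
# An unlisted crawler falls through to the catch-all group.
print(rp.can_fetch("SomeOtherBot", "https://yourdomain.com/page"))    # False
# The Bingbot group's crawl delay is exposed too.
print(rp.crawl_delay("Bingbot"))                                      # 10
```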