Robots.txt is a 30-year-old text file that still controls billions of dollars in organic traffic. Get it wrong, and Google stops crawling your most valuable pages. Get it right, and you direct crawl budget exactly where it matters most.
What is Robots.txt?
Robots.txt is a plain text file placed at the root of your website (yourdomain.com/robots.txt) that tells search engine crawlers which pages or sections they can and cannot access. It uses a simple directive syntax recognized by all major search engines.
Critical distinction: robots.txt controls crawling, not indexing. A disallowed URL can still appear in search results if other sites link to it. To prevent indexing, use the noindex meta tag or X-Robots-Tag HTTP header.
Basic Syntax & Directives
User-agent: *
Disallow: /admin/
Allow: /admin/public/
Sitemap: https://yourdomain.com/sitemap.xml
Core Directives Explained
- User-agent: Specifies which crawler the rules apply to. * means all crawlers; Googlebot targets only Google.
- Disallow: Paths the crawler should NOT access. Disallow: / blocks everything; Disallow: /admin/ blocks the admin directory.
- Allow: Explicitly permits access to a path within a broader Disallow rule. Useful for exceptions.
- Sitemap: Tells crawlers where to find your XML sitemap. Must be an absolute URL (include https://).
- Crawl-delay: Specifies wait time (in seconds) between requests. Google ignores this directive; Bing and some other crawlers honor it.
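The directives above can be checked programmatically with Python's standard-library parser. A minimal sketch (note one quirk: the stdlib parser applies rules first-match rather than Google's longest-match, so the Allow exception is listed before the broader Disallow):

```python
from urllib.robotparser import RobotFileParser

# Rules matching the example above. The Allow exception comes first
# because urllib.robotparser evaluates rules in order, first match wins.
rules = """\
User-agent: *
Allow: /admin/public/
Disallow: /admin/
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

print(rp.can_fetch("*", "https://yourdomain.com/admin/secret.html"))  # False
print(rp.can_fetch("*", "https://yourdomain.com/admin/public/page"))  # True
print(rp.can_fetch("*", "https://yourdomain.com/blog/post"))          # True
```

This is handy for unit-testing a draft robots.txt before it ever reaches production.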
Templates for Popular Platforms
WordPress Robots.txt
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Allow: /wp-admin/admin-ajax.php
Allow: /*.css$
Allow: /*.js$
Sitemap: https://yourdomain.com/sitemap.xml
Next.js Robots.txt
User-agent: *
Allow: /
Disallow: /api/
Allow: /_next/static/
Sitemap: https://yourdomain.com/sitemap.xml
eCommerce Robots.txt
User-agent: *
Disallow: /*?sort=
Disallow: /*?filter=
Disallow: /cart
Disallow: /checkout
Disallow: /account
Sitemap: https://yourdomain.com/sitemap.xml
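The eCommerce template relies on wildcards, which Python's stdlib parser treats literally. A hypothetical helper, `robots_path_matches`, sketches Google-style matching (per RFC 9309: rules are prefix matches, `*` matches any run of characters, a trailing `$` anchors to the end of the path) so you can sanity-check patterns like /*?sort=:

```python
import re

def robots_path_matches(pattern: str, path: str) -> bool:
    """Google-style robots.txt matching: prefix match by default,
    '*' matches any run of characters, trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in pattern)
    regex = "^" + regex + ("$" if anchored else "")
    return re.match(regex, path) is not None

# Faceted-navigation URLs are caught by the wildcard rules:
print(robots_path_matches("/*?sort=", "/shoes?sort=price"))  # True
print(robots_path_matches("/*?sort=", "/shoes"))             # False
# Plain rules are prefix matches, so /cart also covers deeper paths:
print(robots_path_matches("/cart", "/cart/items"))           # True
```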
7 Critical Robots.txt Mistakes
Blocking CSS/JS Files
Google needs CSS and JavaScript to render pages properly. Blocking /assets/ or /*.js$ prevents rendering and can hurt rankings. Always Allow CSS/JS.
Using Disallow: / by Accident
A single typo or misconfiguration can block your entire site. One wrong character means zero organic traffic. Always test before deploying.
Blocking the Sitemap
Disallow: /sitemap.xml prevents Google from discovering your content. Sitemaps should NEVER be blocked; declare them with the Sitemap: directive instead.
Using Robots.txt Instead of Noindex
Disallowed URLs can still be indexed via external links. To truly prevent indexing, use <meta name="robots" content="noindex"> in the page HTML.
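For non-HTML resources (PDFs, images) where a meta tag is impossible, the X-Robots-Tag response header does the same job. A minimal sketch using Python's built-in HTTP server (server setup and paths are illustrative, not a production configuration):

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class NoIndexHandler(BaseHTTPRequestHandler):
    """Attach X-Robots-Tag: noindex to every response, which blocks
    indexing even for file types that cannot carry a meta tag."""
    def do_GET(self):
        self.send_response(200)
        self.send_header("X-Robots-Tag", "noindex")
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"hello")

    def log_message(self, *args):  # silence request logging
        pass

server = HTTPServer(("127.0.0.1", 0), NoIndexHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

with urllib.request.urlopen(f"http://127.0.0.1:{server.server_port}/doc.pdf") as resp:
    tag = resp.headers["X-Robots-Tag"]
print(tag)  # noindex

server.shutdown()
```

In practice you would set this header in your web server or framework config rather than a hand-rolled server.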
Forgetting Trailing Slashes
Robots.txt rules are prefix matches: Disallow: /admin blocks /admin/, /admin.html, and even /administrator — often far more than intended. Use Disallow: /admin/ (with the trailing slash) when you mean only the directory.
Blocking Entire /blog/ Directory
Usually a leftover from staging-to-production migrations, where developers blocked /blog/ (or the whole site) during testing and forgot to remove the rule. Always audit robots.txt before launch.
No Sitemap Directive
Without Sitemap: directive, Google must discover your sitemap manually. Add it to speed up indexing and ensure discoverability.
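Several of the mistakes above can be caught mechanically. A hypothetical `lint_robots` helper (a heuristic sketch, not a full RFC 9309 parser) that flags them:

```python
def lint_robots(text: str) -> list[str]:
    """Flag common robots.txt mistakes. Heuristic sketch only."""
    warnings = []
    agent = None
    # Strip comments and surrounding whitespace from each line.
    lines = [line.split("#", 1)[0].strip() for line in text.splitlines()]
    for line in lines:
        if not line:
            continue
        key, _, value = line.partition(":")
        key, value = key.strip().lower(), value.strip()
        if key == "user-agent":
            agent = value
        elif key == "disallow":
            if value == "/" and agent == "*":
                warnings.append("Disallow: / under User-agent: * blocks the entire site")
            if "sitemap" in value.lower():
                warnings.append(f"sitemap blocked by rule: {value}")
            if value.endswith((".css", ".js", ".css$", ".js$")):
                warnings.append(f"CSS/JS blocked by rule: {value}")
        elif key == "crawl-delay":
            warnings.append("Crawl-delay is ignored by Google")
    if not any(l.lower().startswith("sitemap:") for l in lines):
        warnings.append("no Sitemap: directive")
    return warnings

bad = """\
User-agent: *
Disallow: /
Disallow: /*.js$
Crawl-delay: 5
"""
for w in lint_robots(bad):
    print("WARN:", w)
```

Running it on the broken file above reports all four problems: the site-wide block, the blocked JavaScript, the ignored Crawl-delay, and the missing Sitemap directive.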
Testing Robots.txt in Google Search Console
Before deploying robots.txt to production, test it thoroughly:
- Open the robots.txt report in Google Search Console (Settings → robots.txt); it replaced the standalone robots.txt Tester, which Google retired in late 2023
- Review the fetched file for the syntax errors and warnings the report flags
- Spot-check specific URLs with the URL Inspection tool to confirm they're allowed or blocked as intended
- After deploying changes, use the report's option to request a recrawl so Google picks up the new file
You can also use our free robots.txt generator and validator for real-time syntax checking and templates.
Advanced: User-Agent Specificity
Different crawlers can have different rules. More specific user-agent rules take precedence over general ones:
User-agent: Googlebot
Disallow: /private/
User-agent: Bingbot
Crawl-delay: 10
Disallow: /private/
User-agent: *
Disallow: /
This blocks all other crawlers from the entire site, while Googlebot and Bingbot are kept out of /private/ only — and Bingbot is additionally asked to wait 10 seconds between requests.
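Group selection like this can be verified with the stdlib parser as well. A sketch using the full three-group configuration:

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: Googlebot
Disallow: /private/

User-agent: Bingbot
Crawl-delay: 10
Disallow: /private/

User-agent: *
Disallow: /
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# Googlebot matches its own group, so only /private/ is off-limits.
print(rp.can_fetch("Googlebot", "https://yourdomain.com/page"))       # True
print(rp.can_fetch("Googlebot", "https://yourdomain.com/private/x"))  # False
# An unlisted crawler falls through to the catch-all group.
print(rp.can_fetch("SomeOtherBot", "https://yourdomain.com/page"))    # False
# The Bingbot group's crawl delay is exposed too.
print(rp.crawl_delay("Bingbot"))                                      # 10
```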