📚 Learn About Robots.txt

What is robots.txt?

A robots.txt file tells web crawlers which pages or files they can or can't request from your site. It's placed at the root of your website (e.g., yoursite.com/robots.txt).

Basic Structure

# Allow all bots to crawl everything
User-agent: *
Allow: /
# Block specific directories
Disallow: /admin/
Disallow: /private/

# Sitemap location
Sitemap: https://yoursite.com/sitemap.xml
Remember: robots.txt is publicly accessible and not a security measure! Use proper authentication for sensitive content.
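To check how a crawler would interpret rules like these, you can test URLs against a live robots.txt with Python's standard-library urllib.robotparser. A minimal sketch (yoursite.com is a placeholder; note that robotparser implements the original prefix-matching rules, not Google-style wildcards):

from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.set_url("https://yoursite.com/robots.txt")  # placeholder URL
rp.read()  # fetch and parse the live file

# Prefix rules like "Disallow: /admin/" are evaluated per user agent
print(rp.can_fetch("*", "https://yoursite.com/admin/settings"))  # False
print(rp.can_fetch("*", "https://yoursite.com/blog/post"))       # True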

Prevent Duplicate Content

# Block parameter-based duplicates
User-agent: *
Disallow: /*?sort=
Disallow: /*?filter=
Disallow: /*?page=
Disallow: /*?utm_
Disallow: /*?ref=
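The * wildcard (and the $ end-of-URL anchor) are extensions honored by major engines such as Google and Bing rather than part of the original robots.txt standard. A rough Python sketch of how such a pattern is matched against a URL path, illustrative only and not any engine's actual implementation:

import re

def rule_matches(rule: str, path: str) -> bool:
    # '*' matches any character sequence; a trailing '$' anchors the end
    pattern = re.escape(rule).replace(r"\*", ".*")
    if pattern.endswith(r"\$"):
        pattern = pattern[:-2] + "$"
    return re.match(pattern, path) is not None

print(rule_matches("/*?sort=", "/products?sort=price"))  # True
print(rule_matches("/*?sort=", "/products"))             # False

One caveat: because a URL contains only one "?", a pattern like /*?utm_ matches only when a utm parameter comes first in the query string; /*?*utm_ catches it in any position.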

Conserve Crawl Budget

# Block low-value pages
User-agent: *
Disallow: /search/
Disallow: /cart/
Disallow: /checkout/
Disallow: /account/
Disallow: /print/

Multiple Sitemaps

Sitemap: https://yoursite.com/sitemap-posts.xml
Sitemap: https://yoursite.com/sitemap-pages.xml
Sitemap: https://yoursite.com/sitemap-images.xml
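Sitemap lines are standalone directives: they don't belong to any User-agent group, so crawlers scan the whole file for them. On Python 3.8+, urllib.robotparser can collect them for you (a minimal sketch; yoursite.com is a placeholder):

from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.set_url("https://yoursite.com/robots.txt")  # placeholder URL
rp.read()
print(rp.site_maps())  # list of sitemap URLs, or None if the file has none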

Bot-Specific Rules

# Allow major search engines
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Crawl-delay: 2
Allow: /

# Block resource-heavy crawlers
User-agent: AhrefsBot
Disallow: /

User-agent: MJ12bot
Disallow: /
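A crawler obeys the group whose User-agent line best matches its name, falling back to the * group if none does. You can sanity-check per-bot rules offline by feeding the file's lines to urllib.robotparser (a minimal sketch):

from urllib.robotparser import RobotFileParser

rules = """\
User-agent: Googlebot
Allow: /

User-agent: AhrefsBot
Disallow: /
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)
print(rp.can_fetch("AhrefsBot", "/pricing"))  # False: blocked site-wide
print(rp.can_fetch("Googlebot", "/pricing"))  # True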

Smart API Protection

# Allow only specific endpoints for rich snippets
User-agent: *
Allow: /api/schema/
Disallow: /api/
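This pattern works because major engines such as Google apply the most specific (longest) matching rule, with ties going to Allow, so /api/schema/ outranks the broader /api/ block. A simplified Python model of that precedence (illustrative only; it ignores wildcards):

def is_allowed(path: str, allows: list[str], disallows: list[str]) -> bool:
    # Longest matching rule wins; ties go to Allow; no match means allowed
    best_allow = max((len(r) for r in allows if path.startswith(r)), default=-1)
    best_block = max((len(r) for r in disallows if path.startswith(r)), default=-1)
    return best_allow >= best_block

print(is_allowed("/api/schema/product", ["/api/schema/"], ["/api/"]))  # True
print(is_allowed("/api/users", ["/api/schema/"], ["/api/"]))           # False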

Performance Optimization

# Faster crawling for major search engines
User-agent: Googlebot
Crawl-delay: 1

# Slower for everything else
User-agent: *
Crawl-delay: 30

Note: Googlebot ignores the Crawl-delay directive entirely, so the first group has no effect on Google; the setting only matters to crawlers that honor it, such as Bingbot.
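On the consuming side, a well-behaved crawler reads the delay that applies to its own user agent and sleeps between requests. urllib.robotparser exposes this directly (a minimal sketch; MyBot and the URLs are placeholders):

import time
from urllib.request import urlopen
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.set_url("https://yoursite.com/robots.txt")  # placeholder URL
rp.read()

delay = rp.crawl_delay("MyBot") or 0  # None when no Crawl-delay applies

for url in ("https://yoursite.com/a", "https://yoursite.com/b"):
    if rp.can_fetch("MyBot", url):
        urlopen(url)       # fetch the page
        time.sleep(delay)  # wait before the next request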
