What is robots.txt?
The robots.txt file is a simple text file located in the root directory of a website (e.g., yourdomain.com/robots.txt). It tells search engine crawlers (like Googlebot) which pages or sections of the site they are allowed or disallowed to crawl.
Why is robots.txt Necessary?
- Controls Crawler Access
  - Prevents search engines from crawling sensitive pages (e.g., admin panels, login pages, staging sites).
  - Example:

        User-agent: *
        Disallow: /admin/
        Disallow: /private-files/

    This blocks all bots from crawling /admin/ and /private-files/.
- Avoids Duplicate Content Issues
  - Stops Google from indexing multiple versions of the same page (e.g., printer-friendly pages, internal search results); see the sketch after this list.
- Saves Crawl Budget
  - For large sites, robots.txt helps prioritize which pages should be crawled first (important for SEO).
- Prevents Unwanted Indexing
  - Keeps low-value pages (e.g., thank-you pages, internal search results) out of search results; also covered in the sketch below.
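As a minimal sketch of how the duplicate-content and low-value pages above might be blocked (the paths /print/, /search/, and /thank-you/ are hypothetical placeholders; substitute the URL patterns your site actually uses):

    User-agent: *
    # Hypothetical printer-friendly copies of existing pages
    Disallow: /print/
    # Hypothetical internal search result pages
    Disallow: /search/
    # Hypothetical post-purchase confirmation pages
    Disallow: /thank-you/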
Common robots.txt Rules
| Rule | Purpose |
|---|---|
| User-agent: * | Applies to all search bots (Googlebot, Bingbot, etc.) |
| Disallow: /folder/ | Blocks crawling of a specific directory |
| Allow: /public-page.html | Overrides a Disallow rule for a specific page |
| Sitemap: https://example.com/sitemap.xml | Tells crawlers where to find your sitemap |
Example:

    User-agent: *
    Disallow: /temp/
    Disallow: /wp-admin/
    Allow: /public-blog/
    Sitemap: https://example.com/sitemap.xml

(Blocks /temp/ and /wp-admin/ but allows /public-blog/.)
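To see the Allow override from the table above in action, here is a small hypothetical variation: everything under /private/ is blocked except a single page (both paths are placeholders):

    User-agent: *
    # Block the whole directory...
    Disallow: /private/
    # ...but re-allow one specific page inside it
    Allow: /private/annual-report.html

Googlebot resolves such conflicts by the most specific (longest) matching rule, so the Allow wins for that one URL; other crawlers may handle conflicts differently.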
What Happens if You Don’t Have a robots.txt?
- Search engines will crawl everything (including private or duplicate pages).
- Wastes crawl budget on unimportant pages.
- Risk of sensitive pages appearing in Google search.
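If you would rather keep an explicit file than none at all, the minimal "allow everything" form below is equivalent to having no robots.txt (an empty Disallow value disallows nothing):

    User-agent: *
    Disallow: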
How to Check Your robots.txt?
- Visit yourdomain.com/robots.txt in a browser.
- Use Google Search Console’s robots.txt Tester (under “Old Tools” > “Robots.txt Tester”).
⚠️ Warning: A misconfigured robots.txt can accidentally block Google from crawling your entire site! Always test changes.
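For reference, the most damaging misconfiguration is a blanket Disallow of the site root, which tells every crawler to stay away from the entire site:

    User-agent: *
    # Blocks crawling of every URL on the site
    Disallow: /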
Final Tips
- Do NOT block CSS/JS files (Google needs to crawl them to render and understand your site); see the example below for keeping assets crawlable.
- Never use robots.txt to hide private data (use password protection instead); disallowed URLs can still appear in search results if other pages link to them.
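As one illustration of keeping required assets crawlable, a common WordPress-style pattern blocks the admin area while re-allowing the AJAX endpoint that front-end scripts depend on (adjust or omit this if your site is not on WordPress):

    User-agent: *
    Disallow: /wp-admin/
    # Re-allow the endpoint front-end scripts call
    Allow: /wp-admin/admin-ajax.php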