1. What Are Wildcard Rules?
Robots.txt supports two pattern-matching characters (recognized by Google, Bing, and most major crawlers):
* (matches any sequence of characters, including none)
$ (anchors the rule to the end of the URL)
Examples:
Disallow: /search*   # any URL whose path starts with /search
Disallow: /*.pdf$    # any URL that ends in .pdf
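To make the matching rules concrete, here is a minimal Python sketch, not how any real crawler is implemented, that translates a robots.txt path pattern into a regular expression. The helper name robots_pattern_to_regex is made up for illustration, and real evaluation also involves choosing the right user-agent group and longest-match precedence between Allow and Disallow rules, which this ignores.

```python
import re

def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any sequence of characters (including none); a trailing
    '$' anchors the pattern to the end of the URL. Everything else is a
    literal prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape regex metacharacters, then turn the escaped '*' back into '.*'
    body = re.escape(pattern).replace(r"\*", ".*")
    return re.compile("^" + body + ("$" if anchored else ""))

# Quick checks against the examples above (expected results in comments)
print(bool(robots_pattern_to_regex("/search*").match("/search?q=shoes")))      # True
print(bool(robots_pattern_to_regex("/*.pdf$").match("/docs/report.pdf")))      # True
print(bool(robots_pattern_to_regex("/*.pdf$").match("/docs/report.pdf?v=2")))  # False
```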
2. When to Use Wildcards
✅ To Block Parameter Variants
Disallow: /*?session=
Disallow: /*?ref=
Useful when session or tracking parameters spawn duplicate URLs that waste crawl budget (for example, /shoes?ref=newsletter duplicating /shoes).
✅ To Block File Types
Disallow: /*.pdf$
Disallow: /*.docx$
Prevents crawlers from fetching non-HTML assets that don't need to appear in search results. Note the effect of the $ anchor: /report.pdf is blocked, but /report.pdf?download=1 is not, because that URL doesn't end in .pdf.
✅ To Block All Variations of a Path
Disallow: /tag*
Disallow: /filter/*
Stops crawlers from accessing entire sections of dynamic or thin content. Keep in mind that robots.txt rules are prefix matches, so the trailing * is redundant: /tag* behaves exactly like /tag, which also catches /tags/ and /tag-archive/.
✅ To Protect Infinite URL Spaces
If user-generated content or calendar pages cause near-infinite URL generation:
Disallow: /calendar/*
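Taken together, rules like the ones above typically sit inside a user-agent group. A rough illustration; the paths and sitemap URL are placeholders, not recommendations for any particular site:

```
User-agent: *
Disallow: /*?session=
Disallow: /*?ref=
Disallow: /*.pdf$
Disallow: /filter/
Disallow: /calendar/

Sitemap: https://www.example.com/sitemap.xml
```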
3. When Not to Use Wildcards
❌ To Block URLs You Actually Want Indexed
A broad pattern like:
Disallow: /product*
…blocks every URL whose path begins with /product, including /products/, /product-reviews/, and the individual product pages you want in search results.
❌ As a Substitute for Canonical or Noindex
Robots.txt blocks crawling, not indexing. Google can still index a blocked URL if other sites link to it; it just can't read the content, so the listing typically shows little more than the bare URL.
❌ To Block Critical JS or CSS
Blocking:
Disallow: /assets/*
…can prevent Google from rendering pages correctly, which affects how those pages are evaluated. Keep the JS and CSS needed for rendering crawlable; if you must restrict an assets directory, carve the essentials back out, as in the sketch below.
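One common workaround, sketched here with an assumed /assets/ directory, is to pair the broad Disallow with more specific Allow rules; Google resolves conflicts in favor of the most specific (longest) matching rule, so the Allow rules win for CSS and JS files:

```
Disallow: /assets/
Allow: /assets/*.css
Allow: /assets/*.js
```

Omitting the trailing $ on the Allow rules keeps versioned asset URLs such as /assets/app.css?v=3 crawlable as well.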
4. Best Practices
- Test every pattern with a robots.txt testing tool (or Search Console's robots.txt report) before deploying.
- Use wildcards sparingly and with clear intent.
- Monitor blocked URLs in Search Console → Pages → Blocked by robots.txt.
- Combine with a noindex meta tag when pages must be kept out of the index entirely (see the example below).
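For reference, the standard page-level noindex directive looks like this; remember that Google must be able to crawl the page to see the tag, so don't also block that URL in robots.txt:

```html
<!-- In the <head> of the page that should stay out of the index -->
<meta name="robots" content="noindex">
```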
Conclusion
Wildcard rules in robots.txt offer precision, but also risk. Use them to reduce crawl waste and to wall off thin or duplicate areas, and never rely on them alone to control indexation. One misplaced asterisk can cut an entire section of your site off from crawling, so test every pattern before deploying.