Robots.txt Wildcard – Cheatsheet

2025-07-13 00:11

Supported Wildcards

1. * – Matches any number of characters

Disallow: /private/*

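➡ Blocks all URLs whose path starts with /private/ (e.g., /private/data, /private/files/image.png). Rules are prefix matches, so the trailing * is redundant: Disallow: /private/ has the same effect.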

2. $ – Anchors the pattern to the end of the URL

Disallow: /*.pdf$

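➡ Blocks every URL that ends in .pdf, in any directory. A URL such as /file.pdf?download=1 is not blocked, because $ requires the URL to end right after .pdf.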
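The behavior of both wildcards can be sketched in Python. This is a minimal sketch of Google-style matching, assuming the URL path and query string are passed as one string; the function name rule_matches is hypothetical, and real crawlers may differ in details such as percent-encoding.

```python
import re

def rule_matches(pattern: str, url: str) -> bool:
    """Return True if a robots.txt path pattern matches the URL path (plus query)."""
    anchored = pattern.endswith("$")  # a trailing $ pins the match to the end of the URL
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal text and turn each * into "any run of characters".
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        return re.fullmatch(regex, url) is not None
    return re.match(regex, url) is not None  # unanchored rules are prefix matches

print(rule_matches("/private/*", "/private/files/image.png"))  # True
print(rule_matches("/private/", "/private/files/image.png"))   # True: trailing * is redundant
print(rule_matches("/*.pdf$", "/docs/report.pdf"))              # True
print(rule_matches("/*.pdf$", "/docs/report.pdf?download=1"))   # False: URL does not end in .pdf
```

Translating the pattern to a regular expression keeps the sketch short; the $ is only treated as an anchor when it is the last character of the pattern.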

❌ Not Supported by All Bots

While Googlebot and Bingbot support * and $, other bots might not:

| Bot Name | Wildcard Support | Notes |
| --- | --- | --- |
| DuckDuckBot | Partial/Unknown | No official wildcard documentation. Likely follows standard rules only. |
| YandexBot | ✅ Yes | Yandex's documentation describes both * and $. |
| Baiduspider | ✅ Yes (per Baidu docs) | Baidu's webmaster documentation describes * and $; confirm actual behavior in your server logs. |
| Sogou Spider | ❌ No documented wildcard support | May ignore advanced rules. Known for aggressive crawling. |
| AhrefsBot / SemrushBot | ❌ No clear support | Respect Disallow directives but typically do not interpret wildcards. |
| MJ12bot (Majestic) | ❌ No wildcard support | Follows basic syntax only. |
| Applebot | ✅ Partial | Supports basic patterns, but $ may not be recognized. |
| archive.org_bot | ❌ No wildcard support | Does not support *; only respects plain Disallow paths. |

🧠 Common Use Cases

Case 1: Block tracking parameters

Disallow: /*?ref=

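➡ Blocks any URL containing ?ref=, such as /page?ref=affiliate or /page?ref=123&other=456. Rules are prefix matches, so everything after ?ref= is matched implicitly.

Equivalent, with an explicit trailing wildcard:

Disallow: /*?ref=*

➡ Matches the same URLs, regardless of the parameter's value. Neither form blocks /page?other=456&ref=123, where ref is not the first parameter.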
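Because rules are prefix matches, /*?ref= and /*?ref=* match the same URLs. The sketch below checks this with a hypothetical rule_matches helper that approximates Google-style matching; the extra /*&ref= rule is an assumption added here to cover ref when it is not the first parameter.

```python
import re

def rule_matches(pattern: str, url: str) -> bool:
    """Approximate Google-style matching: * is any run of characters, trailing $ anchors the end."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        return re.fullmatch(regex, url) is not None
    return re.match(regex, url) is not None

urls = ["/page?ref=affiliate", "/page?ref=123&other=456", "/page?other=456&ref=123"]
for url in urls:
    # The first two columns always agree: the trailing * adds nothing.
    print(url,
          rule_matches("/*?ref=", url),
          rule_matches("/*?ref=*", url),
          rule_matches("/*&ref=", url))  # hypothetical extra rule for later parameters
```

Only the last URL is missed by both ?ref= forms, since its ref parameter follows & rather than ?.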

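Case 2: Prevent crawling of file types

Disallow: /*.zip$

Disallow: /*.exe$

➡ Stops compliant bots from crawling .zip and .exe downloads. Note that robots.txt controls crawling, not indexing: a blocked URL can still appear in search results if other pages link to it.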

Case 3: Block specific folders

Disallow: /temp/

Disallow: /dev/*

➡ Blocks everything inside /temp/ and /dev/, including subfolders. The trailing * in /dev/* is redundant, since both rules are prefix matches.

Case 4: Allow certain paths while disallowing broader ones

Disallow: /images/

Allow: /images/public/

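➡ Blocks everything under /images/ except /images/public/ (e.g., /images/public/logo.png stays crawlable). Crawlers that follow RFC 9309, including Googlebot, apply the longest matching rule, so the more specific Allow wins.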
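How an Allow rule overrides a broader Disallow can be sketched as a longest-match lookup. This is a minimal sketch assuming RFC 9309-style precedence (longest pattern wins, Allow wins ties); rule_matches and is_allowed are hypothetical helper names, and bots that process rules in a different order may behave differently.

```python
import re

def rule_matches(pattern: str, url: str) -> bool:
    """Approximate Google-style matching: * is any run of characters, trailing $ anchors the end."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        return re.fullmatch(regex, url) is not None
    return re.match(regex, url) is not None

def is_allowed(rules: list[tuple[str, str]], url: str) -> bool:
    """Pick the longest matching rule; on a tie, Allow wins. No match means allowed."""
    best = None  # (pattern length, rule is an Allow)
    for directive, pattern in rules:
        if rule_matches(pattern, url):
            candidate = (len(pattern), directive.lower() == "allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]

rules = [("Disallow", "/images/"), ("Allow", "/images/public/")]
print(is_allowed(rules, "/images/public/logo.png"))  # True: the longer Allow wins
print(is_allowed(rules, "/images/raw/photo.png"))    # False: only the Disallow matches
print(is_allowed(rules, "/about"))                   # True: no rule matches
```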

Case 5: Catch all dynamic URLs

Disallow: /*?*

➡ Blocks every URL that contains a ?, meaning any URL with a query string. The shorter Disallow: /*? is equivalent.

Edge Cases to Watch

❌ Missing wildcard before the query:

Disallow: /?ref=*

This only matches the root URL with a ref parameter (e.g., /?ref=abc); it does not match /page?ref=abc. To match the parameter on any path, start the pattern with /*:

Disallow: /*?ref=*

🛠️ Testing & Tools

Always test your rules with a robots.txt tester before deploying them.
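For a quick local check before reaching for an online tester, a table of expected results can be run against a small matcher. This is a rough sketch, not a replacement for a crawler's own tester: rule_matches and find_mismatches are hypothetical names, the matcher only approximates Google-style wildcard rules, and the sample URLs are made up for illustration.

```python
import re

def rule_matches(pattern: str, url: str) -> bool:
    """Approximate Google-style matching: * is any run of characters, trailing $ anchors the end."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        return re.fullmatch(regex, url) is not None
    return re.match(regex, url) is not None

def find_mismatches(cases):
    """Return every (pattern, url, expected) case whose actual result differs from expected."""
    return [case for case in cases if rule_matches(case[0], case[1]) != case[2]]

# (pattern, sample URL, should the rule match?)
cases = [
    ("/*?*", "/search?q=robots", True),
    ("/*?*", "/search", False),
    ("/*?utm_*", "/page?utm_source=news", True),
    ("/*.zip$", "/files/archive.zip", True),
    ("/*.zip$", "/files/archive.zip.txt", False),
    ("/?ref=*", "/page?ref=abc", False),
]

mismatches = find_mismatches(cases)
print("all cases pass" if not mismatches else f"mismatches: {mismatches}")
```

Keeping the cases in a plain list makes it easy to add a URL each time a rule surprises you in production.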

 

✅ Recap Cheatsheet

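| Pattern | Matches | Use Case |
| --- | --- | --- |
| * | Any sequence of characters | Wildcards in path or query |
| $ | End of URL | Block file extensions |
| /*?* | Any URL with a query string | Block dynamic URLs |
| /*?utm_* | URLs whose first query parameter starts with utm_ | Block marketing parameters |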
