Should Allow Come Before Disallow in robots.txt? RFC and Compatibility Best Practices
`robots.txt` controls web crawling. Have you ever been told that “`Allow` directives should be written before comprehensive `Disallow` directives”? While this practice is treated as “common sense” in many workplaces, do you truly understand its technical background?
To get straight to the point: this advice is “incorrect if you only consider crawlers that comply with the latest RFC,” but “correct from the perspective of maximizing compatibility with all crawlers.”

This article explores the interpretation rules of `robots.txt` behind this seemingly contradictory conclusion and presents best practices for real-world implementation.
Currently, major search engines such as Google and Bing follow the `robots.txt` specification standardized as RFC 9309 (Robots Exclusion Protocol). In this specification, the priority of rule application is not determined by the order of directives. Instead, it follows these rules:

- The crawler evaluates all `Allow` and `Disallow` directives within the groups matching its User-agent.
- Among the matching rules, the most specific one, that is, the one with the longest matching path, is applied.
- When an `Allow` rule and a `Disallow` rule are equally specific, `Allow` takes precedence over `Disallow`.

For example, the following two `robots.txt` files have exactly the same meaning for RFC 9309-compliant crawlers:
```
# Pattern A: Disallow first
User-agent: *
Disallow: /
Allow: /assets/
```

```
# Pattern B: Allow first
User-agent: *
Allow: /assets/
Disallow: /
```
In both patterns, when evaluating the URL `/assets/styles.css`, `Allow: /assets/` (path length: 8) is more specific than `Disallow: /` (path length: 1), so crawling under the `/assets/` directory is allowed.
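To make this evaluation logic concrete, here is a minimal Python sketch of the specificity rule: the longest matching path wins, and `Allow` beats `Disallow` on a tie. It deliberately ignores wildcards, `$` anchors, and percent-encoding, so treat it as an illustration of the precedence model rather than a full RFC 9309 parser.

```python
# Minimal sketch of RFC 9309-style precedence: the longest matching
# path wins, and on a tie Allow takes priority over Disallow.
# Wildcards, "$" anchors, and percent-encoding are intentionally omitted.

def is_allowed(rules: list[tuple[str, str]], url_path: str) -> bool:
    best_length = -1
    best_allow = True  # no matching rule means crawling is allowed
    for directive, path in rules:
        if not url_path.startswith(path):
            continue
        allow = directive.lower() == "allow"
        # Longer path = more specific; on equal length, Allow wins.
        if len(path) > best_length or (len(path) == best_length and allow):
            best_length = len(path)
            best_allow = allow
    return best_allow

pattern_a = [("Disallow", "/"), ("Allow", "/assets/")]  # Disallow first
pattern_b = [("Allow", "/assets/"), ("Disallow", "/")]  # Allow first

print(is_allowed(pattern_a, "/assets/styles.css"))  # True
print(is_allowed(pattern_b, "/assets/styles.css"))  # True
print(is_allowed(pattern_a, "/private/page.html"))  # False
```

Running it shows that Pattern A and Pattern B produce identical results, which is exactly why order doesn’t matter for specificity-based crawlers.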
Google’s official documentation also explains this specificity-based evaluation logic without mentioning line order. In other words, for major search engines, the order of `Allow` and `Disallow` doesn’t affect crawling results.
The background of this convention lies in the history of `robots.txt` and the diversity of existing crawlers.

Historically, there have been two main approaches to interpreting `robots.txt` rules:
- **Specificity rule**: As described above, this is the current standard adopted by Google and others. Priority is determined by path length.
- **First-match rule**: An approach used by older or simply implemented custom crawlers. They read the file from top to bottom, apply the first matching rule, and stop evaluating.
For crawlers using the “first match” rule, order is critically important. Let’s look at Pattern A again:
```
# Pattern A: Disallow first
User-agent: *
Disallow: /       # ← Everything matches here
Allow: /assets/   # ← This line is never evaluated
```
When such a crawler evaluates `/assets/styles.css`, it concludes “crawling not allowed” as soon as it matches the first rule, `Disallow: /`, and the subsequent `Allow: /assets/` is never consulted. As a result, crawling of the entire site is unintentionally denied.
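For contrast, here is an equally minimal sketch of first-match evaluation under the same simplifying assumptions. With Pattern A, the catch-all `Disallow: /` matches first and the later `Allow: /assets/` never gets a chance.

```python
# Minimal sketch of first-match evaluation: rules are checked from top
# to bottom and the first one whose path matches decides the result.

def is_allowed_first_match(rules: list[tuple[str, str]], url_path: str) -> bool:
    for directive, path in rules:
        if url_path.startswith(path):
            return directive.lower() == "allow"
    return True  # no matching rule means crawling is allowed

pattern_a = [("Disallow", "/"), ("Allow", "/assets/")]  # Disallow first
pattern_b = [("Allow", "/assets/"), ("Disallow", "/")]  # Allow first

print(is_allowed_first_match(pattern_a, "/assets/styles.css"))  # False: blocked
print(is_allowed_first_match(pattern_b, "/assets/styles.css"))  # True: allowed
```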
To avoid such tragedies, the practice of “writing exceptional permissions (`Allow`) first and comprehensive denials (`Disallow`) later” emerged and spread as a defensive writing style that ensures the intended behavior across all crawlers.
Based on this background, our action plan is clear:
| Target | Order Impact | Recommended Writing Style |
|---|---|---|
| Major search engines like Google/Bing | No impact | Either order is fine |
| Old crawlers or unknown bots | High possibility of impact | `Allow` → `Disallow` order is safer |
| Code readability/maintainability | Affects human readability | `Allow` → `Disallow` (“exceptions first, general rules later”) is more intuitive |
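If you want to sanity-check a concrete `robots.txt`, one convenient option is Python’s standard `urllib.robotparser` module. Keep in mind that this is just one particular parser implementation and its interpretation isn’t guaranteed to match RFC 9309 in every corner case, so treat the output as a single data point rather than the canonical answer. The snippet below uses the safe `Allow`-first ordering, where both interpretation styles agree.

```python
from urllib.robotparser import RobotFileParser

# The "safe" Allow-first ordering: first-match and longest-match
# interpretations agree on the result for these URLs.
robots_txt = """\
User-agent: *
Allow: /assets/
Disallow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

print(parser.can_fetch("*", "https://example.com/assets/styles.css"))  # expect True
print(parser.can_fetch("*", "https://example.com/private/page.html"))  # expect False
```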
Let me summarize the key points about `Allow`/`Disallow` order in `robots.txt`:

- For RFC 9309-compliant crawlers such as those of Google and Bing, order has no effect: the most specific (longest) matching path wins, and `Allow` beats `Disallow` on ties.
- Older or simply implemented crawlers may apply the first matching rule, so for them order can change the result.
- For maximum compatibility and readability, writing “specific exceptions (`Allow`) first, general rules (`Disallow`) later” makes sense.

Therefore, while the claim that “it won’t work unless you change the order” may be technically inaccurate for modern crawlers, it’s extremely valuable advice for ensuring maximum compatibility and preventing unintended behavior. Unless there’s a specific reason not to, following this safe practice is recommended.
That’s all from the field, where I’ve relearned the `robots.txt` specification.