[Robots.txt] Incorrect robots.txt result for uppercase user agents

Example using the forbes.com robots.txt:

https://www.forbes.com/robots.txt

It blocks all paths for `GPTBot`:

```
User-agent: GPTBot
Disallow: /
```

However, for the URL `https://www.forbes.com/test`, the following method

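```java
import crawlercommons.robots.BaseRobotRules;
import crawlercommons.robots.SimpleRobotRulesParser;
import java.net.MalformedURLException;
import java.nio.charset.StandardCharsets;
import java.util.Collections;

// Parses the given robots.txt body for a single user agent and checks one URL.
public boolean canCrawl(String url, String userAgent, String robotsBody)
        throws MalformedURLException {
    SimpleRobotRulesParser robotParser = new SimpleRobotRulesParser();
    robotParser.setExactUserAgentMatching(false);
    BaseRobotRules robotRules =
            robotParser.parseContent(
                    "https://www.forbes.com/robots.txt",
                    robotsBody.getBytes(StandardCharsets.UTF_8),
                    "text/plain",
                    Collections.singletonList(userAgent));
    return robotRules.isAllowed(url);
}
```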

returns `true` when called as:

```java
boolean canCrawl = canCrawl("https://www.forbes.com/test", "GPTBot", "<robots body>");
```

I checked the same input with https://github.com/samclarke/robots-parser and https://github.com/google/robotstxt. Both return `false`, which seems correct.

If the user agent in robots.txt were written as `gptbot`, the method would return `false` for both the `GPTBot` and `gptbot` user agents. There seems to be a case-sensitive check somewhere in the code base.
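
To make the expected behaviour concrete, here is a minimal, self-contained sketch of case-insensitive user-agent matching, which is what RFC 9309 asks crawlers to do when selecting a group. It is not the crawler-commons implementation: the class `CaseInsensitiveRobotsSketch` and the method `isFullyDisallowed` are hypothetical names, and the sketch only recognises a plain `Disallow: /` rule.

```java
import java.util.Locale;

// Hypothetical illustration only; not part of crawler-commons.
class CaseInsensitiveRobotsSketch {

    /**
     * Returns true if a group whose User-agent line equals the given agent
     * (ignoring case) contains the rule "Disallow: /".
     * Assumption: no wildcards, Allow rules, or trailing comments are handled.
     */
    public static boolean isFullyDisallowed(String robotsBody, String userAgent) {
        String wanted = userAgent.toLowerCase(Locale.ROOT);
        boolean inMatchingGroup = false;
        boolean lastLineWasUserAgent = false;

        for (String rawLine : robotsBody.split("\\R")) {
            String line = rawLine.trim();
            int colon = line.indexOf(':');
            if (line.isEmpty() || line.startsWith("#") || colon < 0) {
                continue; // skip blank lines, comments, and malformed lines
            }
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();

            if (field.equals("user-agent")) {
                // Lower-case both sides so "GPTBot" and "gptbot" compare equal.
                boolean matches = value.toLowerCase(Locale.ROOT).equals(wanted);
                // Consecutive User-agent lines share one group.
                inMatchingGroup = lastLineWasUserAgent ? (inMatchingGroup || matches) : matches;
                lastLineWasUserAgent = true;
            } else {
                lastLineWasUserAgent = false;
                if (inMatchingGroup && field.equals("disallow") && value.equals("/")) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String body = "User-agent: GPTBot\nDisallow: /\n";
        System.out.println(isFullyDisallowed(body, "GPTBot"));   // true
        System.out.println(isFullyDisallowed(body, "gptbot"));   // true
        System.out.println(isFullyDisallowed(body, "OtherBot")); // false
    }
}
```

With this kind of matching, the casing of the user agent in either robots.txt or the caller's argument does not change the result, which matches what the two other parsers report.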

[Robots.txt] Incorrect robots txt result for uppercase user agents #453
