Bug report
Bug description:
Parsing a robots.txt file fails entirely if a single line in the file cannot be parsed. I don't think this is valid behavior. Ideally, unparsable or invalid lines should be skipped. The norobots-rfc says the same:
Implementors should pay particular attention to the robustness in parsing of the /robots.txt file.
  File "/usr/local/lib/python3.11/urllib/robotparser.py", line 123, in parse
    entry.rulelines.append(RuleLine(line[1], False))
                           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/urllib/robotparser.py", line 222, in __init__
    path = urllib.parse.urlunparse(urllib.parse.urlparse(path))
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/urllib/parse.py", line 395, in urlparse
    splitresult = urlsplit(url, scheme, allow_fragments)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/urllib/parse.py", line 500, in urlsplit
    _check_bracketed_host(bracketed_host)
  File "/usr/local/lib/python3.11/urllib/parse.py", line 446, in _check_bracketed_host
    ip = ipaddress.ip_address(hostname) # Throws Value Error if not IPv6 or IPv4
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/ipaddress.py", line 54, in ip_address
    raise ValueError(f'{address!r} does not appear to be an IPv4 or IPv6 address')
ValueError: '[routes.productDetail(product.sku, product.slug)' does not appear to be an IPv4 or IPv6 address
I know [routes.productDetail(product.sku, product.slug) is clearly not a valid URL, but I don't think the whole parsing should error out because of this one line.
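Until the parser itself skips such lines, one possible workaround is to filter the file before handing it to RobotFileParser.parse(). The sketch below is only an illustration under stated assumptions: drop_unparsable_rules is a hypothetical helper name, not part of urllib, and the sample Disallow line is a guess at input that triggers the same bracketed-host check, since the actual robots.txt is not included in this report.

```python
import urllib.parse
import urllib.robotparser


def drop_unparsable_rules(lines):
    """Hypothetical helper: keep every line except Allow/Disallow rules
    whose value makes urllib.parse.urlparse raise ValueError."""
    kept = []
    for line in lines:
        # Same pre-processing the parser applies: strip comments, then
        # split the remainder on the first colon.
        body = line.split("#", 1)[0].strip()
        field, sep, value = body.partition(":")
        if sep and field.strip().lower() in ("allow", "disallow"):
            try:
                urllib.parse.urlparse(urllib.parse.unquote(value.strip()))
            except ValueError:
                continue  # skip only the offending rule
        kept.append(line)
    return kept


# Assumed sample input; the second Disallow line is an unrendered template
# placeholder, similar in spirit to the one from the traceback.
robots_txt = """\
User-agent: *
Disallow: /private/
Disallow: //[[routes.productDetail(product.sku, product.slug)]]
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(drop_unparsable_rules(robots_txt.splitlines()))

# The valid rule still applies; the broken one no longer aborts parsing.
blocked = parser.can_fetch("*", "https://example.com/private/page")
allowed = parser.can_fetch("*", "https://example.com/public/page")
print(blocked, allowed)
```

Because the filter calls the same urlparse function that fails in the traceback, it should only drop lines that would otherwise have raised, and it leaves the rest of the file untouched.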
CPython versions tested on:
3.11
Operating systems tested on:
Linux
Linked PRs
urllib.robotparser#113231