
quote_from_bytes uses a lot of memory for larger bytestrings #95865

Description

@iforapsy

Bug report

When passed a bytestring larger than about a hundred mebibytes (MiB), the urllib.parse.quote_from_bytes function uses far more memory and CPU time than one would expect.

repro.py:

#!/usr/bin/env python3

import base64
from time import perf_counter
from urllib.parse import quote_from_bytes

MIB = 1024 ** 2


def main():
    bytes_ = base64.b64encode(100 * MIB * b'\x00')  # note 1
    start = perf_counter()
    quoted = quote_from_bytes(bytes_)
    stop = perf_counter()

    print(f"Quoting {len(bytes_)/1024**2:.3f} MiB took {stop-start} seconds")


if __name__ == '__main__':
    main()

I use /usr/bin/time to track how much CPU and memory is used.

$ /usr/bin/time -v ./repro.py
Quoting 133.333 MiB took 7.290915511985077 seconds
        Command being timed: "./repro.py"
        User time (seconds): 7.12
        System time (seconds): 0.68
        Percent of CPU this job got: 99%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:07.82
        ...
        Maximum resident set size (kbytes): 1374872
        ...
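/usr/bin/time reports the peak resident set size of the whole process, including the input bytestring and the interpreter itself. A minimal in-process cross-check, sketched here under the assumption that only Python-level allocations are of interest, uses the standard-library tracemalloc module to record the peak memory allocated while quote_from_bytes runs. The helper measure_peak and the 1 MiB input size are illustrative choices, not part of the original report.

```python
import base64
import tracemalloc
from urllib.parse import quote_from_bytes

MIB = 1024 ** 2


def measure_peak(data):
    """Quote `data` and return (quoted_string, peak_traced_bytes).

    Hypothetical helper for illustration. The input is allocated before
    tracing starts, so the peak covers only allocations made during the
    call. tracemalloc sees Python allocator traffic, not the full RSS.
    """
    tracemalloc.start()
    try:
        result = quote_from_bytes(data)
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


if __name__ == '__main__':
    # Small input so the sketch runs quickly; raise the size to reproduce the report.
    data = base64.b64encode(1 * MIB * b'\x00')
    quoted, peak = measure_peak(data)
    print(f"Input {len(data) / MIB:.3f} MiB, peak traced {peak / MIB:.1f} MiB")
```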

At its peak, the process needs about ten times the size of the bytestring being quoted (1.31 GiB maximum resident set size). The call also takes several seconds to return, whereas I expect it to finish in under a second. Fortunately, there is no memory leak: the interpreter releases the memory after the function returns.

Interestingly, if I reduce 100 to 90 in the line marked "note 1", the function returns in half a second and uses only 250 MiB, which is much more in line with my pre-bug expectations.
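One possible explanation for the 90 vs 100 MiB difference, offered as an assumption rather than something verified in this report: CPython's quote_from_bytes appears to return the input decoded as-is when every byte is already safe, and base64 output contains '=' padding (not safe by default) only when the input length is not a multiple of 3. The sketch below checks that arithmetic for both sizes; padding_chars is a hypothetical helper.

```python
MIB = 1024 ** 2


def padding_chars(n_bytes):
    """Number of '=' characters base64 appends for an n_bytes-long input.

    Hypothetical helper: each 3-byte group becomes 4 output characters, and a
    trailing group of 1 or 2 bytes is padded with 2 or 1 '=' respectively.
    """
    return (3 - n_bytes % 3) % 3


for size_mib in (90, 100):
    n = size_mib * MIB
    print(f"{size_mib} MiB: remainder {n % 3}, padding {padding_chars(n)} '='")
```

This prints a padding count of 0 for 90 MiB and 2 for 100 MiB, so only the 100 MiB case produces a byte that has to be percent-encoded.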

This memory consumption affects boto3, the AWS SDK for Python, because many AWS APIs are called with URL-encoded parameters. boto3/botocore calls urllib.parse.urlencode to do that encoding, which ends up calling the problematic quote_from_bytes. Sample stack trace:

  File "/usr/local/lib/python3.8/dist-packages/botocore/client.py", line 508, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/local/lib/python3.8/dist-packages/botocore/client.py", line 898, in _make_api_call
    http, parsed_response = self._make_request(
  File "/usr/local/lib/python3.8/dist-packages/botocore/client.py", line 921, in _make_request
    return self._endpoint.make_request(operation_model, request_dict)
  File "/usr/local/lib/python3.8/dist-packages/botocore/endpoint.py", line 119, in make_request
    return self._send_request(request_dict, operation_model)
  File "/usr/local/lib/python3.8/dist-packages/botocore/endpoint.py", line 198, in _send_request
    request = self.create_request(request_dict, operation_model)
  File "/usr/local/lib/python3.8/dist-packages/botocore/endpoint.py", line 139, in create_request
    prepared_request = self.prepare_request(request)
  File "/usr/local/lib/python3.8/dist-packages/botocore/endpoint.py", line 150, in prepare_request
    return request.prepare()
  File "/usr/local/lib/python3.8/dist-packages/botocore/awsrequest.py", line 473, in prepare
    return self._request_preparer.prepare(self)
  File "/usr/local/lib/python3.8/dist-packages/botocore/awsrequest.py", line 360, in prepare
    body = self._prepare_body(original)
  File "/usr/local/lib/python3.8/dist-packages/botocore/awsrequest.py", line 416, in _prepare_body
    body = urlencode(params, doseq=True)
  File "/usr/lib/python3.8/urllib/parse.py", line 962, in urlencode
    v = quote_via(v, safe)
  File "/usr/lib/python3.8/urllib/parse.py", line 870, in quote_plus
    return quote(string, safe, encoding, errors)
  File "/usr/lib/python3.8/urllib/parse.py", line 859, in quote
    return quote_from_bytes(string, safe)
  File "/usr/lib/python3.8/urllib/parse.py", line 898, in quote_from_bytes
    return ''.join([quoter(char) for char in bs])
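The last frame plausibly accounts for most of the memory: the list comprehension materializes one string reference per input byte before joining, so a 133 MiB input yields a temporary list of roughly 140 million entries, on the order of a gigabyte of pointers on a 64-bit build. A minimal sketch of a chunked alternative, assuming the goal is only to bound that temporary list, quotes fixed-size slices separately and joins the partial results. quote_in_chunks and CHUNK_SIZE are hypothetical names, and this is not CPython's code.

```python
from urllib.parse import quote_from_bytes

# Hypothetical chunk size; any positive value yields the same output.
CHUNK_SIZE = 64 * 1024


def quote_in_chunks(bs, safe='/', chunk_size=CHUNK_SIZE):
    """Quote `bs` slice by slice so only one chunk's per-byte list exists at a time.

    Percent-encoding maps each byte independently, so quoting slices
    separately and concatenating gives the same string as quoting the
    whole input at once.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    parts = [quote_from_bytes(bs[i:i + chunk_size], safe)
             for i in range(0, len(bs), chunk_size)]
    return ''.join(parts)
```

For example, quote_in_chunks(b'a b=c') returns 'a%20b%3Dc', the same as quote_from_bytes(b'a b=c'). The chunk size trades the size of each temporary list against the number of partial strings held before the final join.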

Your environment

Python 3.8.10 on Ubuntu 20.04 running on a t3.large EC2 instance. I have also been able to reproduce it with Python 3.10.6 and 3.11.0rc1+. I also reproduced it on Windows 10 running Python 3.9.13.

Metadata

Labels: 3.12 (only security fixes), performance (Performance or resource usage)