Unnecessarily large memory footprint #10550
It's a bit unclear from your report how you are using aiohttp, as your reproduction code doesn't include aiohttp. Can you provide a working reproducer that uses aiohttp?
I'm pretty sure I recall this being something we can't do, as it would break too many things, though I could be thinking of something else. If you run the test suite with your changes (or create a draft PR and let the CI run it), it'll likely show you why it won't work. If the tests are still passing, we can evaluate the change properly (and, as mentioned above, we'd want a proper test).
@bdraco admittedly, I'm not using aiohttp directly but rather indirectly through UPath -> fsspec -> s3fs -> aiohttp. Given that fsspec/s3fs are fairly common, I would assume that they are using aiohttp properly, but that may not be the case. I wanted to see if this solution would even be acceptable, so I'll open a provisional PR to see if any tests break; if they don't, you can tell me what the next steps are. Perhaps, as you said, I can also make a repro that doesn't rely on other libraries.
I'm pretty sure the issue is that you are reading the whole file in at once. If you read in chunks, I expect the issue will go away.
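The chunked-read pattern suggested here can be sketched as follows. This is a minimal, self-contained example that spins up a local aiohttp test server instead of hitting S3; the 1 MiB payload and the 64 KiB chunk size are arbitrary stand-ins, and only one chunk is resident at a time on the client side.

```python
# Sketch: consume a response body in fixed-size chunks via iter_chunked()
# instead of resp.read(), so peak client memory stays near the chunk size.
# A local aiohttp server stands in for the real S3 download.
import asyncio
from aiohttp import ClientSession, web

PAYLOAD = b"x" * (1 << 20)  # 1 MiB stand-in for the ~500 MB remote file

async def handler(request: web.Request) -> web.Response:
    return web.Response(body=PAYLOAD)

async def main() -> int:
    app = web.Application()
    app.router.add_get("/", handler)
    runner = web.AppRunner(app)
    await runner.setup()
    site = web.TCPSite(runner, "127.0.0.1", 0)  # port 0: pick a free port
    await site.start()
    host, port = runner.addresses[0][:2]
    try:
        total = 0
        async with ClientSession() as session:
            async with session.get(f"http://{host}:{port}/") as resp:
                # Process each 64 KiB chunk as it arrives; nothing accumulates.
                async for chunk in resp.content.iter_chunked(64 * 1024):
                    total += len(chunk)
        return total
    finally:
        await runner.cleanup()

received = asyncio.run(main())
print(received)
```

The same number of bytes arrives either way; the difference is that the client never holds the whole body (let alone two copies of it) at once.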
I agree that if I were to read the file in chunks, the resulting bloat in memory use would be minimal. That being said, I am writing library code, not application code, and part of the API is to hand the user a stream-like object on which they are allowed to call

I see that the provisional PR I opened created a terrible performance regression. I'll see if there is a way to address the performance so that there is no regression while also avoiding this memory bloat. Performance here is clearly much more important.
I made an iota of progress in that I have a repro that only hits aiobotocore & aiohttp. I also ran a test where I am trying the

With

With current
Describe the bug
I have been trying to debug a memory pressure bug which has led me to aiohttp. I have loosely traced the issue to the fact that `stream.py` uses the pattern of appending byte chunks to a list, then returning `b"".join(chunks)`. When I replace this pattern with a `buffer = io.BytesIO()` that I append to and return `buffer.getvalue()` from, the unnecessary memory pressure goes away.

To reproduce the issue, I am downloading a large file (~500 MB in this example) from S3 using UPath (which uses aiohttp internally). Without my proposed change, my high-watermark memory footprint is about twice the size of the large file, and when I get rid of the reference to the data it remains at about the size of the file. With my proposed change of using `io.BytesIO()` and avoiding `b"".join(chunks)`, the high-watermark memory footprint is about the size of the file, and when I drop the reference to it, the memory footprint goes back down to almost nothing.

For determining the memory footprint, I am using `psutil` and looking at the RSS.

I am happy to dig deeper if needed, or if you think that replacing the joining of bytes with an `io.BytesIO` buffer is not desirable; but given that I have a simple repro and a simple fix, I thought I would report the issue to open up the discussion. I'm happy to open a PR with my proposed solution.

To Reproduce
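The two buffering strategies under discussion can be contrasted in isolation. Below is a minimal sketch; `read_chunks` is a hypothetical stand-in for the network reads performed in `stream.py`, not aiohttp's actual code.

```python
# Contrast of the two assembly patterns: list-append + b"".join() versus
# writing into a single io.BytesIO buffer. read_chunks() simulates the
# network reads; 16 chunks of 64 KiB each.
import io

def read_chunks():
    # Pretend each network read yields a 64 KiB chunk.
    for _ in range(16):
        yield b"x" * (64 * 1024)

def assemble_with_join() -> bytes:
    # Current pattern: keep every chunk alive, then join into one copy.
    chunks = []
    for chunk in read_chunks():
        chunks.append(chunk)
    return b"".join(chunks)

def assemble_with_bytesio() -> bytes:
    # Proposed pattern: write each chunk into one growable buffer.
    buf = io.BytesIO()
    for chunk in read_chunks():
        buf.write(chunk)
    return buf.getvalue()

joined = assemble_with_join()
buffered = assemble_with_bytesio()
print(joined == buffered, len(joined))
```

Both produce identical bytes; the difference the report describes is in the transient allocations (many separate chunk objects plus the joined copy, versus one growing buffer).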
Below is the output with version 3.11.13 of aiohttp:
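The RSS measurement the report describes can be sketched like this (a minimal stand-in, not the original script; it assumes `psutil` is installed, and a 256 MiB bytes object stands in for the downloaded file):

```python
# Sketch of the RSS measurement: take the resident set size before
# allocating a large buffer, while holding it, and after releasing it.
import os
import psutil  # assumed to be installed

proc = psutil.Process(os.getpid())

def rss_mib() -> float:
    """Resident set size of this process, in MiB."""
    return proc.memory_info().rss / 2**20

baseline = rss_mib()
data = b"x" * (256 * 2**20)   # stand-in for the ~500 MB S3 object
while_held = rss_mib()
del data                      # drop the only reference
after_release = rss_mib()

# Holding the data should raise RSS by roughly the size of the buffer;
# exact numbers vary by platform and allocator.
print(while_held - baseline > 200)
```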
Expected behavior
When I modify aiohttp to use my proposed solution with an `io.BytesIO()` buffer, I see the memory footprint grow to the size of the file while I hold a reference to it, and then go back down to almost nothing when I release the reference to the data, which is what I would expect.

Logs/tracebacks
Python Version
aiohttp Version
multidict Version
propcache Version
$ python -m pip show propcache
yarl Version
$ python -m pip show yarl
OS
Related component
Client
Additional context
No response
Code of Conduct