"Memory Slots" (or: how to save memory in Python)
I recently more or less locked up my workstation while trying to query a SOAP web service at work for about 24000 entities. While my request to the server was as simple as “list all entities you know”, its response was heavy, obviously.
The plain XML in the SOAP response of the service was only about 30 megabytes, which is not that much. However, while Suds (the SOAP client we are using) parses the received XML, it creates objects for the elements and attributes it finds in the response. This is basically pretty cool, because you get a clean object back from Suds that represents the structure of the service response. The only downside is that probably nobody has tested Suds with this much data: processing the 30 megabytes of XML takes up to 1.7 gigabytes of RAM (RSS). This is quite a lot, even for 24000 entities.
So first I debugged my client code around Suds, but it turned
out that Suds itself hogs the memory. Thanks to objgraph (and its
awesome show_most_common_types() function), I found that Suds creates
about 700000 suds.sax.element.Element objects and about 120000
suds.sax.attribute.Attribute objects. I assume that references to
these objects are kept somewhere, which prevents the garbage collector
from freeing them. This is just a guess, though: I had only limited
time to debug this issue and could not find the real cause.
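For readers who have not used objgraph: a per-type count of live objects is roughly what show_most_common_types() reports. The following is a minimal sketch of that idea using only the standard library. It is not objgraph's implementation, and the Element class and the most_common_types() helper are made-up names for illustration, not Suds code.

```python
import gc
from collections import Counter


class Element:
    """Hypothetical stand-in for a parsed XML element (not the Suds class)."""

    def __init__(self, name):
        self.name = name
        self.children = []


def most_common_types(limit=5):
    """Count live, GC-tracked objects by type name, most frequent first."""
    counts = Counter(type(obj).__name__ for obj in gc.get_objects())
    return counts.most_common(limit)


# Keep references around on purpose, so the objects cannot be freed.
elements = [Element("node%d" % i) for i in range(50000)]

for type_name, count in most_common_types():
    print("%-20s %d" % (type_name, count))
```

With 50000 instances held in a list, Element should show up near the top of the output, which is the kind of hint that pointed me at the Suds classes.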
However, thanks to a colleague and his great hint, I could reduce the
(RSS) memory usage of Suds to about 740 megabytes by using __slots__
for the Element and Attribute classes, so more than half of the memory
was saved.
This is cool, isn’t it?
If you don’t know what __slots__ does in Python, the documentation
explains it in detail. In short: using __slots__ prevents Python from
creating a dictionary for each instance to hold the instance
attributes. Instead, Python just reserves the memory needed per
instance to hold the values of the declared attributes.
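To make this a bit more concrete, here is a small sketch comparing a regular class with a slotted one. The two classes are hypothetical examples and not the actual Suds classes. The sizes come from sys.getsizeof(), which only gives a rough picture, and the exact numbers depend on the Python version.

```python
import sys


class PlainAttribute:
    """Regular class: every instance carries its own __dict__."""

    def __init__(self, name, value):
        self.name = name
        self.value = value


class SlottedAttribute:
    """Same data, but storage is reserved for exactly these two names."""

    __slots__ = ("name", "value")

    def __init__(self, name, value):
        self.name = name
        self.value = value


plain = PlainAttribute("id", "42")
slotted = SlottedAttribute("id", "42")

print(hasattr(plain, "__dict__"))    # True
print(hasattr(slotted, "__dict__"))  # False

# Rough per-instance footprint: the object itself plus its attribute dict.
plain_size = sys.getsizeof(plain) + sys.getsizeof(plain.__dict__)
slotted_size = sys.getsizeof(slotted)
print("plain:", plain_size, "bytes; slotted:", slotted_size, "bytes")
```

The difference per instance is small, but it adds up quickly when there are several hundred thousand instances.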
However, as often with cool things, __slots__ also has downsides,
as explained in the documentation and in a Stack Overflow discussion.
One of the most obvious disadvantages is that you can’t easily
pickle objects with __slots__, though it’s probably still possible by
using custom __getstate__ and __setstate__ methods. Another one is
that you can’t add new attributes to instances of classes using
__slots__.
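Both limitations can be shown in a few lines. This is a sketch under assumptions: the Attribute class below is a hypothetical example rather than the Suds class, and the __getstate__/__setstate__ pair is just one straightforward way to write them.

```python
import pickle


class Attribute:
    """Hypothetical slotted class (not the actual Suds one)."""

    __slots__ = ("name", "value")

    def __init__(self, name, value):
        self.name = name
        self.value = value

    def __getstate__(self):
        # Collect the slot values into a plain dict for pickling.
        return {slot: getattr(self, slot) for slot in self.__slots__}

    def __setstate__(self, state):
        # Restore each slot from the pickled dict.
        for slot, value in state.items():
            setattr(self, slot, value)


attr = Attribute("id", "42")

# Attributes that are not declared in __slots__ cannot be added later.
try:
    attr.comment = "not declared in __slots__"
except AttributeError as exc:
    print("cannot add attribute:", exc)

# With the two methods above, a pickle round trip works.
restored = pickle.loads(pickle.dumps(attr))
print(restored.name, restored.value)
```

As far as I know, pickle protocol 2 and later can handle slotted classes even without these methods, so the explicit pair mostly matters for the older protocols.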
At least for my case, these limitations were acceptable, and so,
as a temporary solution until the real memory hog problem in Suds
is found and fixed, __slots__ works quite well.
(I will send patches to Suds once I am sure they don’t break other things and when I find the time to do it.)
Update: I sent my changes to upstream together with some other improvements: https://fedorahosted.org/suds/ticket/445