"Memory Slots" (or: how to save memory in Python)

I recently more or less locked up my workstation while querying a SOAP web service at work for about 24000 entities. Although my request to the server was as simple as “list all entities you know”, its response was, obviously, heavy.

The plain XML in the SOAP response was only about 30 megabytes, which is not that much. However, while Suds (the SOAP client we are using) parses the received XML, it creates various objects for the elements and attributes it finds in the response. This is pretty cool, because Suds gives you back a clean object that represents the structure of the service response. The only downside is that probably nobody has tested Suds with this much data: processing the 30 megabytes of XML takes up to 1.7 gigabytes of RAM (RSS). That is quite a lot, even for 24000 entities.

So first I debugged my client code around Suds, but it turned out that Suds itself was hogging the memory. Thanks to objgraph (and its awesome show_most_common_types() function), I found that Suds creates about 700000 suds.sax.element.Element objects and about 120000 suds.sax.attribute.Attribute objects. I assume that references to these objects are kept somewhere, preventing the garbage collector from freeing them. This is just a guess, though: I had only limited time to debug the issue and could not find the real cause. However, thanks to a great hint from a colleague, I could reduce the (RSS) memory usage of Suds to about 740 megabytes by using __slots__ for the Element and Attribute classes, so more than half of the memory was saved.
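objgraph is a third-party package. As a rough illustration of the kind of census show_most_common_types() prints, the following sketch counts the objects tracked by the garbage collector by type name, using only the standard library. It is an approximation, not objgraph's implementation, and the Element class is a hypothetical stand-in, not suds.sax.element.Element.

```python
import gc
from collections import Counter


def most_common_types(limit=10):
    """Return (type name, count) pairs for the most common live objects.

    Only objects tracked by the garbage collector are counted, so atomic
    objects such as str and int never show up in this census.
    Passing limit=None returns every type.
    """
    counts = Counter(type(obj).__name__ for obj in gc.get_objects())
    return counts.most_common(limit)


class Element:
    """Hypothetical stand-in for a parsed XML element (not the Suds class)."""

    def __init__(self, name):
        self.name = name


# Simulate a parser that keeps a large number of element objects alive.
document = [Element(f"entity{i}") for i in range(50_000)]

for type_name, count in most_common_types(limit=5):
    print(f"{type_name:<12} {count}")
```

If one class dominates this list by a wide margin, as Element and Attribute did in Suds, it is a good candidate for __slots__.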

This is cool, isn’t it?

If you don’t know what __slots__ does in Python, the documentation explains it in detail. In short: declaring __slots__ prevents Python from creating a dictionary (__dict__) for each instance to hold its attributes; instead, each instance only reserves the memory needed for the values of the declared attributes.
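A minimal sketch of the difference, assuming CPython and the standard library's tracemalloc module for measuring allocations. PlainElement and SlottedElement are hypothetical classes, not the Suds ones, and the absolute byte counts vary with the Python version and platform; only the relative difference is the point.

```python
import tracemalloc


class PlainElement:
    """Regular class: each instance can carry its own attribute dictionary."""

    def __init__(self, name, value):
        self.name = name
        self.value = value


class SlottedElement:
    """Same attributes, stored in fixed slots instead of a __dict__."""

    __slots__ = ("name", "value")

    def __init__(self, name, value):
        self.name = name
        self.value = value


def measure(cls, count=10_000):
    """Return the bytes allocated while creating `count` instances of cls."""
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    objects = [cls("element", i) for i in range(count)]  # keep them alive
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del objects
    return after - before


plain = measure(PlainElement)
slotted = measure(SlottedElement)
print(f"plain:   {plain} bytes for 10000 instances")
print(f"slotted: {slotted} bytes for 10000 instances")
# prints "slotted instance has __dict__: False"
print("slotted instance has __dict__:", hasattr(SlottedElement("a", 1), "__dict__"))
```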

However, as is often the case with cool things, __slots__ also has downsides, as explained in the documentation and in a Stack Overflow discussion. One of the most obvious disadvantages is that you can’t easily pickle objects that use __slots__, at least with pickle protocols 0 and 1, though defining custom __getstate__ and __setstate__ methods makes it possible. Another is that you can’t add new attributes to instances of classes that use __slots__.
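Both limitations are easy to see in a small sketch. The Attribute class below is a hypothetical slotted class, not suds.sax.attribute.Attribute. It defines __getstate__ and __setstate__ to copy the slot values into and out of a plain dict, which lets it round-trip through pickle even with the old protocol 0, while assigning an attribute that is not listed in __slots__ raises AttributeError.

```python
import pickle


class Attribute:
    """Hypothetical slotted class (not suds.sax.attribute.Attribute)."""

    __slots__ = ("name", "value")

    def __init__(self, name, value):
        self.name = name
        self.value = value

    def __getstate__(self):
        # There is no __dict__, so collect the slot values by hand.
        return {slot: getattr(self, slot) for slot in self.__slots__}

    def __setstate__(self, state):
        # Restore each slot from the dict produced by __getstate__.
        for slot, value in state.items():
            setattr(self, slot, value)


attr = Attribute("id", "42")

# Limitation 1: attributes not listed in __slots__ cannot be added.
try:
    attr.comment = "not allowed"
    added_new_attribute = True
except AttributeError:
    added_new_attribute = False

# Limitation 2, worked around: protocol 0 relies on the custom state methods.
restored = pickle.loads(pickle.dumps(attr, protocol=0))

print("added new attribute:", added_new_attribute)  # added new attribute: False
print("restored:", restored.name, restored.value)   # restored: id 42
```

Without the two state methods, the same pickle.dumps(attr, protocol=0) call raises a TypeError complaining that the class defines __slots__ without __getstate__.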

In my case, these limitations were acceptable, so as a temporary solution until the real memory hog in Suds is found and fixed, __slots__ works quite well.

(I will send patches to Suds once I am sure they don’t break anything else and when I find the time to do it.)

Update: I sent my changes upstream, together with some other improvements: https://fedorahosted.org/suds/ticket/445