Python大规模导表内存优化方法

浅析__slot__作用

Posted by JohnWayne on March 2, 2022

动态语言可以在程序运行过程中动态给对象绑定新属性和新方法,但相比静态语言会消耗更多空间和性能,slot方法就是可以限时对象属性,从而优化空间的方法。在游戏行业的数据导表里就有应用。


先来看一下这个例子

限制只能用姓名和年龄两个属性,强行给对象加个身高的属性就会报错

class TestSlot(object):
    __slots__ = ('name', 'age')
    def __init__(self, name, age):
        self.name = name
        self.age = age

person = TestSlot('Tom', 19)
person.height = 180
    
    
>>>Traceback (most recent call last):
  File "/usercode/file.py", line 10, in <module>
    person.height = 180
AttributeError: 'TestSlot' object has no attribute 'height'

下面来看看内存空间上优化效果

这是一个没有__slot__的普通类

class Person(object):
    def __init__(self, name, age):
        self.name = name
        self.age = age


from guppy import hpy

test_memory = hpy()
test_memory.setrelheap()
person_1 = Person('Tom', 19)
print(test_memory.heap())

test_memory.setrelheap()
person_2 = TestSlot('Tom', 19)
print(test_memory.heap())

>>>
Partition of a set of 3 objects. Total size = 600 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0      1  33      432  72       432  72 types.FrameType
     1      1  33      112  19       544  91 dict of __main__.Person
     2      1  33       56   9       600 100 __main__.Person
Partition of a set of 2 objects. Total size = 488 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0      1  50      432  89       432  89 types.FrameType
     1      1  50       56  11       488 100 __main__.TestSlot

使用slot节省了600-488=112

下面的data和data_slot分别是普通导表和静态导表

test_memory = hpy()
test_memory.setrelheap()
data = {
	0: Person('a', 0),
	1: Person('b', 1),
	2: Person('c', 2),
	3: Person('d', 3),
	4: Person('e', 4),
}
print(test_memory.heap())

test_memory.setrelheap()
data_slot = {
	0: TestSlot('a', 0),
	1: TestSlot('b', 1),
	2: TestSlot('c', 2),
	3: TestSlot('d', 3),
	4: TestSlot('e', 4),
}
print(test_memory.heap())

>>>
Partition of a set of 12 objects. Total size = 1512 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0      5  42      560  37       560  37 dict of __main__.Person
     1      1   8      432  29       992  66 types.FrameType
     2      5  42      280  19      1272  84 __main__.Person
     3      1   8      240  16      1512 100 dict (no owner)
Partition of a set of 7 objects. Total size = 952 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0      1  14      432  45       432  45 types.FrameType
     1      5  71      280  29       712  75 __main__.TestSlot
     2      1  14      240  25       952 100 dict (no owner)

5个对象时使用slot后一共节省1512 - 952 = 560, 560 = 112 * 5

综上,大规模的时候静态限制对象会节省大量空间

最后再讲一下静态性节省内存的原理

声明为slots的成员变量,会在类对象的tp_dict中保存成员变量指针所对应的PyMemberDescrObject,读写这些成员变量,会调用descriptor所对应的PyMemberDescr_Type的tp_descr_get、tp_descr_set,会直接通过偏移量找到对应属性,没有查dict过程,所以会快些。而且,采用slots机制,类对象将不再构建__dict__,内存节省明显。

作者 [张巍]

2022 年 03月 02日