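IndexFlatL2 and IndexIVFFlat both keep the full, uncompressed vectors in memory.
To scale to very large datasets, Faiss offers a lossy compression scheme based on product quantization (PQ) for storing the indexed vectors.
The vectors are still assigned to Voronoi cells, but each one is compressed to a configurable number of bytes.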
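To get a feel for what "a configurable number of bytes" means, here is a rough back-of-the-envelope sketch. The helper names are hypothetical, and d = 64 is an assumed dimensionality (it is not stated in this section). It counts only the stored code for each vector; a real index also keeps ids and codebooks, so treat the ratio as an upper bound.

```python
def flat_bytes_per_vector(d, bytes_per_component=4):
    """Storage for one uncompressed float32 vector of dimension d."""
    return d * bytes_per_component


def pq_bytes_per_vector(m, nbits=8):
    """Storage for one PQ code: m sub-quantizer indices of nbits bits each."""
    return (m * nbits + 7) // 8  # round up to whole bytes


d = 64      # assumed dimensionality; not stated in this section
m = 8       # number of sub-quantizers
nbits = 8   # bits used to encode each sub-vector

flat = flat_bytes_per_vector(d)
pq = pq_bytes_per_vector(m, nbits)
print(flat, pq, flat // pq)  # 256 8 32
```

Under these assumptions a vector shrinks from 256 bytes to 8. The price is that distances computed on the compressed codes are only approximate.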
import faiss  # d, xb and xq are assumed to be defined as in the earlier examples

nlist = 100  # number of Voronoi cells (inverted lists)
m = 8        # number of sub-quantizers; at 8 bits each this gives 8 bytes per vector
k = 4        # number of nearest neighbours to return
quantizer = faiss.IndexFlatL2(d)  # this remains the same
index = faiss.IndexIVFPQ(quantizer, d, nlist, m, 8)
# 8 specifies that each sub-vector is encoded as 8 bits
index.train(xb)
index.add(xb)
D, I = index.search(xb[:5], k)  # sanity check
print(I)
print(D)
index.nprobe = 10  # make comparable with experiment above
D, I = index.search(xq, k)  # search
print(I[-5:])