vector db

Consider a query q, and suppose the algorithm outputs a set X of k candidate near neighbors, and suppose G is the ground-truth set of the k closest neighbors to q from among the points of the base dataset. Then, we define the k-recall@k of this set X to be |X∩G| / k.

The goal of an ANN algorithm then is to maximize recall while retrieving the results as quickly as possible, which results in the recall-vs-latency tradeoff.

Vector databases

Graph Index

Quantization

Misc