HNSW: Vector search for High Dimensional datasets
Approximate Nearest Neighbor (ANN) search is a method for finding data points near a given point in a dataset, though not always the exact nearest one. HNSW is one of the most accurate and fastest Approximate Nearest Neighbour search algorithms, It’s beneficial in high-dimensional spaces where finding the same nearest neighbor would be too slow and costly.
There are three main types of ANN search algorithms:
- Tree-based search algorithms: Use a tree structure to organize and store data points.
- Hash-based search algorithms: Use a hash table to store and manage data points.
- Graph-based search algorithms: Use a graph structure to store data points, which can be a bit complex. HNSW is a graph-based algorithm, that will be broken down into smaller parts to make it easier to understand how it works.
All graph-based search algorithms rely on the idea of a proximity graph, where the graph is built based on the proximity of data points, measured by their Euclidean distances. Jumping straight to HNSW might be complicated, so first, we'll explain two important algorithms that help understand HNSW: the Skip List and Navigable Small World (NSW) Graphs, which are the predecessors of HNSW.