May 1, 2024

A Visual Guide to the Hierarchical Navigable Small World Algorithm

This post is part two in this series about similarity search. To read the first part of this series, check out A Visual Guide to Vector Search and K-Nearest Neighbors.

Error

This article is a work in progress and is not ready to read. The text, code examples, and the visualizations are works-in-progress and will be updated.

I don’t care, show me the article

We’ll dive into the inner workings of an approximate nearest neighbor (ANN) algorithm called Hierarchical Navigable Small Worlds (HNSW). HNSW is commonly used in vector databases to enhance the speed of similarity search. The article uses interactive visuals and demos to demonstrate the fundamental concepts of the HNSW algorithm and includes a live demo of HNSW itself.

Introduction to HNSW

Hierarchical NSW is one of several approximate nearest neighbor (ANN) algorithms developed to speed up similarity search. Initially proposed by Yu. A. Malkov and D. A. Yashunin in 2016, HNSW builds on the concept of two other algorithms:

Navigable Small World (NSW) graphs - a type of graph that allows for fast approximate nearest neighbor search
Skip Lists - a data structure allowing fast insertion and search via a multi-layer linked list structure.

Combining both benefits, HNSW generates a multi-layer NSW graph that enables a logarithmic time complexity for approximate nearest neighbor search. Its performance was significantly better compared to previous state-of-the-art ANN algorithms.

Mastering the concepts of skip lists and navigable small world data structures is a prerequisite for comprehending the inner workings of HNSW. This understanding will serve as a solid foundation for our exploration of HNSW.

Skip Lists

Skip Lists enhance singly linked lists by introducing multiple layers, achieving O(log n) average time complexity for searches and insertions. This multi-layered approach allows for efficient element “skipping” during searches. Unlike traditional linked lists where each node points to the next, Skip List nodes have multiple pointers for different layers, facilitating quicker navigation. The node’s height, indicating its layer stack, ranges from a dense, fully connected, bottom layer to a sparse top layer.

An example below demonstrates searching for a node valued 5 in a Skip List.

Node heights in a skip list are probabilistically assigned, with each additional layer usually having a halved chance of inclusion, with the probability p being set to a default value of 0.5. This means a node’s height is akin to flipping a coin: a head adds another layer, while a tail stops the process, making taller nodes less common as layers increase.

This results in a structure with multiple levels, with the top level containing the sparsest amount of nodes and the bottom level being the most dense. The lowest level contains all the elements like a regular linked list, except ordered. Inserts and searches on a skip list have an average time complexity of O(log n), compared to O(n) linear time for a linked list.

This unique structure enables the “skipping” of elements during search, which is the property that gives skip lists performance advantages over traditional linked lists and a similar performance profile to balanced trees.

Play with the interactive demo below to see how this skipping works in practice to speed up search:

The animation above shows a Skip List search for a node with value 5. The search starts at the top level, comparing the target value with adjacent nodes. If the target matches or is less than the next node, the search moves to that node. Otherwise, if the next node’s value is greater or it’s the end of the list, the search drops to the next lower level. This process repeats until the desired node is found at the lowest level.

This method is significantly faster than a standard linear search in a linked list. The Hierarchical Navigable Small World (HNSW) algorithm applies a similar concept to gain performance improvements during search and insertion, but using a graph instead of a linked list.

Navigable Small World (NSW) provides a method for constructing and traversing a graph that facilitates fast approximate nearest neighbor search. A greedy search of this graph has a polylogarithmic time complexity of O(logᵏn), which can be faster than the linear time complexity of a naive KNN search.

The NSW graph employs a strategy known as greedy routing during search and insertion. The search process in NSW commences by selecting an entry point.

Typically the entry point is the first node ever inserted. Early nodes typically remain a starting point because long-range edges are likely to be created during the initial graph construction phase. These edges play a vital role in graph navigation. They serve as bridges between network hubs, maintaining overall graph connectivity and enabling logarithmic scaling of the number of hops during greedy routing.

The algorithm then takes the set of all of the neighbors of the current node (a.k.a the neighborhood) plus the current node itself and computes the distances between the elements of this set with the query vector (the vector we are trying to find similar vectors to). If a node closer to the query vector is found among that set, that node becomes the current node, and the process continues at that node. The search procedure concludes when the algorithm cannot find a neighbor node closer to the query than the current node itself. This node is then returned as the response to the query.

However, this greedy strategy does not guarantee the discovery of the best nearest neighbor as it uses only local information at each step to make decisions. One of the properties of this algorithm is early stopping, which happens when an iteration finds no better neighbor nodes than the current one. This property is what gives NSW its improved performance compared to naive search, but runs the risk that the node returned is only a local minimum instead of the global minimum.

The insertion process mirrors the search process, tracing the same path it would have taken if it were a search query. It establishes connections to its K nearest vertices upon reaching a local minimum.

The demo below demonstrates this process. It begins running automatically, but you can pause it to manually add nodes and perform searches to better understand how the algorithm operates.

Building Graph: adding new nodes to the graph.

Using the demo, here’s an interesting challenge: see if you can generate an example of early stopping, where the node returned from the search is not actually the closet node but a local minima.

Hierarchical Navigable Small World (HNSW)

The HNSW is essentially a multi layer NSW. Similar to a skip list, whenever a node is added to the graph, it is assigned a height probabilisticly with decaying probability as the height of the node increases. This means that each node will not just have one set of neighbors, but actually a set of neighbors for each level they are on.

Simplisticly, during the search, the inital node is chosen by selecting the last inserted node with the tallest height. From that node, a greedy search is conducted, starting with that initial tallest node and at the top level. The greedy search is constrained at that layer until a local minimum/nearest neighbor is found for that specific layer. For all the layers above the bottom layer (index 0), only one node is selected.

Once a node is selected and there are no closer nodes at that layer, the search continues with that node, except at the next layer down. Once the bottom layer has been reached, we extend the search to return k nearest neighbors instead of just the single closest one.

This architecture and strategy brings down the complexity of the search and insert operations to O(log(n)) on average.

The below interactive demonstration of HNSW shows how search is conducted. Pay attention to how search is conducted at each layer, with a neighborhood being constructed to find closer nodes at the current layer. The query node is highlighted in yellow. The current node with be the one with the animated dark blue/red border. Neighborhood search of nodes will highlight neigbors with a light blue fill. The most similar nodes returned from the search will be highlighted yellow.

k = 2
Auto-running graph traversal

HNSW Insertion and Search Deep Dive

The Hierarchical Navigable Small World (HNSW) algorithm introduces key enhancements over the Navigable Small World (NSW) model, focusing on search efficiency and queue management. The notable improvements include:

Fixed Entry Point: HNSW employs a static entry point for searches, unlike NSW’s dynamic approach. This change streamlines the search process and ensures consistent search performance.
ef Parameter for Search Quality: HNSW uses the ef (effective search parameter) to fine-tune search quality, a departure from NSW’s reliance on the number of multi-searches. This allows for more precise control over search outcomes.

These advancements optimize search operations and contribute to HNSW’s superior performance in nearest neighbor searches.

HNSW further deviates from NSW with an enhances neighbor selection an heuristic. While HNSW can continue to use the simple neighor selection strategy of directly connecting to the closest elements (this is the strategy used in our prior NSW demonstration), the more sophisticated heuristic considers distances between candidates to ensure diverse directional connections. The heuristic method introduces two parameters: extendCandidates, which optionally expands the candidate set for densely clustered data, and keepPrunedConnections, allowing a fixed number of connections per element. The maximum connections per layer are governed by M max, with a distinct parameter M max0 for the ground layer. When a node’s connections exceed this limit, the algorithm prunes the list using the criteria from either selection method. In all scenarios, employing a heuristic for neighbor selection in the proximity graph (algorithm 4) consistently outperforms or matches the performance of naive nearest neighbor connections (algorithm 3). This advantage is most notable for low-dimensional data, high recall in mid-dimensional data, and highly clustered data. The Hierarchical NSW algorithm, when using heuristic-based connections, excels in clustered data scenarios by avoiding the pitfalls of search getting stuck at cluster boundaries.

The probabilistic function that determines a node’s height in the graph slightly deviates from the traditional skip list approach. The average number of layers for an element to be added in is a constant that depends on m_L:

E[l+1] = E[-\ln(\text{uniform}(0,1)) \cdot m_L]

Algorithm construction parameters such as (m_L) and (M*_max0) play crucial roles in maintaining the navigability of the small world graphs constructed. Setting (m_L) to zero, which corresponds to a single-layer graph, and (M*_max0) to (M), results in the creation of directed k-NN graphs known for their power-law search complexity, as previously studied. Conversely, setting (mL) to zero and (M_max0) to infinity yields NSW graphs characterized by polylogarithmic complexity. Introducing a non-zero value for (m*L) leads to the emergence of controllable hierarchy graphs, enabling logarithmic search complexity through the introduction of layers. To optimize the performance benefits of the controllable hierarchy, it’s essential to minimize the overlap between neighbors across different layers, which is achieved by reducing (m_L). However, decreasing (m_L) also increases the average number of hops required during a greedy search on each layer, adversely affecting performance. This trade-off suggests an optimal value exists for the (m_L) parameter. A simple yet effective choice for (m_L) is (1/\ln(M)), analogous to the skip list parameter (p = 1/M), ensuring minimal overlap between layers.

The remaining key construction parameter, (M), has an optimal range between 5 and 48, with the choice affecting both the algorithm’s memory consumption and its performance across different recalls and data dimensionalities. The selection of the (efConstruction) parameter should ensure a K-ANNS recall close to unity during the construction process, with 0.95 being sufficient for most applications. This parameter, as suggested, could potentially be auto-configured using sample data.

Graph Construction and Insertion

HNSW constructs its graph by assigning nodes to layers based on an exponentially decaying probability, starting the insertion with a greedy search to find the closest neighbors (ef closest) at the top layer. This process uses a dynamic candidate list, managed by two priority queues for efficiency, and the ef parameter to ensure search quality. As the search progresses to lower layers, the ef parameter increases to improve search recall, considering previously found neighbors for new connections. The algorithm balances connection diversity and proximity, maintaining a set number of connections per node, with excess connections pruned based on neighbor selection criteria. The insertion completes once connections at the base layer are established.

Search

The search algorithm mirrors the insertion logic but aims to return the closest neighbors. It similarly adjusts the ef parameter to modulate search quality, ensuring an efficient and effective neighbor search.