
Improve RAG with Contextual Retrieval

For chatbots to be effective in domains like legal advice or customer support, they require relevant background information. Retrieval-Augmented Generation (RAG) enhances responses by pulling information from knowledge bases and combining it with the user’s prompt, improving the model's output. However, traditional RAG often loses crucial context during chunking, so relevant information from the knowledge base is missed. Some might suggest that writing very specific and lengthy context prompts could solve this retrieval problem, but that approach only works for smaller knowledge bases; as your knowledge base expands, you need a more efficient and scalable solution.


Contextual Retrieval

In traditional RAG, a basic chunking method creates vector embeddings for each chunk separately, and RAG systems use these embeddings to find chunks that match the query. However, this approach has a problem: it loses the context of the original document. Several methods have been proposed to improve retrieval using context, including adding generic document summaries to chunks, Hypothetical Document Embedding (HyDE), and summary-based indexing.

Contextual Embeddings address this issue by generating relevant context and prepending it to each chunk before creating embeddings. This approach enhances the quality of each embedded chunk, leading to more accurate retrieval and improved overall performance. On average, across all the data sources we tested, Contextual Embeddings reduced the failure rate for retrieving the top 20 chunks by 35%.

*Figure: Contextual Retrieval processing.*

To understand this better, consider a domain-specific example.

In the case of Legal Documents

Scenario: A user asks, "What was the outcome of the Johnson v. Smith case?"

Relevant Chunk: "The court ruled in favor of Johnson, awarding damages of $50,000."

Problem: Without additional context, it’s unclear which Johnson and Smith are being referenced, or the date of the case, making it difficult to retrieve or apply the information.

Solution: Contextual Retrieval can enhance the chunk by including key identifiers, such as “In the 2021 case of Johnson v. Smith in the New York District Court, the court ruled in favor of Johnson, awarding damages of $50,000.” This added context helps ensure accurate retrieval and interpretation.
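Below is a minimal sketch of how this enrichment could be automated with an LLM before embedding. It assumes an OpenAI-style chat client; the model name, prompt wording, and `contextualize` helper are illustrative, not the exact setup from the original post.

```python
# A minimal sketch of contextual chunk enrichment (assumptions: OpenAI client,
# OPENAI_API_KEY set in the environment, illustrative model name).
from openai import OpenAI

client = OpenAI()

CONTEXT_PROMPT = """<document>
{document}
</document>
Here is a chunk from the document above:
<chunk>
{chunk}
</chunk>
Write one short sentence situating this chunk within the overall document
(names, dates, case identifiers) to improve retrieval. Answer with only
that sentence."""

def contextualize(document: str, chunk: str) -> str:
    """Prepend LLM-generated context to a chunk before embedding it."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative choice
        messages=[{"role": "user",
                   "content": CONTEXT_PROMPT.format(document=document,
                                                    chunk=chunk)}],
    )
    context = response.choices[0].message.content.strip()
    return f"{context} {chunk}"
```

Each enriched chunk, rather than the raw chunk, is then embedded and stored in the vector index.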

HNSW: Vector search for high-dimensional datasets

Approximate Nearest Neighbor (ANN) search is a method for finding data points near a given query point in a dataset, though not necessarily the exact nearest one. HNSW is one of the most accurate and fastest ANN search algorithms. It's especially useful in high-dimensional spaces, where finding the exact nearest neighbor would be too slow and costly.

There are three main types of ANN search algorithms:

  1. Tree-based search algorithms: Use a tree structure to organize and store data points.
  2. Hash-based search algorithms: Use a hash table to store and manage data points.
  3. Graph-based search algorithms: Use a graph structure to store data points, which can be more complex. HNSW is a graph-based algorithm; we'll break it down into smaller parts to make it easier to understand how it works.

All graph-based search algorithms rely on the idea of a proximity graph, where the graph is built based on the proximity of data points, measured by their Euclidean distances. Jumping straight to HNSW might be complicated, so first, we'll explain two important algorithms that help understand HNSW: the Skip List and Navigable Small World (NSW) Graphs, which are the predecessors of HNSW.
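Before diving into those building blocks, it helps to see what using HNSW looks like in practice. Here is a minimal sketch of building and querying an HNSW index with the hnswlib library; the parameter values (`M`, `ef_construction`, `ef`) are illustrative defaults, not tuned recommendations.

```python
# A minimal sketch of HNSW indexing and querying with hnswlib
# (illustrative dimensions and parameters).
import numpy as np
import hnswlib

dim, num_vectors = 128, 10_000
data = np.random.rand(num_vectors, dim).astype(np.float32)

index = hnswlib.Index(space="l2", dim=dim)          # Euclidean distance
index.init_index(max_elements=num_vectors,
                 M=16,                              # max links per node per layer
                 ef_construction=200)               # build-time search width
index.add_items(data, np.arange(num_vectors))

index.set_ef(50)                                    # query-time search width
labels, distances = index.knn_query(data[:1], k=5)  # 5 approximate neighbors
```

Larger `M` and `ef` values trade indexing/query speed for recall, which is the central knob in all HNSW deployments.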

IVFPQ: Accelerate vector search by creating indices

Vector similarity search is the task of finding vectors similar to a given query vector in a particular embedding space. It plays a vital role in many fields and applications because it efficiently retrieves relevant information from large datasets.

Vector similarity search requires substantial memory for efficient search, especially when dealing with dense vector datasets. This is where compressing high-dimensional vectors comes in to optimize memory storage. In this blog, we'll discuss:

  1. Product Quantization (PQ) and how it works
  2. The Inverted File Product Quantization (IVFPQ) index
  3. Implementation of IVFPQ using LanceDB

We’ll also see the performance of PQ and IVFPQ in terms of memory and cover an implementation of the IVFPQ Index using LanceDB.
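As a quick preview of that implementation, here is a minimal sketch of creating an IVF_PQ index with LanceDB's Python API; the table name, vector dimensionality, and index parameters are illustrative.

```python
# A minimal sketch of an IVF_PQ index in LanceDB (illustrative table name,
# dimensionality, and parameters).
import lancedb
import numpy as np

db = lancedb.connect("./lancedb_demo")
dim = 128
data = [{"id": i, "vector": np.random.rand(dim).astype(np.float32)}
        for i in range(10_000)]
table = db.create_table("vectors", data)

# IVF partitions the dataset into coarse clusters; PQ compresses each
# vector into num_sub_vectors one-byte codes.
table.create_index(metric="L2", num_partitions=256, num_sub_vectors=16)

query = np.random.rand(dim).astype(np.float32)
results = table.search(query).limit(5).to_list()
```

At query time, only the most promising partitions are scanned, and distances are estimated from the compact PQ codes rather than the full vectors.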

Quantization is a process for compressing vectors into a more compact representation, greatly reducing memory usage while preserving most of the information that matters for search.
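To make this concrete, here is a toy sketch of Product Quantization using scikit-learn's k-means; the sizes (8 subvectors, 256 centroids each) are illustrative, and a production PQ implementation would be far more optimized.

```python
# A toy sketch of Product Quantization: split vectors into subvectors,
# cluster each subspace, and store one-byte centroid IDs per subvector.
import numpy as np
from sklearn.cluster import KMeans

dim, m, k = 128, 8, 256           # 8 subvectors, 256 centroids each
sub_dim = dim // m                # each subvector has 16 dimensions
vectors = np.random.rand(10_000, dim).astype(np.float32)

codebooks, codes = [], []
for j in range(m):
    sub = vectors[:, j * sub_dim:(j + 1) * sub_dim]
    km = KMeans(n_clusters=k, n_init=4).fit(sub)
    codebooks.append(km.cluster_centers_)
    codes.append(km.labels_.astype(np.uint8))   # one byte per subvector

codes = np.stack(codes, axis=1)   # shape (10_000, 8): 8 bytes per vector
# Original: 128 floats * 4 bytes = 512 bytes; compressed: 8 bytes (64x smaller).

# Approximate reconstruction of vector 0 from its 8-byte code:
recon = np.concatenate([codebooks[j][codes[0, j]] for j in range(m)])
```

The compression is lossy, but because each subspace keeps its own codebook, the reconstruction error stays small enough for accurate approximate search.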

Cambrian-1: Vision-centric multimodal LLMs

Cambrian-1 is a family of multimodal LLMs (MLLMs) designed with a vision-centric approach. While stronger language models can boost multimodal capabilities, the design choices for vision components are often insufficiently explored and disconnected from visual representation learning research.


Cambrian-1 is built on five key pillars, each providing important insights into the design of multimodal LLMs (MLLMs):

  1. Visual Representations: They explore various vision encoders and their combinations.
  2. Connector Design: They design a new dynamic, spatially-aware connector that integrates visual features from several models with LLMs while reducing the number of tokens.
  3. Instruction Tuning Data: They curate high-quality visual instruction-tuning data from public sources, emphasizing distribution balancing.
  4. Instruction Tuning Recipes: They discuss strategies and best practices for instruction tuning.
  5. Benchmarking: They examine existing MLLM benchmarks and introduce a new vision-centric benchmark called "CV-Bench".

We'll learn how Cambrian-1 works through an example of vision-centric exploration on images found through vector search. This involves two steps:

  1. Perform vector search to retrieve related images.
  2. Use the retrieved images for vision-centric exploration.
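Here is a minimal sketch of step 1, assuming an existing LanceDB table of CLIP image embeddings; the table schema (`vector` and `image_path` columns) and the query text are hypothetical.

```python
# A minimal sketch of vector search for related images (assumptions: a
# pre-built LanceDB table "images" with CLIP embeddings and image paths).
import lancedb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("clip-ViT-B-32")   # joint text/image embeddings
db = lancedb.connect("./lancedb_demo")
table = db.open_table("images")                # columns: vector, image_path

query_vec = model.encode("a dog playing in a park")
hits = table.search(query_vec).limit(4).to_list()
image_paths = [h["image_path"] for h in hits]  # handed to the MLLM in step 2
```

The retrieved images are then passed to Cambrian-1 along with a prompt for step 2, the vision-centric exploration itself.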