Vector Search: Made for AI

Overview – 1

One of the underlying technologies for Generative AI is vector search. I was curious to understand it better, and so decided to write out a blog series aggregating inputs from multiple sources. I began by asking ChatGPT three questions: What is vector search? How is it different from regular search? Why is it such a crucial pillar for AI?

Vector search is a method of searching through a collection of items (such as documents, images, or other types of data) based on their vector representations. In this context, a vector is a numeric representation of an item, often derived through machine learning models that encode various features or attributes of the item.

How is it Different from Regular Search?

Representation and Similarity:

Regular Search: Typically relies on keywords, boolean operators, and string matching. It is often based on text-based queries and uses indexing methods like inverted indexes.
Vector Search: Uses continuous vector spaces where items are represented as high-dimensional vectors. It finds results based on the distance or similarity between vectors, usually measured by metrics like cosine similarity, Euclidean distance, or dot product.

Data Types:

Regular Search: Primarily deals with text data.
Vector Search: Can handle a variety of data types including text, images, audio, and more, as long as they can be converted into vector representations.

Query Handling:

Regular Search: Requires exact or partial keyword matches.
Vector Search: Handles more abstract queries and can find semantically similar results even if exact keywords are not matched.

Performance:

Regular Search: Efficient for large datasets with straightforward keyword queries.
Vector Search: More computationally intensive due to the high-dimensional calculations, but is optimized with techniques like Approximate Nearest Neighbor (ANN) search.

Why is it a Crucial Pillar for AI?

Enhanced Search Capabilities: Vector search allows for more intuitive and accurate retrieval of information by understanding the semantics and context behind the queries and the items in the database.
Handling Diverse Data: It can be applied to various data types, including unstructured data, which is common in real-world applications like image and audio search, recommendation systems, and natural language understanding.
Improved User Experience: Provides more relevant and context-aware results, enhancing the user experience in applications like chatbots, virtual assistants, and customer support systems.
Foundation for Advanced AI Applications: Essential for tasks like similarity search, clustering, classification, and anomaly detection. It underpins many AI-driven features in e-commerce, personalized content recommendations, and intelligent data retrieval systems.
Semantic Understanding: Enables AI models to understand and process the meaning and relationships within data, moving beyond simple keyword matching to deeper comprehension and analysis.

More from Netcore UNBXD: “Vector Search, a form of Semantic Search, identifies items similar to a given query by examining their similarity in vector space. Vector search represents each document or item as a vector in a multidimensional space, where each dimension corresponds to a feature or attribute of the item. The similarity between two items is calculated based on the distance between two vectors in vector space. A significant application of vector search is often found in information retrieval and natural language processing (NLP), where it can find similar documents or items, relevant keywords, or categorize documents. It is based on the idea that items that are similar in specific ways will tend to have vectors that are close together in vector space. Vector search has the advantage of handling large volumes of data and scaling well as the dataset size increases. In addition, it is relatively easy to implement and works with various types of data, such as text, images, and audio.”

Overview – 2

Wikipedia:

A vector database, vector store or vector search engine is a database that can store vectors (fixed-length lists of numbers) along with other data items. Vector databases typically implement one or more Approximate Nearest Neighbor (ANN) algorithms, so that one can search the database with a query vector to retrieve the closest matching database records.

Vectors are mathematical representations of data in a high-dimensional space. In this space, each dimension corresponds to a feature of the data, with the number of dimensions ranging from a few hundred to tens of thousands, depending on the complexity of the data being represented. A vector’s position in this space represents its characteristics. Words, phrases, or entire documents, and images, audio, and other types of data can all be vectorized.

These feature vectors may be computed from the raw data using machine learning methods such as feature extraction algorithms, word embeddings or deep learning networks. The goal is that semantically similar data items receive feature vectors that are close to each other.

Vector databases can be used for similarity search, multi-modal search, recommendations engines, large language models (LLMs), etc.

Vector databases are also often used to implement Retrieval-Augmented Generation (RAG), a method to improve domain-specific responses of large language models. The retrieval component of a RAG can be any search system, but is most often implemented as a vector database. Text documents describing the domain of interest are collected and for each document or document section a feature vector (known as an “embedding”) is computed, typically using a deep learning network, and stored in a vector database. Given a user prompt, the feature vector of the prompt is computed and the database is queried to retrieve the most relevant documents. These are then automatically added into the context window of the large language model and the large language model proceeds to create a response to the prompt given this context.
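The retrieval step described above can be sketched in a few lines. This is only an illustration under loud assumptions: `embed` is a hypothetical stand-in that counts words from a tiny fixed vocabulary (a real system would call a trained embedding model), the three documents are made up, and `retrieve` and `build_prompt` are names chosen for this example rather than any product's API.

```python
import math

# Hypothetical toy vocabulary; a real embedding model learns its own features.
VOCAB = ["refund", "shipping", "password", "reset", "days", "account"]

def embed(text):
    """Toy stand-in for an embedding model: count vocabulary words."""
    words = text.lower().replace(".", " ").replace("?", " ").split()
    return [float(words.count(term)) for term in VOCAB]

def cosine(a, b):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

documents = [
    "Refund requests are accepted within 30 days.",
    "Shipping takes five business days.",
    "To reset a password open the account page.",
]

# "Indexing": store each document next to its embedding.
index = [(doc, embed(doc)) for doc in documents]

def retrieve(prompt, k=1):
    """Return the k documents whose embeddings are closest to the prompt."""
    query = embed(prompt)
    ranked = sorted(index, key=lambda item: cosine(query, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

def build_prompt(prompt, k=1):
    """Prepend the retrieved documents to the user prompt as context."""
    context = "\n".join(retrieve(prompt, k))
    return f"Context:\n{context}\n\nQuestion: {prompt}"

print(build_prompt("How do I reset my password?"))
```

The string returned by `build_prompt` is what would be handed to the language model; the model itself is outside the scope of this sketch.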

Harpreet Sahota: “Traditional databases are designed to work with discrete, categorical data, not continuous, numerical data like vectors. They’re great for storing and querying structured data. But they’re not optimized for searching, filtering, or ranking data based on complex, high-dimensional relationships. For example, suppose you wanted to find all songs in your music database that have a similar vibe to [a specific song]. With a traditional database, you’d have to search through discrete fields like genre, artist, etc. However, songs with similar vibes may span multiple genres and artists. Not to mention that you’re assuming that you can get to the vibe of a particular track based on discrete attributes of a song. Instead, represent each song as a high-dimensional vector capturing attributes like tempo, mood, lyrics, etc. You can find the most similar songs by looking for the nearest vectors in that space…Vector databases [are] specifically designed and built to store, search, and efficiently query this data type. They’re optimized for high-performance similarity searches, clustering, and other critical operations in recommendation systems, computer vision, and natural language processing applications.”

Devansh: “To use Vector DBs, we need the Vectors we will insert. We generate these Vectors by using AI to create vector embeddings for the data we want to index into our DB. The AI used is called our Embedding Model (EM). The vector embeddings are inserted into our vector database. Generally, you’d want to keep some reference to the original content the embedding was created from to help your embeddings stand out and improve performance when we want to search through our DB. When our application queries, we use the same EM to create embeddings for the query and use those embeddings to query the database for similar vector embeddings. When it comes to Gen AI like ChatGPT, we tack on another layer to this: the model uses these similarity computations to compute the most likely next word.”

I asked Claude about vector databases and vector search.

Vector Database:

A vector database is a specialized database system designed to store and manage high-dimensional vectors efficiently.
It provides a structured way to store and organize vector representations of data, such as text embeddings, image embeddings, or user/item embeddings.
Vector databases are optimized for fast similarity search operations, allowing quick retrieval of similar vectors based on a query vector.
They often employ indexing techniques, such as approximate nearest neighbor (ANN) algorithms, to enable efficient search in high-dimensional spaces.
Examples of vector search libraries include Faiss, Annoy, and Hnswlib. (Strictly speaking, these are indexing libraries rather than full database systems.)

Vector Search:

Vector search refers to the process of finding similar or relevant vectors in a vector space based on a given query vector.
It involves measuring the similarity or distance between vectors, typically using metrics like cosine similarity or Euclidean distance.
Vector search can be performed on vectors stored in a vector database or on vectors computed on-the-fly from input data.
The goal of vector search is to retrieve the most relevant or similar vectors to a given query, enabling tasks like semantic search, recommendation, or clustering.
Vector search can be applied to various domains, such as text retrieval, image search, or user-item matching in recommendation systems.

More from Claude: “Vector search has played a crucial role in driving the development and success of large language models (LLMs) in AI. By enabling efficient retrieval of relevant information from vast amounts of textual data, vector search has facilitated the training of LLMs on massive datasets. Through techniques like word embeddings and sentence embeddings, vector search allows LLMs to understand and represent the semantic meaning of words and phrases in a dense vector space. This semantic understanding enables LLMs to generate more coherent and contextually relevant text. Moreover, vector search has been instrumental in tasks such as question answering, document retrieval, and text summarization, which are key applications of LLMs. By quickly finding the most relevant information from large knowledge bases, vector search has enhanced the performance and efficiency of LLMs in these tasks. Overall, vector search has been a fundamental building block in the development of powerful and versatile LLMs in AI.”

Wall Street Journal: “Vector databases are finding a new use by enterprise and AI developers because they power the RAG technique. Originating from a 2020 paper by an AI research group at Meta Platforms, RAG is commonly used by enterprises to build chatbots for employees to reference company policies, or for customer service and salespeople to pull information from knowledge bases. Vector databases are different from traditional databases with columns and rows because they are designed to store a massive amount of data as “vectors,” or numerical representations of the raw data. That makes them ideal for RAG, the process where generative AI models pull from large amounts of vector data to improve their responses with the additional information.”

JP Morgan: “Vector search reigns in a new form of information retrieval. Traditional databases are optimized for storing data such as tables, documents, and key-value pairs. However, with advancements in AI and natural language processing, increasing quantities of semantic vector data have required new repositories optimized for this use case. Vectors allow for storing the intrinsic meaning of unstructured content, such as images, videos, and natural language in a machine-readable format. Vector databases refer to storage technology enabling efficient organization of semantic data (capturing meaning) in vector format, with vector search referring to the retrieval of this information… Vector databases played a crucial role within Gen AI application stacks, to enable enterprises to expose private, enterprise data at the prompt time to LLMs, thus providing broader context, driving up the accuracy of the generated answers, while maintaining enterprise data security and governance. As such, the use of vector databases within a Gen AI tech stack is fast becoming an efficient, cost-effective design pattern, for enterprises to take advantage of LLMs for customer-facing or internal use cases.”

Components

Netcore UNBXD lists out the components involved in vector search:

Vector Representation of Data: Data points are represented as high-dimensional vectors, where each dimension corresponds to a particular feature or attribute of the data.
Indexing and Storage: The vectors are typically indexed and stored in vector databases, which allow for the efficient retrieval of relevant vectors. Some engines built for this purpose, such as Vespa (originally developed at Yahoo), have been around for many years.
Neural Framework: A software framework that handles indexing and retrieving documents.
Similarity Measure: A similarity measure is used to compare the query vector to the stored vectors in order to identify the most similar vectors. Many distance calculation algorithms exist for efficiently checking the distance between the query and a set of document vectors, for example nearest neighbor algorithms.
Query processing: The query vector is processed using the same indexing and similarity calculation methods as the stored vectors in order to efficiently find the most similar vectors.
Ranking and filtering: The retrieved vectors are typically ranked based on their similarity to the query vector and may also be filtered based on various criteria such as relevance or quality.
User feedback: In some cases, user feedback may be incorporated into the algorithm in order to improve the search results over time.
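The ranking and filtering components can be pictured with a short sketch. The record fields (`id`, `vector`, `in_stock`), the sample data, and the `search` function are all hypothetical and chosen only for illustration; real vector databases expose their own filtering syntax.

```python
import heapq
import math

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Hypothetical stored records: a vector plus metadata used for filtering.
records = [
    {"id": "shoe-1", "vector": [0.9, 0.1], "in_stock": True},
    {"id": "shoe-2", "vector": [0.8, 0.3], "in_stock": False},
    {"id": "hat-1", "vector": [0.1, 0.9], "in_stock": True},
]

def search(query, k=2, keep=lambda record: True):
    """Drop records rejected by `keep`, then return the k best (score, id) pairs."""
    scored = ((cosine(query, r["vector"]), r["id"]) for r in records if keep(r))
    return heapq.nlargest(k, scored)

# Rank by similarity, but only among items that are in stock.
results = search([1.0, 0.2], k=2, keep=lambda r: r["in_stock"])
print([item_id for _, item_id in results])  # ['shoe-1', 'hat-1']
```

Without the filter, `shoe-2` would take second place; with it, the less similar but available `hat-1` is returned instead.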

I asked ChatGPT about the most important concepts for understanding vector search.

Vector Embeddings

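Definition: Vector embeddings are dense representations of data points (e.g., words, images) that capture their semantic meaning and relationships in a continuous vector space. They are low-dimensional only relative to sparse encodings such as one-hot vectors; in practice they often have hundreds or thousands of dimensions.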
Importance: They form the foundation of vector search by converting complex data into a numerical format that can be efficiently processed and compared.
Example: Word embeddings like Word2Vec, where similar words have similar vector representations.

Distance Metrics

Definition: Distance metrics measure the similarity or dissimilarity between two vectors in the embedding space.
Common Metrics:
- Euclidean Distance: The straight-line distance between two points.
- Cosine Similarity: Measures the cosine of the angle between two vectors, indicating their directional similarity.
- Dot Product: The sum of the products of the corresponding entries of two sequences of numbers.
Importance: The choice of distance metric affects the accuracy and relevance of the search results.
Example: In a recommendation system, cosine similarity might be used to find products similar to a user’s past purchases.
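The three metrics are easy to compare side by side. The sketch below uses only the standard library; the function names and the two sample vectors are my own choices for illustration.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dot_product(a, b):
    """Sum of the products of corresponding entries."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    return dot_product(a, b) / (
        math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b))
    )

a = [1.0, 2.0, 2.0]
b = [2.0, 4.0, 4.0]  # same direction as a, twice the length

print(euclidean_distance(a, b))  # 3.0
print(dot_product(a, b))         # 18.0
print(cosine_similarity(a, b))   # 1.0
```

The two vectors point in exactly the same direction, so cosine similarity treats them as identical, while Euclidean distance and dot product are both affected by their different lengths. This is one reason the choice of metric changes which results come back.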

High-Dimensional Space

Definition: A high-dimensional space is one with many dimensions (features), where each dimension represents an attribute or characteristic of the data.
Importance: Understanding the concept of high-dimensional spaces is crucial because vector embeddings often reside in these spaces, making traditional search techniques inefficient.
Example: An image might be represented in a 128-dimensional space after processing by a convolutional neural network (CNN).

Approximate Nearest Neighbor (ANN) Search

Definition: ANN algorithms find points in the vector space that are approximately nearest to a given query point, optimizing for speed and efficiency.
Importance: Exact nearest neighbor search can be computationally expensive in high-dimensional spaces. ANN techniques balance accuracy and performance, making real-time vector search feasible.
Common ANN Algorithms:
- KD-Trees: Suitable for low-dimensional data.
- LSH (Locality-Sensitive Hashing): Hashes input items so that similar items map to the same buckets with high probability.
- HNSW (Hierarchical Navigable Small World): Graph-based approach suitable for high-dimensional spaces.
Example: Using HNSW for fast image retrieval in large-scale image databases.
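To make the LSH idea a little more concrete, here is a small sketch of one well-known variant, random-hyperplane hashing for cosine similarity. The helper names (`make_hyperplanes`, `signature`), the choice of 8 hyperplanes, and the sample vectors are assumptions made for this illustration, not taken from any particular library.

```python
import random
from collections import defaultdict

def make_hyperplanes(num_planes, dim, seed=0):
    """Draw random hyperplane normals; the seed is fixed for repeatability."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_planes)]

def signature(vector, planes):
    """One bit per hyperplane: which side of the plane the vector lies on."""
    return tuple(
        1 if sum(p * v for p, v in zip(plane, vector)) >= 0 else 0
        for plane in planes
    )

planes = make_hyperplanes(num_planes=8, dim=3)
vectors = {
    "A": [1.0, 0.0, 4.0],
    "A_scaled": [2.0, 0.0, 8.0],    # same direction as A
    "opposite": [-1.0, 0.0, -4.0],  # points the other way
}

# Group vectors into buckets keyed by their bit signature.
buckets = defaultdict(list)
for name, vec in vectors.items():
    buckets[signature(vec, planes)].append(name)

# A query is compared only against the vectors in its own bucket.
query = [4.0, 0.0, 16.0]
print(buckets[signature(query, planes)])  # ['A', 'A_scaled']
```

Vectors pointing the same way always share a signature, while vectors at a wider angle are increasingly likely to be separated by at least one hyperplane. That is the trade: only one bucket is scanned, at the cost of occasionally missing a true neighbor that landed in a different bucket.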

How It Works

I next asked ChatGPT to explain how it works with an example.

Scenario

Suppose you have a collection of images and you want to find images similar to a given query image.

Steps

Data Preparation: Each image in your collection is processed by a machine learning model (e.g., a convolutional neural network) to extract features. These features are represented as vectors in a high-dimensional space. For simplicity, let’s assume each image is represented as a vector of 3 dimensions: (x,y,z).

Query Vector: The query image is also processed by the same model to extract its feature vector. Suppose the query image is represented by the vector (3,2,5).

Vector Representation: Your image collection might have vectors like:

Image A: (1,0,4)
Image B: (3,3,6)
Image C: (4,1,2)
Image D: (2,2,5)

Similarity Measurement: To find similar images, you measure the distance between the query vector and each image vector in the collection. Common distance metrics include Euclidean distance, cosine similarity, etc. For this example, we’ll use Euclidean distance.

Find Nearest Neighbors: Based on the distances, the nearest neighbors to the query vector are:

Image D (1)
Image B (1.41)
Image A (3)
Image C (3.32)

Result: The images most similar to the query image are Image D and Image B.
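The distances in this example can be checked with a few lines of code. This is a brute-force scan using the vectors given above; it compares the query with every image, which is exactly the cost that ANN indexes try to avoid on large collections.

```python
import math

query = (3, 2, 5)
images = {"A": (1, 0, 4), "B": (3, 3, 6), "C": (4, 1, 2), "D": (2, 2, 5)}

# Compute the Euclidean distance to every image, then sort nearest first.
ranked = sorted((math.dist(query, vec), name) for name, vec in images.items())

for distance, name in ranked:
    print(f"Image {name}: {distance:.2f}")
# Image D: 1.00
# Image B: 1.41
# Image A: 3.00
# Image C: 3.32
```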

Conclusion

In vector search, items (in this case, images) are represented as vectors in a high-dimensional space. The search involves finding items with vector representations closest to the query vector based on a distance metric. This method allows for more flexible and semantically meaningful searches compared to traditional keyword-based searches.

I also asked Claude for an explainer with an example.

Suppose we have a collection of three documents:

“The quick brown fox jumps over the lazy dog.”
“A quick brown fox is faster than a lazy dog.”
“The lazy dog sleeps all day long.”

Step 1: Vector Representation First, we need to represent each document as a vector in a high-dimensional space. One common approach is to use word embeddings, such as Word2Vec or GloVe, which map words to dense vectors that capture semantic relationships.

Let’s assume we have the following word vectors:

“quick”: [0.2, 0.5]
“brown”: [0.3, 0.1]
“fox”: [0.4, 0.2]
“lazy”: [-0.1, 0.6]
“dog”: [0.1, 0.3]

To represent each document as a vector, we can take the average of the word vectors in the document, skipping words that have no listed vector (such as “the” or “jumps”). This is a simplified approach, but it serves the purpose of this example.

Document 1: [0.18, 0.34]
Document 2: [0.18, 0.34]
Document 3: [0.0, 0.45]

Documents 1 and 2 contain the same five known words, so their averaged vectors are identical.

Step 2: Indexing Next, we index the document vectors in a vector database or a suitable data structure that allows for efficient nearest neighbor search, such as Faiss or Annoy.

Step 3: Query Representation When a user enters a query, we need to represent the query as a vector in the same vector space as the document vectors. Let’s say the user queries: “fast brown fox”

We calculate the query vector by averaging the word vectors for “fast” (assumed to be [0.3, 0.4]), “brown”, and “fox”. Query vector: [0.33, 0.23]

Step 4: Nearest Neighbor Search Using the vector database or indexing structure, we perform a nearest neighbor search to find the documents that are most similar to the query vector. Similarity is typically measured using cosine similarity or Euclidean distance.

In this case, the nearest neighbor search would rank the documents as follows:

Document 1 (cosine similarity: 0.89)
Document 2 (cosine similarity: 0.89)
Document 3 (cosine similarity: 0.57)

Step 5: Result Retrieval Finally, the search results are returned to the user, ordered by their similarity scores. In this example, Documents 1 and 2 tie as the most relevant to the query, followed by Document 3.

This example demonstrates how vector search captures the semantic meaning of the query and retrieves relevant documents based on their vector representations, even though the exact query terms may not appear in the documents.
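For anyone who wants to check the arithmetic, the sketch below recomputes the document vectors and cosine scores directly from the word vectors listed above. It rests on one assumption of mine: words without a listed vector are simply skipped when averaging. The function names are illustrative.

```python
import math

word_vectors = {
    "quick": [0.2, 0.5],
    "brown": [0.3, 0.1],
    "fox": [0.4, 0.2],
    "lazy": [-0.1, 0.6],
    "dog": [0.1, 0.3],
    "fast": [0.3, 0.4],  # the value assumed in the text for the query word
}

def embed(text):
    """Average the vectors of known words; unknown words are skipped (an assumption)."""
    words = [w.strip(".,").lower() for w in text.split()]
    known = [word_vectors[w] for w in words if w in word_vectors]
    return [sum(v[i] for v in known) / len(known) for i in range(2)]

def cosine(a, b):
    """Cosine similarity between two non-zero 2-D vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

documents = {
    "Document 1": "The quick brown fox jumps over the lazy dog.",
    "Document 2": "A quick brown fox is faster than a lazy dog.",
    "Document 3": "The lazy dog sleeps all day long.",
}

query_vector = embed("fast brown fox")
scores = {name: cosine(query_vector, embed(text)) for name, text in documents.items()}

# Print each document with its averaged vector and score, best match first.
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    vector = [round(x, 2) for x in embed(documents[name])]
    print(name, vector, round(score, 2))
```

Averaging word vectors ignores word order and unknown words entirely, which is why two differently worded sentences can end up with the same vector. Sentence-level embedding models avoid this, at the cost of being much harder to work through by hand.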

Here is a pictorial representation from Elastic:

This is from Google Cloud:

Vs Traditional Search

Netcore UNBXD explains how it is different from traditional search:

BM25 (Best Match 25) is a traditional text retrieval function often used for information retrieval and natural language processing tasks. Traditional search uses boolean retrieval to match documents from the index, and BM25 then scores the matches. The scoring is based on the assumption that relevance increases with term frequency (i.e., the number of times a term appears in a document) and with inverse document frequency (i.e., the rarity of a term across the entire corpus of documents). BM25 scores are then used to rank the documents in a corpus, with the highest-scoring documents considered the most relevant.
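A rough sketch of BM25 scoring helps show what "term frequency" and "inverse document frequency" mean in practice. It follows the commonly cited Okapi BM25 formula with the conventional defaults k1 = 1.5 and b = 0.75, and a smoothed IDF that adds 1 inside the logarithm. The tokenizer is deliberately naive, and the sample sentences and function names are my own, so treat this as an illustration rather than how any particular search engine implements it.

```python
import math

def tokenize(text):
    """Naive tokenizer: split on whitespace, strip trailing punctuation, lowercase."""
    return [w.strip(".,").lower() for w in text.split()]

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n
    scores = []
    for tokens in tokenized:
        score = 0.0
        for term in tokenize(query):
            df = sum(1 for t in tokenized if term in t)  # document frequency
            idf = math.log((n - df + 0.5) / (df + 0.5) + 1.0)
            tf = tokens.count(term)                      # term frequency
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / (tf + length_norm)
        scores.append(score)
    return scores

docs = [
    "The quick brown fox jumps over the lazy dog.",
    "A quick brown fox is faster than a lazy dog.",
    "The lazy dog sleeps all day long.",
]
scores = bm25_scores("fast fox", docs)
print([round(s, 3) for s in scores])
```

Only the exact token "fox" contributes here. The query word "fast" matches nothing, not even "faster" or "quick", so the third document scores zero. That gap is the one semantic vector search is meant to close.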

On the other hand, vector search is based on representing documents and queries as numerical vectors in a high-dimensional space. Trey Grainger has described the core principle at work as “a word is known by the company it keeps.”

Vector-based semantic search has the capability to search not just on the term but also take into consideration the context in which the term appears. The terms in the documents are encoded into n-dimensional vectors using a transformer-encoder and then indexed into a vector indexing database. At query time, the query terms also get encoded into vectors. A nearest neighbor search is performed between the query and document vectors to fetch the most relevant documents using a distance calculation metric such as cosine similarity. The documents with vectors closest to the query vector are returned. Vector search is typically more effective than BM25 when matching semantically similar phrases and dealing with synonyms and polysemy.

Here is a table from ChatGPT summarising the differences between traditional search and vector search:

Feature	Traditional Search	Vector Search
Method	Relies on keywords and string matching	Uses vector representations and similarity measurements
Techniques	Indexing methods like inverted indexes	Machine learning models to create embeddings and distance metrics
Data Types	Primarily text data	Diverse data types (text, images, audio)
Query Handling	Requires exact or partial keyword matches	Finds semantically similar results even without exact keyword matches
Performance	Efficient for straightforward text queries	More computationally intensive, optimized with ANN algorithms
Similarity Measurement	Exact or partial string matches	Distance/similarity metrics (e.g., cosine similarity, Euclidean distance)
Query Flexibility	Limited by exact keywords	Flexible, context-aware, and semantically rich
Example	Searching for documents containing “apple”	Finding images similar to a query image based on visual features

This is from Claude, comparing traditional search and vector search:

Feature	Traditional Search	Vector Search
Indexing	Inverted index based on keywords	High-dimensional vector representations of documents
Query Matching	Keyword matching using Boolean operators	Nearest neighbor search in vector space
Semantic Understanding	Limited, based on keyword presence	Captures semantic meaning and context
Relevance Scoring	Based on keyword frequency, proximity, etc.	Based on vector similarity (e.g., cosine similarity)
Scalability	Scales well with large datasets	Requires efficient indexing and search algorithms for high-dimensional vectors
Query Flexibility	Requires exact keyword matches	Allows for semantic searches and retrieves relevant results even without exact keyword matches
Language Dependence	Relies on language-specific preprocessing (e.g., stemming, stop-word removal)	Can be language-agnostic, depending on the vector representation method
Updateability	Relatively easy to update incrementally	May require re-indexing or incremental index updates
Explainability	Easy to understand why a document is retrieved	Less transparent, as relevance is based on abstract vector similarity
Typical Use Cases	Document retrieval, keyword-based search	Semantic search, recommendation systems, similar document retrieval

Applications

I asked ChatGPT to list the top business applications for vector search.

Recommendation Systems: Vector search can recommend products, movies, music, and other items based on user preferences and behavior. E-commerce platforms use vector embeddings to suggest products and content that are similar to what users have previously shown interest in.
Image and Video Retrieval: Vector search allows for finding visually similar images or videos within large databases. Image platforms use vector search to help users find images that resemble a given query image, based on visual features.
Natural Language Processing (NLP) Applications: Vector search enhances search engines, chatbots, and virtual assistants by understanding and processing natural language queries more effectively. Customer support systems use vector embeddings to retrieve relevant FAQs, documents, or previous support tickets that are semantically similar to a user’s query.
Fraud Detection and Anomaly Detection: Vector search helps identify unusual patterns or outliers in data, which is crucial for detecting fraudulent activities. Financial institutions and e-commerce platforms use vector embeddings to monitor transactions and detect anomalies that may indicate fraud.
Personalized Content Delivery: Vector search enables the delivery of personalized content to users based on their interests and behaviors. Social media platforms use vector search to recommend posts, articles, and advertisements tailored to individual user preferences.

Google Cloud lists some applications of vector search:

Finding similar users: If you define a vector to represent each user in your business by combining the user’s activities, past purchase history, and other user attributes, then you can find all users similar to a specified user. You can then see, for example, users who are purchasing similar products, users that are likely bots, or users who are potential premium customers and who should be targeted with digital marketing.
Finding similar products or items: With a vector generated from product features such as description, price, sales location, and so on, you can find similar products to answer any number of questions; for example, “What other products do we have that are similar to this one and may work for the same use case?” or “What products sold in the last 24 hours in this area?” (based on time and proximity)
Finding defective IoT devices: With a vector that captures the features of defective devices from their signals, vector search enables you to instantly find potentially defective devices for proactive maintenance.
Finding ads: Well-defined vectors let you find the most relevant or appropriate ads for viewers in milliseconds at high throughput.
Finding security threats: You can identify security threats by vectorizing the signatures of computer virus binaries or malicious attack behaviors against web services or network equipment.

I asked ChatGPT how vector search could be used for creating digital twins in B2C marketing.

Digital twins in the context of B2C marketing are virtual representations of customers that simulate their behavior, preferences, and interactions. Vector search can play a crucial role in creating and utilizing these digital twins by enabling more accurate and efficient data processing and analysis. Here’s how:

Customer Profiling and Segmentation:

Vector Embeddings: Customer data (e.g., purchase history, browsing behavior, demographics) is converted into vector embeddings that capture the multi-dimensional aspects of each customer.
Clustering and Segmentation: Vector search can group similar customers together by finding clusters in the embedding space, enabling more precise and meaningful customer segmentation.
Example: A retail company can segment customers based on their purchasing patterns and preferences, allowing for targeted marketing campaigns.
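One common way to find such clusters is k-means, sketched below in plain Python. To be clear about the assumptions: k-means is my choice of clustering technique (the text does not name one), the two-dimensional customer vectors are invented for illustration, and the starting centroids are picked by hand to keep the run deterministic.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then re-average."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)), key=lambda i: math.dist(p, centroids[i])
            )
            clusters[nearest].append(p)
        # Recompute each centroid; keep the old one if its cluster is empty.
        centroids = [
            [sum(dim) / len(c) for dim in zip(*c)] if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical customer embeddings, e.g. (share of electronics, share of groceries).
customers = [
    [0.9, 0.1], [0.8, 0.2], [0.85, 0.15],
    [0.1, 0.9], [0.2, 0.8], [0.15, 0.95],
]

centroids, segments = kmeans(customers, centroids=[customers[0], customers[3]])
for centroid, members in zip(centroids, segments):
    print([round(x, 2) for x in centroid], len(members))
```

Each resulting centroid can itself be used as a query vector, for instance to find products or campaigns whose embeddings sit closest to a given segment.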

Personalized Recommendations:

Similarity Search: Vector search can find products, services, or content that are similar to what a customer (or their digital twin) has shown interest in.
Enhanced User Experience: By continuously updating the digital twin with real-time data, businesses can offer highly personalized recommendations that evolve with the customer’s changing preferences.
Example: An e-commerce platform uses vector search to recommend products similar to those a customer has previously viewed or purchased.

Predictive Analytics and Behavioral Insights:

Predicting Future Behavior: By analyzing the vectors representing a customer’s past behavior, vector search can help predict future actions and preferences.
Behavioral Analysis: Vector search allows for deep analysis of customer behavior patterns, helping businesses understand and anticipate customer needs.
Example: A streaming service uses vector embeddings to predict what genres or types of content a customer is likely to enjoy next, based on their viewing history.

Targeted Advertising:

Ad Personalization: Vector search enables the creation of highly targeted and personalized advertisements by matching customer vectors with relevant ad content vectors.
Improved ROI: More relevant and personalized ads result in higher engagement rates and better return on investment (ROI) for marketing campaigns.
Example: A social media platform uses vector search to serve ads that are closely aligned with the interests and behaviors of individual users.

Customer Journey Mapping:

Mapping Interactions: Vector embeddings can represent various touchpoints in a customer’s journey, helping to map and understand the entire customer experience.
Optimizing Engagement: By analyzing these vectors, businesses can identify key moments and interactions that drive customer satisfaction and loyalty.
Example: A travel company uses vector search to understand and optimize the customer journey from browsing for trips to post-travel feedback, enhancing the overall customer experience.

Limitations and Future

I asked both ChatGPT and Claude to discuss the limitations of vector search and how to address them.

ChatGPT

High Computational Complexity

Limitation: Calculating distances between high-dimensional vectors is computationally expensive, especially for large datasets.