1
Overview – 1
One of the underlying technologies for Generative AI is vector search. I was curious to understand it better, and so decided to write out a blog series aggregating inputs from multiple sources. I began by asking ChatGPT three questions: What is vector search? How is it different from regular search? Why is it such a crucial pillar for AI?
Vector search is a method of searching through a collection of items (such as documents, images, or other types of data) based on their vector representations. In this context, a vector is a numeric representation of an item, often derived through machine learning models that encode various features or attributes of the item.
How is it Different from Regular Search?
- Representation and Similarity:
- Regular Search: Typically relies on keywords, boolean operators, and string matching. It is often based on text-based queries and uses indexing methods like inverted indexes.
- Vector Search: Uses continuous vector spaces where items are represented as high-dimensional vectors. It finds results based on the distance or similarity between vectors, usually measured by metrics like cosine similarity, Euclidean distance, or dot product.
- Data Types:
- Regular Search: Primarily deals with text data.
- Vector Search: Can handle a variety of data types including text, images, audio, and more, as long as they can be converted into vector representations.
- Query Handling:
- Regular Search: Requires exact or partial keyword matches.
- Vector Search: Handles more abstract queries and can find semantically similar results even if exact keywords are not matched.
- Performance:
- Regular Search: Efficient for large datasets with straightforward keyword queries.
- Vector Search: More computationally intensive due to the high-dimensional calculations, but is optimized with techniques like Approximate Nearest Neighbor (ANN) search.
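To make the contrast concrete, here is a minimal sketch of both approaches on a three-document corpus. The corpus, the function names, and the hand-picked 2-D vectors are all my own assumptions for illustration; real embeddings come from a trained model and have far more dimensions.

```python
import math
from collections import defaultdict

# Hypothetical mini-corpus: d1 and d2 mean roughly the same thing but share few words.
docs = {
    "d1": "cheap flights to paris",
    "d2": "low cost airfare to france",
    "d3": "paris travel guide",
}

# Hand-picked 2-D vectors standing in for real embeddings (an assumption):
# d1 and d2 point in nearly the same direction, d3 does not.
toy_vectors = {"d1": [0.9, 0.1], "d2": [0.8, 0.2], "d3": [0.3, 0.9]}


def build_inverted_index(documents):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def keyword_search(index, query):
    """Boolean AND: ids of documents that contain every query token."""
    postings = [index.get(token, set()) for token in query.lower().split()]
    return set.intersection(*postings) if postings else set()


def cosine(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def vector_search(vectors, query_vector, k=2):
    """Rank documents by cosine similarity to the query vector."""
    ranked = sorted(vectors, key=lambda d: cosine(vectors[d], query_vector), reverse=True)
    return ranked[:k]


index = build_inverted_index(docs)
```

With this setup, `keyword_search(index, "cheap airfare")` finds nothing because no single document contains both words, while `vector_search(toy_vectors, [0.9, 0.1])` returns both d1 and d2 because their vectors are close to the query.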
Why is it a Crucial Pillar for AI?
- Enhanced Search Capabilities: Vector search allows for more intuitive and accurate retrieval of information by understanding the semantics and context behind the queries and the items in the database.
- Handling Diverse Data: It can be applied to various data types, including unstructured data, which is common in real-world applications like image and audio search, recommendation systems, and natural language understanding.
- Improved User Experience: Provides more relevant and context-aware results, enhancing the user experience in applications like chatbots, virtual assistants, and customer support systems.
- Foundation for Advanced AI Applications: Essential for tasks like similarity search, clustering, classification, and anomaly detection. It underpins many AI-driven features in e-commerce, personalized content recommendations, and intelligent data retrieval systems.
- Semantic Understanding: Enables AI models to understand and process the meaning and relationships within data, moving beyond simple keyword matching to deeper comprehension and analysis.
More from Netcore UNBXD: “Vector Search, a form of Semantic Search, identifies items similar to a given query by examining their similarity in vector space. Vector search represents each document or item as a vector in a multidimensional space, where each dimension corresponds to a feature or attribute of the item. The similarity between two items is calculated based on the distance between two vectors in vector space. A significant application of vector search is often found in information retrieval and natural language processing (NLP), where it can find similar documents or items, relevant keywords, or categorize documents. It is based on the idea that items that are similar in specific ways will tend to have vectors that are close together in vector space. Vector search has the advantage of handling large volumes of data and scaling well as the dataset size increases. In addition, it is relatively easy to implement and works with various types of data, such as text, images, and audio.”
2
Overview – 2
A vector database, vector store or vector search engine is a database that can store vectors (fixed-length lists of numbers) along with other data items. Vector databases typically implement one or more Approximate Nearest Neighbor (ANN) algorithms, so that one can search the database with a query vector to retrieve the closest matching database records.
Vectors are mathematical representations of data in a high-dimensional space. In this space, each dimension corresponds to a feature of the data, with the number of dimensions ranging from a few hundred to tens of thousands, depending on the complexity of the data being represented. A vector’s position in this space represents its characteristics. Words, phrases, or entire documents, and images, audio, and other types of data can all be vectorized.
These feature vectors may be computed from the raw data using machine learning methods such as feature extraction algorithms, word embeddings or deep learning networks. The goal is that semantically similar data items receive feature vectors that are close to each other.
Vector databases can be used for similarity search, multi-modal search, recommendations engines, large language models (LLMs), etc.
Vector databases are also often used to implement Retrieval-Augmented Generation (RAG), a method to improve domain-specific responses of large language models. The retrieval component of a RAG can be any search system, but is most often implemented as a vector database. Text documents describing the domain of interest are collected and for each document or document section a feature vector (known as an “embedding”) is computed, typically using a deep learning network, and stored in a vector database. Given a user prompt, the feature vector of the prompt is computed and the database is queried to retrieve the most relevant documents. These are then automatically added into the context window of the large language model and the large language model proceeds to create a response to the prompt given this context.
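The RAG flow described above can be sketched in a few lines. This is only an illustration under loud assumptions: `embed` is a bag-of-words counter standing in for a real embedding model, `store` is a plain list standing in for a vector database, and the documents, function names, and prompt template are all hypothetical.

```python
import math


def tokenize(text):
    """Lowercase the text and strip basic punctuation from each token."""
    return [t.strip(".,?!").lower() for t in text.split()]


def embed(text, vocab):
    """Stand-in for an embedding model: one count per vocabulary word."""
    tokens = tokenize(text)
    return [tokens.count(word) for word in vocab]


def cosine(a, b):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical domain documents.
documents = [
    "Employees accrue 20 days of paid leave per year.",
    "Expense reports must be filed within 30 days.",
]
vocab = sorted({t for d in documents for t in tokenize(d)})

# Toy "vector database": each stored embedding keeps its source text alongside it.
store = [(embed(d, vocab), d) for d in documents]


def retrieve(prompt, k=1):
    """Embed the prompt with the same model and return the k closest documents."""
    q = embed(prompt, vocab)
    ranked = sorted(store, key=lambda item: cosine(item[0], q), reverse=True)
    return [text for _, text in ranked[:k]]


def build_augmented_prompt(prompt, k=1):
    """Place the retrieved text ahead of the question, ready to send to an LLM."""
    context = "\n".join(retrieve(prompt, k))
    return f"Context:\n{context}\n\nQuestion: {prompt}"
```

Calling `build_augmented_prompt("How many days of paid leave do employees get?")` puts the leave policy sentence into the context. The call to the language model itself is left out on purpose.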
Harpreet Sahota: “Traditional databases are designed to work with discrete, categorical data, not continuous, numerical data like vectors. They’re great for storing and querying structured data. But they’re not optimized for searching, filtering, or ranking data based on complex, high-dimensional relationships. For example, suppose you wanted to find all songs in your music database that have a similar vibe to [a specific song]. With a traditional database, you’d have to search through discrete fields like genre, artist, etc. However, songs with similar vibes may span multiple genres and artists. Not to mention that you’re assuming that you can get to the vibe of a particular track based on discrete attributes of a song. Instead, represent each song as a high-dimensional vector capturing attributes like tempo, mood, lyrics, etc. You can find the most similar songs by looking for the nearest vectors in that space…Vector databases [are] specifically designed and built to store, search, and efficiently query this data type. They’re optimized for high-performance similarity searches, clustering, and other critical operations in recommendation systems, computer vision, and natural language processing applications.”
Devansh: “To use Vector DBs, we need the Vectors we will insert. We generate these Vectors by using AI to create vector embeddings for the data we want to index into our DB. The AI used is called our Embedding Model (EM). The vector embeddings are inserted into our vector database. Generally, you’d want to keep some reference to the original content the embedding was created from to make help your embeddings stand out and improve performance when we want to search through our DB. When our application queries, we use the same EM to create embeddings for the query and use those embeddings to query the database for similar vector embeddings. When it comes to Gen AI like ChatGPT, we tack on another layer to this- the model uses these similarity computations to compute the most likely next word.”
I asked Claude about vector databases and vector search.
Vector Database:
- A vector database is a specialized database system designed to store and manage high-dimensional vectors efficiently.
- It provides a structured way to store and organize vector representations of data, such as text embeddings, image embeddings, or user/item embeddings.
- Vector databases are optimized for fast similarity search operations, allowing quick retrieval of similar vectors based on a query vector.
- They often employ indexing techniques, such as approximate nearest neighbor (ANN) algorithms, to enable efficient search in high-dimensional spaces.
- Examples of libraries that implement such indexes include Faiss, Annoy, and Hnswlib.
Vector Search:
- Vector search refers to the process of finding similar or relevant vectors in a vector space based on a given query vector.
- It involves measuring the similarity or distance between vectors, typically using metrics like cosine similarity or Euclidean distance.
- Vector search can be performed on vectors stored in a vector database or on vectors computed on-the-fly from input data.
- The goal of vector search is to retrieve the most relevant or similar vectors to a given query, enabling tasks like semantic search, recommendation, or clustering.
- Vector search can be applied to various domains, such as text retrieval, image search, or user-item matching in recommendation systems.
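The metrics mentioned above are short enough to write out. A minimal sketch in plain Python; the function names and the two sample vectors are my own.

```python
import math


def dot(a, b):
    """Sum of the products of corresponding entries."""
    return sum(x * y for x, y in zip(a, b))


def euclidean_distance(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; ignores their lengths."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def normalize(a):
    """Scale a vector to unit length."""
    length = math.sqrt(dot(a, a))
    return [x / length for x in a]


a = [1.0, 2.0, 2.0]
b = [2.0, 4.0, 4.0]  # same direction as a, twice the length
```

Here `a` and `b` are a Euclidean distance of 3 apart, yet their cosine similarity is 1 because they point the same way. Once vectors are normalized to unit length, the dot product and cosine similarity give the same number, which is why some systems store normalized vectors and use the cheaper dot product.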
More from Claude: “Vector search has played a crucial role in driving the development and success of large language models (LLMs) in AI. By enabling efficient retrieval of relevant information from vast amounts of textual data, vector search has facilitated the training of LLMs on massive datasets. Through techniques like word embeddings and sentence embeddings, vector search allows LLMs to understand and represent the semantic meaning of words and phrases in a dense vector space. This semantic understanding enables LLMs to generate more coherent and contextually relevant text. Moreover, vector search has been instrumental in tasks such as question answering, document retrieval, and text summarization, which are key applications of LLMs. By quickly finding the most relevant information from large knowledge bases, vector search has enhanced the performance and efficiency of LLMs in these tasks. Overall, vector search has been a fundamental building block in the development of powerful and versatile LLMs in AI.”
Wall Street Journal: “Vector databases are finding a new use by enterprise and AI developers because they power the RAG technique. Originating from a 2020 paper by an AI research group at Meta Platforms, RAG is commonly used by enterprises to build chatbots for employees to reference company policies, or for customer service and salespeople to pull information from knowledge bases. Vector databases are different from traditional databases with columns and rows because they are designed to store a massive amount of data as “vectors,” or numerical representations of the raw data. That makes them ideal for RAG, the process where generative AI models pull from large amounts of vector data to improve their responses with the additional information.”
JP Morgan: “Vector search reigns in a new form of information retrieval. Traditional databases are optimized for storing data such as tables, documents, and key-value pairs. However, with advancements in AI and natural language processing, increasing quantities of semantic vector data have required new repositories optimized for this use case. Vectors allow for storing the intrinsic meaning of unstructured content, such as images, videos, and natural language in a machine-readable format. Vector databases refer to storage technology enabling efficient organization of semantic data (capturing meaning) in vector format, with vector search referring to the retrieval of this information… Vector databases played a crucial role within Gen AI application stacks, to enable enterprises to expose private, enterprise data at the prompt time to LLMs, thus providing broader context, driving up the accuracy of the generated answers, while maintaining enterprise data security and governance. As such, the use of vector databases within a Gen AI tech stack is fast becoming an efficient, cost-effective design pattern, for enterprises to take advantage of LLMs for customer-facing or internal use cases.”
3
Components
Netcore UNBXD lists out the components involved in vector search:
- Vector Representation of Data: Data points are represented as high-dimensional vectors, where each dimension corresponds to a particular feature or attribute of the data.
- Indexing and Storage: The vectors are typically indexed and stored in vector databases, which allow for the efficient retrieval of relevant vectors. Some engines built for this purpose, such as Vespa by Yahoo, have been around for many years.
- Neural Framework: A software framework that handles indexing and retrieving documents.
- Similarity Measure: A similarity measure is used to compare the query vector to the stored vectors in order to identify the most similar vectors. Many distance calculation algorithms exist for efficiently checking the distance between the query and a set of document vectors, for example nearest neighbor algorithms.
- Query processing: The query vector is processed using the same indexing and similarity calculation methods as the stored vectors in order to efficiently find the most similar vectors.
- Ranking and filtering: The retrieved vectors are typically ranked based on their similarity to the query vector and may also be filtered based on various criteria such as relevance or quality.
- User feedback: In some cases, user feedback may be incorporated into the algorithm in order to improve the search results over time.
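The similarity, ranking, and filtering components fit together roughly as follows. This is a sketch rather than any particular product's pipeline: the record layout, the `predicate` and `min_score` parameters, and the sample data are assumptions of mine.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def search(records, query_vector, k=3, predicate=None, min_score=0.0):
    """Filter records, score the survivors, and return the top k as (id, score)."""
    scored = []
    for record in records:
        # Filtering: skip records that fail the metadata condition.
        if predicate is not None and not predicate(record):
            continue
        score = cosine(record["vector"], query_vector)
        # Filtering: drop matches below the similarity threshold.
        if score >= min_score:
            scored.append((score, record["id"]))
    # Ranking: highest similarity first.
    scored.sort(reverse=True)
    return [(record_id, round(score, 3)) for score, record_id in scored[:k]]


# Hypothetical product records with 2-D stand-in embeddings.
records = [
    {"id": "shoe-1", "vector": [0.9, 0.1], "in_stock": True},
    {"id": "shoe-2", "vector": [0.8, 0.3], "in_stock": False},
    {"id": "hat-1", "vector": [0.1, 0.9], "in_stock": True},
]
```

For example, `search(records, [1.0, 0.0], k=2, predicate=lambda r: r["in_stock"])` skips the out-of-stock shoe even though it is the second-closest vector.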
I asked ChatGPT about the most important concepts for understanding vector search.
- Vector Embeddings
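- Definition: Vector embeddings are dense representations of data points (e.g., words, images) that capture their semantic meaning and relationships in a continuous vector space. They are low-dimensional relative to sparse encodings such as one-hot vectors, although they can still have hundreds or thousands of dimensions.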
- Importance: They form the foundation of vector search by converting complex data into a numerical format that can be efficiently processed and compared.
- Example: Word embeddings like Word2Vec, where similar words have similar vector representations.
- Distance Metrics
- Definition: Distance metrics measure the similarity or dissimilarity between two vectors in the embedding space.
- Common Metrics:
- Euclidean Distance: The straight-line distance between two points.
- Cosine Similarity: Measures the cosine of the angle between two vectors, indicating their directional similarity.
- Dot Product: The sum of the products of the corresponding entries of two sequences of numbers.
- Importance: The choice of distance metric affects the accuracy and relevance of the search results.
- Example: In a recommendation system, cosine similarity might be used to find products similar to a user’s past purchases.
- High-Dimensional Space
- Definition: A high-dimensional space is one with many dimensions (features), where each dimension represents an attribute or characteristic of the data.
- Importance: Understanding the concept of high-dimensional spaces is crucial because vector embeddings often reside in these spaces, making traditional search techniques inefficient.
- Example: An image might be represented in a 128-dimensional space after processing by a convolutional neural network (CNN).
- Approximate Nearest Neighbor (ANN) Search
- Definition: ANN algorithms find points in the vector space that are approximately nearest to a given query point, optimizing for speed and efficiency.
- Importance: Exact nearest neighbor search can be computationally expensive in high-dimensional spaces. ANN techniques balance accuracy and performance, making real-time vector search feasible.
- Common ANN Algorithms:
- KD-Trees: Suitable for low-dimensional data.
- LSH (Locality-Sensitive Hashing): Hashes input items so that similar items map to the same buckets with high probability.
- HNSW (Hierarchical Navigable Small World): Graph-based approach suitable for high-dimensional spaces.
- Example: Using HNSW for fast image retrieval in large-scale image databases.
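To give a feel for how an ANN method trades accuracy for speed, here is a minimal sketch of one LSH variant, random-hyperplane hashing for cosine similarity. It uses a single hash table, which real systems would not rely on, and the function names, seed, and sample vectors are assumptions of mine.

```python
import math
import random
from collections import defaultdict


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplanes (normal vectors) in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]


def lsh_signature(vector, hyperplanes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(
        int(sum(v * h for v, h in zip(vector, plane)) >= 0) for plane in hyperplanes
    )


def build_buckets(vectors, hyperplanes):
    """Group item keys by signature so similar directions share a bucket."""
    buckets = defaultdict(list)
    for key, vector in vectors.items():
        buckets[lsh_signature(vector, hyperplanes)].append(key)
    return buckets


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def query(vectors, buckets, hyperplanes, q, k=1):
    """Only score items in the query's bucket instead of the whole collection."""
    candidates = buckets.get(lsh_signature(q, hyperplanes), [])
    return sorted(candidates, key=lambda key: cosine(vectors[key], q), reverse=True)[:k]


vectors = {"a": [1.0, 2.0, 3.0], "a2": [2.0, 4.0, 6.0], "b": [-1.0, -2.0, -3.0]}
planes = make_hyperplanes(dim=3, n_bits=8)
buckets = build_buckets(vectors, planes)
```

Vectors pointing in the same direction ("a" and "a2") always land in the same bucket, and the opposite vector "b" lands elsewhere. The approximation comes from near neighbors that happen to fall on different sides of a hyperplane and are therefore missed, which is why practical implementations use several tables.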
4
How It Works
I next asked ChatGPT to explain how it works with an example.
Scenario
Suppose you have a collection of images and you want to find images similar to a given query image.
Steps
Data Preparation: Each image in your collection is processed by a machine learning model (e.g., a convolutional neural network) to extract features. These features are represented as vectors in a high-dimensional space. For simplicity, let’s assume each image is represented as a vector of 3 dimensions: (x,y,z).
Query Vector: The query image is also processed by the same model to extract its feature vector. Suppose the query image is represented by the vector (3,2,5).
Vector Representation: Your image collection might have vectors like:
- Image A: (1,0,4)
- Image B: (3,3,6)
- Image C: (4,1,2)
- Image D: (2,2,5)
Similarity Measurement: To find similar images, you measure the distance between the query vector and each image vector in the collection. Common distance metrics include Euclidean distance, cosine similarity, etc. For this example, we’ll use Euclidean distance.

Find Nearest Neighbors: Based on the distances, the nearest neighbors to the query vector are:
- Image D (1)
- Image B (1.41)
- Image A (3)
- Image C (3.32)
Result: The images most similar to the query image are Image D and Image B.
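The arithmetic in this example is easy to check. The sketch below uses the same query and image vectors and a brute-force scan, which is fine for four images; the variable names are mine.

```python
import math

query = (3, 2, 5)
images = {
    "A": (1, 0, 4),
    "B": (3, 3, 6),
    "C": (4, 1, 2),
    "D": (2, 2, 5),
}


def euclidean(a, b):
    """Square root of the summed squared differences per dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Sort (distance, name) pairs so the nearest image comes first.
ranked = sorted((round(euclidean(vec, query), 2), name) for name, vec in images.items())

for distance, name in ranked:
    print(f"Image {name}: {distance}")
```

For Image A, for instance, the differences from the query are (2, 2, 1), so the distance is the square root of 4 + 4 + 1 = 9, which is 3.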
Conclusion
In vector search, items (in this case, images) are represented as vectors in a high-dimensional space. The search involves finding items with vector representations closest to the query vector based on a distance metric. This method allows for more flexible and semantically meaningful searches compared to traditional keyword-based searches.
I also asked Claude for an explainer with an example.
Suppose we have a collection of three documents:
- “The quick brown fox jumps over the lazy dog.”
- “A quick brown fox is faster than a lazy dog.”
- “The lazy dog sleeps all day long.”
Step 1: Vector Representation First, we need to represent each document as a vector in a high-dimensional space. One common approach is to use word embeddings, such as Word2Vec or GloVe, which map words to dense vectors that capture semantic relationships.
Let’s assume we have the following word vectors:
- “quick”: [0.2, 0.5]
- “brown”: [0.3, 0.1]
- “fox”: [0.4, 0.2]
- “lazy”: [-0.1, 0.6]
- “dog”: [0.1, 0.3]
To represent each document as a vector, we can take the average of the word vectors in the document. This is a simplified approach, but it serves the purpose of this example.
Averaging only the words that have a listed vector gives:
- Document 1: [0.18, 0.34]
- Document 2: [0.18, 0.34]
- Document 3: [0.0, 0.45]
Documents 1 and 2 receive the same vector because they contain the same five listed words.
Step 2: Indexing Next, we index the document vectors in a vector database or a suitable data structure that allows for efficient nearest neighbor search, such as Faiss or Annoy.
Step 3: Query Representation When a user enters a query, we need to represent the query as a vector in the same vector space as the document vectors. Let’s say the user queries: “fast brown fox”
We calculate the query vector by averaging the word vectors for “fast” (assumed to be [0.3, 0.4]), “brown”, and “fox”. Query vector: [0.33, 0.23]
Step 4: Nearest Neighbor Search Using the vector database or indexing structure, we perform a nearest neighbor search to find the documents that are most similar to the query vector. Similarity is typically measured using cosine similarity or Euclidean distance.
In this case, the nearest neighbor search would rank the documents as follows:
- Document 1 (cosine similarity: 0.89)
- Document 2 (cosine similarity: 0.89)
- Document 3 (cosine similarity: 0.57)
Step 5: Result Retrieval Finally, the search results are returned to the user, ordered by their similarity scores. In this example, Documents 1 and 2 tie as the most relevant to the query, followed by Document 3.
This example demonstrates how vector search captures the semantic meaning of the query and retrieves relevant documents based on their vector representations, even though the exact query terms may not appear in the documents.
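The five steps can be reproduced in a few lines, which is a useful way to check the figures in the walkthrough. This is a minimal sketch that assumes words without a listed vector are simply skipped when averaging; the helper names are mine, and a linear scan stands in for Faiss or Annoy.

```python
import math

# Word vectors from the example, plus the assumed vector for "fast".
word_vectors = {
    "quick": [0.2, 0.5],
    "brown": [0.3, 0.1],
    "fox": [0.4, 0.2],
    "lazy": [-0.1, 0.6],
    "dog": [0.1, 0.3],
    "fast": [0.3, 0.4],
}

documents = [
    "The quick brown fox jumps over the lazy dog.",
    "A quick brown fox is faster than a lazy dog.",
    "The lazy dog sleeps all day long.",
]


def tokenize(text):
    return [t.strip(".,").lower() for t in text.split()]


def embed(text):
    """Average the vectors of known words; unknown words are skipped (assumption)."""
    known = [word_vectors[t] for t in tokenize(text) if t in word_vectors]
    if not known:
        raise ValueError("no known words in text")
    return [sum(column) / len(known) for column in zip(*known)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def rank(query):
    """Return (similarity, document number) pairs, most similar first."""
    q = embed(query)
    scored = [(cosine(embed(doc), q), number) for number, doc in enumerate(documents, start=1)]
    return sorted(scored, reverse=True)


for similarity, number in rank("fast brown fox"):
    print(f"Document {number}: {similarity:.2f}")
```

Under this assumption, Documents 1 and 2 get identical vectors because they contain the same five listed words; telling them apart would need vectors for the remaining words, such as "faster".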
[Figure: pictorial representation of vector search, from Elastic]
[Figure: vector search illustration, from Google Cloud]
5
Vs Traditional Search
Netcore UNBXD explains how it is different from traditional search:
Traditional search uses boolean retrieval to match documents from the index, and then ranks the matches with a function such as BM25 (Best Match 25), a traditional text retrieval function often used for information retrieval and natural language processing tasks. BM25 is based on the assumption that relevance increases with term frequency (i.e., the number of times a term appears in a document) and with inverse document frequency (i.e., the rarity of a term across the entire corpus of documents). BM25 scores are then used to rank the documents in a corpus, with the highest-scoring documents considered the most relevant.
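For reference, here is a minimal sketch of BM25 scoring. It uses the commonly cited Okapi form with a smoothed inverse document frequency, and `k1 = 1.5` and `b = 0.75` are typical illustrative defaults rather than values from the source. Tokenization is a plain whitespace split, and the corpus is made up.

```python
import math
from collections import Counter


def bm25_scores(query, corpus, k1=1.5, b=0.75):
    """Score every document in the corpus against the query with BM25."""
    docs = [doc.lower().split() for doc in corpus]
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Number of documents containing each term.
    doc_freq = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        term_freq = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            # Rare terms get a higher weight than common ones.
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            # Term frequency saturates and is normalized by document length.
            tf = term_freq[term]
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(d) / avg_len))
        scores.append(score)
    return scores


corpus = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "birds sing at dawn",
]
scores = bm25_scores("cat mat", corpus)
```

The first document scores highest because it contains both query terms, and "mat" counts for more than "cat" because it appears in fewer documents. The third document scores zero: with no shared terms, BM25 has nothing to match on, which is the gap vector search aims to fill.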
On the other hand, vector search is based on representing documents and queries as numerical vectors in a high-dimensional space. Trey Grainger has described the core principle at work as “a word is known by the company it keeps.”
Vector-based semantic search has the capability to search not just on the term but also take into consideration the context in which the term appears. The terms in the documents are encoded into n-dimensional vectors using a transformer-encoder and then indexed into a vector indexing database. At query time, the query terms also get encoded into vectors. A nearest neighbor search is performed between the query and document vectors to fetch the most relevant documents using a distance calculation metric such as cosine similarity. The documents with vectors closest to the query vector are returned. Vector search is typically more effective than BM25 when matching semantically similar phrases and dealing with synonyms and polysemy.
Here is a table from ChatGPT summarising the differences between traditional search and vector search:
| Feature | Traditional Search | Vector Search |
| --- | --- | --- |
| Method | Relies on keywords and string matching | Uses vector representations and similarity measurements |
| Techniques | Indexing methods like inverted indexes | Machine learning models to create embeddings and distance metrics |
| Data Types | Primarily text data | Diverse data types (text, images, audio) |
| Query Handling | Requires exact or partial keyword matches | Finds semantically similar results even without exact keyword matches |
| Performance | Efficient for straightforward text queries | More computationally intensive, optimized with ANN algorithms |
| Similarity Measurement | Exact or partial string matches | Distance/similarity metrics (e.g., cosine similarity, Euclidean distance) |
| Query Flexibility | Limited by exact keywords | Flexible, context-aware, and semantically rich |
| Example | Searching for documents containing “apple” | Finding images similar to a query image based on visual features |
This is from Claude, comparing traditional search and vector search:
| Feature | Traditional Search | Vector Search |
| --- | --- | --- |
| Indexing | Inverted index based on keywords | High-dimensional vector representations of documents |
| Query Matching | Keyword matching using Boolean operators | Nearest neighbor search in vector space |
| Semantic Understanding | Limited, based on keyword presence | Captures semantic meaning and context |
| Relevance Scoring | Based on keyword frequency, proximity, etc. | Based on vector similarity (e.g., cosine similarity) |
| Scalability | Scales well with large datasets | Requires efficient indexing and search algorithms for high-dimensional vectors |
| Query Flexibility | Requires exact keyword matches | Allows for semantic searches and retrieves relevant results even without exact keyword matches |
| Language Dependence | Relies on language-specific preprocessing (e.g., stemming, stop-word removal) | Can be language-agnostic, depending on the vector representation method |
| Updateability | Relatively easy to update incrementally | May require re-indexing or incremental index updates |
| Explainability | Easy to understand why a document is retrieved | Less transparent, as relevance is based on abstract vector similarity |
| Typical Use Cases | Document retrieval, keyword-based search | Semantic search, recommendation systems, similar document retrieval |
6
Applications
I asked ChatGPT to list the top business applications for vector search.
- Recommendation Systems: Vector search can recommend products, movies, music, and other items based on user preferences and behavior. E-commerce platforms use vector embeddings to suggest products and content that are similar to what users have previously shown interest in.
- Image and Video Retrieval: Vector search allows for finding visually similar images or videos within large databases. Image platforms use vector search to help users find images that resemble a given query image, based on visual features.
- Natural Language Processing (NLP) Applications: Vector search enhances search engines, chatbots, and virtual assistants by understanding and processing natural language queries more effectively. Customer support systems use vector embeddings to retrieve relevant FAQs, documents, or previous support tickets that are semantically similar to a user’s query.
- Fraud Detection and Anomaly Detection: Vector search helps identify unusual patterns or outliers in data, which is crucial for detecting fraudulent activities. Financial institutions and e-commerce platforms use vector embeddings to monitor transactions and detect anomalies that may indicate fraud.
- Personalized Content Delivery: Vector search enables the delivery of personalized content to users based on their interests and behaviors. Social media platforms use vector search to recommend posts, articles, and advertisements tailored to individual user preferences.
Google Cloud lists some applications of vector search:
- Finding similar users: If you define a vector to represent each user in your business by combining the user’s activities, past purchase history, and other user attributes, then you can find all users similar to a specified user. You can then see, for example, users who are purchasing similar products, users that are likely bots, or users who are potential premium customers and who should be targeted with digital marketing.
- Finding similar products or items: With a vector generated from product features such as description, price, sales location, and so on, you can find similar products to answer any number of questions; for example, “What other products do we have that are similar to this one and may work for the same use case?” or “What products sold in the last 24 hours in this area?” (based on time and proximity)
- Finding defective IoT devices: With a vector that captures the features of defective devices from their signals, vector search enables you to instantly find potentially defective devices for proactive maintenance.
- Finding ads: Well-defined vectors let you find the most relevant or appropriate ads for viewers in milliseconds at high throughput.
- Finding security threats: You can identify security threats by vectorizing the signatures of computer virus binaries or malicious attack behaviors against web services or network equipment.
I asked ChatGPT how vector search could be used for creating digital twins in B2C marketing.
Digital twins in the context of B2C marketing are virtual representations of customers that simulate their behavior, preferences, and interactions. Vector search can play a crucial role in creating and utilizing these digital twins by enabling more accurate and efficient data processing and analysis. Here’s how:
- Customer Profiling and Segmentation:
- Vector Embeddings: Customer data (e.g., purchase history, browsing behavior, demographics) is converted into vector embeddings that capture the multi-dimensional aspects of each customer.
- Clustering and Segmentation: Vector search can group similar customers together by finding clusters in the embedding space, enabling more precise and meaningful customer segmentation.
- Example: A retail company can segment customers based on their purchasing patterns and preferences, allowing for targeted marketing campaigns.
- Personalized Recommendations:
- Similarity Search: Vector search can find products, services, or content that are similar to what a customer (or their digital twin) has shown interest in.
- Enhanced User Experience: By continuously updating the digital twin with real-time data, businesses can offer highly personalized recommendations that evolve with the customer’s changing preferences.
- Example: An e-commerce platform uses vector search to recommend products similar to those a customer has previously viewed or purchased.
- Predictive Analytics and Behavioral Insights:
- Predicting Future Behavior: By analyzing the vectors representing a customer’s past behavior, vector search can help predict future actions and preferences.
- Behavioral Analysis: Vector search allows for deep analysis of customer behavior patterns, helping businesses understand and anticipate customer needs.
- Example: A streaming service uses vector embeddings to predict what genres or types of content a customer is likely to enjoy next, based on their viewing history.
- Targeted Advertising:
- Ad Personalization: Vector search enables the creation of highly targeted and personalized advertisements by matching customer vectors with relevant ad content vectors.
- Improved ROI: More relevant and personalized ads result in higher engagement rates and better return on investment (ROI) for marketing campaigns.
- Example: A social media platform uses vector search to serve ads that are closely aligned with the interests and behaviors of individual users.
- Customer Journey Mapping:
- Mapping Interactions: Vector embeddings can represent various touchpoints in a customer’s journey, helping to map and understand the entire customer experience.
- Optimizing Engagement: By analyzing these vectors, businesses can identify key moments and interactions that drive customer satisfaction and loyalty.
- Example: A travel company uses vector search to understand and optimize the customer journey from browsing for trips to post-travel feedback, enhancing the overall customer experience.
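The clustering and segmentation idea can be illustrated with plain k-means over customer vectors. This is a sketch under assumptions: the 2-D customer embeddings are invented, the starting centroids are supplied by hand, and a production system would use a library implementation on real embeddings.

```python
import math


def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, centroids, iterations=10):
    """Plain k-means with caller-supplied starting centroids."""
    centroids = [list(c) for c in centroids]
    assignments = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda i: distance(p, centroids[i]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for i in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == i]
            if members:
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids


# Hypothetical customer embeddings: the two axes might loosely reflect
# interest in electronics versus fashion.
customers = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
segments, centers = kmeans(customers, centroids=[customers[0], customers[2]])
```

The first two customers end up in one segment and the last two in another, and each segment's centroid can itself be used as a query vector to find matching products or ads.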
7
Limitations and Future
I asked both ChatGPT and Claude to discuss the limitations of vector search and how to address them.
ChatGPT
- High Computational Complexity
Limitation: Calculating distances between high-dimensional vectors is computationally expensive, especially for large datasets.
Addressing the Limitation:
- Approximate Nearest Neighbor (ANN) Algorithms: Use ANN algorithms like Locality-Sensitive Hashing (LSH), KD-Trees, or Hierarchical Navigable Small World (HNSW) to speed up searches.
- Dimensionality Reduction: Apply techniques like Principal Component Analysis (PCA) or t-Distributed Stochastic Neighbor Embedding (t-SNE) to reduce the dimensionality of vectors while preserving their structure.
- Efficient Indexing: Implement specialized indexing structures, such as FAISS (Facebook AI Similarity Search), that are optimized for vector search.
- Scalability Issues
Limitation: As the number of data points increases, the storage and retrieval operations become more challenging.
Addressing the Limitation:
- Distributed Systems: Use distributed computing frameworks like Apache Spark or distributed databases like Elasticsearch to handle large-scale data.
- Cloud Solutions: Leverage cloud services that offer scalable vector search solutions, such as AWS SageMaker or Google Vertex AI.
- Sharding: Divide the dataset into smaller, manageable shards and distribute the search load across multiple servers.
- Interpretability
Limitation: The high-dimensional nature of vector embeddings can make it difficult to interpret the results and understand why certain items are considered similar.
Addressing the Limitation:
- Visualization Tools: Use visualization tools like t-SNE plots or UMAP (Uniform Manifold Approximation and Projection) to project high-dimensional data into 2D or 3D space, making it easier to understand.
- Explainability Techniques: Implement techniques such as attention mechanisms or SHAP (SHapley Additive exPlanations) values to provide insights into why certain vectors are similar.
- Quality of Embeddings
Limitation: The effectiveness of vector search is highly dependent on the quality of the embeddings. Poor embeddings can lead to irrelevant or inaccurate search results.
Addressing the Limitation:
- Pre-trained Models: Utilize high-quality pre-trained models (e.g., BERT for text, ResNet for images) that have been trained on large, diverse datasets.
- Fine-Tuning: Fine-tune pre-trained models on domain-specific data to improve the relevance of the embeddings.
- Regular Updates: Continuously update and refine the embedding models to ensure they adapt to new data and changing patterns.
- Handling Diverse Data Types
Limitation: Different data types (text, images, audio) require different techniques for creating embeddings, which can complicate the implementation.
Addressing the Limitation:
- Unified Frameworks: Use frameworks that support multiple data types and provide seamless integration, such as TensorFlow or PyTorch.
- Custom Pipelines: Develop custom pipelines that preprocess and embed different data types consistently, ensuring that the embeddings are comparable.
- Cross-Modal Embeddings: Research and implement cross-modal embeddings that allow for the integration of multiple data types into a single embedding space.
While vector search offers significant advantages for various applications, it does come with challenges related to computational complexity, scalability, interpretability, embedding quality, and handling diverse data types. By employing strategies such as ANN algorithms, dimensionality reduction, distributed systems, visualization tools, fine-tuning, and using unified frameworks, these limitations can be effectively addressed to harness the full potential of vector search.
Claude
While vector search offers numerous benefits, it also has some limitations. Here are the main limitations and how they can be addressed:
Computational Cost and Scalability
- Limitation: Vector search can be computationally expensive, especially for high-dimensional vectors and large datasets. Nearest neighbor search becomes slower as the dataset grows.
- Addressing the limitation:
- Use approximate nearest neighbor (ANN) algorithms, such as locality-sensitive hashing (LSH) or hierarchical navigable small world (HNSW), to trade off some accuracy for faster search speeds.
- Employ dimensionality reduction techniques, like principal component analysis (PCA), to reduce the vector dimensionality while preserving important information; t-SNE, often mentioned alongside PCA, is better kept for visualization.
- Utilize distributed computing frameworks and parallel processing to scale vector search across multiple machines or clusters.
Cold-Start Problem
- Limitation: Vector search relies on having sufficient data to learn meaningful vector representations. For new or rarely occurring items, there may be limited data available, leading to suboptimal results.
- Addressing the limitation:
- Implement hybrid approaches that combine vector search with other techniques, such as collaborative filtering or content-based filtering, to handle cold-start scenarios.
- Utilize transfer learning techniques to leverage knowledge from related domains or pre-trained models to initialize vector representations for new items.
- Employ active learning strategies to selectively acquire additional data or user feedback for new items to improve their vector representations over time.
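One very simple way to initialize a vector for a brand-new item, in the spirit of the content-based fallback above, is to borrow the average vector of existing items that share some metadata with it. The sketch below assumes each catalog entry carries a category label; the function names and the fallback argument are hypothetical.

```python
def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cold_start_vector(new_item_category, catalog, fallback):
    """Initial vector for an item with no interaction data.

    catalog is a list of (category, vector) pairs for existing items.
    If no existing item shares the category, return the fallback vector.
    """
    peers = [vec for cat, vec in catalog if cat == new_item_category]
    return mean_vector(peers) if peers else fallback
```

The borrowed vector is only a starting point; it would be replaced once the item has gathered enough data of its own.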
Interpretability and Explainability
- Limitation: Vector representations can be abstract and difficult to interpret, making it challenging to explain why certain results are retrieved or how the search algorithm arrived at a particular ranking.
- Addressing the limitation:
- Use techniques like vector visualization or dimensionality reduction to provide visual explanations of the vector space and the relationships between items.
- Employ post-hoc explanation methods, such as local interpretable model-agnostic explanations (LIME) or Shapley values, to provide instance-level explanations for search results.
- Combine vector search with rule-based or knowledge-based approaches to incorporate explicit domain knowledge and improve interpretability.
Semantic Drift and Concept Shift
- Limitation: Vector representations learned from historical data may become outdated or fail to capture evolving semantic meanings or shifting concepts over time.
- Addressing the limitation:
- Regularly update and retrain vector models with new data to capture the latest semantic trends and user preferences.
- Employ incremental learning techniques to adapt vector representations in real time based on user feedback and interactions.
- Utilize temporal models or time-aware vector representations to capture the temporal dynamics and evolving nature of data.
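Deciding when to retrain needs some signal that the data has moved. One rough heuristic, shown here as a sketch and not as an established standard, is to compare the centroid of a reference batch of embeddings with the centroid of a recent batch. It assumes both batches were produced by the same embedding model; the function names are illustrative, and what counts as "too much" drift is an application-specific choice.

```python
import math


def centroid(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def drift_score(reference_vectors, recent_vectors):
    """Cosine distance between batch centroids: 0 means no shift in direction."""
    return 1.0 - cosine(centroid(reference_vectors), centroid(recent_vectors))
```

A centroid comparison only catches shifts in the average direction of the data, so it complements, and does not replace, the relevance evaluation discussed in the next section.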
Evaluation and Validation
- Limitation: Evaluating the quality and effectiveness of vector search can be challenging, as relevance judgments may be subjective and context-dependent.
- Addressing the limitation:
- Establish well-defined evaluation metrics and benchmarks specific to the application domain and user requirements.
- Conduct user studies and gather feedback to assess the perceived relevance and user satisfaction with search results.
- Employ A/B testing and online evaluation techniques to compare different vector search approaches and configurations in real-world settings.
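Two of the most common offline metrics are easy to write down. The sketch below, with illustrative function names, shows recall@k for checking how many of the exact nearest neighbors an approximate index recovers, and precision@k for checking results against human relevance judgments.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the exact top-k that also appears in the approximate top-k."""
    truth = set(exact_ids[:k])
    if not truth:
        return 0.0
    return len(truth & set(approx_ids[:k])) / len(truth)


def precision_at_k(result_ids, relevant_ids, k):
    """Fraction of the top-k results that were judged relevant."""
    top = result_ids[:k]
    if not top:
        return 0.0
    return sum(1 for r in top if r in relevant_ids) / len(top)
```

Recall@k measures the accuracy given up by an ANN index, while precision@k depends on the relevance labels, which is where the subjectivity noted above comes in.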
By addressing these limitations through a combination of algorithmic improvements, hybrid approaches, interpretability techniques, regular updates, and robust evaluation methodologies, the effectiveness and applicability of vector search can be enhanced across various domains and use cases.
8
Future
A possible future is hybrid search. From Claude: “Hybrid search, which combines keyword-based and vector-based approaches, is likely to play a significant role in the future of information retrieval and search systems. By leveraging the strengths of both approaches, hybrid search aims to deliver more relevant and meaningful results to users. It offers flexibility in handling different types of queries and can adapt to various search scenarios. Hybrid search can be designed to scale efficiently for large datasets and real-time requirements, incorporating advanced indexing techniques and caching mechanisms. Moreover, it can benefit from ongoing advancements in natural language processing, machine learning, and deep learning to continuously improve its understanding of language and retrieval accuracy. While the specific implementation may vary depending on the domain and use case, hybrid search offers a powerful and flexible solution to address the limitations of individual approaches and deliver more effective and user-centric search experiences in the future.”
From JP Morgan: “(1) Vector databases are seen as a key part of the Gen AI stack. RAG is one of the most popular use cases currently for vector search, with many players predicting that rather than fine-tuning off the shelf LLMs with company data, RAG will be used to provide context to the LLM instead as a cost-effective, more accurate, and compliant solution. (2) Hybrid Search is a Growing Trend. There is consensus that the market will head towards hybrid search, combining vector and lexical search under one roof…Hybrid search players will likely win out over pure-plays as users will only have to maintain a single infrastructure for both search features.”
Jo Kristian Bergum: “A successful search implementation uses hybrid retrieval techniques, combining the best of both types of representations; sparse and dense vectors. The hybrid model is demonstrably better than the sum of its parts, especially when applied to new domains without lots of interaction data to train vector embedding models that map data to vectors… The critical observation is that search implementations must support exact matches and richer ranking than vector similarity alone. Given this, I believe integrating excellent dense vector search capabilities into feature-rich search engine technologies is the right direction.”
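None of the quotes above says how the keyword and vector results should be combined. One widely used option is reciprocal rank fusion (RRF), sketched below under the assumption that each retriever returns an ordered list of document ids. The function name is illustrative, and k=60 is the constant commonly used with RRF, not a tuned value.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document ids into one ranking.

    Each document scores 1 / (k + rank) in every list it appears in,
    so items ranked well by both retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


keyword_hits = ["d1", "d2", "d3"]  # e.g. from an inverted-index search
vector_hits = ["d3", "d1", "d4"]   # e.g. from a nearest-neighbor search
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
```

RRF uses only rank positions, so it avoids having to put keyword scores and vector similarities on a common scale.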
**
I asked ChatGPT and Claude to write conclusions for this essay.
ChatGPT: “Vector search represents a significant advancement in information retrieval, offering more flexible and semantically meaningful searches compared to traditional keyword-based methods. Its ability to handle diverse data types, understand context, and provide personalised results makes it a crucial component in modern AI applications. However, challenges such as computational complexity, scalability, and interpretability must be addressed through continuous research and innovation. As hybrid search systems, which combine the strengths of both vector and traditional search, gain traction, the future of information retrieval looks poised for further enhancements in efficiency and accuracy, ultimately leading to improved user experiences across various domains.”
Claude: “In conclusion, vector search has emerged as a transformative technology that is reshaping the landscape of information retrieval and powering the advancements in AI. By enabling efficient and semantically meaningful search through high-dimensional vector representations, vector search has unlocked new possibilities for various business applications, from personalized recommendations to fraud detection. As the demand for intelligent and context-aware systems grows, the future lies in hybrid search approaches that combine the strengths of both keyword-based and vector-based methods. With ongoing research and innovation, vector search is poised to become an indispensable tool in the arsenal of businesses striving to deliver exceptional user experiences and drive success in the era of AI.”
**
As I did the research for this essay, I was reminded of the 1990s when I was building IndiaWorld. I was fascinated by search, information retrieval, and auto-summarisation. It was in the context of Khoj.com (a snapshot from late 1997), which was India’s first search engine (modeled on Yahoo). Of course, it was all too early. The past 25+ years have seen technology come a long way. I relied on ChatGPT and Claude (along with Google Search) to write this essay. Together, they highlight the best of keyword, vector, and hybrid search.
Going forward, I can visualise immense applications of vector search in Agentic AI-powered martech: bringing to life the new world of Large Customer Models, Digital Twins, Co-Marketer, Mirror World, and Generative Journeys, and enabling N=1 hyper-personalisation to maximise every customer’s lifetime value in the Profipoly Quest. More on this in coming essays.