Inverted Index
Definition
An inverted index is a data structure that maps content (such as keywords or terms) to their locations within a dataset, enabling fast lookup and filtering operations. In the context of Qdrant, the inverted index is used to optimize filtering capabilities by allowing efficient retrieval of vectors based on specific payload conditions, such as filtering by metadata or tags associated with the stored vectors.
This mechanism is particularly useful in hybrid search scenarios where sparse (keyword-based) filtering is combined with dense (vector-based) similarity searches.
Example in Qdrant
Imagine you are building a recommendation engine for an e-commerce platform. Each vector represents a product, and payloads (metadata) include fields such as category
, price
, and brand
.
Creating a Collection with an Inverted Index
POST /collections/products
{
"vectors": {
"size": 128,
"distance": "Cosine"
},
"payload_schema": {
"category": {
"type": "keyword",
"index": true
},
"price": {
"type": "integer",
"index": true
},
"brand": {
"type": "keyword",
"index": true
}
}
}
In this configuration:
- category
and brand
are indexed as keyword
, allowing filtering by exact matches.
- price
is indexed as integer
, enabling range queries (e.g., products priced between $10 and $50).
Query Example
To retrieve vectors for all products in the electronics
category with a price between $50 and $200:
POST /collections/products/points/search
{
"filter": {
"must": [
{ "key": "category", "match": { "value": "electronics" } },
{ "key": "price", "range": { "gte": 50, "lte": 200 } }
]
},
"vector": [0.1, 0.2, 0.3, ...],
"top": 10
}
Result
The inverted index ensures that the filter step is efficient, significantly reducing the search space before the similarity search is performed.
Why It Matters
An inverted index in Qdrant allows developers to create powerful, real-time search applications that combine metadata filtering and semantic similarity, optimizing both speed and relevance.
Tabular Example
Suppose we have the following dataset of products:
Product ID | Category | Brand | Price |
---|---|---|---|
1 | Electronics | Samsung | 150 |
2 | Electronics | Apple | 200 |
3 | Home Appliances | Samsung | 300 |
4 | Electronics | Sony | 100 |
5 | Furniture | IKEA | 250 |
Based on this data, an inverted index could look like this:
Key | Value |
---|---|
category:electronics |
Product IDs: [1, 2, 4] |
category:home appliances |
Product IDs: [3] |
category:furniture |
Product IDs: [5] |
brand:samsung |
Product IDs: [1, 3] |
brand:apple |
Product IDs: [2] |
brand:sony |
Product IDs: [4] |
brand:ikea |
Product IDs: [5] |
price_range:0-100 |
Product IDs: [] |
price_range:101-200 |
Product IDs: [1, 4] |
price_range:201-300 |
Product IDs: [2, 5] |
price_range:301-400 |
Product IDs: [3] |
Explanation:
- The inverted index maps keys (like
category:electronics
orbrand:samsung
) to a list of Product IDs. - It can also include derived keys, such as
price_range
, which groups prices into ranges.
This structure allows efficient filtering, as you can quickly retrieve all product IDs for a specific category, brand, or price range without scanning the entire dataset.