Table of Contents
This file contains ambiguous Unicode characters that may be confused with others in your current locale. If your use case is intentional and legitimate, you can safely ignore this warning. Use the Escape button to highlight these characters.
ANNOY
ANNOY (Approximate Nearest Neighbors Oh Yeah) is an index that uses a hyperplane to divide a high-dimensional space into multiple subspaces, and then stores them in a tree structure.
When searching for vectors, ANNOY follows the tree structure to find subspaces closer to the target vector, and then compares all the vectors in these subspaces (The number of vectors being compared should not be less than search_k
) to obtain the final result. Obviously, when the target vector is close to the edge of a certain subspace, sometimes it is necessary to greatly increase the number of searched subspaces to obtain a high recall rate. Therefore, ANNOY uses n_trees
different methods to divide the whole space, and searches all the dividing methods simultaneously to reduce the probability that the target vector is always at the edge of the subspace.
-
Index building parameters
Parameter Description Range n_trees
The number of methods of space division. [1, 1024] -
Search parameters
Parameter Description Range search_k
The number of nodes to search. -1 means 5% of the whole data. {-1} ∪ [ top_k
, n ×n_trees
]
For example:
In python
from pymilvus import connections, FieldSchema, CollectionSchema, DataType, Collection
# create a collection
collection_name = "milvus_ANNOY"
default_fields = [
FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
FieldSchema(name="vector", dtype=DataType.FLOAT_VECTOR, dim=d)
]default_schema = CollectionSchema(fields=default_fields, description="test collection")
print(f"\nCreate collection...")
collection = Collection(name= collection_name, schema=default_schema)
# insert data
import random
vectors = [[random.random() for _ in range(8)] for _ in range(10)]
entities = [vectors]
mr = collection.insert(entities)
print(collection.num_entities)
# create index
collection.create_index(field_name=field_name,
index_params={'index_type': 'ANNOY',
'metric_type': 'L2',
'params': {
"n_trees": 8 # int. 1~1024
}})
collection.load()
# search
top_k = 10
results = collection.search(vectors[:5], anns_field="vector",param={
"search_k": -1 # int. {-1} U [top_k, n*n_trees], n represents vectors count.
}, limit=top_k) # show results
for result in results:
print(result.ids)
print(result.distance)
Tutorial
Advanced Deployment
Deploy Milvus with External Components
Deploy a Milvus Cluster on EC2
Deploy a Milvus Cluster on EKS
Deploy a Milvus Cluster on GCP
Deploy Milvus on Azure with AKS
Upgrade Milvus with Helm Chart