Created Searching with ANNOY index (markdown)

2021-10-18 17:26:54 +08:00 · 2021-10-18 17:26:54 +08:00 · 4a8ed18c29
parent 84ed1e2f36
commit 4a8ed18c29
1 changed files with 53 additions and 0 deletions
--- a/Searching-with-ANNOY-index.md
+++ b/Searching-with-ANNOY-index.md
@ -0,0 +1,53 @@
+### ANNOY
+
+<a name="ANNOY"></a>
+
+ANNOY (Approximate Nearest Neighbors Oh Yeah) is an index that uses a hyperplane to divide a high-dimensional space into multiple subspaces, and then stores them in a tree structure.
+
+When searching for vectors, ANNOY follows the tree structure to find subspaces closer to the target vector, and then compares all the vectors in these subspaces (The number of vectors being compared should not be less than `search_k`) to obtain the final result. Obviously, when the target vector is close to the edge of a certain subspace, sometimes it is necessary to greatly increase the number of searched subspaces to obtain a high recall rate. Therefore, ANNOY uses `n_trees` different methods to divide the whole space, and searches all the dividing methods simultaneously to reduce the probability that the target vector is always at the edge of the subspace.
+
+
+- Index building parameters
+
+  | Parameter | Description                              | Range     |
+  | --------- | ---------------------------------------- | --------- |
+  | `n_trees` | The number of methods of space division. | [1, 1024] |
+
+- Search parameters
+
+  | Parameter  | Description                                                  | Range                           |
+  | ---------- | ------------------------------------------------------------ | ------------------------------- |
+  | `search_k` | The number of nodes to search. -1 means 5% of the whole data. | {-1} ∪ [`top_k`, n × `n_trees`] |
+
+```python
+from pymilvus import connections, FieldSchema, CollectionSchema, DataType, Collection
+# create a collection
+collection_name = "milvus_ANNOY"
+default_fields = [
+    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
+    FieldSchema(name="vector", dtype=DataType.FLOAT_VECTOR, dim=d)
+]default_schema = CollectionSchema(fields=default_fields, description="test collection")print(f"\nCreate collection...")collection = Collection(name= collection_name, schema=default_schema)
+
+# insert data
+mr = collection.insert([xb])print(collection.num_entities)
+
+# create index
+collection.create_index(field_name=field_name,
+                        index_params={'index_type': 'ANNOY',
+                                      'metric_type': 'L2',
+                                      'params': {
+                                        "n_trees": 8      # int. 1~1024
+                                      }})
+collection.load()
+                                     
+# search 
+top_k = 10
+results = collection.search(data=xq, anns_field="vector",params={
+                "search_k": -1    # int. {-1} U [top_k, n*n_trees], n represents vectors count.
+              }, limit=top_k) # show results
+              
+for result in results:
+  print(result.ids)
+  print(result.distance)
+
+```