5 Hybrid Search
C-alcium edited this page 2023-10-10 15:09:26 +01:00

In addition to vectors, Milvus supports scalar data types such as booleans, integers, and floating-point numbers. A collection in Milvus can hold multiple fields that describe different features or properties of the data, which lets Milvus pair scalar filtering with vector similarity search.

A hybrid search is a vector similarity search in which a boolean expression filters the scalar fields, so that only entities satisfying the expression are returned.

For example:

In Python

import random
from pymilvus import connections, Collection, FieldSchema, CollectionSchema, DataType

# Connect to server
connections.connect("default", host='localhost', port='19530')

# Create a collection
collection_name = "test_collection_search"
schema = CollectionSchema([
    FieldSchema("film_id", DataType.INT64, is_primary=True),
    FieldSchema("films", dtype=DataType.FLOAT_VECTOR, dim=2)
])
collection = Collection(collection_name, schema, using='default', shards_num=2)

# Insert some random data
data = [
    [i for i in range(10)],
    [[random.random() for _ in range(2)] for _ in range(10)],
]
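collection.insert(data)
print(collection.num_entities)

# Build an index on the vector field; recent Milvus releases require one before loading
collection.create_index("films", {"index_type": "FLAT", "metric_type": "L2", "params": {}})

# Load collection to memory
collection.load()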

# Conduct a similarity search with an expression filtering ID column
search_param = {
    "data": [[1.0, 1.0]],
    "anns_field": "films",
    "param": {"metric_type": "L2"},
    "limit": 2,
    "expr": "film_id in [2,4,6,8]"
}
res = collection.search(**search_param)

# Check results
hits = res[0]
print(f"- Total hits: {len(hits)}, hits ids: {hits.ids} ")
print(f"- Top1 hit id: {hits[0].id}, distance: {hits[0].distance}, score: {hits[0].score} ")

In Node.js

import { MilvusClient, DataType } from "@zilliz/milvus2-sdk-node";
const milvusClient = new MilvusClient("localhost:19530");

// Prepare a test collection
const COLLECTION_NAME = "test_collection_search";
await milvusClient.collectionManager.createCollection({
  collection_name: COLLECTION_NAME,
  fields: [
    {
      name: "films",
      description: "vector field",
      data_type: DataType.FloatVector,
      type_params: {
        dim: "2",
      },
    },
    {
      name: "film_id",
      data_type: DataType.Int64,
      autoID: false,
      is_primary_key: true,
      description: "",
    },
  ],
});

// Insert some random data
let id = 1;
const entities = Array.from({ length: 10 }, () => ({
  films: Array.from({ length: 2 }, () => Math.random() * 10),
  film_id: id++,
}));

await milvusClient.collectionManager.insert({
  collection_name: COLLECTION_NAME,
  fields_data: entities,
});

// Load collection to memory & conduct a search with boolean expression
await milvusClient.collectionManager.loadCollection({
  collection_name: COLLECTION_NAME,
});
await milvusClient.dataManager.search({
  collection_name: COLLECTION_NAME,
  // partition_names: [],
  expr: "film_id in [1,4,6,8]",
  vectors: [entities[0].films],
  search_params: {
    anns_field: "films",
    topk: "4",
    metric_type: "L2",
    params: JSON.stringify({ nprobe: 10 }),
  },
  vector_type: DataType.FloatVector, // float vector -> 101
});

// The search result will look like:
// {
//   status: { error_code: 'Success', reason: '' },
//   results: [
//     { score: 0, id: '1' },
//     { score: 9.266796112060547, id: '4' },
//     { score: 28.263811111450195, id: '8' },
//     { score: 41.055686950683594, id: '6' }
//   ]
// }
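
Conceptually, both searches above restrict the candidates to the entities that match the expression and then rank the survivors by distance to the query vector. The following plain-Python sketch illustrates that behaviour without a Milvus server. The function name hybrid_search, the toy data, and the use of squared L2 distance for ranking are assumptions made for illustration; the sketch says nothing about how Milvus executes a search internally.

```python
# Conceptual sketch only: brute-force filtering and ranking in plain Python.
# It mirrors the Python example above (film_id filter, top 2 by distance)
# but does not reflect how Milvus executes a search internally.

def hybrid_search(entities, query, allowed_ids, limit):
    """Return (film_id, distance) for the nearest entities whose film_id is allowed."""
    def squared_l2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Scalar filtering: keep only entities that satisfy the predicate.
    candidates = [e for e in entities if e["film_id"] in allowed_ids]
    # Vector similarity: rank the surviving candidates by distance to the query.
    ranked = sorted(candidates, key=lambda e: squared_l2(e["films"], query))
    return [(e["film_id"], squared_l2(e["films"], query)) for e in ranked[:limit]]


# Deterministic toy data: entity i has the vector [i / 10, i / 10].
entities = [{"film_id": i, "films": [i / 10, i / 10]} for i in range(10)]

hits = hybrid_search(entities, query=[1.0, 1.0], allowed_ids=[2, 4, 6, 8], limit=2)
hit_ids = [film_id for film_id, _ in hits]
print(hit_ids)  # [8, 6]
```

Entity 9 is closer to the query than entity 8, but it never appears in the result because it fails the filter.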

Boolean Expression Rules

Milvus conducts scalar filtering by evaluating predicate expressions. A predicate expression, when evaluated, returns either TRUE or FALSE.

The following EBNF grammar describes the boolean expression rules:

Expr = LogicalExpr | NIL;

LogicalExpr = LogicalExpr BinaryLogicalOp LogicalExpr 
              | UnaryLogicalOp LogicalExpr
              | "(" LogicalExpr ")"
              | SingleExpr;

BinaryLogicalOp = "&&" | "and" | "||" | "or";

UnaryLogicalOp = "not";

SingleExpr = TermExpr | CompareExpr;

TermExpr = IDENTIFIER "in" ConstantArray;

Constant = INTEGER | FLOAT;

ConstantExpr = Constant
               | ConstantExpr BinaryArithOp ConstantExpr
               | UnaryArithOp ConstantExpr;

ConstantArray = "[" ConstantExpr { "," ConstantExpr } "]";

UnaryArithOp = "+" | "-";

BinaryArithOp = "+" | "-" | "*" | "/" | "%" | "**";

CompareExpr = IDENTIFIER CmpOp IDENTIFIER
              | IDENTIFIER CmpOp ConstantExpr
              | ConstantExpr CmpOp IDENTIFIER
              | ConstantExpr CmpOpRestricted IDENTIFIER CmpOpRestricted ConstantExpr;

CmpOpRestricted = "<" | "<=";

CmpOp = ">" | ">=" | "<" | "<=" | "==" | "!=";

The following table describes each symbol used in the boolean expression rules above:

Notation Description
= Definition.
, Concatenation.
; Termination.
| Alternation.
{...} Repetition.
(...) Grouping.
NIL Empty. The expression can be an empty string.
INTEGER Integers such as 1, 2, 3.
FLOAT Float numbers such as 1.0, 2.0.
Constant Integers or float numbers.
IDENTIFIER Identifier. In Milvus, the IDENTIFIER represents the field name.
LogicalOp A LogicalOp is a logical operator that supports combining more than one relational operation in one comparison. Returned value of a LogicalOp is either TRUE (1) or FALSE (0). There are two types of LogicalOps, including BinaryLogicalOps and UnaryLogicalOps.
UnaryLogicalOp UnaryLogicalOp refers to the unary logical operator "not".
BinaryLogicalOp Binary logical operators that perform actions on two operands. In a complex expression with two or more operands, the order of evaluation depends on precedence rules.
ArithmeticOp An ArithmeticOp, namely an arithmetic operator, performs mathematical operations such as addition and subtraction on operands.
UnaryArithOp A UnaryArithOp is an arithmetic operator that performs an operation on a single operand. The negative UnaryArithOp changes a positive expression into a negative one, or the other way round.
BinaryArithOp A BinaryArithOp, namely a binary operator, performs operations on two operands. In a complex expression with two or more operands, the order of evaluation depends on precedence rules.
CmpOp CmpOp is a relational operator that performs actions on two operands.
CmpOpRestricted CmpOpRestricted is restricted to "Less than" and "Less than or equal".
ConstantExpr ConstantExpr can be a Constant, a BinaryArithOp on two ConstantExprs, or a UnaryArithOp on a single ConstantExpr. It is defined recursively.
ConstantArray ConstantArray is wrapped by square brackets, and ConstantExpr can be repeated in the square brackets. ConstantArray must include at least one ConstantExpr.
TermExpr TermExpr is used to check whether the value of an IDENTIFIER appears in a ConstantArray. TermExpr is represented by "in".
CompareExpr A CompareExpr, namely a comparison expression, can be a relational operation on two IDENTIFIERs, a relational operation on one IDENTIFIER and one ConstantExpr, or a ternary operation on two ConstantExprs and one IDENTIFIER.
SingleExpr SingleExpr, namely single expression, can be either a TermExpr or a CompareExpr.
LogicalExpr A LogicalExpr can be a BinaryLogicalOp on two LogicalExprs, or a UnaryLogicalOp on a single LogicalExpr, or a LogicalExpr grouped within parentheses, or a SingleExpr. The LogicalExpr is defined recursively.
Expr Expr, an abbreviation meaning expression, can be LogicalExpr or NIL.
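
As a rough illustration of the rules above, expression strings can be assembled programmatically. The helpers term_expr, range_expr, and all_of below are hypothetical and are not part of pymilvus; they only build strings that follow the TermExpr, ternary CompareExpr, and parenthesized LogicalExpr forms of the grammar.

```python
# Hypothetical helpers (not part of pymilvus) that assemble expression strings
# following the TermExpr and CompareExpr rules described above.

def term_expr(field, values):
    """Build `IDENTIFIER in ConstantArray`; the array needs at least one value."""
    if not values:
        raise ValueError("ConstantArray must include at least one ConstantExpr")
    return f"{field} in [{', '.join(str(v) for v in values)}]"


def range_expr(field, low, high):
    """Build the ternary form `ConstantExpr < IDENTIFIER <= ConstantExpr`."""
    return f"{low} < {field} <= {high}"


def all_of(*exprs):
    """Join sub-expressions with `and`, grouping each one in parentheses."""
    return " and ".join(f"({e})" for e in exprs)


expr = all_of(term_expr("film_id", [2, 4, 6, 8]), range_expr("film_id", 1, 6))
print(expr)  # (film_id in [2, 4, 6, 8]) and (1 < film_id <= 6)
```

A string built this way could be passed as the "expr" value of the search parameters in the Python example.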

Operators

Logical operators:

Logical operators perform a comparison between two expressions.

Symbol      Operation  Example          Description
'and' &&    and        expr1 && expr2   True if both expr1 and expr2 are true.
'or' ||     or         expr1 || expr2   True if either expr1 or expr2 is true.

Binary arithmetic operators:

Binary arithmetic operators contain two operands and can perform basic arithmetic operations and return the corresponding result.

Symbol  Operation       Example  Description
+       Addition        a + b    Add the two operands.
-       Subtraction     a - b    Subtract the second operand from the first operand.
*       Multiplication  a * b    Multiply the two operands.
/       Division        a / b    Divide the first operand by the second operand.
**      Power           a ** b   Raise the first operand to the power of the second operand.
%       Modulo          a % b    Divide the first operand by the second operand and yield the remainder portion.

Relational operators:

Relational operators use symbols to check for equality, inequality, or relative order between two expressions.

Symbol  Operation              Example  Description
<       Less than              a < b    True if a is less than b.
>       Greater than           a > b    True if a is greater than b.
==      Equal                  a == b   True if a is equal to b.
!=      Not equal              a != b   True if a is not equal to b.
<=      Less than or equal     a <= b   True if a is less than or equal to b.
>=      Greater than or equal  a >= b   True if a is greater than or equal to b.

Operator precedence and associativity:

The following table lists the precedence and associativity of operators. Operators are listed top to bottom, in descending precedence.

Precedence  Operator    Description      Associativity
1           + -         UnaryArithOp     Left-to-right
2           not         UnaryLogicalOp   Right-to-left
3           **          BinaryArithOp    Left-to-right
4           * / %       BinaryArithOp    Left-to-right
5           + -         BinaryArithOp    Left-to-right
6           < <= > >=   CmpOp            Left-to-right
7           == !=       CmpOp            Left-to-right
8           && and      BinaryLogicalOp  Left-to-right
9           || or       BinaryLogicalOp  Left-to-right
  • Expressions are normally evaluated from left to right. Complex expressions are evaluated one at a time. The order in which the expressions are evaluated is determined by the precedence of the operators used.
  • If an expression contains two or more operators with the same precedence, the operator to the left is evaluated first.
  • When a lower precedence operation should be processed first, it should be enclosed within parentheses.
  • Parentheses can be nested within expressions. Innermost parenthetical expressions are evaluated first.
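
The effect of precedence and parentheses on a filter can be seen in a small plain-Python sketch. It relies on Python's own comparison, and, and or operators, whose relative precedence agrees with the table above for these particular operators; other operators (for example ** combined with a unary minus) behave differently in Python, so they are left out. This is an illustration only, not Milvus's expression evaluator.

```python
# Uses Python's own `and`, `or`, and comparison operators purely as an
# illustration; for these operators Python's relative precedence agrees with
# the table above. This is not Milvus's expression evaluator.

film_ids = range(10)

# `and` binds tighter than `or`, so this reads as:
#   film_id < 3 or (film_id > 6 and film_id != 1)
without_parens = [i for i in film_ids if i < 3 or i > 6 and i != 1]

# Parentheses force the lower-precedence `or` to be evaluated first.
with_parens = [i for i in film_ids if (i < 3 or i > 6) and i != 1]

print(without_parens)  # [0, 1, 2, 7, 8, 9]
print(with_parens)     # [0, 2, 7, 8, 9]
```

The two filters differ only in grouping, yet the parenthesized form excludes film_id 1.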