Updated Hybrid Search (markdown)

master
Jael Gu 2021-10-18 15:24:13 +08:00
parent 2db812aef5
commit 196b425f28
1 changed files with 135 additions and 1 deletions

@@ -2,7 +2,7 @@
In addition to vectors, Milvus supports data types such as boolean, integers, floating-point numbers, and more. A collection in Milvus can hold multiple fields for accommodating different data features or properties. Milvus is a flexible vector database that pairs scalar filtering with powerful vector similarity search.
A hybrid search is a vector similarity search, during which you can filter the scalar data by specifying a [boolean expression](https://milvus.io/docs/v2.0.0/boolean.md).
A hybrid search is a vector similarity search, during which you can filter the scalar data by specifying a [boolean expression](https://github.com/milvus-io/milvus/wiki/Hybrid-Search#boolean-expression-rules).
For example:
@@ -111,3 +111,137 @@ await milvusClient.dataManager.search({
// search result will be like:{ status: { error_code: 'Success', reason: '' }, results: [ { score: 0, id: '1' }, { score: 9.266796112060547, id: '4' }, { score: 28.263811111450195, id: '8' }, { score: 41.055686950683594, id: '6' } ]}
```
## Boolean Expression Rules
Milvus conducts scalar filtering by searching with predicates. A predicate expression, when evaluated, returns either TRUE or FALSE.
The following [EBNF](https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form) grammar describes the boolean expression rules:
```
Expr = LogicalExpr | NIL;
LogicalExpr = LogicalExpr BinaryLogicalOp LogicalExpr
| UnaryLogicalOp LogicalExpr
| "(" LogicalExpr ")"
| SingleExpr;
BinaryLogicalOp = "&&" | "and" | "||" | "or";
UnaryLogicalOp = "not";
SingleExpr = TermExpr | CompareExpr;
TermExpr = IDENTIFIER "in" ConstantArray;
Constant = INTEGER | FLOAT;
ConstantExpr = Constant
| ConstantExpr BinaryArithOp ConstantExpr
| UnaryArithOp ConstantExpr;
ConstantArray = "[" ConstantExpr { "," ConstantExpr } "]";
UnaryArithOp = "+" | "-";
BinaryArithOp = "+" | "-" | "*" | "/" | "%" | "**";
CompareExpr = IDENTIFIER CmpOp IDENTIFIER
| IDENTIFIER CmpOp ConstantExpr
| ConstantExpr CmpOp IDENTIFIER
| ConstantExpr CmpOpRestricted IDENTIFIER CmpOpRestricted ConstantExpr;
CmpOpRestricted = "<" | "<=";
CmpOp = ">" | ">=" | "<" | "<=" | "==" | "!=";
```
The following table describes each symbol used in the boolean expression rules above:
| Notation | Description |
| ----------- | ----------- |
| = | Definition. |
| , | Concatenation. |
| ; | Termination. |
| \| | Alternation. |
| {...} | Repetition. |
| (...) | Grouping. |
| NIL | Empty. The expression can be an empty string. |
| INTEGER | Integers such as 1, 2, 3. |
| FLOAT | Floating-point numbers such as 1.0, 2.0. |
| Constant | An INTEGER or a FLOAT. |
| IDENTIFIER | Identifier. In Milvus, the IDENTIFIER represents the field name. |
| LogicalOp | A LogicalOp is a logical operator that combines more than one relational operation in one comparison. A LogicalOp returns either TRUE (1) or FALSE (0). There are two types of LogicalOps: BinaryLogicalOps and UnaryLogicalOps. |
| UnaryLogicalOp | UnaryLogicalOp refers to the unary logical operator "not". |
| BinaryLogicalOp | BinaryLogicalOps are binary logical operators that act on two operands. In a complex expression with two or more operands, the order of evaluation depends on precedence rules. |
| ArithmeticOp | An ArithmeticOp, namely an arithmetic operator, performs mathematical operations such as addition and subtraction on operands. |
| UnaryArithOp | A UnaryArithOp is an arithmetic operator that performs an operation on a single operand. The negative UnaryArithOp changes a positive expression into a negative one, or the other way round. |
| BinaryArithOp | A BinaryArithOp, namely a binary arithmetic operator, performs operations on two operands. In a complex expression with two or more operands, the order of evaluation depends on precedence rules. |
| CmpOp | CmpOp is a relational operator that acts on two operands. |
| CmpOpRestricted | CmpOpRestricted is restricted to "Less than" (`<`) and "Less than or equal" (`<=`). |
| ConstantExpr | ConstantExpr can be a Constant, a BinaryArithOp on two ConstantExprs, or a UnaryArithOp on a single ConstantExpr. It is defined recursively. |
| ConstantArray | ConstantArray is wrapped in square brackets, and ConstantExpr can be repeated inside the square brackets. ConstantArray must include at least one ConstantExpr. |
| TermExpr | TermExpr is used to check whether the value of an IDENTIFIER appears in a ConstantArray. TermExpr is represented by "in". |
| CompareExpr | A CompareExpr, namely a comparison expression, can be a relational operation on two IDENTIFIERs, a relational operation on one IDENTIFIER and one ConstantExpr, or a ternary operation on two ConstantExprs and one IDENTIFIER. |
| SingleExpr | SingleExpr, namely single expression, can be either a TermExpr or a CompareExpr. |
| LogicalExpr | A LogicalExpr can be a BinaryLogicalOp on two LogicalExprs, or a UnaryLogicalOp on a single LogicalExpr, or a LogicalExpr grouped within parentheses, or a SingleExpr. The LogicalExpr is defined recursively. |
| Expr | Expr, an abbreviation meaning expression, can be LogicalExpr or NIL. |
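
The TermExpr and CompareExpr rules can be illustrated with a short JavaScript sketch that builds expression strings matching the grammar. The helpers `formatConstant`, `termExpr`, and `compareExpr` are hypothetical (they are not part of any Milvus SDK), and the field names `book_id` and `word_count` are made up for illustration.

```javascript
// Hypothetical helpers, not part of any Milvus SDK: they only build strings.

// Constant = INTEGER | FLOAT, so only finite numbers are accepted.
// Note: String(1.0) yields "1", and very large numbers use exponent notation.
function formatConstant(value) {
  if (typeof value !== "number" || !Number.isFinite(value)) {
    throw new TypeError(`not a numeric constant: ${String(value)}`);
  }
  return String(value);
}

// TermExpr = IDENTIFIER "in" ConstantArray; the array needs at least one element.
function termExpr(field, values) {
  if (!Array.isArray(values) || values.length === 0) {
    throw new RangeError("ConstantArray must contain at least one constant");
  }
  return `${field} in [${values.map(formatConstant).join(", ")}]`;
}

// CompareExpr in its "IDENTIFIER CmpOp ConstantExpr" form.
const CMP_OPS = [">", ">=", "<", "<=", "==", "!="];
function compareExpr(field, op, constant) {
  if (!CMP_OPS.includes(op)) {
    throw new RangeError(`unknown CmpOp: ${op}`);
  }
  return `${field} ${op} ${formatConstant(constant)}`;
}

console.log(termExpr("book_id", [1, 2, 3]));         // book_id in [1, 2, 3]
console.log(compareExpr("word_count", ">=", 11000)); // word_count >= 11000
```

The resulting strings are the kind of value that would be passed as the boolean expression of a search request.
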
### Operators
#### Logical operators:
Logical operators combine two expressions and return TRUE or FALSE.
| Symbol| Operation | Example | Description |
| ----------| ------------- | ----------- | ------------------------- |
| 'and' && | and | expr1 && expr2 | True if both expr1 and expr2 are true. |
| 'or' \|\| | or | expr1 \|\| expr2 | True if either expr1 or expr2 is true. |
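
As a minimal sketch of how logical operators compose, the hypothetical helpers below (`and`, `or`, and `not` are illustrative names, not SDK functions) join sub-expressions and wrap each one in parentheses, so the result does not depend on operator precedence. The field names are assumptions.

```javascript
// Hypothetical combinators: each operand is parenthesized so that the
// grouping is explicit regardless of operator precedence.
const and = (...exprs) => exprs.map((e) => `(${e})`).join(" && ");
const or = (...exprs) => exprs.map((e) => `(${e})`).join(" || ");
const not = (expr) => `not (${expr})`;

const expr = and(
  "word_count > 100",
  or("book_id in [1, 2]", not("price < 10"))
);

console.log(expr);
// (word_count > 100) && ((book_id in [1, 2]) || (not (price < 10)))
```

The extra parentheses are redundant in many cases, but they keep generated expressions unambiguous.
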
#### Binary arithmetic operators:
Binary arithmetic operators take two operands, perform a basic arithmetic operation, and return the result.
| Symbol| Operation | Example | Description |
| ----------| ------------- | ----------- | ------------------------- |
| + | Addition | a + b | Add the two operands. |
| - | Subtraction | a - b | Subtract the second operand from the first operand. |
| * | Multiplication| a * b | Multiply the two operands. |
| / | Division | a / b | Divide the first operand by the second operand. |
| ** | Power | a ** b | Raise the first operand to the power of the second operand. |
| % | Modulo | a % b | Divide the first operand by the second operand and yield the remainder. |
#### Relational operators:
Relational operators use symbols to check for equality, inequality, or relative order between two expressions.
| Symbol| Operation | Example | Description |
| ----------| ------------- | ----------- | ------------------------- |
| < | Less than | a < b | True if a is less than b. |
| > | Greater than | a > b | True if a is greater than b. |
| == | Equal | a == b | True if a is equal to b. |
| != | Not equal | a != b | True if a is not equal to b. |
| <= | Less than or equal | a <= b | True if a is less than or equal to b. |
| >= | Greater than or equal | a >= b | True if a is greater than or equal to b. |
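
Besides the two-operand comparisons in this table, the grammar allows a ternary range form, `ConstantExpr CmpOpRestricted IDENTIFIER CmpOpRestricted ConstantExpr`, in which only `<` and `<=` may appear. The sketch below builds such a range expression; `rangeExpr` and its options are hypothetical names, and `word_count` is an assumed field.

```javascript
// Hypothetical helper that builds the ternary range form of CompareExpr.
// Only "<" and "<=" are used, matching CmpOpRestricted.
function rangeExpr(field, low, high, { includeLow = true, includeHigh = false } = {}) {
  if (![low, high].every(Number.isFinite)) {
    throw new TypeError("bounds must be finite numbers");
  }
  if (low > high) {
    throw new RangeError("low bound must not exceed high bound");
  }
  const lowOp = includeLow ? "<=" : "<";
  const highOp = includeHigh ? "<=" : "<";
  return `${low} ${lowOp} ${field} ${highOp} ${high}`;
}

console.log(rangeExpr("word_count", 1000, 5000));
// 1000 <= word_count < 5000
console.log(rangeExpr("word_count", 1000, 5000, { includeLow: false, includeHigh: true }));
// 1000 < word_count <= 5000
```
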
#### Operator precedence and associativity:
The following table lists the precedence and associativity of operators. Operators are listed top to bottom, in descending precedence.
| Precedence | Operator | Description | Associativity |
|------------|-----------|---------------|---------------|
| 1 | + - | UnaryArithOp | Left-to-right |
| 2 | not | UnaryLogicalOp | Right-to-left |
| 3 | ** | BinaryArithOp | Left-to-right |
| 4 | * / % | BinaryArithOp | Left-to-right |
| 5 | + - | BinaryArithOp | Left-to-right |
| 6 | < <= > >= | CmpOp | Left-to-right |
| 7 | == != | CmpOp | Left-to-right |
| 8 | && and | BinaryLogicalOp | Left-to-right |
| 9 | \|\| or | BinaryLogicalOp | Left-to-right |
- Expressions are normally evaluated from left to right. In a complex expression, sub-expressions are evaluated one at a time, in an order determined by the precedence of the operators used.
- If an expression contains two or more operators with the same precedence, the operator to the left is evaluated first.
- When a lower precedence operation should be processed first, it should be enclosed within parentheses.
- Parentheses can be nested within expressions. Innermost parenthetical expressions are evaluated first.
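
The precedence rules can be modeled with a small precedence-climbing evaluator for constant arithmetic. This is only a sketch that follows the table as written (arithmetic levels 1, 3, 4, and 5, all binary operators left-to-right); it is not Milvus's parser, and `BINARY_PRECEDENCE`, `tokenize`, and `evaluate` are hypothetical names.

```javascript
// Levels for binary arithmetic operators, copied from the precedence table
// (a smaller number binds more tightly). Unary + and - are level 1.
const BINARY_PRECEDENCE = { "**": 3, "*": 4, "/": 4, "%": 4, "+": 5, "-": 5 };
const LOWEST_ARITH_LEVEL = 5;

const APPLY = {
  "**": (a, b) => a ** b,
  "*": (a, b) => a * b,
  "/": (a, b) => a / b,
  "%": (a, b) => a % b,
  "+": (a, b) => a + b,
  "-": (a, b) => a - b,
};

function tokenize(text) {
  // Numbers, the two-character "**", single-character operators, parentheses.
  const tokens = text.match(/\d+(?:\.\d+)?|\*\*|[-+*\/%()]/g) || [];
  if (tokens.join("") !== text.replace(/\s+/g, "")) {
    throw new SyntaxError(`unexpected character in: ${text}`);
  }
  return tokens;
}

function evaluate(text) {
  const tokens = tokenize(text);
  let pos = 0;

  // Level 1: unary + and -, numbers, and parenthesized sub-expressions.
  function parseUnary() {
    const token = tokens[pos++];
    if (token === "+") return parseUnary();
    if (token === "-") return -parseUnary();
    if (token === "(") {
      const value = parseBinary(LOWEST_ARITH_LEVEL);
      if (tokens[pos++] !== ")") throw new SyntaxError("expected ')'");
      return value;
    }
    if (token === undefined || !/^\d/.test(token)) {
      throw new SyntaxError(`expected a number, got: ${token}`);
    }
    return Number(token);
  }

  // Consume binary operators whose level is at most maxLevel.
  function parseBinary(maxLevel) {
    let left = parseUnary();
    while (pos < tokens.length) {
      const op = tokens[pos];
      const level = BINARY_PRECEDENCE[op];
      if (level === undefined || level > maxLevel) break;
      pos++;
      // Left-to-right: the right operand may only contain tighter operators.
      const right = parseBinary(level - 1);
      left = APPLY[op](left, right);
    }
    return left;
  }

  const result = parseBinary(LOWEST_ARITH_LEVEL);
  if (pos !== tokens.length) throw new SyntaxError(`unexpected token: ${tokens[pos]}`);
  return result;
}

console.log(evaluate("1 + 2 * 3"));   // 7: "*" binds tighter than "+"
console.log(evaluate("(1 + 2) * 3")); // 9: parentheses are evaluated first
console.log(evaluate("10 - 4 - 3"));  // 3: same precedence, left to right
console.log(evaluate("2 ** 3 ** 2")); // 64: left to right, as the table states
```

Note that native JavaScript evaluates `2 ** 3 ** 2` right-to-left and yields 512, so the sketch deliberately applies the associativity listed in the table rather than the host language's own rules.
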