Commit Graph

11 Commits (eb046863485fdf3e130fc60484485c901b81276b)

Author SHA1 Message Date
Spade A 52c7d7dd80
fix: offset combined with term should be based on Token positions in phrase match (#39931)
fix: #39711

Unlike English sentence where each words are parsed exactly once and one
after one with position length 1, one Chinese word may be parsed to
multiple words with position length larger than 1.

For example, "badminton and skiing" will be parsed to Token{ start: 0,
length: 1, text: "badminton" }, Token{ start: 1, length: 1, text: "and"
}, and Token{ start: 2, length: 1, text: "tennis" }.

While for exmaple for Chinsese: "羽毛球和滑雪" may be parsed to Token{ start:
0, length: 2, text: "羽毛" }, Token{ start: 0, length: 3, text: "羽毛球" },
Token{ start: 3, length: 1, text: "和" }, and Token{ start: 4, length: 2,
text: "滑雪" }.

This PR fix that the code not recognizes this situation.

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-02-18 20:38:51 +08:00
Spade A 032292a432
feat: support phrase match query (#38869)
The relevant issue: https://github.com/milvus-io/milvus/issues/38930

---------

Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
2025-01-12 20:24:58 +08:00
Spade A 8abf6c9149
fix: build text index when loading field data (#39070)
fix: https://github.com/milvus-io/milvus/issues/39053
may fix https://github.com/milvus-io/milvus/issues/38644 which could be
caused by https://github.com/milvus-io/milvus/issues/39053

---------

Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
2025-01-09 15:24:56 +08:00
Zhen Ye b537a72309
fix: interted index out of range (#38577)
issue: #38546, #38486

Signed-off-by: chyezh <chyezh@outlook.com>
2024-12-19 15:20:47 +08:00
zhagnlu e4b6773d0a
fix: fix create text index dir conflict bug (#37693)
#37623

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-11-15 18:26:30 +08:00
smellthemoon 3389a6b500
enhance: support null in text match index (#37517)
#37508

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-11-13 11:08:29 +08:00
aoiasd 12951f0abb
enhance: rename tokenizer to analyzer and check analyzer params (#37478)
relate: https://github.com/milvus-io/milvus/issues/35853

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-11-10 16:12:26 +08:00
aoiasd d67853fa89
feat: Tokenizer support build with params and clone for concurrency (#37048)
relate: https://github.com/milvus-io/milvus/issues/35853
https://github.com/milvus-io/milvus/issues/36751

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-11-06 17:48:24 +08:00
Buqian Zheng f7b811450d
feat: add enable_tokenizer params to VarChar field (#36480)
issue: #35922

add an enable_tokenizer param to varchar field: must be set to true so
that a varchar field can enable_match or used as input of BM25 function

---------

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2024-10-10 20:33:21 +08:00
zhagnlu 489087d18b
enhance: refactor executor framework V2 (#35251)
#32636

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-09-13 20:57:09 +08:00
Jiquan Long 89bf226f0b
feat: support keyword text match (#35923)
fix: #35922

---------

Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2024-09-10 15:11:08 +08:00