7951 Commits (44819613f9868a7e901e3aa52dd68c10d00ae056)

Author SHA1 Message Date
chyezh be1bd9615a
enhance: add configurable memory index load predict memory usage factor (#30563)
pr: #30561

related pr: #30475

Signed-off-by: chyezh <chyezh@outlook.com>
2024-02-06 22:00:49 +08:00
congqixia 8fec7de472
fix: [Cherry-pick] Proxy restful api doesn't register (#30072) (#30559)
Cherry-pick from master
pr: #30072
issue: #30074
This PR fixes the management RESTful API in proxy not being registered to
the HTTP service

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Co-authored-by: wei liu <wei.liu@zilliz.com>
2024-02-06 16:58:33 +08:00
wayblink b2d3278c56
enhance: Add log when garbage collection resumed (#30536)
/kind enhancement
pr: #30535

Signed-off-by: wayblink <anyang.wang@zilliz.com>
2024-02-05 17:09:53 +08:00
foxspy 88d57f1db9
enhance: Update Knowhere version (#30513)
/kind improvement

Signed-off-by: xianliang <xianliang.li@zilliz.com>
2024-02-04 22:13:07 +08:00
aoiasd cc2bc3f8f2
enhance: [Cherry-Pick] access log should get client info by get method (#30503)
https://github.com/milvus-io/milvus/pull/30502

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-02-04 18:57:07 +08:00
congqixia f2310ab4ce
enhance: [Cherry-pick] Use dynamic pool for `NewLoadIndexInfo` (#30489) (#30497)
Cherry-pick from master
pr: #30489 
See also #30445

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-02-04 16:39:06 +08:00
aoiasd ad4a53d225
enhance: [Cherry-Pick] Fix some access log bugs (#30496)
pr: https://github.com/milvus-io/milvus/pull/30409
https://github.com/milvus-io/milvus/pull/29680

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-02-04 16:37:07 +08:00
cai.zhang 3c5ff624f8
fix: [pick]Only use bound indexnodes in bound mode (#30462)
master pr: #30461 
issue: #30463

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-02-03 21:59:05 +08:00
yah01 655e235230
enhance: calculate the accurate memory usage while loading segment (#30473) (#30475)
The old version of Knowhere copies the index data while loading, so we
need to account for this to avoid OOM.

Knowhere provides a util function that indicates whether it will load the
index with disk. If not, we need to double the memory usage prediction
for index data.

pr: #30473

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-02-03 13:01:12 +08:00
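
The doubling rule described in commit 655e235230 above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the Milvus or Knowhere code: `predictIndexMemory`, `loadsWithDisk` and `inMemoryCopyFactor` are hypothetical names, and the real check is done by a Knowhere util function that is not shown in the commit message.

```go
package main

import "fmt"

// inMemoryCopyFactor is a hypothetical constant: when the index is not
// loaded with disk, the index data is assumed to be copied once during
// loading, so peak usage is estimated at twice the index size.
const inMemoryCopyFactor = 2

// predictIndexMemory estimates the memory (in bytes) needed to load an
// index of indexSize bytes. loadsWithDisk stands in for the answer of the
// Knowhere util function mentioned in the commit message.
func predictIndexMemory(indexSize uint64, loadsWithDisk bool) uint64 {
	if loadsWithDisk {
		// No doubling is applied in this sketch for disk-based loading.
		return indexSize
	}
	return indexSize * inMemoryCopyFactor
}

func main() {
	const oneGiB = uint64(1) << 30
	fmt.Println("in-memory load:", predictIndexMemory(oneGiB, false))
	fmt.Println("disk load:     ", predictIndexMemory(oneGiB, true))
}
```

Keeping the factor in a named constant makes it easy to turn into a configuration item later, which is what the follow-up commit be1bd9615a appears to do.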
yihao.dai 20608287b9
fix: Decoupling importing segment from flush process (#30402) (#30439)
This PR decouples importing segments from the flush process by:
1. Excluding the importing segment from the flush policy. This approach
avoids notifying the datanode to flush the importing segment, which may
not exist.
2. When RootCoord calls Flush, DataCoord directly sets the importing
segment state to `Flushed`.

issue: https://github.com/milvus-io/milvus/issues/30359

pr: https://github.com/milvus-io/milvus/pull/30402

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-02-03 12:59:14 +08:00
yah01 f50799b7fd
fix: proxy may never set up if the port is already bound (#30035) (#30416)
the proxy mistakenly returned nil when it failed to listen on the port, so
the server kept running but the service could not be reached. resolve #30034
pr: #30035

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-02-02 16:21:06 +08:00
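
The failure mode fixed in commit f50799b7fd above (a listen error being swallowed) can be illustrated with a small sketch. `startListener` is a hypothetical helper, not the proxy's actual code; the point is only that the error from `net.Listen` is returned to the caller instead of nil.

```go
package main

import (
	"fmt"
	"net"
)

// startListener is a hypothetical helper. It propagates the error from
// net.Listen so the caller can abort startup instead of continuing to run
// without a reachable service.
func startListener(addr string) (net.Listener, error) {
	lis, err := net.Listen("tcp", addr)
	if err != nil {
		return nil, fmt.Errorf("failed to listen on %s: %w", addr, err)
	}
	return lis, nil
}

func main() {
	// Port 0 asks the OS for any free port.
	first, err := startListener("127.0.0.1:0")
	if err != nil {
		fmt.Println("unexpected error:", err)
		return
	}
	defer first.Close()

	// Binding the same address again is expected to fail; the error must
	// reach the caller rather than being replaced by nil.
	if _, err := startListener(first.Addr().String()); err != nil {
		fmt.Println("second bind rejected:", err)
	}
}
```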
smellthemoon 692dcebac6
enhance: support varchar autoid when bulkinsert (#30377) (#30448)
related pr: #30377

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-02-02 16:11:08 +08:00
congqixia 69a82acc46
enhance: [Cherry-pick] Set delete scope for LoadSegment streaming data (#30245) (#30367)
Cherry pick from master
pr: #30245
See also #29474

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-02-02 16:05:06 +08:00
SimFG 73df0b872e
fix: [2.3] add more requests to the database interceptor (#30453)
issue: https://github.com/milvus-io/milvus/issues/30368
pr: #30452

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-02-02 16:03:06 +08:00
cqy123456 3036c19867
fix: can't get search_cache_budget_gb in create index (#30353)
issue: https://github.com/milvus-io/milvus/issues/30375
pr: https://github.com/milvus-io/milvus/pull/30119

Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
2024-01-31 15:49:03 +08:00
yah01 028721db25
enhance: optimize the loading strategy (#29910) (#30348)
since the pool size is already limited, we don't need to limit the
concurrency manually
pr: #29910

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-01-31 15:25:04 +08:00
chyezh 3e994242d6
fix: panic with datanode negative wait group counter (#30136)
issue: #29170
pr: #30135

Signed-off-by: chyezh <chyezh@outlook.com>
2024-01-30 18:07:03 +08:00
chyezh 21c944beaa
enhance: add basic information of milvus into metrics (#29666)
add basic build information and runtime component dependency into
metrics.

issue: #29664
pr: #29665

Signed-off-by: chyezh <ye.zhen@zilliz.com>
2024-01-29 15:49:04 +08:00
xige-16 9ab2ce0767
enhance: [Cherry-pick] Opt vector dimension mismatch error message (#30316)
Cherry-pick from master
pr: https://github.com/milvus-io/milvus/pull/29928

Signed-off-by: xige-16 <xi.ge@zilliz.com>
2024-01-29 14:47:03 +08:00
chyezh 77e123762f
enhance: add graceful stop timeout to avoid node stop hang under extreme cases (#30320)
1. set the graceful stop timeout of coordinators and proxy to 5s.
2. set the graceful stop timeout of other worker nodes to 900s; we should
potentially change this to 600s once graceful stop is smooth.
3. change the stop order of the datacoord components.
4. `LivenessCheck` does not perform graceful shutdown now.

issue: https://github.com/milvus-io/milvus/issues/30310
pr: #30317
also see: https://github.com/milvus-io/milvus/pull/30306

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2024-01-27 08:45:02 +08:00
yihao.dai e0f987ee9b
enhance: Allows proactive warming up of chunk cache (#30182) (#30289)
Allows proactive warming up of chunk cache. Original vector data will be
asynchronously loaded into the chunk cache during the load process. It
has the potential to significantly reduce query/search latency for a
certain duration after the load, albeit with a concurrent increase in
disk usage.

issue: https://github.com/milvus-io/milvus/issues/30181

pr: https://github.com/milvus-io/milvus/pull/30182

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-01-26 09:57:01 +08:00
Bingyi Sun 2c4d0605ef
enhance: add a weight for growing row count when balancing segments (#30293)
Cherry-pick from master
pr: #30271

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2024-01-26 09:17:03 +08:00
congqixia d182a51653
fix: [Cherry-pick] Use correct pools for all CGO methods in segments pkg (#30275)
Cherry-pick from master
pr: #30274
See also #30273

This PR:
- Rename confusing `LoadIndexInfo` to `UpdateIndexInfo` for LocalSegment
- Use `DynamicPool` instead of `LoadPool` for `UpdateSealedSegmentIndex`
- Fix cgo call missing pool control

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-25 19:49:01 +08:00
congqixia 1a54571c10
enhance: [Cherry-pick] Add trace span for scheduling read tasks in QueryNode (#30266)
Cherry-pick from master
pr: #30265 

This PR adds a trace span for search/query task scheduling duration

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-25 15:39:01 +08:00
congqixia 9e8eb2aa51
fix: Revert leader checker related check (#30262)
See also #30150
PR reverted: #29984 #30152

Currently this scenario cannot be covered by ut/it/e2e test cases.
Revert it for now.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-25 12:39:02 +08:00
congqixia e3114b6a4d
enhance: [2.3] Utilize partition key optimization in reQuery (#30255)
Partial cherry-pick from master due to code branching
pr: #30253 
See also #30250

This PR adds a reQuery flag to the query task. When the reQuery flag is
true, the query task skips partition name conversion and uses the
pre-calculated partitionIDs passed from the search task.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-25 11:05:07 +08:00
SimFG 95cd6f20d0
fix: [2.3] wrong format expr for the delete rest api (#30218)
/kind improvement
issue: https://github.com/milvus-io/milvus/issues/30092
pr: #30217

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-01-24 11:27:05 +08:00
cai.zhang efea282111
feat: [Pick] Support tencent cloud object storage for milvus (#30210)
issue: #30162 
master pr: #30163

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-01-23 16:07:01 +08:00
congqixia 35e4165722
enhance: [2.3] make Load process traceable in querynode & segcore (#30187)
Cherry-pick from master, modified some files since branching
pr: #29858
See also #29803

This PR:
- Add trace span for LoadIndex & LoadFieldData in segment loader
- Add TraceCtx parameter for Index.Load in segcore
- Add span for ReadFiles & Engine Load for Memory/Disk Vector index

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-23 15:58:57 +08:00
yah01 4d0a6dbc25
fix: written file size is over the int32 range and raises error (#30057) (#30207)
we summed the total data size as int32, which could overflow and raise
an error
related: #30056

pr: #30057

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-01-23 13:50:56 +08:00
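
The overflow behind commit 4d0a6dbc25 above can be reproduced with plain arithmetic. The sketch below is not the code touched by the fix; `totalSizeInt32` and `totalSizeInt64` are hypothetical helpers that only show why the accumulator type matters once the total exceeds the int32 range (about 2 GiB).

```go
package main

import "fmt"

// totalSizeInt32 accumulates in int32. In Go, signed integer overflow
// wraps around, so the result silently becomes wrong past 2^31-1.
func totalSizeInt32(sizes []int32) int32 {
	var total int32
	for _, s := range sizes {
		total += s
	}
	return total
}

// totalSizeInt64 widens each element before adding, which keeps the sum
// correct for any realistic file size.
func totalSizeInt64(sizes []int32) int64 {
	var total int64
	for _, s := range sizes {
		total += int64(s)
	}
	return total
}

func main() {
	// Three chunks of 1 GiB each: 3 GiB in total.
	sizes := []int32{1 << 30, 1 << 30, 1 << 30}
	fmt.Println(totalSizeInt32(sizes)) // -1073741824 (wrapped)
	fmt.Println(totalSizeInt64(sizes)) // 3221225472
}
```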
yah01 9bd94c4fab
fix: the system rejects all queries and never recovers if read rate limit is enabled (#30061) (#30196)
fix #30060
pr: #30061

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-01-23 10:37:00 +08:00
yah01 0e71923408
enhance: enable converting segcore error to merr (#29914) (#30178)
this converts the segcore error to merr if possible
pr: #29914

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-01-22 16:56:55 +08:00
yah01 c8a129756f
enhance: filter out the not needed collections while listing (#29690) (#30180)
this improves performance when many collections exist; resolves #29631
pr: #29690

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-01-22 16:52:55 +08:00
MrPresent-Han 6aaccdd5f4
feat: support general capacity restrict for cloud-side resource contro… (#30017)
related: #29844
pr: https://github.com/milvus-io/milvus/pull/29845

Signed-off-by: MrPresent-Han <chun.han@zilliz.com>
2024-01-22 16:18:56 +08:00
SimFG 2465d86138
enhance: [2.3] support related privilege for grant api (#30154)
/kind improvement
pr: #30153

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-01-22 14:42:55 +08:00
yah01 ce318f3286
enhance: make the error of parsing expression to `ParameterInvalid` (#29681) (#29795)
before this, the error was reported as an unexpected error
pr: #29681

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-01-22 13:36:55 +08:00
yihao.dai 917a4d74f3
fix: Use channel cp as the dml&start position for import segments (#30107) (#30133)
This PR discontinues the subscription to the mq and instead employs
the channel checkpoint as the DML and start position for the import
segments.

issue: https://github.com/milvus-io/milvus/issues/30106

pr: https://github.com/milvus-io/milvus/pull/30107

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-01-22 13:32:55 +08:00
yah01 a8d9b0ccba
enhance: optimize the loading index performance (#29894) (#30018)
this utilizes concurrent loading
pr: #29894

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-01-22 13:12:56 +08:00
congqixia bac1a1355b
fix: [Cherry-pick] collection properties not saved for alter collection (#30145) (#30156)
Cherry-pick from master
pr: #30145
Resolves: #30144

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-22 10:08:55 +08:00
yihao.dai b95f0cc0a1
enhance: Add a counter monitoring for the rate-limit requests (#30109) (#30132)
Add a counter monitoring metric for the ratelimited rpc requests with
labels: proxy nodeID, rpc request type, and state.

issue: https://github.com/milvus-io/milvus/issues/30052

pr: https://github.com/milvus-io/milvus/pull/30109

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-01-21 14:44:59 +08:00
PowderLi 3dc2585d9b
enhance: support dataType: array & json (#30077)
issue: #30075 
master pr: #30076

deal with the array<?> field data correctly

Signed-off-by: PowderLi <min.li@zilliz.com>
2024-01-21 14:00:56 +08:00
wei liu b2997eb881
fix: Leader checker can't remove segment from leader view (#30152)
issue: #30150
pr: #30151

This PR fixes three problems:

1. the load request generated by the leader checker doesn't set the load scope
2. the leader checker uses the wrong node ID when generating a release task,
which causes the release task to finish immediately
3. the release request generated by leader_checker doesn't set the force
flag, so the operation to clean the leader view on the delegator fails.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-01-20 18:58:58 +08:00
congqixia 079ddbfc01
enhance: [Cherry-pick] Shuffle candidates before channel assignment (#30066) (#30089)
Cherry-pick from master
pr: #30066

Shuffle candidates to reduce the chance that several channels are allocated
to the same node

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-19 12:06:54 +08:00
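
The idea in commit 079ddbfc01 above, shuffling the candidate nodes before assigning channels, might be sketched as below. The round-robin assignment and the names `shuffledCandidates` and `assignChannels` are assumptions made for illustration; the commit message does not describe the actual assignment policy.

```go
package main

import (
	"fmt"
	"math/rand"
)

// shuffledCandidates returns a shuffled copy of the candidate node IDs,
// leaving the caller's slice untouched.
func shuffledCandidates(nodes []int64) []int64 {
	out := make([]int64, len(nodes))
	copy(out, nodes)
	rand.Shuffle(len(out), func(i, j int) { out[i], out[j] = out[j], out[i] })
	return out
}

// assignChannels is a hypothetical policy: it shuffles the candidates once
// and then hands out channels round-robin, so the starting node differs
// between calls instead of always being the first node in the list.
func assignChannels(channels []string, nodes []int64) map[string]int64 {
	assignment := make(map[string]int64, len(channels))
	candidates := shuffledCandidates(nodes)
	if len(candidates) == 0 {
		return assignment
	}
	for i, ch := range channels {
		assignment[ch] = candidates[i%len(candidates)]
	}
	return assignment
}

func main() {
	channels := []string{"ch-0", "ch-1", "ch-2"}
	nodes := []int64{1, 2, 3}
	fmt.Println(assignChannels(channels, nodes))
}
```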
foxspy 0700434c58
fix: patching search cache param when index meta does not hold one (#30116)
patch search cache param from index configs when index meta could not
get the search cache size key

issue: #30113 
pr: #30119

Signed-off-by: xianliang <xianliang.li@zilliz.com>
2024-01-19 11:50:56 +08:00
SimFG be1470a654
enhance: [2.3] Add load/release partitions to replicate msg stream (#30001)
/kind improvement
pr: #28399

---------

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-01-18 22:50:55 +08:00
wei liu 71e24f0a7f
fix: Remove heartbeat lag logic during get shard leaders (#29999) (#30085)
issue: #29677 #29838
pr: #29999
During get shard leaders, if a querynode doesn't ack the heartbeat for more
than 10s, querycoord treats it as unavailable and won't return shard
leaders on it. But when a querynode's CPU is fully used, it can easily get
stuck for more than 10s without acking the heartbeat, which leaves no shard
leader available for search/query.

This PR removes the heartbeat lag logic during get shard leaders

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-01-18 17:48:55 +08:00
congqixia 7f32576f36
enhance: [cherry-pick] replace magic number with ParamItem for dist handler (#30020) (#30070)
Cherry-pick from master
pr: #30020
See also #28817

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-18 15:58:54 +08:00
wei liu 7d73032582
enhance: refactor leader_observer to leader_checker (#29454) (#29984)
issue: #29453
pr: #29452
Syncing distribution by RPC also calls loadSegment/releaseSegment,
which may cause all kinds of concurrency issues on the same segment, such as
concurrent load and release on one segment.
This PR adds leader_checker, which generates load/release tasks to correct
the leader view instead of calling sync distribution by RPC

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-01-18 14:08:54 +08:00
congqixia ce1ba6808a
enhance: [cherry-pick] change some important request log level to Info (#30062) (#30071)
Cherry-pick from master
pr: #30062 
Some important request log level shall be at least Info level

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-18 12:44:55 +08:00
congqixia 14aa20b7f7
enhance: [cherry-pick] fix otel config param type & leak (#30068)
cherry pick from master
pr: #29810 #30055 

`SampleFraction` shall be float and all `C.CString` shall be freed

Signed-off-by: Yudong Cai <yudong.cai@zilliz.com>
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-18 12:43:05 +08:00