8116 Commits (c5212a42b645c38096fd12be143167ae970c6b73)

Author SHA1 Message Date
congqixia 368180bce4
fix: [2.3] Check nodeID before update channel checkpoint (#31473) (#31508)
Cherry-pick from master
pr: #31473
See also #31470 #31506

This PR adds nodeID assignment verification before updating channel
checkpoints.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-03-23 07:07:07 +08:00
Jiquan Long ab059bb064
enhance: add more metrics (#31271) (#31511)
/kind improvement
pr: #31271 
fix: https://github.com/milvus-io/milvus/issues/31272

This PR adds more metrics:

Slow query count, where the duration considered slow is
configurable;
Number of deleted entities;
Number of entities per collection;
Number of loaded entities per collection;
Number of indexed entities;
Number of indexed entities, per collection, per index, and whether it's a
vector index;
Quota states (LongTimeTickDelay, MemoryExhausted, DiskQuotaExhausted)
per database;

---------

Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2024-03-22 16:11:07 +08:00
wei liu ef523bfef3
fix: Unstable ut TestGetClientFailed (#31296) (#31472)
issue: #31295
pr: #31296

This PR fixes the unstable unit test TestGetClientFailed.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-03-22 11:01:07 +08:00
wei liu 0bf595a513
enhance: Speed up target recovery after query coord restart (#31240) (#31449)
issue: #28491
pr: #31240

After querycoord restarts, it pulls a new target, which includes the
channel and segment list. Once the segments loaded on the querynodes
reach the target, the collection can serve search/query. But if the
segment list changes over time, then after querycoord pulls a new
target it takes a few minutes to catch up with the target's segment
distribution, and until then query/search fails due to missing segments.

This PR saves the current loaded target to the meta store during
querycoord's stop process and recovers it when querycoord starts, to
speed up target recovery.

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-03-22 10:27:17 +08:00
wei liu f8496dbc73
fix: Balance channel stuck forever due to logic dead lock (#31202) (#31455)
issue: #30816
pr: #31202

Balance channel waits until the leader view catches up with the current
target before unsubscribing the old delegator, which ensures that the
new delegator can serve search before the old delegator is released.
But another rule in segment_checker skips loading segments during
balance channel. So if a query node crashes during balance channel, the
new delegator can never catch up with the target, and the balance is
stuck forever.

This PR removes the rule that skips loading segments during balance
channel, to avoid this logical deadlock.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-03-21 18:11:07 +08:00
wei liu c8658d17f8
fix: Grpcclient return unrecoverable error (#31256) (#31452)
issue: #31222
pr: #31256

grpcclient's `call` func returns an unrecoverable error, which also
breaks the caller's retry policy.

This PR introduces `retry.Handle`. The new func takes a `func() (bool,
error)` as its input parameter, which returns `shouldRetry` directly, so
that grpcclient no longer returns an unrecoverable error.
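
The idea of letting the callback itself report whether a retry makes
sense can be sketched as follows. This is a minimal illustration, not
the code from the PR: `handle`, its `attempts` parameter, and `runDemo`
are hypothetical names, and the real `retry.Handle` options are not
shown.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// handle is a hypothetical, simplified stand-in for the `retry.Handle`
// helper named in the commit message. fn reports whether the caller
// should retry, together with the error it hit.
func handle(ctx context.Context, attempts int, fn func() (shouldRetry bool, err error)) error {
	var lastErr error
	for i := 0; i < attempts; i++ {
		// Stop early if the caller's context is already done.
		if err := ctx.Err(); err != nil {
			return err
		}
		retry, err := fn()
		if err == nil {
			return nil
		}
		lastErr = err
		if !retry {
			// fn decided the error is final; no wrapper error type is needed.
			return err
		}
	}
	return lastErr
}

// runDemo fails the first `failures` calls and returns how many times
// the callback ran, plus the final error.
func runDemo(failures int, retryable bool) (int, error) {
	calls := 0
	err := handle(context.Background(), 5, func() (bool, error) {
		calls++
		if calls <= failures {
			return retryable, errors.New("transient failure")
		}
		return false, nil
	})
	return calls, err
}

func main() {
	calls, err := runDemo(2, true)
	fmt.Println(calls, err)
	calls, err = runDemo(2, false)
	fmt.Println(calls, err)
}
```

Because the retry decision is an explicit return value, the callee never
has to wrap its error in a special "unrecoverable" type that could leak
into the caller's own retry loop.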

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-03-21 11:59:12 +08:00
wei liu 6b761204ce
fix: Set node unreachable when get shard client failed (#31277) (#31451)
issue: #30531
pr: #31277

Failing to get a client from `shardClientMgr` doesn't mean the query
node is unavailable: because of the ref counter policy in
`shardClientMgr`, the client is cleaned up if no collection uses the
query node as shard leader.

This PR fixes the node being set unreachable when getting the shard
client fails.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-03-21 11:57:08 +08:00
wei liu 5994b6a7b0
fix: Search doesn't expire shard leader cache (#31380) (#31450)
issue: #31351
pr: #31380
This PR fixes search not expiring the shard leader cache when sending a
request to a query node fails, which made every request keep trying to
connect to an offline query node.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-03-21 11:55:07 +08:00
groot 1ca7cba222
enhance: Support MinIO TLS connection (#31292)
issue: https://github.com/milvus-io/milvus/issues/30709
master pr: #31311

Signed-off-by: yhmo <yihua.mo@zilliz.com>
Co-authored-by: Chen Rao <chenrao317328@163.com>
2024-03-21 11:15:20 +08:00
congqixia 94f3aec80a
enhance: [Cherry-pick] Add metrics for querycoord current target cp lag (#31391) (#31463)
Cherry-pick from master
pr: #31391 #31399
See also #31390

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-03-21 10:17:07 +08:00
wei liu fef430daed
fix: Wrong behavior of CurrentTargetFirst/NextTargetFirst in target manager(#31379) (#31419)
issue: #31162
pr: #31379

When the scope CurrentTargetFirst/NextTargetFirst is given, both the
current and the next target are expected to be scanned.

This PR fixes the wrong behavior of CurrentTargetFirst/NextTargetFirst
in the target manager, which may cause unexpected tasks to be generated
and load collection to be stuck forever due to a dirty leader view.
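
One way to read the expected scope semantics is "look in the preferred
target first, then fall back to the other one". The sketch below assumes
exactly that; the scope names come from the commit message, while
`targetManager`, its fields, and `segments` are hypothetical.

```go
package main

import "fmt"

// Scope names follow the commit message; everything else is illustrative.
type Scope int

const (
	CurrentTarget Scope = iota
	NextTarget
	CurrentTargetFirst
	NextTargetFirst
)

// targetManager maps collection IDs to segment IDs for both target versions.
type targetManager struct {
	current map[int64][]int64
	next    map[int64][]int64
}

// segments returns the segment list for the collection. For the *First
// scopes it scans both targets, preferred one first (assumed semantics).
func (m *targetManager) segments(collectionID int64, scope Scope) []int64 {
	var order []map[int64][]int64
	switch scope {
	case CurrentTarget:
		order = []map[int64][]int64{m.current}
	case NextTarget:
		order = []map[int64][]int64{m.next}
	case CurrentTargetFirst:
		order = []map[int64][]int64{m.current, m.next}
	case NextTargetFirst:
		order = []map[int64][]int64{m.next, m.current}
	}
	for _, target := range order {
		if segs, ok := target[collectionID]; ok {
			return segs
		}
	}
	return nil
}

func main() {
	m := &targetManager{
		current: map[int64][]int64{},
		next:    map[int64][]int64{1: {10, 11}},
	}
	fmt.Println(m.segments(1, CurrentTarget))      // only the current target is scanned
	fmt.Println(m.segments(1, CurrentTargetFirst)) // falls back to the next target
}
```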

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-03-20 23:39:07 +08:00
cai.zhang 52a7eb9548
fix: Fix bug for get segment index state (#31429)
issue: #31361 
master pr: #31427 
2.4 pr: #31428

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-03-20 15:05:06 +08:00
cai.zhang ef530a2324
enhance: When describing an index, fetch the index info in batches (#31239)
issue: #29313 
master pr: #31238

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-03-15 16:37:09 +08:00
Jiquan Long 50bfde92f2
fix: wrong num_entities used when mmap variable length data (#30848) (#31274)
https://github.com/milvus-io/milvus/issues/30728
pr: #30848

Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2024-03-14 20:33:03 +08:00
jaime 5ddb0b435f
fix: revoke session may be ignored due to server context cancellation in advance (#31213)
issue: #31219
pr: #31220

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-03-14 19:05:04 +08:00
chyezh 7105e0b261
fix: lost dbname when only passing collection id to describeCollection (#31177)
issue: #30931
pr: #31167

Signed-off-by: chyezh <chyezh@outlook.com>
2024-03-11 19:51:03 +08:00
aoiasd e747f15c80
fix: flush insert data with nil buffer (#31159)
relate: https://github.com/milvus-io/milvus/issues/31165

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-03-11 17:43:03 +08:00
wei liu 855f71ac89
fix: Dirty sealed segment won't release after channel balance (#31095) (#31126)
issue: #31074
pr: #31095
This PR fixes dirty sealed segments not being released after channel
balance. A dirty sealed segment is a segment that doesn't exist in the
targets.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-03-11 15:01:11 +08:00
congqixia 3e7f2e8e7d
enhance: [Cherry-Pick] Use `ListIndexes` instead of `DescribeIndex` for qc broker (#31163)
Cherry pick from master 
pr: #31122

See also #31103

Since querycoord only needs index meta information from datacoord, the
broker shall use `ListIndexes` to skip the segment index building check
logic in datacoord.

This PR is also related to #30538, in which DescribeIndex caused heavy
memory usage and eventually led to OOM.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-03-11 14:41:02 +08:00
pingliu 1dd4f4b4dc
enhance: jemalloc aarch64 platform use 64k pagesize. (#31114)
pr: https://github.com/milvus-io/milvus/pull/29522
enhance: jemalloc aarch64 platform use 64k pagesize.

Signed-off-by: ping.liu <ping.liu@zilliz.com>
2024-03-11 12:03:02 +08:00
congqixia 3c90475d55
enhance: [Cherry-pick] Add `ListIndexes` API from datacoord (#31104) (#31150)
Cherry-pick from master
pr: #31104
See also #31103

This PR adds a `listIndexes` API to the datacoord server to list all
indexes for the provided collection.
Compared to the existing `DescribeIndex` API, the new one does NOT
check the segment index building progress, which eases the burden of
invoking it.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-03-11 10:47:02 +08:00
Jiquan Long c37b7792f4
enhance: purge client infos periodically (#31037) (#31092)
https://github.com/milvus-io/milvus/issues/31007
pr: #31037 

---------

Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2024-03-08 10:17:01 +08:00
congqixia 383ff8b0b1
enhance: [2.3] Add flush trigger for channel cp updater (#31082)
See also #31024  #31058

Flush latency rose from 2 seconds to 5 or more after the channel updater
change. This PR adds a manual trigger method to speed up the flush
procedure.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-03-07 15:15:01 +08:00
yihao.dai 3eeeae8519
fix: Fix errors in the Index service APIs are ignored (#31077) (#31086)
In Index service APIs, return the error if one occurs instead of always
returning nil. Additionally, add more tests to cover this scenario.

issue: https://github.com/milvus-io/milvus/issues/31069,
https://github.com/milvus-io/milvus/issues/31027

pr: https://github.com/milvus-io/milvus/pull/31077

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-03-06 22:55:00 +08:00
congqixia 53f5a67112
enhance: [Cherry-pick] Fix misleading log content & possible nil panic (#31021) (#31054)
Cherry pick from master
pr: #31021 

- Change load field log from "dy pool" to "load pool"
- Also defer delete when there is no error

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-03-06 16:09:01 +08:00
zhagnlu 095c94305c
fix: add GetSegments optimization to avoid meta mutex competition (#31026)
pr: #31025

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-03-05 14:49:01 +08:00
yihao.dai 91d17870d6
enhance: Prevent the backlog of channelCP update tasks, perform batch updates of channelCPs (#30941) (#31024)
This PR includes the following adjustments:

1. To prevent channelCP update task backlog, only one task with the same
vchannel is retained in the updater. Additionally, the lastUpdateTime is
refreshed after the flowgraph submits the update task, rather than in
the callBack function.
2. Batch updates of multiple vchannel checkpoints are performed in the
UpdateChannelCheckpoint RPC (default batch size is 128). Additionally,
the lock for channelCPs in DataCoord meta has been switched from key
lock to global lock.
3. The concurrency of UpdateChannelCheckpoint RPCs in the datanode has
been reduced from 1000 to 10.
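
Adjustments 1 and 2 can be sketched as a deduplication step followed by
fixed-size batching. This is only an illustration under assumptions: the
`checkpoint` type and the `dedupe` and `batches` helpers are
hypothetical; only the batch size of 128 comes from the text above.

```go
package main

import "fmt"

// checkpoint is a hypothetical, reduced form of a channel checkpoint.
type checkpoint struct {
	vchannel  string
	timestamp uint64
}

// dedupe keeps only the most recently submitted checkpoint per vchannel,
// preserving the order in which vchannels were first seen.
func dedupe(tasks []checkpoint) []checkpoint {
	index := make(map[string]int, len(tasks))
	out := make([]checkpoint, 0, len(tasks))
	for _, t := range tasks {
		if i, ok := index[t.vchannel]; ok {
			out[i] = t // replace the pending task for this vchannel
			continue
		}
		index[t.vchannel] = len(out)
		out = append(out, t)
	}
	return out
}

// batches splits checkpoints into groups of at most size entries, so each
// group can be sent in a single update request.
func batches(cps []checkpoint, size int) [][]checkpoint {
	if size <= 0 {
		return nil
	}
	var out [][]checkpoint
	for start := 0; start < len(cps); start += size {
		end := start + size
		if end > len(cps) {
			end = len(cps)
		}
		out = append(out, cps[start:end])
	}
	return out
}

func main() {
	var tasks []checkpoint
	for i := 0; i < 300; i++ {
		name := fmt.Sprintf("vchannel-%d", i)
		tasks = append(tasks, checkpoint{name, 1}, checkpoint{name, 2})
	}
	for _, b := range batches(dedupe(tasks), 128) {
		fmt.Println(len(b))
	}
}
```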

issue: https://github.com/milvus-io/milvus/issues/30004

pr: https://github.com/milvus-io/milvus/pull/30941

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-03-05 14:27:01 +08:00
congqixia b7635ed989
enhance: [Cherry-pick] Change proxy connection manager to concurrent safe (#31009)
Cherry-pick from master
pr: #31008 
See also #31007

This PR:
- Add param item for connection manager behavior: TTL & check interval
- Change clientInfo map to concurrent map

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-03-05 14:13:00 +08:00
yihao.dai a5350f64a5
enhance: Reduce the memory usage of the timeTickSender (#30968) (#30991)
In the cache of the timeTickSender, retain only the latest stats instead
of storing stats for every time tick.
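
A rough sketch of "retain only the latest stats" is a cache keyed by
channel and segment rather than by time tick. All names below
(`segmentStats`, `statsCache`, `update`) are hypothetical and do not
reflect the actual timeTickSender types.

```go
package main

import (
	"fmt"
	"sync"
)

// segmentStats is a hypothetical per-segment statistic carried by a time tick.
type segmentStats struct {
	segmentID int64
	numRows   int64
}

// statsCache keeps, per channel, only the newest stats of each segment,
// so memory no longer grows with the number of buffered time ticks.
type statsCache struct {
	mu     sync.Mutex
	lastTs map[string]uint64
	latest map[string]map[int64]segmentStats
}

func newStatsCache() *statsCache {
	return &statsCache{
		lastTs: make(map[string]uint64),
		latest: make(map[string]map[int64]segmentStats),
	}
}

// update overwrites the cached stats; ticks older than the last seen one are ignored.
func (c *statsCache) update(channel string, ts uint64, stats []segmentStats) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if ts < c.lastTs[channel] {
		return
	}
	c.lastTs[channel] = ts
	segs, ok := c.latest[channel]
	if !ok {
		segs = make(map[int64]segmentStats)
		c.latest[channel] = segs
	}
	for _, s := range stats {
		segs[s.segmentID] = s
	}
}

// size reports how many segment entries are cached for the channel.
func (c *statsCache) size(channel string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return len(c.latest[channel])
}

// rows returns the cached row count of a segment (0 if unknown).
func (c *statsCache) rows(channel string, segmentID int64) int64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.latest[channel][segmentID].numRows
}

func main() {
	c := newStatsCache()
	c.update("ch", 1, []segmentStats{{1, 100}})
	c.update("ch", 2, []segmentStats{{1, 200}})
	c.update("ch", 3, []segmentStats{{1, 300}, {2, 50}})
	fmt.Println(c.size("ch"), c.rows("ch", 1))
}
```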

issue: https://github.com/milvus-io/milvus/issues/30967

pr: https://github.com/milvus-io/milvus/pull/30968

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-03-05 10:59:01 +08:00
congqixia 81b197267a
enhance: [Cherry-Pick] Add back load memory factor when estimating memory resource (#30999)
Cherry-pick from master
pr: #30994
Segment load memory usage is underestimated since the load memory
factor was removed. This PR adds it back to protect querynode from OOM
in some extreme memory cases.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-03-05 09:15:00 +08:00
jaime 336e0ae45e
enhance: index meta use independent rather than global meta lock (#30986)
issue: https://github.com/milvus-io/milvus/issues/30837
pr: #30869

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-03-05 08:48:59 +08:00
chyezh df09222029
fix: starve lock caused by slow GetCompactionTo method when too many segments (#30965)
issue: #30823
pr: #30963

Signed-off-by: chyezh <chyezh@outlook.com>
2024-03-04 20:51:00 +08:00
XuanYang-cn bb2de0d964
fix: [cherry-pick] Clear DN unknown compaction tasks (#30972)
If DC restarts, those unknown compaction tasks never get a callback in
DN, so the segments in those compaction tasks stay locked, unable to
sync or be compacted again, which blocks cp advance and compaction
execution.

See also: #30137
pr: #30850

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-03-04 16:52:59 +08:00
wei liu db49b8524d
fix: Skip generate balance task when target not ready (#30725)
issue: #30723
pr: #30724

This PR skips generating balance tasks when the collection's target
isn't ready. It also refines the stale check logic in query coord's
scheduler: if the channel exists in the current or next target, the task
won't be canceled.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-03-04 11:38:59 +08:00
wei liu af54c3ba85
fix: Make datacoord client retry on index api (#30656)
pr: #30654

This PR adds retry on all interfaces which belong to indexcoord in milvus
2.2 and moved to data coord in milvus 2.3, to prevent unimplemented
errors during rolling upgrade from milvus 2.2 to 2.3.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-03-04 11:37:09 +08:00
cai.zhang 38e3d6af3e
enhance: Optimize DescribeIndex to reduce lock contention (#30975)
issue: #29313
issue: #30443
master pr: #30939

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-03-04 11:30:59 +08:00
SimFG b0569f430b
enhance: [2.3] retry to read when the s3 gets the unexpected EOF error (#30976)
issue: https://github.com/milvus-io/milvus/issues/30877
pr: #30861

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-03-04 10:42:59 +08:00
PowderLi c93f127c7d
fix: [cherry-pick] [restful v1] bug list (#30873)
master pr: #30871 issue: #30870
fix: vector field cannot be empty while insert
check in advance whether the vector field is empty

master pr: #30740
fix:
1. spelling mistake about metricsType #30643
2. int64 precision #20415
3. insert into collection which has multi vector fields #30674

enhance: support dataType: Float16Vector & BFloat16Vector #22837
#30980(master pr: #30969)
enhance: describe collection will show the field is partition key or not
#30789

---------

Signed-off-by: PowderLi <min.li@zilliz.com>
2024-03-03 17:56:59 +08:00
SimFG ef84d40e54
enhance: [2.3] make the watch dm channel request better compatibility (#30954)
pr: #30952
issue: https://github.com/milvus-io/milvus/issues/30938

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-03-01 16:09:01 +08:00
wei liu b0c7f8653f
fix: Segment version doesn't update as expected (#30953)
issue: #30950 
pr: #30951

The segment version doesn't update as expected.
This PR defers updating the segment version until the segment becomes
loaded.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-03-01 14:21:10 +08:00
congqixia c3f831fce4
fix: [Cherry-pick] Disk resource is not requested for index loaded with disk (#30757) (#30948)
Cherry pick from master
pr: #30757
See also #30756

This PR:
- Request disk resource when index type, version loaded with disk
- Add attribute cache for index utility
- Add `typeutil.Pair`

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-03-01 13:07:00 +08:00
chyezh 483a32bced
feat: add collection level flush rate control (#29568)
Flush rate control at the collection level, to avoid generating too many
segments.
0.1 qps by default.
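
A rate of 0.1 qps means at most one flush per collection every 10
seconds. The sketch below illustrates that arithmetic with a simple
minimum-interval limiter; `flushLimiter` and its methods are
hypothetical, and the actual rate limiter in the PR may work differently
(for example as a token bucket).

```go
package main

import (
	"fmt"
	"sync"
)

// flushLimiter is a hypothetical per-collection limiter. Times are plain
// seconds so the example stays deterministic.
type flushLimiter struct {
	mu       sync.Mutex
	interval float64           // minimum seconds between two flushes of one collection
	last     map[int64]float64 // collectionID -> time of the last accepted flush
}

// newFlushLimiter converts a rate in flushes per second into a minimum interval.
func newFlushLimiter(qps float64) *flushLimiter {
	return &flushLimiter{interval: 1 / qps, last: make(map[int64]float64)}
}

// allow reports whether a flush of the collection is accepted at time now.
func (l *flushLimiter) allow(collectionID int64, now float64) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if prev, seen := l.last[collectionID]; seen && now-prev < l.interval {
		return false
	}
	l.last[collectionID] = now
	return true
}

func main() {
	l := newFlushLimiter(0.1)
	fmt.Println(l.allow(1, 0))  // first flush of collection 1
	fmt.Println(l.allow(1, 5))  // too soon for collection 1
	fmt.Println(l.allow(2, 5))  // other collections are limited independently
	fmt.Println(l.allow(1, 11)) // more than 10 seconds later
}
```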

issue: #29477
pr: #29567

Signed-off-by: chyezh <ye.zhen@zilliz.com>
2024-03-01 10:23:01 +08:00
yihao.dai 2f76303989
enhance: Support varchar autoid for bulkinsertV1 (#30896) (#30913)
This PR is a supplement to PR
https://github.com/milvus-io/milvus/pull/30377.

pr: https://github.com/milvus-io/milvus/pull/30896

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-02-29 12:11:00 +08:00
Jiquan Long b0d8e21445
enhance: optimize the memory usage and speed up loading variable length data (#30787) (#30900)
pr: #30787 
/kind improvement

---------

Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2024-02-29 10:39:00 +08:00
PowderLi a4219cbb0f
fix: [cherry-pick] set proxy.http.acceptTypeAllowInt64: true as default (#30738)
issue: #30680
pr: #30720

also let the parameter item to be refreshable

Signed-off-by: PowderLi <min.li@zilliz.com>
2024-02-29 09:59:07 +08:00
congqixia df16bf6acd
fix: [Cherry-pick] Remove time tick delay metrics when nodes go offline (#30833) (#30879)
Cherry-pick from master
pr: #30833
See also #30832

This PR removes time tick delay metrics when the rootcoord GetMetrics
response no longer contains a previously existing querynode/datanode

Also add unit tests for this case

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Signed-off-by: Congqi.Xia <congqi.xia@zilliz.com>
2024-02-28 18:55:00 +08:00
Jiquan Long b10bec38c9
enhance: reduce 1x memory copy when loading json (#30753) (#30864)
/kind improvement
pr: #30753 

---------

Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2024-02-28 16:36:59 +08:00
wei liu ee705b7ce8
enhance: Correct misleading nodeID in GetComponentStates's log (#30732)
pr: #30731
This PR corrects the misleading nodeId in GetComponentStates's log

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-02-28 13:50:59 +08:00
chyezh 1c8d9fa686
fix: wrong context passing into NewClient, error handling lost in session_util (#30818)
issue: #30799
pr: #30817

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2024-02-28 10:41:00 +08:00
zhenshan.cao 2f4a13a7ae
enhance: Revert (#30197 #30690 #30415) (#30795)
Revert "enhance: reduce many I/O operations while loading disk index
(#30189) (#30690)" This reverts commit
d4c4bf946b.

Revert "enhance: limit the max pool size to 16 (#30371) (#30415)" This
reverts commit 52ac0718f0.

Revert "enhance: convert the `GetObject` util to async (#30166)
(#30197)" This reverts commit 4b7c5baab7.

Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2024-02-24 09:07:46 +08:00
Xiaofan 2896f5eb69
enhance: [2.3] change frequent log to debug (#30781)
pr: #30782 
change the "pipeline fetch insert msg" log to debug

Signed-off-by: xiaofanluan <xiaofan.luan@zilliz.com>
2024-02-23 14:10:40 +08:00
chyezh a9625ec1ae
fix: nil ptr is used as nil interface in grpc client (#30755)
issue: #30715
pr: #30754

- Bug: A nil struct pointer was used to represent a nil interface,
which panics with a segmentation violation when a method is called on
this nil struct pointer.
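
This is a well-known Go pitfall: an interface value that wraps a nil
pointer is itself not nil. The sketch below reproduces the pattern with
hypothetical names (`closer`, `grpcClient`, `newClientBuggy`,
`newClientFixed`); it is not the code from the PR.

```go
package main

import "fmt"

// closer is a hypothetical client interface.
type closer interface {
	Close() error
}

type grpcClient struct {
	addr string
}

// Close dereferences c, so calling it on a nil *grpcClient panics.
func (c *grpcClient) Close() error {
	fmt.Println("closing", c.addr)
	return nil
}

// newClientBuggy returns a typed nil pointer through the interface. The
// result compares unequal to nil even when construction failed.
func newClientBuggy(fail bool) closer {
	var c *grpcClient
	if !fail {
		c = &grpcClient{addr: "localhost:19530"}
	}
	return c
}

// newClientFixed returns an untyped nil interface on failure.
func newClientFixed(fail bool) closer {
	if fail {
		return nil
	}
	return &grpcClient{addr: "localhost:19530"}
}

func main() {
	fmt.Println(newClientBuggy(true) == nil) // false: a nil check by the caller does not help
	fmt.Println(newClientFixed(true) == nil) // true
}
```

With the buggy constructor, a caller's `if client != nil` guard passes
and the following method call dereferences the nil pointer.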

Signed-off-by: chyezh <chyezh@outlook.com>
2024-02-23 10:08:54 +08:00
zhagnlu e17775a20f
fix: fix upsert using wrong field to compute partition key (#30773)
pr: #30772

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-02-22 23:38:53 +08:00
cai.zhang ef086dc0ca
fix: [Pick] Skip filling segmentID in indexBuildCh to prevent flush blocked (#30749)
issue: #30580 
master pr: #30747

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-02-22 20:42:56 +08:00
congqixia 3d8b6a4d2e
fix: [Cherry-pick] Release loaded growing if WatchDmlChannel fail (#30735) (#30745)
Cherry pick from master
pr: #30735
See also #30734

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-02-22 16:44:55 +08:00
congqixia 31f33f67e0
fix: [cherry-pick] Update disk usage metrics after segment released (#30702) (#30707)
Cherry-pick from master
pr: #30702
See also #30701

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-02-21 10:54:53 +08:00
cai.zhang e8e221ca38
[Pick]enhance: Use virtual host for tencent cloud (#30685)
master pr: #30650

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-02-21 09:52:59 +08:00
yah01 d4c4bf946b
enhance: reduce many I/O operations while loading disk index (#30189) (#30690)
Before this, every time the index chunk data was written to disk,
there were 4 I/O operations:
- open the file
- seek to the offset
- write the data
- close the file

This PR optimizes this to open the file only once and write all data
continuously.

It also loads the files from object storage concurrently.

pr: #30189

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-02-20 17:40:52 +08:00
congqixia 8734bcc645
fix: [Cherry-pick] Prevent ChunkCache use absolute path in All-in-one mode (#30666) (#30679)
Cherry pick from master
pr: #30666
See also #30651

The append operator of `std::filesystem::path` replaces the whole path
when the operand of the "/" operation is an absolute path.

In "All-in-one" mode, this causes ChunkCache to remove the original
vector data file when building the chunk cache during/after the load
procedure.

This PR moves the ChunkCache path generation logic into a separate
function, which checks whether the file path is absolute.
If the file path is absolute, it removes the root path prefix and returns
the concatenated file path.
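
The path generation logic described above can be sketched as follows.
The actual fix is in C++; this Go version only mirrors the logic, and
`cachePath` is a hypothetical name. Note that Go's `path.Join` does not
share the C++ replace-on-absolute behavior, so the explicit check here
is purely illustrative.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// cachePath builds the location of a cached file under cacheRoot. If the
// source file path is absolute, its leading root ("/") is stripped first
// so the result always stays inside cacheRoot.
func cachePath(cacheRoot, file string) string {
	if path.IsAbs(file) {
		file = strings.TrimPrefix(file, "/")
	}
	return path.Join(cacheRoot, file)
}

func main() {
	fmt.Println(cachePath("/var/cache", "/var/data/1.bin"))
	fmt.Println(cachePath("/var/cache", "files/1.bin"))
}
```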

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-02-20 16:48:50 +08:00
yah01 52ac0718f0
enhance: limit the max pool size to 16 (#30371) (#30415)
according to our benchmark, concurrency level 16 is enough to fully
utilize the object storage network bandwidth
pr: #30371

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-02-20 15:58:52 +08:00
yah01 4b7c5baab7
enhance: convert the `GetObject` util to async (#30166) (#30197)
This makes it much easier to use
pr: #30166

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-02-20 11:30:52 +08:00
foxspy 35330ff8ea
enhance: Update Knowhere version (#30640)
/kind branch-feature

Signed-off-by: xianliang <xianliang.li@zilliz.com>
2024-02-18 20:28:52 +08:00
Jiquan Long 26f012c564
fix: Add retry on unimplemented error for datacoord (#30554) (#30639)
issue: #30553
pr: #30554 

When datacoord with version 2.2 and querycoord with version 2.3 coexist
during a rolling upgrade, `DescribeIndex/GetIndexInfo` returns an
`unimplemented` error.
This PR adds retry on `DescribeIndex/GetIndexInfo`, to prevent load
collection from failing during rolling upgrade from milvus 2.2 to 2.3.

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Co-authored-by: wei liu <wei.liu@zilliz.com>
2024-02-18 20:26:59 +08:00
zhenshan.cao 48707f3aac
fix: should return collectionName in response of ListAliases (#30533)
issue : https://github.com/milvus-io/milvus/issues/30369
pr: https://github.com/milvus-io/milvus/pull/30532

Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2024-02-12 08:30:55 +08:00
zhagnlu a209d05537
fix: erase pk empty check when pk index replace raw data (#30432) (#30578)
pr: #30432

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-02-12 08:24:53 +08:00
chyezh be1bd9615a
enhance: add configurable memory index load predict memory usage factor (#30563)
pr: #30561

related pr: #30475

Signed-off-by: chyezh <chyezh@outlook.com>
2024-02-06 22:00:49 +08:00
congqixia 8fec7de472
fix: [Cherry-pick] Proxy restful api doesn't register (#30072) (#30559)
Cherry-pick from master
pr: #30072
issue: #30074
This PR fixes the management restful api in proxy not being registered
to the http service

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Co-authored-by: wei liu <wei.liu@zilliz.com>
2024-02-06 16:58:33 +08:00
wayblink b2d3278c56
enhance: Add log when garbage collection resumed (#30536)
/kind enhancement
pr: #30535

Signed-off-by: wayblink <anyang.wang@zilliz.com>
2024-02-05 17:09:53 +08:00
foxspy 88d57f1db9
enhance: Update Knowhere version (#30513)
/kind improvement

Signed-off-by: xianliang <xianliang.li@zilliz.com>
2024-02-04 22:13:07 +08:00
aoiasd cc2bc3f8f2
enhance: [Cherry-Pick] access log should get client info by get method (#30503)
https://github.com/milvus-io/milvus/pull/30502

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-02-04 18:57:07 +08:00
congqixia f2310ab4ce
enhance: [Cherry-pick] Use dynamic pool for `NewLoadIndexInfo` (#30489) (#30497)
Cherry-pick from master
pr: #30489 
See also #30445

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-02-04 16:39:06 +08:00
aoiasd ad4a53d225
enhance: [Cherry-Pick] Fix some access log bugs (#30496)
pr: https://github.com/milvus-io/milvus/pull/30409
https://github.com/milvus-io/milvus/pull/29680

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-02-04 16:37:07 +08:00
cai.zhang 3c5ff624f8
fix: [pick]Only use bound indexnodes in bound mode (#30462)
master pr: #30461 
issue: #30463

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-02-03 21:59:05 +08:00
yah01 655e235230
enhance: calculate the accurate memory usage while loading segment (#30473) (#30475)
The old version of Knowhere copies the index data while loading, so we
need to account for this to avoid OOM.

Knowhere provides a util function that indicates whether it will load the
index with disk. If not, we need to double the memory usage prediction
for index data.
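
The doubling rule amounts to simple arithmetic, sketched below.
`estimateIndexMemory` and its parameters are hypothetical; in
particular, leaving the estimate unchanged for disk-loaded indexes is an
assumption of this sketch, not something the commit message states.

```go
package main

import "fmt"

// estimateIndexMemory predicts the memory needed to load index data of
// indexSize bytes. When the index is not loaded with disk, the data is
// copied during loading, so the prediction is doubled.
func estimateIndexMemory(indexSize uint64, loadedWithDisk bool) uint64 {
	if loadedWithDisk {
		return indexSize // assumption: no in-memory copy to account for here
	}
	return indexSize * 2
}

func main() {
	const oneGiB = uint64(1) << 30
	fmt.Println(estimateIndexMemory(oneGiB, false) / oneGiB)
	fmt.Println(estimateIndexMemory(oneGiB, true) / oneGiB)
}
```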

pr: #30473

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-02-03 13:01:12 +08:00
yihao.dai 20608287b9
fix: Decoupling importing segment from flush process (#30402) (#30439)
This PR decouples importing segments from the flush process by:
1. Excluding the importing segment from the flush policy. This approach
avoids notifying the datanode to flush the importing segment, which may
not exist.
2. When RootCoord calls Flush, DataCoord directly sets the importing
segment state to `Flushed`.

issue: https://github.com/milvus-io/milvus/issues/30359

pr: https://github.com/milvus-io/milvus/pull/30402

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-02-03 12:59:14 +08:00
yah01 f50799b7fd
fix: proxy may never set up if the port is already bound (#30035) (#30416)
The proxy mistakenly returned nil when it failed to listen on the port, so
the server continued to run but we couldn't connect to the service.
resolve #30034
pr: #30035

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-02-02 16:21:06 +08:00
smellthemoon 692dcebac6
enhance: support varchar autoid when bulkinsert(#30377) (#30448)
related pr: #30377

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-02-02 16:11:08 +08:00
congqixia 69a82acc46
enhance: [Cherry-pick] Set delete scope for LoadSegment streaming data (#30245) (#30367)
Cherry pick from master
pr: #30245
See also #29474

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-02-02 16:05:06 +08:00
SimFG 73df0b872e
fix: [2.3] add more requests to the database interceptor (#30453)
issue: https://github.com/milvus-io/milvus/issues/30368
pr: #30452

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-02-02 16:03:06 +08:00
cqy123456 3036c19867
fix: can't get search_cache_budget_gb in create index (#30353)
issue: https://github.com/milvus-io/milvus/issues/30375
pr: https://github.com/milvus-io/milvus/pull/30119

Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
2024-01-31 15:49:03 +08:00
yah01 028721db25
enhance: optimize the loading strategy (#29910) (#30348)
Since the pool size is limited, we don't need to limit the concurrency
manually.
pr: #29910

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-01-31 15:25:04 +08:00
chyezh 3e994242d6
fix: panic with datanode negative wait group counter (#30136)
issue: #29170
pr: #30135

Signed-off-by: chyezh <chyezh@outlook.com>
2024-01-30 18:07:03 +08:00
chyezh 21c944beaa
enhance: add basic information of milvus into metrics (#29666)
add basic build information and runtime component dependency into
metrics.

issue: #29664
pr: #29665

Signed-off-by: chyezh <ye.zhen@zilliz.com>
2024-01-29 15:49:04 +08:00
xige-16 9ab2ce0767
enhance: [Cherry-pick] Opt vector dimension mismatch error message (#30316)
Cherry-pick from master
pr: https://github.com/milvus-io/milvus/pull/29928

Signed-off-by: xige-16 <xi.ge@zilliz.com>
2024-01-29 14:47:03 +08:00
chyezh 77e123762f
enhance: add graceful stop timeout to avoid node stop hang under extreme cases (#30320)
1. set the coordinator and proxy graceful stop timeout to 5s.
3. set the other work nodes' graceful stop timeout to 900s; we should
potentially change this to 600s when graceful stop is smooth.
4. change the stop order of the datacoord components.
5. `LivenessCheck` no longer performs graceful shutdown.
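
A graceful stop with an upper time bound can be sketched as below: the
stop routine runs in the background and the caller gives up waiting once
the timeout elapses. `stopWithTimeout`, `errStopTimeout`, and
`stoppedInTime` are hypothetical names; the timeouts in the list above
(5s, 900s) would be passed as the `timeout` argument.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

var errStopTimeout = errors.New("graceful stop timed out")

// stopWithTimeout runs stop in a goroutine and waits for it at most
// timeout. On timeout the caller can proceed with a forced shutdown.
func stopWithTimeout(stop func(), timeout time.Duration) error {
	done := make(chan struct{})
	go func() {
		defer close(done)
		stop()
	}()
	select {
	case <-done:
		return nil
	case <-time.After(timeout):
		return errStopTimeout
	}
}

// stoppedInTime simulates a component whose stop takes `work` to finish.
func stoppedInTime(work, timeout time.Duration) bool {
	return stopWithTimeout(func() { time.Sleep(work) }, timeout) == nil
}

func main() {
	fmt.Println(stoppedInTime(time.Millisecond, time.Second))
	fmt.Println(stoppedInTime(500*time.Millisecond, 10*time.Millisecond))
}
```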

issue: https://github.com/milvus-io/milvus/issues/30310
pr: #30317
also see: https://github.com/milvus-io/milvus/pull/30306

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2024-01-27 08:45:02 +08:00
yihao.dai e0f987ee9b
enhance: Allows proactive warming up of chunk cache (#30182) (#30289)
Allows proactive warming up of chunk cache. Original vector data will be
asynchronously loaded into the chunk cache during the load process. It
has the potential to significantly reduce query/search latency for a
certain duration after the load, albeit with a concurrent increase in
disk usage.

issue: https://github.com/milvus-io/milvus/issues/30181

pr: https://github.com/milvus-io/milvus/pull/30182

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-01-26 09:57:01 +08:00
Bingyi Sun 2c4d0605ef
enhance: add a weight for growing row count when balancing segments (#30293)
Cherry-pick from master
pr: #30271

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2024-01-26 09:17:03 +08:00
congqixia d182a51653
fix: [Cherry-pick] Use correct pools for all CGO methods in segments pkg (#30275)
Cherry-pick from master
pr: #30274
See also #30273

This PR:
- Rename confusing `LoadIndexInfo` to `UpdateIndexInfo` for LocalSegment
- Use `DynamicPool` instead of `LoadPool` for `UpdateSealedSegmentIndex`
- Fix cgo call missing pool control

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-25 19:49:01 +08:00
congqixia 1a54571c10
enhance: [Cherry-pick] Add trace span for scheduling read tasks in QueryNode (#30266)
Cherry-pick from master
pr: #30265 

This PR adds a trace span for search/query task scheduling duration

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-25 15:39:01 +08:00
congqixia 9e8eb2aa51
fix: Revert leader checker related check (#30262)
See also #30150
PR reverted: #29984 #30152

Currently this scenario could not be covered by ut/it/e2e test cases
Revert it for now

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-25 12:39:02 +08:00
congqixia e3114b6a4d
enhance: [2.3] Utilize partition key optimization in reQuery (#30255)
Partial cherry-pick from master due to code branching
pr: #30253 
See also #30250

This PR adds a reQuery flag to the query task. When the reQuery flag is
true, the query task skips partition name conversion and uses the
pre-calculated partitionIDs passed from the search task.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-25 11:05:07 +08:00
SimFG 95cd6f20d0
fix: [2.3] wrong format expr for the delete rest api (#30218)
/kind improvement
issue: https://github.com/milvus-io/milvus/issues/30092
pr: #30217

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-01-24 11:27:05 +08:00
cai.zhang efea282111
feat: [Pick] Support tencent cloud object storage for milvus (#30210)
issue: #30162 
master pr: #30163

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-01-23 16:07:01 +08:00
congqixia 35e4165722
enhance: [2.3] make Load process traceable in querynode & segcore (#30187)
Cherry-pick from master, modified some files since branching
pr: #29858
See also #29803

This PR:
- Add trace span for LoadIndex & LoadFieldData in segment loader
- Add TraceCtx parameter for Index.Load in segcore
- Add span for ReadFiles & Engine Load for Memory/Disk Vector index

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-23 15:58:57 +08:00
yah01 4d0a6dbc25
fix: written file size is over the int32 range and raises error (#30057) (#30207)
We sum the total data size in int32, which could lead to an overflow
error.
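
The overflow is easy to reproduce: two chunks of 1.5 GiB each fit in an
int32 individually, but their sum does not. The sketch below uses
hypothetical helper names (`totalInt32`, `totalInt64`) and is in Go for
illustration only.

```go
package main

import "fmt"

// totalInt32 accumulates in int32 and silently wraps around once the sum
// exceeds the int32 range.
func totalInt32(sizes []int32) int32 {
	var total int32
	for _, s := range sizes {
		total += s
	}
	return total
}

// totalInt64 widens each size before adding, so the sum stays correct.
func totalInt64(sizes []int32) int64 {
	var total int64
	for _, s := range sizes {
		total += int64(s)
	}
	return total
}

func main() {
	sizes := []int32{1536 << 20, 1536 << 20} // two chunks of 1.5 GiB
	fmt.Println(totalInt32(sizes))           // negative: the sum wrapped around
	fmt.Println(totalInt64(sizes))           // 3221225472
}
```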
related #30056

pr: #30057

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-01-23 13:50:56 +08:00
yah01 9bd94c4fab
fix: the system rejects all queries and never recovers if enabled read rate limit (#30061) (#30196)
fix #30060
pr: #30061

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-01-23 10:37:00 +08:00
yah01 0e71923408
enhance: enable converting segcore error to merr (#29914) (#30178)
this converts the segcore error to merr if possible
pr: #29914

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-01-22 16:56:55 +08:00
yah01 c8a129756f
enhance: filter out the not needed collections while listing (#29690) (#30180)
this improves performance when many collections exist
resolve #29631
pr: #29690

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-01-22 16:52:55 +08:00
MrPresent-Han 6aaccdd5f4
feat: support general capacity restrict for cloud-side resoure contro… (#30017)
related: #29844
pr: https://github.com/milvus-io/milvus/pull/29845

Signed-off-by: MrPresent-Han <chun.han@zilliz.com>
2024-01-22 16:18:56 +08:00
SimFG 2465d86138
enhance: [2.3] support related privilege for grant api (#30154)
/kind improvement
pr: #30153

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-01-22 14:42:55 +08:00
yah01 ce318f3286
enhance: make the error of parsing expression to `ParameterInvalid` (#29681) (#29795)
before this, the error was reported as an unexpected error
pr: #29681

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-01-22 13:36:55 +08:00
yihao.dai 917a4d74f3
fix: Use channel cp as the dml&start position for import segments (#30107) (#30133)
This PR discontinues the subscription to the mq and instead uses the
channel checkpoint as the DML and start position for the import
segments.

issue: https://github.com/milvus-io/milvus/issues/30106

pr: https://github.com/milvus-io/milvus/pull/30107

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-01-22 13:32:55 +08:00
yah01 a8d9b0ccba
enhance: optimize the loading index performance (#29894) (#30018)
this utilizes concurrent loading
pr: #29894

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-01-22 13:12:56 +08:00
congqixia bac1a1355b
fix: [Cherry-pick] collection properties not saved for alter collection (#30145) (#30156)
Cherry-pick from master
pr: #30145
Resolves: #30144

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-22 10:08:55 +08:00
yihao.dai b95f0cc0a1
enhance: Add a counter monitoring for the rate-limit requests (#30109) (#30132)
Add a counter metric for monitoring rate-limited rpc requests, with
labels: proxy nodeID, rpc request type, and state.

issue: https://github.com/milvus-io/milvus/issues/30052

pr: https://github.com/milvus-io/milvus/pull/30109

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-01-21 14:44:59 +08:00
PowderLi 3dc2585d9b
enhance: support dataType: array & json (#30077)
issue: #30075 
master pr: #30076

deal with the array<?> field data correctly

Signed-off-by: PowderLi <min.li@zilliz.com>
2024-01-21 14:00:56 +08:00
wei liu b2997eb881
fix: Leader checker can't remove segment from leader view (#30152)
issue: #30150
pr: #30151

This PR fixes three problems:

1. the load request generated by the leader checker doesn't set the load scope
2. the leader checker uses the wrong node id when generating a release task,
which causes the release task to finish immediately
3. the release request generated by leader_checker doesn't set the force
flag, so the operation to clean the leader view on the delegator fails.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-01-20 18:58:58 +08:00
congqixia 079ddbfc01
enhance: [Cherry-pick] Shuffle candidates before channel assignment (#30066) (#30089)
Cherry-pick from master
pr: #30066

Shuffle candidates to reduce the chance that several channels are
allocated to the same node
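
A minimal sketch of the idea, under stated assumptions: `assignChannels` is a hypothetical helper, and round-robin assignment stands in for whatever policy the real channel manager uses. The point is only that shuffling the candidate list first avoids always starting from the same node.

```go
package main

import (
	"fmt"
	"math/rand"
)

// assignChannels shuffles a copy of the candidate node IDs, then assigns
// channels to the shuffled candidates round-robin. The seed parameter is an
// illustration-only way to make the shuffle reproducible.
func assignChannels(channels []string, nodes []int64, seed int64) map[string]int64 {
	assignment := make(map[string]int64, len(channels))
	if len(nodes) == 0 {
		return assignment
	}
	shuffled := make([]int64, len(nodes))
	copy(shuffled, nodes)
	rng := rand.New(rand.NewSource(seed))
	rng.Shuffle(len(shuffled), func(i, j int) {
		shuffled[i], shuffled[j] = shuffled[j], shuffled[i]
	})
	for i, ch := range channels {
		assignment[ch] = shuffled[i%len(shuffled)]
	}
	return assignment
}

func main() {
	channels := []string{"ch-0", "ch-1", "ch-2"}
	nodes := []int64{1, 2, 3}
	fmt.Println(assignChannels(channels, nodes, 42))
}
```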

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-19 12:06:54 +08:00
foxspy 0700434c58
fix: patching search cache param when index meta does not hold one (#30116)
patch the search cache param from index configs when the index meta
does not hold the search cache size key

issue: #30113 
pr: #30119

Signed-off-by: xianliang <xianliang.li@zilliz.com>
2024-01-19 11:50:56 +08:00
SimFG be1470a654
enhance: [2.3] Add load/release partitions to replicate msg stream (#30001)
/kind improvement
pr: #28399

---------

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-01-18 22:50:55 +08:00
wei liu 71e24f0a7f
fix: Remove heartbeat lag logic during get shard leaders (#29999) (#30085)
issue: #29677 #29838
pr: #29999
when getting shard leaders, if a querynode doesn't ack the heartbeat for
more than 10s, querycoord treats it as unavailable and won't return the
shard leader on it. but when a querynode has full cpu usage, it can easily
get stuck for more than 10s without acking the heartbeat, which leaves no
shard leader available for search/query.

This PR removes the heartbeat lag logic when getting shard leaders

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-01-18 17:48:55 +08:00
congqixia 7f32576f36
enhance: [cherry-pick] replace magic number with ParamItem for dist handler (#30020) (#30070)
Cherry-pick from master
pr: #30020
See also #28817

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-18 15:58:54 +08:00
wei liu 7d73032582
enhance: refactor leader_observer to leader_checker (#29454) (#29984)
issue: #29453
pr: #29452
sync distribution by rpc also calls loadSegment/releaseSegment,
which may cause all kinds of concurrency problems on the same segment,
such as concurrent load and release on one segment.
This PR adds leader_checker, which generates load/release tasks to correct
the leader view instead of calling sync distribution by rpc

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-01-18 14:08:54 +08:00
congqixia ce1ba6808a
enhance: [cherry-pick] change some important request log level to Info (#30062) (#30071)
Cherry-pick from master
pr: #30062 
The log level of some important requests shall be at least Info

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-18 12:44:55 +08:00
congqixia 14aa20b7f7
enhance: [cherry-pick] fix otel config param type & leak (#30068)
cherry pick from master
pr: #29810 #30055 

`SampleFraction` shall be float and all `C.CString` shall be freed

Signed-off-by: Yudong Cai <yudong.cai@zilliz.com>
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-18 12:43:05 +08:00
zhenshan.cao 9aceff5a6e
fix: duplicate dynamic field data by mistake (#30043)
issue: #30000 
pr: https://github.com/milvus-io/milvus/pull/30042

Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2024-01-17 00:20:55 +08:00
zhagnlu 9f6a19c56c
fix: increase expr recursion depth to avoid parse failed (#29860) (#30021)
pr: #29860

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-01-16 19:48:38 +08:00
cai.zhang 88c30b48ce
fix: [pick]Fix bug for read data from azure (#30006)
issue: #30005 
master pr: #30007

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-01-16 15:44:53 +08:00
PowderLi ff93e8b489
fix: [CHERRY-PICK] CollectionSchema.autoID is deprecated (#30011)
issue: [#30000](https://github.com/milvus-io/milvus/issues/30000)
related to: [milvus-proto #202](https://github.com/milvus-io/milvus-proto/pull/202)
master pr: #30002

1. replace collSchema.AutoID with primaryField.AutoID
2. show `enableDynamic` & `enableDynamicField` at the same time
3. avoid data race about the access to metacache

Signed-off-by: PowderLi <min.li@zilliz.com>
2024-01-16 14:32:53 +08:00
congqixia 1dbc2ab8ee
enhance: [Cherry-pick] make compactor use actual buffer size to decide when to sync(#29945) (#29971)
Cherry-pick from master
pr: #29945
See also: #29657

The DataNode compactor uses an estimated row number from the schema to
decide when to sync a batch of data while executing compaction. This
estimate can drift far from the actual size when the schema contains
variable-length fields (say VarChar, JSON, etc.)

This PR makes the compactor check the actual buffer data size, so that it
can sync when the buffer is actually beyond the max binlog size.
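
A minimal sketch contrasting the two sync decisions, not the compactor's real code: the `buffer` type, the helper names, and the sizes used are all hypothetical. With variable-length rows, the row-count estimate can say "not yet" while the buffered bytes already exceed the limit.

```go
package main

import "fmt"

// buffer is a hypothetical write buffer that tracks both the row count and
// the number of bytes actually appended.
type buffer struct {
	rows  int64
	bytes int64
}

func (b *buffer) appendRow(row []byte) {
	b.rows++
	b.bytes += int64(len(row))
}

// shouldSyncByEstimate syncs once the row count reaches a maximum derived
// from an estimated per-row size (the behaviour described as the old one).
func shouldSyncByEstimate(b *buffer, estRowSize, maxBinlogSize int64) bool {
	maxRows := maxBinlogSize / estRowSize
	return b.rows >= maxRows
}

// shouldSyncByActual syncs once the bytes actually buffered reach the limit.
func shouldSyncByActual(b *buffer, maxBinlogSize int64) bool {
	return b.bytes >= maxBinlogSize
}

func main() {
	b := &buffer{}
	// Two large variable-length rows, far bigger than the 100-byte estimate.
	b.appendRow(make([]byte, 600))
	b.appendRow(make([]byte, 600))
	fmt.Println(shouldSyncByEstimate(b, 100, 1000)) // false: only 2 of 10 estimated rows
	fmt.Println(shouldSyncByActual(b, 1000))        // true: 1200 bytes buffered
}
```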

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-16 12:22:52 +08:00
congqixia 7fc7e1a0d5
enhance: [Cherry-pick] Use newer checkpoint when packing LoadSegmentRequest (#29922) (#29978)
Cherry-pick from master
pr: #29922 
See also: #29650

Either the segment dml position or the channel checkpoint could be newer
in some cases. This PR makes PackLoadSegments use the newer one, improving
load performance in cases where there are lots of upserts.
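
A minimal sketch of picking the newer of two positions, assuming positions can be compared by a timestamp field. The `position` type and `newerPosition` helper are hypothetical and are not the types used by PackLoadSegments.

```go
package main

import "fmt"

// position is a hypothetical, simplified message-stream position.
type position struct {
	Channel   string
	Timestamp uint64
}

// newerPosition returns whichever position has the larger timestamp.
// A nil position is treated as older than any non-nil one.
func newerPosition(segmentDML, channelCheckpoint *position) *position {
	if segmentDML == nil {
		return channelCheckpoint
	}
	if channelCheckpoint == nil {
		return segmentDML
	}
	if channelCheckpoint.Timestamp > segmentDML.Timestamp {
		return channelCheckpoint
	}
	return segmentDML
}

func main() {
	dml := &position{Channel: "ch-0", Timestamp: 100}
	cp := &position{Channel: "ch-0", Timestamp: 250}
	fmt.Println(newerPosition(dml, cp).Timestamp) // 250
}
```

Starting consumption from the newer position means fewer already-applied messages have to be replayed when the segment is loaded.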

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-16 12:08:53 +08:00
wei liu 81fdb6f472
enhance: Skip generate load segment task (#29724) (#29982)
issue: #29814
pr: #29724
if the channel is not subscribed yet, the generated load segment task will
be removed from the task scheduler, because the load segment task needs to
be transferred to the worker node by the shard leader.

This PR skips generating the load segment task when the channel is not
subscribed yet.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-01-16 10:12:52 +08:00
chyezh df9b3376dc
fix: Use determined order to lock in BlockAll to avoid deadlock (#29972)
issue: #29104
pr: #29246

Signed-off-by: chyezh <ye.zhen@zilliz.com>
2024-01-15 14:32:51 +08:00
chyezh 072b11355d
fix: SealedIndexingEntry in SealedIndexingRecord may leak without smart pointer protected (#29966)
possibly related issue: #29828
pr: #29932

Signed-off-by: chyezh <ye.zhen@zilliz.com>
2024-01-15 10:30:52 +08:00
cai.zhang 434ac1f6d0
fix: [Pick]Fix error message for indexing (#29906)
issue: #29897 

master pr: #29898

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-01-14 13:30:52 +08:00
chyezh c8e3a48214
fix: querynode num entity metric is broken by illegal label (#29949)
issue: #29766
also see pr: #29825
pr: #29948

Signed-off-by: chyezh <ye.zhen@zilliz.com>
2024-01-14 10:22:59 +08:00
congqixia 227071a754
enhance: [cherry-pick] reduce delete detail log to delete range (#29916) (#29930)
Cherry-pick from master
pr: #29916
The delete detail log becomes large and hard to read when the log level is
debug. This PR changes the log to use a stringer and print only the pk
range and number.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-12 20:18:51 +08:00
congqixia c21229b7bb
enhance: [cherry-pick] add trace span for wait tsafe (#29911) (#29929)
Cherry-pick from master
pr: #29911 
Add tracing span for search/query operation waiting tsafe duration

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-12 20:17:01 +08:00
aoiasd 128f197797
enhance: [Cherry-Pick] support access log print cluster prefix (#29646) (#29831)
relate: https://github.com/milvus-io/milvus/issues/29645
pr: https://github.com/milvus-io/milvus/pull/29646

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-01-12 18:58:52 +08:00
wei liu 86cddd24b5
enhance: Add ctx for load index logs (#29686) (#29905)
pr: #29686
This PR adds ctx to load index logs

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-01-12 18:56:58 +08:00
SimFG d573f0ec1a
fix: [2.3] the delete msg disorder issue (#29917)
/kind improvement
pr: #29915

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-01-12 18:04:50 +08:00
wayblink e1446da83c
feat: [Cherry-pick] Implement DescribeAlias and ListAliases interfaces (#29896)
#22882
pr: #29641

Signed-off-by: wayblink <anyang.wang@zilliz.com>
2024-01-12 16:30:51 +08:00
congqixia c56622dea7
enhance: move confusing warning log to error branch (#29891)
`flushInsertData` & `flushDeleteData` print a WARNING log even when no
error is returned. This change moves the log into the error branch of the
if block.
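
A minimal sketch of the pattern, not the actual `flushInsertData` code: `flushAndLog`, `errFlush`, and the `warnf` callback are hypothetical names. The warning is emitted only inside the error branch, so a successful flush logs nothing.

```go
package main

import (
	"errors"
	"fmt"
)

// errFlush is a hypothetical sentinel error used for the demonstration.
var errFlush = errors.New("flush failed")

// flushAndLog runs flush and emits a warning only when it returns an error.
func flushAndLog(flush func() error, warnf func(format string, args ...interface{})) error {
	if err := flush(); err != nil {
		warnf("failed to flush data: %v", err)
		return err
	}
	return nil
}

func main() {
	warnf := func(format string, args ...interface{}) {
		fmt.Printf("WARN "+format+"\n", args...)
	}
	_ = flushAndLog(func() error { return nil }, warnf)      // logs nothing
	_ = flushAndLog(func() error { return errFlush }, warnf) // logs one warning
}
```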

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-12 15:50:52 +08:00
wei liu 16e7f51033
fix: Dynamic update rate limit config with wrong value (#29902)
pr: #29901 
when applying dynamic config changes, we should convert the value to the
proper unit.
This PR fixes the rate limit config being updated with a wrong value.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-01-12 15:10:51 +08:00
chyezh 98aae10273
fix: compact operation on datacoord meta should preform as a transcation (#29776)
issue: #29691
pr: #29775

Signed-off-by: chyezh <ye.zhen@zilliz.com>
2024-01-12 14:54:52 +08:00
chyezh 7d3ec9f869
fix: unhealthy datacoord started with unhealthy channel manager (#29849)
issue: #29818
pr: #29848

Signed-off-by: chyezh <ye.zhen@zilliz.com>
2024-01-12 14:24:54 +08:00
wei liu 5520bfbb05
enhance: Change some frequency log to rated level (#29720) (#29903)
pr: #29720
This PR changes some frequent logs to rated level

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-01-12 11:46:52 +08:00
yah01 4edcd4d22b
fix: the insert count is zero after set the pointer to nil (#29870) (#29881)
as a result, the EntitiesNum metric would never be reduced

fix: #29766
pr: #29870

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-01-12 10:20:51 +08:00
chyezh f0db26107c
fix: panic caused by type assert LocalSegment on Segment (#29018) (#29900)
- Make implementation of LocalWorker and RemoteWorker same.

issue: #29017, #29899
pr: #29018

Signed-off-by: yah01 <yah2er0ne@outlook.com>
Co-authored-by: yah01 <yah2er0ne@outlook.com>
2024-01-12 10:08:50 +08:00
jaime c0b711e9fb
enhance: Support read hardware metrics for cgroupv2 (#29847)
issue: #29846
pr: #29850

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-01-11 19:20:57 +08:00
congqixia 00c0a5a2ab
enhance: [Cherry-pick] make Load process traceable in querycoord (#29806) (#29869)
Cherry-pick from master
pr: #29806
See also #29803

This PR:
- Add trace span for collection/partition load
- Use TraceSpan to generate Segment/ChannelTasks when loading
- Refine BaseTask trace tag usage

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-11 18:00:52 +08:00
congqixia cd93954214
enhance: [Cherry-pick] pre-allocate result FieldData space to reduce growslice (#29726) (#29866)
Cherry-pick from master
pr: #29726

See also: #29113

Add a new utility function in `pkg/util/typeutil` to pre-allocate field
data slice capacity according to the search limit. This avoids copying
the data during `AppendFieldData` when the previous slice is out of space,
and also saves CPU time under high payload.
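
A minimal sketch of the pre-allocation idea, assuming an int64 result column. `prepareInt64Column` is a hypothetical helper, not the function added to the utility package. Reserving capacity up front means later appends never trigger a grow-and-copy.

```go
package main

import "fmt"

// prepareInt64Column returns an empty slice whose capacity already covers
// the expected number of results (for example, the search limit).
func prepareInt64Column(limit int) []int64 {
	return make([]int64, 0, limit)
}

func main() {
	const limit = 1000
	col := prepareInt64Column(limit)
	before := cap(col)
	for i := 0; i < limit; i++ {
		col = append(col, int64(i)) // stays within the reserved capacity
	}
	// The capacity is unchanged, so append never had to reallocate.
	fmt.Println(len(col), cap(col) == before)
}
```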

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-11 17:59:01 +08:00
wei liu 603cd1fb3f
fix: Drop segment meta info with prefix (#29857)
pr: #29856
If a segment has more than 128 log files, dropping the segment exceeds the
etcd txn ops limit, which makes the drop segment request fail.
This PR drops segment meta info by prefix, to avoid failures when dropping
segment meta

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-01-11 15:02:50 +08:00
zhenshan.cao 7cf2be09b5
fix: Restore the MVCC functionality. (#29749) (#29802)
When the TimeTravel functionality was previously removed, it
inadvertently affected the MVCC functionality within the system. This PR
aims to reintroduce the internal MVCC functionality as follows:

1. Add MvccTimestamp to the requests of Search/Query and the results of
Search internally.
2. When the delegator receives a Query/Search request and there is no
MVCC timestamp set in the request, set the delegator's current tsafe as
the MVCC timestamp of the request. If the request already has an MVCC
timestamp, do not modify it.
3. When the Proxy handles Search and triggers the second phase ReQuery,
divide the ReQuery into different shards and pass the MVCC timestamp to
the corresponding Query requests.
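
Step 2 can be sketched as follows. This is an illustration under stated assumptions, not the delegator's code: `queryRequest` and `ensureMvccTimestamp` are hypothetical names, and a zero value is assumed to mean "no MVCC timestamp set".

```go
package main

import "fmt"

// queryRequest is a hypothetical, simplified Search/Query request.
type queryRequest struct {
	MvccTimestamp uint64 // assumed: 0 means "not set"
}

// ensureMvccTimestamp fills in the delegator's current tsafe only when the
// request does not already carry an MVCC timestamp.
func ensureMvccTimestamp(req *queryRequest, tsafe uint64) {
	if req.MvccTimestamp == 0 {
		req.MvccTimestamp = tsafe
	}
}

func main() {
	first := &queryRequest{}
	ensureMvccTimestamp(first, 100)
	fmt.Println(first.MvccTimestamp) // 100: taken from tsafe

	// A requery that already carries the timestamp keeps it unchanged.
	requery := &queryRequest{MvccTimestamp: first.MvccTimestamp}
	ensureMvccTimestamp(requery, 200)
	fmt.Println(requery.MvccTimestamp) // 100
}
```

Carrying the timestamp from the first phase into the requery lets both phases read the same snapshot, even if tsafe has advanced in between.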

issue: #29656
pr: #29749

Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2024-01-11 14:42:49 +08:00
yah01 e7e4561da8
fix: the entities num metric may be contributed more than once (#29767) (#29825)
the growing segments contribute to this metric while inserting and
putting into the manager, but the current impl inserts data before
putting the segments into manager, which leads to double contributions

fix: #29766
pr: #29767

Signed-off-by: yah01 <yah2er0ne@outlook.com>
2024-01-11 10:24:51 +08:00
XuanYang-cn 1128b1dd67
fix: [cherry-pick]Save lite WatchInfo into etcd in DataNode (#29751)
See also: #29689
pr: #29687

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-01-10 20:48:50 +08:00
congqixia 6c9a5e347e
fix: [cherry-pick] Assertion all async invocations in test case (#29737) (#29782)
Cherry-pick from master
pr: #29737
Resolves: #29736

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-09 17:48:49 +08:00
yah01 38c61594c0
enhance: use GPU pool for gpu tasks (#29678) (#29706)
- this greatly improves the performance for GPU index
- this also saves one copy while parsing index meta
pr: #29678

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-01-09 14:04:48 +08:00
zhenshan.cao 8c2ca3fb79
feat: Authorize users to query grant info of their roles (#29747) (#29762)
Once a role is granted to a user, the user should automatically possess
the privilege information associated with that role.

issue: #29710
pr: #29747

Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2024-01-08 18:06:49 +08:00
congqixia 0c83440f99
enhance: [Cherry-pick] cache collection schema attributes to reduce proxy cpu (#29668) (#29692)
Cherry-pick from master
pr: #29668

See also #29113

The collection schema is crucial when performing search/query but some
of the information is calculated for every request.

This PR changes the schema field of the cached collection info into a
utility `schemaInfo` type that stores some stable results, say pk field,
partitionKeyEnabled, etc., and provides a field name to id map for
search/query services.
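
A minimal sketch of the caching idea, assuming a simplified schema. The `fieldSchema` struct and the shape of `schemaInfo` below are hypothetical and do not mirror the proxy's real type. The derived values are computed once at construction and then only read per request.

```go
package main

import "fmt"

// fieldSchema is a hypothetical, simplified field description.
type fieldSchema struct {
	ID             int64
	Name           string
	IsPrimaryKey   bool
	IsPartitionKey bool
}

// schemaInfo wraps the fields together with results that are stable for the
// lifetime of the cached schema.
type schemaInfo struct {
	fields              []fieldSchema
	fieldIDs            map[string]int64
	pkIndex             int // -1 when no primary key field exists
	partitionKeyEnabled bool
}

// newSchemaInfo computes the derived values once.
func newSchemaInfo(fields []fieldSchema) *schemaInfo {
	info := &schemaInfo{
		fields:   fields,
		fieldIDs: make(map[string]int64, len(fields)),
		pkIndex:  -1,
	}
	for i, f := range fields {
		info.fieldIDs[f.Name] = f.ID
		if f.IsPrimaryKey {
			info.pkIndex = i
		}
		if f.IsPartitionKey {
			info.partitionKeyEnabled = true
		}
	}
	return info
}

// fieldID is a map lookup instead of a per-request scan over all fields.
func (s *schemaInfo) fieldID(name string) (int64, bool) {
	id, ok := s.fieldIDs[name]
	return id, ok
}

// pkField returns the cached primary key field, if any.
func (s *schemaInfo) pkField() (fieldSchema, bool) {
	if s.pkIndex < 0 {
		return fieldSchema{}, false
	}
	return s.fields[s.pkIndex], true
}

func main() {
	info := newSchemaInfo([]fieldSchema{
		{ID: 100, Name: "id", IsPrimaryKey: true},
		{ID: 101, Name: "tenant", IsPartitionKey: true},
		{ID: 102, Name: "vector"},
	})
	pk, _ := info.pkField()
	id, _ := info.fieldID("vector")
	fmt.Println(pk.Name, id, info.partitionKeyEnabled)
}
```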

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-07 22:36:48 +08:00