This PR also fixes a bug in the L0 compactor where
L0 results were never removed from the datanode.
See also: #30099
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
Don't store logPath in meta, to reduce memory usage. When a service gets
segment info, generate the log path from the log ID.
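A minimal sketch of rebuilding a path from IDs on demand instead of keeping the full string in meta. The function name, parameters, and the path layout are assumptions for illustration, not necessarily what this change implements:

```go
package main

import (
	"fmt"
	"path"
)

// binlogPath rebuilds a binlog object key from IDs, so only the numeric
// log ID has to be kept in metadata. The layout is hypothetical.
func binlogPath(root string, collectionID, partitionID, segmentID, fieldID, logID int64) string {
	return path.Join(root, "insert_log",
		fmt.Sprint(collectionID), fmt.Sprint(partitionID),
		fmt.Sprint(segmentID), fmt.Sprint(fieldID), fmt.Sprint(logID))
}

func main() {
	fmt.Println(binlogPath("files", 1, 2, 3, 100, 42))
}
```

Storing one int64 per log instead of a full path string is where the memory saving would come from in this sketch.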
#28885
Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
issue: #30000
related to: [milvus-proto
#202](https://github.com/milvus-io/milvus-proto/pull/202)
1. replace collSchema.AutoID with primaryField.AutoID
2. show `enableDynamic` & `enableDynamicField` at the same time
3. avoid a data race when accessing the metacache
Signed-off-by: PowderLi <min.li@zilliz.com>
issue: #29841
This PR fixes the leader checker using the wrong check interval, which
caused the leader checker to trigger too frequently.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #29841
If a segment is already loaded, submitting a load segment task for it
isn't permitted, to avoid loading the segment twice. But this logic
blocks the leader checker from correcting the leader view by `LoadSegment`.
This PR removes the segment loaded check, so that the leader checker
can submit load tasks.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
This avoids a corner case: after loading an index fails, the index can
never be loaded again because it has already been added to the segment's
index map.
Signed-off-by: yah01 <yang.cen@zilliz.com>
issue: #29677 #29838
During get shard leaders, if a querynode doesn't ack the heartbeat for
more than 10s, querycoord treats it as unavailable and won't return the
shard leader on it. But when a querynode has full CPU usage, it can
easily get stuck for more than 10s without acking the heartbeat, which
leaves no shard leader to serve search/query.
This PR removes the heartbeat lag logic during get shard leaders.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
The recent changes moved the level 0 segments list to a new proto field,
so the QueryCoord can't see the level 0 segments. This PR handles the
new field.
fix #29907
Signed-off-by: yah01 <yang.cen@zilliz.com>
When applying dynamic config changes, we should format the value to the
proper unit.
This PR fixes rate limit config updates being applied with the wrong value.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
See also: #29650
Either the segment dml position or the channel checkpoint could be newer
in some cases. This PR makes PackLoadSegments use the newer one, improving
load performance in cases where there are lots of upserts.
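A minimal sketch of picking the newer of two positions by timestamp. The `position` type and `newerPosition` helper are hypothetical stand-ins, not the types used by PackLoadSegments:

```go
package main

import "fmt"

// position is a hypothetical stand-in for a message stream position.
type position struct {
	Channel   string
	Timestamp uint64
}

// newerPosition returns whichever position has the larger timestamp.
// A nil position is treated as older than any non-nil one.
func newerPosition(a, b *position) *position {
	if a == nil {
		return b
	}
	if b == nil {
		return a
	}
	if b.Timestamp > a.Timestamp {
		return b
	}
	return a
}

func main() {
	dml := &position{Channel: "ch-0", Timestamp: 100}
	checkpoint := &position{Channel: "ch-0", Timestamp: 250}
	fmt.Println(newerPosition(dml, checkpoint).Timestamp)
}
```

Starting from the newer position means less of the channel has to be replayed, which is presumably where the load improvement comes from.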
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
See also: #29657
The Datanode Compactor uses an estimated row number from the schema to
decide when to sync the batch of data when executing compaction. This
estimate can be far from the actual size when the schema contains
variable-length fields (say VarChar, JSON, etc.)
This PR makes the compactor able to check the actual buffer data size, so
it can sync when the buffer is actually beyond the max binlog
size.
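A minimal sketch of a sync decision that checks actual bytes in addition to the row estimate. The `compactBuffer` type and its methods are hypothetical, for illustration only:

```go
package main

import "fmt"

// compactBuffer is a hypothetical write buffer used during compaction.
type compactBuffer struct {
	rows      int   // rows buffered so far
	sizeBytes int64 // actual bytes buffered so far
}

// add records one row of the given encoded size.
func (b *compactBuffer) add(rowSize int64) {
	b.rows++
	b.sizeBytes += rowSize
}

// shouldSync reports whether the buffer should be flushed: either the
// schema-based row estimate is reached, or the actual buffered size has
// reached maxBinlogSize.
func (b *compactBuffer) shouldSync(estimatedRows int, maxBinlogSize int64) bool {
	return b.rows >= estimatedRows || b.sizeBytes >= maxBinlogSize
}

func main() {
	buf := &compactBuffer{}
	// Three large variable-length rows: far fewer rows than the estimate,
	// but already past a 1 KiB size limit.
	for i := 0; i < 3; i++ {
		buf.add(512)
	}
	fmt.Println(buf.shouldSync(1000, 1024))
}
```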
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #29814
If a channel is not subscribed yet, the generated load segment task will
be removed from the task scheduler, because the load segment task needs
to be transferred to the worker node by the shard leader.
This PR skips generating load segment tasks when the channel is not
subscribed yet.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
The delete detail log is large and hard to read when the log level is
debug. This PR changes the log to use a stringer and print only the pk
range and count.
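A minimal sketch of the stringer approach, assuming int64 primary keys. The `deleteRecord` type and the output format are hypothetical:

```go
package main

import "fmt"

// deleteRecord is a hypothetical batch of deleted int64 primary keys.
type deleteRecord struct {
	pks []int64
}

// String implements fmt.Stringer and prints only the count and the pk
// range, so the full pk list is never written to the log.
func (d deleteRecord) String() string {
	if len(d.pks) == 0 {
		return "deleteRecord{count=0}"
	}
	lo, hi := d.pks[0], d.pks[0]
	for _, pk := range d.pks[1:] {
		if pk < lo {
			lo = pk
		}
		if pk > hi {
			hi = pk
		}
	}
	return fmt.Sprintf("deleteRecord{count=%d, pkRange=[%d, %d]}", len(d.pks), lo, hi)
}

func main() {
	fmt.Println(deleteRecord{pks: []int64{7, 3, 42}})
}
```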
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
If a segment has more than 128 log files, drop segment will exceed the
etcd txn ops limit, which makes the drop segment request fail.
This PR drops segment meta info by prefix, to avoid the drop segment meta
failure.
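A minimal sketch of why a prefix removal sidesteps the per-transaction op limit. `fakeKV` is an in-memory stand-in, not the etcd client, and the key layout is hypothetical:

```go
package main

import (
	"fmt"
	"strings"
)

const maxTxnOps = 128 // mirrors the limit mentioned above

// fakeKV is an in-memory stand-in for a metadata store.
type fakeKV struct{ data map[string]string }

// newSegmentKV fills a store with n keys under segmentPrefix.
func newSegmentKV(segmentPrefix string, n int) (*fakeKV, []string) {
	kv := &fakeKV{data: make(map[string]string, n)}
	keys := make([]string, 0, n)
	for i := 0; i < n; i++ {
		k := fmt.Sprintf("%s/binlog/%d", segmentPrefix, i)
		kv.data[k] = "meta"
		keys = append(keys, k)
	}
	return kv, keys
}

// multiRemove deletes keys with one op per key and fails, without deleting
// anything, if the transaction would be too large.
func (kv *fakeKV) multiRemove(keys []string) error {
	if len(keys) > maxTxnOps {
		return fmt.Errorf("too many operations in txn: %d > %d", len(keys), maxTxnOps)
	}
	for _, k := range keys {
		delete(kv.data, k)
	}
	return nil
}

// removeWithPrefix deletes every key under prefix as a single operation.
// The prefix should end with a separator so "segment/1/" does not also
// match "segment/10/".
func (kv *fakeKV) removeWithPrefix(prefix string) {
	for k := range kv.data {
		if strings.HasPrefix(k, prefix) {
			delete(kv.data, k)
		}
	}
}

func main() {
	kv, keys := newSegmentKV("segment/1", 200)
	fmt.Println("per-key txn error:", kv.multiRemove(keys))
	kv.removeWithPrefix("segment/1/")
	fmt.Println("keys left after prefix removal:", len(kv.data))
}
```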
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #29793
Use `DocSetCollector` instead of `TopDocsCollector`, which will avoid
scoring and sorting.
---------
Signed-off-by: longjiquan <jiquan.long@zilliz.com>
See also #29803
This PR:
- Add trace span for `LoadIndex` & `LoadFieldData` in segment loader
- Add `TraceCtx` parameter for `Index.Load` in segcore
- Add span for ReadFiles & Engine Load for Memory/Disk Vector index
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
See also: #27675
The bloom filter set initialized each new BF with a fixed configured `n`.
This value is always larger than the actual batch size, which makes the
generated BF use more memory.
This PR makes the write buffer initialize the BF with an estimated batch
size derived from the schema & configuration value.
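To illustrate why `n` matters, here is a sketch using the standard optimal Bloom filter sizing formula m = -n * ln(p) / (ln 2)^2. The false-positive rate and both `n` values are made-up numbers, not the project's configuration:

```go
package main

import (
	"fmt"
	"math"
)

// bloomBits returns the optimal number of bits for a Bloom filter holding
// n items at false-positive rate p: m = -n * ln(p) / (ln 2)^2.
func bloomBits(n uint, p float64) uint {
	return uint(math.Ceil(-float64(n) * math.Log(p) / (math.Ln2 * math.Ln2)))
}

func main() {
	const fpRate = 0.005       // assumed false-positive rate
	configured := uint(100000) // assumed fixed configured n
	estimated := uint(8000)    // assumed batch size estimated from the schema

	fmt.Printf("configured n=%d: %d bytes\n", configured, bloomBits(configured, fpRate)/8)
	fmt.Printf("estimated  n=%d: %d bytes\n", estimated, bloomBits(estimated, fpRate)/8)
}
```

Since the bit count is linear in `n`, sizing the filter for the estimated batch rather than a large fixed value shrinks it proportionally.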
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
The growing segments contribute to this metric both while inserting and
when being put into the manager, but the current impl inserts data before
putting the segments into the manager, which leads to double contributions.
fix: #29766
Signed-off-by: yah01 <yah2er0ne@outlook.com>
See also #29803
This PR:
- Add trace span for collection/partition load
- Use TraceSpan to generate Segment/ChannelTasks when loading
- Refine BaseTask trace tag usage
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
See also: #29113
Add a new utility function in `pkg/util/typeutil` to pre-allocate field
data slice capacity according to the search limit. This avoids copying
the data during `AppendFieldData` when the previous slice is out of space,
and also saves CPU time under high payload.
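A minimal sketch of the pre-allocation idea for a single int64 column. `prepareInt64Results` is a hypothetical name, not the helper added by this PR:

```go
package main

import "fmt"

// prepareInt64Results returns an empty slice whose capacity already covers
// `limit` results, so appends up to that count never reallocate or copy.
func prepareInt64Results(limit int64) []int64 {
	return make([]int64, 0, limit)
}

func main() {
	const limit = 1000
	data := prepareInt64Results(limit)
	before := cap(data)
	for i := int64(0); i < limit; i++ {
		data = append(data, i)
	}
	// The capacity is unchanged, so the backing array was never regrown.
	fmt.Println(cap(data) == before)
}
```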
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
fix: #29757
In the previous code, `ColumnBasedInsertMsgToInsertData` adds an empty
field if the insertMsg parameter does not have the column defined in the
schema. This may lead to unexpected behavior in caller functions.
This PR:
- Add column missing check
- Add column length check
- Generate BlobInfo for ColumnBasedInsertMsgToInsertData result
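The missing-column and length checks could look roughly like the sketch below. The types are simplified stand-ins (field name to values), not the real insert message types:

```go
package main

import "fmt"

// validateColumns checks that every schema field has a column and that
// each column holds exactly numRows values.
func validateColumns(schemaFields []string, columns map[string][]int64, numRows int) error {
	for _, name := range schemaFields {
		col, ok := columns[name]
		if !ok {
			return fmt.Errorf("column %q missing from insert message", name)
		}
		if len(col) != numRows {
			return fmt.Errorf("column %q has %d rows, expected %d", name, len(col), numRows)
		}
	}
	return nil
}

func main() {
	schema := []string{"pk", "age"}
	cols := map[string][]int64{"pk": {1, 2, 3}}
	fmt.Println(validateColumns(schema, cols, 3))
}
```

Returning an error instead of silently adding an empty field lets callers fail fast on malformed input.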
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
When the TimeTravel functionality was previously removed, it
inadvertently affected the MVCC functionality within the system. This PR
aims to reintroduce the internal MVCC functionality as follows:
1. Add MvccTimestamp to the requests of Search/Query and the results of
Search internally.
2. When the delegator receives a Query/Search request and there is no
MVCC timestamp set in the request, set the delegator's current tsafe as
the MVCC timestamp of the request. If the request already has an MVCC
timestamp, do not modify it.
3. When the Proxy handles Search and triggers the second phase ReQuery,
divide the ReQuery into different shards and pass the MVCC timestamp to
the corresponding Query requests.
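Step 2 and the timestamp hand-off in step 3 can be sketched as follows. The types are simplified stand-ins, and treating 0 as "not set" is an assumption of this sketch:

```go
package main

import "fmt"

// queryRequest is a simplified stand-in for an internal Search/Query request.
type queryRequest struct {
	MvccTimestamp uint64 // 0 means "not set" in this sketch
}

// delegator is a simplified stand-in holding the current tsafe.
type delegator struct {
	tsafe uint64
}

// ensureMvcc sets the request's MVCC timestamp to the delegator's current
// tsafe only when the request does not already carry one.
func (d *delegator) ensureMvcc(req *queryRequest) {
	if req.MvccTimestamp == 0 {
		req.MvccTimestamp = d.tsafe
	}
}

func main() {
	d := &delegator{tsafe: 500}

	search := &queryRequest{} // first phase: no timestamp yet
	d.ensureMvcc(search)

	d.tsafe = 700 // tsafe advances before the second phase
	requery := &queryRequest{MvccTimestamp: search.MvccTimestamp}
	d.ensureMvcc(requery) // keeps the timestamp from the first phase

	fmt.Println(search.MvccTimestamp, requery.MvccTimestamp)
}
```

Because the requery carries the first phase's timestamp, both phases read the same snapshot even though tsafe has moved on.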
issue: #29656
Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
Once a role is granted to a user, the user should automatically possess
the privilege information associated with that role.
issue: #29710
Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
See also #27349
The segment level label in querynode used `Legacy` before the segment
level was correctly passed in the Load request. This attribute is still
using legacy, so the metrics do not look right.
This PR adds a parameter for `NewSegment` and passes the correct value
for each invocation.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
This PR makes Milvus load delta logs concurrently, which should
decrease the latency of loading a segment.
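A minimal sketch of loading several logs concurrently with the standard library only. `loadAll` and the `load` callback are hypothetical, not the loader used in this PR:

```go
package main

import (
	"fmt"
	"sync"
)

// loadAll runs load for every path concurrently and returns the results in
// input order. If any load fails, it returns the error with the lowest index.
func loadAll(paths []string, load func(string) (int, error)) ([]int, error) {
	results := make([]int, len(paths))
	errs := make([]error, len(paths))
	var wg sync.WaitGroup
	for i, p := range paths {
		wg.Add(1)
		go func(i int, p string) {
			defer wg.Done()
			// Each goroutine writes only its own index, so no lock is needed.
			results[i], errs[i] = load(p)
		}(i, p)
	}
	wg.Wait()
	for _, err := range errs {
		if err != nil {
			return nil, err
		}
	}
	return results, nil
}

func main() {
	paths := []string{"delta/1", "delta/22", "delta/333"}
	sizes, err := loadAll(paths, func(p string) (int, error) {
		return len(p), nil // pretend the "loaded size" is the path length
	})
	fmt.Println(sizes, err)
}
```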
/kind improvement
---------
Signed-off-by: longjiquan <jiquan.long@zilliz.com>
related to: #29417
Cardinal indexes upload index files in the `Serialize` interface, and
throw an exception when `Serialize` fails.
Signed-off-by: xianliang <xianliang.li@zilliz.com>
issue: #29453
Syncing distribution by rpc also calls loadSegment/releaseSegment,
which may cause all kinds of concurrency cases on the same segment, such
as concurrent load and release on one segment.
This PR adds leader_checker, which generates load/release tasks to
correct the leader view instead of calling sync distribution by rpc.
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
related: #25324
Search GroupBy function, used to aggregate result entities based on a
specific scalar column.
Several points to mention:
1. Temporarily, the whole groupby is implemented separately from the
iterative expr framework **for the first period**
2. In the long term, the groupBy operation will be incorporated into the
iterative expr framework: https://github.com/milvus-io/milvus/pull/28166
3. This PR includes some unrelated mocked interfaces regarding alterIndex
for reasons not worth mentioning. All this unrelated content
will be removed before the final PR is merged. This version of the PR is
only for review
4. All other related details are commented in the files comparison
Signed-off-by: MrPresent-Han <chun.han@zilliz.com>
issue: https://github.com/milvus-io/milvus/issues/29230
This PR does the following:
1. adds GPU brute force;
2. limits GPU indexes to support only L2 / IP;
Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
issue: #29672
The storage account needs at least the privileges for the actions
`Microsoft.Storage/storageAccounts/blobServices/containers/blobs/*`
Signed-off-by: PowderLi <min.li@zilliz.com>
This PR defines the new import reader interfaces and implements a binlog
reader for import.
issue: https://github.com/milvus-io/milvus/issues/28521
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
See also #29699
Querycoord panicked when it tried to pop from an empty heap. We assume the
heap is never empty, but in some branches the candidate is never
pushed back.
This PR puts pop & push in a closure and adds a defer call to push the
item back.
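A minimal sketch of the pop-then-deferred-push pattern with `container/heap`. The `nodeItem` type, the `assign` function, and the priority update are hypothetical:

```go
package main

import (
	"container/heap"
	"fmt"
)

// nodeItem is a hypothetical candidate; lower priority pops first.
type nodeItem struct {
	id       int
	priority int
}

type nodeHeap []*nodeItem

func (h nodeHeap) Len() int            { return len(h) }
func (h nodeHeap) Less(i, j int) bool  { return h[i].priority < h[j].priority }
func (h nodeHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *nodeHeap) Push(x interface{}) { *h = append(*h, x.(*nodeItem)) }
func (h *nodeHeap) Pop() interface{} {
	old := *h
	n := len(old)
	item := old[n-1]
	*h = old[:n-1]
	return item
}

// newNodeHeap builds a heap with n candidates of priority 0.
func newNodeHeap(n int) *nodeHeap {
	h := &nodeHeap{}
	for i := 0; i < n; i++ {
		heap.Push(h, &nodeItem{id: i})
	}
	return h
}

// assign pops the best candidate, lets try decide whether to use it, and
// always pushes the candidate back via defer, whichever branch returns.
func assign(h *nodeHeap, try func(*nodeItem) bool) bool {
	if h.Len() == 0 {
		return false
	}
	item := heap.Pop(h).(*nodeItem)
	defer heap.Push(h, item)
	if !try(item) {
		return false // early return: the item is still pushed back
	}
	item.priority++ // assumed cost update after a successful assignment
	return true
}

func main() {
	h := newNodeHeap(2)
	for i := 0; i < 5; i++ {
		assign(h, func(*nodeItem) bool { return i%2 == 0 })
	}
	fmt.Println("heap size:", h.Len())
}
```

Because the push is deferred right after the pop, no later branch can forget to return the candidate to the heap.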
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
See also #27675
`Allocator.Alloc` and `Allocator.AllocOne` might be invoked multiple
times if there were multiple blobs set in one sync task.
This PR adds pre-fetch logic for all blobs and caches logIDs in the sync
task so that at most one call to the allocator is needed.
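A minimal sketch of the pre-fetch idea: count the blobs first, reserve one ID range, then hand out IDs locally. `countingAllocator` is a fake, and its `Alloc` signature is an assumption rather than the real allocator interface:

```go
package main

import "fmt"

// countingAllocator is a fake ID allocator that counts how often it is called.
type countingAllocator struct {
	next  int64
	calls int
}

// Alloc reserves `count` consecutive IDs and returns the half-open range
// [start, end).
func (a *countingAllocator) Alloc(count uint32) (start, end int64) {
	a.calls++
	start = a.next
	a.next += int64(count)
	return start, a.next
}

// assignLogIDs fetches IDs for all blobs with a single Alloc call and
// assigns them locally, instead of calling the allocator once per blob.
func assignLogIDs(a *countingAllocator, blobNames []string) map[string]int64 {
	ids := make(map[string]int64, len(blobNames))
	if len(blobNames) == 0 {
		return ids // nothing to allocate, so no allocator call at all
	}
	start, _ := a.Alloc(uint32(len(blobNames)))
	for i, name := range blobNames {
		ids[name] = start + int64(i)
	}
	return ids
}

func main() {
	alloc := &countingAllocator{next: 1000}
	ids := assignLogIDs(alloc, []string{"insert", "stats", "delta"})
	fmt.Println(len(ids), "ids from", alloc.calls, "allocator call(s)")
}
```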
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>