The array type can't be compacted. The system can continue with the
inserted segments, but these segments can never be compacted.
fix #29503
pr: #29505
---------
Signed-off-by: yah01 <yah2er0ne@outlook.com>
Cherry-pick from master
pr: #29474
See also #27515
When the Delegator processes delete data, it forwards the delete data
with only the segment ID specified. When two segments have the same
segment ID but one is growing and the other is sealed, the delete is
applied to both segments, which puts delete data out of order when
segments are loaded concurrently.
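
One way to picture the fix is to address a segment by ID and state
together rather than by ID alone. A minimal sketch, assuming
hypothetical `SegmentScope`, `segmentKey` and `deleteBuffer` types
(these are illustrations, not the types used in Milvus):

```go
package main

// SegmentScope is a hypothetical enum that tells growing and sealed
// segments apart.
type SegmentScope int

const (
	ScopeGrowing SegmentScope = iota
	ScopeSealed
)

// segmentKey identifies a segment by ID and scope, so a growing and a
// sealed segment that share an ID are addressed separately.
type segmentKey struct {
	ID    int64
	Scope SegmentScope
}

// deleteBuffer records which primary keys were deleted in which segment.
type deleteBuffer map[segmentKey][]int64

// Apply forwards a delete to exactly one (ID, scope) pair.
func (b deleteBuffer) Apply(id int64, scope SegmentScope, pks ...int64) {
	k := segmentKey{ID: id, Scope: scope}
	b[k] = append(b[k], pks...)
}
```

With this shape, a delete sent to sealed segment 100 leaves growing
segment 100 untouched.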
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Add all loading data into a buffer, then copy it into a memory region
sized to fit.
pr: https://github.com/milvus-io/milvus/pull/29387
---------
Signed-off-by: yah01 <yah2er0ne@outlook.com>
Cherry-pick from master
pr: #29315
See also #29113
- Unify partition info refresh logic
- Avoid parsing partition names for each partition key search request
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
cherry-pick from master
pr: #29343
See also #29332
The segment may be released before or during the request when the
delegator tries to forward a delete request to it. Currently, these two
situations return different error codes.
In this particular case, ErrSegmentNotLoaded and ErrSegmentNotFound
shall both be ignored, so that search service unavailable is not
returned by mistake.
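
A minimal sketch of the idea, assuming the two errors are plain
sentinel values (the real definitions live in Milvus and may differ),
with a hypothetical helper `ignoreReleased`:

```go
package main

import "errors"

// Hypothetical sentinel errors standing in for the two error codes
// named above.
var (
	ErrSegmentNotLoaded = errors.New("segment not loaded")
	ErrSegmentNotFound  = errors.New("segment not found")
)

// ignoreReleased returns nil when err only says that the target segment
// is already gone, and returns err unchanged otherwise.
func ignoreReleased(err error) error {
	if errors.Is(err, ErrSegmentNotLoaded) || errors.Is(err, ErrSegmentNotFound) {
		return nil
	}
	return err
}
```

Because `errors.Is` unwraps, this also covers errors that wrap either
sentinel with extra context.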
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Cherry pick from master
pr: #29328
See also #29327
Change channel checkpoint metrics to unix seconds instead of checkpoint
timestamp lag value
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
related to #28549
pr: #28626
1. Avoid syncing segments again while they are in the syncing state
2. Add jitter so that segments are not all synced at the same time
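
The jitter in item 2 can be sketched as below. `withJitter` and its
`ratio` parameter are hypothetical names for illustration, not the
values or helpers used in the PR:

```go
package main

import (
	"math/rand"
	"time"
)

// withJitter returns base plus a random extra delay in [0, base*ratio).
// ratio is a hypothetical tuning knob, e.g. 0.1 for up to 10% extra.
func withJitter(base time.Duration, ratio float64) time.Duration {
	maxExtra := int64(float64(base) * ratio)
	if maxExtra <= 0 {
		// Nothing to add for a zero or negative spread.
		return base
	}
	return base + time.Duration(rand.Int63n(maxExtra))
}
```

Each segment then waits `withJitter(interval, 0.1)` instead of
`interval`, which spreads the syncs over time.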
Signed-off-by: xiaofanluan <xiaofan.luan@zilliz.com>
Many growing segments may be created in a short time with no limit on
the process, and the CGO calls will leave many threads behind.
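
A common way to bound such calls is a counting semaphore. A minimal
sketch, assuming a hypothetical `limiter` helper (the PR may use a
different mechanism, such as a worker pool):

```go
package main

// limiter bounds how many callers may run a section at once. It is a
// hypothetical helper, not the pool Milvus uses.
type limiter chan struct{}

// newLimiter allows at most n concurrent calls to Do.
func newLimiter(n int) limiter { return make(limiter, n) }

// Do blocks until a slot is free, runs fn, then frees the slot.
func (l limiter) Do(fn func()) {
	l <- struct{}{}
	defer func() { <-l }()
	fn()
}
```

Wrapping each CGO call in `Do` caps the number of OS threads blocked in
C code at `n`.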
related: https://github.com/milvus-io/milvus/issues/29282
pr: #29306
Signed-off-by: yah01 <yah2er0ne@outlook.com>
issue: #23726
pr: #29231
This PR adds a control config for querycoord's background auto balance
channel operation
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #29243
Only rootCoord reads the configuration item `builtinRoles`, so the proxy
never knows whether the role to be deleted is builtin.
Signed-off-by: PowderLi <min.li@zilliz.com>
issue: #29004
master pr: #29055
Add a new parameter, `params`, which is a map[string]float64.
For now it has only 2 valid items: radius and range_filter.
Signed-off-by: PowderLi <min.li@zilliz.com>
See also #29156
FlushTs needs to be reset to MaxUint64 once the channel checkpoint has
passed this timestamp. Otherwise, the segment will be shattered and the
flush queue will be filled with tasks.
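
A minimal sketch of the reset rule, assuming a hypothetical
`segmentFlushState` struct where MaxUint64 means that no flush is
pending:

```go
package main

import "math"

// segmentFlushState is a hypothetical stand-in for per-channel flush
// bookkeeping. flushTs == math.MaxUint64 means "no flush requested".
type segmentFlushState struct {
	flushTs uint64
}

// onCheckpoint clears flushTs once the channel checkpoint has passed it
// and reports whether a reset happened.
func (s *segmentFlushState) onCheckpoint(checkpointTs uint64) bool {
	if s.flushTs != math.MaxUint64 && checkpointTs > s.flushTs {
		s.flushTs = math.MaxUint64
		return true
	}
	return false
}
```

Without the reset, every later check would still see a flush timestamp
in the past and enqueue another flush task.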
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #28622
pr: #29216
After we supported balancing segments by growing segment count (#28623),
if we balance segments and channels at the same time, some segments need
to be rebalanced after the channel balance finishes.
This PR skips segment balancing when a channel needs to be balanced.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #23726
pr: #28469
1. enable auto balance channel between nodes in querycoord
2. make `genSegmentPlan` reuse the `AssignSegment` logic
3. make `genChannelPlan` reuse the `AssignChannel` logic
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Cherry-pick from master
pr: #29195
See also #29113
rand.Seed is deprecated and costs noticeable CPU time under heavy load
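
For reference, two standard ways to get random numbers without calling
`rand.Seed`. The helper names `newLocalRand` and `pick` are
hypothetical and the PR does not necessarily use either form:

```go
package main

import "math/rand"

// newLocalRand returns a generator with its own source. It is seeded
// once at construction, so callers never need rand.Seed.
// Note: a *rand.Rand built this way is not safe for concurrent use.
func newLocalRand(seed int64) *rand.Rand {
	return rand.New(rand.NewSource(seed))
}

// pick returns a pseudo-random index in [0, n) from the top-level
// generator, which Go 1.20+ seeds randomly at program start.
// n must be greater than zero.
func pick(n int) int {
	return rand.Intn(n)
}
```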
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
We found that the load occasionally got stuck, and reviewed the logs.
The target observer seems not to be working. The reason is that the
taskDispatcher removes the task in a goroutine, and modifies the task
status after committing the task into the goroutine pool, but this may
happen after the task is removed, so the task will never be removed.
related #29086
pr: #29191
Signed-off-by: yah01 <yang.cen@zilliz.com>
issue: #28622
pr: #28623
A query node with a delegator will have more rows than other query
nodes, because the delegator loads all growing rows.
This PR enables segment balancing based on the number of growing rows in
the leader view.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Cherry pick from master
pr: #29154
See also #29177
Add a config item for partition name as regexp feature and disable it by
default
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Cherry pick from master
pr: #29143
See also #29113
Using zap.Stringer log field will evaluate log field value only when log
level meets the configuration, which could save some CPU time in search
route
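
The effect can be shown without zap itself. The sketch below uses a
hypothetical `debugLog` stand-in for a leveled logger; with zap, the
field constructor `zap.Stringer(key, val)` defers the `String()` call
in a similar way:

```go
package main

import (
	"fmt"
	"strings"
)

// idList formats itself only when String is called.
type idList []int64

func (l idList) String() string {
	parts := make([]string, len(l))
	for i, id := range l {
		parts[i] = fmt.Sprint(id)
	}
	return strings.Join(parts, ",")
}

// debugLog is a hypothetical stand-in for a leveled logger: it calls
// v.String() only when debug logging is enabled.
func debugLog(enabled bool, msg string, v fmt.Stringer) string {
	if !enabled {
		return ""
	}
	return msg + " " + v.String()
}
```

When the level is disabled, the join over the ID list never runs, which
is where the CPU time is saved.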
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
This pull request enhances the logging functionality in the code related
to target updating. It adds more logs about the condition satisfying
when updating the target. The logs provide additional information about
the collection ID, replica number, channel readiness, segment readiness,
and leader view readiness. These logs will help in troubleshooting and
monitoring the target updating process.
pr: #29090
Signed-off-by: yah01 <yah2er0ne@outlook.com>
Signed-off-by: yah01 <yang.cen@zilliz.com>
issue: #29068
master pr: #29075
Wait for the server to listen on the HTTP port,
then check whether the various URLs can be accessed normally.
Signed-off-by: PowderLi <min.li@zilliz.com>
issue: [milvus-proto #212](https://github.com/milvus-io/milvus-proto/issues/212)
master pr: #28961
Milvus can't use partition-related privileges until milvus-proto is
upgraded, even though they were added to milvus-proto.
Signed-off-by: PowderLi <min.li@zilliz.com>
Cherry-pick from master
pr: #29117
See also #29113
This patch:
- Replace plain Enforcer with `casbin.SyncedEnforcer`
- Add implementation of persist.Adapter with `MetaCacheCasbinAdapter`
- Invoke enforcer.LoadPolicy when policy updated
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Cherry-pick from master
pr: #29058
See also #29057
Add a wrapper to maintain the client and connection.
When a reset operation is needed, the close method shall wait until all
ongoing requests return.
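
A minimal sketch of such a wrapper, assuming a read-write lock is used
to track in-flight requests. `clientWrapper`, `Call` and `errClosed`
are hypothetical names and the actual wrapper in the PR may be built
differently:

```go
package main

import (
	"errors"
	"sync"
)

var errClosed = errors.New("client closed")

// clientWrapper is a hypothetical wrapper: every request holds a read
// lock, so Close (which takes the write lock) waits for requests that
// are still in flight.
type clientWrapper struct {
	mu     sync.RWMutex
	closed bool
}

// Call runs fn while holding the read lock.
func (w *clientWrapper) Call(fn func() error) error {
	w.mu.RLock()
	defer w.mu.RUnlock()
	if w.closed {
		return errClosed
	}
	return fn()
}

// Close blocks until all in-flight Call invocations return.
func (w *clientWrapper) Close() {
	w.mu.Lock()
	defer w.mu.Unlock()
	w.closed = true
}
```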
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
In order to minimize the CPU usage of the coroutine and avoid frequent
execution of time-consuming operations in the flowgraph when the message
stream consists solely of "ttMsg," it is recommended to implement a
mechanism for quickly bypassing the subsequent flowgraph node processing
logic.
If "ttMsg" is continuously received for a certain period of time
(coldTime), the flowgraph enters skipMode. Once in skipMode, every
skipNum "ttMsg" messages are merged into one for processing. If a
non-"ttMsg" message is received while in skipMode, the flowgraph exits
skipMode.
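
The rule can be sketched as a small state machine. The `skipper` type
below is hypothetical and uses millisecond timestamps for simplicity.
It follows the description above, not the actual flowgraph code:

```go
package main

// skipper decides whether a flowgraph message should be processed.
// All names here are hypothetical; times are in milliseconds.
type skipper struct {
	coldTime   int64 // how long only ttMsg must be seen before skipping
	skipNum    int   // in skipMode, process one ttMsg per skipNum; must be > 0
	lastDataAt int64 // arrival time of the last non-ttMsg message
	skipMode   bool
	ttSeen     int // ttMsg count since entering skipMode
}

// shouldProcess reports whether the message arriving at time now should
// be passed on to the downstream nodes.
func (s *skipper) shouldProcess(isTtOnly bool, now int64) bool {
	if !isTtOnly {
		// Any real data exits skipMode immediately.
		s.skipMode = false
		s.lastDataAt = now
		return true
	}
	if !s.skipMode && now-s.lastDataAt >= s.coldTime {
		s.skipMode = true
		s.ttSeen = 0
	}
	if !s.skipMode {
		return true
	}
	s.ttSeen++
	return s.ttSeen%s.skipNum == 0
}
```

With `coldTime: 100` and `skipNum: 3`, the third ttMsg after entering
skipMode is processed and the two before it are skipped.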
pr: #28756
Signed-off-by: wayblink <anyang.wang@zilliz.com>
Co-authored-by: wayblink <anyang.wang@zilliz.com>
issue: #28960
master pr: #28961
Add a new configuration: builtinRoles
Users can define roles in the config file milvus.yaml.
For example:
- db_ro: only read privileges, including load
- db_rw: read and write privileges, including create/drop/rename
  collection
- db_admin: read and write privileges, plus user administration
Signed-off-by: PowderLi <min.li@zilliz.com>
issue: #28781 #28329
master pr: #28782
1. There is no need to call `DescribeCollection`, if the collection's
schema is found in the globalMetaCache
2. Call `GetProperties` to check access to Azure Blob Service
Signed-off-by: PowderLi <min.li@zilliz.com>
issue: https://github.com/milvus-io/milvus/issues/28496
/kind bug
pr: #28502
The input parameters collection.partitions and collection.Field are both
nil, so these two metas have not been cleared.
Signed-off-by: xige-16 <xi.ge@zilliz.com>
Co-authored-by: xige-16 <xi.ge@zilliz.com>
pr: #28891
This PR removes overly frequent logs for operations such as
`DescribeCollection`/`ShowPartition` in root coord
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #28683 master pr: #28505
issue: #28686 master pr: #28703
1. Update the base image milvusdb/milvus-env (#28505) to avoid
downloading installation packages in the CI workload: install vcpkg and
some packages in advance
2. Use the latest image
3. Update azure-identity-cpp from beta to release
Signed-off-by: PowderLi <min.li@zilliz.com>
The assert method in segcore can now accept a string message, but too
much code doesn't print the values it asserts on.
Make the asserts print those values.
related #28811
pr: #28812
Signed-off-by: yah01 <yah2er0ne@outlook.com>
See also #28924
A compaction task generated before the datanode finishes the
SaveBinlogPath grpc call contains segments which are still in Growing
state. DataNode shall verify each non-levelzero segment before
submitting the compaction task to the executor.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
pr: #28829
issue: #28831
Releasing the old delegator before the new delegator updates its
distribution may cause a `channel not available` error.
This PR blocks releasing the old delegator until the new delegator
finishes `syncDistribution`.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Cherry pick from master
pr: #28802
Now that segcore loads system field info as well, the growing segment
assertion can no longer pass with the "+ 2" value.
This causes all growing segments to fail to load.
Fix #28801
Related to #28478
See also #28524
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
This check rejects a load request when the pool runs out of workers.
But a small segment loads quickly, and the remaining segments are only
loaded again after a check interval, which slows down loading for the
collection.
Block the request in the go pool instead.
pr: #28518
Signed-off-by: yah01 <yah2er0ne@outlook.com>
Cherry-pick from master
pr: #28661
See also #28660
This PR adds a request timeout config item for etcd kv requests.
Sync the default timeout to the same value for the etcdKV & tikv config.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Cherry-pick from master
pr: #28601
See also #28022 #28034
The load segment request may arrive before the dml channel is watched,
so the index meta may be empty as well.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Each node in the flow graph allocates a goroutine, but the nodes are
actually executed sequentially and can be placed in one goroutine.
InputNode will consume msgs from the msgstream and allocates one
goroutine.
issue: #24826
pr: #28233
Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
Before this, Milvus used the container's/system's memory info to get the
memory usage, which could be inaccurate.
We allocate memory with private anonymous mmap,
so rss - shared gives the accurate memory usage.
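
On Linux, both values are available in `/proc/<pid>/statm`. A minimal
sketch of the arithmetic, assuming a hypothetical helper `usedBytes`
(how Milvus actually reads these numbers is not stated here):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// usedBytes computes (resident - shared) * pageSize from the contents
// of /proc/<pid>/statm, whose first three fields are size, resident and
// shared, all counted in pages.
func usedBytes(statm string, pageSize uint64) (uint64, error) {
	fields := strings.Fields(statm)
	if len(fields) < 3 {
		return 0, fmt.Errorf("unexpected statm content: %q", statm)
	}
	resident, err := strconv.ParseUint(fields[1], 10, 64)
	if err != nil {
		return 0, err
	}
	shared, err := strconv.ParseUint(fields[2], 10, 64)
	if err != nil {
		return 0, err
	}
	if shared > resident {
		return 0, fmt.Errorf("shared (%d) exceeds resident (%d)", shared, resident)
	}
	return (resident - shared) * pageSize, nil
}
```

In a real process the inputs would come from
`os.ReadFile("/proc/self/statm")` and `os.Getpagesize()`.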
resolve https://github.com/milvus-io/milvus/issues/28553
pr: #28554
---------
Signed-off-by: yah01 <yah2er0ne@outlook.com>
master pr: #28416
issue: #28365
Fix bug for parsing error when a string enclosed in single quotes in an
expression contains multiple double quotes.
such as:
```
expr = "tag == '\"blue\"'"
```
Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
create goroutine only once when getOrCreateMergedTimeTickerSender
pr: #28594
Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
Cherry-pick from master
pr: #28590 #28598
See also #28589 #28596
Increase ref for collection during load and unref after load completed.
Use the same protection logic as services.go `LoadSegments`
Perform `Unref` after release sealed segments
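
The ref/unref pairing can be sketched as below. `collectionRefs` and
`loadWithRef` are hypothetical names for illustration and the real
collection manager in querynode is more involved:

```go
package main

import "sync"

// collectionRefs is a hypothetical reference counter keyed by
// collection ID.
type collectionRefs struct {
	mu   sync.Mutex
	refs map[int64]int
}

func newCollectionRefs() *collectionRefs {
	return &collectionRefs{refs: make(map[int64]int)}
}

// Ref pins a collection so it is not released while a load is running.
func (c *collectionRefs) Ref(id int64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.refs[id]++
}

// Unref drops one reference and reports whether the collection is now
// unused.
func (c *collectionRefs) Unref(id int64) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.refs[id] == 0 {
		return true // nothing was pinned
	}
	c.refs[id]--
	if c.refs[id] == 0 {
		delete(c.refs, id)
		return true
	}
	return false
}

// loadWithRef runs load while holding a reference on the collection and
// releases the reference when load returns, even on error.
func (c *collectionRefs) loadWithRef(id int64, load func() error) error {
	c.Ref(id)
	defer c.Unref(id)
	return load()
}
```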
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #28332
pr: #28396
During querycoord's recovery, it tries to call `DescribeCollection` and
`ShowPartitions` on root coord, to check whether the collection or
partition has been released in rootcoord. But if rootcoord isn't ready
yet, the rpc fails and querycoord panics.
To fix this, we remove the rpc calls during querycoord's start.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
We first fill the data into a vector and then copy it into the proto.
For some types (excluding variable-length types and int8, int16),
data can be copied directly into the proto.
Sealed segment has been optimized in
https://github.com/milvus-io/milvus/pull/28106.
pr: #28323
Signed-off-by: yah01 <yah2er0ne@outlook.com>
Cherry-pick from master
pr: #28472
See also #28466
In `taskDispatcher.schedule`, the same task may be resubmitted if the
previous round did not finish.
In this case, TaskObserver.check may set the current target by mistake,
which may cause random search/query failures.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
/kind improvement
Before this, while retrieving data (query/search), we first copied the
data into a fixed vector, and then copied data from this into the proto
field.
Now we can directly copy the data into the proto field.
This optimization can't be done with int8 and int16 because proto
doesn't provide these two types; we store them as int32s.
Also, this can't be done with variable-length fields like string and
JSON, see https://github.com/protocolbuffers/protobuf/issues/10866. I
tried, but it seems proto doesn't guarantee the memory layout we
expected, and it crashed.
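
The int8/int16 limitation can be illustrated in a few lines. The actual
change is in segcore (C++); the Go sketch below uses hypothetical
helpers `widenInt8` and `copyInt64` only to show why one case needs a
per-element conversion and the other does not:

```go
package main

// widenInt8 copies int8 values into an []int32, the narrowest signed
// integer representation protobuf offers. The element sizes differ, so
// each value has to be converted individually.
func widenInt8(src []int8) []int32 {
	dst := make([]int32, len(src))
	for i, v := range src {
		dst[i] = int32(v)
	}
	return dst
}

// copyInt64 needs no conversion: source and destination share one
// layout, so the values can be copied in a single call.
func copyInt64(dst, src []int64) int {
	return copy(dst, src)
}
```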
pr: #28106
Signed-off-by: yah01 <yah2er0ne@outlook.com>
Remove the "failCount" log field, which is ambiguous.
Replace the status (int32) with a string, to improve the readability of
the task-removed log.
pr: #28331
Signed-off-by: yah01 <yah2er0ne@outlook.com>
cherry pick from master
pr: #28393
- Use explicit lifetime control methods: `Start` and `Stop`
- Allow controlling the retry option
- Make sure the tt sender worker has exited by the time `Stop` returns
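
The `Start`/`Stop` contract can be sketched as below, assuming a
hypothetical `worker` type with a ticker loop. It is an illustration of
the lifetime pattern, not the tt sender itself:

```go
package main

import (
	"sync"
	"time"
)

// worker is a hypothetical background sender with explicit lifetime
// control.
type worker struct {
	interval time.Duration
	tick     func()
	stopCh   chan struct{}
	wg       sync.WaitGroup
	stopOnce sync.Once
}

func newWorker(interval time.Duration, tick func()) *worker {
	return &worker{interval: interval, tick: tick, stopCh: make(chan struct{})}
}

// Start launches the worker goroutine. Call it once.
func (w *worker) Start() {
	w.wg.Add(1)
	go func() {
		defer w.wg.Done()
		t := time.NewTicker(w.interval)
		defer t.Stop()
		for {
			select {
			case <-w.stopCh:
				return
			case <-t.C:
				w.tick()
			}
		}
	}()
}

// Stop signals the goroutine and returns only after it has exited.
// Calling Stop more than once is safe.
func (w *worker) Stop() {
	w.stopOnce.Do(func() { close(w.stopCh) })
	w.wg.Wait()
}
```

Because `Stop` waits on the WaitGroup, no `tick` call can run after it
returns.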
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
The generated proto files are out of sync for the image build env.
This causes a --dirty="-dev" tag in the Milvus build version.
Sync the changed files to avoid this case.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>