- Use ModTime from os.FileInfo directly instead of custom getModTime
implementation.
- Removed test case for getModTime().
Signed-off-by: Gofastasf gofastasf@gmail.com
Signed-off-by: Go gofastasf@gmail.com
Signed-off-by: Gofastasf <gofastasf@gmail.com>
issue: #39680
If compaction or GC happens, loading a collection may get stuck on
SegmentNotFound. We should trigger UpdateNextTarget to get a new data
view and then execute the load operation.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #39681
This PR maintains the workload effect in the action instead of computing
it from the target, which may leak if the target changes.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Previous PR #39437 only printed a log and added the index while the load
operation was still executed. This PR returns early when the segment
decides not to load the PK index.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #38399
- Add a pchannel-level checkpoint for flush processing
- Refactor the recovery of the wal flushers
- Create a shared wal scanner first, then build multiple datasyncservice
instances on it
Signed-off-by: chyezh <chyezh@outlook.com>
Use string array for SealedSegmentIDs to prevent precision loss in JSON
parsers. Large integers (int64) may be incorrectly rounded when parsed
as double.
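A minimal sketch of the precision problem and the string-based workaround, assuming a made-up payload type `segmentView` and JSON key; only the idea of sending SealedSegmentIDs as strings comes from the message above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// segmentView is a hypothetical payload used only for illustration.
type segmentView struct {
	SealedSegmentIDs []string `json:"sealed_segment_ids"`
}

// idsToStrings renders int64 IDs as decimal strings so that parsers
// which read JSON numbers as doubles cannot round them.
func idsToStrings(ids []int64) []string {
	out := make([]string, 0, len(ids))
	for _, id := range ids {
		out = append(out, strconv.FormatInt(id, 10))
	}
	return out
}

// roundTripAsNumber shows the failure mode: encoding/json decodes a JSON
// number into interface{} as float64, which cannot hold every int64.
func roundTripAsNumber(id int64) int64 {
	raw, _ := json.Marshal(id)
	var v interface{}
	_ = json.Unmarshal(raw, &v)
	return int64(v.(float64))
}

func main() {
	const id int64 = 9007199254740993 // 2^53 + 1, not representable as a double

	fmt.Println("survives as number:", roundTripAsNumber(id) == id)

	b, _ := json.Marshal(segmentView{SealedSegmentIDs: idsToStrings([]int64{id})})
	fmt.Println(string(b))
}
```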
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
See: #39697
In sync operations, the type conversion from message to insert data
always results in a memory copy, which is unnecessary when the source
and target types are identical.
Signed-off-by: Ted Xu <ted.xu@zilliz.com>
enhance: Add schema update time verification for insert and upsert to
use cache
issue: https://github.com/milvus-io/milvus/issues/39093
---------
Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>
issue: #36621, #39417
1. Adjust the server-side cache size.
2. Add source information for configurations.
3. Add node ID for compaction and indexing tasks.
4. Resolve localhost access issues to fix health check failures for
etcd.
Signed-off-by: jaime <yun.zhang@zilliz.com>
issue: #38399
- A broadcast message can now carry multiple resource keys.
- Implement event-based notification for broadcast messages.
- Broadcast messages now use the broadcast id as the unique identifier
in the message.
- A broadcasted message on vchannels now keeps the broadcasted vchannel.
- Broadcasted messages and broadcast messages now share a common
broadcast header.
---------
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #38399
- Make the wal scanner interface the same as the streaming scanner.
- Use the wal if it is located at the current node.
- Otherwise, fall back to the old logic.
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #38399
- The stats may be kept after wal closing if the growing segment is not
dirty.
- Change the error handling of wal open to avoid redundant manager api
call.
Signed-off-by: chyezh <chyezh@outlook.com>
Related to #39596
When updating the build param configuration, the `Formatter` can be
used to do so and completely avoid touching the `overlay` config items
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
The dsl fields are separated into sub-request structs and could not be
easily printed before. This PR adds a log helper to print the dsl
expressions of HybridSearchRequest.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #38399
- Embed the query node into streaming node to make delegator available
at streaming node.
- The embedded query node has a special server label
`QUERYNODE_STREAMING-EMBEDDED`.
- Change the balance strategy to assign channels to streaming nodes as
much as possible.
Signed-off-by: chyezh <chyezh@outlook.com>
issue: https://github.com/milvus-io/milvus/issues/37630
Reduce the frequency of metrics updates to avoid holding the mutex for
long periods.
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
The error message was malformed or missing some meta info, such as the
field name. This PR rectifies the message format and adds the field name
to the error message when the type param check fails.
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Previously the grep with a regex did not work and failed to match many
.cpp files.
This PR:
- uses the "-E" flag to enable regex matching
- commits the fixed result for the current cpp code
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
When there are many segment tasks in the querycoord scheduler, the
traversal in `GetSegmentTaskDelta` becomes time-consuming. This PR adds
caching for segment deltas.
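One way such a cache could look is a running sum per (node, collection) pair that is updated when a task is added or removed, so lookups no longer traverse every task. This is a sketch under assumed names (`deltaCache`, `deltaKey`), not the scheduler's actual data structure.

```go
package main

import (
	"fmt"
	"sync"
)

// deltaKey and deltaCache are hypothetical names for this sketch.
type deltaKey struct {
	NodeID       int64
	CollectionID int64
}

// deltaCache keeps a running delta per (node, collection) so that a
// lookup is a map access instead of a scan over all queued tasks.
type deltaCache struct {
	mu     sync.RWMutex
	deltas map[deltaKey]int
}

func newDeltaCache() *deltaCache {
	return &deltaCache{deltas: make(map[deltaKey]int)}
}

// Add records the delta of a task entering the scheduler.
func (c *deltaCache) Add(nodeID, collectionID int64, delta int) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.deltas[deltaKey{nodeID, collectionID}] += delta
}

// Remove undoes the delta of a task leaving the scheduler.
func (c *deltaCache) Remove(nodeID, collectionID int64, delta int) {
	c.mu.Lock()
	defer c.mu.Unlock()
	k := deltaKey{nodeID, collectionID}
	c.deltas[k] -= delta
	if c.deltas[k] == 0 {
		delete(c.deltas, k)
	}
}

// Get returns the cached delta without touching the task list.
func (c *deltaCache) Get(nodeID, collectionID int64) int {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.deltas[deltaKey{nodeID, collectionID}]
}

func main() {
	cache := newDeltaCache()
	cache.Add(1, 100, 3)
	cache.Add(1, 100, 2)
	cache.Remove(1, 100, 3)
	fmt.Println("delta for node 1, collection 100:", cache.Get(1, 100))
}
```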
issue: https://github.com/milvus-io/milvus/issues/37630
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Co-authored-by: Wei Liu <wei.liu@zilliz.com>
Read metadata such as segments, binlogs, and partitions concurrently at
the collection level.
issue: https://github.com/milvus-io/milvus/issues/37630
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
1. Limit the maximum number of restored segments to 1024.
2. Fail the import job if saving binlog fails.
3. Fail the import job if saving the import task fails to prevent
repeatedly generating dirty importing segments.
issue: https://github.com/milvus-io/milvus/issues/39331
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Related to #39339
Extra indexes can be ignored in most cases since the sorted pk column
already provides indexing features
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
By recording the active collection lists, the L0 compaction triggers for
view change and idle no longer influence each other.
This PR also replaces the L0View cache with the real L0 segments'
changes, which saves some memory and makes L0 compaction triggers more
accurate.
See also: #39187
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
Related to #39296
The case is initialized with {100: 8, 101: 16}. After the first
assignment, the slots become {100: 8, 101: 8}, so the following result
is not stable.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Related to previous pr #39279
When NewCollection returns nil, the error should be returned and handled
by the caller
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Related to #39205
Previous PR #39206
This PR changes the wait-timeout behavior to log an error and return, so
that reads on other collections do not fail when only some collections
are deadlocked
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
This PR limits the maximum number of consumers per pchannel to 10 for
each QueryNode and DataNode.
issue: https://github.com/milvus-io/milvus/issues/37630
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
1. Provide partition&channel level indexing in the collection target.
2. Make `SegmentAction` not wait for distribution.
3. Remove scheduler and target manager mutex.
4. Optimize logging to reduce CPU overhead.
issue: https://github.com/milvus-io/milvus/issues/37630
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
1. Make the segment loader lock protect only the resource.
2. Optimize GetDiskUsage to avoid excessive overhead.
issue: https://github.com/milvus-io/milvus/issues/37630
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
1. DataNode: Skip generating BF during the insert phase (BF will be
regenerated during the sync phase).
2. QueryNode: Skip generating or maintaining BF for growing segments;
deletion checks will be handled in the segcore.
issue: https://github.com/milvus-io/milvus/issues/37630
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
1. Use a secondary index to avoid retrieving all segments in
`GetSegmentsChanPart`.
2. Perform batch SetAllocations to reduce the number of times the meta
lock is acquired.
issue: https://github.com/milvus-io/milvus/issues/37630
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Related to #39205
This PR merges `RLock` & `PinIfNotReleased` into a single `PinIf`
function, preventing a segment from being released before any read
operation finishes.
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #38399
- Make the broadcast service available for msgstream by reusing the
architecture of the streaming service
---------
Signed-off-by: chyezh <chyezh@outlook.com>
Related to #39003
Previous PR #39004 had to clone & flip the bitset because the bitset did
not support a find0 operator. #39176 added this feature, so the clone &
flip can be removed now.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #38715
- Milvus currently uses the serialized (compressed) index size to
estimate the resources needed for loading.
- Add a new field `MemSize` (size before compression) to the index for
resource estimation.
---------
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #39124
`bitset::find_first()` and `bitset::find_next()` now accept one more
parameter, which allows searching for a `0` bit instead of a `1` bit
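To show the idea in isolation, here is a Go analogue (not the C++ `bitset` API itself): a search that takes a flag for the bit value and inverts each word on the fly, so no flipped copy of the whole bitset is needed. The function `findFirst` and its signature are assumptions for this sketch.

```go
package main

import (
	"fmt"
	"math/bits"
)

// findFirst returns the index of the first bit equal to 1 (one == true)
// or 0 (one == false) among the first n bits of words, or -1 if there is
// none. It assumes len(words)*64 >= n.
func findFirst(words []uint64, n int, one bool) int {
	for i, w := range words {
		if !one {
			w = ^w // a 0 bit in w is a 1 bit in its complement
		}
		if w != 0 {
			idx := i*64 + bits.TrailingZeros64(w)
			if idx < n {
				return idx
			}
			return -1 // the only matches lie beyond the logical size
		}
	}
	return -1
}

func main() {
	// Bits 0..63 are all set; in the second word bits 0, 1 and 3 are set.
	words := []uint64{^uint64(0), 0b1011}
	fmt.Println("first 1 bit:", findFirst(words, 70, true))
	fmt.Println("first 0 bit:", findFirst(words, 70, false))
}
```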
Signed-off-by: Alexandr Guzhva <alexanderguzhva@gmail.com>
issue: #38970
The stopping balance channel still uses the row_count_based policy,
which may cause channel imbalance in the multi-collection case.
This PR implements a score-based stopping balance channel policy.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Instead of being marked as not supported,
`ChunkedSparseFloatColumn::DataByteSize` can simply use the
implementation of its super class.
issue: https://github.com/milvus-io/milvus/issues/39158
Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
issue: #39001
Background:
Segment Load Version: Each segment load request assigns a timestamp as
its version. When multiple copies of a segment are loaded on different
QueryNodes, the leader checker uses this version to identify the latest
copy and updates the routing table in the leader view to point to it.
Delegator Router Version: When a delegator builds a route to a QueryNode
that has loaded a segment, it also records the segment's version.
Router Table Update Logic: If the leader checker detects that the
version of a segment in the routing table does not match the version in
the worker, it updates the routing table to point to the QueryNode with
the latest version. Additionally, it updates the segment's load version
in the QueryNode during this process.
Issue:
When a channel is undergoing load balancing, the leader checker may sync
the routing table to a new delegator. This sync operation modifies the
segment's load version, which invalidates the routing in the old
delegator. Subsequently, the leader checker updates the routing table in
the old delegator, breaking the routing in the new delegator. This cycle
continues, causing repeated updates and inconsistencies.
Fix:
This PR introduces two changes to address the issue:
1. Use NodeID to verify whether the delegator's routing table needs an
update, avoiding unnecessary modifications.
2. Ensure compatibility by using the latest segment's load version as
the version recorded in the routing table.
These changes resolve the cyclic updates and prevent the leader checker
from generating excessive duplicate tasks, ensuring routing stability
across delegators during load balancing.
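The two changes can be sketched roughly as follows. The types `segmentCopy` and `routeEntry` and both functions are hypothetical; they only illustrate comparing by NodeID and recording the latest load version, and do not reproduce the leader checker.

```go
package main

import "fmt"

// segmentCopy is a hypothetical view of one loaded copy of a segment.
type segmentCopy struct {
	SegmentID, NodeID, Version int64
}

// routeEntry is a hypothetical routing-table entry in a delegator.
type routeEntry struct {
	NodeID, Version int64
}

// latestCopy picks the copy with the highest load version.
func latestCopy(copies []segmentCopy) (segmentCopy, bool) {
	var best segmentCopy
	found := false
	for _, c := range copies {
		if !found || c.Version > best.Version {
			best, found = c, true
		}
	}
	return best, found
}

// needsRouteUpdate compares by NodeID only: a version mismatch alone
// does not trigger a sync, which avoids the ping-pong between delegators.
func needsRouteUpdate(route map[int64]routeEntry, latest segmentCopy) bool {
	e, ok := route[latest.SegmentID]
	return !ok || e.NodeID != latest.NodeID
}

func main() {
	copies := []segmentCopy{
		{SegmentID: 1, NodeID: 10, Version: 5},
		{SegmentID: 1, NodeID: 11, Version: 7},
	}
	latest, _ := latestCopy(copies)

	oldDelegator := map[int64]routeEntry{1: {NodeID: 11, Version: 6}}
	newDelegator := map[int64]routeEntry{1: {NodeID: 10, Version: 5}}

	fmt.Println("old delegator needs update:", needsRouteUpdate(oldDelegator, latest))
	fmt.Println("new delegator needs update:", needsRouteUpdate(newDelegator, latest))
}
```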
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #38399
- Add a new rpc for transferring broadcasts to the streaming coord
- Add a broadcast service at the streaming coord so that broadcast
messages are sent atomically
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #39083
/kind improvement
Three new features for the static web page:
1. The input box expands, and scrolls once it exceeds the maximum size
2. Input history
3. A simple check that quotation marks and brackets appear in pairs
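The pairing check in item 3 can be sketched with a small stack. This is written in Go purely for illustration (the web page itself is not Go), and the function name `balanced` and the choice to ignore brackets inside quotes are assumptions.

```go
package main

import "fmt"

// balanced reports whether quotes are closed and brackets are properly
// nested. Brackets inside a quoted string are ignored.
func balanced(s string) bool {
	closers := map[rune]rune{')': '(', ']': '[', '}': '{'}
	var stack []rune
	var quote rune // the currently open quote character, or 0
	for _, r := range s {
		switch {
		case quote != 0:
			if r == quote {
				quote = 0
			}
		case r == '"' || r == '\'':
			quote = r
		case r == '(' || r == '[' || r == '{':
			stack = append(stack, r)
		case r == ')' || r == ']' || r == '}':
			if len(stack) == 0 || stack[len(stack)-1] != closers[r] {
				return false
			}
			stack = stack[:len(stack)-1]
		}
	}
	return quote == 0 && len(stack) == 0
}

func main() {
	for _, expr := range []string{`id in [1, 2] and name == "a)"`, `(a[1)]`, `"abc`} {
		fmt.Printf("%-32q balanced=%v\n", expr, balanced(expr))
	}
}
```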
Signed-off-by: SimFG <bang.fu@zilliz.com>
issue: #38399
We want to support the broadcast operation for both streaming and
msgstream. But msgstream messages can only be sent from rootcoord and
proxy, so this PR moves the streamingcoord into rootcoord to make the
implementation easier.
---------
Signed-off-by: chyezh <chyezh@outlook.com>
Related to #39003
Copying bitset values bit by bit is slow and CPU heavy. This PR uses the
bitset operator "|=" to accelerate this procedure
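The difference can be seen in a Go analogue that stores bits in 64-bit words: the bit-by-bit loop does one test and one store per bit, while the word-wise OR handles 64 bits per operation. Both functions are illustrative sketches, not the C++ bitset code from the PR.

```go
package main

import "fmt"

// orBitByBit merges the first n bits of src into dst one bit at a time.
func orBitByBit(dst, src []uint64, n int) {
	for i := 0; i < n; i++ {
		mask := uint64(1) << (uint(i) % 64)
		if src[i/64]&mask != 0 {
			dst[i/64] |= mask
		}
	}
}

// orWords merges src into dst one 64-bit word at a time.
// It assumes len(dst) >= len(src).
func orWords(dst, src []uint64) {
	for i := range src {
		dst[i] |= src[i]
	}
}

func main() {
	src := []uint64{0xF0F0, 0x1}
	a := []uint64{0x0F0F, 0x0}
	b := []uint64{0x0F0F, 0x0}

	orBitByBit(a, src, 128)
	orWords(b, src)

	fmt.Println("same result:", a[0] == b[0] && a[1] == b[1])
}
```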
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
When MvccTimestamp is set, it can be used as the guarantee timestamp
directly instead of a new ts allocated by the scheduler, reducing the
waiting time when the delegator has tsafe lag
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
fix #38649
When a partition load fails, the partition drop also fails due to the
wrong error message
Signed-off-by: xiaofanluan <xiaofan.luan@zilliz.com>
issue: #38399
- Use the last allocated id, not the last confirmed id, to make the
barrier.
- Move the barrier logic into the timetick allocator.
- Try to sync up the local and remote allocators when the first barrier
check does not pass, to speed things up.
Signed-off-by: chyezh <chyezh@outlook.com>
Related to previous PR #38157
If the mmapped row is too small, frequent fwrite calls still cost too
much cpu time in context switching. This PR adds buffered writes to
avoid this bad case, with an extra buffer per variable field.
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
related: #37031
* The privileges of built-in privilege groups in listPrivilegeGroups()
should be the same as in milvus.yaml
* Collections granted by a collection-level built-in privilege group
should be listed in showCollections()
Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
issue: #38718
The balancer calculates the workload of executing tasks as an ongoing
score for target nodes. However, a logic issue arises when
GetSegmentTaskDelta or GetChannelTaskDelta is called with
collectionID=-1, which incorrectly returns zero.
Due to the incorrect global score, the executing task's workload is not
properly reflected for each collection. Consequently, each collection
submits its own balance task, leading to the balancer assigning
excessive tasks to the same QueryNode.
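A sketch of the intended semantics, assuming collectionID=-1 means "all collections": the filter on collection is skipped rather than matching nothing. The `task` type and `segmentTaskDelta` function are hypothetical and much simpler than the scheduler's real bookkeeping.

```go
package main

import "fmt"

// task is a hypothetical executing task with its workload delta.
type task struct {
	CollectionID int64
	NodeID       int64
	Delta        int
}

// segmentTaskDelta sums the deltas of tasks targeting nodeID. When
// collectionID is -1, tasks from every collection are counted, which
// yields the global score for the node.
func segmentTaskDelta(tasks []task, nodeID, collectionID int64) int {
	sum := 0
	for _, t := range tasks {
		if t.NodeID != nodeID {
			continue
		}
		if collectionID != -1 && t.CollectionID != collectionID {
			continue
		}
		sum += t.Delta
	}
	return sum
}

func main() {
	tasks := []task{
		{CollectionID: 100, NodeID: 1, Delta: 3},
		{CollectionID: 101, NodeID: 1, Delta: 4},
		{CollectionID: 100, NodeID: 2, Delta: 5},
	}
	fmt.Println("node 1, collection 100:", segmentTaskDelta(tasks, 1, 100))
	fmt.Println("node 1, all collections:", segmentTaskDelta(tasks, 1, -1))
}
```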
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>