Related to #35303
This PR add a param item to support change l0 forward behavior from bf
filtering and forward to remote load.
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
See #36122
This PR is designed to enhance log performance through two improvements:
1. Optimize JSON encoding by switching JSON serializer to
`json-iterator`.
2. Adding support of lazy initialization `WithLazy`.
---------
Signed-off-by: Ted Xu <ted.xu@zilliz.com>
Related to #36102
This PR use newly added `grpcSizeStatsHandler` to reduce calling
`proto.Size` since the request & response size info is recorded by grpc
framework.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
fix not append valid data when transfer to insert record and add a tiny
check when in groupBy field.
#35924
Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
Related to #35927
There are serveral issue this PR addresses:
- Use `ResetTraceConfig` method instead init one in update event handler
- Implement dynamic stats.Handler to receive tracing config update event
- Update `enable_trace` flag when `ResetTraceConfig` is invoked
- Change `enable_trace` to `std::atomic<bool>` in case of data race
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #33744
This PR includes the following changes:
1. Added a new task type to the task scheduler in datacoord: stats task,
which sorts segments by primary key.
2. Implemented segment sorting in indexnode.
3. Added a new field `FieldStatsLog` to SegmentInfo to store token index
information.
---------
Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
issue: #33285
- using streaming service in insert/upsert/flush/delete/querynode
- fixup flusher bugs and refactor the flush operation
- enable streaming service for dml and ddl
- pass the e2e when enabling streaming service
- pass the integration tst when enabling streaming service
---------
Signed-off-by: chyezh <chyezh@outlook.com>
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
not support nullable==true in vector type. check it when create
collection.
Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
issue: #35570
milvus support config cache to spped up config access, but only evict
param's cache when param has been updated. but milvus's param may rely
on other param's value, let's say ParamsA relys on paramsB, when paramsB
updated, it will evict paramB's cache, but the paramA's cache still keep
the old value.
This PR evict all config cache to solve the above issue, cause dynamic
update config won't be much frequetly.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #33285
- support transaction on single wal.
- last confirmed message id can still be used when enable transaction.
- add fence operation for segment allocation interceptor.
---------
Signed-off-by: chyezh <chyezh@outlook.com>
Some mockery cmd is out-of-date and fail to work. This PR update these
commands to match current pkg.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #29419
* If a sparse vector with 0 non-zero value is inserted, no ANN search on
this sparse vector field will return it as a result. User may retrieve
this row via scalar query or ANN search on another vector field though.
* If the user uses an empty sparse vector as the query vector for a ANN
search, no neighbor will be returned.
Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
issue: #35170
This PR enable to set load configs in cluster level, such as replicas
and resource groups. then when load collections will use the load
config.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #33285
- move streaming related proto into pkg.
- add v2 message type and change flush message into v2 message.
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #33285
- implement streaming service client.
- implement producing and consuming service client by streaming coord
client and streaming node client.
Signed-off-by: chyezh <chyezh@outlook.com>
`stackerror` pkg is imported by accident and could be replaced by
cockroachdb errors lib. This PR removes import line and fix this
problem.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
At most cases, data in each channel is almost evenly distributed, we
could utilize the channel num info to optimize searh param in queryHook
Signed-off-by: chasingegg <chao.gao@zilliz.com>
1.fix compaction task not be cleaned correctly
2.add a new parameter to control compaction gc loop interval
3.remove some useless configs of clustering compaction
bug: #34764
Signed-off-by: wayblink <anyang.wang@zilliz.com>
See also #34746
This PR add segment level field in response of
`GetPersistentSegmentInfo` and `GetQuerySegmentInfo`
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #33285
- register streaming coord service into datacoord.
- add new streaming node role.
- add global static switch to enable streaming service or not.
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #33285
- add specialized mutable and immutable message, make type safe.
- add version based constructor and type.
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #34798
after we remove the task priority on query coord, to avoid load/release
segment blocked by too much balance task, we limit the balance task size
in each round. at same time, we reduce the balance interval to trigger
balance more frequently.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
related with #34917
This PR updates the func utils get IP address function to support
getting a IPv6 address if it is available and return it. I've tested
that this works locally when running inside a docker container that has
an IPv6 network address.
This PR should address
https://github.com/milvus-io/milvus/discussions/20217
Thank you for review!
Signed-off-by: David Pichler <david.pichler@grainger.com>
Co-authored-by: David Pichler <david.pichler@grainger.com>
issue: #33285
- make message builder and message conversion type safe
- add adaptor type and function to adapt old msgstream msgpack and
message interface
---------
Signed-off-by: chyezh <chyezh@outlook.com>
Seals the largest growing segment if the total size of growing segments
of each shard exceeds the size threshold(default 4GB). Introducing this
policy can help keep the size of growing segments within a suitable
level, alleviating the pressure on the delegator.
issue: https://github.com/milvus-io/milvus/issues/34554
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
See also #34670
This PR add quota configuration for l0 segment entry number per
collection. If l0 compaction cannot keep up the insertion/upsertion
rate, this feature could back press the related rate.
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #33285
- add idAlloc interface
- fix binary unsafe bug for message
- fix service discovery lost when repeated address with different server
id
---------
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #34595
When consuming insert data on the delegator node, QueryCoord will move
out some sealed segments to manage its memory usage. After the growing
segment gets flushed, some sealed segments from other workers will be
moved back to the delegator node. To avoid the frequent movement of
segments, we estimate the maximum growing row count and preserve a
fixed-size memory in the delegator node.
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #33285
- add two grpc resolver (by session and by streaming coord assignment
service)
- add one grpc balancer (by serverID and roundrobin)
- add lazy conn to avoid block by first service discovery
- add some utility function for streaming service
Signed-off-by: chyezh <chyezh@outlook.com>
See also #34574
Add jitter for segment seal proportion to avoid seal operation burst in
short period of time.
This PR also fix license header in paramtable pkg.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: # #34545
Print warn log instead of check health fail if orphan channel cp meta is
found in health check request.
Signed-off-by: jaime <yun.zhang@zilliz.com>
When the main dispatcher has not yet consumed data, curTs is 0. During
this time, merging dispatchers should not be allowed; otherwise, the
data of the solo dispatcher will be skipped.
issue: https://github.com/milvus-io/milvus/issues/34255
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
See also #34483
Some lint issues are introduced due to lack of static check run. This PR
fixes these problems.
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #33285
- implement producing and consuming server of message
- implement management operation for streaming node server
---------
Signed-off-by: chyezh <chyezh@outlook.com>
See also #33785
When config item is not present in paramtable, CAS fails due to
GetConfig returns error.
This PR make this returned err instance of ErrKeyNotFound and check
error type in \`CASCachedValue\` methods.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
related: #30376
fix: paritionIDs lost when no setting paritions
enhance: refine metrics for segment prune
Signed-off-by: MrPresent-Han <chun.han@zilliz.com>
issue: #34304
cosine is more widely used in float vectors, and cosine and hamming
distance are 'metrics' which have good geometric properties
Signed-off-by: chasingegg <chao.gao@zilliz.com>
Check if the segment exists during FlushSegments and add some key logs
in write path.
issue: https://github.com/milvus-io/milvus/issues/34255
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Some lint issue is not detect due to recent static check pipeline issue.
This PR fixes these problem and Go milvusclient testcases.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Converting the same msgposition's vchannel to a pchannel multiple times
would result in an invalid pchannel, leading to seek failure and panic.
This PR:
1. Make a copy of msgposition in msgdispatcher.
2. Check if channel is already a pchannel, no further channel conversion
is performed.
issue: https://github.com/milvus-io/milvus/issues/34221
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
issue: #33285
- use reader but not consumer for pulsar
- advanced test framework
- move some streaming related package into pkg
---------
Signed-off-by: chyezh <chyezh@outlook.com>
See also #33823
`EvictCacheValueByFormat` may be block by on going `CASCacheValue` and
cause possible deadlock
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
See also #33483
Wrap `C.InitTrace` & `C.SetTrace` with timeout preventing otlp
initializtion hangs forever when endpoint is not set correctly
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #32995
To speed up the construction and querying of Bloom filters, we chose a
blocked Bloom filter instead of a basic Bloom filter implementation.
WARN: This PR is compatible with old version bf impl, but if fall back
to old milvus version, it may causes bloom filter deserialize failed.
In single Bloom filter test cases with a capacity of 1,000,000 and a
false positive rate (FPR) of 0.001, the blocked Bloom filter is 5 times
faster than the basic Bloom filter in both querying and construction, at
the cost of a 30% increase in memory usage.
- Block BF construct time {"time": "54.128131ms"}
- Block BF size {"size": 3021578}
- Block BF Test cost {"time": "55.407352ms"}
- Basic BF construct time {"time": "210.262183ms"}
- Basic BF size {"size": 2396308}
- Basic BF Test cost {"time": "192.596229ms"}
In multi Bloom filter test cases with a capacity of 100,000, an FPR of
0.001, and 100 Bloom filters, we reuse the primary key locations for all
Bloom filters to avoid repeated hash computations. As a result, the
blocked Bloom filter is also 5 times faster than the basic Bloom filter
in querying.
- Block BF TestLocation cost {"time": "529.97183ms"}
- Basic BF TestLocation cost {"time": "3.197430181s"}
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
related: #33137
adding has_more_result_tag for various level's reduce to rectify
reduce_stop_for_best
Signed-off-by: MrPresent-Han <chun.han@zilliz.com>
YAML will automatically parse "off" as a boolean variable. We should
avoid using "off" in the future.
issue: https://github.com/milvus-io/milvus/issues/32772
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
issue: #30040
This PR introduce two database level props:
1. database.replica.number
2. database.resource_groups
User can set those two database props by AlterDatabase API, then can
load collection without specified replica_num and resource groups. then
it will use database level load param when try to load collections.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
1. use a small warmup pool to reduce the impact of warmup
2. change the warmup pool to nonblocking mode
3. disable warmup by default
4. remove the maximum size limit of 16 for the load pool
issue: https://github.com/milvus-io/milvus/issues/32772
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Co-authored-by: xiaofanluan <xiaofan.luan@zilliz.com>
issue: #29419
also re-enabled an e2e test using restful api, which is previously
disabled due to https://github.com/milvus-io/milvus/issues/32214.
In restful api, the accepted json formats of sparse float vector are:
* `{"indices": [1, 100, 1000], "values": [0.1, 0.2, 0.3]}`
* {"1": 0.1, "100": 0.2, "1000": 0.3}
for accepted indice and value range, see
https://milvus.io/docs/sparse_vector.md#FAQ
Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
when milvus process delete record, it need to find record's corresponded
segment by bloom filter, and higher bloom filter fp rate will cause
delete record forwards to wrong segments.
This PR Decrease bloom filter's default fp to 0.001.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
See also #33122
This pr add param item `mq.ignoreBadPosition` to control behavior when
mq failed to parse message id from checkpoint
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Query slot of compaction in datanode, and transfer the control logic for
limiting compaction tasks from datacoord to the datanode.
issue: https://github.com/milvus-io/milvus/issues/32809
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
See also #33062
This PR:
- Add `lock.RWMutex` & `lock.Mutex` alias to switch implementation based
on build flags
- When build flags has `test` in it, use `go-deadlock` to detect
possible deadlocks
- Replace all `sync.RWMutex` & `sync.Mutex` in datacoord pkg
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Bumping version to v2.4.2. Also bump milvus-proto version to v2.4.3.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #32206, #32801
- search failure with some assertion, segment not loaded and resource
insufficient.
- segment leak when query segments
---------
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #32910
* split replica's node list to channels when create replicas
* balance nodes among channels when node change happens
* implement channel level balance, let balance happens in channel level
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
See also #32748
This PR:
- Add `metautil.Channel` utiltiy which convert virtual name to physical
channel name, collectionID and shard idx
- Add channel mapper interface & implementation to convert limited
physical channel name into int index
- Apply `metautil.Channel` filter in querynode segment manager logic
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #32663
- Use new param to control request resource timeout for lazy load.
- Remove the timeout parameter of `Do`, remove `DoWait`. use `context`
to control the timeout.
- Use `VersionedNotifier` to avoid notify event lost and broadcast,
remove the redundant goroutine in cache.
related dev pr: #32684
Signed-off-by: chyezh <chyezh@outlook.com>
Related to #32598
Use `WithBlock` may fail fast when create etcd client to some invalid
etcd endpoints and make it easier to check problem.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
related to #32165
1. for all the manager, support collection level index
2. remove collection level filter to avoid extra cpu usage when
collection number increases
Signed-off-by: xiaofanluan <xiaofan.luan@zilliz.com>