Limit the maximum concurrency of channel tasks for each DataNode to
prevent excessive subscriptions from causing DataNode OOM.
issue: https://github.com/milvus-io/milvus/issues/37665
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Related to #37665
Thread number went rocket high when there is lots of kafka consumers on
datanode. Since the internal implementation is CGO, using which directly
will make cgo thread leaked.
This PR add a worker pool for kafka API utilzing CGO calls to limit
thread number.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
When there are a large number of segments, the metrics consume a lot of
memory. This PR Remove segment-level tag from monitoring metrics.
issue: https://github.com/milvus-io/milvus/issues/37636
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
issue: ##36621
- For simple types in a struct, add "string" to the JSON tag for
automatic string conversion during JSON encoding.
- For complex types in a struct, replace "int64" with "string."
Signed-off-by: jaime <yun.zhang@zilliz.com>
Related to #35853
This PR contains following changes:
- Add function and related proto and helper functions
- Remove the insert column missing check and leave it to server
- Add text as search input data
- Add some unit tests for logic above
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #34298
Viper uses yaml.v2 as the parser. This PR will adopt the parsing logic
from Viper to handle YAML files, ensuring maximum consistency in
parsing.
Signed-off-by: xianliang.li <xianliang.li@zilliz.com>
issue: #33285
- Modify the proto of consumer of streaming service.
- Make VChannel as a required option for streaming
---------
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #37172
- add redo interceptor to implement append context refresh. (make new
timetick)
- add create segment handler for flusher.
- make empty segment flushable and directly change it into dropped.
- add create segment message into wal when creating new growing segment.
- make the insert operation into following seq: createSegment -> insert
-> insert -> flushSegment.
- make manual flush into following seq: flushTs -> flushsegment ->
flushsegment -> manualflush.
---------
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #37166
cause the misuse of timer.Reset, which cause dispatcher failed to send
msg to virtual channel buffer, and dispatcher do splitting again and
again, which hold the dispatcher manager's lock, block watching channel
progress.
This PR fix the misuse of timer.Reset
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
See also #37404#37402
IP address in paramtable need validation and fail fast with reasonable
error message
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Cgo API cost is not observerable since not metrics is related to them.
This PR add metrics for some sync cgo call related to load & write
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #34298
fix key: null defined in the yaml file.
viper will parse it as "", and yaml v3 will parse it as "null".
Signed-off-by: xianliang.li <xianliang.li@zilliz.com>
issue: #34298
because all vector index config checker has been moved into
vector_index_checker, then the useless checkers can be removed.
Signed-off-by: xianliang.li <xianliang.li@zilliz.com>
issue:https://github.com/milvus-io/milvus/issues/27576
# Main Goals
1. Create and describe collections with geospatial fields, enabling both
client and server to recognize and process geo fields.
2. Insert geospatial data as payload values in the insert binlog, and
print the values for verification.
3. Load segments containing geospatial data into memory.
4. Ensure query outputs can display geospatial data.
5. Support filtering on GIS functions for geospatial columns.
# Solution
1. **Add Type**: Modify the Milvus core by adding a Geospatial type in
both the C++ and Go code layers, defining the Geospatial data structure
and the corresponding interfaces.
2. **Dependency Libraries**: Introduce necessary geospatial data
processing libraries. In the C++ source code, use Conan package
management to include the GDAL library. In the Go source code, add the
go-geom library to the go.mod file.
3. **Protocol Interface**: Revise the Milvus protocol to provide
mechanisms for Geospatial message serialization and deserialization.
4. **Data Pipeline**: Facilitate interaction between the client and
proxy using the WKT format for geospatial data. The proxy will convert
all data into WKB format for downstream processing, providing column
data interfaces, segment encapsulation, segment loading, payload
writing, and cache block management.
5. **Query Operators**: Implement simple display and support for filter
queries. Initially, focus on filtering based on spatial relationships
for a single column of geospatial literal values, providing parsing and
execution for query expressions.
6. **Client Modification**: Enable the client to handle user input for
geospatial data and facilitate end-to-end testing.Check the modification
in pymilvus.
---------
Signed-off-by: tasty-gumi <1021989072@qq.com>
Related to #36102
Previous PR #36107 add grpc inteceptor to observe rpc stats. Using same
strategy, this pr add gin middleware to observer restful v2 rpc stats.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Timeout is a bad design for long running tasks, especially using a
static timeout config. We should monitor execution progress and fail the
task if the progress has been stale for a long time.
This pr is a small patch to stop DC from marking compaction tasks
timeout, while still waiting for DN to finish. The design is
self-conflicted. After this pr, mix and L0 compaction are no longer
controlled by DC timeout, but clustering is still under timeout control.
The compaction queue capacity grows larger for priority calc, hence
timeout compactions appears more often, and when timeout, the queuing
tasks will be timeout too, no compaction will success after.
See also: #37108, #37015
---------
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
issue: #36621
1. Add API to access task runtime metrics, including:
- build index task
- compaction task
- import task
- balance (including load/release of segments/channels and some leader
tasks on querycoord)
- sync task
2. Add a debug model to the webpage by using debug=true or debug=false
in the URL query parameters to enable or disable debug mode.
Signed-off-by: jaime <yun.zhang@zilliz.com>
Related to #35303
This PR add metrics for querynode delegator delete buffer information,
which is related to dml quota logic.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #35576
This pr is to cover those cases when queryHook optimize search params
and make the result size insufficient, add retry search mechanism and
add related metrics for alarming.
---------
Signed-off-by: chasingegg <chao.gao@zilliz.com>
Knowhere introduced FAISS-based HNSQ Flat/SQ/PQ/PRQ. This PR adds param
check in Milvus side.
This PR actually do nothing but let the check pass, since:
1. All indexes param check will be moved to Knowhere recently. @foxspy
2. The index name will be changed from `FAISS_HNSW_xx` to `HNSW_xx`
after QA test. @alexanderguzhva
Signed-off-by: Li Liu <li.liu@zilliz.com>
issue: https://github.com/milvus-io/milvus/issues/35853
* BM25 Function now takes no params, k1, b should be passed via index
params
* support BM25 full text search when metric type is not present in
search request
* add more strict validation with functions at collection creation time
Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
This PR splits sealed segment to chunked data to avoid unnecessary
memory copy and save memory usage when loading segments so that loading
can be accelerated.
To support rollback to previous version, we add an option
`multipleChunkedEnable` which is false by default.
Signed-off-by: sunby <sunbingyi1992@gmail.com>
issue: #35922
add an enable_tokenizer param to varchar field: must be set to true so
that a varchar field can enable_match or used as input of BM25 function
---------
Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
See #36550
This PR made 2 changes:
1. Introducing a prioritization mechanism, if
`dataCoord.compaction.taskPrioritizer` is set to `level`, compaction
tasks are always executed as the priority of L0>Mix>Clustering
2. `dataCoord.compaction.maxParallelTaskNum` now controls the
parallelism of executing tasks, not the task number of queue +
executing.
---------
Signed-off-by: Ted Xu <ted.xu@zilliz.com>
5 percent of free memory is too less for l0 compaction. This pr will
raise it to 50 percent.
See also: #36614
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
1. Optimize import scheduling strategic:
a. Revise slot weights, calculating them based on the number of files
and segments for both import and pre-import tasks.
b. Ensure that the DN executes tasks in ascending order of task ID.
2. Add time cost metric and log.
issue: https://github.com/milvus-io/milvus/issues/36600,
https://github.com/milvus-io/milvus/issues/36518
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Native support for Google cloud storage using the Google Cloud Storage
libraries. Authentication is performed using GCS service account
credentials JSON.
Currently, Milvus supports Google Cloud Storage using S3-compatible APIs
via the AWS SDK. This approach has the following limitations:
1. Overhead: Translating requests between S3-compatible APIs and GCS can
introduce additional overhead.
2. Compatibility Limitations: Some features of the original S3 API may
not fully translate or work as expected with GCS.
To address these limitations, This enhancement is needed.
Related Issue: #36212
issue: #35821
After collection loaded, if we need to increase/decrease collection's
replica, we need to release and load it again.
milvus offers 4 solution to update loaded collection's replica, this PR
aims to dynamic change the replica number without release, and after
replica number changed, milvus will execute load replica or release
replica in async, and the replica loaded status can be checked by
getReplicas API.
Notice that if set too much replicas than querynode can afford,the new
replica won't be loaded successfully until enough querynode joins.
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #35859
This PR introduce two new param: toleranceFactor and checkRequestNum,
after every checkRequestNum request has been assigned, try to compute
querynode's workload score.
if the diff is less than the toleranceFactor, replica selection policy
will fallback to round_robin, which reduce the average cost to about
500ns.
if the diff is larger than the toleranceFactor, replica selection policy
will compute querynode's score to select the target node with smallest
score in every assigment.
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Related to #35303
This PR add a param item to support change l0 forward behavior from bf
filtering and forward to remote load.
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
See #36122
This PR is designed to enhance log performance through two improvements:
1. Optimize JSON encoding by switching JSON serializer to
`json-iterator`.
2. Adding support of lazy initialization `WithLazy`.
---------
Signed-off-by: Ted Xu <ted.xu@zilliz.com>
Related to #36102
This PR use newly added `grpcSizeStatsHandler` to reduce calling
`proto.Size` since the request & response size info is recorded by grpc
framework.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
fix not append valid data when transfer to insert record and add a tiny
check when in groupBy field.
#35924
Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
Related to #35927
There are serveral issue this PR addresses:
- Use `ResetTraceConfig` method instead init one in update event handler
- Implement dynamic stats.Handler to receive tracing config update event
- Update `enable_trace` flag when `ResetTraceConfig` is invoked
- Change `enable_trace` to `std::atomic<bool>` in case of data race
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #33744
This PR includes the following changes:
1. Added a new task type to the task scheduler in datacoord: stats task,
which sorts segments by primary key.
2. Implemented segment sorting in indexnode.
3. Added a new field `FieldStatsLog` to SegmentInfo to store token index
information.
---------
Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
issue: #33285
- using streaming service in insert/upsert/flush/delete/querynode
- fixup flusher bugs and refactor the flush operation
- enable streaming service for dml and ddl
- pass the e2e when enabling streaming service
- pass the integration tst when enabling streaming service
---------
Signed-off-by: chyezh <chyezh@outlook.com>
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
not support nullable==true in vector type. check it when create
collection.
Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
issue: #35570
milvus support config cache to spped up config access, but only evict
param's cache when param has been updated. but milvus's param may rely
on other param's value, let's say ParamsA relys on paramsB, when paramsB
updated, it will evict paramB's cache, but the paramA's cache still keep
the old value.
This PR evict all config cache to solve the above issue, cause dynamic
update config won't be much frequetly.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #33285
- support transaction on single wal.
- last confirmed message id can still be used when enable transaction.
- add fence operation for segment allocation interceptor.
---------
Signed-off-by: chyezh <chyezh@outlook.com>
Some mockery cmd is out-of-date and fail to work. This PR update these
commands to match current pkg.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #29419
* If a sparse vector with 0 non-zero value is inserted, no ANN search on
this sparse vector field will return it as a result. User may retrieve
this row via scalar query or ANN search on another vector field though.
* If the user uses an empty sparse vector as the query vector for a ANN
search, no neighbor will be returned.
Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
issue: #35170
This PR enable to set load configs in cluster level, such as replicas
and resource groups. then when load collections will use the load
config.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #33285
- move streaming related proto into pkg.
- add v2 message type and change flush message into v2 message.
Signed-off-by: chyezh <chyezh@outlook.com>