[skip ci] Translate doc proxy-reduce-cn.md to english (#11448)

Signed-off-by: shaoyue.chen <shaoyue.chen@zilliz.com>
2021-11-09 19:46:31 +08:00 · 2021-11-09 19:46:31 +08:00 · 4be47880ba
parent 98f8976f0c
commit 4be47880ba
1 changed files with 40 additions and 1 deletions
--- a/docs/developer_guides/proxy-reduce.md
+++ b/docs/developer_guides/proxy-reduce.md
@@ -13,4 +13,43 @@ For each query, the top k hit results are in descending order of score. The larg
 Therefore, we will only discuss how the proxy merges the results for one query result. For NQ query results, we can loop through NQ or process them in parallel.

 So the problem reduces to how to get the 10 (TOPK) largest results from these four sorted arrays. As shown in the figure below:
-![final_result](./figs/reduce_results.png)
+![final_result](./figs/reduce_results.png)
+
+## K-Way Merge Algorithm
+
+The pseudocode of this algorithm is shown below:
+
+```golang
+n = 4
+multiple_results = [[topk results 1], [topk results 2], [topk results 3], [topk results 4]]
+locs = [0, 0, 0, 0]
+topk_results = []
+for i -> topk:
+	score = min_score
+	choice = -1
+	for j -> n:
+		choiceOffset = locs[j]
+		if choiceOffset >= len(multiple_results[j]):
+			// all results from this way have been taken; try the other ways
+			continue
+		score_this_way = multiple_results[j][choiceOffset]
+		if score_this_way > score:
+			choice = j
+			score = score_this_way
+	if choice != -1:
+		// take the best remaining result, then advance that way's offset
+		topk_results = append(topk_results, multiple_results[choice][locs[choice]])
+		locs[choice]++
+```
+
+This algorithm originates from the merging phase of merge sort. What the two have in common is that the inputs are already sorted when merging; the difference is that merge sort merges two sorted sequences, while proxy reduce merges multiple.
+
+Accordingly, merge sort uses two pointers to record the offsets into its two inputs, while proxy reduce uses an array of pointers, `locs`, to record the offsets into the `k-way` results.
+
+In our specific situation, `n` is 4, the number of result sets to be merged; `multiple_results` is an array holding the four `topk` result lists; and each entry of `locs`, read as `choiceOffset`, records the current offset of one way being merged.
+
+Because each way is sorted in descending order, the `score_this_way` at this offset is the maximum remaining value of that way. To find the next largest `score`, you only need to pick the largest of these four maximum values.
+
+This ensures that the result we take each time is the largest among the remaining results.
+
+The outer loop runs at most topk times and each iteration compares the current heads of the n ways, so the time complexity of this algorithm is O(n \* topk), which is linear in the total number of Search Results.
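
For readers who want something they can run, the pseudocode above can be sketched in Go roughly as follows. This is a minimal illustration, not the proxy's actual implementation: the function name `mergeTopK`, the use of `float32` scores, and the sample data are all assumptions made for this example. It also tracks whether a candidate has been chosen yet instead of relying on a `min_score` sentinel.

```go
package main

import "fmt"

// mergeTopK merges several score lists, each sorted in descending order,
// and returns at most topk of the largest scores, also in descending order.
// Hypothetical helper for illustration only; not taken from the proxy code.
func mergeTopK(ways [][]float32, topk int) []float32 {
	locs := make([]int, len(ways)) // next unread offset in each way
	merged := make([]float32, 0, topk)
	for len(merged) < topk {
		choice := -1
		var best float32
		for j, way := range ways {
			if locs[j] >= len(way) {
				continue // this way is exhausted; look at the other ways
			}
			// The head of each way is its largest remaining score.
			if s := way[locs[j]]; choice == -1 || s > best {
				choice, best = j, s
			}
		}
		if choice == -1 {
			break // every way is exhausted before reaching topk
		}
		merged = append(merged, best)
		locs[choice]++ // advance only the way we took a result from
	}
	return merged
}

func main() {
	// Assumed sample data: four ways, each already sorted by descending score.
	ways := [][]float32{
		{0.9, 0.5, 0.1},
		{0.8, 0.7, 0.2},
		{0.6, 0.4, 0.3},
		{0.95, 0.05, 0.01},
	}
	fmt.Println(mergeTopK(ways, 5))
}
```

The early `break` covers the case where the ways together hold fewer than topk results, which the pseudocode handles implicitly by leaving `choice` at -1.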