feat: unambiguously reversible partition keys

This commit changes the format of partition keys when generated with
non-default partition key templates ONLY. A prior fixture test is
unchanged by this commit, ensuring the default partition keys remain
the same.

When a custom partition key template is provided, it may specify one or
more parts, with the TagValue template causing values extracted from tag
columns to appear in the derived partition key.

This commit changes the generated partition key in the following ways:

    * The delimiter of multi-part partition keys: the character used to
      delimit partition key parts is changed from "-" to "|" (the pipe
      character), as it is less likely to occur in user-provided input,
      reducing the encoding overhead.

    * The format of the extracted TagValue values (see below).

Building on the work of custom partition key overrides, where an
immutable partition template is resolved and set at table creation time,
the changes in this PR enable the derived partition key to be
unambiguously reversed into the set of tag (column_name, column_value)
tuples it was generated from for use in query pruning logic. This is
implemented by the build_column_values() function in this commit, which
requires both the template and the derived partition key.

Prior to this commit, a partition key value extracted from a tag column
was in the form "tagname_x" where "x" is the value and "tagname" is the
name of the tag column it was extracted from. After this commit, the
partition key value is in the form "x"; the column name is removed from
the derived string to reduce the catalog storage overhead (a key driver
of COGS). In the case of a NULL tag value, the sentinel value "!" is
inserted instead of the prior bare "tagname" marker. In the case of an
empty string tag value (""), the sentinel value "^" is inserted instead
of the prior "tagname_" marker, ensuring the distinction between an empty
value and a not-present tag is preserved.
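
As a rough illustration of the sentinel mapping described above, the
following sketch renders a single TagValue part. The function and constant
names are hypothetical and not taken from this commit, and percent encoding
of reserved characters is deliberately left out.

```rust
/// Hypothetical name: sentinel written for a tag with no value (NULL).
const NULL_SENTINEL: &str = "!";
/// Hypothetical name: sentinel written for an empty string tag value.
const EMPTY_SENTINEL: &str = "^";

/// Render one TagValue part in the new format.
///
/// Assumption: `None` models a NULL / not-present tag. Percent encoding of
/// reserved characters is omitted from this sketch.
fn render_tag_part(value: Option<&str>) -> &str {
    match value {
        None => NULL_SENTINEL,
        Some("") => EMPTY_SENTINEL,
        Some(v) => v,
    }
}
```

Because the two sentinels differ, a reader of the key can tell a missing tag
apart from a tag that was written with an empty value.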

Additionally, tag values use percent encoding to encode reserved
characters (the part delimiter, the NULL and empty sentinel characters,
and % itself) to eliminate deserialisation ambiguity.
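
A minimal, standard-library-only sketch of this idea follows. It is not the
code in this commit, which relies on the percent-encoding crate; the names
encode_part and decode_part are hypothetical, and this sketch encodes only
the four reserved characters, whereas the crate also encodes control
characters and non-ASCII bytes.

```rust
/// Characters with a special meaning in a partition key.
const RESERVED: [char; 4] = ['|', '!', '^', '%'];

/// Percent encode the reserved characters in `value` (hypothetical helper).
fn encode_part(value: &str) -> String {
    let mut out = String::with_capacity(value.len());
    for c in value.chars() {
        if RESERVED.contains(&c) {
            // Every reserved character is ASCII, so it is a single byte.
            out.push_str(&format!("%{:02X}", c as u32));
        } else {
            out.push(c);
        }
    }
    out
}

/// Reverse `encode_part`, returning `None` for a malformed escape or for
/// bytes that are not valid UTF-8 after decoding (hypothetical helper).
fn decode_part(encoded: &str) -> Option<String> {
    let bytes = encoded.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'%' {
            // A '%' must be followed by exactly two hex digits.
            let hex = encoded.get(i + 1..i + 3)?;
            if !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
                return None;
            }
            out.push(u8::from_str_radix(hex, 16).ok()?);
            i += 3;
        } else {
            out.push(bytes[i]);
            i += 1;
        }
    }
    String::from_utf8(out).ok()
}
```

Encoding "%" itself is what makes the scheme reversible: a literal "%7C" in
user input becomes "%257C", so it cannot be confused with an encoded "|".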

Examples of how this has changed derived partition keys, for a template
of [Time(YYYY-MM-DD), TagValue(region), TagValue(bananas)]:

    Write: time=1970-01-01,region=west,other=ignored
        Old: "1970-01-01-region_west-bananas"
        New: "1970-01-01|west|!"

    Write: time=1970-01-01,other=ignored
        Old: "1970-01-01-region-bananas"
        New: "1970-01-01|!|!"
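
The examples above can be reproduced with a small, self-contained sketch of
key derivation and reversal. This is an illustration under stated
assumptions rather than the implementation in this commit: the Part enum,
derive_key, and reverse_key are hypothetical names, the time part is passed
in already formatted, and only the four reserved characters are encoded.

```rust
/// Simplified stand-in for a template part (hypothetical type).
#[derive(Debug, Clone, Copy)]
enum Part<'a> {
    /// The time part; the caller supplies the formatted string.
    Time,
    /// A part extracted from the named tag column.
    TagValue(&'a str),
}

/// Percent encode the reserved characters `|`, `!`, `^` and `%`.
fn encode(value: &str) -> String {
    let mut out = String::with_capacity(value.len());
    for c in value.chars() {
        match c {
            '|' | '!' | '^' | '%' => out.push_str(&format!("%{:02X}", c as u32)),
            _ => out.push(c),
        }
    }
    out
}

/// Reverse `encode`; malformed escapes are copied through unchanged.
fn decode(s: &str) -> String {
    let bytes = s.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        let hex = if bytes[i] == b'%' { s.get(i + 1..i + 3) } else { None };
        let byte = hex
            .filter(|h| h.bytes().all(|b| b.is_ascii_hexdigit()))
            .and_then(|h| u8::from_str_radix(h, 16).ok());
        match byte {
            Some(b) => {
                out.push(b);
                i += 3;
            }
            None => {
                out.push(bytes[i]);
                i += 1;
            }
        }
    }
    String::from_utf8(out).expect("invalid partition key part encoding")
}

/// Derive a partition key for one row from its formatted time and tags.
fn derive_key(template: &[Part<'_>], time: &str, tags: &[(&str, &str)]) -> String {
    let parts: Vec<String> = template
        .iter()
        .map(|part| match part {
            Part::Time => encode(time),
            Part::TagValue(name) => match tags.iter().find(|(k, _)| k == name) {
                None => "!".to_string(),                      // NULL sentinel
                Some((_, v)) if v.is_empty() => "^".to_string(), // empty sentinel
                Some((_, v)) => encode(v),
            },
        })
        .collect();
    parts.join("|")
}

/// Recover the (column name, column value) tuples from a derived key.
fn reverse_key<'a>(template: &[Part<'a>], key: &str) -> Vec<(&'a str, String)> {
    template
        .iter()
        .zip(key.split('|'))
        .filter_map(|(part, value)| match (part, value) {
            (Part::Time, _) => None,
            (Part::TagValue(_), "!") => None,
            (Part::TagValue(name), "^") => Some((*name, String::new())),
            (Part::TagValue(name), v) => Some((*name, decode(v))),
        })
        .collect()
}
```

With a template of [Time, TagValue("region"), TagValue("bananas")], calling
derive_key with the time "1970-01-01" and the tags region=west,other=ignored
yields "1970-01-01|west|!", and reverse_key on that key yields only the
("region", "west") tuple.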
pull/24376/head
Dom Dwyer 2023-05-29 14:47:25 +02:00
parent 57ba3c8cf5
commit 27bef292a3
7 changed files with 549 additions and 83 deletions

Cargo.lock generated

@ -1383,6 +1383,8 @@ dependencies = [
"observability_deps",
"once_cell",
"ordered-float 3.7.0",
"paste",
"percent-encoding",
"proptest",
"schema",
"sqlx",
@ -3589,6 +3591,8 @@ dependencies = [
"iox_time",
"itertools",
"mutable_batch_lp",
"paste",
"percent-encoding",
"rand 0.8.5",
"schema",
"snafu",


@ -19,7 +19,9 @@ sqlx = { version = "0.6", features = ["runtime-tokio-rustls", "postgres", "uuid"
thiserror = "1.0.40"
uuid = { version = "1", features = ["v4"] }
workspace-hack = { version = "0.1", path = "../workspace-hack" }
percent-encoding = "2.2.0"
[dev-dependencies] # In alphabetical order
assert_matches = "1"
paste = "1.0.12"
proptest = "1.2.0"


@ -15,14 +15,116 @@
//! [`TablePartitionTemplateOverride`] stores only the override, if provided,
//! and implicitly resolves to the default partitioning scheme if no override is
//! specified (indicated by the presence of [`Option::None`] in the wrapper).
//!
//! ## Default Partition Key
//!
//! The default partition key format is specified by [`PARTITION_BY_DAY_PROTO`],
//! with a template consisting of a single part: a YYYY-MM-DD representation of
//! the row timestamp.
//!
//! ## Partition Key Format
//!
//! Should a partition template be used that generates a partition key
//! containing more than one part, those parts are delimited by the `|`
//! character ([`PARTITION_KEY_DELIMITER`]), chosen to be an unusual character
//! that is unlikely to occur in user-provided column values in order to
//! minimise the need to encode the value in the common case, while still
//! providing legible / printable keys. Should the user-provided column value
//! contain the `|` character, it is [percent encoded] (as are the `!` and `^`
//! sentinels below, and the `%` character itself) to prevent ambiguity.
//!
//! It is an invariant that a partition key derived from a given template has
//! the same number and ordering of parts as that template.
//!
//! If the partition key template references a [`TemplatePart::TagValue`] column
//! that is not present in the row, a single `!` is inserted, indicating a NULL
//! template key part. If the value of the part is an empty string (""), a `^`
//! is inserted to ensure a non-empty partition key is always generated. Like
//! the `|` character above, any occurrence of these characters in a
//! user-provided column value is percent encoded.
//!
//! Because this serialisation format can be unambiguously reversed, the
//! [`build_column_values()`] function can be used to obtain the set of
//! [`TemplatePart::TagValue`] the key was constructed from.
//!
//! ### Reserved Characters
//!
//! Reserved characters that are percent encoded (in addition to ASCII control
//! characters and non-ASCII bytes), and their meaning:
//!
//! * `|` - partition key part delimiter ([`PARTITION_KEY_DELIMITER`])
//! * `!` - NULL/missing partition key part ([`PARTITION_KEY_VALUE_NULL`])
//! * `^` - empty string partition key part ([`PARTITION_KEY_VALUE_EMPTY`])
//! * `%` - required for unambiguous reversal of percent encoding
//!
//! These characters are defined in [`ENCODED_PARTITION_KEY_CHARS`] and chosen
//! due to their low likelihood of occurrence in user-provided column values.
//!
//! ### Examples
//!
//! When using the partition template below:
//!
//! ```text
//! [
//! TemplatePart::TimeFormat("%Y"),
//! TemplatePart::TagValue("a"),
//! TemplatePart::TagValue("b")
//! ]
//! ```
//!
//! The following partition keys are derived:
//!
//! * `time=2023-01-01, a=bananas, b=plátanos` -> `2023|bananas|pl%C3%A1tanos`
//! * `time=2023-01-01, b=plátanos` -> `2023|!|pl%C3%A1tanos`
//! * `time=2023-01-01, another=cat, b=plátanos` -> `2023|!|pl%C3%A1tanos`
//! * `time=2023-01-01` -> `2023|!|!`
//! * `time=2023-01-01, a=cat|dog, b=!` -> `2023|cat%7Cdog|%21`
//! * `time=2023-01-01, a=%50` -> `2023|%2550|!`
//! * `time=2023-01-01, a=` -> `2023|^|!`
//!
//! When using the default partitioning template (YYYY-MM-DD) there is no
//! encoding necessary, as the derived partition key contains a single part, and
//! no reserved characters.
//!
//! [percent encoded]: https://url.spec.whatwg.org/#percent-encoded-bytes
use generated_types::influxdata::iox::partition_template::v1 as proto;
use once_cell::sync::Lazy;
use std::sync::Arc;
use percent_encoding::{percent_decode_str, AsciiSet, CONTROLS};
use std::{borrow::Cow, sync::Arc};
/// Allocationless and protobufless access to the parts of a template needed to actually do
/// partitioning.
#[derive(Debug)]
/// The sentinel character used to delimit partition key parts in the partition
/// key string.
pub const PARTITION_KEY_DELIMITER: char = '|';
/// The sentinel character used to indicate an empty string partition key part
/// in the partition key string.
pub const PARTITION_KEY_VALUE_EMPTY: char = '^';
/// The `str` form of the [`PARTITION_KEY_VALUE_EMPTY`] character.
pub const PARTITION_KEY_VALUE_EMPTY_STR: &str = "^";
/// The sentinel character used to indicate a missing partition key part in the
/// partition key string.
pub const PARTITION_KEY_VALUE_NULL: char = '!';
/// The `str` form of the [`PARTITION_KEY_VALUE_NULL`] character.
pub const PARTITION_KEY_VALUE_NULL_STR: &str = "!";
/// The minimal set of characters that must be encoded during partition key
/// generation when they form part of a partition key part, in order to be
/// unambiguously reversible.
///
/// See module-level documentation & [`build_column_values()`].
pub const ENCODED_PARTITION_KEY_CHARS: AsciiSet = CONTROLS
.add(PARTITION_KEY_DELIMITER as u8)
.add(PARTITION_KEY_VALUE_NULL as u8)
.add(PARTITION_KEY_VALUE_EMPTY as u8)
.add(b'%'); // Required for reversible unambiguous encoding
/// Allocationless and protobufless access to the parts of a template needed to
/// actually do partitioning.
#[derive(Debug, Clone)]
#[allow(missing_docs)]
pub enum TemplatePart<'a> {
TagValue(&'a str),
@ -140,6 +242,73 @@ where
}
}
/// Reverse a `partition_key` generated from the given partition key `template`,
/// reconstructing the set of tag values in the form of `(column name, column
/// value)` tuples that the `partition_key` was generated from.
///
/// The `partition_key` MUST have been generated by `template`.
///
/// Values are returned as a [`Cow`], avoiding the need for value copying if
/// they do not need decoding. See module docs for encoding/decoding.
///
/// # Panics
///
/// This function panics if a column value is not valid UTF-8 after decoding.
pub fn build_column_values<'a>(
template: &'a TablePartitionTemplateOverride,
partition_key: &'a str,
) -> impl Iterator<Item = (&'a str, Cow<'a, str>)> {
// Exploded parts of the generated key on the "|" character.
//
// Any uses of the "|" character within the partition key's user-provided
// values are percent encoded, so this is an unambiguous field separator.
let key_parts = partition_key.split(PARTITION_KEY_DELIMITER);
// Obtain an iterator of template parts, from which the meaning of the key
// parts can be inferred.
let template_parts = template.parts();
// Invariant: the number of key parts generated from a given template always
// matches the number of template parts.
//
// The key_parts iterator is not an ExactSizeIterator, so an assert can't be
// placed here to validate this property.
// Produce an iterator of (template_part, template_value)
template_parts
.zip(key_parts)
.filter_map(|(template, mut value)| {
// Perform re-mapping of sentinel values.
match value {
PARTITION_KEY_VALUE_NULL_STR => {
// Skip NULL partition key parts, indicated by the presence of a
// single "!" character as the part value.
return None;
}
PARTITION_KEY_VALUE_EMPTY_STR => {
// Re-map the empty string sentinel "^" to an empty string
// value.
value = "";
}
_ => {}
}
match template {
TemplatePart::TagValue(col_name) => Some((col_name, value)),
TemplatePart::TimeFormat(_) => None,
}
})
// Reverse the urlencoding of all value parts
.map(|(name, value)| {
(
name,
percent_decode_str(value)
.decode_utf8()
.expect("invalid partition key part encoding"),
)
})
}
/// In production code, the template should come from protobuf that is either from the database or
/// from a gRPC request. In tests, building protobuf is painful, so here's an easier way to create
/// a `TablePartitionTemplateOverride`.
@ -168,6 +337,129 @@ mod tests {
use assert_matches::assert_matches;
use sqlx::Encode;
/// Generate a test that asserts "partition_key" is reversible, yielding
/// "want" assuming the partition "template" was used.
macro_rules! test_build_column_values {
(
$name:ident,
template = $template:expr, // Array/vec of TemplatePart
partition_key = $partition_key:expr, // String derived partition key
want = $want:expr // Expected build_column_values() output
) => {
paste::paste! {
#[test]
fn [<test_build_column_values_ $name>]() {
let template = $template.into_iter().collect::<Vec<_>>();
let template = test_table_partition_override(template);
// normalise the values into a (str, string) for the comparison
let want = $want
.into_iter()
.map(|(k, v)| {
let v: &str = v;
(k, v.to_string())
})
.collect::<Vec<_>>();
let got = build_column_values(&template, $partition_key)
.map(|(k, v)| (k, v.to_string()))
.collect::<Vec<_>>();
assert_eq!(got, want);
}
}
};
}
test_build_column_values!(
module_doc_example_1,
template = [
TemplatePart::TimeFormat("%Y"),
TemplatePart::TagValue("a"),
TemplatePart::TagValue("b"),
],
partition_key = "2023|bananas|plátanos",
want = [("a", "bananas"), ("b", "plátanos")]
);
test_build_column_values!(
module_doc_example_2,
template = [
TemplatePart::TimeFormat("%Y"),
TemplatePart::TagValue("a"),
TemplatePart::TagValue("b"),
],
partition_key = "2023|!|plátanos",
want = [("b", "plátanos")]
);
test_build_column_values!(
module_doc_example_4,
template = [
TemplatePart::TimeFormat("%Y"),
TemplatePart::TagValue("a"),
TemplatePart::TagValue("b"),
],
partition_key = "2023|!|!",
want = []
);
test_build_column_values!(
module_doc_example_5,
template = [
TemplatePart::TimeFormat("%Y"),
TemplatePart::TagValue("a"),
TemplatePart::TagValue("b"),
],
partition_key = "2023|cat%7Cdog|%21",
want = [("a", "cat|dog"), ("b", "!")]
);
test_build_column_values!(
module_doc_example_6,
template = [
TemplatePart::TimeFormat("%Y"),
TemplatePart::TagValue("a"),
TemplatePart::TagValue("b"),
],
partition_key = "2023|%2550|!",
want = [("a", "%50")]
);
test_build_column_values!(
unambiguous,
template = [
TemplatePart::TimeFormat("%Y"),
TemplatePart::TagValue("a"),
TemplatePart::TagValue("b"),
],
partition_key = "2023|is%7Cnot%21ambiguous%2510|!",
want = [("a", "is|not!ambiguous%10")]
);
test_build_column_values!(
empty_tag_only,
template = [TemplatePart::TagValue("a")],
partition_key = "!",
want = []
);
#[test]
fn test_null_partition_key_char_str_equality() {
assert_eq!(
PARTITION_KEY_VALUE_NULL.to_string(),
PARTITION_KEY_VALUE_NULL_STR
);
}
#[test]
fn test_empty_partition_key_char_str_equality() {
assert_eq!(
PARTITION_KEY_VALUE_EMPTY.to_string(),
PARTITION_KEY_VALUE_EMPTY_STR
);
}
/// This test asserts the default derived partitioning scheme with no
/// overrides.
///


@ -17,7 +17,9 @@ snafu = "0.7"
hashbrown = { workspace = true }
itertools = "0.10"
workspace-hack = { version = "0.1", path = "../workspace-hack" }
percent-encoding = "2.2.0"
[dev-dependencies]
mutable_batch_lp = { path = "../mutable_batch_lp" }
paste = "1.0.12"
rand = "0.8"


@ -1,15 +1,23 @@
//! Functions for partitioning rows from a [`MutableBatch`]
//!
//! The returned ranges can then be used with [`MutableBatch::extend_from_range`]
//! The returned ranges can then be used with
//! [`MutableBatch::extend_from_range`].
//!
//! The partitioning template, derived partition key format, and encodings are
//! described in detail in the [`data_types::partition_template`] module.
use crate::{
column::{Column, ColumnData},
MutableBatch,
};
use chrono::{format::StrftimeItems, TimeZone, Utc};
use data_types::partition_template::{TablePartitionTemplateOverride, TemplatePart};
use schema::{InfluxColumnType, TIME_COLUMN_NAME};
use std::ops::Range;
use data_types::partition_template::{
TablePartitionTemplateOverride, TemplatePart, ENCODED_PARTITION_KEY_CHARS,
PARTITION_KEY_DELIMITER, PARTITION_KEY_VALUE_EMPTY_STR, PARTITION_KEY_VALUE_NULL_STR,
};
use percent_encoding::utf8_percent_encode;
use schema::TIME_COLUMN_NAME;
use std::{borrow::Cow, ops::Range};
/// Returns an iterator identifying consecutive ranges for a given partition key
pub fn partition_batch<'a>(
@ -19,93 +27,110 @@ pub fn partition_batch<'a>(
range_encode(partition_keys(batch, template.parts()))
}
/// A [`TablePartitionTemplateOverride`] is made up of one of more [`TemplatePart`]s that are
/// rendered and joined together by hyphens to form a single partition key.
/// A [`TablePartitionTemplateOverride`] is made up of one or more
/// [`TemplatePart`]s that are rendered and joined together by
/// [`PARTITION_KEY_DELIMITER`] to form a single partition key.
///
/// To avoid allocating intermediate strings, and performing column lookups for every row,
/// each [`TemplatePart`] is converted to a [`Template`].
/// To avoid allocating intermediate strings, and performing column lookups for
/// every row, each [`TemplatePart`] is converted to a [`Template`].
///
/// [`Template::fmt_row`] can then be used to render the template for that particular row
/// to the provided string, without performing any additional column lookups
/// [`Template::fmt_row`] can then be used to render the template for that
/// particular row to the provided string, without performing any additional
/// column lookups.
#[derive(Debug)]
enum Template<'a> {
TagValue(&'a Column, &'a str),
MissingTag(&'a str),
TagValue(&'a Column),
TimeFormat(&'a [i64], StrftimeItems<'a>),
/// This batch is missing a partitioning tag column.
MissingTag,
}
impl<'a> Template<'a> {
/// Renders this template to `out` for the row `idx`
/// Renders this template to `out` for the row `idx`.
fn fmt_row<W: std::fmt::Write>(&self, out: &mut W, idx: usize) -> std::fmt::Result {
match self {
Template::TagValue(col, col_name) if col.valid.get(idx) => {
out.write_str(col_name)?;
out.write_char('_')?;
match &col.data {
ColumnData::F64(col_data, _) => write!(out, "{}", col_data[idx]),
ColumnData::I64(col_data, _) => write!(out, "{}", col_data[idx]),
ColumnData::U64(col_data, _) => write!(out, "{}", col_data[idx]),
ColumnData::String(col_data, _) => {
write!(out, "{}", col_data.get(idx).unwrap())
}
ColumnData::Bool(col_data, _) => match col_data.get(idx) {
true => out.write_str("true"),
false => out.write_str("false"),
},
ColumnData::Tag(col_data, dictionary, _) => {
out.write_str(dictionary.lookup_id(col_data[idx]).unwrap())
}
}
}
Template::TagValue(_, col_name) | Template::MissingTag(col_name) => {
out.write_str(col_name)
}
Template::TagValue(col) if col.valid.get(idx) => match &col.data {
ColumnData::Tag(col_data, dictionary, _) => out.write_str(never_empty(
Cow::from(utf8_percent_encode(
dictionary.lookup_id(col_data[idx]).unwrap(),
&ENCODED_PARTITION_KEY_CHARS,
))
.as_ref(),
)),
other => panic!(
"partitioning only works on tag columns, but column was type `{other:?}`"
),
},
Template::TimeFormat(t, format) => {
let formatted = Utc
.timestamp_nanos(t[idx])
.format_with_items(format.clone());
write!(out, "{formatted}")
.format_with_items(format.clone()) // Cheap clone of refs
.to_string();
out.write_str(
Cow::from(utf8_percent_encode(
formatted.as_str(),
&ENCODED_PARTITION_KEY_CHARS,
))
.as_ref(),
)
}
// Either a tag that has no value for this given row index, or the
// batch does not contain this tag at all.
Template::TagValue(_) | Template::MissingTag => {
out.write_str(PARTITION_KEY_VALUE_NULL_STR)
}
}
}
}
/// Return `s` if it is non-empty, else [`PARTITION_KEY_VALUE_EMPTY_STR`].
#[inline(always)]
fn never_empty(s: &str) -> &str {
if s.is_empty() {
return PARTITION_KEY_VALUE_EMPTY_STR;
}
s
}
/// Returns an iterator of partition keys for the given table batch
fn partition_keys<'a>(
batch: &'a MutableBatch,
template_parts: impl Iterator<Item = TemplatePart<'a>>,
) -> impl Iterator<Item = String> + 'a {
let time = batch.column(TIME_COLUMN_NAME).expect("time column");
let time = match &time.data {
ColumnData::I64(col_data, _) => col_data.as_slice(),
x => unreachable!("expected i32 for time got {}", x),
// Extract the timestamp data.
let time = match batch.column(TIME_COLUMN_NAME).map(|v| &v.data) {
Ok(ColumnData::I64(data, _)) => data.as_slice(),
Ok(v) => unreachable!("incorrect type for time column: {v:?}"),
Err(e) => panic!("error reading time column: {e:?}"),
};
let cols: Vec<_> = template_parts
.map(|part| match part {
TemplatePart::TagValue(name) => batch.column(name).map_or_else(
|_| Template::MissingTag(name),
|col| match col.influx_type {
InfluxColumnType::Tag => Template::TagValue(col, name),
other => panic!(
"Partitioning only works on tag columns, \
but column `{name}` was type `{other:?}`"
),
},
),
// Convert TemplatePart into an ordered array of Template
let template = template_parts
.map(|v| match v {
TemplatePart::TagValue(col_name) => batch
.column(col_name)
.map_or_else(|_| Template::MissingTag, Template::TagValue),
TemplatePart::TimeFormat(fmt) => Template::TimeFormat(time, StrftimeItems::new(fmt)),
})
.collect();
.collect::<Vec<_>>();
// Yield a partition key string for each row in `batch`
(0..batch.row_count).map(move |idx| {
let mut string = String::new();
for (col_idx, col) in cols.iter().enumerate() {
// Evaluate each template part for this row
for (col_idx, col) in template.iter().enumerate() {
col.fmt_row(&mut string, idx)
.expect("string writing is infallible");
if col_idx + 1 != cols.len() {
string.push('-');
// If this isn't the last element in the template, insert a field
// delimiter.
if col_idx + 1 != template.len() {
string.push(PARTITION_KEY_DELIMITER);
}
}
string
})
}
@ -146,7 +171,9 @@ where
#[cfg(test)]
mod tests {
use super::*;
use crate::writer::Writer;
use data_types::partition_template::{build_column_values, test_table_partition_override};
use rand::prelude::*;
fn make_rng() -> StdRng {
@ -224,7 +251,7 @@ mod tests {
let template_parts = [
TemplatePart::TimeFormat("%Y-%m-%d %H:%M:%S"),
TemplatePart::TagValue("region"),
TemplatePart::TagValue("bananas"),
TemplatePart::TagValue("bananas"), // column not present
];
writer.commit();
@ -234,20 +261,17 @@ mod tests {
assert_eq!(
keys,
vec![
"1970-01-01 00:00:00-region-bananas".to_string(),
"1970-01-01 00:00:00-region_west-bananas".to_string(),
"1970-01-01 00:00:00-region-bananas".to_string(),
"1970-01-01 00:00:00-region_east-bananas".to_string(),
"1970-01-01 00:00:00-region-bananas".to_string()
"1970-01-01 00:00:00|!|!".to_string(),
"1970-01-01 00:00:00|west|!".to_string(),
"1970-01-01 00:00:00|!|!".to_string(),
"1970-01-01 00:00:00|east|!".to_string(),
"1970-01-01 00:00:00|!|!".to_string()
]
)
}
#[test]
#[should_panic(
expected = "Partitioning only works on tag columns, but column `region` was type \
`Field(String)`"
)]
#[should_panic(expected = "partitioning only works on tag columns, but column was type")]
fn partitioning_on_fields_panics() {
let mut batch = MutableBatch::new();
let mut writer = Writer::new(&mut batch, 5);
@ -270,4 +294,151 @@ mod tests {
let _keys: Vec<_> = partition_keys(&batch, template_parts.into_iter()).collect();
}
#[test]
fn test_partition_key() {
let mut batch = MutableBatch::new();
let mut writer = Writer::new(&mut batch, 1);
let tag_values = [("col_a", "value")];
let template_parts = TablePartitionTemplateOverride::new(None, &Default::default());
// Timestamp: 2023-05-29T13:03:16Z
writer
.write_time("time", vec![1685365396931384064].into_iter())
.unwrap();
for (col, value) in tag_values {
writer
.write_tag(col, Some(&[0b00000001]), vec![value].into_iter())
.unwrap();
}
writer.commit();
let keys: Vec<_> = partition_keys(&batch, template_parts.parts()).collect();
assert_eq!(keys, vec!["2023-05-29".to_string()])
}
// Generate a test that asserts the derived partition key matches
// "want_key", when using the provided "template" parts and set of "tags".
//
// Additionally validates that the derived key is reversible into the
// expected set of "want_reversed_tags" from the original inputs.
macro_rules! test_partition_key {
(
$name:ident,
template = $template:expr, // Array/vec of TemplatePart
tags = $tags:expr, // Array/vec of (tag_name, value) tuples
want_key = $want_key:expr, // Expected partition key string
want_reversed_tags = $want_reversed_tags:expr // Array/vec of (tag_name, value) reversed from $tags
) => {
paste::paste! {
#[test]
fn [<test_partition_key_ $name>]() {
let mut batch = MutableBatch::new();
let mut writer = Writer::new(&mut batch, 1);
let template = $template.into_iter().collect::<Vec<_>>();
let template = test_table_partition_override(template);
// Timestamp: 2023-05-29T13:03:16Z
writer
.write_time("time", vec![1685365396931384064].into_iter())
.unwrap();
for (col, value) in $tags {
writer
.write_tag(col, Some(&[0b00000001]), vec![value].into_iter())
.unwrap();
}
writer.commit();
let keys: Vec<_> = partition_keys(&batch, template.parts()).collect();
assert_eq!(keys, vec![$want_key.to_string()]);
// Reverse the encoding.
let reversed = build_column_values(&template, &keys[0]);
// normalise the tags into a (str, string) for the comparison
let want = $want_reversed_tags
.into_iter()
.map(|(k, v)| {
let v: &str = v;
(k, v.to_string())
})
.collect::<Vec<_>>();
let got = reversed
.map(|(k, v)| (k, v.to_string()))
.collect::<Vec<_>>();
assert_eq!(got, want);
}
}
};
}
test_partition_key!(
simple,
template = [
TemplatePart::TimeFormat("%Y"),
TemplatePart::TagValue("a"),
TemplatePart::TagValue("b"),
],
tags = [("a", "bananas"), ("b", "are_good")],
want_key = "2023|bananas|are_good",
want_reversed_tags = [("a", "bananas"), ("b", "are_good")]
);
test_partition_key!(
non_ascii,
template = [
TemplatePart::TimeFormat("%Y"),
TemplatePart::TagValue("a"),
TemplatePart::TagValue("b"),
],
tags = [("a", "bananas"), ("b", "plátanos")],
want_key = "2023|bananas|pl%C3%A1tanos",
want_reversed_tags = [("a", "bananas"), ("b", "plátanos")]
);
test_partition_key!(
single_tag_template_tag_not_present,
template = [TemplatePart::TagValue("a")],
tags = [("b", "bananas")],
want_key = "!",
want_reversed_tags = []
);
test_partition_key!(
single_tag_template_tag_empty,
template = [TemplatePart::TagValue("a")],
tags = [("a", "")],
want_key = "^",
want_reversed_tags = [("a", "")]
);
test_partition_key!(
missing_tag,
template = [TemplatePart::TagValue("a"), TemplatePart::TagValue("b")],
tags = [("a", "bananas")],
want_key = "bananas|!",
want_reversed_tags = [("a", "bananas")]
);
test_partition_key!(
unambiguous,
template = [
TemplatePart::TimeFormat("%Y"),
TemplatePart::TagValue("a"),
TemplatePart::TagValue("b"),
TemplatePart::TagValue("c"),
TemplatePart::TagValue("d"),
TemplatePart::TagValue("e"),
],
tags = [("a", "|"), ("b", "!"), ("d", "%7C%21%257C"), ("e", "^")],
want_key = "2023|%7C|%21|!|%257C%2521%25257C|%5E",
want_reversed_tags = [("a", "|"), ("b", "!"), ("d", "%7C%21%257C"), ("e", "^")]
);
}


@ -357,14 +357,9 @@ mod tests {
.collect::<HashMap<_, _>>();
let expected = HashMap::from([
(
PartitionKey::from("oranges-1970-01-tag2_C"),
vec!["bananas".into()],
),
(
PartitionKey::from("oranges-2016-06-tag2_D"),
vec!["bananas".into()],
),
(PartitionKey::from("!|1970-01|C"), vec!["bananas".into()]),
(PartitionKey::from("!|2016-06|D"), vec!["bananas".into()]),
// This table does not have a partition template override
(PartitionKey::from("1970-01-01"), vec!["platanos".into()]),
(PartitionKey::from("2016-06-13"), vec!["platanos".into()]),
]);


@ -986,7 +986,7 @@ async fn test_namespace_partition_template_implicit_table_creation() {
let table_id = ctx.table_id("bananas_test", "plantains").await.get();
assert_eq!(table_batches.len(), 1);
assert_eq!(table_batches[0].table_id, table_id);
assert_eq!(partition_key, "tag1_A");
assert_eq!(partition_key, "A");
});
}
@ -1051,7 +1051,7 @@ async fn test_namespace_partition_template_explicit_table_creation_without_parti
let table_id = ctx.table_id("bananas_test", "plantains").await.get();
assert_eq!(table_batches.len(), 1);
assert_eq!(table_batches[0].table_id, table_id);
assert_eq!(partition_key, "tag1_A");
assert_eq!(partition_key, "A");
});
}
@ -1120,7 +1120,7 @@ async fn test_namespace_partition_template_explicit_table_creation_with_partitio
let table_id = ctx.table_id("bananas_test", "plantains").await.get();
assert_eq!(table_batches.len(), 1);
assert_eq!(table_batches[0].table_id, table_id);
assert_eq!(partition_key, "tag2_B");
assert_eq!(partition_key, "B");
});
}
@ -1185,6 +1185,6 @@ async fn test_namespace_without_partition_template_table_with_partition_template
let table_id = ctx.table_id("bananas_test", "plantains").await.get();
assert_eq!(table_batches.len(), 1);
assert_eq!(table_batches[0].table_id, table_id);
assert_eq!(partition_key, "tag2_B");
assert_eq!(partition_key, "B");
});
}