A string field w/ a trailing slash before the quote would parse incorrectly
because the quote would be seen as escaped. We have to treat \\ as an
escape sequence within strings in order to handle this.
The FieldIterator is used to scan over the fields of a point, providing
information, and delaying parsing/decoding the value until it is needed.
This change uses this new type to avoid the allocation of a map for the
fields which is then thrown away as soon as the points get converted
into columns within the datastore.
Over a longer period of writes, this allocation shows up quite
a bit in profiles since the slice needs to be resized frequently.
This scans the slice to count how many lines are going to be parsed
in order to pre-allocate the slice capacity. It's slightly slower,
but creates less garbage in the long run.
The v2 UDP client will attempt to split points that exceed the
configured payload size. It will only do this for points that have a
timestamp specified.
Negative timestamps are now supported. We also now refuse two
nanoseconds that are at the edge of the minimum time window. One of the
nanoseconds we do not accept is because we need MinInt64 to be used for
some internal comparisons in the TSM engine and it was causing an
underflow when we subtracted one from the minimum time. The second is so
we can have one minimum time that signifies the default minimum that
nobody can write to (so we can implicitly rewrite the timestamp on
aggregate queries) but still use the explicit timestamp if it is given
to us by the user. We aren't able to tell the difference between if the
user provided it or if it was implicit without those values being
different.
If the default minimum time is used with an aggregate query, we rewrite
the time to be the epoch for backwards compatibility since we believe
that's more important than supporting that extra nanosecond.
The highest time represented by a nanosecond needs to be used for an
exclusive range, so the maximum time needs to be one less than the
possible maximum number of nanoseconds representable by an int64 so that
we don't lose a point at that one time.
Previously worked in the open source version because the timestamp used
for finding a shard would be truncated by the retention policy so the
lookup time didn't run into this edge case because it didn't rest on the
truncation boundary. Since that point didn't really belong in that shard
group and was placed there by mistake, it's best to fix this bug since
the timestamp used to create the shard group should be capable of
retrieving it.
Influx does not support fields with empty names or points
with no fields.
NewPoint is changed to validate that all field names are non-empty.
AddField is removed because we now require that all fields are
specified on construction.
NewPointFromByte is changed to return an error if a unmarshaled
binary point does not have any fields.
newFieldsFromBinary is changed to prevent an infinite loop that
can arise while attempting to parse corrupt binary point data.
TestNewPointsWithBytesWithCorruptData is changed to reflect the
change in the behaviour of NewPointFromByte.
Signed-off-by: Jon Seymour <jon@wildducktheories.com>
These tests check that NewPoint and NewPointFromBytes return an error if:
* arguments specify a point with no fields
* arguments specify a point with a field that has an empty name
These tests also check that Point.Fields() always returns, even in the presence
of corrupt binary data read with NewPointFromBytes.
These tests fail at this commit and are fixed by a subsequent commit.
Signed-off-by: Jon Seymour <jon@wildducktheories.com>
Quotes are handled differently in the line protocol depending on when
they are encountered. Quotes in field values matter, quotes anywhere
else don't.
`scanLine()` didn't understand this difference and treated all quotes
the same as ones for tag values. This resulted in `scanLine()` reading
the wrong amount of data sometimes when quotes were involved.
This fixes#5204.
Quotes are not supposed to be significant in field keys, but are
significant in field values. The code as it currently was would
consider quotes in a key to be significant, but the later parser that
would unmarshal the fields from the byte string did not consider those
quotes to be significant. This meant that the following string:
"a=1
The line protocol parser would see a mismatched quote instead of a valid
input to the line protocol. But more nefariously, the following string:
"a=1"=2
The line protocol parser would ignore the first equals since it is
located in the quotation marks and think this was a valid input. It
would then pass it on to the field parser who would panic and die when
it tried to parse `1"=2` as a number.
Fixes#4076.
Float values are not supported in the existing engine and the tsm1
engines. This changes NewPoint to return an error if a field value
contains a NaN field. It also allows us to validate fields to prevent
other unsupported types from sneaking in through other input plugins.