add dot support in DSL indentifier #3989

JerryKwan · 2023-10-19T01:39:21Z

add dot support in DSL indentifier

Description

In the previous implementation, the identifier only supports { a..z | A..Z | 0..9 | _ }, and does not support set partition_key on key in json field and key with dot

How was this PR tested?

Closes:#3380

add dot support in DSL indentifier

fmassot · 2023-10-19T07:35:33Z

@trinity-1686a can you take a look?

trinity-1686a

Hi,
your changes seems to work fine for a json like

{
  "abc.def": 123
}

but not for

{
  "abc": {
    "def": 123
  }
}

(in that case, hash_attribute search for the literal key abc.def and finds nothing)

We want to support both use cases (probably with slightly different syntax, like having abc\.def to search for the literal key abc.def, and abc.def for nested keys).

Do you want to implement the whole thing (that is, a bit more change to the dsl, and teach hash_attribute how to handle nested keys), or leave things mostly as is and keep the issue open for now? (in the 2nd case, I'd still prefer for the dsl to want . for not nested keys to be escaped, so it's not a breaking change when support for nested key is implemented)

JerryKwan · 2023-10-19T13:39:44Z

@trinity-1686a Good catch.
Do you mean the function named hash_attribute?
https://github.com/quickwit-oss/quickwit/blob/main/quickwit/quickwit-doc-mapper/src/routing_expression/mod.rs#L72-L81
I think it would be better to do some changes to support nested keys, and considering the key names with dot (.)
Although I don't think using dot (.) in some key names is a good practice

fmassot · 2023-10-19T13:48:47Z

Although I don't think using dot (.) in some key names is a good practice

We do not have the choice, users send data with arbitrary keys.

I agree with @trinity-1686a on the syntax:

abc\.def to search for the literal key abc.def
abc.def for nested keys

JerryKwan · 2023-10-19T13:52:10Z

Okay, let's use abc\.def to search for the literal key and abc.def for nested keys

trinity-1686a · 2023-10-19T14:02:28Z

(I took the liberty of editing your message, the \ got eaten my markdown and made the message unclear)

refactor hash_attribute to hash nested keys

add test case

minor bug fix add more test cases

JerryKwan · 2023-10-22T10:50:48Z

@trinity-1686a @fmassot
I did some changes to hash_attribute(), and it should support the usage of abd.def for escaped dot and abd.def for nested keys
Would you please help to review again? thank you

quickwit/quickwit-doc-mapper/src/routing_expression/mod.rs

fmassot

We are on the indexing critical path here so we need to be extra careful when doing extra allocation. This make this issue an interesting problem! Can you review your code and try to remove all allocations?

fmassot · 2023-10-22T22:46:26Z

@JerryKwan: you could start by adding a benchmark to monitor the performance (thanks @guilload for the suggestion!). There is a benchmark for the doc mapper, we could probably use it.

JerryKwan · 2023-10-25T03:56:37Z

@fmassot @guilload
Thanks for the suggestions.
Based on we are on the indexing critical path, we should minimize unnecessary allocations whenever possible. Any suggestions about how we can do it?

optimize routing expression add benchmark test for routing expression

JerryKwan · 2023-11-10T07:03:14Z

@fmassot @trinity-1686a
Would you please help to review again? thank you

fmassot · 2023-11-12T21:30:38Z

@JerryKwan can you share the bench results before/after?

JerryKwan · 2023-11-13T02:27:36Z

@fmassot
This is the bench result before change on my dev environment

test tag_pruning::test::test_match_tag_field_name ... ignored

test result: ok. 0 passed; 0 failed; 180 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running benches/doc_to_json_bench.rs (target/release/deps/doc_to_json_bench-02707d090c296519)
Gnuplot not found, using plotters backend
Benchmarking simple-json-to-doc/simple-json-to-doc: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.7s, enable flat sampling, or reduce sample count to 60.
simple-json-to-doc/simple-json-to-doc
                        time:   [952.95 µs 957.72 µs 963.15 µs]
                        thrpt:  [86.808 MiB/s 87.301 MiB/s 87.738 MiB/s]
Found 16 outliers among 100 measurements (16.00%)
  3 (3.00%) high mild
  13 (13.00%) high severe
Benchmarking simple-json-to-doc/simple-json-to-doc-tantivy: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.0s, enable flat sampling, or reduce sample count to 60.
simple-json-to-doc/simple-json-to-doc-tantivy
                        time:   [1.1479 ms 1.1519 ms 1.1565 ms]
                        thrpt:  [72.293 MiB/s 72.583 MiB/s 72.839 MiB/s]
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe

And this is the one after changes

test tag_pruning::test::test_extract_tags_from_query_range_query_conjunction ... ignored
test tag_pruning::test::test_match_tag_field_name ... ignored

test result: ok. 0 passed; 0 failed; 187 ignored; 0 measured; 0 filtered out; finished in 0.01s

     Running benches/doc_to_json_bench.rs (target/release/deps/doc_to_json_bench-02707d090c296519)
Gnuplot not found, using plotters backend
Benchmarking simple-json-to-doc/simple-json-to-doc: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.5s, enable flat sampling, or reduce sample count to 50.
simple-json-to-doc/simple-json-to-doc
                        time:   [964.23 µs 968.95 µs 974.78 µs]
                        thrpt:  [86.750 MiB/s 87.272 MiB/s 87.700 MiB/s]
                 change:
                        time:   [+3.8428% +6.3502% +8.8367%] (p = 0.00 < 0.05)
                        thrpt:  [-8.1193% -5.9710% -3.7006%]
                        Performance has regressed.
Benchmarking simple-json-to-doc/simple-json-to-doc-tantivy: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.8s, enable flat sampling, or reduce sample count to 60.
simple-json-to-doc/simple-json-to-doc-tantivy
                        time:   [1.0735 ms 1.0766 ms 1.0803 ms]
                        thrpt:  [78.280 MiB/s 78.542 MiB/s 78.774 MiB/s]
                 change:
                        time:   [-7.0396% -6.5161% -6.0295%] (p = 0.00 < 0.05)
                        thrpt:  [+6.4163% +6.9703% +7.5727%]
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  7 (7.00%) high mild
  2 (2.00%) high severe

     Running benches/routing_expression_bench.rs (target/release/deps/routing_expression_bench-7249b0fa809dafee)
Gnuplot not found, using plotters backend
simple-routing-expression/simple-json-to-doc
                        time:   [4.9654 ms 5.0183 ms 5.0766 ms]
                        thrpt:  [51.631 MiB/s 52.231 MiB/s 52.787 MiB/s]
Found 15 outliers among 100 measurements (15.00%)
  15 (15.00%) high severe
simple-routing-expression/simple-eval-hash
                        time:   [3.5201 ms 3.5315 ms 3.5442 ms]
                        thrpt:  [73.955 MiB/s 74.221 MiB/s 74.461 MiB/s]
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

quickwit/quickwit-doc-mapper/benches/routing_expression_bench.rs

quickwit/quickwit-doc-mapper/src/routing_expression/mod.rs

JerryKwan · 2023-11-24T06:22:48Z

@trinity-1686a Thanks for reviewing, I will refactor the codes asap, thank you

trinity-1686a · 2024-01-25T16:22:50Z

hi @JerryKwan, it looks like you haven't had time to make progress on this PR. Would you mind if I take over?

JerryKwan · 2024-01-26T01:01:19Z

@trinity-1686a
Sorry for the trouble introduced. please help to take it over,thank you

trinity-1686a · 2024-01-26T09:42:09Z

no problem 😉

allocations were removed

quickwit/quickwit-doc-mapper/src/routing_expression/mod.rs

add dot support in DSL indentifier

2ea75db

add dot support in DSL indentifier

fmassot requested a review from trinity-1686a October 19, 2023 07:35

trinity-1686a reviewed Oct 19, 2023

View reviewed changes

JerryKwan added 3 commits October 20, 2023 19:49

refactor hash_attribute

1fc8d16

refactor hash_attribute to hash nested keys

add test case

5f26d63

add test case

minor bug fix and add more test cases

88f614d

minor bug fix add more test cases

fmassot reviewed Oct 22, 2023

View reviewed changes

quickwit/quickwit-doc-mapper/src/routing_expression/mod.rs Outdated Show resolved Hide resolved

fmassot previously requested changes Oct 22, 2023

View reviewed changes

JerryKwan added 2 commits November 9, 2023 16:53

Merge remote-tracking branch 'origin/main' into issue3380

d9a12f2

optimize routing expression

cec38ec

optimize routing expression add benchmark test for routing expression

trinity-1686a reviewed Nov 23, 2023

View reviewed changes

quickwit/quickwit-doc-mapper/benches/routing_expression_bench.rs Outdated Show resolved Hide resolved

quickwit/quickwit-doc-mapper/src/routing_expression/mod.rs Outdated Show resolved Hide resolved

quickwit/quickwit-doc-mapper/src/routing_expression/mod.rs Outdated Show resolved Hide resolved

trinity-1686a added 2 commits January 26, 2024 11:24

remove some unnecessary allocations

162991d

Merge branch 'main' into JerryKwan/issue3380

9d11f5c

trinity-1686a requested a review from guilload January 26, 2024 10:34

use simpler hash and update resulting values

5670d64

guilload reviewed Jan 26, 2024

View reviewed changes

trinity-1686a added 2 commits January 29, 2024 10:50

remove clone in hashing json map

f8c9d4c

Merge branch 'main' into JerryKwan/issue3380

88971ef

guilload approved these changes Jan 29, 2024

View reviewed changes

Merge branch 'main' into issue3380

b47e5d3

trinity-1686a merged commit b91a3ea into quickwit-oss:main Jan 29, 2024

trinity-1686a mentioned this pull request Jan 29, 2024

Support partition_key on key within json field or key with dots #3380

Closed

add dot support in DSL indentifier #3989

add dot support in DSL indentifier #3989

Uh oh!

Conversation

JerryKwan commented Oct 19, 2023

Description

How was this PR tested?

Uh oh!

fmassot commented Oct 19, 2023

Uh oh!

trinity-1686a left a comment

Choose a reason for hiding this comment

Uh oh!

JerryKwan commented Oct 19, 2023

Uh oh!

fmassot commented Oct 19, 2023

Uh oh!

JerryKwan commented Oct 19, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

trinity-1686a commented Oct 19, 2023

Uh oh!

JerryKwan commented Oct 22, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

fmassot left a comment

Choose a reason for hiding this comment

Uh oh!

fmassot commented Oct 22, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

JerryKwan commented Oct 25, 2023

Uh oh!

JerryKwan commented Nov 10, 2023

Uh oh!

fmassot commented Nov 12, 2023

Uh oh!

JerryKwan commented Nov 13, 2023

Uh oh!

Uh oh!

Uh oh!

Uh oh!

JerryKwan commented Nov 24, 2023

Uh oh!

trinity-1686a commented Jan 25, 2024

Uh oh!

JerryKwan commented Jan 26, 2024

Uh oh!

trinity-1686a commented Jan 26, 2024

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

JerryKwan commented Oct 19, 2023 •

edited

Loading

JerryKwan commented Oct 22, 2023 •

edited

Loading

fmassot commented Oct 22, 2023 •

edited

Loading