Support keys compression in Dragonfly #4883

Closed
romange opened this issue Apr 3, 2025 · 0 comments · Fixed by #5179
romange commented Apr 3, 2025

See #4880 for motivation and as a prerequisite

feature description

  1. Be able to enable Huffman encoding of keys via a run-time flag.
  2. It can also be disabled at run time (via curl /flagz).
  3. Once enabled, it samples the key space by scanning N (hardcoded) keys, then builds the histogram and the derived data structures needed for encoding/decoding.
  4. The data structures can be built only once during the process lifetime, because decoding of already-encoded keys depends on them.
  5. Bonus: the data structures are immutable, so they might be shared among threads, but we need to verify that this is easy to do (someone still needs to delete them upon service shutdown).
  6. Once they exist, CompactObject::SetString should be able to use Huffman encoding.
  7. Encoding stats such as total raw size, total compressed size, tries, and successes should be exposed via "info".
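
The steps above (sample keys, build a histogram, derive the encoding structures, track stats) can be sketched roughly as follows. This is a minimal, self-contained illustration and not Dragonfly's implementation: `BuildHistogram`, `HuffmanCodeLengths`, `EncodedSizeBits`, `EncodingStats` and `RecordAttempt` are hypothetical names, and the sketch only computes code lengths and encoded sizes rather than producing an actual bit stream.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// All names below are hypothetical; they are not Dragonfly's API.
using Histogram = std::array<uint64_t, 256>;
using CodeLengths = std::array<uint8_t, 256>;

// Step 3a: count byte frequencies over the sampled keys.
Histogram BuildHistogram(const std::vector<std::string>& sample) {
  Histogram hist{};
  for (const std::string& key : sample)
    for (unsigned char c : key) ++hist[c];
  return hist;
}

// Step 3b: derive Huffman code lengths (in bits) for every byte value.
// Bytes that never occurred in the sample keep length 0 (not encodable).
CodeLengths HuffmanCodeLengths(const Histogram& hist) {
  struct Node { uint64_t weight; int symbol; int left; int right; };
  using Item = std::pair<uint64_t, int>;  // (weight, node index)
  std::vector<Node> nodes;
  std::priority_queue<Item, std::vector<Item>, std::greater<Item>> heap;
  for (int sym = 0; sym < 256; ++sym) {
    if (hist[sym] == 0) continue;
    nodes.push_back({hist[sym], sym, -1, -1});
    heap.push({hist[sym], static_cast<int>(nodes.size()) - 1});
  }
  CodeLengths lengths{};
  if (heap.empty()) return lengths;
  // Repeatedly merge the two lightest subtrees.
  while (heap.size() > 1) {
    Item a = heap.top(); heap.pop();
    Item b = heap.top(); heap.pop();
    nodes.push_back({a.first + b.first, -1, a.second, b.second});
    heap.push({a.first + b.first, static_cast<int>(nodes.size()) - 1});
  }
  // Walk the tree; the depth of a leaf is the code length of its symbol.
  std::vector<std::pair<int, int>> stack;  // (node index, depth)
  stack.push_back({heap.top().second, 0});
  while (!stack.empty()) {
    std::pair<int, int> cur = stack.back();
    stack.pop_back();
    const Node& n = nodes[cur.first];
    if (n.symbol >= 0) {
      // A single-symbol alphabet still needs one bit per symbol.
      lengths[n.symbol] = static_cast<uint8_t>(cur.second == 0 ? 1 : cur.second);
    } else {
      stack.push_back({n.left, cur.second + 1});
      stack.push_back({n.right, cur.second + 1});
    }
  }
  return lengths;
}

// Returns false if the key contains a byte that has no code.
bool EncodedSizeBits(const std::string& key, const CodeLengths& lengths, size_t* bits) {
  assert(bits != nullptr);
  size_t total = 0;
  for (unsigned char c : key) {
    if (lengths[c] == 0) return false;
    total += lengths[c];
  }
  *bits = total;
  return true;
}

// Step 7: the counters the issue asks to expose via "info".
struct EncodingStats {
  uint64_t raw_bytes = 0, compressed_bytes = 0, tries = 0, successes = 0;
};

void RecordAttempt(const std::string& key, const CodeLengths& lengths, EncodingStats* stats) {
  ++stats->tries;
  size_t bits = 0;
  if (!EncodedSizeBits(key, lengths, &bits)) return;
  size_t bytes = (bits + 7) / 8;
  if (bytes >= key.size()) return;  // No gain: the key would stay raw.
  ++stats->successes;
  stats->raw_bytes += key.size();
  stats->compressed_bytes += bytes;
}
```

Because the resulting `CodeLengths` table is never mutated after it is built, a single instance could in principle be shared read-only across threads, which is the property item 5 refers to.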
romange self-assigned this May 6, 2025
romange added a commit that referenced this issue May 6, 2025
One of the building blocks for #4883

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
romange added a commit that referenced this issue May 6, 2025
Move the code in debugcmd.cc into HuffmanEncoder.

One of the building blocks for #4883

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
romange added a commit that referenced this issue May 7, 2025
Move the code in debugcmd.cc into HuffmanEncoder.

One of the building blocks for #4883

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
romange added a commit that referenced this issue May 25, 2025
Fixes #4883

For example, without huffman encoding `debug POPULATE 100000 xxxxxxxxxxxxxxxxx 20` requires 2399200 bytes of memory for keys:
`type_used_memory_string:2399200`
the subsequent `debug compression EXPORT` outputs base64-encoded table `GBDgCpXW/////66rio2Ue++9927Vqa21Fg==`

Finally, running dragonfly with `--huffman_table="keys:GBDgCpXW/////66rio2Ue++9927Vqa21Fg=="`
completely eliminates key allocations, thanks to Huffman encoding and the small string optimization.

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
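
To get a feel for why such keys can shrink enough to fit into an inline small string, the sketch below estimates the average compressed key size from the per-byte Shannon entropy of a key sample. Everything here is an assumption made for illustration: the key shape `xxxxxxxxxxxxxxxxx:<index>` produced by the hypothetical `MakeExampleKeys` helper, and the function `EstimateCompressedBytesPerKey` itself. Entropy is only a lower bound; a real Huffman code spends at least one bit per byte, so actual sizes are somewhat larger.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: keys shaped like "xxxxxxxxxxxxxxxxx:<index>".
// The real key format generated by `debug POPULATE` is an assumption here.
std::vector<std::string> MakeExampleKeys(size_t count) {
  std::vector<std::string> keys;
  keys.reserve(count);
  for (size_t i = 0; i < count; ++i)
    keys.push_back("xxxxxxxxxxxxxxxxx:" + std::to_string(i));
  return keys;
}

// Estimates the average compressed size of a key, in bytes, from the
// zeroth-order (per-byte) Shannon entropy of all sampled key bytes.
double EstimateCompressedBytesPerKey(const std::vector<std::string>& keys) {
  std::array<uint64_t, 256> hist{};
  uint64_t total = 0;
  for (const std::string& key : keys) {
    for (unsigned char c : key) {
      ++hist[c];
      ++total;
    }
  }
  if (keys.empty() || total == 0) return 0.0;

  double bits_per_byte = 0.0;
  for (uint64_t count : hist) {
    if (count == 0) continue;
    double p = static_cast<double>(count) / static_cast<double>(total);
    bits_per_byte -= p * std::log2(p);
  }
  double total_bytes = bits_per_byte * static_cast<double>(total) / 8.0;
  return total_bytes / static_cast<double>(keys.size());
}
```

For a sample dominated by one repeated byte, as in the example above, the estimate comes out at a small fraction of the raw key length, which is consistent with the keys no longer needing a separate allocation.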
romange added a commit that referenced this issue May 26, 2025
Fixes #4883

For example, without huffman encoding `debug POPULATE 100000 xxxxxxxxxxxxxxxxx 20` requires 2399200 bytes of memory for keys:
`type_used_memory_string:2399200`
the subsequent `debug compression EXPORT` outputs base64-encoded table `GBDgCpXW/////66rio2Ue++9927Vqa21Fg==`

Finally, running dragonfly with `--huffman_table="keys:GBDgCpXW/////66rio2Ue++9927Vqa21Fg=="`
completely eliminates key allocations, thanks to Huffman encoding and the small string optimization.

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
romange added a commit that referenced this issue May 26, 2025
Fixes #4883

For example, without huffman encoding `debug POPULATE 100000 xxxxxxxxxxxxxxxxx 20` requires 2399200 bytes of memory for keys:
`type_used_memory_string:2399200`
the subsequent `debug compression EXPORT` outputs base64-encoded table `GBDgCpXW/////66rio2Ue++9927Vqa21Fg==`

And when we run dragonfly with `--huffman_table="keys:GBDgCpXW/////66rio2Ue++9927Vqa21Fg=="`,
this completely eliminates key allocations, thanks to Huffman encoding and the small string optimization.

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
romange added a commit that referenced this issue May 26, 2025
Fixes #4883

For example, without huffman encoding `debug POPULATE 100000 xxxxxxxxxxxxxxxxx 20` requires 2399200 bytes of memory for keys:
`type_used_memory_string:2399200`
the subsequent `debug compression EXPORT` outputs base64-encoded table `GBDgCpXW/////66rio2Ue++9927Vqa21Fg==`

And when we run dragonfly with `--huffman_table="keys:GBDgCpXW/////66rio2Ue++9927Vqa21Fg=="`,
this completely eliminates key allocations, thanks to Huffman encoding and the small string optimization.

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
romange added a commit that referenced this issue May 28, 2025
Fixes #4883

For example, without huffman encoding `debug POPULATE 100000 xxxxxxxxxxxxxxxxx 20` requires 2399200 bytes of memory for keys:
`type_used_memory_string:2399200`
the subsequent `debug compression EXPORT` outputs base64-encoded table `GBDgCpXW/////66rio2Ue++9927Vqa21Fg==`

And when we run dragonfly with `--huffman_table="keys:GBDgCpXW/////66rio2Ue++9927Vqa21Fg=="`,
this completely eliminates key allocations, thanks to Huffman encoding and the small string optimization.

Signed-off-by: Roman Gershman <roman@dragonflydb.io>