Support storing trailing \0 byte at the end of string #187

Closed
khng300 opened this issue Jun 18, 2023 · 5 comments

Comments

@khng300
Contributor

khng300 commented Jun 18, 2023

Hi, I'm not sure if I'm missing anything, but I recently discovered that cista::generic_string does not store the \0 byte at the end of a long string (or of a string that just hits the short_length_limit length limit). As a workaround, I am currently drafting my own string type for this purpose.

Is there any plan to work this out? Or do we need to propose a new type?

(A side topic, not really related: what about supporting \0 within the content of a short string?)

@felixguendling
Owner

felixguendling commented Jun 18, 2023

Correct. Currently, cista::string and cista::string_view both behave like std::string_view in that they do not store the terminating \0 the way C-style strings do. The reason is that this terminating \0 is usually not something you want serialized into a compact binary buffer. A terminating \0 is not necessary if you know the exact length (which is the case in cista::string). The only reason you might want the terminating \0 is compatibility with library code written in C. In all other cases, you do not want the overhead of storing/transmitting obviously redundant information (size + \0 terminator).

It is, however, not that hard to trick cista::string into storing your extra \0. One way would be to call the constructor that takes a char const* and a length. There, you can set the length to the length of the string including the terminating \0. You might want to create a wrapper around cista::string that uses this trick in a few more places. But I don't think it's necessary to create a completely new type for this purpose.

basic_string(char const* s, typename base::msize_t const len)
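A minimal sketch of the trick described above, using std::string as a stand-in because its (pointer, length) constructor likewise stores exactly the number of bytes it is given. The names `counted_string`, `with_terminator` and `logical_size` are hypothetical and not part of cista; whether cista::string behaves identically in every code path is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Stand-in for a length-counted string type. std::string is used only for
// illustration; it is NOT cista::string.
using counted_string = std::string;

// Hypothetical helper: pass a length that includes the terminating '\0',
// so the terminator becomes part of the stored bytes.
inline counted_string with_terminator(char const* s) {
  assert(s != nullptr);
  return counted_string(s, std::strlen(s) + 1);
}

// Logical length without the stored terminator.
inline std::size_t logical_size(counted_string const& s) {
  return s.empty() ? 0 : s.size() - 1;
}
```

The cost is that size() now counts the terminator, so a wrapper would probably need to subtract one wherever the logical length is exposed.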

@ChemistAion

ChemistAion commented Jun 22, 2023

Consider the idea of an automatic "null terminator with size" for the small-string optimization (by Andrei Alexandrescu):
https://youtu.be/kPR8h4-qZdk?t=410

With a bit of "mixing", the same idea could also be embedded into non-small strings (adding an extra 4/8 bytes at the end for the size/null terminator, as above).

This would make it possible to use cista::string.data()/.begin() directly for const char* inputs; right now we have to go through a conversion to std::string_view/std::string.c_str().
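For readers unfamiliar with the technique, here is a rough sketch of how I understand it: the last byte of the inline buffer stores the remaining capacity rather than the size, so when the buffer is completely full that byte is 0 and doubles as the terminator. The class `inline_string` and its 15-character capacity are hypothetical choices for illustration, not cista's actual layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Illustrative fixed-capacity inline string (hypothetical, not cista's layout).
// The last byte holds (kCapacity - size). For a full string that value is 0,
// so the same byte also serves as the '\0' terminator.
class inline_string {
public:
  static constexpr std::size_t kCapacity = 15;

  explicit inline_string(char const* s) {
    std::size_t const len = std::strlen(s);
    assert(len <= kCapacity);  // sketch only: longer inputs are not supported
    std::memcpy(buf_, s, len);
    buf_[len] = '\0';  // terminator directly after the content
    // Remaining capacity; overwrites the terminator with 0 when len == kCapacity.
    buf_[kCapacity] = static_cast<char>(kCapacity - len);
  }

  std::size_t size() const {
    return kCapacity - static_cast<std::size_t>(buf_[kCapacity]);
  }

  char const* c_str() const { return buf_; }

private:
  char buf_[kCapacity + 1] = {};
};
```

With this layout, 16 bytes hold up to 15 characters plus a terminator, with no separate size field.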

@felixguendling
Owner

felixguendling commented Jun 29, 2023

That technique makes sense. Currently, the cista::string does not have a capacity (only size). The idea is that for serialization, the capacity and size would always be the same, so there's no point in having an extra field. If you use the data structure as a replacement for std::string that's a different story.

Overall, I think it makes sense to write a new generic class that can work as both a vector and a string with the "small-vector" or "small-string" optimization. Making this generic has the advantage that cista::vector would not need to allocate memory when the data fits into its fields. Another advantage would be the ability to change CharT and have a cista::wstring. Doing Andrei's optimization would also be nice.

However, currently I am busy with another project. So don't expect this to happen very soon (this also applies to the other issues you opened which would probably also benefit from this change).

@ChemistAion

Thank you for your analysis, I appreciate your insights.
In the meantime, I will try to propose a not-so-generic solution in the next few days, focusing specifically on non-heap strings.

khng300 added a commit to khng300/cista that referenced this issue Oct 29, 2023
This new type is able to store a trailing \0 character, without
compromising one byte for storage when the string is a small-string.

Storage of NUL character within data is also supported.

This is inspired by
felixguendling#187 (comment).

See felixguendling#187.
felixguendling pushed a commit that referenced this issue Nov 25, 2023
This new type is able to store a trailing \0 character, without
compromising one byte for storage when the string is a small-string.

Storage of NUL character within data is also supported.

This is inspired by
#187 (comment).

See #187.
@khng300 khng300 closed this as completed Nov 26, 2023
@ChemistAion

@khng300 Wow, superb work! I will be conducting comprehensive tests on my end throughout the weekend.
