Add code tokenizer #3647

Merged
merged 4 commits into from
Jul 17, 2023

Conversation

fmassot (Collaborator) commented Jul 17, 2023

Fix #3628

fmassot force-pushed the fmassot/code-tokenizer branch from 2a549aa to b15c91f on July 17, 2023 01:47
fmassot requested a review from fulmicoton on July 17, 2023 01:51
fmassot force-pushed the fmassot/code-tokenizer branch from b15c91f to 16691c1 on July 17, 2023 01:57
@@ -78,6 +79,220 @@ fn create_quickwit_fastfield_normalizer_manager() -> TokenizerManager {
    tokenizer_manager
}

/// TODO: add docs.
#[derive(Clone, Default)]
pub struct CodeTokenizer(Token);
Collaborator

Suggested change
- pub struct CodeTokenizer(Token);
+ pub struct CodeTokenizer {
+     token: Token,
+ }
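
To make the suggestion concrete, here is a minimal sketch of the two struct shapes side by side. The `Token` type below is a hypothetical stand-in (the real one comes from the tokenizer library), and `CodeTokenizerTuple` and the `current` accessors are names invented only for this illustration; none of them are taken from the PR.

```rust
// Hypothetical stand-in for the library's `Token` type, for illustration only.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct Token {
    pub offset_from: usize,
    pub offset_to: usize,
    pub position: usize,
    pub text: String,
}

// Tuple-struct form, as in the diff: the field is only reachable as `.0`.
#[derive(Clone, Default)]
pub struct CodeTokenizerTuple(Token);

// Named-field form, as in the suggestion: the field name documents its role.
#[derive(Clone, Default)]
pub struct CodeTokenizer {
    token: Token,
}

impl CodeTokenizerTuple {
    // Hypothetical accessor: positional access says nothing about what `.0` holds.
    pub fn current(&self) -> &Token {
        &self.0
    }
}

impl CodeTokenizer {
    // Hypothetical accessor: same behavior, but the field reads as `self.token`.
    pub fn current(&self) -> &Token {
        &self.token
    }
}
```

Both forms have the same memory layout and derive `Clone` and `Default` in the same way, so the change affects readability only.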

fmassot enabled auto-merge (squash) on July 17, 2023 12:44
fmassot disabled auto-merge on July 17, 2023 12:51
fmassot merged commit 9586c5b into main on Jul 17, 2023
fmassot deleted the fmassot/code-tokenizer branch on July 17, 2023 12:51
Development

Successfully merging this pull request may close these issues.

Add Code friendly tokenizer.
2 participants