v4.5.0 #1285
AngledLuffa announced in Announcements
Replies: 2 comments
-
Hi, congrats on the release! Is there a plan for publication to Maven Central?
-
Hopefully in the next few days! We were just making sure there are no horrible bugs.
-
CoreNLP 4.5.0
The main features are improved lemmatization of English, improved tokenization of both English and non-English flex-based languages, and some updates to tregex, tsurgeon, and semgrex.
All PTB and German tokens are now normalized in PTBLexer (previously only German umlauts were normalized). This makes the tokenizer 2% slower, but should avoid issues with tokens such as resume': d46fecd
log4j removed entirely from public CoreNLP (the internal "research" branch still has a use for it): f05cb54
Fix NumberFormatException showing up in NER models: java.lang.NumberFormatException: Bad number put into wordToNumber #547 5ee2c39
Fix "seconds" in the lemmatizer: e7a073b
Fix double escaping of & in the online demos: 8413fa1
Report the cause of an error if "tregex" is asked for but no parse annotator is added: 4db80c0
Merge ssplit and cleanxml into the tokenize annotator (done in a backwards compatible manner): Cleanxml #1259
Custom tregex pattern, ROOT tregex pattern, and tsurgeon operation for simultaneously moving a subtree and pruning anything left behind, used for processing the Italian VIT treebank in stanza: Add a moveprune operation which prunes an empty node if needed after … #1263
Refactor tokenization of punctuation, filenames, and other entities common to all languages, not just English: 3c40ba3 58a2288 8b97d64
Improved tokenization of number patterns, names with apostrophes such as Sh'reyan, non-American phone numbers, and invisible commas: 9476a8e 6193934 afb1ea8 7c84960
Significant lemmatizer improvements: adjectives & adverbs, along with various other special cases: Ud feats #1266
Include graph & semgrex indices in the results for a semgrex query (will make the results more usable) 45b47e2
Trim words in the NER training process. Spaces can still appear inside a word, but stray leading or trailing whitespace won't ruin the performance of the models: 0d9e9c8
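
The whitespace trimming described in the NER training entry can be pictured with a small sketch. This is not CoreNLP's code: the class `TrimNerRow`, the method `parseRow`, and the tab-separated word/label row layout are all assumptions made for illustration.

```java
import java.util.Arrays;

public class TrimNerRow {
    // Hypothetical helper: split a "word<TAB>label" training row and
    // trim leading/trailing whitespace from each column.
    public static String[] parseRow(String row) {
        String[] columns = row.split("\t", -1);
        for (int i = 0; i < columns.length; i++) {
            columns[i] = columns[i].trim();
        }
        return columns;
    }

    public static void main(String[] args) {
        // The space inside "New York" is kept; the padding around it is dropped.
        String[] parsed = parseRow(" New York \tLOCATION");
        System.out.println(Arrays.toString(parsed));
    }
}
```

Trimming only the edges of each column keeps multi-word tokens intact while making " New York " and "New York" the same training example.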
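
As background for the PTBLexer normalization entry above: the same accented word can be encoded with a precomposed character or with a base letter followed by a combining accent, and the two strings compare unequal until they are normalized. The sketch below uses the JDK's `java.text.Normalizer` with the NFC form purely as an illustration. Whether PTBLexer's normalization corresponds to NFC is an assumption here, and the class name `NormalizeTokenSketch` is hypothetical.

```java
import java.text.Normalizer;

public class NormalizeTokenSketch {
    // Illustration only: compose base letters with their combining marks (Unicode NFC).
    public static String normalize(String token) {
        return Normalizer.normalize(token, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String precomposed = "resum\u00E9";  // accented e stored as one code point
        String decomposed = "resume\u0301";  // plain e followed by a combining acute accent

        // Without normalization the two spellings are different strings.
        System.out.println(precomposed.equals(decomposed));

        // After normalization they compare equal.
        System.out.println(normalize(precomposed).equals(normalize(decomposed)));
    }
}
```

Running `main` prints `false` and then `true`. Downstream models see one consistent form of the token only after a normalization step like this.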
This discussion was created from the release v4.5.0.