Commit fb359fc

ivanzz1001 authored and zdenop committed
Update unicharset_extractor.cpp (#1153)
* change IsWhitespace to IsUTF8Whitespace

  To solve "Phase UP: Generating unicharset and unichar properties files" ERROR #1147
  please reference: #1147

* Update unicharset_extractor.cpp

  fix the "Phase UP: Generating unicharset and unichar properties files" ERROR

* Update unicharset_extractor.cpp

  fix "Phase UP: Generating unicharset and unichar properties files" ERROR #1147

* Update unicharset_extractor.cpp

  fix the encoding invalid problem and fix the comment

1 parent 1b0379c commit fb359fc

1 file changed, +3 -1 lines changed

training/unicharset_extractor.cpp

@@ -50,7 +50,9 @@ static void AddStringsToUnicharset(const GenericVector<STRING>& strings,
                                    /*report_errors*/ true,
                                    strings[i].string(), &normalized)) {
       for (const string& normed : normalized) {
-        if (normed.empty() || IsWhitespace(normed[0])) continue;
+
+        // normed is a UTF-8 encoded string
+        if (normed.empty() || IsUTF8Whitespace(normed.c_str())) continue;
         unicharset->unichar_insert(normed.c_str());
       }
     } else {
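
The change above replaces a check on `normed[0]`, a single byte, with a check on the UTF-8 string itself. The standalone sketch below illustrates why the byte-level check is fragile. It is not Tesseract code: `WidenSignedByte`, `IsValidCodePoint`, `DecodeFirstCodePoint`, `IsSpaceCodePoint`, and `StartsWithUtf8Whitespace` are hypothetical helpers, the whitespace list is a small illustrative subset, and the sketch only inspects the first code point, which may differ from what the real `IsUTF8Whitespace` does. The assumption is that where `char` is signed, the lead byte of a multi-byte sequence widens to a value outside the Unicode range, which would match the "encoding invalid" wording in the commit message.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: what a signed first byte becomes when it is widened
// to a 32-bit code point type. The lead byte 0xE3 is -29 as a signed char.
static char32_t WidenSignedByte(signed char byte) {
  return static_cast<char32_t>(byte);
}

// Hypothetical helper: true for Unicode scalar values (no surrogates).
static bool IsValidCodePoint(char32_t cp) {
  return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

// Hypothetical helper: decode the first code point of a UTF-8 string.
// Returns U+FFFD for empty or malformed input. Overlong forms are not
// rejected; this is a sketch, not a full validator.
static char32_t DecodeFirstCodePoint(const std::string& s) {
  if (s.empty()) return 0xFFFD;
  const unsigned char b0 = static_cast<unsigned char>(s[0]);
  std::size_t len = 0;
  char32_t cp = 0;
  if (b0 < 0x80) {
    len = 1;
    cp = b0;
  } else if ((b0 & 0xE0) == 0xC0) {
    len = 2;
    cp = b0 & 0x1F;
  } else if ((b0 & 0xF0) == 0xE0) {
    len = 3;
    cp = b0 & 0x0F;
  } else if ((b0 & 0xF8) == 0xF0) {
    len = 4;
    cp = b0 & 0x07;
  } else {
    return 0xFFFD;  // continuation byte or invalid lead byte
  }
  if (s.size() < len) return 0xFFFD;  // truncated sequence
  for (std::size_t i = 1; i < len; ++i) {
    const unsigned char b = static_cast<unsigned char>(s[i]);
    if ((b & 0xC0) != 0x80) return 0xFFFD;
    cp = (cp << 6) | (b & 0x3F);
  }
  return cp;
}

// Illustrative subset of Unicode whitespace, not the full property.
static bool IsSpaceCodePoint(char32_t cp) {
  return cp == 0x20 || (cp >= 0x09 && cp <= 0x0D) || cp == 0xA0 ||
         (cp >= 0x2000 && cp <= 0x200A) || cp == 0x3000;
}

// Whitespace test that works on the decoded code point, not the raw byte.
static bool StartsWithUtf8Whitespace(const std::string& s) {
  return IsSpaceCodePoint(DecodeFirstCodePoint(s));
}
```

For example, the ideographic space U+3000 is encoded as the bytes `E3 80 80`. Widening the signed lead byte yields a value above 0x10FFFF, while decoding the full sequence recovers U+3000, which the whitespace test accepts.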
