Offer search term spelling corrections #731
Comments
Here is a quick proof-of-concept in Python, showing that Xapian's built-in functionality would cover some common misspellings made by people learning German, either as their first or as a second language. https://github.com/gremid/xapian-spelling-suggestions/ Two changes to libzim's index code would be necessary:
Do we want spelling suggestions for both full-text and suggestion (title) searches?
Title suggestions would be sufficient from our perspective, that is, the DWDS perspective. As a dictionary, headword/title search is our main use case. Also, the app is already sizable compared to the average, so saving some space by indexing only titles would also be in our immediate interest.
@mgautierfr Only suggestions. I don't think we should make this optional (because the additional index data is not that big), but we need a way to keep libzim backward compatible.
Xapian provides two methods to add and retrieve spelling suggestions:
Proposition: At the libzim level, there is really little to do:
While this technically adds the spelling suggestion feature to libzim, the majority of the work has to be done in dependent projects:
Note that suggestions are totally independent of the language (no stop words or stemming are used by Xapian). It is up to the caller code (zim-tools, libkiwix, kiwix-tools) to properly remove stop words, ask for suggestions, and use them when appropriate.
Testing:
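To make the caller-side responsibility from the note above concrete, here is a minimal Python sketch of how a dependent project might skip stop words and ask for a suggestion per remaining term. Everything in it is an assumption for illustration: `correct_query`, `STOP_WORDS`, and `toy_suggest` are hypothetical names, not libzim or libkiwix API. The `suggest` callable stands in for whatever lookup libzim would end up exposing and is assumed to return an empty string when it has no suggestion.

```python
from typing import Callable, Iterable, Optional

# Hypothetical stop-word list; a real caller would load one per language.
STOP_WORDS = frozenset({"der", "die", "das", "und"})


def correct_query(
    query: str,
    suggest: Callable[[str], str],
    stop_words: Iterable[str] = STOP_WORDS,
) -> Optional[str]:
    """Return a corrected query string, or None if no term was changed.

    `suggest` is a stand-in for the backend lookup: it takes one lowercased
    term and returns a suggested spelling, or "" if it has none.
    """
    stop = set(stop_words)
    corrected = []
    changed = False
    for term in query.split():
        lowered = term.lower()
        # Stop words are kept in the output but never sent to the lookup.
        if lowered in stop:
            corrected.append(term)
            continue
        suggestion = suggest(lowered)
        if suggestion and suggestion != lowered:
            corrected.append(suggestion)
            changed = True
        else:
            corrected.append(term)
    return " ".join(corrected) if changed else None


def toy_suggest(term: str) -> str:
    """Toy stand-in for the real suggestion lookup (assumption, not real data)."""
    return {"fahrad": "fahrrad"}.get(term, "")


print(correct_query("das fahrad", toy_suggest))
```

Returning `None` when nothing changed lets the caller decide whether to show a "did you mean" hint at all; whether that is the right interface for zim-tools, libkiwix or kiwix-tools is an open design question.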
This is a common feature of many free-text search engines, and it can be helpful.
Xapian provides a core feature for that:
https://docs.huihoo.com/xapian/docs/spelling.html
Original ticket on Sourceforge https://sourceforge.net/p/kiwix/bugs/849/