Offer search term spelling corrections #731

Open
kelson42 opened this issue Oct 23, 2022 · 5 comments

@kelson42
Contributor

This is a common feature of many free-text search engines, and it can be helpful.

Xapian provides a core feature for that
https://docs.huihoo.com/xapian/docs/spelling.html

Original ticket on Sourceforge https://sourceforge.net/p/kiwix/bugs/849/

I downloaded the Kiwix application for Android and installed it. I
downloaded the Spanish Wiktionary, unzipped it and copied it to the
external memory of the smartphone. I can read the file from the
application and all is well.

But when I misspell a word, Wiktionary does not correct
me. Example: in Spanish the word is written ZAPATO. If I write SAPATO, the
application tells me: Error: "failed load SAPATO article", but it does
not correct me; it should show me the options. Does this mean I have to be an
expert in the language to find anything? That does not help me, because the
objective is to be corrected when I'm wrong.

If I do the same on the computer, it shows me these options:
1 - Zapato
2 - Calzado
3 - Pasta de zapatos
4 - Is it possible to improve the application for Android?
5 - Am I doing something wrong?
@gremid

gremid commented Sep 2, 2024

Here is a quick proof of concept in Python, showing that Xapian's built-in functionality would cover some common misspellings made by people learning German, either as their first or as a second language.

https://github.com/gremid/xapian-spelling-suggestions/

Two changes to libzim's index code would be necessary:

  1. During indexing, the title of each ZIM entry has to be added to a spelling dictionary, which is later used for lookups.
  2. During retrieval, if there are no results for a given (exact) query, the spelling dictionary would be queried for suggestions.
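
To make the two steps concrete, here is a minimal sketch in plain Python. It is not libzim or Xapian code: `difflib.get_close_matches` from the standard library stands in for Xapian's spelling dictionary, and the function names (`build_spelling_dictionary`, `suggest`, `search_titles`) as well as the `cutoff` value are hypothetical, chosen only for illustration.

```python
from difflib import get_close_matches


def build_spelling_dictionary(titles):
    """Index time: collect every word that appears in an entry title."""
    words = set()
    for title in titles:
        words.update(title.lower().split())
    return words


def suggest(word, dictionary, cutoff=0.7):
    """Return the closest dictionary word, or "" when nothing is close enough."""
    # sorted() only makes the candidate order deterministic.
    matches = get_close_matches(word.lower(), sorted(dictionary), n=1, cutoff=cutoff)
    return matches[0] if matches else ""


def search_titles(query, titles, dictionary):
    """Retrieval time: exact match first, spelling lookup only on no results."""
    hits = [t for t in titles if t.lower() == query.lower()]
    if hits:
        return hits, ""
    return [], suggest(query, dictionary)


# Toy data taken from the original ticket.
titles = ["Zapato", "Calzado", "Pasta de zapatos"]
dictionary = build_spelling_dictionary(titles)
print(search_titles("Zapato", titles, dictionary))
print(search_titles("SAPATO", titles, dictionary))
```

With the toy titles from the original ticket, the exact query returns its hit and an empty suggestion, while the misspelled query returns no hits plus a suggested word to retry with.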

@mgautierfr
Collaborator

Do we want spelling suggestions for both full-text and suggestion (title) searches?

@gremid

gremid commented Jan 14, 2025

Title suggestions would be sufficient from our, that is the DWDS, perspective. As a dictionary, headword/title search is the main use case. Also, the app is already sizable compared to the average, so saving some space by only indexing titles would also be in our immediate interest.

@kelson42
Contributor Author

@mgautierfr Only suggestions. I don't think we should make this optional (because the additional index data is not that big), but we need a way to keep libzim backward compatible.

@mgautierfr
Collaborator

Xapian provides two methods to add and retrieve spelling suggestions:

  • WritableDatabase::add_spelling: at database creation, adds a word to be considered as a spelling suggestion.
  • Database::get_spelling_suggestion: at runtime, gives one (and only one) suggestion for a given word.

Proposition:

At the libzim level, there is really little to do:

  • Add a metadata entry to the db to tell that spelling suggestion is available.
  • When adding an item (suggestion title) to the database, add the words of the item (minus stop words) to the db's spelling data.
  • Add a method SuggestionSearcher::has_spelling_suggestion to tell whether spelling suggestion is available.
  • Add a method SuggestionSearcher::get_spelling_suggestion to get the spelling suggestion for a word. This method would simply forward the call to Xapian's Database::get_spelling_suggestion.
    If spelling is not available (old db), get_spelling_suggestion will return an empty string.
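
The shape of this proposal can be sketched in Python as follows. This is only an illustration of the proposed interface, not libzim code: the `SuggestionDb` class, the `"spelling"` metadata key and the `stop_words` parameter are hypothetical, and `difflib.get_close_matches` stands in for the call that would be forwarded to Xapian.

```python
from difflib import get_close_matches


class SuggestionDb:
    """Hypothetical stand-in for the title (suggestion) index."""

    def __init__(self, with_spelling):
        # An "old" database simply lacks the metadata entry.
        self.metadata = {"spelling": "1"} if with_spelling else {}
        self.spelling_words = set()

    def add_title(self, title, stop_words=frozenset()):
        """Add the words of a title (minus stop words) to the spelling data."""
        if "spelling" not in self.metadata:
            return
        for word in title.lower().split():
            if word not in stop_words:
                self.spelling_words.add(word)


class SuggestionSearcher:
    def __init__(self, db):
        self.db = db

    def has_spelling_suggestion(self):
        return "spelling" in self.db.metadata

    def get_spelling_suggestion(self, word):
        # Old databases degrade gracefully: no suggestion, no error.
        if not self.has_spelling_suggestion():
            return ""
        candidates = sorted(self.db.spelling_words)
        matches = get_close_matches(word.lower(), candidates, n=1, cutoff=0.7)
        return matches[0] if matches else ""


new_db = SuggestionDb(with_spelling=True)
old_db = SuggestionDb(with_spelling=False)
for db in (new_db, old_db):
    db.add_title("Pasta de zapatos", stop_words={"de"})
    db.add_title("Zapato")
```

Returning an empty string for an old database means callers can use the same code path for both kinds of ZIM file and only need `has_spelling_suggestion` when they want to adapt the UI.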

While this technically adds the spelling suggestion feature to libzim, the majority of the work has to be done in dependent projects:

  • Check for spelling suggestion for each word of the query
  • Ask the user whether they want to use the correction, or correct automatically?
  • Handle multi lang (which stop words to use ?)
  • Rerun suggestion search with corrected query
  • Improve UX

Note that spelling suggestion is totally independent of the language (Xapian uses no stop words or stemming for it). It is up to the calling code (zim-tools, libkiwix, kiwix-tools) to properly remove stop words, ask for suggestions and use them when appropriate.
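
A rough sketch of what that caller-side logic could look like, in Python for brevity. Everything here is hypothetical: `search` and `get_suggestion` are callables supplied by the caller (in a real client they would wrap the suggestion search and the proposed get_spelling_suggestion), and the toy data exists only to make the example runnable.

```python
def correct_query(query, get_suggestion, stop_words=frozenset()):
    """Replace each non-stop word by its suggestion, when there is one."""
    corrected = []
    for word in query.split():
        if word.lower() in stop_words:
            corrected.append(word)  # stop words are never spell-checked
            continue
        # An empty string means "no suggestion": keep the original word.
        corrected.append(get_suggestion(word) or word)
    return " ".join(corrected)


def search_with_correction(query, search, get_suggestion, stop_words=frozenset()):
    """Run the search; on no results, retry once with the corrected query."""
    results = search(query)
    if results:
        return query, results
    corrected = correct_query(query, get_suggestion, stop_words)
    if corrected == query:
        return query, []
    return corrected, search(corrected)


# Toy stand-ins for the real search and spelling backends.
TITLES = {"zapato": ["Zapato"], "pasta de zapatos": ["Pasta de zapatos"]}
SUGGESTIONS = {"sapato": "zapato", "sapatos": "zapatos"}


def toy_search(query):
    return TITLES.get(query.lower(), [])


def toy_suggestion(word):
    return SUGGESTIONS.get(word.lower(), "")


print(search_with_correction("pasta de sapatos", toy_search, toy_suggestion, {"de"}))
```

Returning the query that was actually used alongside the results leaves the UX decision open: the client can either apply the correction silently or show it to the user as a "did you mean" prompt.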

Testing:

  • Create a new testing zim file with spelling suggestion
  • Test that spelling suggestion is available or not depending on the zim file "version"
  • Test spelling suggestion when applicable

@kelson42 kelson42 modified the milestones: 9.3.0, 10.0.0 Feb 14, 2025