mlprimitives.custom.text.TextCleaner fails if text is empty #228

Closed
csala opened this issue Jan 16, 2020 · 0 comments · Fixed by #229

csala commented Jan 16, 2020

When the collection of texts to clean contains an empty string "", mlprimitives.custom.text.TextCleaner._remove_stopwords raises a LangDetectException.

In [1]: from mlprimitives.custom.text import TextCleaner

In [2]: cleaner = TextCleaner()

In [3]: cleaner.produce(['not empty', ''])
---------------------------------------------------------------------------
LangDetectException                       Traceback (most recent call last)
<ipython-input-3-342ec016e729> in <module>
----> 1 cleaner.produce(['not empty', ''])
...
~/.virtualenvs/MLPrimitives/lib/python3.6/site-packages/langdetect/detector.py in _detect_block(self)
    148         ngrams = self._extract_ngrams()
    149         if not ngrams:
--> 150             raise LangDetectException(ErrorCode.CantDetectError, 'No features in text.')
    151 
    152         self.langprob = [0.0] * len(self.langlist)

LangDetectException: No features in text.
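
One way to avoid this kind of failure is to skip language detection for texts that have nothing to detect. The following is only a minimal sketch of that idea, not necessarily the fix applied in #229. The helper `detect_or_none` and the stand-in `fake_detect` are hypothetical names introduced for illustration; in real use the `detect` argument would be a detector such as `langdetect.detect`.

```python
from typing import Callable, Optional


def detect_or_none(text: str, detect: Callable[[str], str]) -> Optional[str]:
    """Return the detected language code, or None for blank text.

    Hypothetical helper, not part of MLPrimitives. `detect` is any
    callable that maps a text to a language code.
    """
    # Empty or whitespace-only strings contain no n-grams to analyze,
    # so the detector is never called for them.
    if not text or not text.strip():
        return None

    return detect(text)


def fake_detect(text: str) -> str:
    """Stand-in detector that mimics the failure on blank input (assumption)."""
    if not text.strip():
        raise ValueError('No features in text.')

    return 'en'


texts = ['not empty', '']
languages = [detect_or_none(text, fake_detect) for text in texts]
print(languages)  # ['en', None]
```

A caller could then leave a text untouched whenever the result is `None`. Texts made up only of digits or punctuation may also produce no features, so a more defensive version would additionally catch the detector's `LangDetectException`.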

csala self-assigned this Jan 16, 2020
csala added the bug label (There is an error in the code that needs to be fixed) Jan 16, 2020
csala added this to the 0.2.4 milestone Jan 16, 2020