all: add dpi parameter as manual override to image metadata #108
Codecov Report
@@ Coverage Diff @@
## master #108 +/- ##
==========================================
- Coverage 39.14% 37.57% -1.58%
==========================================
Files 9 9
Lines 894 942 +48
Branches 190 204 +14
==========================================
+ Hits 350 354 +4
- Misses 492 528 +36
- Partials 52 60 +8
@kba when you merge, please don't forget to update … As for versioning, I think #108, #109 and #110 are all new features, therefore I suggest …
I'll try to, but I'm grateful for contributions: you have the best overview and can outline your PRs in broad strokes. Updated for 0.7.0 in master.
OK, first, for 0.7.0, I believe these need to move from …
Then for 0.8.0, how about this:
Fixes #102.
This strongly influences segmentation and OSD, but I'm not so sure about (LSTM) recognition. Maybe we should document this somewhere: it's one of the very few parameters we have to influence the quality of Tesseract's layout analysis (i.e. deliberately setting a fake value can improve results).
Note that Tesseract is optimised for modern fonts (which are typically smaller than historic ones) and is thus biased against historic prints. So setting a higher-than-actual DPI could be a useful recommendation for block (and maybe even line) segmentation. But this is still conjecture, and I have little evidence supporting it; someone would have to make systematic measurements first.
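A minimal sketch of the "manual override" semantics, assuming a hypothetical convention where a `dpi` parameter value of 0 (or less) means "not set". The function name `resolve_dpi` and that convention are illustrative assumptions, not the PR's actual code:

```python
from typing import Optional

def resolve_dpi(dpi_param: float, metadata_dpi: Optional[float]) -> Optional[float]:
    """Pick the resolution to hand to Tesseract (hypothetical helper).

    A positive dpi_param overrides whatever the image metadata says.
    Otherwise fall back to the metadata value; if that is missing or 0,
    return None so that Tesseract is left to estimate the resolution itself.
    """
    if dpi_param > 0:
        return float(dpi_param)   # manual override wins
    if metadata_dpi:
        return float(metadata_dpi)  # trust the image metadata
    return None                   # no information: let Tesseract guess
```

For example, `resolve_dpi(300, 72)` gives `300.0`, `resolve_dpi(0, 72)` gives `72.0`, and `resolve_dpi(0, None)` gives `None`. How the chosen value is forwarded to Tesseract is not shown here; Tesseract's `user_defined_dpi` variable is one possible channel.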
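To illustrate the conjecture, assume (this is not taken from Tesseract's code) that layout heuristics reason in physical units derived from the declared resolution. Then the same glyph, measured in pixels, looks smaller in typographic points when a higher DPI is declared (1 pt = 1/72 inch). The helper `px_to_pt` is hypothetical:

```python
def px_to_pt(height_px: float, dpi: float) -> float:
    """Convert a glyph height in pixels to typographic points (1 pt = 1/72 in)."""
    return height_px * 72.0 / dpi

# A 50 px tall glyph scanned at 300 DPI corresponds to 12 pt ...
actual = px_to_pt(50, 300)
# ... but declaring 600 DPI makes the same pixels look like 6 pt,
# i.e. closer to the smaller modern font sizes Tesseract is tuned for.
faked = px_to_pt(50, 600)
```

Doubling the declared DPI halves the apparent point size, which is the direction the suggestion above relies on; whether this actually improves segmentation still needs measurement.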