all: add dpi parameter as manual override to image metadata #108

Merged: kba merged 1 commit into OCR-D:master on Jan 24, 2020

bertsky (Collaborator) commented Jan 23, 2020

Fixes #102.

This strongly influences segmentation and OSD, but I am not so sure about (LSTM) recognition. Maybe we should document this somewhere: it is one of very few parameters we have for influencing the quality of Tesseract's layout analysis (i.e. deliberately setting a fake value can improve results).

Note that Tesseract is optimised for modern fonts (which are typically smaller than historic ones) and is thus biased against historic prints. So setting a DPI higher than the actual one could be a useful recommendation for block (and maybe even line) segmentation. But this is still conjecture, and I have only little evidence supporting it; someone would have to make systematic measurements first.
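A minimal sketch of the two ideas above, in Python. `resolve_dpi` and `apparent_point_size` are hypothetical helpers, not functions from ocrd_tesserocr or tesserocr. The sketch assumes that a `dpi` parameter of zero means "no override", and it stops short of actually handing the value to Tesseract (for instance via Tesseract's `user_defined_dpi` variable). The point-size arithmetic is a simplification meant only to show why declaring a higher-than-actual DPI makes large historic type look like smaller modern type.

```python
from typing import Optional


def resolve_dpi(param_dpi: float, metadata_dpi: Optional[float]) -> Optional[float]:
    """Hypothetical helper: choose the pixel density to pass on to Tesseract.

    Assumption: a positive param_dpi is a manual override and always wins;
    zero disables the override, so we fall back to the image metadata,
    which may be missing (None) or unusable (<= 0).
    """
    if param_dpi > 0:
        return param_dpi
    if metadata_dpi is not None and metadata_dpi > 0:
        return metadata_dpi
    return None  # unknown: leave it to Tesseract's own estimate


def apparent_point_size(glyph_height_px: float, dpi: float) -> float:
    """Nominal font size in points implied by a glyph height in pixels.

    1 pt = 1/72 inch, so height_in_inches = px / dpi and pt = px * 72 / dpi.
    """
    return glyph_height_px * 72.0 / dpi


# Glyphs 50 px tall in a 300 DPI scan correspond to 12 pt type;
# overriding the DPI with 600 makes the same glyphs look like 6 pt type.
print(apparent_point_size(50, resolve_dpi(0, 300)))    # 12.0
print(apparent_point_size(50, resolve_dpi(600, 300)))  # 6.0
```

Doubling the declared DPI halves the apparent font size, which is the direction the conjecture above relies on for historic prints.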

bertsky added the enhancement (New feature or request) label Jan 23, 2020
bertsky requested review from kba and wrznr January 23, 2020 23:06
codecov bot commented Jan 23, 2020

Codecov Report

Merging #108 into master will decrease coverage by 1.57%.
The diff coverage is 7.14%.

@@            Coverage Diff             @@
##           master     #108      +/-   ##
==========================================
- Coverage   39.14%   37.57%   -1.58%     
==========================================
  Files           9        9              
  Lines         894      942      +48     
  Branches      190      204      +14     
==========================================
+ Hits          350      354       +4     
- Misses        492      528      +36     
- Partials       52       60       +8

Impacted Files                     Coverage Δ
ocrd_tesserocr/segment_table.py    0% <0%> (ø) ⬆️
ocrd_tesserocr/crop.py             12.93% <0%> (-0.84%) ⬇️
ocrd_tesserocr/deskew.py           16.19% <0%> (-1.16%) ⬇️
ocrd_tesserocr/segment_word.py     72.88% <12.5%> (-7.89%) ⬇️
ocrd_tesserocr/segment_line.py     70.76% <12.5%> (-6.82%) ⬇️
ocrd_tesserocr/recognize.py        50.72% <12.5%> (-0.96%) ⬇️
ocrd_tesserocr/segment_region.py   57.03% <12.5%> (-2.48%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8ccde94...3a684a5.

bertsky (Collaborator, Author) commented Jan 24, 2020

@kba when you merge, please don't forget to update CHANGELOG.md this time.

As for versioning, I think #108, #109 and #110 all add new features, so I suggest 0.8.0.

kba (Member) commented Jan 24, 2020

> please don't forget to update CHANGELOG.md this time

I'll try to, but I'm grateful for contributions. You have the best overview to outline your PR in broad strokes. Updated for 0.7.0 in master.

kba merged commit 7d7315d into OCR-D:master Jan 24, 2020
bertsky (Collaborator, Author) commented Jan 24, 2020

> I'll try to, but I'm grateful for contributions. You have the best overview to outline your PR in broad strokes.

OK, first, for 0.7.0, I believe these need to move from "Added" to "Changed":

Then, for 0.8.0, how about this:

bertsky deleted the dpi-overrides branch February 21, 2020 16:52
Labels: enhancement (New feature or request)
Linked issue: Additional parameter for DPI override (#102)
3 participants