Added JPEG quality option parameter (-c jpg_quality=n) #1265
If my memory serves me well, @jbreiden (author of the PDF renderer code) didn't like this feature.
https://web.archive.org/web/20150413012101/https://code.google.com/p/tesseract-ocr/issues/detail?id=1300
Let's talk about this for a bit. Tesseract's PDF module tries really hard to inline images instead of transcoding them, which means that the JPEG quality parameter should rarely be needed. @tleegwater can you tell us what sort of image files you are feeding to Tesseract? Some sort of TIFF? Maybe attach one if possible? I mainly want to make sure we don't have an accidental transcode situation.
I'm feeding TIFF files to Tesseract. And I'm aware that as soon as I feed it JPEG, or another supported format, Tesseract will try to inline the input file.
That's fine, and I'm fine with this changelist. Can you please tell me what flavor of TIFF you are working with? Uncompressed? LZW? Pack? Deflate? JPEG? CCITT Group 4? You can find out with |
We use no compression at all for our TIFFs. We generate them ourselves, so there's no chance we'll get anything other than uncompressed.
Got it. Please be aware that the built-in JPEG encoder is standard libjpeg. If you ever want more precise control (for example, turning off chroma subsampling) or a fancier encoder like Guetzli, then JPEG-encode the images before feeding them to Tesseract. https://research.googleblog.com/2017/03/announcing-guetzli-new-open-source-jpeg.html
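For readers who want to check the TIFF flavor asked about above programmatically, here is a minimal sketch in Python. It is not the command referred to in the question; it simply reads the Compression tag (259) from the first image directory of a classic TIFF file. The function name `tiff_compression` and the name table are illustrative, and BigTIFF and multi-page files are deliberately not handled.

```python
import struct

# Common values of the TIFF Compression tag (259). Illustrative, not exhaustive.
COMPRESSION_NAMES = {
    1: "Uncompressed",
    4: "CCITT Group 4",
    5: "LZW",
    7: "JPEG",
    8: "Deflate",
    32773: "PackBits",
    32946: "Deflate",
}


def tiff_compression(data: bytes) -> str:
    """Return the compression name of the first IFD in a classic TIFF."""
    if data[:2] == b"II":
        endian = "<"  # little-endian byte order
    elif data[:2] == b"MM":
        endian = ">"  # big-endian byte order
    else:
        raise ValueError("not a TIFF file")

    magic, ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF is not handled here)")

    (entry_count,) = struct.unpack(endian + "H", data[ifd_offset:ifd_offset + 2])
    for i in range(entry_count):
        start = ifd_offset + 2 + 12 * i  # each IFD entry is 12 bytes
        tag, _field_type, _count = struct.unpack(endian + "HHI", data[start:start + 8])
        if tag == 259:
            # A single SHORT value is stored left-justified in the value field.
            (value,) = struct.unpack(endian + "H", data[start + 8:start + 10])
            return COMPRESSION_NAMES.get(value, f"unknown ({value})")

    # The TIFF default when the tag is absent is 1 (no compression).
    return "Uncompressed"
```

To use it on a file, pass the raw bytes, for example `tiff_compression(open("scan.tif", "rb").read())`, where `scan.tif` is a placeholder file name.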
Merge branch 'master' into jpg_quality_option

* master: (577 commits)
  fix issue tesseract-ocr#1889
  Add badges for download , licence and lgtm
  Replace macro MINGW by __MINGW32__
  EquationDetectBase: Define virtual destructor in .cpp file
  BlobGrid: Define virtual destructor in .cpp file
  GridBase: Define virtual destructor in .cpp file
  AlignedBlob: Define virtual destructor in .cpp file
  TransposedArray: Define virtual destructor in .cpp file
  IndexMapBiDi: Define virtual destructor in .cpp file
  Add missing include file (fixes linker error for Visual Studio)
  NthItemTest: Add definition for virtual destructor
  HeapTest: Add definition for virtual destructor
  IcuErrorCode: Define virtual destructor in .cpp file
  Validator: Define virtual destructor in .cpp file
  Dawg: Define virtual destructor in .cpp file
  CUtil: Define virtual destructor in .cpp file
  IndexMap: Define virtual destructor in .cpp file
  CCUtil: Define virtual destructor in .cpp file
  MATRIX: Define virtual destructor in .cpp file
  CCStruct: Define virtual destructor in .cpp file
  ...
I think the API break is unnecessary.
Thanks for pointing to this issue. What is better for maintaining compatibility of the C API:
It's not just a C API break, it's also a C++ API break.
As I said, I think the C++ code can be changed so that it does not break the previous API.
OK, I got it. At first I thought about extending the API, but that is not needed because the JPEG quality is handled by a Tesseract parameter.
Done. Please check.
I didn't test it, but the change LGTM.
* 'master' of https://github.com/tesseract-ocr/tesseract:
  Remove code for _MSC_VER < 1900
  keep API compatibility with #1265
  Update googletest submodule to release v1.8.1
  Update test submodule
  Always use isascii() with isspace()
  Avoid crash with --psm 0 and LSTM traineddata
  SVPaint: Remove empty block
  Classify: Don't hide debug parameter
  UNICHARMAP: Remove comparison which is always false
  svpaint: Change a variable from global to local
  pgedit: remove unused declaration of display_bln_lines
  Plumbing: Remove comparison which is always false
  Release candidate 2
  use pdf L_FLATE_ENCODE only for png input; fixes #1961
I needed to be able to specify the JPEG quality level in PDF output files, so I made this parameter optional. Default JPEG quality will still be 85.
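As a rough illustration of how the option from the title (`-c jpg_quality=n`) might be passed when producing a PDF, here is a small Python sketch that only builds the command line. The helper name `build_tesseract_pdf_command`, the file names, and the 1 to 100 range check are assumptions made for this example (the range follows the usual libjpeg quality scale); the default of 85 comes from the description above.

```python
def build_tesseract_pdf_command(image_path, output_base, jpg_quality=85):
    """Build an argument list that asks Tesseract for PDF output.

    jpg_quality is forwarded as a Tesseract parameter via -c.
    The 1..100 bounds are an assumption of this sketch, not taken
    from the pull request.
    """
    if not 1 <= jpg_quality <= 100:
        raise ValueError("jpg_quality is expected to be between 1 and 100")
    return [
        "tesseract",
        image_path,
        output_base,
        "-c",
        f"jpg_quality={jpg_quality}",
        "pdf",
    ]


# Example (requires a tesseract binary on PATH; file names are placeholders):
#   import subprocess
#   subprocess.run(build_tesseract_pdf_command("scan.tif", "scan", 60), check=True)
```

As discussed in the thread, the setting only matters when Tesseract has to transcode the input (for example, an uncompressed TIFF); a JPEG input is inlined as-is.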