Commit c3b18cf

Improve description of configs and parameters in tesseract(1)
Try to make the relationship between configs, -c and --print-parameters clearer by always using the term "parameter" instead of "variable". Include the filenames created by each config.
1 parent ec8f02c commit c3b18cf

1 file changed: +22 -21 lines changed

doc/tesseract.1.asc

+22-21
@@ -36,7 +36,7 @@ IN/OUT ARGUMENTS
 The basename of the output file (to which the appropriate extension
 will be appended). By default the output will be a text file
 with `.txt` added to the basename unless there are one or more
-'configfile' options which explicitly specify the desired output.
+parameters set which explicitly specify the desired output.
 
 'stdout'::
 Instruction to send output data to standard output.
@@ -54,7 +54,7 @@ OPTIONS
 Specify the location of user patterns file.
 
 '-c configvar=value'::
-Set value for control parameter. Multiple -c arguments are allowed.
+Set value for parameter 'configvar'. Multiple -c arguments are allowed.
 
 '-l lang'::
 The language to use. If none is specified, English is assumed.
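
Each `-c configvar=value` sets one parameter, and the option may be repeated. A minimal Python sketch of how such arguments could be collected into parameter settings follows. The helper `parse_c_args` is hypothetical, and splitting at the first `=` (so a value may itself contain `=`) is an assumption about the CLI, not something the man page states.

```python
from typing import Dict, List


def parse_c_args(argv: List[str]) -> Dict[str, str]:
    """Collect `-c name=value` pairs from a command line (hypothetical helper).

    Assumption: the parameter name ends at the first '=', so a value may
    itself contain '='. A later -c for the same name overrides an earlier one.
    """
    params: Dict[str, str] = {}
    i = 0
    while i < len(argv):
        if argv[i] == "-c" and i + 1 < len(argv):
            name, sep, value = argv[i + 1].partition("=")
            if not sep or not name:
                raise ValueError(f"expected -c name=value, got {argv[i + 1]!r}")
            params[name] = value
            i += 2  # consume "-c" and its argument
        else:
            i += 1  # positional arguments and other options are not handled here
    return params


if __name__ == "__main__":
    argv = ["image.png", "out", "-c", "tessedit_char_blacklist=xyz",
            "-c", "load_freq_dawg=F"]
    print(parse_c_args(argv))
    # {'tessedit_char_blacklist': 'xyz', 'load_freq_dawg': 'F'}
```

The names accepted here are the same parameter names that a config file lists and that `--print-parameters` reports.

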
@@ -86,20 +86,21 @@ OPTIONS
 3 = Default, based on what is available.
 
 'configfile'::
-The name of a config to use. A config is a plaintext file which
-contains a list of variables and their values, one per line, with a
-space separating variable from value. Interesting config files
-include: +
-* `alto` - Output in ALTO format (file extension `.xml`).
-* `hocr` - Output in hOCR format (file extension `.hocr`).
-* `pdf` - Output PDF (file extension `.pdf`).
-* `tsv` - Output TSV (file extension `.tsv`).
-* `txt` - Output plain text (file extension `.txt`).
-* `get.images` - Write images.
-* `logfile` - Write debug file `tesseract.log`.
-* `lstm.train` - Used for LSTM training.
-* `makebox` - Output box file.
-* `quiet` - Write debug file to /dev/null.
+The name of a config to use. A config is a plain text file which
+contains a list of parameters and their values, one per line,
+with a space separating parameter from value. +
+Interesting config files include:
+
+* `alto` - Output in ALTO format ('outputbase'`.xml`).
+* `hocr` - Output in hOCR format ('outputbase'`.hocr`).
+* `pdf` - Output PDF ('outputbase'`.pdf`).
+* `tsv` - Output TSV ('outputbase'`.tsv`).
+* `txt` - Output plain text ('outputbase'`.txt`).
+* `get.images` - Write processed input images to file (`tessinput.tif`).
+* `logfile` - Redirect debug messages to file (`tesseract.log`).
+* `lstm.train` - Output files used by LSTM training ('outputbase'`.lstmf`).
+* `makebox` - Write box file ('outputbase'`.box`).
+* `quiet` - Redirect debug messages to /dev/null.
 
 It is possible to select several config files, for example
 `tesseract image.png demo hocr pdf txt` will create three output files
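
The updated list ties each output config to a file named after 'outputbase'. The Python sketch below is a literal reading of the man page text, not Tesseract's actual renderer logic: `expected_outputs` and `OUTPUT_EXTENSIONS` are hypothetical names, configs whose output is not named after 'outputbase' (`get.images`, `logfile`, `quiet`) are ignored, and the `.txt` fallback follows the IN/OUT ARGUMENTS wording.

```python
from typing import List

# Extension appended to 'outputbase' for each config, per the updated list.
OUTPUT_EXTENSIONS = {
    "alto": ".xml",
    "hocr": ".hocr",
    "pdf": ".pdf",
    "tsv": ".tsv",
    "txt": ".txt",
    "lstm.train": ".lstmf",
    "makebox": ".box",
}


def expected_outputs(outputbase: str, configs: List[str]) -> List[str]:
    """List files named after outputbase for the given configs (hypothetical).

    Configs not in OUTPUT_EXTENSIONS (e.g. get.images, logfile, quiet)
    are ignored by this sketch.
    """
    files = [outputbase + OUTPUT_EXTENSIONS[c] for c in configs if c in OUTPUT_EXTENSIONS]
    files = list(dict.fromkeys(files))  # drop duplicates, keep first-seen order
    # Default from IN/OUT ARGUMENTS: plain text when no output is selected.
    return files or [outputbase + ".txt"]


if __name__ == "__main__":
    # Mirrors `tesseract image.png demo hocr pdf txt`.
    print(expected_outputs("demo", ["hocr", "pdf", "txt"]))
    # ['demo.hocr', 'demo.pdf', 'demo.txt']
```

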
@@ -334,14 +335,14 @@ Tesseract 4 LSTM OCR engine.
 CONFIG FILES AND AUGMENTING WITH USER DATA
 ------------------------------------------
 
-Tesseract config files consist of lines with variable-value pairs (space
-separated). The variables are documented as flags in the source code like
+Tesseract config files consist of lines with parameter-value pairs (space
+separated). The parameters are documented as flags in the source code like
 the following one in tesseractclass.h:
 
 STRING_VAR_H(tessedit_char_blacklist, "",
 "Blacklist of chars not to recognize");
 
-These variables may enable or disable various features of the engine, and
+These parameters may enable or disable various features of the engine, and
 may cause it to load (or not load) various data. For instance, let's suppose
 you want to OCR in English, but suppress the normal dictionary and load an
 alternative word list and an alternative list of patterns -- these two files
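
Config files are described as lines of space-separated parameter-value pairs. A minimal Python reader for that format might look like the sketch below. `parse_config` is a hypothetical name, and skipping blank lines and lines starting with `#` is an assumption made for the sketch rather than something this man page states.

```python
from typing import Dict


def parse_config(text: str) -> Dict[str, str]:
    """Parse 'parameter value' lines into a dict (hypothetical helper).

    Assumptions for this sketch: blank lines and lines starting with '#'
    are skipped, and the value is the rest of the line after the first
    run of whitespace.
    """
    params: Dict[str, str] = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # parameter name, then everything after the gap
        if len(parts) != 2:
            raise ValueError(f"line {lineno}: expected 'parameter value', got {raw!r}")
        name, value = parts
        params[name] = value
    return params


if __name__ == "__main__":
    sample = (
        "load_freq_dawg F\n"
        "user_words_suffix user-words\n"
        "user_patterns_suffix user-patterns\n"
    )
    print(parse_config(sample))
    # {'load_freq_dawg': 'F', 'user_words_suffix': 'user-words', 'user_patterns_suffix': 'user-patterns'}
```

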
@@ -371,8 +372,8 @@ load_freq_dawg F
 user_words_suffix user-words
 user_patterns_suffix user-patterns
 
-Now, if you pass the word 'bazaar' as a trailing command line parameter
-to Tesseract, Tesseract will not bother loading the system dictionary nor
+Now, if you pass the word 'bazaar' as a 'configfile' to Tesseract,
+Tesseract will not bother loading the system dictionary nor
 the dictionary of frequent words and will load and use the eng.user-words
 and eng.user-patterns files you provided. The former is a simple word list,
 one per line. The format of the latter is documented in dict/trie.h
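
In the 'bazaar' example, the two suffix parameters select the extra files, which carry the language code as a prefix (`eng.user-words`, `eng.user-patterns`). The Python sketch below spells out that naming and the "one word per line" list format. `user_data_files` and `format_word_list` are hypothetical helpers; the `<lang>.<suffix>` pattern is inferred from the example filenames, and the sketch does not model how Tesseract actually locates or loads these files.

```python
from typing import Dict, List

# Parameters visible in the 'bazaar' config above.
BAZAAR: Dict[str, str] = {
    "load_freq_dawg": "F",
    "user_words_suffix": "user-words",
    "user_patterns_suffix": "user-patterns",
}


def user_data_files(lang: str, params: Dict[str, str]) -> List[str]:
    """Return '<lang>.<suffix>' names for the user word/pattern files (hypothetical).

    In this sketch, a missing or empty suffix yields no file.
    """
    files: List[str] = []
    for key in ("user_words_suffix", "user_patterns_suffix"):
        suffix = params.get(key, "")
        if suffix:
            files.append(f"{lang}.{suffix}")
    return files


def format_word_list(words: List[str]) -> str:
    """Render a user word list: one word per line, newline-terminated."""
    return "".join(word + "\n" for word in words)


if __name__ == "__main__":
    print(user_data_files("eng", BAZAAR))  # ['eng.user-words', 'eng.user-patterns']
    print(format_word_list(["bazaar", "tesseract"]), end="")
```

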
