Commit a36a5f9

Minor edits to Readme
1 parent f8ebff2 commit a36a5f9

1 file changed, +35 -31 lines changed

README.md

@@ -1,48 +1,52 @@
 Note that this is a text-only and possibly out-of-date version of the
 wiki ReadMe, which is located at:

-https://github.com/tesseract-ocr/tesseract/blob/master/README
+https://github.com/tesseract-ocr/tesseract/blob/master/README.md

 Introduction
 ============

 This package contains the Tesseract Open Source OCR Engine.
-Originally developed at Hewlett Packard Laboratories Bristol and
-at Hewlett Packard Co, Greeley Colorado, all the code
+Originally developed at Hewlett-Packard Laboratories Bristol and
+at Hewlett-Packard Co, Greeley Colorado, all the code
 in this distribution is now licensed under the Apache License:

-* Licensed under the Apache License, Version 2.0 (the "License");
-* you may not use this file except in compliance with the License.
-* You may obtain a copy of the License at
-* http://www.apache.org/licenses/LICENSE-2.0
-* Unless required by applicable law or agreed to in writing, software
-* distributed under the License is distributed on an "AS IS" BASIS,
-* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-* See the License for the specific language governing permissions and
-* limitations under the License.
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.


 Dependencies and Licenses
 =========================

-Leptonica is required. (www.leptonica.com). Tesseract no longer compiles
-without Leptonica.
+[Leptonica](http://www.leptonica.com) is required. Tesseract no longer
+compiles without Leptonica.
+
 Libtiff is no longer required as a direct dependency.


 Installing and Running Tesseract
 --------------------------------

 All Users Do NOT Ignore!
+
 The tarballs are split into pieces.

 tesseract-x.xx.tar.gz contains all the source code.

-tesseract-x.xx.<lang>.tar.gz contains the language data files for <lang>.
+tesseract-x.xx.`<lang>`.tar.gz contains the language data files for `<lang>`.
 You need at least one of these or Tesseract will not work.

 Note that tesseract-x.xx.tar.gz unpacks to the tesseract-ocr directory.
-tesseract-x.xx.<lang>.tar.gz unpacks to the tessdata directory which
+tesseract-x.xx.`<lang>`.tar.gz unpacks to the tessdata directory which
 belongs inside your tesseract-ocr directory. It is therefore best to
 download them into your tesseract-x.xx directory, so you can use unpack
 here or equivalent. You can unpack as many of the language packs as you
@@ -52,7 +56,7 @@ before you run make install. If you unpack them as root to the
 destination directory of make install, then the user ids and access
 permissions might be messed up.

-boxtiff-2.xx.<lang>.tar.gz contains data that was used in training for
+boxtiff-2.xx.`<lang>`.tar.gz contains data that was used in training for
 those that want to do their own training. Most users should NOT download
 these files.

@@ -63,8 +67,8 @@ Tesseract wiki https://github.com/tesseract-ocr/tesseract/wiki
 Windows
 -------

-Please use installer (for 3.00 and above). Tesseract is library with
-command line interface. If you need GUI, please check AddOns wiki page
+Please use the installer (for 3.00 and above). Tesseract is a library with a
+command line interface. If you need a GUI, please check the AddOns wiki page.

 TODO-UPDATE-WIKI-LINKS

@@ -74,15 +78,15 @@ If you are building from the sources, the recommended build platform is
 VC++ Express 2008 (optionally 2010).

 The executables are built with static linking, so they stand more chance
-of working out of the box on more windows systems.
+of working out of the box on more Windows systems.

 The executable must reside in the same directory as the tessdata
 directory or you need to set up environment variable TESSDATA_PREFIX.
 Installer will set it up for you.

 The command line is:

-tesseract imagename outputbase [-l lang] [-psm pagesegmode] [configfiles...]
+tesseract imagename outputbase [-l lang] [-psm pagesegmode] [configfiles...]

 If you need interface to other applications, please check wrapper section
 on AddOns wiki page:
@@ -98,19 +102,19 @@ Non-Windows (or Cygwin)
 You have to tell Tesseract through a standard unix mechanism where to
 find its data directory. You must either:

-./autogen.sh
-./configure
-make
-make install
-sudo ldconfig
+./autogen.sh
+./configure
+make
+make install
+sudo ldconfig

 to move the data files to the standard place, or:

-export TESSDATA_PREFIX="directory in which your tessdata resides/"
+export TESSDATA_PREFIX="directory in which your tessdata resides/"

 In either case the command line is:

-tesseract imagename outputbase [-l lang] [-psm pagesegmode] [configfiles...]
+tesseract imagename outputbase [-l lang] [-psm pagesegmode] [configfiles...]

 New there is a tesseract.spec for making rpms. (Thanks to Andrew Ziem for
 the help.) It might work with your OS if you know how to do that.
@@ -126,8 +130,8 @@ instead of `./configure` above.

 History
 =======
-The engine was developed at Hewlett Packard Laboratories Bristol and
-at Hewlett Packard Co, Greeley Colorado between 1985 and 1994, with some
+The engine was developed at Hewlett-Packard Laboratories Bristol and
+at Hewlett-Packard Co, Greeley Colorado between 1985 and 1994, with some
 more changes made in 1996 to port to Windows, and some C++izing in 1998.
 A lot of the code was written in C, and then some more was written in C++.
 Since then all the code has been converted to at least compile with a C++
@@ -138,7 +142,7 @@ lists, but has the big negative that if you do get a segmentation violation,
 it is hard to debug.

 The most recent change is that Tesseract can now recognize 39 languages,
-including Arabic, Hindi, Vietnamese, plus 3 Fraktur variants
+including Arabic, Hindi, Vietnamese, plus 3 Fraktur variants,
 is fully UTF8 capable, and is fully trainable. See TrainingTesseract for
 more information on training.
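
The README text in this diff gives the command line as `tesseract imagename outputbase [-l lang] [-psm pagesegmode] [configfiles...]` and mentions `TESSDATA_PREFIX` as the way to point Tesseract at the directory in which tessdata resides. As a rough illustration only, the sketch below assembles that argument list and environment in Python. The helper names `build_tesseract_command` and `tesseract_env`, the file name `page.tif`, and the path `/opt/tesseract-ocr` are hypothetical and not part of the commit; the `-psm` spelling is copied from the README as quoted here, and other Tesseract releases may spell that option differently.

```python
import os


def build_tesseract_command(imagename, outputbase, lang=None, psm=None, configfiles=()):
    """Assemble argv in the order the README gives:
    tesseract imagename outputbase [-l lang] [-psm pagesegmode] [configfiles...]
    """
    cmd = ["tesseract", imagename, outputbase]
    if lang is not None:
        cmd += ["-l", lang]  # language data to use, e.g. "eng"
    if psm is not None:
        cmd += ["-psm", str(psm)]  # page segmentation mode, as spelled in this README
    cmd += list(configfiles)  # optional trailing config file names
    return cmd


def tesseract_env(tessdata_parent=None, base_env=None):
    """Return a copy of the environment, optionally with TESSDATA_PREFIX set.

    Assumption: tessdata_parent is the directory in which tessdata resides;
    a trailing separator is added to match the README's example value.
    """
    env = dict(os.environ if base_env is None else base_env)
    if tessdata_parent is not None:
        env["TESSDATA_PREFIX"] = os.path.join(tessdata_parent, "")
    return env


if __name__ == "__main__":
    # Hypothetical input file and output base name.
    cmd = build_tesseract_command("page.tif", "page", lang="eng", psm=3)
    print(" ".join(cmd))  # tesseract page.tif page -l eng -psm 3
    # To actually run it (requires an installed tesseract binary and language data):
    #   import subprocess
    #   subprocess.run(cmd, env=tesseract_env("/opt/tesseract-ocr"), check=True)
```

Building the argument list separately from running it keeps the sketch usable on machines without Tesseract installed, and passing a list rather than a shell string avoids quoting problems with file names that contain spaces.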
