【NTT Data】OCR40/OCR400

The OCR40 and OCR400, which were used as computer terminals for reading handwritten alphabetic characters, numerals, katakana characters, and symbols with greater accuracy than previous models, were designed with the aim of expanding OCR applications to banking systems. In 1991, these OCRs were installed for customer-service operations that accepted annual public revenue payments in a banking data communication system. NTT Data offered two models: the OCR40, a general-purpose page reader for customer-service operations, and the OCR400, an ultra-fast document reader for high-volume batch processing.

From the mid-1980s onward, the financial sector sought ways to mechanize operations such as deposit processing, which had reached massive volumes, and expectations rose for a terminal OCR that could recognize monetary amounts and ID information with high accuracy without imposing any constraints on how the characters were handwritten. The OCR40 and the OCR400 were designed to meet these expectations. They supplemented the Feature Concentration Method, which had been used in the DT-OCR100 series of terminal OCRs with a proven track record in Ministry of Labor and Social Insurance Agency data communication systems, with a new feature extraction method that focused on the contour shape of character strokes. This combination improved reading accuracy by about an order of magnitude (roughly tenfold) over previous OCRs, specifically NTT Data's DT-OCR100 series, which set the highest standard at the time, with the largest gains for handwritten numerals.
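The idea of pairing a global, background-oriented feature with a local, stroke-contour feature can be illustrated with a small Python sketch. It is not the Feature Concentration Method or NTT Data's contour method, whose details are not given here. As hypothetical stand-ins, `background_features` measures how far the background reaches in from each border of a glyph, `contour_features` builds a four-direction histogram of links between adjacent contour pixels, and `classify` concatenates the two vectors and picks the nearest template. The 5x5 ASCII templates are invented for illustration.

```python
from __future__ import annotations

import math

# Hypothetical 5x5 glyph templates: "#" = stroke pixel, "." = background.
TEMPLATES = {
    "1": ["..#..",
          ".##..",
          "..#..",
          "..#..",
          ".###."],
    "0": [".###.",
          "#...#",
          "#...#",
          "#...#",
          ".###."],
    "7": ["#####",
          "....#",
          "...#.",
          "..#..",
          "..#.."],
}


def is_stroke(img: list[str], r: int, c: int) -> bool:
    """True if (r, c) lies inside the bitmap and is a stroke pixel."""
    return 0 <= r < len(img) and 0 <= c < len(img[0]) and img[r][c] == "#"


def background_features(img: list[str]) -> list[float]:
    """Global stand-in: how far the background reaches in from each border
    before hitting a stroke, per row and per column, normalised to 0..1."""
    h, w = len(img), len(img[0])
    feats = []
    for row in img:
        for line in (row, row[::-1]):  # from the left, then from the right
            hit = line.find("#")
            feats.append((hit if hit >= 0 else w) / w)
    for c in range(w):
        col = "".join(img[r][c] for r in range(h))
        for line in (col, col[::-1]):  # from the top, then from the bottom
            hit = line.find("#")
            feats.append((hit if hit >= 0 else h) / h)
    return feats


# Forward neighbour offsets: 0, 45 (down-right), 90 and 135 (down-left) degrees.
DIRECTIONS = [(0, 1), (1, 1), (1, 0), (1, -1)]


def contour_features(img: list[str]) -> list[float]:
    """Local stand-in: histogram of directions linking adjacent contour
    pixels (stroke pixels with at least one 4-neighbour in the background)."""
    h, w = len(img), len(img[0])
    contour = {
        (r, c)
        for r in range(h)
        for c in range(w)
        if is_stroke(img, r, c)
        and any(not is_stroke(img, r + dr, c + dc)
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)))
    }
    counts = [0] * len(DIRECTIONS)
    for r, c in contour:
        for i, (dr, dc) in enumerate(DIRECTIONS):
            if (r + dr, c + dc) in contour:  # forward only: each pair counted once
                counts[i] += 1
    total = sum(counts) or 1
    return [n / total for n in counts]


def combined_features(img: list[str]) -> list[float]:
    """Concatenate the global and local views into one vector."""
    return background_features(img) + contour_features(img)


def classify(img: list[str], templates: dict[str, list[str]]) -> str:
    """Return the label of the template nearest in combined feature space."""
    feats = combined_features(img)
    return min(templates,
               key=lambda label: math.dist(feats, combined_features(templates[label])))


if __name__ == "__main__":
    # A "1" written as a plain vertical bar, unlike the serifed template.
    sample = ["..#..", "..#..", "..#..", "..#..", "..#.."]
    print(classify(sample, TEMPLATES))  # → 1
```

Concatenating the vectors lets a mismatch in either view increase the distance to a wrong template; how to weight the two views against each other is a tuning choice this sketch leaves at 1:1.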

The OCR40 and the OCR400 were distinctive for their recognition techniques and their highly accurate reading of handwritten alphabetic and katakana characters. Their strengths lay in practicality and versatility for transmittal form processing and other data entry operations, supported by advanced preprocessing such as character segmentation, a rich set of data check functions (validation of numeric and alphabetic strings), and an image scan function for handwritten fields. Furthermore, the EA series of terminal OCRs installed in Ministry of Labor systems (the successor to the DT-OCR100 series) and the CB series of terminal OCRs installed in Social Insurance Agency systems were developed from the general-purpose OCR40 page reader and took advantage of its superior handwritten character recognition accuracy.

The main technologies implemented in the OCR40 and the OCR400 were as follows:
  • Higher recognition accuracy through a combination of the Feature Concentration Method and the Contour Feature Extraction Method: The former extracted global features focused on the background of the character pattern, while the latter extracted local features of the character strokes. Using structural information from the character background and from the character strokes in a complementary fashion achieved higher recognition accuracy.
  • Advanced preprocessing such as character segmentation: High recognition accuracy for ordinary handwritten characters (i.e., characters written normally, without concern for OCR reading limitations) was ensured by preprocessing such as character segmentation and density correction. Character segmentation correctly handled stroke segments that extended beyond their own field boxes or into other field boxes, and density correction produced patterns that were robust against broken or messy strokes.
  • Reduced costs and improved reading speeds through parallel recognition units: Recognition was carried out on relatively low-speed, single-board recognition units. When higher recognition speed was needed, an appropriate number of these units could be installed in a single OCR, which gave the OCR40 better cost performance. To make the OCR400 an ultra-fast document reader, NTT Data chained together reader units, each equivalent to an OCR40 (consisting of a scanning photoelectric conversion unit, a preprocessing unit, and multiple recognition units), with each unit handling a line of the input form.
  • Added a rich set of data check functions: The readers performed a number of checks on numeric fields, including numerical range checks, checks of whether fields had been filled in, and modulus check-digit checks. When a field failed a check, the reader prompted the operator to correct it, reducing read errors.
  • Added an image scan function for handwritten fields: For operator convenience, addresses, names, and other information associated with ID information could be written in kanji instead of katakana, and an image scan function was added to capture these fields.
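The range, completion, and modulus checks described above, together with the double-mark / no-mark check on mark fields, can be sketched in Python as validators that return the list of failed checks; a non-empty list marks the point at which a reader would prompt the operator. The field definition (`NumericField`), the function names, and the choice of the modulus-10 (Luhn) scheme are assumptions for illustration only, since the readers' form-definition format and modulus scheme are not specified here.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class NumericField:
    """Hypothetical field definition, not the readers' actual format."""
    required: bool = True
    min_value: Optional[int] = None
    max_value: Optional[int] = None
    has_check_digit: bool = False  # last digit is a check digit


def mod10_check_ok(digits: str) -> bool:
    """Modulus-10 (Luhn) check, assumed here as the modulus scheme."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def check_numeric_field(text: str, spec: NumericField) -> list[str]:
    """Return the list of failed checks for one recognised numeric field."""
    if text == "":
        return ["field not completed"] if spec.required else []
    if not all(ch in "0123456789" for ch in text):
        return ["non-numeric character"]
    errors = []
    value = int(text)  # range is checked on the whole field value
    if spec.min_value is not None and value < spec.min_value:
        errors.append("out of range")
    elif spec.max_value is not None and value > spec.max_value:
        errors.append("out of range")
    if spec.has_check_digit and not mod10_check_ok(text):
        errors.append("check digit mismatch")
    return errors


def check_marks(marks: list[bool]) -> list[str]:
    """Mark group in which exactly one entry box should be marked."""
    count = sum(marks)
    if count == 0:
        return ["no mark"]
    if count > 1:
        return ["double mark"]
    return []


if __name__ == "__main__":
    # Hypothetical recognised form: field name -> (recognised text, definition).
    form = {
        "amount": ("1500", NumericField(min_value=1, max_value=1000)),
        "customer_id": ("79927398710", NumericField(has_check_digit=True)),
        "branch": ("", NumericField()),
    }
    for name, (text, spec) in form.items():
        errors = check_numeric_field(text, spec)
        if errors:
            # Per the description, the reader prompted the operator here.
            print(f"{name}: {', '.join(errors)}")
    print("payment type:", check_marks([False, True, True]))
```

Returning a list rather than stopping at the first failure lets one prompt report both a range error and a check-digit error on the same field.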
Main specifications of the OCR40 and the OCR400

| Parameter | OCR40 (general-purpose page OCR reader) | OCR400 (ultra-fast document OCR reader) |
|---|---|---|
| Recognition method | Combination of Feature Concentration Method and Contour Feature Extraction Method | Same as OCR40 |
| Read speed: character reading | Up to 180 characters per second | Up to 1,500 characters per second |
| Read speed: form processing | Up to 21 sheets per minute (reading A4 forms with 10 lines of 30 characters each) | Up to 180 sheets per minute |
| Readable characters: handwritten | Alphabetic characters, numerals, katakana characters, symbols (mixed reading supported); handwritten marks (within specified entry boxes) | Same as OCR40 |
| Readable characters: printed | Numerals and symbols (OCR-B size I) | Same as OCR40 |
| Forms: sizes and thicknesses | From 82.5 x 105 mm to 364 x 297 mm (l x w); ream weight from 55 to 110 kg | From 102 x 185 mm to 240 x 230 mm (l x w); ream weight from 70 to 110 kg |
| Forms: paper quality | OCR paper, fine paper (brand specification) | Same as OCR40 |
| Forms: paper feed | Continuous (automatic) / one-sheet (manual feed) | Continuous (automatic) |
| Form capacity: hopper | Up to 65 or 130 sheets (selectable) | Up to 280 sheets |
| Form capacity: stackers | Accept: 130 sheets; Reject: 20 sheets | Accept 1: 280 sheets; Accept 2: 280 sheets; Reject: 28 sheets |
| Other functions: image scanning | Compressed image of any area; image of read field | Image of read field |
| Other functions: data checks | Check digits, numerical ranges, double marks / no marks | Same as OCR40 |
| Other functions: sequence number printing | Six-digit numbering (printed on reverse side of the form, print location selectable) | Same as OCR40 |
| Interface | Connectable to console (CWS 110A) via the GPIB interface | Same as OCR40 |
| Dimensions and weight | 63 x 100 x 105 cm (w x d x h), 242 kg | Reader unit: 140 x 68 x 140 cm (w x d x h), about 325 kg; recognition unit (in two cabinets): 1,676 x 68 x 140 cm (w x d x h), about 840 kg |
| Power supply | 100 ±10 V (single phase), approximately 1.6 kVA | 200 ±20 V (single phase), approximately 11 kVA |

Photo: NTT Data OCR40 general-purpose page reader
Photo: NTT Data OCR400 ultra-fast document reader, prototype version (left: recognition unit, right: reader unit)
Photo: NTT Data OCR400 ultra-fast document reader, production version (left: recognition unit (one cabinet), right: reader unit)

This description of the OCRs uses terminology from the OCR Catalogue Glossary (Version 2) published by the Japan Electronics and Information Technology Industries Association. Please refer to this glossary for the meanings of the terms used.