The F6335A, announced in April 1991, was an OCR form reader that could recognize handwritten characters, including kanji. The reader was integrated into a tabletop scanner and connected to either a workstation or a personal computer. The main applications for the F6335A were business processes that used standardized forms. [Note 1] Fujitsu produced the F6335A in the following two product configurations:
- DATAEYE-150: A standard model that emphasized cost performance. It processed up to 12 A4 handwritten sheets per minute.
- DATAEYE-170: A high-speed model with support for A3 forms. It processed up to 20 A4 handwritten sheets per minute.
At the time of its release, OCR was increasingly being used in various data input operations such as insurance applications or order slips for the delivery industry. Given the sensitivity of these applications, OCR customers were demanding higher recognition accuracies, in addition to support for handwritten kanji characters and larger forms. Fujitsu designed the F6335A with the following features to address these demands.
- Expanded the range of forms that could be read [Note 2]: supported paper thicknesses from thin paper to dry-sealed postcards, paper sizes from A8 to A3, and forms created with word processors
- Created an integrated (compact) circuit board for the multiple compression method [Note 3] to enable high-precision, high-performance recognition of handwritten kanji characters
- Enabled quick scanning of red-ink seal images with a red image scanner using a three-color light source (added a green source to the previous red and blue sources) [Note 4]
- Supported image filing of complete forms
- Included a function to detect positional offsets [Note 5] (enabling the unit to read word-processor forms with black outline boxes, photocopied forms, and plain paper faxes)
- Added a user dictionary function to the address dictionary and name dictionary for high-precision character recognition [Note 6] (cut the previous model’s error rate by half)
Note 1: The primary OCR application at the time was processing forms with standardized formats, such as application forms or deposit and withdrawal slips.
Note 2: Scanner technologies (paper feed technologies and optical scanning technologies) were key to handling differences in paper sizes, thicknesses, and surface processing. For example, scanning a larger form requires control over more paper-feed rollers, which in turn requires advances in the technologies that keep all the rollers running at a fixed speed. Faster processing is also needed when reading large forms at high speeds because more data must be processed in a given time.
Note 3: The multiple compression method was first implemented in the handwritten kanji character recognition unit on the FACOM 6679A. On the F6335A, however, the unit was integrated onto a circuit board, making it more compact and capable of faster speeds.
Note 4: Fujitsu had already developed red image scanning functionality for its OCR unit announced in 1983, but for the F6335A, the company used, for the first time, a method that captured images with different color components in a single scan by placing light sources in two different positions. Entry boxes on OCR forms were printed in a dropout color that scanners could not read. Red was used as the dropout color for ordinary two-color light sources (red and blue), but this meant the OCR units could not read red-ink seals and other red images. Reading red images required optics that could handle the red dropout color. On the F6335A, Fujitsu located a green light source in the ejection assembly in addition to the red and blue light sources in the scanner unit. As a result, the unit could process all three light source images in a single scan.
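The dropout idea in Note 4 can be illustrated with a deliberately simplified model: a mark is read only if it reflects much less light than the paper under the light source in use. The reflectance values, the threshold, and the function name `is_read_as_mark` below are all hypothetical and chosen for illustration; they do not describe the F6335A's actual optics.

```python
# Simplified dropout-color model. All values are illustrative assumptions,
# not measurements from any Fujitsu scanner.
# Reflectance: 0.0 = absorbs all light, 1.0 = reflects all light.
REFLECTANCE = {
    "paper": {"red": 0.90, "green": 0.90},
    "black_ink": {"red": 0.05, "green": 0.05},
    "red_ink": {"red": 0.85, "green": 0.15},
}

THRESHOLD = 0.5  # assumed binarization threshold


def is_read_as_mark(material: str, light: str) -> bool:
    """Return True if the material appears dark under the given light source."""
    return REFLECTANCE[material][light] < THRESHOLD


if __name__ == "__main__":
    for light in ("red", "green"):
        for material in ("black_ink", "red_ink"):
            print(light, material, is_read_as_mark(material, light))
```

In this model, red ink reflects red light almost as well as paper, so red entry boxes (and red seals) vanish under the red source, while the same red ink appears dark under the green source and can be captured as an image.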
Note 5: The technology to detect out-of-position forms was part of the technologies to improve scanning accuracy. Specifically, Fujitsu improved a series of technologies that, starting with the “set of points” obtained from the mechanical scan, (1) aligned the overall form using the standard reference marks printed on the form, (2) removed surplus information such as dirt and smudges, (3) adjusted the reading locations by detecting black outline boxes, (4) removed black outline boxes (dropout boxes), (5) extracted character strings in the adjusted reading locations, and (6) recognized characters in the “clusters of points” thought to be individual characters from the character strings. Technologies (1) through (6) were connected together.
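As a rough illustration of step (1) in Note 5 and of how a detected offset could be applied to the reading locations, the following sketch estimates a pure translation from reference marks and shifts an entry box accordingly. It assumes translation only (no skew or scaling), and the names `estimate_offset` and `shift_box` are hypothetical; the source does not describe Fujitsu's actual algorithm.

```python
from statistics import mean
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (left, top, right, bottom)


def estimate_offset(expected: List[Point], detected: List[Point]) -> Point:
    """Average displacement between where the reference marks should be
    and where they were found on the scanned form."""
    dx = mean(d[0] - e[0] for e, d in zip(expected, detected))
    dy = mean(d[1] - e[1] for e, d in zip(expected, detected))
    return (dx, dy)


def shift_box(box: Box, offset: Point) -> Box:
    """Move a reading location by the estimated offset."""
    left, top, right, bottom = box
    dx, dy = offset
    return (left + dx, top + dy, right + dx, bottom + dy)


if __name__ == "__main__":
    # Hypothetical coordinates of three reference marks on the form design
    expected_marks = [(10, 10), (200, 10), (10, 280)]
    # Where those marks were found in the scan (form fed slightly off position)
    detected_marks = [(12, 13), (202, 13), (12, 283)]

    offset = estimate_offset(expected_marks, detected_marks)
    print(shift_box((50, 60, 90, 80), offset))
```

A real reader would also have to cope with rotation and paper stretch, which is presumably why the note describes the later refinement in step (3) using the black outline boxes themselves.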
Note 6: To recognize a character correctly, it helps to assess its validity by its meaning in combination with the characters before and after it, much as humans do. For example, recognition accuracy can be increased by checking addresses and names that often appear on forms against prepared address and name dictionaries. The F6335A offered prepackaged address and name dictionaries and an advanced reference method. It also came with a user dictionary function that let users add reference words to suit their own application or circumstances.
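The dictionary check described in Note 6 can be sketched as follows. This is a minimal illustration, assuming the recognizer returns a ranked list of candidate characters for each position; the function names, the rank-sum scoring, and the sample words are assumptions made here and are not the F6335A's reference method. Latin letters stand in for kanji to keep the example short.

```python
from typing import List, Optional


def match_score(word: str, candidates: List[List[str]]) -> Optional[int]:
    """Sum of candidate ranks if every character of `word` appears among the
    recognizer's candidates for its position; None if the word cannot match."""
    if len(word) != len(candidates):
        return None
    total = 0
    for ch, options in zip(word, candidates):
        if ch not in options:
            return None
        total += options.index(ch)  # rank 0 is the recognizer's top choice
    return total


def correct_with_dictionary(candidates: List[List[str]], dictionary: List[str]) -> str:
    """Pick the dictionary word that best agrees with the candidates.
    Falls back to the top candidate per position when nothing matches.
    Assumes every candidate list is non-empty."""
    scored = []
    for word in dictionary:
        score = match_score(word, candidates)
        if score is not None:
            scored.append((score, word))
    if not scored:
        return "".join(options[0] for options in candidates)
    return min(scored)[1]


if __name__ == "__main__":
    # Hypothetical recognizer output: zero and the letter O are confused
    candidates = [["T", "I"], ["0", "O"], ["K", "X"], ["Y", "V"], ["0", "O"]]
    address_dictionary = ["TOKYO", "OSAKA", "KYOTO"]  # stand-in for a prepackaged dictionary
    user_dictionary = ["TOKYU"]                       # stand-in for user-added words
    print(correct_with_dictionary(candidates, address_dictionary + user_dictionary))
```

Here the top-ranked characters alone would spell "T0KY0", but the dictionary lookup selects "TOKYO" because it is the only entry consistent with the candidates at every position. Appending user-supplied words to the search list mirrors the role of the user dictionary function.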