Announced in May 1988, the FACOM 6365 was a document reader that could read non-standard documents printed in Japanese without requiring definitions of the document structure. [Note 1] The reader was used connected to workstations and personal computers.
With the spread of Japanese-language word processors, documents printed in Japanese were replacing handwritten documents and becoming commonplace in offices at the time. This change created growing demand for high-speed reading of books, magazines, and other documents printed in Japanese, and for functions [Note 2] to convert the scanned information into Japanese document data that could be used with word processors and other devices.
To read documents printed in Japanese in a wide variety of styles, it was not enough simply to recognize character images. It was also necessary to identify the regions to be read and recognized without requiring predefined format information, such as that used for OCR forms, so that the style of the scanned regions did not need to be standardized. To accomplish this, the FACOM 6365 included technology that enabled the following features:
- Distinguish between text, figures, charts, and photos mixed within a single document
- Understand various column layouts
- Identify blocks of text
- Determine the reading order of blocks from their layout, following the flow of the text [Note 3]
- Accelerate identification and recognition of blocks, lines of text, and characters (using a dedicated LSI)
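The source does not describe how the FACOM 6365 ordered blocks. As a hypothetical illustration only, the following Python sketch applies recursive XY-cut, a well-known layout-analysis technique, to rectangular block bounding boxes: it first splits the page into horizontal bands separated by whitespace, then splits each band into columns, and reads bands top to bottom and columns left to right. The names `Block`, `reading_order`, and `_split` are assumptions, and the sketch assumes horizontal writing; vertical Japanese text would need columns read right to left.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    # Hypothetical bounding box of one identified block; y grows downward.
    name: str
    x0: float  # left edge
    y0: float  # top edge
    x1: float  # right edge
    y1: float  # bottom edge


def _split(blocks, lo, hi):
    """Group blocks whose [lo, hi] intervals overlap; groups come out in ascending order."""
    groups = []
    end = None
    for b in sorted(blocks, key=lo):
        if groups and lo(b) < end:
            # Overlaps the current group: no whitespace gap, so extend the group.
            groups[-1].append(b)
            end = max(end, hi(b))
        else:
            # A gap (or the first block): start a new group.
            groups.append([b])
            end = hi(b)
    return groups


def reading_order(blocks):
    """Recursive XY-cut over block boxes (horizontal writing assumed)."""
    if len(blocks) <= 1:
        return list(blocks)
    # Horizontal cut: bands separated by vertical whitespace, read top to bottom.
    bands = _split(blocks, lambda b: b.y0, lambda b: b.y1)
    if len(bands) > 1:
        return [b for band in bands for b in reading_order(band)]
    # Vertical cut: columns separated by horizontal whitespace, read left to right.
    cols = _split(blocks, lambda b: b.x0, lambda b: b.x1)
    if len(cols) > 1:
        return [b for col in cols for b in reading_order(col)]
    # No whitespace gap in either direction: fall back to top-to-bottom, left-to-right.
    return sorted(blocks, key=lambda b: (b.y0, b.x0))
```

For a page with a full-width title above two columns of two paragraphs each, the sketch returns the title first, then both paragraphs of the left column, then both of the right column. Cutting horizontally before vertically is what keeps a title that spans both columns from merging them into one.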
In addition to the technology to identify the character recognition regions described above, Fujitsu also enhanced character recognition performance with the following technologies:
- Read text containing a mixture of fonts and point sizes [Note 4]
- Check recognition results against Japanese grammatical rules (Japanese syntax post-processing) [Note 5]
The main specifications of the FACOM 6365 were as follows:
- Maximum original size: B4
- Readable characters: 3,911 (alphabetic letters, numerals, hiragana, katakana, symbols, and kanji, covering all of the JIS first-level kanji and some of the JIS second-level kanji)
- Character recognition speed: 20 characters per second
- Processing speed: 3 sheets per minute (including the time for Japanese syntax post-processing)
Note 1: When standard forms are used, characters are usually read based on definitions specifying the form’s layout (format). With ordinary, non-standard documents, it is naturally impossible to define the format in advance.
Note 2: From a computer’s perspective, printed characters are nothing more than image data consisting of simple dots and lines. In order to make a printed document editable with a word processor or other device, it is necessary to separate text regions and non-text regions (photos, pictures, figures) from the scanned document and to assign character codes corresponding to the characters in the text regions.
Note 3: The FACOM 6365 automatically identified and ordered text blocks, but it also allowed manual corrections. The Japanese syntax post-processing, described below, assumed that the text blocks had been identified and ordered correctly.
Note 4: This technology was able to recognize characters in a mixture of different fonts, such as Mincho, Gothic, and Textbook typefaces, in different point sizes (from 7 to 28 point), and in different character pitches. This technology was based on Fujitsu’s handwritten character recognition techniques.
Note 5: At the post-processing stage, this technology finalized the recognition of characters based on their syntactic validity. When sentences or single words span multiple blocks, the likelihood of misrecognition increases. To prevent this, the technology evaluated the validity of each recognized character against the surrounding characters and against grammatical rules, and narrowed down the candidate characters. Fujitsu first applied this technology to OCR in the FACOM 6365.
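The source does not give the FACOM 6365's grammatical rules, so the following is only a sketch of the general idea of narrowing candidates by context. It substitutes a swapped-in technique, a character-bigram model searched with the Viterbi algorithm, for the grammar rules. The function `best_sequence`, its parameters `bigram_logp` and `unseen_logp`, and all probabilities in the example are hypothetical.

```python
import math


def best_sequence(candidates, bigram_logp, unseen_logp=-5.0):
    """Pick the most plausible character string from per-position candidates.

    candidates: list of {char: recognition_probability} dicts, one per position.
    bigram_logp: hypothetical table mapping (previous_char, char) to the
        log-probability that the pair occurs in valid Japanese text.
    unseen_logp: penalty for pairs missing from the table.
    """
    if not candidates:
        return ""
    # best[c] = (score, text) of the best-scoring path ending in candidate c.
    best = {c: (math.log(p), c) for c, p in candidates[0].items()}
    for position in candidates[1:]:
        new_best = {}
        for c, p in position.items():
            # Extend every surviving path by c and keep the best one (Viterbi step).
            score, text = max(
                (s + bigram_logp.get((prev, c), unseen_logp), t)
                for prev, (s, t) in best.items()
            )
            new_best[c] = (score + math.log(p), text + c)
        best = new_best
    return max(best.values())[1]


# Hypothetical recognizer output: the first character is ambiguous between
# the kanji 力 ("chikara") and the katakana カ ("ka"), which look alike.
cands = [{"力": 0.55, "カ": 0.45}, {"メ": 1.0}, {"ラ": 1.0}]
bigrams = {("カ", "メ"): math.log(0.5), ("メ", "ラ"): math.log(0.5)}
```

Taking the top candidate at each position independently yields 力メラ, but `best_sequence(cands, bigrams)` returns カメラ ("camera"), because the context score for the pair カメ outweighs the small difference in recognition confidence.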