PROPOSED MICROFORM DIGITIZER SYSTEM

Proposed by Robert M. Doerr

INTRODUCTION


Although the presentation of images of data on the Internet is now practical, 
obviously the text data that can be made available via the Internet 
is limited to those data that are available in text form. 
For the most part, the options for converting available data to text are 
to hand-type material into a computer, or, 
if the material is in printed form (as distinct from handwritten), 
sometimes to scan it and use optical character recognition (OCR). 
Flat-bed scanners have become inexpensive and excellent, 
and effective OCR software is available.

However, much of the world's stock of old, valuable data is to be found only on microfilm. 
For the foreseeable future, it appears that, to provide text, 
there is no alternative for hand-written originals but to hand-copy 
from the film to the computer. 
One primary intent here is to facilitate such digitization.

This is to propose a system to enhance the utility of microforms, 
especially microfilms, in the digital age, and 
particularly to facilitate digitization of data from microfilm.

A key consideration is that it is far easier to copy from one window on a 
computer monitor to another on the same monitor than to look 
back-and-forth between a paper document and the monitor. 
(My own experience is that, working from paper documents, 
it is highly advantageous to scan the documents and place a scanned image 
in one window on the monitor, and type into a second window, 
thus avoiding looking back-and-forth from the paper copy to the monitor.)

With microforms, a crude alternative is to make paper copies 
on a digital microform reader/printer, 
then scan the copy and place the image in one window on the monitor 
and type into another window. 
But making many copies on a microform reader/printer is sometimes very costly, 
e.g., up to $1 per page, and hard on the user's neck, 
for reader/printers have vertical projection screens.

If the original is of printed matter, 
it is possible to use a microfilm reader/printer to make a paper copy 
and then OCR the output. 
Again, making many copies on a microform reader/printer is slow and costly.

Graphics software permits image enhancement, which in turn enables more accurate transcription.

A modern digital microform reader/printer can be connected, 
by use of a 'connectivity package', to a computer for printing and OCR, 
but such a system involves looking back and forth between two screens, 
the one on the reader/printer and the computer monitor, 
and switching back and forth between controls on both the reader/printer 
and the computer keyboard and mouse.

Different users of microforms have different needs; 
the proposed system is intended to accommodate all. 
Almost all users require real-time viewing of the film.

Some need only a quick look at a well-indexed microfilm to hand-copy 
a small amount of information. An example may be an index on film. 
That user, accustomed to the controls 'fast forward', 'forward', 'fast rewind', 
'rewind', 'zoom' and 'focus', wants simple controls and would prefer a 
microfilm reader with a powered film-winding mechanism.

Some need to make a paper copy of a selected page from each of a substantial set of microfilms. 
An example may be printing the pages for a certain surname 
from a set of annual city directories. 
The user would prefer a microfilm reader/printer with a powered film-winding mechanism 
and simple controls to include 'print'.

Some search long and hard for a record that may not be well-defined in advance, 
a record to be recognized only when found. 
An example may be an un-indexed list with many mis-spellings in the data. 
This user may prefer a hand-wound microfilm reader to avoid accidental film movements.

Some need to find a record, whether or not in a well-indexed dataset, 
and make a paper copy thereof. An example may be a census entry. 
The user would prefer a microfilm reader/printer with a powered film-winding mechanism. 
In general, it would be preferable not to restrict printing to rectangular areas.

Some users seek to transcribe or index entire rolls of microfilm, say, 
for publication or web-page use. Such a user may or may not require a powered film drive, 
but would benefit greatly from computer assistance and 
from having the image of the film and the collected data appear 
in adjacent windows on the monitor.

Some users seek to copy all, or selected, images from a microform. 
These may be whole documents or portions thereof. 
One goal might be to re-sequence the images. 
Another goal would be to create logical sub-sets of the images. 
Images in a set may range in size from business card to double legal.

Some users may seek to extract, say, for publication or web-page use, 
data from many separate places on a roll of film, 
say birth, marriage, divorce, death, burial and social items from a microfilm of old newspapers. 
The object might be to collect these types of data into separate files 
on one pass thru the film. Such a user would benefit greatly from computer controls, 
computer identification of the data type, and from having the image of the film 
and the collected data in adjacent windows on the monitor. 
When the image of printed matter on the film is good, 
optical character recognition (OCR) can be very useful to speed extraction, 
but only with immediate and careful proofing and correction of the OCR results. 
Theoretically, one could use a reader/printer to make paper copies to be scanned for OCR, 
but each such step leads to loss of accuracy and 
that procedure for many pages is slow and costly. 
Many whole-roll jobs are performed by volunteers; 
it is important to heed
RULE ONE: "Never waste the valuable time of capable volunteers."

The typical microfilm reader is a large unit with a projection system 
to display an enlarged image on an opaque surface positioned 15 degrees from horizontal. 
For long sessions, it is by far more comfortable for the operator 
when the image is projected onto a surface 15 degrees from horizontal, 
rather than onto a vertical ground-glass screen.

The typical microfilm reader/printer is a large unit with a housed projection system 
to display an enlarged image on a vertical ground-glass screen, a prism to rotate the image, 
and means for selecting the (always rectangular) area 
to be photoprinted. 
These characteristics are hold-overs from the days of analog reader/printers. 
Modern reader/printers include a digitizing system and a (digital) laser printer. 
Such reader/printers do offer the advantage of interchangeable zoom lenses.

This is to propose a system comprising a computer and a small unit 
that consists of a film transport, light source, lens and digitizer, 
in effect, a film carrier, lamp and monochrome image digitizer. 
The unit, a computer peripheral, would be connected to the computer by electrical cables only. 
The system would include sophisticated software.

As noted, a modern digital microform reader/printer can be connected 
(by use of a 'connectivity package') to a computer; 
that leads to the user's having to view two screens and to use two sets of controls. 
It provides no software for locating a place on a film. 
Selected images can be transferred to the computer for subsequent processing, 
including OCR, but that does not facilitate making the needed immediate corrections 
to OCR'd data.

By eliminating the display system of the typical reader/printer, 
the cost could be substantially reduced. 
Thus, work with microforms could be much better and faster, as well as less costly.

The system would be highly convenient for all users.

SPECIFICATIONS

Hardware
 
1. All controls are by computer mouse and keyboard.

2. All displays are on the computer monitor.

3. Microfilm is hand-started on the take-up reel. 
(A certain Minolta film carrier would be a good start, in contrast to some other, 
less user-friendly 'automatic' carriers by Minolta and others.) 
The film-transport electrostatically discharges, and frees of dust, the film as it passes.

4. The carrier accepts at least microfilm, microcard and microfiche.

5. The monitor has a large, flat screen that is below eye-level and adjustably 
angled 45 to 70 degrees from vertical to accommodate the user and to avoid glare 
in the place where installed.

6. The film and transport system is installed in a dustless housing, 
not necessarily within reach of the operator. The transport is fairly small. 
Unlike with a contemporary reader/printer, there is no mirror, no projection system, 
no prism for image rotation, no ground glass or other surface onto which to 
project images and no hardware system for selecting the area to be printed.

7. The device is solidly constructed for constant use, 
possibly by untrained library patrons, with minimal down-time.

8. The system runs, via USB2 hub, from a standard Windows computer with a very fast processor.

Software

1. There are two main windows on the monitor, 
the image of the selected part of the microform image and the user's transcription thereof, 
plus a pull-down 'tools' menu for each window. 
The image window includes a small display of the project ID and form ID, 
location on the microform and user-typed ID of the date and page number. 
The tools window for each is invoked by right-clicking on the window.

Area-selection tools include one for irregular shapes, 
one for irregular shapes bounded by straight lines, 
one for rectangular areas and one for general rectilinear areas. 
The one for irregular shapes is as drawn by mouse. 
The ones for irregular shapes bounded by straight lines and 
for general rectilinear areas are formed by setting pairs of points and 
snapping a straight line between the two points in each pair; 
lines so formed have drag handles. 
The one for rectangular areas is formed by setting a starting corner point and 
dragging to the diagonally opposite corner.
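
The selection tools described above could share one representation: a closed list of corner points. The following is only a sketch under that assumption. The names `rectangle_from_corners` and `contains` are hypothetical, and the point-in-polygon test uses the standard ray-casting method, which the proposal does not itself specify.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def rectangle_from_corners(start: Point, end: Point) -> List[Point]:
    """Corners of the rectangle dragged from `start` to the opposite corner `end`."""
    (x0, y0), (x1, y1) = start, end
    left, right = min(x0, x1), max(x0, x1)
    top, bottom = min(y0, y1), max(y0, y1)
    return [(left, top), (right, top), (right, bottom), (left, bottom)]


def contains(polygon: List[Point], p: Point) -> bool:
    """Ray-casting test: True if point `p` lies inside the closed polygon."""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        xa, ya = polygon[i]
        xb, yb = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through p can cross the ray.
        if (ya > y) != (yb > y):
            x_cross = xa + (y - ya) * (xb - xa) / (yb - ya)
            if x < x_cross:
                inside = not inside
    return inside


# A rectilinear, L-shaped selection such as a newspaper item that wraps columns.
l_shape = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]
print(contains(l_shape, (1, 3)), contains(l_shape, (3, 3)))
```

With this representation, a freehand shape drawn by mouse is simply a polygon with many short edges, so printing, saving and OCR could all work from the same pixel mask.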

2. The image window tools include microfilm motion (slow), microfilm motion (medium), 
microfilm motion (fast), zoom in, zoom out, slow microfilm motion speed setting, 
medium microfilm motion speed setting II, carrier motion, transverse carrier motion, 
(all motions both forward and reverse), fine focus, scratch and dust removal, 
slow image rotation (one-degree steps), fast image rotation (ninety-degree steps), 
rectangular selection, arbitrary selection, 
arbitrary selection with straight lines only and right-angle corners, 
image enhancement, OCR selection, print selection, save selection as image 
(with drop-down dialog box for name of the image file), and return to prior view 
(from among a number of such specified by the user) 
(with drop-down screen of 'thumbnails' to select view and 'back' button 
to step thru prior views, and, when a view is selected, 
to return precisely to the chosen location, longitudinal and transverse location, 
and carrier position) and to the dataset for that location.

3. The transcription window tools include file (with drop-down box to choose, 
and name, new file or from among files already opened), print 
(with drop-down box to choose whole file or current segment), spell-check and save file.

4. Recording (to hard drive) of transcriptions and saved images, 
includes source microform, job and dataset, and, at user's option, 
page and date of material on microform.
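
As an illustration of item 4, each saved transcription or image might carry a small record of its origin. This is a sketch only. The class name `SavedItem`, its field names and the one-line JSON format are assumptions, not part of the proposal.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class SavedItem:
    microform: str               # unique name of the source microform
    job: str                     # project (job) name
    dataset: str                 # e.g. "death notices"
    content_file: str            # transcription or image file written to disk
    page: Optional[str] = None   # recorded only at the user's option
    date: Optional[str] = None   # recorded only at the user's option


def to_json_line(item: SavedItem) -> str:
    """Serialize one record as a single line, suitable for appending to a log file."""
    return json.dumps(asdict(item), sort_keys=True)


item = SavedItem("roll-A", "newspaper-1912", "death notices", "deaths.txt",
                 page="7", date="1912-04-16")
print(to_json_line(item))
```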

5. One use of the system will be to record images only. 
This use requires easy, accurate image naming. 
As some images are not 'square with the film' fast, easy image rotation is needed.

6. Although the system is highly user-friendly, it disallows major mistakes, 
such as failing to identify the film or area, failing to set the zero, 
or inadvertently re-setting the true zero.

Tour

For example, user is to work from a microfilm of a 1912 newspaper, 
to collect data into nine files: death notices, funeral notices, burial permits, 
marriage data, births recorded, divorce data, society items, 
news items about people and miscellaneous. 
During the multi-day process, the user decides to include a tenth file, 
reports on the sinking of the Titanic. 
The system provides the user easy means to save each of these categories 
of data in its own file. But there are microforms of many kinds of data, 
from both printed and hand-written originals.

User invokes software by an icon on the taskbar.

Opening window prompts for name of program and name of project.

Name of project, used to create a folder by that name 
(to allow multiple microforms within the project)

Name of sub-folder for text transcriptions, 
with option to add additional folders from the outset (e.g., for burial permits)

(More sub-folders for text transcriptions may be added during the session.)

Name of sub-folder for images, with option to add additional folders from the outset 
(e.g., for Titanic photos)

(More sub-folders for images may be added during the session.)

Unique name of present microform (to tag data segments as to microform)

Whether to prompt for date for each selection, with default to last-entered date 
(drop-down window displays twenty recently entered dates)

Whether to prompt for page number for each selection
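
The start-up prompts above amount to building a folder tree and remembering recently entered dates. A minimal sketch follows. The layout (`text` and `images` under the project folder) and the names `create_project` and `RecentDates` are assumptions made for illustration.

```python
from collections import deque
from pathlib import Path
from typing import Iterable, List, Optional


def create_project(root: Path, project: str,
                   text_folders: Iterable[str],
                   image_folders: Iterable[str]) -> Path:
    """Create the project folder with sub-folders for transcriptions and images."""
    project_dir = root / project
    for name in text_folders:
        (project_dir / "text" / name).mkdir(parents=True, exist_ok=True)
    for name in image_folders:
        (project_dir / "images" / name).mkdir(parents=True, exist_ok=True)
    return project_dir


class RecentDates:
    """Keeps the most recently entered dates, newest first, for the drop-down list."""

    def __init__(self, limit: int = 20) -> None:
        self._dates: deque = deque(maxlen=limit)

    def add(self, date: str) -> None:
        if date in self._dates:
            self._dates.remove(date)   # re-entering a date moves it to the front
        self._dates.append(date)

    @property
    def default(self) -> Optional[str]:
        """The last-entered date, offered as the default for the next selection."""
        return self._dates[-1] if self._dates else None

    def choices(self) -> List[str]:
        return list(reversed(self._dates))
```

Because `mkdir` is called with `exist_ok=True`, the same function could be called again during a session to add further sub-folders without disturbing existing ones.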

If this is the first session of this project with this microform, 
the form-motion window then opens. 
The user advances the form to a view that is unique to the microform and presses [Enter]. 
This point becomes the zero point for the form. 
The image is recorded, with cross-hairs superposed on the image. 
The user then invokes the form-motion window (by right-clicking on the image window) and 
advances the film to the first desired selection.

If this is not the first session of this project with this form, on command, 
the image of the zero point is displayed, 
and cross-hairs are displayed over the moving image to assist the user 
in advancing the film to the approximate zero point. 
The user then invokes the form-motion window and selects one of the prior locations; 
the system moves the form to that location and then presumably the user again invokes 
the form-motion window to seek the next point of interest.
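
The zero point and the return to prior locations can be pictured as simple bookkeeping on a motor step count. The sketch below assumes the transport reports position as an integer number of steps. The class `FilmPosition` and its methods are hypothetical names, not part of the proposal.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class FilmPosition:
    raw_steps: int = 0                  # motor steps counted in this session
    zero_steps: Optional[int] = None    # raw count at which the zero point was set
    bookmarks: Dict[str, int] = field(default_factory=dict)  # name -> steps from zero

    def move(self, steps: int) -> None:
        """Advance (positive) or rewind (negative) the film by `steps`."""
        self.raw_steps += steps

    def set_zero(self) -> None:
        """Mark the current view as the zero point; refuse to re-set it."""
        if self.zero_steps is not None:
            raise RuntimeError("zero point already set for this session")
        self.zero_steps = self.raw_steps

    def location(self) -> int:
        """Current position measured from the zero point."""
        if self.zero_steps is None:
            raise RuntimeError("set the zero point before recording locations")
        return self.raw_steps - self.zero_steps

    def remember(self, name: str) -> None:
        self.bookmarks[name] = self.location()

    def steps_to(self, name: str) -> int:
        """Signed number of steps needed to return to a remembered location."""
        return self.bookmarks[name] - self.location()
```

Storing bookmarks relative to the zero point, rather than as raw counts, is what would let a later session reuse them once the user has brought the film back to the recorded zero view.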

Options include setting minimum (no need to shrink below full-page) 
and maximum zoom and setting the slow and middle film-transport speeds.

PROBABLE DETAILS

The entire width of a microfilm is within the field of view of the digitizer. 
That is, no 'scan' motion (transverse to the length of the film) is needed 
for most uses of microfilm.

The (macro?) lens is very, very good.

A clear glass pressure plate, normally open, is automatically brought 
to bear when the film is stopped.

The digitizer is highly precise* in order to retain detail upon digital zooming but, 
for speed, should not be color-capable. 
[An apparent alternative would be a) to use a complex set of zoom lenses on a turret; 
with digital lens selection by the mouse or keyboard and b) to 'scan' across the film.]

Film is forwarded and reversed by stepping (digital) motors; 
alternatively, a digitizing wheel is secured to each motor. 
A tensioning arrangement assures that the film is in fact positioned in agreement 
with the digital indication. 

The digitizer is very fast, so that microfilm can be observed as it is fairly rapidly advanced. 
[There may be an automatic switch to low resolution to speed digitization when the film is moving.]
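
One way such a switch might be decided is to cap the number of pixels digitized per second and reduce resolution just enough to stay under the cap. This is a sketch under assumed numbers. The function `binning_factor` and the pixel budget are hypothetical, and real hardware may offer only a few fixed resolutions.

```python
import math


def binning_factor(frames_per_second: float,
                   full_frame_pixels: int,
                   max_pixels_per_second: int) -> int:
    """Smallest whole reduction factor, applied to both width and height,
    that keeps the pixel rate within budget. A factor of 1 is full resolution."""
    if frames_per_second <= 0:
        return 1  # film stopped: digitize at full resolution
    # Reducing each dimension by b cuts the pixel count by b squared.
    needed = frames_per_second * full_frame_pixels / max_pixels_per_second
    return max(1, math.ceil(math.sqrt(needed)))


# Assumed figures: a 45-megapixel frame and a budget of 90 megapixels per second.
for speed in (0, 2, 8, 10):
    print(speed, binning_factor(speed, 45_000_000, 90_000_000))
```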

The computer is very fast, to accommodate any needed image rotation as the film is advanced. 
It might be practical to include image rotation hardware, 
but that would necessitate yet another motor in the hardware, 
also to be controlled from the computer. 
[It might be appropriate to allow manual rotation of the digitizer 
as that would probably be necessary only once per roll.]

* Scanning for OCR is usually done at 150 dots per inch, 
and a typical page holds roughly 75 characters in six inches; 
this leads to about 12 dots per character. A 35 mm microfilm, 
such as the 1895 St. Louis Globe-Democrat, may have 512 characters across. 
For OCR, that would then require 12 x 512 = 6144, or roughly 6150, pixels across. 
A newspaper page is about 20 percent taller than wide, so 6150 x 6150 x 1.2 = 45 megapixels. 
It may be preferable to use optical magnification, controlled from the computer, 
to reduce the needed number of pixels; a 2X zoom, which takes in one quarter of the page area 
at a time, would reduce the need to about 11 megapixels (a 4X zoom, to about 3 megapixels).
Again, digitization speed is presumably enhanced with fewer pixels.
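
The arithmetic in this note can be checked with a few lines of code. The sketch below treats zoom as linear magnification, which is an assumption. Under that reading the pixel count falls with the square of the zoom, so a 2X zoom gives about 11 megapixels. The function names are hypothetical.

```python
from typing import Tuple


def dots_per_character(dpi: float, characters: float, inches: float) -> float:
    """Dots available per character when `characters` span `inches` at `dpi`."""
    return dpi * inches / characters


def pixels_needed(chars_across: float, dots_per_char: float,
                  height_ratio: float, linear_zoom: float = 1.0) -> Tuple[float, float, float]:
    """Return (width, height, total) pixels for one view of the page.

    `linear_zoom` is optical magnification: at 2X the digitizer sees half the
    width and half the height of the page, hence one quarter of the pixels.
    """
    width = chars_across * dots_per_char / linear_zoom
    height = width * height_ratio
    return width, height, width * height


dots = dots_per_character(dpi=150, characters=75, inches=6)
for zoom in (1, 2, 4):
    width, height, total = pixels_needed(512, dots, 1.2, zoom)
    print(f"{zoom}X: {width:.0f} x {height:.0f} = {total / 1e6:.1f} megapixels")
```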



Page created 26 Mar 2006