1. Field of the Invention
This invention relates to a process of communicating to an observer information which can be conveyed in a motion sequence of frames. It includes a method for producing such motion sequences. Particular embodiments of this process are: (i) the process of communicating with deaf persons by means of finger spelling, (ii) the process of teaching such finger spelling to adult learners, i.e., linguistically mature students, and (iii) a system of animation for use in a mechanism requiring a restricted number of frames due to limitations in the storage capacity for images or limitations in the rapid access of said images. It includes an apparatus for practicing the process.
2. Art Background of the Invention
The method of delivery to which this process is addressed is especially well suited to subject matter in which key body positions and intermediate positions are to be learned. An example of this is teaching the manual alphabet used by the deaf. Finger spelling is the assembling of words from standard positions of the fingers of one hand, usually the right hand, each position representing a respective letter of the alphabet. Finger spelling is a subset of "sign language", which consists of standard hand motions and related body motion and facial expression which, taken together, represent grammatical components of sentences. The United States of America has been a leader in "deaf communication" and the American Sign Language [ASL] is the most widely accepted form of sign language used by the deaf community. Another form of sign language in wide use is Signed English. Most of the signs in Signed English are the same as those in ASL, but their sequential order in sentences is the same as in conventional English. Grammatical units such as articles and endings for tenses, adjectives and adverbs are finger spelled.
Finger spelling is a supplement to ASL. Finger spelling provides a means for communicating words for which there exist no ASL signs. Because there are substantial differences between ASL and conventional English, finger spelling is also used to provide nuances of meaning. Finger spelling is also used to clarify regional differences in signs and to replace signs which are erroneous or have been forgotten. Signed English is the most widely used language in schools and by hearing people who communicate with the deaf.
The signs of the American Manual Alphabet, illustrated as seen by the observer, are shown in "The Pocket Dictionary Of Signing" by R. R. Butterworth and M. Flodin, Perigee Books, copyright 1987, pp. 11-12; in "Talk To The Deaf" by L. L. Riekehof, Gospel Publishing House, copyright 1963, p. 1; in U.S. Pat. No. 3,858,333, issued Jan. 7, 1975 to W. Kopp; and in U.S. Pat. No. 4,414,537, issued Nov. 8, 1983 to G. J. Grimes.
Two inferences may be drawn from the above discussion: Like all alphabets, the manual alphabet will be learned by the deaf as children unless the onset of deafness occurs as an adult. Because finger spelling is used in a supplemental way, most finger spelled words are unfamiliar. Practice, therefore, should provide a means for dealing with unfamiliar words.
There is no standard or approved way to teach finger spelling. Two ad hoc strategies are sometimes used. The first is to become familiar with the configurations which small groups of letters form. Children can be taught to finger spell before they learn the letters of the written alphabet. They can learn to recognize the sequence of finger positions for C-A-T and learn to think of a familiar fuzzy animal that laps milk from a saucer. Because this capability is well known, there are strategies for teaching signing which rely on first learning basic configurations of letters and then varying them, e.g., C-A-T, B-A-T, F-A-T. It is apparent that for a person who is born deaf and learns finger spelling at a young age, this method is natural and probably effective. This method is analogous in many ways to the "look and say" method of teaching reading to children who can hear. Eventually some phonics must be learned in order to cope with unfamiliar words.
Persons with already developed linguistic skills can save time and energy in learning finger spelling by the use of a more structured approach which will allow the transfer of these hard earned skills. Such a learner learns the new alphabet and tries to utilize rules of thumb common to his or her first language, such as syllabication, frequencies of letter or word combinations, and grammatical rules. An example of this is the tendency of the hearing finger speller to break a word down into phonetic components, while the deaf finger speller tends to spell a word in its entirety.
Another ad hoc strategy is used in teaching the positions of the fingers of the hand. A resemblance of certain positions of the fingers to the printed form of the respective letters is emphasized. This method has several drawbacks. Fewer than half the finger positions have any resemblance to the respective printed form. Some finger positions look like upper case letters, some look like lower case letters. Most only look like a printed letter when viewed from one particular vantage point. As a learning strategy, the mnemonic value may be outweighed by the emphasis on learning a letter in isolation and the emphasis on its identifying name rather than its sound.
That none of these ad hoc methods really works is indicated by the widely held view, especially among instructors who are deaf, that facial expression and lip reading are a necessary part of finger spelling. The adult is taught to finger spell without obstructing the reader's view of the speller's face. When reading finger spelling, the learner is taught, especially by deaf instructors, not to concentrate on the spelling fingers, but, rather, to watch the facial expression and to read the lips of the speller. The complexity of such a task is overwhelming. The deaf have facial expressions for many words that the hearing are used to delivering "deadpan" such as: scared and plentiful, thick and thin. Lip reading is a skill whose complexity rivals finger spelling. While these kinds of contextual clues may add nuances of meaning for the advanced communicator, they add unnecessary complexity to the task of acquiring at least a minimum of facility in finger spelling.
Advances in technology and in our knowledge of how we mentally process visual information can be used to simplify the learning of finger spelling.
Videotape Systems:
Videotape is frequently used in the teaching of sign language. I know of no use of videotape which is devoted exclusively to finger spelling. Videotape can deliver realistic images in real time, but it is ineffective in teaching finger spelling for the following reasons: (i) The playing mechanism is slow and cumbersome. It is difficult and time consuming to find a particular part of the videotape to play or replay the particular words stored thereat. (ii) It uses predetermined word lists while finger spelling deals primarily with unfamiliar words. (iii) Elements cannot be regrouped. Letters must be viewed in the sequence in which they are stored on the videotape and cannot be used to form new words. (iv) The learner has no control over content (subject matter), context (word order in a sentence), speed (duration of display of each image), order (learning style), or other factors in his or her process of learning.
Computers:
Recently, a computer has been used to display the finger positions for a letter as a small line drawing similar to those diagrams in the front of sign books indicating the letters of the manual alphabet from the receptive view, i.e., as the viewer sees it. Each letter appears when the respective letter key on the keyboard is struck. The effect is that of an automated flip book. The letters of the manual alphabet are small. No three dimensional information is provided. No system for teaching the forming of the letters is provided. No system for visually distinguishing one letter from another is provided. No means is provided of anticipating which letter will next come in the series. No cognizance is taken of the fact that the salient features distinguishing one still image from another still image often substantially differ from the salient features which distinguish moving objects. For example, two airplanes sitting on the ground may be distinguished by their painted decorations, whereas two airplanes at high altitude first would be distinguished by their overall shape and then, if necessary, by a distinguishing visual detail.
The chief disadvantage of the computer per se is the current limitation of its graphic capabilities.
Computer-Peripheral Systems:
The process of this invention is applicable to instructional delivery mechanisms in which "complex images" are made instantly available by the so-called random access capability of computers, or any machine that will simulate that capability. Complex images are photo-like, with a wide color range and grey scale that convey the level of three dimensional information found in a sharp photograph. Videotape images are complex images. However, as discussed above, the videotape player moves too slowly to access images that are not adjacent.
The speed of access of images must permit the illusion that the images are successive with no blanking or flashing of color to interfere with persistence of vision. Although retrieved instantly, the images must be capable of being visible for varying durations of time.
The computer is theoretically capable of meeting all of these criteria. Complex images on the computer often are referred to as raster graphics or bit-map graphics to indicate that the information is not stored in an algorithm and redrawn but as an assemblage of bits which are brought from storage as a unit. Because of the enormous memory capacity required by such images they are usually stored in peripheral devices. Examples of such computer-peripheral systems are: the intelligent videodisc; Compact Disk Interactive [CDI]; and Digital Video Interactive [DVI]. Computer-peripheral systems are a preferred type of delivery system for this invention.
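The memory demand of bit-mapped images can be illustrated with simple arithmetic. The following is a minimal sketch only: the function names, the image dimensions, the colour depth, and the image count are hypothetical values chosen for illustration and do not come from this specification, and the calculation assumes uncompressed storage.

```python
def raster_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Uncompressed size, in bytes, of one bit-mapped (raster) image."""
    return width * height * bits_per_pixel // 8


def library_bytes(image_count: int, width: int, height: int,
                  bits_per_pixel: int) -> int:
    """Uncompressed size, in bytes, of a stored set of equally sized images."""
    return image_count * raster_bytes(width, height, bits_per_pixel)


if __name__ == "__main__":
    # Hypothetical figures: a 640 x 480 image at 24 bits per pixel,
    # and a library of 100 such images.
    one_image = raster_bytes(640, 480, 24)
    library = library_bytes(100, 640, 480, 24)
    print(f"one image: {one_image} bytes")
    print(f"100 images: {library} bytes")
```

Under these assumed figures a single image occupies 921,600 bytes, so even a modest library of stored frames runs to tens of megabytes, which is consistent with the preference for peripheral storage stated above.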
The preferred embodiment of this invention incorporates heuristics, both visual (innate and learned) and cognitive. Learning in visual groups, aided by kinesthetic memory and the knowledge that we know what others see when we move our own hands, are a combination of learned and "prewired" heuristics that will enable students to recognize so many letters that they will actually be reading words. This invention teaches a process for making an image which allows multiple uses of said image. Such images are useful in a self contained system for multiple learning strategies such as this invention contemplates. Such a system allows the user to structure the level of participation in available activities and the order in which to participate in them.
Since the information in a motion sequence (e.g., the bare showing of the expressive manual alphabet) is not coextensive with the intellectual content of the material (e.g., strategies to make the letters and what their names are), the invention anticipates the use of ancillary techniques to provide contextual clues such as: overlay of letters or words, sound track information, color or other symbol coding. Kinesthetics, i.e., the memory of muscle movements, also provides the viewer with information about what is being seen. Therefore, the invention also includes activities of the viewer which are read by the computer such as keyboard stroking, voice recognition, and sensing devices for specific actions. To be part of the process, the activity must be directed to enhancing the visual learning task, must refer to a specific set of stored images, and must be accessed by a unified set of instructions. All of these capabilities can be accomplished with known computer-peripheral systems such as intelligent videodiscs, CDI and DVI.
Conventional Animation Systems:
The characteristics of preferred delivery systems, e.g. computer-peripheral systems, best suited for the processes of this invention result in critical differences between said processes and standard animation practices. There are two main categories of differences, timing and the characteristics of the image.
Timing: Animation can be used as a substitute for real time sequences. Real time motion sequences accessed by a computer present the following problems: (i) Parts of the real time sequences may be blurred depending on the speed of the movement. (ii) If the spacing between accessed motion segments is too great, there may be a black flash or other visual blanking. (iii) The access time of the computer may be too slow for the smooth running of the program. (iv) There may be mismatches between the sequences which will produce visual discontinuities. These problems are most severe where the individual images contain a great deal of information such as grey scale information in a photograph-like picture.
Conventional animation is virtually two dimensional and utilizes outlines to define areas which may be filled in with essentially flat colors. The outlines convey most of the information, e.g., the contrast between the figures and the background, the shape of the figures, and the movement of the figures. Conventional animation requires perceived smoothness of motion for its simulation of reality. That line is also the key element in producing the illusion of motion is made very clear in a review of conventional animation in "Disney Animation: The Illusion of Life" by Frank Thomas and Ollie Johnston, Abbeville Press, New York, 1981, at p. 35: "One day, almost by accident, someone made a series of drawings that looked far better than anything that had been done before. Each drawing had so close a relationship to the other that 'one line would follow through to the next' . . . how amazed everyone was that just making the lines flow through each drawing in a series could make such a difference . . . suddenly there was a pleasing smoothness that led the eye from drawing to drawing."
"Everyone knew that it was necessary to get a feeling of weight in the characters and their props if they were to be convincing . . . . The animators sensed that the key to the illusion of weight lay in the timing and how far a character moved and how fluid the action was, but it was not until they were able to study live action films that the solution was finally found."
This last sentence is a reference to the use of frame by frame studies of live action simulations of sequences that were to be animated. These could not be traced.
[At page 323] "But whenever we stayed too close to the photostats or directly copied even a tiny piece of human action, the results looked very strange. The moves appeared real enough but the figures lost the illusion of life . . . . It was not the photographed action of the actor's swelling cheek that mattered, it was the animated cheek in our drawings that had to communicate . . . . Our job was to make the cartoon figure go through the same movements as the live actor, with the same timing and the same staging, but because animatable shapes called for a difference in proportions, the figure and its model could not do things in exactly the same way."
[At page 65] "There was some confusion among the animators when Walt first asked for more realism and then criticized the result because it was not exaggerated enough. In Walt's mind there was probably no difference."
Perhaps it is the reliance on line for so many functions in conventional animation that causes the same authors to end the discussion of the development of film animation with the following remark [at page 528]: "The field of educational films has an almost unlimited future with very little of its potential explored."
In conventional animation all frames are displayed in succession at a constant rate, e.g., twenty-four frames per second. Key positions are exaggerated so as to be perceived as such. The inbetween positions are not intended to be seen because that would interfere with the perceived smoothness of the motion. Therefore, the burden of information must be conveyed by lines in key frames.
In the present invention a bridging position can have two information conveying functions: It can contribute to the illusion of lifelike motion, and it can presage the information content of the next key frame. Unlike inbetween frames in conventional animation, it can be accessed to function as a bridging frame in more than one sequence, and it can be held for the duration of time for which it is needed to convey information.
Key frames do not have to be exaggerated in order to be perceived as such; rather, they can show lifelike positioning of their elements. Differences in durations of time of display can be used to distinguish key frames from bridging frames.
In the case of finger spelling, length of time indicates importance. The letters, i.e., the principal material, should be displayed to the observer for a longer period, which is long enough for all of its information to be perceived and for the observer to realize that it is principal information. The intermediate material should be displayed to the observer for a shorter period, which is merely long enough for its information to be perceived, but short enough for the observer to realize that it is not principal information. The end letter of a word in a sentence can be indicated as such by holding it for an extra increment of time.
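The timing rules described above can be sketched as a simple display schedule. This is an illustrative sketch only, not the apparatus of the invention: the duration constants, the `Frame` type, the `schedule_word` function, and the image identifier scheme are all hypothetical, since the specification gives no numeric values or naming conventions. The sketch holds each key frame (letter) longer than each bridging frame, holds the final letter of a word for an extra increment, and names each bridging frame by its letter pair so that the same stored image can serve more than one word.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical hold times in seconds; the specification gives no numbers.
KEY_HOLD = 0.60           # letters: long enough to be read as principal
BRIDGE_HOLD = 0.15        # bridging positions: visible, but clearly secondary
END_OF_WORD_EXTRA = 0.40  # extra increment for the last letter of a word


@dataclass(frozen=True)
class Frame:
    image_id: str    # hypothetical key into the stored image set
    duration: float  # how long the image stays on screen, in seconds
    is_key: bool     # True for a letter, False for a bridging position


def schedule_word(word: str) -> List[Frame]:
    """Build the display schedule for one finger-spelled word."""
    letters = [ch.upper() for ch in word if ch.isalpha()]
    frames: List[Frame] = []
    for i, letter in enumerate(letters):
        is_last = i == len(letters) - 1
        hold = KEY_HOLD + (END_OF_WORD_EXTRA if is_last else 0.0)
        frames.append(Frame(f"key:{letter}", hold, True))
        if not is_last:
            # The bridging frame is identified by the letter pair only,
            # so one stored image can be reused in many words.
            following = letters[i + 1]
            frames.append(Frame(f"bridge:{letter}>{following}",
                                BRIDGE_HOLD, False))
    return frames


if __name__ == "__main__":
    for frame in schedule_word("cat"):
        print(f"{frame.image_id:12s} {frame.duration:.2f} s")
```

With this scheme the words C-A-T and B-A-T share the bridging image for the pair A, T, which reflects the statement that a bridging frame can function in more than one sequence.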
Complex Images: A major difference between the images of conventional animation and the images contemplated by this invention is that the latter are complex, i.e., photo-like, and the former are not. The discoveries of differences in how complex images create the illusion of motion as opposed to conventional animation were made on an ad hoc basis. The manipulation of complex images contemplated by this invention may contribute to the body of knowledge about surface information, which is an important concept in machine vision. Therefore, it may be worthwhile to point out the similarities and differences with the theories of David Marr.
In "Vision" by David Marr, pub. W. H. Freeman, copyright 1982, Marr offers a controversial and incomplete conceptual framework for this invention. As both a neurobiologist and a computer scientist, Marr offers a useful vocabulary and conceptual hierarchy based on his unique vantage point. He postulates three stages of perception. His first or primal stage is based on well known facts that the eye-brain has specific cells for specific functions, i.e., motion detectors, edge and line detectors, orientation detectors (bars), and intensity detectors. Information grouping, whether it is by these detectors alone or in concert with other processors, is the essential first stage in perception.
Marr refers to the initial results of 2-D processing by the retina as the "primal sketch" in which he identifies landmarks such as edges, boundaries, and regions. Marr's concept involves a two stage primal sketch: "raw" and "full". On an ad hoc basis this invention telescopes Marr's more detailed theoretical concept into the basic characteristics already known as the "gestalt" principles of grouping incomplete visual data into conceptual units. On an ad hoc basis, the first filtering of vision is the determination of what is important, e.g., what will move, from that which is unimportant, e.g., what will stay the same. In discussions of human perception this is conventionally referred to as the distinction between "figure" and "ground".
The contribution for which Marr is best known is the concept of a stage intermediate to the data collection stage and the perception of 3-D. This is Marr's "2 1/2-D sketch," which can be loosely understood as "surface information" as used in describing the present invention. Marr particularly emphasizes surfaces that have definite positions and orientations in space.
Marr's background led him to the conclusion that the identification of surfaces occurs early in the retina. The neurons of the retina and the visual cortex employ what Marr calls "modules" which rely on clues such as texture, color, motion, shading, and stereo (an offset of patterns such as one sees by shutting one eye or the other eye).
Marr offers a conceptual framework for an observable characteristic of what I call "complex images." Complex images can be observed to operate differently from the line drawings and two dimensional use of color that characterize conventional animation. Texture that moves across a stable background is perceived to be part of an object and helps to define that object as it moves in space. I believe that this optical illusion is stronger than the optical illusion that a moving line represents an outline of a shape moving in space.
The process of this invention utilizes the discovery that when a great deal of surface information is utilized, a greater leeway is available with respect to timing. As represented in the Thomas and Johnston book, mentioned previously, timing to achieve smooth motion was the key discovery in the "Illusion of Life."
Marr's third stage of visual processing is "3-D model representation". This stage of vision processing is the recognition stage, which Marr would admit is not well understood beyond the known facts that knowledge, experience and context (what I call cognitive heuristics) play important roles. This is the least well developed part of Marr's theory, and Marr's background accounts for his tendency to discuss "prewired" heuristics (such as the example discussed above of moving surface texture) in greater detail. I deliberately ignore the distinction between the two kinds of heuristics because I think that much more of vision is learned behavior than Marr would concede. Also, the speed at which learned heuristics operate blurs the observer's cognizance of separate stages of vision. Heuristics do play a very important part in the process of learning to finger spell. The ad hoc rules that the learner discovers from the bridging frames, which lead him or her to anticipate the next letter, operate the same way a prewired gestalt rule would.