The following description includes information that may be useful in understanding the present invention. Unless the context indicates otherwise, the text should not be interpreted as an admission that any of the information provided herein is prior art to the claimed inventions, or that any publication specifically or implicitly referenced is prior art. Moreover, it should be appreciated that portions of this background section describe aspects of the inventive subject matter.
Humans are social creatures who tend to seek out community with others. There are, of course, numerous real-life communities that rely on face-to-face encounters, but often disparate interests, geographic separation, age, gender and other differences can make real-life communities difficult to access or maintain. Facebook™ and other social media have been successful in filling in some of the void, but many individuals still find that currently available, electronically accessed communities are poor substitutes for real-life companions.
What is needed is an electronic personal companion (usually referred to hereinafter simply as a “personal companion”) that can act as a friend, providing information, warnings and other guidance, conversation, solace, and so forth, and can also act as an interface with other people or things. That goal has been depicted in science fiction, but has never been realistically enabled.
Crowd-Sourcing and Crowd-Sharing
The current inventors have concluded that personal companions can most effectively be implemented where they utilize one or both of crowd-sourcing and crowd-sharing. See Table 1 and corresponding description below.
TABLE 1

| | Many Distantly Separated Receivers Actively Obtain & Use Information And/Or Other Items | Many Distantly Separated Receivers Passively Obtain & Use Information And/Or Other Items |
| --- | --- | --- |
| Many Distantly Separated Providers Actively Provide Information And/Or Other Items | Crowd-Sourcing (e.g. Digg™, Facebook™, Pinterest™, Twitter™) | Crowd-Sharing |
| Many Distantly Separated Providers Passively Provide Information And/Or Other Items | Crowd-Sourcing (e.g. medical telemetry devices) | Crowd-Sharing |
As used herein, “crowd-sourcing” refers to situations where many (defined herein to mean at least a hundred) distantly separated individuals collectively provide information, services, ideas, money, transaction opportunities and/or other items in a manner that enables others to make use of those items in some meaningful manner.
As used herein, the term “distantly separated” refers to individuals who are out of unaided earshot and eyesight. Depending on the number of distantly separated individuals participating as providers of information or other items, a system, method, topic, circumstance, etc. could be narrowly crowd-sourced (100-999 providers), moderately crowd-sourced (1,000-999,999 providers), or massively crowd-sourced (≥1,000,000 providers).
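By way of illustration only, the provider-count tiers recited above could be sketched in Python as follows. The function name and the label returned for counts below one hundred are assumptions made for this sketch; only the three tier names and their boundaries come from the text.

```python
def crowd_sourcing_tier(provider_count: int) -> str:
    """Map a number of distantly separated providers to the tier named in the text."""
    if provider_count < 0:
        raise ValueError("provider_count must be non-negative")
    if provider_count < 100:
        # Below the "many" threshold (at least a hundred); label is hypothetical.
        return "not crowd-sourced"
    if provider_count < 1_000:
        return "narrowly crowd-sourced"      # 100-999 providers
    if provider_count < 1_000_000:
        return "moderately crowd-sourced"    # 1,000-999,999 providers
    return "massively crowd-sourced"         # 1,000,000 or more providers


if __name__ == "__main__":
    for count in (42, 100, 5_000, 2_000_000):
        print(count, "->", crowd_sourcing_tier(count))
```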
The recitation of ranges of values herein is merely intended to serve as a shorthand method of referring individually to each separate value falling within the range. Unless otherwise indicated herein, each individual value within a range is incorporated into the specification as if it were individually recited herein.
Also, unless the context indicates otherwise, the terms “individuals”, “users”, and “persons” should all be interpreted interchangeably to include live human beings, thinking machines, and virtual and mixed-reality beings.
“Crowd-sourcing” can be active or passive on the part of the individuals providing the information or other items. Examples of active crowd-sourcing include Digg™, Facebook™, YouTube™, Pinterest™, Twitter™ and many others, each of which allows millions of users to provide commentary, still and video images, or other types of information to other users. Wikipedia™ focuses on encyclopedic and other reference materials along the same lines, and Kickstarter™ does something similar for funding of projects. In all those cases, an active, conscious effort is required for an individual to provide the information, pledge money, or provide other items. Even if the effort is something as simple as clicking “like” or “dislike” icons, the provider must speak, type, or take some other affirmative action with respect to the particular item being provided.
Examples of passive crowd-sourcing include telemetry devices that upload biometric data (blood pressure, heart rate, blood gases, and so forth) to health care providers. For example, a heart monitor might periodically send pulse and blood pressure data to a hospital, with no specific effort being required at all by the wearer. Indeed, such devices would typically operate even if the wearer is asleep or unconscious. If the hospital collects such data from many individuals without those individuals taking active steps with respect to the particular data items (i.e., efforts besides just setting up the devices to capture the information), the data can be interpreted herein as being passively crowd-sourced. Outside the medical field, automatic uploading of image data from many Google Glass™ wearers could also be considered passive crowd-sourcing.
Other examples can be found in the field of on-line or directed surveys, where it is known for companies to use cell phones or other person-carried devices to solicit survey information from hundreds or even thousands of people in a crowd-sourced manner.
What is not done in the prior art of crowd-sourcing, however, is for many person-carried devices to solicit information from their corresponding users in an auditory, conversational manner, and then make that information widely available to others in a manner that weighs the solicited information for the receivers. What is also not done in prior art crowd-sourcing is for the devices that solicit information to be closely associated with their users, such that the devices are not fungible between users. Devices could use different questions or strategies to solicit information from the users, even given the very same inputs, based upon the devices' prior experiences with their users. Systems, methods and devices that do those things are contemplated herein.
Regardless of whether individuals are active or passive in providing the information or other items, the term “crowd-sourcing” is used herein where the recipients must take specific steps to receive and use individual items in a meaningful way. For example, to effectively utilize a tag cloud found on the Digg™ website, a person must go to that website and click on one of the tags. Similarly, to effectively utilize information in a Tweet™, a recipient must access and read the Tweet™. The same is true for doctors, nurses, researchers or others to effectively utilize medical telemetry data. In each of those instances the recipient must take an active step to obtain and use individual items of information.
In contrast to “crowd-sourcing”, the term “crowd-sharing” is used herein to mean situations where many recipients receive and make use of specific information or other items provided by many providers in a passive manner.
Under that definition, Google Glass™ does not currently crowd-share. Google Glass™ can passively record raw data, and can upload it to a cloud-based server farm. But to the knowledge of the current inventors, the information is not then provided to many others in a manner that they can receive and use the information without taking some affirmative action to do so.
Similarly, Google™, Yahoo™, Bing™, Siri™, Nina™ and other search engine/question-answer systems do not currently crowd-share. They crowd-source information by crawling the Internet and other sources, and they crowd-source additional information by observing how users respond to the search results. But the users are not passively provided with the underlying crowd-sourced data; they must take some affirmative action (e.g., running a search) to use the information in any meaningful way.
As another example, many people listening to a concert or other gathering over the Internet would inevitably overhear a cacophony of hundreds or thousands of voices. That situation falls outside the scope of crowd-sharing (as the term is used herein) because any information overheard cannot be utilized by the attendees in any meaningful way without taking the active step of understanding what was said, or in some other way interpreting the noise. Even appreciating that the crowd was large and the noise immense requires an active step of interpretation.
As yet another example, advertising often utilizes subliminal messaging. An advertiser might show an attractive person drinking a particular product, and thereby seek to instill in viewers the idea that drinkers of their product are attractive. Similarly, a television advertiser might flash a word, image or other message upon the screen so quickly that viewers are not consciously aware of the message. As currently practiced those instances also fall outside the scope of crowd-sharing because the subliminal messaging is basically one-to-many (advertiser to recipients), not many-to-many.
What would be crowd-sharing is where many people carry personal companions that observe and interpret the world around them, using information based at least in part upon interpretations shared by many other personal companions. In such cases, the receivers are the individuals carrying the personal companions, and it is their corresponding personal companions that receive and make use of the shared information on their behalf. In preferred embodiments, personal companions can go further, utilizing the shared information to operate devices, send communications, or do or refrain from doing other things on behalf of their users. Systems, methods and devices that do all of that are contemplated herein, but to the knowledge of the current inventors, are missing from the prior art.
As used herein, the term “crowd-facilitated” includes both crowd-sharing and crowd-sourcing.
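The distinction drawn above turns on whether the receivers are active or passive, not on how the providers behave. A minimal sketch of that rule follows. The names `classify_exchange` and `is_crowd_facilitated` are hypothetical, and treating a passive exchange with fewer than a hundred receivers as "neither" is an assumption of this sketch that the text does not state.

```python
MANY = 100  # "many" is defined in the text as at least a hundred


def classify_exchange(provider_count: int, receiver_count: int,
                      receivers_passive: bool) -> str:
    """Label an exchange per Table 1; provider activity does not change the label."""
    if provider_count < MANY:
        # e.g. one-to-many advertising, which the text places outside crowd-sharing
        return "neither"
    if not receivers_passive:
        # Receivers must take an affirmative step to obtain and use each item.
        return "crowd-sourcing"
    # Passive receipt counts as crowd-sharing only with many receivers (assumption).
    return "crowd-sharing" if receiver_count >= MANY else "neither"


def is_crowd_facilitated(label: str) -> bool:
    """'Crowd-facilitated' covers both crowd-sourcing and crowd-sharing."""
    return label in ("crowd-sourcing", "crowd-sharing")


if __name__ == "__main__":
    print(classify_exchange(1_000_000, 1_000_000, receivers_passive=False))
    print(classify_exchange(1, 50_000, receivers_passive=True))
    print(classify_exchange(500, 500, receivers_passive=True))
```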
A. Crowd-Sourcing of Image and Sound Data
To be effective for the vast majority of people, information crowd-sourced and/or crowd-shared by a personal companion must at the very least utilize ambient image and/or sound data. Following is a brief summary of how ambient image and sound data has been handled in the prior art.
Focusing first on images, prior art FIG. 1 depicts a high-level conceptual overview of how images have historically been recorded, used and stored. In each of the steps depicted, there is a left side that shows physical processing and application of images, and a right side that shows electronic processing and application of images.
In Capture step 10, a camera is used to capture an image. The camera can be pointed specifically at a target, or used in surveillance mode to capture whatever happens to be in view, or both. For most of the last 100 years cameras have used physical film 12A (left side of step 12), but of course more recently cameras have captured images electronically 12B (right side of step 12).
In Processing step 20, images are processed into physical or electronic recordings. For film, images are usually processed chemically 22A to produce negative or positive photographs, slides, movies and so forth. For digital cameras, images are usually processed electronically 22B into TIFF, JPEG, PDF, MP4 or other digital formats.
In Identification step 30, objects (people, animals, buildings, cars, symbols and so forth) captured in the images can either be identified physically through visual examination by a person 32A, or electronically using software programs operating on mainframes, personal computers, cloud services, etc. 32B. Security personnel and others, for example, have long used computer-facilitated facial recognition to identify people in captured images, and the references used to make the identifications were often derived from many sources. See, e.g., U.S. Pat. No. 5,982,912 to Fukui et al. (November 1999).
In Additional Information step 40, the identities of objects derived in step 30 are used to obtain information external to the image. To continue with the security example, a person might peruse a physical folder or other physical resource 42A to look up more information about a person recognized in an image, or a computer could electronically discover 42B additional information about the person by accessing a database. See e.g., U.S. Pat. No. 5,771,307 to Lu (June 1998).
As used herein, the term “database” means any organized collection of data in digital form. A database includes both data and supporting data structures. The term “database system” is used herein to mean a combination of (a) one or more database(s) and (b) one or more database management systems (DBMS) used to access the database(s).
In Transaction step 50, the additional information is used to make a purchase or conduct some other transaction. In a physical mode 52A, for example, a security service might use an identification of a person to physically withdraw cash from a person's bank account. In an electronic mode 52B, a computer might recognize a signature to authorize an electronic transfer of the funds, see e.g., U.S. Pat. No. 5,897,625 to Gustin (April 1999), or control access to a facility, see e.g., EP614559 to Davies (January 1999). Here again, the reference data used in executing transactions was likely derived from many different sources, and therefore crowd-sourced.
In Storage step 60, the captured images, as well as information and transaction histories, are stored physically or electronically. On the physical side 62A, photographs or other physical recordings can be stored in albums, slide decks, movie canisters and so forth. On the electronic side 62B, electronic images, receipts and so forth can be stored on electronic, optical or other memories.
Astute readers will appreciate that the two sides of each of the steps 10, 20, 30, 40, 50, and 60 are readily interconvertible. For example, the box and lens of a film camera can be converted for use as a digital camera for capturing an image. Similarly, in step 20 a physical photograph 22A can be readily converted into a TIFF or other electronic image 22B, and vice versa. A physical identification 32A can be used to make electronic identifications 32B, as well as the other way around. In step 40, a printed encyclopedia or other physical resources for obtaining additional information 42A can be readily stored in electronic form 42B, and an electronic resource such as Wikipedia™ 42B can easily be printed as a hard copy 42A. In step 50, an in-person commercial transaction can be conducted entirely with exchange of physical dollars, or can just as easily be transformed into an electronic transaction using a credit card. The same is true in step 60, where a physical photo album 62A is readily interconvertible with an electronic photo album 62B.
One result of interconvertibility of the physical left-side steps with the electronic right-side steps is that the choices at each step are independent of the choices at all the other steps. For example, one can capture an image with a film camera (physical), process the image to a photograph (physical), scan the photograph to a digital image (electronic), and use a server to ID an object in the image (electronic). Then, regardless of how the image was captured or objects in an image were identified, one can discover additional information physically or electronically, conduct transactions physically or electronically, and store images, transactions and so forth physically or electronically. Accordingly, many aspects of processing image data could be considered crowd-sourced.
An analogous situation exists for sound. For example, sound recordings were originally made without any electronics, using a pen and a rotating drum. And since the advent of electronics, people have used microphones to process sounds into digital sound recordings. But physical sound recordings can be used to make electronic recordings, and vice versa. Similarly, sounds recorded physically or electronically can be used in person to conduct a physical transaction (e.g., in person in a physical store), or an electronic transaction (e.g., over a telephone). Accordingly, many aspects of processing sound data could be considered crowd-sourced.
Crowd-Sourcing of Parameters for Characterizing Image and Sound Data
What hasn't been done in processing of image and sound data is crowd-sourcing the parameters used to characterize ambient data. For example, whether it is Interpol using electronic facial recognition to identify potential criminals, Google™ Goggles™ automatically identifying objects within images captured by a cell phone, or Aurasma™, Google Glass™ or ID™ connecting a magazine reader with a local retail store, the parameters used to resolve the identifications have always been determined by whoever/whatever controls the databases in a top-down, rather than bottom-up, crowd-sourced manner.
Even one of my own earlier inventions, which claims speaking into a cell phone to retrieve a web page or other address, and then using the phone to contact that address, assumed that the characterization parameters were all determined in a top down manner. US2012/0310623 (Fish, Publ September 2012).
But top-down implementations for handling images, sounds, and other types of information are inherently more restrictive and less dynamic than bottom-up systems, and are decidedly suboptimal for use as personal companions. What are needed for viable personal companions are systems that not only crowd-source ambient data, but that also crowd-source the parameters used in interpreting the data.
Crowd-Sourcing Parameters for Goods and Services
There has already been considerable work in crowd-sourcing parameters for use in buying and selling goods and services. Several of my earlier inventions, for example, involved databases that use parameter-value pairs to describe goods and services for sale or purchase. See U.S. Pat. No. 6,035,294 (Fish, March 2000), U.S. Pat. No. 6,195,652 (Fish, February 2001) and U.S. Pat. No. 6,243,699 (Fish, June 2001). Databases according to those 2000-era patents, which have been referred to over the years as BigFatFish™ or BFF™ patents, are self-evolving in that they allow ordinary users (i.e., non-programmers) to describe products and services using whatever parameters (i.e., features or characteristics) they like. Thus, instead of a programmer or business deciding what parameters can be used to describe the goods and services (top-down model), ordinary users collectively determine what parameters are available (bottom-up model).
References to “I”, “me” or “my” in this application refer to the first named inventor herein, Robert Fish. The '294, '652 and '699 patents, as well as all other extrinsic materials discussed herein, are incorporated by reference in their entirety. Where a definition or use of a term in an incorporated reference is inconsistent or contrary to the definition of that term provided herein, the definition of that term provided herein applies and the definition of that term in the reference does not apply.
One key benefit of bottom-up systems is that they are inherently self-evolving. For example, instead of being limited to describing an automobile by the usual parameters of make, model, year, mileage, condition, and price, ordinary users can add additional parameters such as color, leather interior, tow package, and so forth. Those new parameters, along with older parameters, are then displayed as choices to subsequent users. As another example, Monster.com and other employment web sites currently limit parameterized data to the fields of job title, educational requirement, years of experience, and so forth. Users cannot add new parameters that might be relevant for them, such as preference for large or small firms, preference for night or day shifts, need for flex time, and so forth. Self-evolving databases, however, can do that, and in that manner can evolve to accommodate the needs of whoever is using the system.
To avoid anarchy in users' descriptions of goods and services, the 2000-era patents contemplated that parameter choices would be displayed to subsequent users along with some sort of indication as to their relative prior frequencies of use. Since users will naturally lean towards using the more popular parameters, the popular parameters will tend to become even more popular, and the less popular parameters will tend to become obsolete, eventually being eliminated from the system. In that manner self-evolving databases according to those 2000-era patents should be able to automatically balance the benefits and drawbacks of creativity with those of conformity.
The 2000-era patents also applied the same self-evolving concepts to the values of parameter-value pairs. For the parameter of “color”, for example, most users would likely describe red cars using the value of “red”. Others, however, might prefer to describe a red car using the value of “rose”, and my self-evolving systems would accommodate that inconsistency by allowing users to use either term. Subsequent users would then be shown both terms as choices, relying on font size, list ranking or some other method to designate the value “red” as having been used more frequently than value “rose”. In that manner the model once again balances creativity with conformity, allowing users to employ whatever values they think are best, but encouraging them to use values that others collectively think are best.
The 2000-era patents contemplated that addition of new parameter and value terms to the system would be automatic, i.e., without interference, checking or other vetting by a human or automated arbiter. If someone wanted to add the parameter “banana” to describe an automobile or a chair, he would be allowed to do that (the pronouns “he” and “his” should be interpreted herein to include both male and female). The idea was that vetting of new parameters (via human or machine) was unnecessary because nonsensical parameters would typically begin at the conceptual bottom of any listing, and would be eliminated from the system relatively quickly through non-use. Vetting may turn out to be unnecessary in practice but, just in case, I disclosed the concept of vetting terms in self-evolving databases a few years later in one of my subsequent applications, US2007/0088625 (Fish, Publ April 2007).
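The self-evolving behavior described in the preceding paragraphs (unvetted addition of parameters and values, with choices shown to later users in order of prior use) might be sketched as follows. This is an illustrative sketch only, not the implementation of the referenced patents; the class and method names are hypothetical, and the ranked list of counts merely stands in for whatever display (font size, list ranking, etc.) a real system would use.

```python
from collections import Counter


class SelfEvolvingVocabulary:
    """Usage counts for the parameters and values of one item type (e.g. automobiles)."""

    def __init__(self):
        self.parameter_counts = Counter()   # parameter -> times used
        self.value_counts = {}              # parameter -> Counter of values

    def record(self, description):
        """Accept whatever parameter-value pairs a user supplies, with no vetting."""
        for parameter, value in description.items():
            self.parameter_counts[parameter] += 1
            self.value_counts.setdefault(parameter, Counter())[value] += 1

    def parameter_choices(self):
        """Parameters offered to the next user, most frequently used first."""
        return self.parameter_counts.most_common()

    def value_choices(self, parameter):
        """Values offered for one parameter, most frequently used first."""
        return self.value_counts.get(parameter, Counter()).most_common()


if __name__ == "__main__":
    cars = SelfEvolvingVocabulary()
    cars.record({"make": "Ford", "color": "red"})
    cars.record({"make": "Honda", "color": "red"})
    cars.record({"make": "Ford", "color": "rose", "banana": "yes"})
    print(cars.parameter_choices())     # "banana" sinks to the bottom of the listing
    print(cars.value_choices("color"))  # "red" ranks ahead of "rose"
```

A nonsensical parameter such as “banana” is accepted but starts with a count of one, so it appears last in the listing; a real system would still need some rule, not specified here, for eventually dropping unused terms.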
Several months after the priority date of the '625 application, Google™ began publicly experimenting with GoogleBase™, a vetted self-evolving database for the sale of goods and services. Although GoogleBase became quite successful, with millions of records being added in only a few weeks, Google™ quickly pulled the plug. Although it was never admitted to be the case, the problem was apparently that widespread adoption of that technology would likely have significantly reduced Google's AdSense™ revenue. After all, a self-evolving marketplace would very efficiently allow users to find exactly what they were looking for, rather than forcing them to scour the Internet in a manner that triggers hefty advertising revenue. In addition, GoogleBase was less effective than it could have been, because it only contemplated half the concept—crowd-sourcing of the parameters. Google™ apparently never appreciated the benefit of crowd-sourcing values.
Others, however, have appreciated the benefits of crowd-sourcing descriptive values. Digg™, for example, allows users to associate descriptive keywords with web-pages of interest. In a Digg-type system, a user might characterize a web-page as being “awesome” or “insightful”, and if other users characterize the same web-page using the same values, those terms would be listed in a large font in a tag cloud, or designated in some other manner as being popular. In my terminology, such descriptors are merely values of the parameter “rating.” It is fascinating that Google™ never appreciated the benefits of crowd-sourcing values, and Digg-type systems never appreciated the benefits of crowd-sourcing parameters.
Even the user-contributed encyclopedias such as Wikipedia basically do the same thing as Digg; they characterize web pages by crowd-sourced keywords (i.e. values) rather than a combination of both crowd-sourced parameters and crowd-sourced keywords. Modern search engines are also similar, in that they index substantially all words in web pages and other documents, and thus effectively utilize almost all words as keywords. But there again the systems fail to crowd-source parameters.
The strategy of crowd-sourcing values, but not parameters, typically results in over-inclusive result sets. For example, someone searching for the keywords “red” and “car” might well find a web page or article on a red-faced driver of a white car. Wikipedia-type systems address the problem with disambiguation interfaces, and Google™ and other search engines address that problem with sophisticated ranking algorithms. But both of those solutions are themselves top-down approaches that are relatively static and inflexible. Such strategies are inherently resistant to evolution, and are therefore not conducive to developing effective personal companions.
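The over-inclusiveness of keyword-only matching can be seen in a small sketch. The two records, their field names, and both search functions below are hypothetical and exist only to contrast bare keyword matching with matching on parameter-value pairs.

```python
import re

# Two hypothetical records, each with free text and parameter-value pairs.
RECORDS = [
    {"text": "A red-faced driver climbed out of a white car.",
     "pairs": {"type": "car", "color": "white"}},
    {"text": "For sale: red car with leather interior.",
     "pairs": {"type": "car", "color": "red"}},
]


def keyword_search(records, keywords):
    """Return records whose text contains every keyword, ignoring what each word describes."""
    hits = []
    for record in records:
        words = set(re.findall(r"[a-z]+", record["text"].lower()))
        if all(keyword in words for keyword in keywords):
            hits.append(record)
    return hits


def parameter_search(records, wanted):
    """Return records whose parameter-value pairs include every wanted pair."""
    return [r for r in records
            if all(r["pairs"].get(p) == v for p, v in wanted.items())]


if __name__ == "__main__":
    print(len(keyword_search(RECORDS, ["red", "car"])))                      # both records
    print(len(parameter_search(RECORDS, {"type": "car", "color": "red"})))   # one record
```

The keyword search returns the red-faced driver of the white car because “red” and “car” both occur in the text; the parameter search does not, because there “red” must be the value of “color”.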
Even Watson™, the very sophisticated set of computer programs developed at IBM™ to process and store vast amounts of information, characterizes information according to pre-established “buckets”. For example, Watson has a “names” database into which names of people can be extracted from newspapers, journals, phone books and other sources, and a “places” database that stores information about physical locations (New Jersey, Princeton, Egypt, etc.). The “names” and other specific database buckets are in effect parameters, which are hard-coded into the system by the developers rather than relying on crowd-sourcing.
To be fair, it was not clear for many years that a bottom-up, self-evolving approach to storage and retrieval of data was even practical. The many different types of products, and the potentially millions of different types of information, could easily utilize many thousands of different parameters. Without some new invention, one would consequently need either thousands of different tables to accommodate the different goods and services and types of information, or an extremely large table having thousands of columns. Back in the 1999-2000 timeframe, I hired some of the brightest minds in database design to solve the problem of inefficient storage and retrieval attending self-evolving databases. Despite hundreds of thousands of dollars of effort, however, they were completely unsuccessful in finding a viable solution.
I finally did develop a viable solution, and filed an application on that solution in 2006, see US2007088723 (Fish, Publ April 2007). The answer was to store records in a relatively narrow table (perhaps only 50 columns) in which the different columns had different meanings for different types of products, services, etc. A second table is then used to provide keys to the meaning of the different columns for each different type of products and services. Thus, for descriptions of automobiles a cell at column 6 might store price, but for descriptions of physicians a cell at column 6 might store the number of years in practice, or the name of a practice specialty. According to a calculation in the '723 application, such a database would be extremely efficient, having the ability to store descriptions of 500 million unique items using only about 12.5 gigabytes of storage. Of particular significance in the current application is that retrieval from such a database would also be extremely fast, and highly amenable to associative searching.
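The two-table arrangement described above might be sketched as follows. This is a simplified in-memory illustration under stated assumptions, not the design of the '723 application: the class name, the method names, and the rule of assigning each new parameter the next free column are all hypothetical.

```python
class NarrowTable:
    """A fixed-width record table plus a key table giving each column's meaning per item type."""

    def __init__(self, width=50):
        self.width = width
        self.column_keys = {}   # key table: item type -> {parameter: column index}
        self.rows = []          # record table: (item type, list of `width` cells)

    def insert(self, item_type, description):
        keys = self.column_keys.setdefault(item_type, {})
        cells = [None] * self.width
        for parameter, value in description.items():
            if parameter not in keys:
                if len(keys) >= self.width:
                    raise ValueError("no free column for a new parameter")
                keys[parameter] = len(keys)   # assumption: next free column
            cells[keys[parameter]] = value
        self.rows.append((item_type, cells))

    def decode(self, row):
        """Translate a stored row back into parameter-value pairs via the key table."""
        item_type, cells = row
        return {p: cells[c] for p, c in self.column_keys[item_type].items()
                if cells[c] is not None}

    def find(self, item_type, parameter, value):
        column = self.column_keys.get(item_type, {}).get(parameter)
        if column is None:
            return []
        return [self.decode(row) for row in self.rows
                if row[0] == item_type and row[1][column] == value]


if __name__ == "__main__":
    table = NarrowTable()
    table.insert("automobile", {"make": "Ford", "price": "9000"})
    table.insert("physician", {"name": "Dr. A", "years in practice": "12"})
    print(table.column_keys)   # the same column index means different things per type
    print(table.find("physician", "years in practice", "12"))
```

In this sketch the second column holds price for automobiles and years in practice for physicians, which mirrors the column 6 example above. As a rough check on the quoted figures, 12.5 gigabytes spread over 500 million items works out to about 25 bytes per item if a gigabyte is taken as 10^9 bytes.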
Crowd-Sourcing of Documents and Stored Information in General
As noted above, my 2000-era patents were directed to buying and selling goods and services. They failed to apply the concept of crowd-sourcing characterizations to information in general, using crowd-sourced parameters. Two of my later applications, however, addressed that deficiency. U.S. Pat. No. 7,594,172 (Fish, September 2009) disclosed aspects of using self-evolving databases for web pages and other documents, and U.S. Pat. No. 7,693,898 (Fish, April 2010) disclosed aspects of using self-evolving databases for storing information in general.
Crowd-Sourcing of Symbols
In my 2000-era patents I also never disclosed the idea of crowd-sourcing characterizations of symbols, let alone the parameters used in making such characterizations. As used herein, the term “symbols” should be interpreted broadly to include bar codes, QR (Quick Response) codes, facial and other images, photographs, videos, sounds, smells, figurines and other 3D or 2D objects, rhythms, sequences, and indeed anything used as a key to identify something else.
In the prior art, electronic resolution of symbols has apparently always been done according to top-down characterizations. For example, simple bar codes merely correlate objects with their UPC (Universal Product Code) classification numbers. QR codes can be resolved to more complex information, such as a web site, but they are still resolved in a top-down manner because correlation of the code with the link or other information is pre-established by whoever or whatever is in control of the links. Even facial recognition systems link facial features to individuals using top-down databases. It is unknown to the current inventors whether there are existing infrastructures that link a given symbol to multiple, user-selectable targets. But even if such things did exist in the prior art, they would likely still have been top-down because whoever or whatever set up the correlations would want to determine what links are associated with the various target choices.
Of course the top-down resolution of symbols makes very good sense from the perspective of ordinary economic activity. For example, any company paying to link its products and services with a symbol, either in advertising or elsewhere, naturally wants to link its own products and services, website, phone number and so forth with that symbol, not those of its competitors. And if consumers were allowed to crowd-source links for products and services however they saw fit, a given bar code or other symbol might well become linked to images, characterizations and other information that are wholly unfavorable to the company owning the symbol. Those of ordinary skill would expect that any service that allowed the public to do that would quickly lose advertising revenue from the owners of those symbols.
It is contemplated herein, however, that symbols can properly mean different things to different people, and that personal companions should be able to associate symbols according to the needs, wants and perspectives of both their respective users, and of others. Thus, it is contemplated herein that personal companions should crowd-source both the meanings of symbols and their associated links and other information, and that such characterizations could optionally be done using crowd-sourced parameters.
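One way to picture bottom-up resolution of a symbol to multiple targets is sketched below. Everything in it is an assumption made for illustration: the `SymbolRegistry` name, the string form of the symbol key, and in particular the ordering rule, which lists the requesting user's own associations first and then the remaining targets by how many users chose them.

```python
from collections import defaultdict


class SymbolRegistry:
    """Crowd-sourced links from a symbol (bar code, QR code, image key, etc.) to targets."""

    def __init__(self):
        # symbol -> target -> set of users who made that association
        self._links = defaultdict(lambda: defaultdict(set))

    def link(self, user, symbol, target):
        """Any user may associate any target with a symbol; nothing is pre-established."""
        self._links[symbol][target].add(user)

    def resolve(self, symbol, user):
        """Targets for a symbol: the user's own first, then by number of supporting users."""
        targets = self._links.get(symbol, {})
        ranked = sorted(
            targets.items(),
            key=lambda item: (user not in item[1], -len(item[1]), item[0]),
        )
        return [target for target, _ in ranked]


if __name__ == "__main__":
    registry = SymbolRegistry()
    registry.link("ann", "qr:123", "manufacturer page")
    registry.link("bob", "qr:123", "manufacturer page")
    registry.link("cat", "qr:123", "independent review")
    print(registry.resolve("qr:123", "cat"))   # cat's own association comes first
    print(registry.resolve("qr:123", "dan"))   # otherwise the most popular target leads
```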
Persistence in Crowd-Sourcing/Crowd-Sharing Characterizations of Ambient Data
Crowd-sourcing of ambient data is contemplated herein to be accomplished most effectively in a persistent manner. Persistent crowd-sourcing of ambient data does not appear to have been done where the parameters are also crowd-sourced, and does not appear to have been done at all where the ambient data is crowd-shared.
As used herein, “ambient data” refers to data derived from the environment within or about a person. The “environment” is categorized herein as a taxonomy of objects, actions, events, and thoughts. In this context a person, or perhaps his avatar in a game world, might obtain ambient data from which an apple could be characterized as being substantially round, about fist sized, having a smooth red or green surface, and optionally having a stem sticking out the top. He or his avatar might also obtain ambient data derived from a person running (an action), or a car accident (an event), or a generalization or other idea, a property right, or perhaps an emotion (a thought).
Also as used herein, the term “persistent” refers to something that occurs or is sampled at a rate of at least once every 30 seconds over a five minute period, or at least a cumulative 50% of the time over a five minute period. In contrast, the term “continuous” as used herein means that something occurs or is sampled at a rate of at least once every 5 seconds during a five minute period, or at least a cumulative 90% of the time over a five minute period.
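The two definitions above can be illustrated with a minimal sketch. It rests on assumptions that are not part of the definitions themselves: sample times are given in seconds from the start of a single five minute period, "at least once every N seconds" is read as meaning that no gap between consecutive samples (or between a sample and either end of the period) exceeds N seconds, and the function names `max_gap` and `classify` are hypothetical.

```python
from typing import Iterable

WINDOW_S = 300.0  # the five minute period used in both definitions


def max_gap(sample_times: Iterable[float], window_s: float = WINDOW_S) -> float:
    """Longest stretch without a sample, counting both ends of the period."""
    inside = sorted(t for t in sample_times if 0.0 <= t <= window_s)
    points = [0.0] + inside + [window_s]
    return max(b - a for a, b in zip(points, points[1:]))


def classify(sample_times: Iterable[float], active_fraction: float = 0.0) -> str:
    """Return the strictest label that the sampling pattern satisfies.

    active_fraction is the cumulative share of the period (0.0 to 1.0)
    during which the thing occurred; it covers the second prong of each
    definition and is supplied by the caller (an assumption of this sketch).
    """
    gap = max_gap(sample_times)
    if gap <= 5.0 or active_fraction >= 0.90:
        return "continuous"
    if gap <= 30.0 or active_fraction >= 0.50:
        return "persistent"
    return "neither"


if __name__ == "__main__":
    print(classify(range(0, 301, 5)))    # one sample every 5 seconds
    print(classify(range(0, 301, 30)))   # one sample every 30 seconds
    print(classify(range(0, 301, 60)))   # one sample every 60 seconds
```

Under this reading anything that qualifies as continuous also qualifies as persistent; the sketch simply reports the stricter of the two labels.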
It is known in limited circumstances of the prior art to use a cell phone or other camera-containing device to record an image in a user's environment, send that information to a service for identification, and then act upon the identification. It is known, for example, for a user to point his cell phone at the front of a restaurant, and have a service return a menu from that restaurant. Similarly, it is known for a user to point his cell phone at an advertisement in a newspaper, and have a service return price and availability information of a diamond ring featured in the advertisement. But those uses are all one-off searches; they do not embody persistent collection of information.
Many medical telemetry systems do collect ambient biometric data on a persistent or even a continuous basis. But those systems do not involve crowd-sourcing of parameters or crowd-sharing of data or characterizations of the data. They merely collect separate data from different patients, and make that data available to a very limited number of doctors, nurses, and interested others, and only upon those individuals taking active steps to acquire and/or use the data.
Similarly, Google Glass™, GoPro™, Contour™ and other Point of View (POV) cameras can collect image and sound data on a persistent or continuous basis, but the data is not interpreted using crowd-sourced parameters, and is not distributed by crowd-sharing.
Allowing users to crowd-source parameters can be especially important with respect to ambient data because the context can be very important. One of my earlier applications did disclose strategies to automatically provide guidance as to contexts in which terms are used. In US2007/0219983 (Fish, Publ September 2007), I described an improvement in which a search engine would pull up a set of records that include a search term, and then look at what terms are used in windows of perhaps 25 words on either side of the search term. Those terms would then be presented with an indication of relative frequency so that the searcher would know what additional terms are typically located nearby the searched-for term. A reverse dictionary such as that available at www.onelook.com provides another means of crowd-sourcing context correlations.
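The windowed term counting summarized above can be sketched as follows. This is only an illustration under stated assumptions, not the implementation of the referenced application: records are plain text strings, tokenization is a simple lowercase split on letters, digits and apostrophes, and the function names `nearby_terms` and `relative_frequencies` are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple


def nearby_terms(records: Iterable[str], term: str, window: int = 25) -> Counter:
    """Count words found within `window` words on either side of `term`."""
    term = term.lower()
    counts: Counter = Counter()
    for text in records:
        words = re.findall(r"[a-z0-9']+", text.lower())
        for i, word in enumerate(words):
            if word != term:
                continue
            lo = max(0, i - window)
            neighbors = words[lo:i] + words[i + 1:i + 1 + window]
            # Skip repeats of the search term itself inside the window.
            counts.update(n for n in neighbors if n != term)
    return counts


def relative_frequencies(counts: Counter) -> List[Tuple[str, float]]:
    """Share of all window hits contributed by each term, most frequent first."""
    total = sum(counts.values())
    return [(t, c / total) for t, c in counts.most_common()] if total else []


if __name__ == "__main__":
    docs = ["the red apple fell from the tree", "an apple pie with apple sauce"]
    for word, share in relative_frequencies(nearby_terms(docs, "apple", window=2)):
        print(f"{word}: {share:.0%}")
```

A practical version would presumably drop common stop words such as "the" and "with" before presenting the list to the searcher; that filtering is left out here to keep the sketch short.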
Interestingly, however, all of my earlier crowd-sourcing patent applications, as well as my earlier idea of summarizing windows surrounding search terms, and the currently available reverse dictionaries, only provide correlations as one-off searches. They do not address how context information could be used in storing persistently recorded ambient data.
One huge hurdle is that the wide range of viewing perspectives and contexts accompanying persistently recorded ambient data can make even mere identification of objects, actions, events and thoughts extremely difficult. Those having the resources to put together sophisticated identification infrastructures to resolve those difficulties would almost certainly do so to make money from advertising, or other associations with vendors. Accordingly, they would not want users to characterize ambient data in a crowd-sourced manner because that would undercut the ability of the infrastructure provider to extract monies from the vendors.
Even an ordinary consumer would likely not appreciate the value of crowd-sourced identifications of such things. He would want his cell phone, camera or other device to identify an apple as an apple, not a pear. Similarly, he would want his cell phone or other device to identify a car accident as “a car accident”, not a “meeting of the drivers”.
Still another difficulty is that persistently recorded ambient data often involves a combination of different modalities (image, speech, music, animal and other sounds, smells and even vibrations). Those other modalities may well provide needed context, yet involve information that is not readily understood or properly appreciated, such as a person talking in another language, a facial expression or tone of voice, or perhaps background sounds of birds. Thus, there is a need to integrate a translation system into the crowd-sourcing of persistently recorded ambient data. This could be done by adapting teachings in another one of my patents, U.S. Pat. No. 8,165,867 (Fish, April 2012). In that system, electronics is employed to obtain information in one modality, send signals derived from that information to a distant server for analysis, and then receive back information in the same or a different modality.
Canopy
One can think of a mature forest as including trees growing so close together that the branches and leaves of one tree interact with the branches and leaves of its neighbors. In some cases the branches can even form a canopy, which then develops a life of its own.
If one analogizes a tree to a person, the roots could be viewed as corresponding to the person's history of experiences, and the branches and the leaves as corresponding to characterizations of those experiences. A tribe or other group, or even a civilization could then be viewed as an interpersonal canopy that combines memories, life experiences, observations, and so forth in a manner that has a life of its own. What is needed is an electronic type of canopy, one that would combine automatically abstracted memories and so forth in a manner that survives even when some of the source individuals die, or are otherwise removed from the collective.
Of course, in the realm of personal companions, the different people providing the information need not be close geographically or even in time, as would trees in a canopy, but could be widely distributed over space and time. Moreover, neither the tree nor the canopy concepts discussed herein should be interpreted as requiring that raw data needs to be saved. Indeed, just the opposite is more socially acceptable, as is apparent from the bad press accorded Google Glass™ with respect to privacy issues surrounding recordation of raw data.
Personal companions preferably store characterizations of the ambient data, such as topics discussed, or observations as to whether the other people seemed happy. Yes, they might also store some raw data, either for later local processing or for transmission to a distal processor. But all of the raw data need not, and preferably should not, be maintained. It might instead be useful to store only a relatively small number of frames (or a segment) from a video for the short term, and then even fewer frames (or a shorter segment), or perhaps line drawings derived from the video, for the long term. Here it is important to appreciate that there can be a need to store both characterizations that are in the same modality as the raw data (e.g., frames from a video), and those in different modalities from the raw data (e.g., the word “happy” from a video image). Moreover, preferred personal companions should be capable of automatically individualizing these characterizations in different ways for different users, or even in different ways for the same user depending upon circumstances.
Thus, in a manner akin to human short term, medium term, and long term memories, there is a need for personal companions to abstract raw data, and then delete some or all of the raw data. For example, a person need not remember all the details of buying groceries at the supermarket. That would only clog up his mind. Instead, he would tend to remember what was purchased only for a medium term. In the long term he might only remember having visited that particular grocery store on various occasions, and eventually he might forget about the grocery store altogether. Personal companions should be able to do something analogous, locally and/or in the canopy.
Not only is it useful to delete raw data over time, in favor of characterizations, but there is also a need for personal companions to hide particular information from others, and possibly at times even from the person who generated, recorded, or abstracted it. For example, a person might well want to forget details, or even the existence, of a given event because the event was embarrassing or painful. Personal companions should be able to do something analogous, locally and/or in the canopy.
The opposite is also true, that there is a need for systems, methods, and apparatus that specifically make some information available to one or more others. For example, it is known for a baby monitor to provide a parent with a live video or audio feed from a nursery. But it would also be useful if the parent could automatically receive characterizations from a system carried by the baby, or perhaps the baby's caretaker, with updates such as “Joey just ate” or “Joey went to sleep”.
It is interesting that the LiveScribe™ pen is useful primarily because it correlates characterizations with raw data. In that system a special pen has a camera that records movements of the tip on a specially printed piece of paper, and a microphone that records conversations and other sounds. When a user takes notes of a conversation or lecture, the system correlates the notes with whatever words were being spoken, and whatever sounds were received by the device when the notes were taken. The notes are characterizations, but they are not stored as parameter-value pairs. Some contemplated embodiments of personal companions should be able to do what LiveScribe™ does, but should also be able to abstract automatically, using crowd-sourced parameters, and then crowd-share the data and characterizations as appropriate.
Handling Variant, Inconsistent, and Incorrect Characterizations
Most computer programming is designed for consistency in results. For example, in known object recognition software, different instances of the same software can be expected to provide the same characterization. If an instance of Midomi™, Musipedia™, Tunedia™ or Shazam™ accessed by one cell phone identifies a song on the radio from a 10 second sound segment, another instance of the same service operating on another person's cell phone at the same time would be expected to make the very same identification. Similarly, if one instance of an airport security X-ray system identifies an object as a knife, every other instance of that same system can be expected to yield the same result when presented with exactly the same image. In other words, the prior art teaches unambiguous links of images to targets; one to one or many to one correlations.
That isn't necessarily helpful. In the airport example, a terrorist need only learn how to get past one instance of a currently deployed screening system to be relatively assured of circumventing all similar screening systems. If different instances of the same system produced variant results under identical circumstances, a person trying to sneak contraband onto a plane could never be sure of passing a particular checkpoint.
Humans do not necessarily want their characterizations to be consistent with everyone else's. And since humans may have very different desires and perspectives, there is a need to build potential variances into personal companions with respect to the way different objects, actions, events and thoughts are treated. For example, in a grocery store, it might be useful if different employees could train their personal companions to have different levels of specificity in describing fruit. One person might want his personal companion to characterize a given apple as an apple, whereas another person would want his personal companion to characterize the same apple as a McIntosh apple, or as being overly ripe. Similarly, a person living in a snowy climate might want to train his/her personal companion to be more specific in describing different types of snow than someone in a warm climate.
In addition to supporting variant or inconsistent information, which is thought herein to be desirable in at least some circumstances, there is also a need to deal with intentionally wrong information in a crowd-sourced environment. For example, it is known for companies to game a bulletin ranking service by uploading positive comments about their own products, and negative comments about their competitors' products. Facebook™, Wikipedia™ and others have tried to address that problem by focusing on comments from “friends”, or weighing comments according to the number or accuracy of posts a person has historically provided to the system. But all of those systems can still be spoofed by individuals using multiple user names.
What are needed are systems, methods, and apparatus in which crowd-sourced ambient data is registered or otherwise linked to physical devices that provide the data. For example, if a physical device is regularly worn by a person during significant portions of that person's daily routine, it should be possible to extract from a mirror or other reflected image who the person is. Similarly, the combination should be able to distinguish bona-fide commentators from corporate shills by the frequency and content of their characterizations.
Conversations, Guidance, Warnings
There have been numerous efforts over the years to have computers identify objects, or characteristics of objects (ripeness of fruit, etc), by their visual appearances. As discussed above, prior efforts are almost entirely top-down. In June 2012 Google™ announced their cat identifier, which apparently used a neural net of 16,000 processors and ten million images to effectively crowd-source what a cat looks like. Although an impressive feat at the time, a practical personal companion needs to quickly and inexpensively obtain characterizations from interactions with humans, either directly from its user, or through interactions with other personal companions. Viable ways of doing that are described herein.
In particular, personal companions should be able to obtain characterizations of the world from conversations with humans, preferably by asking questions using audible speech. For example, when the camera of a user's personal companion sees another person, the user might say “Hi Jacob”. From that exchange the companion could automatically associate features derived from an image of that person with the name “Jacob”. Alternatively or additionally, the companion might ask “Who is that person?”, or if the companion knows the person's first name but not the last name, it might ask “What is Jacob's last name?” As another example, it would be desirable if the camera of such a device were worn so that a user could put a banana in its view and say “This is a banana” or “This banana is over-ripe. You can tell by the brown color of the peel”. The companion would then associate brown color on a banana with over-ripeness.
Another need is for personal companions to interact with a user as would a friend, abstracting information from activities of the user, or other sources, and possibly admonishing that user when something seems awry, or making a suggestion. By way of contrast, current GPS systems are known to request information such as a destination, and even to suggest different routes based upon preferences. But all of that is top-down programmed. Two similarly situated people (using the same software on the same model cell phone, in the same position in a traffic jam, at the same time of day, with the same destination and the same selected preferences) would get the same directions and even the same questions (“press 1 to select alternate route”) from their cell phones. A human friend, however, would know personal information about the driver, and might know that on this particular day the driver is not in a rush, and would be happy sitting in traffic listening to the radio. The human friend would have that information based on characterizations of the driver's behavior, not by the driver explicitly setting preferences. It would be helpful if a personal companion could do that. In another example, a person might be walking along at a swap meet. When he stops to buy some unneeded item, his personal companion might admonish him about spending too much money.
Another thing that a human friend does is provide guidance as to specific purchasing decisions. There are, of course, already systems that provide some aspects of purchasing and other guidance to consumers according to crowd-sourced comments. Examples include the “like/dislike” choice on Facebook™, and the “MustGo/Go/so-so/No/OhNo” movie rating system of Fandango™. But those systems are highly simplistic, and are of limited use because of the almost non-existent capability for sorting and filtering of the comments.
My 2000-era patents disclosed interfaces where consumers could sort products and services according to whatever characteristics were of interest to the searcher. That technology, however, stopped short of teaching how to provide purchasing and other guidance to consumers when a display screen is inconvenient or unavailable. For example, if a shopper saw a product advertised on a TV, billboard, or in a store, it would be useful for his/her personal companion to conduct a conversation regarding a possible purchase, without the shopper having to pull out his cell phone or tablet. In particular, it would be helpful to have the companion say something along the lines of “The most commonly reported characteristics of this product are ease of use, price, and durability. Would you like information about any of those?” If the shopper then said “Tell me about durability”, the companion might respond “In 1115 characterizations, 30% said very durable, 28% said yes, and 24% said pretty good.”
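A spoken summary of the kind quoted above could be produced by tallying the crowd-sourced values recorded for one parameter. The following is a minimal sketch under assumptions: the characterizations are available as a flat list of value strings for a single parameter such as durability, the sample data is invented purely for illustration, and the function name `summarize` is hypothetical.

```python
from collections import Counter
from typing import Sequence


def summarize(values: Sequence[str], top: int = 3) -> str:
    """Build a one-sentence spoken summary of the most common values."""
    total = len(values)
    if total == 0:
        return "There are no characterizations yet."
    parts = [
        f"{round(100 * count / total)}% said {value}"
        for value, count in Counter(values).most_common(top)
    ]
    if len(parts) > 1:
        parts[-1] = "and " + parts[-1]
    return f"In {total} characterizations, " + ", ".join(parts) + "."


if __name__ == "__main__":
    # Invented sample data for the "durability" parameter.
    durability = ["very durable"] * 6 + ["yes"] * 3 + ["pretty good"]
    print(summarize(durability))
```

Values that fall outside the top few are simply left unspoken, which is why the reported percentages need not add up to 100%.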
Similarly, it would be useful if the conversation could provide information that compared “apples to apples”, e.g., price per gram or mg for foods or vitamins. In many instances that would require calculations to be performed on available data, and then presentation of the resulting data to a user in a conversational format. As a simple example, it would be useful if a user could inquire in a conversational format something along the lines of “How many Chinese restaurants are in Burbank?” or “How many Chinese restaurants are within a half hour drive of my current location?”
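The “apples to apples” calculation mentioned above amounts to normalizing each offer to a common unit before comparing. A minimal sketch follows, assuming each offer is a dictionary with hypothetical `name`, `price`, `amount` and `unit` fields, that only metric mass units are involved, and that the prices shown are invented for illustration.

```python
from typing import Dict, List

# Conversion factors to grams; only metric mass units are handled here.
UNIT_TO_GRAMS = {"mg": 0.001, "g": 1.0, "kg": 1000.0}


def price_per_gram(price: float, amount: float, unit: str) -> float:
    """Normalize a package price to price per gram."""
    return price / (amount * UNIT_TO_GRAMS[unit])


def cheapest_per_gram(offers: List[Dict]) -> Dict:
    """Return the offer with the lowest price per gram."""
    return min(
        offers,
        key=lambda o: price_per_gram(o["price"], o["amount"], o["unit"]),
    )


if __name__ == "__main__":
    offers = [
        {"name": "Brand A", "price": 5.00, "amount": 500, "unit": "g"},
        {"name": "Brand B", "price": 12.00, "amount": 1, "unit": "kg"},
    ]
    best = cheapest_per_gram(offers)
    rate = price_per_gram(best["price"], best["amount"], best["unit"])
    print(f"{best['name']} is cheapest at {rate:.3f} per gram.")
```

The final print statement stands in for the conversational reply; a companion would presumably speak the result rather than display it.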
Discussion herein regarding the value of spoken conversation is not meant to completely discount the value of providing visuals to a person, whether through glasses, a hand-held or desk-top device, or otherwise. What is needed, however, is the ability of a personal companion to help visualize the information in a compact format, such as sortable and filterable tables, and allow the user to control the presentation (perhaps sort or filter the records of a table) as part of a spoken conversation. Whereas a user might be able to get a name of a restaurant or an address by asking Google Glass™ “where is the nearest Chinese restaurant?”, it would be more useful if the system would review the characteristics used to classify restaurants or Chinese restaurants, and then come back with a question such as “Are you looking for fast, cheap, or fine dining?”
Another need (at least from the perspective of the current inventor) is for systems, methods, and apparatus that reduce the impact of advertising on decision making. For example, Google™ auctions placement of advertising on its search results pages. Whoever pays the most gets his link moved to the top, or to some other desired position. Most recently, Google™ has even blurred the distinctions between advertised records and search-ranked records, making the advertising more effective from the advertisers' point of view, but potentially reducing the value of the search results to the user.
Even where users have some very limited control over the ranking algorithm, as in Google™ Shopping, Google™ puts advertisers on the top. And of course, Google™ Shopping can only rank products according to price and relevance, which forces the user to look through page after page (with new advertisements each time) to ferret out other product characteristics such as free shipping. What is needed is for personal companions to provide search results to a requester, preferably in a conversational or tabular format, which would give a user the option of eliminating or at least reducing the advertising.