The present disclosure relates generally to digital assistant systems in humanoid form employing facial recognition software, natural language processing, artificial intelligence, text-to-speech software, real-time 3D rendering of computer animations, and a holographic image display system.
Currently available digital assistant systems may take the form of software running on mobile phones or browser-based web applications, or of in-home devices that look like speakers, and provide users with a way to ask questions and receive information through voice input. For example, interested readers may review U.S. Pat. No. 9,548,050, titled “Intelligent Automated Assistant,” and U.S. Pat. No. 6,584,439, titled “Method and Apparatus for Controlling Voice Controlled Device.” Although these are functional ways for a user to interact with a digital assistant in order to receive information (such as the weather and driving conditions), these interactions do not necessarily provide a satisfying experience, and they do not give the user the experience of interacting with another human being.
Kiosks, vending machines, and automated teller machines are examples of physical consoles that a user can walk up to and receive something back, tangible or non-tangible. With respect to kiosks, the tangible output might be something like a prepaid phone card, or even a device, like headphones. Vending machines obviously provide tangible output in the form of drinks and snacks. Similarly, automated teller machines usually dispense cash after receiving a set of detailed inputs from the user.
Existing digital assistants do not typically use facial recognition software. Facial recognition software is still limited in practical applications and is used primarily by governments for security and threat assessment and by social media platforms. Existing facial recognition software involves uploading pictures of a specific person to a database and then running them through an advanced algorithm to determine the individual's identity. Commercial applications for facial recognition software are limited to social media and photo-sharing applications that utilize the software to differentiate individuals in photos. There are no commercially available digital assistants, kiosks, vending machines, or automated teller machines that utilize facial recognition software to validate and differentiate users who are speaking to the system.
Natural language processing is currently used by several major technology corporations for a variety of purposes. The two primary commercial applications of natural language processing are answering questions through a search engine and requesting music playback. These common applications are generally accomplished through digital assistant software that is embedded in mobile phones and in pieces of hardware that look like speakers. There are currently no commercial applications of natural language processing software for carrying on a two-way conversation with a computer or holographic simulation of a person with the intent to conduct business or purchase something from the surrounding area or retail location.
Artificial intelligence is an expansive term that encompasses the goal of a machine, device, or software program to perceive its environment, take actions, and influence outcomes. A program is deemed to be artificially intelligent when it is capable of mimicking the cognitive functions of a human and creating the illusion that a user is speaking with another person. Early examples of this software include chatbots, a technology that has been developed over the past two decades. More recently, artificial intelligence technologies have greatly advanced through systems like IBM's Watson. Watson's server-based software program is now being extended and provided to governments, universities, and businesses to find patterns within massive amounts of seemingly disparate data. Although Watson and similar systems have been packaged inside of hardware for interaction in the real world beyond the internet and localized computers, there are currently no artificially intelligent systems in retail and office environments that allow customers and employees to converse in real time with the artificial intelligence through a holographic representation of a human in order to receive a service or seek some desired outcome.
Some existing systems incorporate a concept called “machine learning,” which allows a software platform to learn without being explicitly programmed by a person. Machine learning is a form of advanced pattern recognition that utilizes past experiences to build assumptions about collected data and make informed guesses about optimal solutions based on incomplete information. Machine learning is being utilized by major corporations and governments around the world to analyze massive amounts of data related to populations, citizens, and customers. Existing digital assistant technology utilizes basic machine learning concepts in order to remember things like driving directions, where users live, when they should leave for a meeting, and what type of music and news a specific user enjoys. Additionally, certain smart home devices, like thermostats, can learn user preferences over time and automatically adjust settings such as temperature or lawn watering schedules based on those preferences, the time, and the location.
These forms of basic machine learning or conceptual awareness programs have started to permeate into commercial applications and do make users' lives easier. However, as discussed above, there are currently no commercial or business applications for human-like computer generated imagery that remembers user preferences and tastes and provides a service to the user. This new application incorporates modern machine learning technologies into a completely new and different application in the marketplace.
Text-to-speech software converts text into speech based on a pre-synthesized voice. Currently most individuals interact with computers and software through touch pads, keyboards, computer mice, computer monitors, and mobile phone screens. As a result, there is not currently a strong need in these mediums to supplement the existing, predominantly text-based forms of digital communication with auditory ones. One of the more widely used forms of text-to-speech is the digital assistant software program that provides audio responses to a user's questions or requests. Most major global positioning systems that provide turn-by-turn directions utilize text-to-speech communication programs in order to allow the driver to keep their eyes on the road while still receiving directions. Commercial and residential text-to-speech software programs are also commonly used by individuals with disabilities, such as the famed physicist Stephen Hawking.
Speech-based training algorithms use spoken language to provide feedback to an artificially intelligent software application. No existing systems are directly comparable in their use of speech-based training to update the programming of a software application.
Computer generated imagery (CGI) is a very well-established industry, with major uses in entertainment such as movies, music videos, and video games. CGI is also utilized in commercials to create simulated environments and advertisements. Computer generated imagery is most often pre-rendered, that is, created before a user is able to watch or interact with it. For example, most modern animated movies incorporate some form of pre-rendered CGI. There are very few applications where CGI is rendered in real time, or moved from the creation environment to an environment where it can be enjoyed in real time. The reason for this is that most CGI programs require large amounts of data processing power to produce an image, and most computing systems cannot handle such burdensome requests. Very few real-time systems use CGI technologies, especially within consumer and business settings.
Projection systems exist in many applications across retail, office, and personal settings to display images. For example, they are utilized by movie theaters and home users who wish to project movies, television shows, and video games onto large surfaces in order to create massive screens. Projection systems are also used in advertising settings to display products or services on a wall or window in order to catch the attention of people passing by the projection system. Certain retail environments may use a projection system coupled with a glass or plastic surface in order to present music videos and advertisements to users in the area. There are currently no applications in the consumer or personal market that utilize projection systems to display a human-like computer generated image for the purpose of carrying on conversations and completing specific business tasks like buying movie tickets or purchasing food.
In summary, no product currently exists that combines facial recognition software, natural language processing, artificial intelligence principles, text-to-speech software, and CGI to produce a conversational holographic assistant for providing services to a user in real time.