1. Technology Field
The present disclosure is directed to natural language processing.
2. Related Art
The background description includes information that may be useful in understanding the present inventive subject matter. It is not an admission that any of the information provided herein is prior art or relevant to the presently claimed inventive subject matter, or that any publication specifically or implicitly referenced is prior art.
Keeping a food diary, i.e., a detailed log of all food consumed, can be a powerful mechanism for losing weight or monitoring a person's nutrition. A large number of mobile and web-based food tracking applications currently exist. Food tracking, however, can be quite time-consuming and tedious for users. Verbal language recognition technology is used to make the food tracking task easier by allowing users to simply speak or type what they have eaten into the microphone or keyboard of a device, such as a mobile phone, smartphone, tablet, computer, or other device. For example, a user might verbalize: “For breakfast I had a bowl of oatmeal with strawberries and a soy latte with honey.” Before a database is queried for the foods mentioned in this example sentence, a digital representation of the uttered sentence is typically processed through natural language processing steps such as normalization, stemming, and tagging. However, such processes fail to identify how to properly split the digital representations of the words of the provided utterance. The food domain is replete with compound words and mixed phrases, which makes it challenging to parse verbalized words and sentences and to accurately identify the type of food a user wishes to track.
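By way of illustration only, the preprocessing steps named above (normalization, stemming, and tagging) can be sketched as follows. The lexicon `FOOD_WORDS`, the function names, and the naive plural-stripping stemmer are assumptions made for this sketch and do not describe any particular application; a practical system would likely use a trained stemmer and tagger.

```python
import re

# Hypothetical toy lexicon, assumed for illustration only.
FOOD_WORDS = {"oatmeal", "strawberry", "soy", "latte", "honey"}

def normalize(text):
    """Lowercase the text, drop non-letter characters, and split on whitespace."""
    return re.sub(r"[^a-z\s]", " ", text.lower()).split()

def stem(token):
    """Naive plural stripping, standing in for a real stemmer."""
    if token.endswith("ies") and len(token) > 4:
        return token[:-3] + "y"
    if token.endswith("s") and not token.endswith("ss") and len(token) > 3:
        return token[:-1]
    return token

def tag(tokens):
    """Label each token FOOD if it is in the toy lexicon, otherwise O."""
    return [(t, "FOOD" if t in FOOD_WORDS else "O") for t in tokens]

def preprocess(utterance):
    """Run normalization, stemming, and tagging in sequence."""
    return tag([stem(t) for t in normalize(utterance)])

utterance = ("For breakfast I had a bowl of oatmeal with strawberries "
             "and a soy latte with honey.")
food_tokens = [t for t, label in preprocess(utterance) if label == "FOOD"]
print(food_tokens)
```

In this sketch the result is a flat sequence of individually tagged tokens. Nothing in the output indicates whether “soy” and “latte” form one food item or whether “honey” belongs to the latte, which is the splitting problem described above.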
Related work on the basic natural language processing task of splitting compound words includes, for example, U.S. Pat. No. 7,711,545 to Philipp Koehn entitled “Empirical Methods for Splitting Compound Words with Application to Machine Translation”, issued May 4, 2010, the substance of which is incorporated herein by reference. However, the rank splitting employed and disclosed in U.S. Pat. No. 7,711,545 is based solely on frequency of occurrence in a single corpus and thus fails to take into account a multitude of different information sources in order to improve the splitting accuracy.
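For illustration, frequency-based compound splitting over a single corpus can be sketched as below. This is a simplified sketch and not the method of the cited patent: the count table `COUNTS`, the restriction to two-part splits, and the use of the geometric mean of part counts as the ranking score are all assumptions made for the example.

```python
from math import prod

# Hypothetical word counts from a single corpus (illustrative values only).
COUNTS = {
    "cheeseburger": 2, "cheese": 40, "burger": 30,
    "strawberry": 50, "straw": 5, "berry": 8,
}

def candidate_splits(word, min_len=3):
    """Yield the unsplit word plus every two-part split into known words."""
    yield (word,)
    for i in range(min_len, len(word) - min_len + 1):
        left, right = word[:i], word[i:]
        if left in COUNTS and right in COUNTS:
            yield (left, right)

def score(parts):
    """Geometric mean of the corpus counts of the parts (0 for unknown parts)."""
    return prod(COUNTS.get(p, 0) for p in parts) ** (1.0 / len(parts))

def best_split(word):
    """Return the highest-scoring candidate split of the word."""
    return max(candidate_splits(word), key=score)

print(best_split("cheeseburger"))
print(best_split("strawberry"))
```

With these assumed counts, “cheeseburger” is split into “cheese” and “burger”, while “strawberry” is left whole because the compound itself is far more frequent than its parts. Because the decision rests on one count table alone, the outcome depends entirely on how well that single corpus covers the domain.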
Prior research also exists on employing a splitter for compound words in the context of automated speech recognition and language modeling; see U.S. Pat. No. 7,801,727 to Gopalakrishnan et al. entitled “System and Method for Acoustic and Language Modelling for Automatic Speech Recognition with Large Vocabularies”, issued Sep. 21, 2010, the substance of which is incorporated herein by reference. The disclosure of U.S. Pat. No. 7,801,727 fails to provide insight into domain-specific subject matter such as food, for example the handling of recipes and restaurant menus that contain nutritional information. The disclosure of U.S. Pat. No. 7,801,727 is aimed at recognition of acoustic data rather than at extracting the correct food grouping for subsequent food database queries for nutritional information.
All patents or publications identified herein are incorporated by reference to the same extent as if each patent, individual publication or patent application were specifically and individually indicated to be incorporated by reference. Where a definition or use of a term in an incorporated reference is inconsistent or contrary to the definition of that term provided herein, the definition of that term provided herein applies and the definition of that term in the reference does not apply.
In some embodiments, the numbers expressing quantities of ingredients, properties such as concentration, reaction conditions, and so forth, used to describe and claim certain embodiments of the inventive subject matter are to be understood as being modified in some instances by the term “about.” Accordingly, in some embodiments, the numerical parameters set forth in the written description and attached claims are approximations that can vary depending upon the desired properties sought to be obtained by a particular embodiment. In some embodiments, the numerical parameters should be construed in light of the number of reported significant digits and by applying ordinary rounding techniques. Notwithstanding that the numerical ranges and parameters setting forth the broad scope of some embodiments of the inventive subject matter are approximations, the numerical values set forth in the specific examples are reported as precisely as practicable. The numerical values presented in some embodiments of the inventive subject matter may contain certain errors necessarily resulting from the standard deviation found in their respective testing measurements.
Unless the context dictates the contrary, all ranges set forth herein should be interpreted as being inclusive of their endpoints and open-ended ranges should be interpreted to include only commercially practical values. Similarly, all lists of values should be considered as inclusive of intermediate values unless the context indicates the contrary.
As used in the description herein and throughout the claims that follow, the meaning of “a,” “an,” and “the” includes plural reference unless the context clearly dictates otherwise. Also, as used in the description herein, the meaning of “in” includes “in” and “on” unless the context clearly dictates otherwise.
The recitation of ranges of values herein is merely intended to serve as a shorthand method of referring individually to each separate value falling within the range. Unless otherwise indicated herein, each individual value is incorporated into the specification as if it were individually recited herein. All methods described herein can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The use of any and all examples, or exemplary language (e.g. “such as”) provided with respect to certain embodiments herein is intended merely to better illuminate the inventive subject matter and does not pose a limitation on the scope of the inventive subject matter otherwise claimed. No language in the specification should be construed as indicating any non-claimed element essential to the practice of the inventive subject matter.
Groupings of alternative elements or embodiments of the inventive subject matter disclosed herein are not to be construed as limitations. Each group member can be referred to and claimed individually or in any combination with other members of the group or other elements found herein. One or more members of a group can be included in, or deleted from, a group for reasons of convenience and/or patentability. When any such inclusion or deletion occurs, the specification is herein deemed to contain the group as modified thus fulfilling the written description of all Markush groups used in the appended claims.
Thus, there is still a need for a device that can split food-related text with high accuracy by employing multiple splitting algorithms and rank-merging the results.
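As a hedged illustration of what rank-merging the outputs of several splitters could look like, the sketch below combines ranked candidate lists using a Borda count. The Borda scheme, the function name `rank_merge`, and the three example rankings are assumptions chosen for the example; the text above does not specify a particular merging scheme.

```python
from collections import defaultdict

def rank_merge(rankings):
    """Merge ranked candidate lists with a Borda count (an assumed scheme).

    Each ranking lists candidates from best to worst. A candidate earns
    more points the higher it appears in each list; points are summed.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, candidate in enumerate(ranking):
            scores[candidate] += n - position
    # Highest total first; ties are broken by the candidate itself.
    return sorted(scores, key=lambda c: (-scores[c], c))

# Hypothetical ranked outputs of three splitters for "soy latte with honey".
splitter_a = [("soy latte", "honey"), ("soy", "latte", "honey")]
splitter_b = [("soy latte with honey",), ("soy latte", "honey")]
splitter_c = [("soy latte", "honey"), ("soy latte with honey",)]

merged = rank_merge([splitter_a, splitter_b, splitter_c])
print(merged[0])
```

In this example the candidate that groups “soy latte” as one item and “honey” as another accumulates the most points across the three assumed rankings and is returned first.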