In linguistics, stress is the relative emphasis that may be given to certain syllables in a word. Stress is typically signaled by such properties as increased loudness and vowel length, full articulation of the vowel, and changes in pitch.
The stress placed on syllables within words is called word stress or lexical stress. Some languages have fixed stress, meaning that the stress on virtually any multi-syllable word falls on a particular syllable, such as the first or the penultimate. Other languages, like English or Russian, have variable stress, where the position of stress in a word is not predictable in that way. Sometimes more than one level of stress, such as primary stress and secondary stress, may be identified.
In many languages, such as English or Russian, traditional writing does not show the stress position in a word. Determining the correct stress position in a word in the absence of such information is a known problem in computer technologies. A person reading a text in which stress positions are not marked still pronounces the words correctly in most cases, either because they have learned how a particular word is pronounced or because they sense the stress position intuitively. In contrast, known computing devices may correctly pronounce known word forms when those word forms are stored in a computer-readable storage medium in association with the correct stress position (a first, “dictionary” approach). Known computing devices may also correctly pronounce unknown word forms (“new words”) if they can determine the stress position of an unknown word form by calculating a probable stress position (a second, “frequency analysis” approach).
Both the first and the second approaches have drawbacks. The first known “dictionary” approach can be applied only to known word forms. One of its drawbacks is that it does not work when a word form is “unknown” to the computing device (a new word), i.e., when that word form is absent from the accessible list of word forms associated with stress positions. One possible solution would be to generate a list of all known word forms, each associated with its stress position. However, this task is not as easy as it may appear at first glance. To illustrate the depth of the challenge, linguists have not reached a consensus on how many words there are in the Russian language: 140,000, or 200,000, or more. Moreover, in some languages, such as Russian, the word forms of a given word can vary considerably: a picture circulating in social networks shows over 100 Russian word forms that correspond to only four English word forms, “run”, “runs”, “ran”, and “running”. The problem is further exacerbated by the existence of neologisms. It is also exacerbated by the fact that certain words (for example, when used by users of social networks) may be intentionally misspelled or otherwise deliberately distorted, while the correct stress position remains obvious to humans.
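The dictionary approach and its main limitation can be illustrated with a minimal Python sketch. The lexicon contents, the name `lookup_stress`, and the convention of storing a zero-based index of the stressed syllable are assumptions made for illustration only; they are not taken from any particular known system.

```python
from typing import Dict, Optional

# Hypothetical toy lexicon: word form -> zero-based index of the stressed syllable.
# A real lexicon would need an entry for every inflected form of every word.
STRESS_LEXICON: Dict[str, int] = {
    "banana": 1,     # ba-NA-na
    "computer": 1,   # com-PU-ter
    "water": 0,      # WA-ter
}


def lookup_stress(word_form: str,
                  lexicon: Dict[str, int] = STRESS_LEXICON) -> Optional[int]:
    """Return the stored stress position, or None if the word form is unknown."""
    return lexicon.get(word_form.lower())


if __name__ == "__main__":
    print(lookup_stress("banana"))    # known word form
    print(lookup_stress("bananaz"))   # deliberately distorted form: not in the lexicon
```

In this sketch, any word form that is missing from the lexicon (a new word, a neologism, or a deliberately distorted spelling) simply yields no answer, which is the drawback described above.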
The second known “frequency analysis” approach for determining stress positions (sometimes considered a subsidiary approach) can be applied to unknown word forms. In the frequency analysis approach, a computer apparatus analyzes the frequency of a particular stress position in a particular context and calculates the probability of a particular stress position depending on affixes.
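A highly simplified sketch of such a frequency analysis is given below in Python. It assumes, purely for illustration, that the affix is approximated by the last three letters of the word and that the stress position is counted in syllables from the end of the word (1 for the final syllable, 2 for the penultimate, and so on). The function names and the toy training data are hypothetical, and practical systems are considerably more elaborate.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical toy training set: (word form, stressed syllable counted from the end).
TOY_EXAMPLES = [
    ("nation", 2), ("station", 2), ("education", 2), ("information", 2),
    ("ability", 3), ("activity", 3),
]


def train_suffix_model(examples: Iterable[Tuple[str, int]],
                       suffix_len: int = 3) -> Dict[str, Counter]:
    """For each word-final letter sequence, count how often each stress position occurs."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for word, stress_from_end in examples:
        counts[word[-suffix_len:]][stress_from_end] += 1
    return dict(counts)


def predict_stress(word: str, model: Dict[str, Counter],
                   suffix_len: int = 3) -> Optional[Tuple[int, float]]:
    """Return (most frequent stress position, its relative frequency) or None."""
    seen = model.get(word[-suffix_len:])
    if not seen:
        return None  # suffix never observed in training: no estimate available
    position, hits = seen.most_common(1)[0]
    return position, hits / sum(seen.values())


if __name__ == "__main__":
    model = train_suffix_model(TOY_EXAMPLES)
    print(predict_stress("creation", model))   # unseen word with a seen suffix
    print(predict_stress("xyz", model))        # unseen suffix
```

Even in this toy form, the quality of the estimate depends entirely on how many examples of each suffix the training set contains, and a suffix that was never observed yields no prediction at all.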
For example, U.S. Pat. No. 7,356,468 B2, “Lexical stress prediction”, teaches using affixes to predict stress positions: “In an embodiment, at least one of the models comprises correlations between word affixes and the position within words of the lexical stress. In general, the affix may be a prefix, suffix or infix. The correlations may be either positive or negative correlations between affix and position. Additionally, the system returns a high percentage accuracy for certain affixes, without the need for the word to pass through every model in the system.” According to the Wikipedia article “Affix”, “[a]ffixes are divided into plenty of categories, depending on their position with reference to the stem.”
This approach of using affixes to predict stress positions requires, in many instances, both an immense training set and substantial computational resources for real-time processing.