Developing and marketing new consumer goods, particularly fast-moving consumer goods, often involves obtaining information from consumers at some stage: to learn what sort of products they want, to target marketing and sales effectively, to support research during the product development process, and/or to monitor consumer appraisal of a product once it has been evaluated, before or after launch. Such information can be gathered in many different ways, e.g. by interviewing individual consumers or groups of consumers, collecting feedback after purchase, receiving spontaneous email (e.g. from consumers to the care-lines of product websites), and using questionnaires that call for free-text answers. It is usually obtained in textual form or can be converted into that form.
Whether the information is expressed directly or only implicitly in the text, some analysis, information extraction and/or interpretation may be needed to obtain it. Existing approaches to information extraction from factual documents (e.g. reports, scientific prose, news feeds, legal texts) are based on the recognition of entities and events. To fill pre-defined templates, the linguistic elements that represent the entities must be recognised, the syntactic structures in which the entities are embedded must be disentangled, and the nature of the semantic links between entities must be interpreted.
Although similar information extraction techniques could in principle be applied to unstructured, subjective textual data, the virtually unlimited variety of ways in which individuals express themselves on subjects such as personal issues, opinions, beliefs and habits means that these methods are not wholly appropriate and do not give satisfactory results. Examples of subjective or unstructured text include literature, free-text questionnaire answers, interviews and loosely directed monologues, focus-group interactions and spontaneous communications. The information to be revealed consists not of events or entities but of qualifiers, concepts, opinions and the like, as well as the characteristics of a particular linguistic expression.
Other known techniques for analysing textual data can be grouped under the heading of linguistic analysis techniques. Examples include the extraction of linguistic units or linguistic features from textual chains, and the analysis of lexical semantic patterns, of collocations and coherent groups, of co-occurrences and conceptual chains, and of affect, among others. Such techniques are discussed in:
Atkins, B.T.S. & Zampolli, A. (eds) (1994). Computational approaches to the lexicon. Oxford: Oxford University Press.
Firth, J.R. (1951). Modes of meaning (Essays and Studies). In J.R. Firth: Papers in Linguistics, 190–215. London: Oxford University Press.
Ghiglione, R., Landré, A., Bromberg, M. & Molette, P. (1998). L'analyse automatique des contenus. Paris: Dunod.
Lebart, L., Salem, A. & Berry, L. (1998). Exploring textual data. Dordrecht: Kluwer Academic Publishers.
Marchand, P. (1998). L'analyse du discours assistée par ordinateur. Concepts, méthodes, outils. Paris: Armand Colin (U).
Roberts, C.W. (1997). Text analysis for the social sciences: methods for drawing statistical inferences from texts and transcripts. Mahwah, N.J.: Lawrence Erlbaum Associates.
Sinclair, J.M. (1991). Corpus, concordance, collocation. Oxford: Oxford University Press.
Whissell, C., Fournier, M., Pelland, R., Weir, D. & Makarec, K. (1986). A dictionary of affect in language: IV. Reliability, validity, and applications. Perceptual and Motor Skills.
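As a rough illustration of one of the linguistic analysis techniques listed above, co-occurrence analysis, the following sketch counts how often pairs of distinct words occur together in the same free-text answer. It is a minimal sketch under stated assumptions, not the method of any of the cited works: the function name `cooccurrences`, the lower-case letter-only tokenisation, the use of one whole answer as the co-occurrence window and the stopword list are all hypothetical choices, and no lemmatisation is performed (so "smell" and "smells" are counted as different words).

```python
import re
from collections import Counter
from itertools import combinations


def cooccurrences(texts, stopwords=frozenset()):
    """Count how often each unordered pair of distinct words appears in the same text.

    Hypothetical helper: each text (e.g. one free-text answer) is treated as a
    single co-occurrence window; words are lower-cased, split on anything that
    is not a letter or apostrophe, and stopwords are dropped.
    """
    pairs = Counter()
    for text in texts:
        # A set, so a word repeated within one text is only counted once per pair;
        # sorting puts each pair in a canonical (alphabetical) order.
        words = sorted({w for w in re.findall(r"[a-z']+", text.lower()) if w not in stopwords})
        pairs.update(combinations(words, 2))
    return pairs


# Hypothetical free-text answers, for illustration only
answers = [
    "The shampoo smells fresh",
    "Fresh smell, but the shampoo is expensive",
    "Too expensive",
]
counts = cooccurrences(answers, stopwords={"the", "is", "but", "too"})
print(counts.most_common(1))  # → [(('fresh', 'shampoo'), 2)]
```

Treating a whole answer as the window keeps the sketch simple; a sentence- or fixed-width window would be an equally reasonable assumption for longer texts such as interviews.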
While such information extraction and linguistic analysis techniques may provide valuable information, textual data contain further hidden information that these methods do not reveal.
A wide variety of statistical methods is available for extracting information from numerical data. These range from simple counting to more advanced techniques such as dimension reduction, clustering, hypothesis testing, model fitting, correlation and others. Various such statistical techniques are described in:
Dimension reduction: Krzanowski, W.J. & Marriott, F.H.C. (1994). Multivariate Analysis. Edward Arnold.
Clustering: Everitt, B.S. (1993). Cluster Analysis. Edward Arnold.
Hypothesis testing: Altman, D.G. (1991). Practical Statistics for Medical Research. Chapman & Hall.
Model fitting: e.g. Draper, N.R. & Smith, H. (1998). Applied Regression Analysis. Wiley.
Correlation: Everitt, B.S. (1998). The Cambridge Dictionary of Statistics. Cambridge University Press.
Time series: Chatfield, C. (1996). The Analysis of Time Series: An Introduction.
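Correlation, one of the statistical methods listed above, can be illustrated with the Pearson correlation coefficient computed on two numerical variables. The following is a minimal, self-contained sketch; the function name `pearson` and the example data (hypothetical monthly purchase counts and satisfaction scores) are assumptions for illustration only.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of cross-products of deviations, and sums of squared deviations
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical numerical data: purchases per month vs. a 1-10 satisfaction score
purchases = [1, 2, 3, 4, 5]
scores = [2, 4, 6, 8, 10]
print(pearson(purchases, scores))  # → 1.0
```

On Python 3.10 and later, the standard library's `statistics.correlation` computes the same coefficient; the explicit version above just makes the formula visible.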
While such statistical methods may provide information from numerical data, they are not directly applicable to textual data.
New ways of profiling and segmenting the consumer population are being sought that are no longer based on demographics but on new drivers such as lifestyle, educational attainment, attitudes to the environment, health issues and social preferences.