The present invention relates to information processor arrangements, and in particular to arrangements that process, manage, analyse and manipulate data in order to identify unexpected links, new knowledge, risk and uncertainty within and between data in a data set.
Ever-increasing use of financial and other services leads to a vast volume of data being collected. If this data is to be useful to the enterprise for which it was collected, and to connected enterprises, it needs to be analysed. The methodology described in this application provides an efficient and effective means to discover knowledge, risk and uncertainty that may be important to the enterprise, or connected enterprise, on whose behalf the data was collected. Furthermore, this data may be held by a number of service providers with varying degrees of accessibility to others, who may be competitors, or there may be other reasons that prevent data from being pooled and shared; the information itself may also be of variable reliability. For example, with respect to an insurance claim, a claimant will typically be asked to provide particular details on a claim form, and further information may become available through subsequent contact regarding further details provided by the claimant or the insured, and possibly through interrogative techniques such as recording the telephone number from which the claimant calls the insurer, and other such data, or the postal sorting office from which paperwork is despatched. In these circumstances it is easy for a fraudster, or other persons wishing to perform irregular activities, to hide those activities within the multitude of data and/or to use false or misleading information in order to evade detection. For example, this methodology helps to identify persons engaging in “Identity Deception”, that is, presenting or publishing their details to recipients rather than proving them in the course of their transaction. Hence, the methodology enhances the ability of the organisation to evaluate the authenticity of a claim made as to identity or, alternatively, that some fact or event existed in a given form or state.
It is also necessary for some organisations to demonstrate compliance with their regulatory obligations and due diligence responsibilities.
The challenge for investigatory and regulatory bodies, organisations and authorities is to identify, within the multitude of information, those transactions or activities which require more detailed personal and iterative consideration. Clearly, with respect to so-called relational databases, it is possible to define Boolean logic strings in order to obtain search results from the database. Unfortunately, such an approach is generally either too narrowly focused or too broad to pick out the most questionable transactions or activities from the multitude. It will be understood that the resources available for investigation, generating business intelligence, exercising due diligence and risk management are limited, such that where there is a high degree of uncertainty and complexity surrounding the details of the potential irregular activity at the start of an investigation, it is difficult to be certain that a high proportion of such activities will be detected. As such, this methodology is not limited to investigatory resources and applies equally to business intelligence, exercising due diligence and risk management. The term “investigator” in this respect should be construed as meaning any person or body of persons engaged in investigation, business intelligence, exercising due diligence or risk management. For example, an investigator may be aware of known modes of fraudulent activity but cannot be certain, firstly, that other forms of activity are not being performed and, secondly, that the most appropriate data items, which will be key to detecting a fraud or activity, have been identified. What is required is an analytical tool which generates meaningful clusters of information rather than individual items of information. Furthermore, the number and type of clusters formed should be adjustable depending upon the nature of the fraud or activity and the resources available.
In such circumstances it may be possible to identify particular instances of fraudulent activity and so define these clusters as necessary to identify that activity or, alternatively, to provide a risk assessment with respect to the provision of services or otherwise, based upon the ease with which fraudulent or irregular activity can be identified using particular information input checks.
Increasing use of remote provision and validation of services has increased reliance upon individual identification. Inevitably, however, there has been an increase in identity theft and in the use of alias names and other false personal or other details, whereby individuals represent themselves as somebody else in order to obtain services or goods in the wrongfully identified person's name. Furthermore, there may be a cascade of identity thefts, alias names and other false personal or other details which directly or indirectly link the perpetrator of such fraud or irregular activity to other instances of activity, scenarios or data of interest remote from themselves.
Service and goods providers, and those charged with responsibility for investigation and analytical work such as regulatory compliance, risk assessment, crime investigation and fraud detection, need to be able to identify, from a given mass of data, those transactions or passages of activity which are most likely to result from fraud or unacceptable behaviour. Furthermore, this identification is generally time-dependent, in that activity continues, and it would be unacceptable in most commercial situations (or other situations where the deployment of finite resources has to be carefully managed over time) to delay the provision of services or goods over a prolonged period in order to investigate a large number of potential instances of fraud or inappropriate activity. Moreover, time changes context, and so knowledge discovery as a process needs to be both content-aware, knowing what is present, and context-driven, discovering what it means or could mean given certain conditions.
The problem can be summarised as follows. The collections of information described typically comprise many different variables. These data items and variables are collected by different enterprises because those charged with that responsibility consider them to represent key items of information important to the enterprise as a whole. Therefore, the context in which the collection process operates was dictated at some time in the past, by reference to a given perspective of what was relevant to the enterprise at that time. The way these data items and variables are related together in the course of an analysis is fundamentally important to the extraction of knowledge and the identification of missing and misleading information. Different combinations of information produce different results, but even small collections of information present a problem: the number of possible combinations of two or more items of information is often so large that problems of scale and utility prevail. For example, the number of possible combinations of two or more items of information in a collection of n items is given by the exponential expression $2^n - (n+1)$, that is, the total number of subsets less the empty set and the n single-item subsets. This can produce numbers so large that the combinations cannot be processed sequentially in an efficient manner. If information that is “possibly relevant but not currently represented” in the collection is also included in this calculation, then the resulting problem is even greater. Furthermore, time is an important factor because time changes context and, in turn, context changes meaning. It is important to be able to use the results of the process as new inputs. This provides ‘Feedback’ information to the system employed, which in turn allows the processing to be context-driven. Any solution must therefore take account of both the exponential number of combinations of different data items and the influence of time on context, and a processor needs to reflect this.
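The combinatorial growth described above can be checked with a short calculation. The following is a minimal sketch in Python, offered only to illustrate the counting argument and not as part of the claimed arrangement; the function names and the four example field names are hypothetical. It compares the closed form 2^n − (n + 1) against a direct sum of binomial coefficients C(n, k) for k = 2..n, and enumerates the actual combinations for a small example collection.

```python
from itertools import combinations
from math import comb


def count_multi_item_combinations(n: int) -> int:
    """Closed form: all 2**n subsets, less the empty set and the n singletons."""
    return 2 ** n - (n + 1)


def count_by_binomial_sum(n: int) -> int:
    """Cross-check: sum of C(n, k) for every combination size k from 2 to n."""
    return sum(comb(n, k) for k in range(2, n + 1))


def enumerate_multi_item_combinations(items):
    """Yield every combination of two or more items (hypothetical helper)."""
    for size in range(2, len(items) + 1):
        yield from combinations(items, size)


if __name__ == "__main__":
    # Hypothetical claim-form fields, used purely for illustration.
    fields = ["name", "address", "telephone", "postcode"]
    for combo in enumerate_multi_item_combinations(fields):
        print(combo)
    print(count_multi_item_combinations(len(fields)))  # 11

    # Growth of the count as the number of data items increases.
    for n in (10, 20, 30, 50):
        print(n, count_multi_item_combinations(n))
```

With four fields there are only 11 combinations (6 pairs, 4 triples and 1 quadruple), but the count roughly doubles with each additional data item, which is why exhaustive sequential evaluation quickly becomes impractical.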