Removing outlier data is an important part of the pre-analysis work in standards development and data-driven model development, because it helps ensure that the resulting analysis is representative of, and fair to, the underlying data. For example, developing equitable benchmarking of greenhouse gas standards for carbon dioxide (CO2), ozone (O3), water vapor (H2O), hydrofluorocarbons (HFCs), perfluorocarbons (PFCs), chlorofluorocarbons (CFCs), sulphur hexafluoride (SF6), methane (CH4), nitrous oxide (N2O), carbon monoxide (CO), nitrogen oxides (NOx), and non-methane volatile organic compounds (NMVOCs) emissions requires that the collected industrial data used in the standards development exhibit certain properties. Extremely good or bad performance by a few industrial sites should not bias the standards computed for other sites, and it may be judged unfair or unrepresentative to include such performance results in the standard calculations. In the past, performance outliers were removed via a semi-quantitative process requiring subjective input. The present system and method is a data-driven approach that performs this task as an integral part of the model development, rather than at the pre-analysis or pre-model development stage.
The removal of bias can be a subjective process in which justification is documented in some form to substantiate data changes. However, any form of outlier removal is a form of data censoring that carries the potential for changing calculation results. Such data filtering may or may not reduce bias or error in the calculation, and in the spirit of full analysis disclosure, strict guidelines and documentation for removing outliers need to be included with the analysis results. Therefore, there is a need in the art for a new system and method for objectively removing outlier data bias using a dynamic statistical process useful for data quality operations, data validation, statistical calculations, mathematical model development, and the like. The outlier bias removal system and method can also be used to group data into representative categories, where the data is applied to the development of mathematical models customized to each group. In a preferred embodiment, coefficients are defined as the multiplicative and additive factors in mathematical models, together with other numerical parameters that are nonlinear in nature. For example, in the mathematical model f(x, y, z) = a*x + b*y^c + d*sin(e*z) + f, the terms a, b, c, d, e, and f are all defined as coefficients. The values of these terms may be fixed or may be determined as part of the development of the mathematical model.
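By way of illustration only, the example model above can be sketched in Python as follows. The names Coefficients and model, and all numeric values, are hypothetical and chosen only for this sketch. The form of the exponent term (y raised to the power c) is read from the statement that some coefficients are nonlinear in nature. The sketch shows that every one of a through f is treated uniformly as a coefficient, whether it scales a term, offsets the result, or enters nonlinearly, and that any coefficient may be held fixed while another is varied.

```python
import math
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Coefficients:
    """Hypothetical container for the six coefficients of the example model."""
    a: float  # multiplicative factor on x
    b: float  # multiplicative factor on y**c
    c: float  # nonlinear coefficient (exponent of y)
    d: float  # multiplicative factor on the sine term
    e: float  # nonlinear coefficient (scale inside the sine)
    f: float  # additive factor


def model(x: float, y: float, z: float, k: Coefficients) -> float:
    """Evaluate f(x, y, z) = a*x + b*y^c + d*sin(e*z) + f."""
    return k.a * x + k.b * y ** k.c + k.d * math.sin(k.e * z) + k.f


# Illustrative values only; they are not taken from any data set.
coeffs = Coefficients(a=2.0, b=3.0, c=2.0, d=1.0, e=0.0, f=1.0)
value = model(1.0, 2.0, 5.0, coeffs)

# Hold every coefficient fixed except b, which is varied as it might be
# during model development.
trial = replace(coeffs, b=4.0)
trial_value = model(1.0, 2.0, 5.0, trial)
```

In this sketch the coefficients are kept in an immutable record so that a fixed set of values cannot be altered accidentally while other values are being developed; this is a design choice of the illustration, not a requirement of the system and method.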