As companies grow and change, their databases grow and change with them. To plan for future changes, developers have attempted to design databases around expected growth patterns including, but not limited to, the number of characters in a street address, the number of digits in a transaction amount, the number of characters in a user name, and the like. In addition to budgeting the number of characters for a given field, developers may also plan for the growth of database indices. One issue with planning for database and/or index growth lies in the sample data upon which the databases and/or indices are based. Characteristics peculiar to a given sample dataset may be mistakenly interpreted by developers as patterns in the global data, causing database planning to go awry. As a result, the future plan for the database and/or index may be incorrectly biased by an overemphasis on outliers in the sample data.
Aspects described herein may address these and other problems, and may generally improve the quality, efficiency, and speed of modeling database systems by providing processes that improve the sample data upon which databases and/or indices may be modeled.