Automated valuation model (“AVM”) services provide real estate property value predictions using a mathematical predictive model and a set of “training data” describing values of other properties, typically including sales data from recent property sales within a geographic region. Some AVM services also take into account previous surveyor and/or assessor valuations, historical house price movements, user inputs (e.g., number of bedrooms or property improvements), and the like.
In many cases, the accuracy of a given predictive model may depend to a large extent on the training data provided for use by the model. Typically, training data is selected from a database containing a very large number of records describing real-estate property sales across a large region, such as a country, state, county, or the like, and over a long period of time, such as many years.
Previously known methods for performing automated valuations in general, and more particularly for selecting a set of training data for a subject real-estate property, are described in U.S. Pat. No. 5,361,201, which is hereby incorporated by reference for all purposes. Such previously known methods include establishing a fixed period of time (e.g., from zero to two years prior to an effective date) and expanding or contracting a geographic boundary until a desired count of sales records is selected.
For example, to select training data according to a previously known method, a sales-transaction database may be queried to identify sales-transaction records corresponding to real-estate property sales that took place within two years prior to an effective date and within a geographic search radius of (for example) one km of a subject real-estate property. Those records are counted, and if the count is below a predetermined threshold (e.g., 100 sales transactions), the geographic search radius may be iteratively increased until the count reaches the predetermined threshold.
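The fixed-window, expanding-radius selection described above can be illustrated with a minimal Python sketch. It is not taken from the referenced patent. The `Sale` record, the in-memory list standing in for the sales-transaction database, the haversine distance, the one km step size, the 730-day window, and the `max_radius_km` cap are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from math import asin, cos, radians, sin, sqrt


@dataclass(frozen=True)
class Sale:
    """Hypothetical sales-transaction record (illustrative fields only)."""
    lat: float
    lon: float
    sold_on: date
    price: float


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in km, using a 6371 km Earth radius."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(a))


def select_training_data(sales, subject_lat, subject_lon, effective_date,
                         min_count=100, start_radius_km=1.0, step_km=1.0,
                         max_radius_km=50.0, window_days=730):
    """Return (selected_sales, final_radius_km).

    The time window is fixed (assumed: 730 days before the effective date).
    The radius grows by step_km until min_count sales are found or the
    assumed max_radius_km cap is reached.
    """
    earliest = effective_date - timedelta(days=window_days)
    in_window = [s for s in sales if earliest <= s.sold_on <= effective_date]

    radius = start_radius_km
    while True:
        selected = [
            s for s in in_window
            if distance_km(subject_lat, subject_lon, s.lat, s.lon) <= radius
        ]
        if len(selected) >= min_count or radius >= max_radius_km:
            return selected, radius
        radius = min(radius + step_km, max_radius_km)
```

In this sketch the only stopping criterion is the record count (or the assumed radius cap); nothing in the loop evaluates how well the selected records suit the subject property.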
However, merely targeting a predetermined count of sales transactions may not result in the selection of an optimal set of training data. Consequently, there is a need for an improved method of selecting training data to provide more accurate value predictions.