Automated dialogue systems are imperfect but typically improve over time as additional data is collected from real-world users. This creates a chicken-and-egg problem because the dialogue system should reach a high level of performance before real users will voluntarily make use of it, but this level of performance is most easily attained after collecting data from real users.
Two approaches to solving this problem are to generate data resembling that of real users in an automated fashion, or to pay users to use the system in order to generate data. Unfortunately, both approaches produce data that differs substantially from data collected in real-world scenarios. Generated data includes only the behaviors, language, and so on that the author of the generator can imagine, which is usually much less varied than real-world use. Similarly, paid users have very different goals and usage patterns than real users (e.g., they are unlikely to exhibit frustration) and quickly fall into unrealistic habits and patterns. Either way, the quality of this data is not nearly as high as that of data from real users using the system. An improved dialogue system that is able to provide a high level of service with little initial data is therefore desirable.