The present invention relates to data processing systems, and more specifically to scheduling large workflows on such systems.
Large Internet companies such as Yahoo!, Inc. continuously generate an enormous amount of data. This includes user data and web page data, ranging from web searches to social relationships to geo-location data, as well as system data such as various performance metrics. Deriving useful information from this large volume of raw data supports a variety of service objectives, including presenting relevant contextual information, identifying trends in user behavior, and offering better targeted services.
Different types of data processing workflows may benefit from custom scheduling. Accordingly, improved mechanisms for efficiently representing and scheduling diverse data processing workflows would be beneficial.