This description relates to processing data from multiple sources. Data can be stored in a variety of sources, for example, a Hadoop Distributed File System (HDFS) cluster. A data processing system can perform operations on data received from an HDFS cluster as well as on data received from other types of sources.