“Big data” refers to data sets that are too large or complex for traditional data processing applications to analyze properly. Challenges in processing big data include analysis, capture, data curation, search, sharing, storage, transfer, visualization, querying, updating, and information privacy. The scale of big data and of the associated applications, infrastructures, and data repositories presents unique security challenges. As big data and the associated analytics become more widely used, security and data protection concerns grow in importance.
Big data sets are growing rapidly, in part because they are increasingly gathered by cheap and numerous information-sensing mobile devices, aerial devices (e.g., remote sensing), software logs, cameras, microphones, radio-frequency identification (RFID) readers, wireless sensor networks, and similar sources.
In a big data environment, the size of the data requires that it be distributed and stored across multiple nodes/servers. The data format used by the nodes/servers may be semi-structured or have no structure at all (e.g., a plain text file). Unlike a traditional relational database, big data storage typically provides no index support, so searching for and retrieving a subset of the data is usually slow. Additionally, due to privacy concerns, a user may be authorized to see only a small portion of the available data.
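The retrieval problem described above can be illustrated with a minimal Python sketch. It is not taken from any particular system: the node names, the records, the `AUTHORIZED` policy table, and the `scan` function are all hypothetical, chosen only to show that without an index every record on every node must be visited, and that an authorization check narrows what a user can see.

```python
from typing import Dict, Iterator, List, Set, Tuple

# Hypothetical cluster: each node stores unstructured plain-text records.
NODES: Dict[str, List[str]] = {
    "node-a": ["alice bought a sensor", "bob logged in"],
    "node-b": ["alice uploaded a photo", "carol read an RFID tag"],
}

# Hypothetical access policy: the (node, record index) pairs a user may read.
AUTHORIZED: Dict[str, Set[Tuple[str, int]]] = {
    "analyst": {("node-a", 0), ("node-b", 1)},
}


def scan(user: str, term: str) -> Iterator[Tuple[str, int, str]]:
    """Linear scan over all nodes; there is no index to narrow the search."""
    allowed = AUTHORIZED.get(user, set())
    for node, records in NODES.items():
        for i, text in enumerate(records):
            # Skip records the user is not authorized to see.
            if (node, i) not in allowed:
                continue
            # Plain substring match, since the data has no structure to query.
            if term in text:
                yield node, i, text


results = list(scan("analyst", "alice"))
```

In this sketch the cost of a query grows with the total number of records across all nodes, regardless of how few records match or how few the user is permitted to read. The record on `node-b` that mentions `alice` is not returned, because the hypothetical policy does not authorize the `analyst` user to read it.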