Enterprises are often concerned with how best to manage the volume of emails and other files that they and their employees amass during the course of doing business. For example, a typical employee may, on a daily basis, send and/or receive tens if not hundreds of emails and may create many other files. For legal or other information retention reasons, an enterprise may need to retain these emails and files in a manageable and efficient way. Enterprises may use a variety of file management and archiving technologies as a way to manage the retention of large numbers of emails and files. For example, an enterprise may use an email archiving system to reduce the number of emails stored on an employee's computing device by storing the emails on an archiving server.
Conventional file archiving systems may index the files that they manage to facilitate later identification of the files. When a conventional file archiving system first receives a file, the file archiving system may (1) convert the file to text, (2) parse the text for data that may facilitate later identification of the file (e.g., the subject, content, or recipients of an email), and (3) index the extracted data. Unfortunately, conventional file archiving systems may be unable to efficiently index container files (e.g., emails that include attachments and/or archive files that include compressed files) and their constituent files. For example, when some conventional file archiving systems first receive a container file, the file archiving systems may (1) convert a portion of the container file to text, (2) parse the text for data that may facilitate later identification of the container file, (3) index the extracted data, and (4) completely ignore the container file's constituent files.
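By way of illustration only, the convert, parse, and index steps described above may be sketched in Python as follows. The function names (`body_text`, `extract_fields`, `index_email`) and the in-memory inverted index are hypothetical and are not taken from any particular archiving system. The sketch deliberately skips attachments, mirroring the conventional behavior in which a container file's constituent files are ignored.

```python
from collections import defaultdict
from email import message_from_string
from email.message import Message


def body_text(msg: Message) -> str:
    """Step 1: convert the top-level message to text, skipping attachments."""
    if not msg.is_multipart():
        return msg.get_payload()
    for part in msg.walk():
        # A part that carries a filename is treated as an attachment and ignored.
        if part.get_content_type() == "text/plain" and part.get_filename() is None:
            return part.get_payload()
    return ""


def extract_fields(msg: Message) -> dict:
    """Step 2: parse out data that may help identify the email later."""
    return {
        "subject": msg.get("Subject", ""),
        "recipients": msg.get("To", ""),
        "content": body_text(msg),
    }


def index_email(index: dict, doc_id: int, raw_email: str) -> None:
    """Step 3: add each extracted token to a simple inverted index."""
    msg = message_from_string(raw_email)
    for field, value in extract_fields(msg).items():
        for token in value.lower().split():
            index[(field, token)].add(doc_id)


# Hypothetical usage: the index maps (field, token) pairs to sets of document IDs.
index = defaultdict(set)
index_email(index, 1, "Subject: Status update\nTo: bob@example.com\n\nAll systems nominal\n")
```

In this sketch, a later search for the token "status" in the subject field would locate document 1, whereas text that appears only inside an attachment would never reach the index.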
In other examples, when some conventional file archiving systems first receive a container file, the file archiving systems may (1) extract all of the container file's constituent files, (2) convert the container file and its constituent files to text, (3) parse the text for data that may facilitate later identification of the files, and (4) index the extracted data. However, by extracting, converting, parsing, and/or indexing all of a container file's constituent files, these conventional file archiving systems may waste valuable resources, especially if (1) the container file's constituent files are nested within many layers of additional container files and/or (2) an enterprise does not need to later identify some of the container file's constituent files (e.g., an enterprise may only be interested in later identifying certain types of files). In an attempt to overcome these limitations, some conventional file archiving systems may index constituent files to a fixed depth or simply extract as many constituent files as possible in a given amount of time. However, a fixed depth limit or time budget is indifferent to file type and may therefore still skip constituent files of interest while processing files that the enterprise does not need to identify. Accordingly, the instant disclosure identifies and addresses a need for additional and improved systems and methods for extracting contents of container files.
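By way of illustration only, the exhaustive and fixed-depth extraction strategies described above may be sketched in Python using ZIP archives as the container format. The names `extract_constituents`, `make_zip`, and `max_depth`, as well as the `!/` path separator, are hypothetical choices made for this sketch. The time-limited variant is not shown.

```python
import io
import zipfile


def extract_constituents(data: bytes, max_depth=None, _depth=1, _prefix=""):
    """Yield (path, payload) for each constituent file of a ZIP container.

    max_depth=None opens every nested container (exhaustive extraction).
    max_depth=n opens at most n layers of containers (fixed-depth extraction).
    """
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # directory entry, not a file
            payload = archive.read(name)
            path = _prefix + name
            yield path, payload
            # A constituent file may itself be a container file.
            is_container = zipfile.is_zipfile(io.BytesIO(payload))
            if is_container and (max_depth is None or _depth < max_depth):
                yield from extract_constituents(
                    payload, max_depth, _depth + 1, path + "!/"
                )


def make_zip(members: dict) -> bytes:
    """Build an in-memory ZIP archive from a {name: bytes} mapping (demo helper)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, payload in members.items():
            archive.writestr(name, payload)
    return buffer.getvalue()


# Hypothetical usage: three layers of nesting.
deep = make_zip({"c.txt": b"third layer"})
inner = make_zip({"b.txt": b"second layer", "deep.zip": deep})
outer = make_zip({"a.txt": b"first layer", "inner.zip": inner})

everything = [path for path, _ in extract_constituents(outer)]
shallow = [path for path, _ in extract_constituents(outer, max_depth=1)]
```

In this sketch, the exhaustive call opens every layer regardless of whether its files are of interest, while the call with `max_depth=1` never reaches `b.txt` or `c.txt`, even if those are the only files that later need to be identified.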