Providers of Internet access, such as enterprises, schools, libraries, employers, government bodies, and parents, have an obligation, in some cases legal and in others a matter of self-interest, to protect information within and transiting their gateways. Websites external to their networks may serve as destinations for data that the providers do not want to leave their control, or as sources of data that they do not want to enter it. Conventional systems are overwhelmed by the volume, complexity, and anonymity of “bits on a wire”.
It is known that, in conventional systems, web archiving refers to taking a snapshot of all pages in the hierarchy of a static website, a practice unrelated to the problem of controlling data leakage. Even this approach has become archaic now that a website may serve an application or operate a database. Another conventional configuration, illustrated in FIG. 1, is a sniffer apparatus 300 coupled to a network 230. The sniffer logs packets transiting the network, both requests from a user client such as a browser to a website and the hypertext documents transmitted in response by a server. While all traffic may be logged into a file system 910, it is known that the quantities are enormous: impractical to store for any but governmental entities and, if stored, uneconomical to analyze afterward. The difficulty arises because each packet, taken in isolation, may belong to any of many different protocols and many applications on an uncountable number of websites.
What is needed is a way for a network operator providing Internet access to track information, images, and intellectual property that are exchanged with one or more external servers, and to trace the senders and receivers thereof more efficiently.