Monday, August 31, 2009

The Ephemeral Infosphere

An August 28th article in The Wall Street Journal entitled "A Data Deluge Swamps Science Historians" makes a truly startling statement:
"Computer users world-wide generate enough digital data every 15 minutes to fill the U.S. Library of Congress. In fact, more technical data have been collected in the past year alone than in all previous years since science began, says Johns Hopkins astrophysicist Alexander Szalay, an authority on large data sets and their impact on science."
That's about 3,000 terabytes every 15 minutes, or about 288,000 terabytes a day, using a decade-old estimate of the size of the Library of Congress's holdings. Those are the data lost to recorded history.

Then there are the data that are not lost, but are un-knowable by search engines, the so-called "Deep Web":
Estimates based on extrapolations from a study done at University of California, Berkeley, show that the deep Web consists of about 91,000 terabytes. By contrast, the surface Web (which is easily reached by search engines) is only about 167 terabytes... (Wikipedia, op cit.)
So, measured by volume, for every page of information found through search engines, there are about 545 that are un-findable, and (at least) another 1,725 pages that are produced and lost each day. Of course, this does not count the blizzard of emails, tweets, IMs, real-time videos, and the mountains of business and other kinds of data that never make it to a disk on the internet.
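For anyone who wants to check the arithmetic, here is a rough sketch in Python. It uses only the three figures quoted in this post (3,000 terabytes per 15 minutes, 91,000 terabytes of deep Web, 167 terabytes of surface Web); the variable names are my own, and it assumes terabytes can stand in for "pages," which is a loose equivalence at best.

```python
# Figures quoted in the post, all in terabytes (TB).
GENERATED_TB_PER_15_MIN = 3_000   # one "Library of Congress" every 15 minutes
DEEP_WEB_TB = 91_000              # Berkeley-based estimate, via Wikipedia
SURFACE_WEB_TB = 167              # portion reachable by search engines

# Number of 15-minute intervals in a day.
INTERVALS_PER_DAY = 24 * 60 // 15

# Data generated worldwide in one day.
generated_tb_per_day = GENERATED_TB_PER_15_MIN * INTERVALS_PER_DAY

# Volume ratios relative to the searchable surface Web.
deep_to_surface = DEEP_WEB_TB / SURFACE_WEB_TB
daily_to_surface = generated_tb_per_day / SURFACE_WEB_TB

print(f"Generated per day: {generated_tb_per_day:,} TB")
print(f"Deep Web vs. surface Web: {deep_to_surface:.1f} to 1")
print(f"One day's output vs. surface Web: {daily_to_surface:.1f} to 1")
```

The deep-to-surface ratio comes out near 544.9, and a single day's output is roughly 1,725 times the surface Web. Keep in mind that the last ratio compares a daily flow against a fixed stock, so it grows with every day that passes.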

This suggests that, though a thick welter of data swirls around the planet every minute, only the tiniest fraction actually survives to become an object of use or recollection. The rest is like dark matter or physical prehension: massive, causal, but in principle unknowable.
