The Fourth Paradigm: Data-Intensive Scientific Discovery

16 October 2009

Fourth Paradigm Book Cover

In the book The Fourth Paradigm: Data-Intensive Scientific Discovery, various authors expand on the vision of pioneering computer scientist Jim Gray for a new, fourth paradigm of discovery based on data-intensive science, and offer insights into how it can be fully realized.

Increasingly, scientific breakthroughs will be powered by advanced computing capabilities that help researchers manipulate and explore massive datasets.

The speed at which any given scientific discipline advances will depend on how well its researchers collaborate with one another, and with technologists, in areas of eScience such as databases, workflow management, visualization, and cloud computing technologies.

While consumers are just starting to comprehend the idea of buying external hard drives for the home capable of storing a terabyte of data, computer scientists need to grapple with datasets thousands of times as large and growing ever larger. (A single terabyte equals 1,000 gigabytes.)

In the life sciences, for example, one of the biggest databases to date, the collection of unassembled DNA sequences (The Trace Repository), holds about 65 TB.

But the next generation of computer scientists has to think in terms of what could be described as Internet scale. Facebook, for example, uses more than 1 petabyte of storage space to manage its users’ 40 billion photos. (A petabyte is about 1,000 times as large as a terabyte, and is equivalent to about 500 billion pages of text.)
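To make these units concrete, here is a minimal Python sketch of the arithmetic, assuming the decimal definitions quoted in this post (1 terabyte = 1,000 gigabytes, 1 petabyte = 1,000 terabytes). The helper name `average_bytes_per_item` is invented for illustration. Because the Facebook figure is "more than 1 petabyte", the per-photo result is only a lower bound on the average.

```python
# Decimal (SI) storage units, as quoted in the post.
GB = 10**9
TB = 1_000 * GB
PB = 1_000 * TB


def average_bytes_per_item(total_bytes: int, item_count: int) -> float:
    """Return the mean storage used per item (hypothetical helper)."""
    return total_bytes / item_count


# Figures taken from the post; 1 PB is a lower bound ("more than 1 petabyte").
facebook_storage = 1 * PB
facebook_photos = 40 * 10**9

avg = average_bytes_per_item(facebook_storage, facebook_photos)
print(f"At least {avg / 1_000:.0f} kB per photo on average")

# How many 1 TB consumer drives would hold one petabyte?
print(f"{PB // TB} one-terabyte drives per petabyte")
```

Put another way, a single petabyte corresponds to a thousand of the one-terabyte home drives mentioned earlier.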

In physics, the Large Hadron Collider (LHC) at CERN in Geneva will generate collision data at a rate of around 16 petabytes per year.

The European Bioinformatics Institute now holds over 2.5 petabytes of biological data, a figure that has doubled year over year for the past five years. The next-generation sequencing facility at the Sanger Institute alone, with 40 sequencers (as of May 2009), predicts a data generation volume of 2 petabytes per annum.
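Doubling every year compounds quickly: five doublings amount to a 32-fold increase. The short sketch below works backwards from the 2.5-petabyte figure, assuming the growth was an exact doubling each year (the post only says the figure "has doubled year over year"). The function name `size_years_ago` is made up for this example.

```python
def size_years_ago(current_size: float, years: int, growth_factor: float = 2.0) -> float:
    """Estimate an earlier size, assuming constant yearly growth (hypothetical helper)."""
    return current_size / growth_factor ** years


current_pb = 2.5  # petabytes held now, per the post
years = 5         # "the past five years"

total_growth = 2.0 ** years
past_pb = size_years_ago(current_pb, years)

print(f"Growth over {years} years: {total_growth:.0f}x")
print(f"Implied holdings {years} years ago: about {past_pb * 1_000:.0f} TB")
```

Under that assumption, the implied starting point is well under a tenth of a petabyte, which gives a sense of how fast the holdings have grown.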

The European Life Sciences Infrastructure for Biological Information initiative is trying to bring together technical and financial resources to overcome this “data deluge,” as some call it.

What is clear is that we have some interesting times ahead.
