

From 10 years to one week

SNS Europe talks to Pete Jokinen, Head of IT at the European Bioinformatics Institute (EMBL-EBI) in the UK, about how his role has changed to keep pace with the ever-present and ever-increasing need to collect, store and curate the information associated with technologies such as genome-sequencing, microarrays, proteomics and structural genomics. As Pete explains, the major challenge is to ensure that the ‘high-throughput revolution’ doesn’t drown the IT resource in data!


Date: 1 Aug 2010

SNS: Can you provide a brief professional autobiography, with reference to how your job has evolved over the years?

PJ: I completed a Masters in Computer Science at Helsinki University, involving one year of research under the supervision of Prof Ukkonen (studying approximate string matching). I then joined the National Public Health Institute in Helsinki as a senior analyst and worked on cancer prevention for approximately 5 years.

In 1996, I joined the EBI and in 1998 became the group leader of the systems team. In 1996 the team consisted of 3 people looking after about 70 EBI staff. We now have a systems team of 20, serving over 450 EBI staff.

SNS: Can you give some background as to the work carried out by the EBI?

PJ: As we move towards understanding biology at the systems level, access to large data sets of many different types has become crucial. Technologies such as genome-sequencing, microarrays, proteomics and structural genomics have provided parts lists for many living organisms, and researchers are now focusing on how the individual components fit together to build systems.

The hope is that scientists will be able to translate their new insights into improving the quality of life for everyone. However, the high-throughput revolution also threatens to drown us in data. There is an ongoing, and growing, need to collect, store and curate all this information in ways that allow its efficient retrieval and exploitation.

The European Bioinformatics Institute (EMBL-EBI), which is part of the European Molecular Biology Laboratory (EMBL), is one of the few places in the world that has the resources and expertise to fulfill this important task.

Our mission:

  • To provide freely available data and bioinformatic services to all facets of the scientific community in ways that promote scientific progress.
  • To contribute to the advancement of biology through basic investigator driven research in bioinformatics
  • To provide advanced bioinformatics training to scientists at all levels, from PhD students to independent investigators
  • To help disseminate cutting-edge technologies to industry

SNS: Please give a flavour of the research projects carried out and their typical (if possible) IT/storage requirements

PJ: The EMBL-EBI provides a unique environment for bioinformatics research: we have a broad palette of research interests that complement our data resources, and these two strands of activity at the EMBL-EBI are mutually supportive. Areas of research include genomic analysis of developmental pathways and regulatory systems, evolutionary analysis of sequence data, computational systems biology of signalling pathways, text mining of the scientific literature, and protein structure and function. The services teams, who develop and maintain our data resources, also perform research to develop powerful new tools to handle the flood of data. The 1000 Genomes Project is a prime example to illustrate the scale of the IT/storage requirements of some of the projects with which we are involved.

The data captured as part of the pilot projects was of such volume that transferring it between EMBL-EBI and NCBI in the USA used all of our available internet capacity and took several days.

SNS: Please give some idea of the IT/storage infrastructure that underpins the research projects - ie is there a basic infrastructure, to which is added the necessary processing power and storage depending on the research project being provisioned?

PJ: In general terms, our infrastructure consists of Linux server farms (for computational needs), large amounts of NFS-based storage, and database infrastructure running on Linux with SAN storage.

SNS: Are you dealing with a single site when it comes to users and IT/storage resources, or coping with a disparate user base and a grid computing infrastructure?

PJ: The EBI’s IT and storage resources are used by EBI staff and the global scientific community. Servers are located at four geographically dispersed Data Centres.

SNS: Presumably storage requirement has grown into the Petabyte field - making some kind of a tiered storage approach crucial to optimise both CAPEX and OPEX?

PJ: The EBI data has doubled every year and is now in the region of 10 petabytes. We use Fibre Channel and SAS disk systems for performance-critical tasks and high-density SATA systems for the rest.

SNS: Can you give some idea as to how the IT/storage infrastructure has changed over the years - ie the processing power and storage capacity requirements - ie a move from high performance computing into the supercomputing bracket, perhaps?

PJ: We have moved from proprietary UNIX servers to commodity-based Linux server farms.

Year    CPU cores    Disk (TB)
1996           50          0.2
1997           60          0.5
1998           75          2
1999          100          3
2000          140          5
2001          220         10
2002          420         20
2003          500         30
2004          700         55
2005          860        100
2006         2675        238
2007         3777        500
2008         5500       2500
2009         9300       6000
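
The growth behind these figures can be checked with a little arithmetic. The following is a minimal sketch, not anything supplied by the EBI: the function and variable names are illustrative assumptions, and only the numbers come from the table above. It computes the compound annual growth factor and the implied doubling time for CPU cores and disk capacity between 1996 and 2009.

```python
import math

# Figures copied from the table above: year -> (CPU cores, disk capacity in TB).
CAPACITY = {
    1996: (50, 0.2), 1997: (60, 0.5), 1998: (75, 2), 1999: (100, 3),
    2000: (140, 5), 2001: (220, 10), 2002: (420, 20), 2003: (500, 30),
    2004: (700, 55), 2005: (860, 100), 2006: (2675, 238), 2007: (3777, 500),
    2008: (5500, 2500), 2009: (9300, 6000),
}


def annual_growth(first, last, years):
    """Compound annual growth factor that takes `first` to `last` in `years` years."""
    return (last / first) ** (1 / years)


def doubling_time_years(growth_factor):
    """Years needed to double at a constant annual growth factor."""
    return math.log(2) / math.log(growth_factor)


first_year, last_year = min(CAPACITY), max(CAPACITY)
span = last_year - first_year  # 13 years between the first and last rows

core_growth = annual_growth(CAPACITY[first_year][0], CAPACITY[last_year][0], span)
disk_growth = annual_growth(CAPACITY[first_year][1], CAPACITY[last_year][1], span)

print(f"CPU cores: x{core_growth:.2f} per year, "
      f"doubling every {doubling_time_years(core_growth):.1f} years")
print(f"Disk (TB): x{disk_growth:.2f} per year, "
      f"doubling every {doubling_time_years(disk_growth):.1f} years")
```

On these figures, disk capacity grew by a little more than a factor of two per year on average, which is consistent with the "doubled every year" description, while CPU core counts grew more slowly.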

Our storage technology has developed from local / directly attached storage NFS gateways to:

  • NFS gateways and caches
  • High performance NFS
  • High-performance parallel file systems
  • Scale-out NAS NFS storage solutions

SNS: Also, have the research compute demands driven development of IT or IT developments allowed more sophisticated research...which drives which?

PJ: An ever-increasing demand from EBI researchers for more processing and storage capability has driven us to find IT solutions that enable them to work and research to their maximum capacity.

SNS: Does the advent of technologies such as virtualisation, deduplication (or are the files all image-based?) and The Cloud offer any hope for the future re easing the data storage burden?

PJ: We have always deployed the leading edge solutions due to our massive growth.

SNS: Could you describe the IT/storage demands of a current specific project?

PJ: EBI projects by and large use centrally shared resources (compute farms, database platforms and general-purpose storage). This is a more effective way of using our resources than giving each project its own dedicated storage and compute.

SNS: How do you backup petabytes of data in a timely fashion?

PJ: Our backup strategy is based on replicated disk-based storage. We copy and synchronize every night to a remote Data Centre. We also take snapshots of the data and retain these for a week. In addition, our secondary backup solution uses traditional tape-based methods.
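
The one-week snapshot retention described above can be illustrated with a short sketch. This is not the EBI's tooling, which the interview does not describe: the function name, the seven-day constant expressed as a `timedelta`, and the sample dates are all assumptions made for illustration. It simply selects the snapshots that have aged out of the retention window.

```python
from datetime import date, timedelta

# Assumed retention window: "retain these for a week", per the interview.
RETENTION = timedelta(days=7)


def expired_snapshots(snapshot_dates, today, retention=RETENTION):
    """Return the snapshot dates older than the retention window, oldest first."""
    cutoff = today - retention
    return sorted(d for d in snapshot_dates if d < cutoff)


# Hypothetical example: ten nightly snapshots ending on an arbitrary date.
today = date(2010, 8, 1)
nightly = [today - timedelta(days=n) for n in range(10)]

to_delete = expired_snapshots(nightly, today)
print([d.isoformat() for d in to_delete])
```

A snapshot taken exactly seven days ago is kept in this sketch; whether the boundary day is retained or pruned is a policy choice that the interview does not specify.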

SNS: Are you able to benchmark and share knowledge with other high performance/ supercomputer areas?

PJ: We collaborate mostly with life science organizations.

SNS: Could you give some comparison as to the shrinking time windows over the years in terms of the net effect from research projects?

PJ: The Human Genome Mapping Project was an international collaboration that took almost 10 years to sequence one entire human genome. Now the same task can be accomplished in one week by a single sequencing machine. EBI has had to deal with this increase in raw data storage and this new time scale.

SNS: Do you have any specific objectives for the coming year?

PJ: During 2010 our objective is to move all of our external services to our two new active-redundant Data Centres in London.

SNS: We’ve not mentioned the ‘thorny’ issue of power - do you have any power or associated cooling issues?

PJ: On the campus we do have an issue with power and that is one of the main reasons why we are moving to external Data Centres.

SNS: Can you tell us about your approach to working with a wide variety of vendors and other third parties?

PJ: Because we have to meet requirements for effective, robust and flexible data handling, we use multiple solutions from different vendors to find what works best for the EBI.

Vendor-independent services, such as the S3 Reseller Channel, act as a gateway to many of these solutions and provide additional expertise to help us identify the best storage implementations.

SNS: Is there any such thing as a typical day?

PJ: The most interesting part of my role is that no two days are the same.

Pete provides a fascinating insight into the challenges he faces to ensure that the EBI’s IT infrastructure keeps pace with rapidly expanding data volumes and types in order to support the institute’s activities in providing freely accessible, comprehensive and integrated biological data resources to scientists worldwide.

The table of CPU cores and terabytes by year gives the starkest illustration of just how data-laden organizations are becoming in the digital age.

