From 10 years to one week
SNS Europe: Data storage and IT management
SNS: Can you provide a brief professional autobiography, with reference to how your job has evolved over the years?
PJ: I completed a Master's in Computer Science at Helsinki University, which involved a year of research under the supervision of Prof Ukkonen, studying approximate string matching. I then joined the National Public Health Institute in Helsinki as a senior analyst and worked on cancer prevention for approximately five years.
In 1996, I joined the EBI and in 1998 became the group leader of the systems team. In 1996 the team consisted of 3 people looking after about 70 EBI staff. We now have a systems team of 20, serving over 450 EBI staff.
SNS: Can you give some background as to the work carried out by the EBI?
PJ: As we move towards understanding biology at the systems level, access to large data sets of many different types has become crucial. Technologies such as genome-sequencing, microarrays, proteomics and structural genomics have provided parts lists for many living organisms, and researchers are now focusing on how the individual components fit together to build systems.
The hope is that scientists will be able to translate their new insights into improving the quality of life for everyone. However, the high-throughput revolution also threatens to drown us in data. There is an ongoing, and growing, need to collect, store and curate all this information in ways that allow its efficient retrieval and exploitation.
The European Bioinformatics Institute (EMBL-EBI), which is part of the European Molecular Biology Laboratory (EMBL), is one of the few places in the world that has the resources and expertise to fulfill this important task.
SNS: Please give a flavour of the research projects carried out and their typical (if possible) IT/storage requirements
PJ: The EMBL-EBI provides a unique environment for bioinformatics research: we have a broad palette of research interests that complement our data resources, and these two strands of activity at the EMBL-EBI are mutually supportive. Areas of research include genomic analysis of developmental pathways and regulatory systems, evolutionary analysis of sequence data, computational systems biology of signalling pathways, text mining of the scientific literature, and protein structure and function. The services teams, who develop and maintain our data resources, also perform research to develop powerful new tools to handle the flood of data. The 1000 Genomes Project is a prime example to illustrate the scale of the IT/storage requirements of some of the projects with which we are involved.
The data captured in the pilot projects was of such volume that transferring it between the EMBL-EBI and the NCBI in the USA saturated our internet capacity and took several days.
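The interview gives no exact figures for the pilot-project transfer, but the underlying arithmetic is simple: transfer time is volume divided by effective bandwidth. The sketch below uses assumed, purely illustrative numbers (50 TB of data, a 1 Gbit/s link, 80% link efficiency) to show why multi-terabyte transfers take days.

```python
# Illustrative only: volume, link speed and efficiency are assumptions,
# not figures from the interview.
def transfer_days(volume_tb: float, link_gbps: float, efficiency: float = 0.8) -> float:
    """Days needed to move volume_tb terabytes over a link_gbps connection.

    efficiency accounts for protocol overhead and link sharing (assumed).
    """
    volume_bits = volume_tb * 1e12 * 8            # terabytes -> bits
    effective_bps = link_gbps * 1e9 * efficiency  # usable bits per second
    seconds = volume_bits / effective_bps
    return seconds / 86_400                       # seconds -> days

# e.g. 50 TB of pilot data over a 1 Gbit/s link at 80% efficiency:
print(f"{transfer_days(50, 1):.1f} days")  # → 5.8 days
```

Even a dedicated gigabit link needs most of a week for tens of terabytes, which is consistent with the several-day transfers described above.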
SNS: Please give some idea of the IT/storage infrastructure that underpins the research projects - ie is there a basic infrastructure, to which is added the necessary processing power and storage depending on the research project being provisioned?
PJ: In general terms, our infrastructure consists of Linux server farms for computational needs, large amounts of NFS-based storage, and database infrastructure running on Linux with SAN storage.
SNS: Are you dealing with a single site when it comes to users and IT/storage resources, or coping with a disparate user base and a grid computing infrastructure?
PJ: The EBI’s IT and storage resources are used by EBI staff and the global scientific community. Servers are located at four geographically dispersed Data Centres.
SNS: Presumably storage requirement has grown into the Petabyte field - making some kind of a tiered storage approach crucial to optimise both CAPEX and OPEX?
PJ: The EBI's data has doubled every year and now stands at approximately 10 petabytes. We use Fibre Channel and SAS disk systems for performance-critical tasks and high-density SATA systems for the rest.
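Annual doubling is exponential growth, and a short projection makes its implications for capacity planning concrete. The 10 PB starting point comes from the interview; the five-year horizon is an arbitrary illustration.

```python
# Sketch of annual doubling from the ~10 PB figure given in the interview.
def projected_capacity_pb(current_pb: float, years: int) -> float:
    """Capacity after `years` of annual doubling."""
    return current_pb * 2 ** years

for year in range(6):
    print(f"year +{year}: {projected_capacity_pb(10, year):.0f} PB")
# → year +0: 10 PB ... year +5: 320 PB
```

At this rate, storage bought today covers only a fraction of next year's need, which is why tiering cheap high-density SATA under the performance tiers matters for both CAPEX and OPEX.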
SNS: Can you give some idea as to how the IT/storage infrastructure has changed over the years - ie the processing power and storage capacity requirements - ie a move from high performance computing into the supercomputing bracket, perhaps?
PJ: We have moved from proprietary UNIX servers to commodity-based Linux server farms.
SNS: Also, have the research compute demands driven development of IT or IT developments allowed more sophisticated research...which drives which?
PJ: Ever-increasing demand from EBI researchers for processing and storage capability has driven us to find IT solutions that enable them to work and research to their maximum capacity.
SNS: Does the advent of technologies such as virtualisation, deduplication (or are the files all image-based?) and The Cloud offer any hope for the future re easing the data storage burden?
PJ: Because of our massive growth, we have always deployed leading-edge solutions.
SNS: Could you describe the IT/storage demands of a current specific project?
PJ: EBI projects by and large use centrally shared resources: compute farms, database platforms and general-purpose storage. This is a more effective use of our resources than giving each project its own dedicated storage and compute.
SNS: How do you backup petabytes of data in a timely fashion?
PJ: Our backup strategy is based on replicated disk-based storage: every night we copy and synchronise the data to a remote Data Centre. We also take snapshots of the data and retain these for a week. In addition, our secondary backup solution uses traditional tape-based methods.
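The snapshot side of this strategy amounts to a simple retention policy: take one snapshot per night and expire anything older than a week. The sketch below models only that policy; the names and data structures are illustrative, since the interview does not specify the EBI's actual tooling.

```python
# Minimal sketch of a nightly-snapshot policy with one week of retention.
# Illustrative only: the EBI's real snapshot mechanism is not described.
from collections import deque
from datetime import date, timedelta

RETENTION_DAYS = 7

def take_snapshot(snapshots: deque, day: date) -> None:
    """Record a nightly snapshot, expiring anything beyond the retention window."""
    snapshots.append(day)
    while len(snapshots) > RETENTION_DAYS:
        snapshots.popleft()        # drop the oldest snapshot

snapshots: deque = deque()
start = date(2010, 1, 1)
for i in range(10):                # ten nights of backups
    take_snapshot(snapshots, start + timedelta(days=i))

print(len(snapshots))              # → 7 (only the last week is retained)
print(snapshots[0])                # → 2010-01-04 (oldest retained snapshot)
```

The nightly remote synchronisation protects against site loss, the snapshots against accidental deletion or corruption within the last week, and tape provides the long-term secondary copy.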
SNS: Are you able to benchmark and share knowledge with other high performance/ supercomputer areas?
PJ: We collaborate mostly with life science organizations.
SNS: Could you give some comparison as to the shrinking time windows over the years in terms of the net effect from research projects?
PJ: The Human Genome Mapping Project was an international collaboration.
SNS: Do you have any specific objectives for the coming year?
PJ: During 2010 our objective is to move all of our external services to our two new active-redundant Data Centres in London.
SNS: We’ve not mentioned the ‘thorny’ issue of power - do you have any power or associated cooling issues?
PJ: On the campus we do have an issue with power and that is one of the main reasons why we are moving to external Data Centres.
SNS: Can you tell us about your approach to working with a wide variety of vendors and other third parties?
PJ: To meet our requirements for effective, robust and flexible data handling, we use multiple solutions from different vendors and combine them into the optimal working solution for the EBI.
Vendor-independent services, such as the S3 Reseller Channel, act as a gateway to many of these solutions and provide additional expertise to help us identify the best storage implementations.
SNS: Is there any such thing as a typical day?
PJ: The most interesting part of my role is that no two days are the same.
The table of year, CPU cores and terabytes gives the starkest illustration of just how data-laden organisations are becoming in the digital age.