ELIXIR community event

Added by Branwen Hide on 14 December 2009 16:13

As I mentioned last week, I recently attended the ELIXIR community event held by the BBSRC, MRC, NERC and the Wellcome Trust. I have to admit I only had vague knowledge of ELIXIR, so I found the event really useful. The day started out with Prof Janet Thornton, Director, EMBL-EBI, giving an overview of ELIXIR, followed by a series of presentations by individuals working in or associated with existing data centres around the UK.

According to the BBSRC website, the point of ELIXIR is to "construct and operate a sustainable infrastructure for biological information in Europe to support life science research and its translation to medicine and the environment, the bio-industries and society". This is no mean feat. ELIXIR is organised into 14 work packages and is about to organise two surveys (one of users and the other of data providers) as well as five technical-feasibility studies. The outcome will be a model consisting of a hub and a number of nodes based in the different member states. The nodes themselves will be legal entities, but do not necessarily have to be national data centres, and there is wide scope as to what they are and where they can be located. This will enable the nodes to better align with the strategic priorities of the member state(s) and funding agencies that support them. The hub itself will act as the scientific/technical coordinator, and will be responsible for training as well as standards and ontology development.

The need for training and standards was echoed throughout the day, especially as producers and users of data are no longer based within specialist data centres. Also, data handling, management and storage requirements are becoming much more complex. For example, originally someone's entire research project was to sequence the gene of interest or solve a protein structure and put it in the appropriate database. Now, finding the sequence or protein structure is often only a small component of the project, and the sequence/protein structure will only be made public at the time of publication. This means more data needs to be stored locally and for a longer period of time with more focus on the analysis and applications. Complete finished data is becoming rare, as people are often only interested in small sections of the data, which leads to questions over quality assurance, and therefore requires more detailed metadata.

The need for interoperability between different types of datasets and metadata was also discussed. This is particularly important if people are to start pulling different datasets together in innovative ways. The need for easy ways to identify databases and search them was also brought up. In line with this is the need for new algorithms and tools to analyse the various datasets in new ways, as the old tools/algorithms will often only answer old questions, and others are not necessarily scalable.

Other issues that were brought up include the need for recognition, how to decide between local and central storage, how to decide what to keep, and how to define data. Many of these issues were highlighted in our past report on the data sharing practices of researchers, and require guidance from research funders and others within the community. There was also a lengthy discussion over image data, which would make this blog even longer, so I will talk about that another time.

