BioFederator: A data federation system for bioinformatics on the web
Abstract
A problem facing many bioinformatics researchers today is the aggregation and analysis of vast amounts of data produced by large-scale projects in laboratories around the world. The common approach is to deposit such data into centralized web-based repositories (e.g. NCBI, UCSC Genome Browser). However, the distributed nature of the data, its growth rate, and increasing collaborative needs pose real challenges that call for novel decentralized web architectures. The BioFederator is a web services-based data federation architecture for bioinformatics applications. Through collaborations with bioinformatics researchers, we identify several domain-specific data federation challenges and needs. The BioFederator addresses these challenges with an architecture that incorporates a series of utility services, which handle issues such as automatic workflow composition, domain semantics, and the distributed nature of the data. It also incorporates a series of data-oriented services that facilitate the actual integration of data. The BioFederator is deployed in a grid environment over the web. The proposed design, services, and usage scenarios are discussed in detail. We demonstrate how our architecture can be leveraged for a real-world bioinformatics problem involving tissue specificity of gene expression. Copyright © 2007, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.