Ontologies and Interoperability in Evolutionary Comparative Analysis

Summary:

Many computer-based inferences in contemporary biology are comparative and would benefit from the rigorous and flexible approach of evolutionary comparative analysis (ECA), in which similarities and differences between evolved things are treated explicitly as outcomes of an evolutionary process. A practical example would be inferring regulatory sites in the human genome as slow-evolving non-coding regions in a multi-species genome comparison. Broader application of such approaches is hampered by lack of an interoperability infrastructure. We are working with domain experts to develop formal ontologies and other artifacts, such as file formats, data standards, and workflow environments, and to apply them to improving interoperability in this domain.
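The regulatory-site example above can be illustrated with a deliberately simplified sketch. It scores each column of a toy multi-species alignment by raw identity and reports fully conserved windows. The sequences, function names, window size, and threshold are all assumptions made for illustration. Raw identity also ignores the phylogeny, so this is only a crude stand-in for a real evolutionary comparative analysis, which would estimate evolutionary rates on a tree.

```python
# Toy alignment: rows are species, columns are aligned sites (hypothetical data).
alignment = {
    "human": "ACGTACGTTAGC",
    "mouse": "ACGTACGATCGC",
    "dog":   "ACGTACGCTGGC",
}


def column_identity(alignment):
    """Return, per column, the fraction of sequences sharing the most common base."""
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for i in range(length):
        column = [s[i] for s in seqs]
        most_common = max(column.count(base) for base in set(column))
        scores.append(most_common / len(column))
    return scores


def conserved_windows(scores, window=4, threshold=1.0):
    """Return start indices of windows whose mean identity meets the threshold."""
    return [
        start
        for start in range(len(scores) - window + 1)
        if sum(scores[start:start + window]) / window >= threshold
    ]


scores = column_identity(alignment)
print(conserved_windows(scores))
```

In this toy data the first seven columns are identical across all three species, so only windows lying entirely within that stretch are reported as candidate slow-evolving regions.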

Description:


Intended impact

Nearly all scientists who regularly use online resources on genes, proteins and genomes make use of comparative data to advance biomedical research. For instance, researchers often make useful inferences by comparing human genes (as well as proteins, reactions, interactions, pathways, behaviors and so on) to those of better-studied, experimentally tractable “model organisms” such as the mouse. In principle, robust and flexible ECA methods can be applied to a huge range of such comparative problems, but in practice, this is difficult. Our technologies will lower the barrier to applying ECA methods. Use of these technologies by service providers and scientific end-users will increase the value of comparative data by expanding the depth and breadth of inferences made from these data.

Objective

Facilitate data interchange both for end-users and for data providers, increase scientific re-use of available (pre-computed) comparative data sets, increase representation and re-use of ECA protocols, and facilitate validation, automation and scale-up of ECA approaches.

Goals

  • Develop a formal ontology for comparative analysis, and a data exchange file format
  • Evaluate and refine the ontology as a tool to support interoperability of file formats and data resources
  • Develop a minimal standard for representing protocols (MIAPA, Minimal Information for a Phylogenetic Analysis)
  • Develop a language for representing analysis workflows
  • Evaluate and refine the workflow language using MIAPA reports and captured knowledge of workflows
  • Develop a semantics-based system for representing, designing, validating, re-using, and executing workflow plans

Research activities and technical approach

Efforts began in 2006 with organizing a group of domain experts and developing a proposal for what was to become the NESCent Evolutionary Informatics (“evoinfo”) working group. NESCent is an NSF-supported center focusing on synthetic and infrastructure-building activities. The evoinfo working group has met three times since 2007. After establishing priorities, Dr. Vos and others began to focus on an XML-based data file format, nexml. Others, led by Dr. Stoltzfus, focused on developing CDAO (Comparative Data Analysis Ontology). The group and its members also developed a Concept Glossary, a list of use-cases, a guide to supporting the existing NEXUS file format standard, and a White Paper on a MIAPA (Minimal Information for a Phylogenetic Analysis) standard.

The fourth meeting of the group, in March 2009, will be a “Data Resources Interoperability Hackathon” in which we work with data providers (from the scientific community) on using CDAO and nexml to exchange data.

In pursuit of the other project goals, a team led by Dr. Stoltzfus has begun a project to develop and apply a workflow system that applies advanced computer science technologies, including logical denotations for semantic transformation, a Domain-Specific Language (DSL), and automated planning technology for discovering and configuring workflows.
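The idea of automated planning for discovering workflows can be sketched as a search over tools described by the type of data they consume and produce. The sketch below is illustrative only and is not the project's system: the tool names, data-type labels, and the `plan_workflow` function are hypothetical, and each tool is assumed to take a single input, which real analysis steps often do not.

```python
from collections import deque

# Hypothetical tool catalogue: (tool name, input data type, output data type).
TOOLS = [
    ("align_sequences", "unaligned_sequences", "alignment"),
    ("infer_tree", "alignment", "tree"),
    ("summarize_tree", "tree", "tree_statistics"),
    ("convert_nexus_to_nexml", "nexus_file", "nexml_file"),
]


def plan_workflow(start, goal, tools=TOOLS):
    """Breadth-first search for the shortest tool chain turning `start` into `goal`.

    Returns a list of tool names, or None if no chain exists.
    """
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        data_type, steps = queue.popleft()
        if data_type == goal:
            return steps
        for name, needs, makes in tools:
            if needs == data_type and makes not in seen:
                seen.add(makes)
                queue.append((makes, steps + [name]))
    return None


print(plan_workflow("unaligned_sequences", "tree_statistics"))
print(plan_workflow("tree", "alignment"))
```

A semantics-based planner would replace the plain string labels used here with ontology terms, so that a tool's inputs and outputs could be matched by meaning rather than by exact name.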

Major Accomplishments:

  • Organized and co-led NESCent Evolutionary Informatics working group (2007 on)

  • Recruited experts from domain of phylogenetic analysis

  • Assessed interoperability needs and developed proposal

  • Met with domain experts to develop interoperability priorities

  • Spawned effort to develop ontology (www.evolutionaryontology.org/)

  • Spawned effort to develop data format (www.nexml.org)

  • Maintained extensive wiki documentation (www.nescent.org/wg_evoinfo)

  • Co-led team implementing CDAO ontology (Prosdocimi, et al., in review)

  • Participated in, and helped to organize, 2006 NESCent “Phyloinformatics Hackathon” (Lapp, et al., 2007)

  • Participated in organizing the upcoming 2009 NESCent “Data Resource Interoperability Hackathon”

Start Date:

June 1, 2006

End Date:

ongoing

Lead Organizational Unit:

MML

Source of Extramural Funding:

NESCent (www.nescent.org), an NSF-funded center, provides meeting support for the Evolutionary Informatics Working Group

Customers/Contributors/Collaborators:

Customers of products: Because our interoperability tools are still in development, we do not yet have end-user customers. However, some end-user tools have already incorporated support for nexml (see www.nexml.org for a list).

Contributors or collaborators:

  • NSF National Evolutionary Synthesis Center (Durham, NC)
  • CDAO development team (New Mexico State Univ., Las Cruces; Univ. Texas at Dallas; Univ. Strasbourg, France)
  • Participants of the Evolutionary Informatics Working Group (UC Davis Genome Center; U. Washington, Seattle; Univ. Kansas; Antiviral Research Center, UCSD; Center for Evolutionary Functional Genomics, ASU; GlaxoSmithKline; Univ. Arizona; Univ. British Columbia; Hunter College; Univ. Edinburgh; Florida State Univ.; Univ. Ottawa; Burnham Institute for Medical Research; New Mexico State Univ., Las Cruces; Peabody Museum of Natural History, Yale Univ.; Univ. Texas at Dallas; Univ. Strasbourg, France)

Associated Products:

Comparative Data Analysis Ontology (CDAO), an OWL-DL ontology for evolutionary comparative analysis (www.evolutionaryontology.org/) (Prosdocimi, F., B. Chisham, E. Pontelli, J.D. Thompson, and A. Stoltzfus. 2008).

nexml: an XML data exchange format for phylogenetic analysis (www.nexml.org) (Vos, R. 2007).

Supporting NEXUS. A guide for developers to implement and to improve support for the NEXUS file format (https://www.nescent.org/wg_phyloinformatics/Supporting_NEXUS) (Stoltzfus, A., R. Vos, M. Holder, H. Lapp, S. Kosakovsky Pond, and C. Zmasek. 2007).

MIAPA: Developing a minimal reporting standard for phylogenetics (Stoltzfus, 2008), a white paper written for the National Evolutionary Synthesis Center, available at https://www.nescent.org/wg_evoinfo/MIAPA_WhitePaper.

Associated publications/reports

Prosdocimi, F., B. Chisham, E. Pontelli, J.D. Thompson, and A. Stoltzfus, Initial Implementation of a Comparative Data Analysis Ontology. BMC Evol Biol, in review.

Lapp, H., S. Bala, J.P. Balhoff, A. Bouck, N. Goto, M. Holder, R. Hollan, A. Holloway, T. Katayama, P.O. Lewis, A. Mackey, B.I. Osborne, W.H. Piel, S.L. Kosakovsky Pond, A. Poon, W.G. Qiu, J.E. Stajich, A. Stoltzfus, T. Thierer, A.J. Vilella, R. Vos, C.M. Zmasek, D. Zwickl, and T.J. Vision, The 2006 NESCent Phyloinformatics Hackathon: A field report. Evolutionary Bioinformatics, 2007. 3: p. 357-366.

Hladish, T., V. Gopalan, C. Liang, W. Qiu, P. Yang, and A. Stoltzfus, Bio::NEXUS: a Perl API for the NEXUS format for comparative biological data. BMC Bioinformatics, 2007. 8: p. 191.

Contact

Dr. Arlin Stoltzfus
arlin.stoltzfus@nist.gov
240-314-6208