Initially, the Comparative Mammalian Proteome Aggregator Resource (CoMPARe) Program will generate proteomic data from the sera of 25 different species that currently have genome annotations. The resulting data will be publicly available, and a data tool will be developed to humanize protein identifications between species to facilitate direct comparisons. Once the data have been acquired and humanized, work will begin on a web portal that allows easy species-species comparisons for both expert and non-expert end users. This will create the foundation for the next phase, which will generate data on 50 additional mammalian species as genome annotations become available. This project will serve as the template for future comparative proteomic projects evaluating plasma and specific tissues of interest.
State-of-the-art biomolecular analysis is no longer limited to model organisms and is becoming routine in non-model organisms. Major drivers of this emerging bioanalytical capacity include the increasing accessibility and quality of sequenced genomes and the availability of high-resolution, fast duty-cycle mass spectrometers for proteomic analysis. In recent years, sequencing costs have decreased dramatically while the number of published eukaryotic genomes has increased. Moreover, there are ongoing projects to sequence over 9,000 species (the G10K and Earth BioGenome Projects). Despite this, many organisms currently lack genome annotations. NIST will assist or lead the development of high-quality genome assemblies and gene annotations together with partners, industry, and other agencies (such as our efforts related to the Atlantic bottlenose dolphin).
Using comparative proteomics to evaluate a large, diverse group of non-model organisms creates unique and exciting research questions, opportunities, and downstream products. Developing high-quality proteomic data for each species requires obtaining quality samples, building genomic databases, acquiring data on cutting-edge mass spectrometers, and organizing the data into an easily accessible and usable product. To make these results broadly applicable, blood will be used initially. Because blood is routinely collected for health monitoring, it is a readily available and rich resource. Blood also has the advantage of being proximate to most tissues while remaining relatively stable in many of its major constituents. Further, blood protein constituents cannot be predicted from mRNA transcript abundance. Using modern proteomic analysis of non-depleted serum or plasma, it is possible to identify and relatively quantify 100 to 500 proteins with two hours of instrument time. To take advantage of emerging proteomics techniques (such as data-independent acquisition), which may not be suitable for non-model organisms, NIST will work alongside software and algorithm developers to ensure that these platforms can be used beyond human data sets. To compile and compare data across species, data tools will be developed to enable comparisons of homologous proteins. These data tools and data sets will be made publicly available on ProteomeXchange and MassIVE, as well as through a web portal that aids retrieval and species-species or protein-protein comparisons. This tool will allow researchers and comparative medicine departments to determine the suitability of a comparative model beyond the presence or absence of a specific gene, and will allow phenotypic background to inform research choices.
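As a rough illustration of what comparing homologous proteins across species might involve, the Python sketch below re-keys each species' protein quantities by a shared human gene symbol and then lines the species up side by side. It is only a sketch under stated assumptions, not the CoMPARe data tool: the ortholog table, the species labels, the protein identifiers, and the abundance values are all hypothetical placeholders, and summing proteins that map to the same human symbol is just one possible design choice.

```python
from collections import defaultdict

# Hypothetical ortholog table (placeholder IDs, not real accessions):
# (species label, species-specific protein ID) -> human gene symbol.
ORTHOLOGS = {
    ("species_a", "A_0001"): "ALB",
    ("species_a", "A_0002"): "TF",
    ("species_b", "B_0001"): "ALB",
    ("species_b", "B_0002"): "APOA1",
}


def humanize(species, quantities, orthologs=ORTHOLOGS):
    """Re-key {protein ID: relative abundance} by human gene symbol.

    Proteins without a known human homolog are returned separately
    rather than silently dropped.
    """
    mapped = defaultdict(float)
    unmapped = []
    for protein_id, value in quantities.items():
        symbol = orthologs.get((species, protein_id))
        if symbol is None:
            unmapped.append(protein_id)
        else:
            # Assumption: proteins sharing a human symbol are summed.
            mapped[symbol] += value
    return dict(mapped), unmapped


def compare(humanized_by_species):
    """Build {human symbol: {species: abundance}} across all species."""
    table = defaultdict(dict)
    for species, mapped in humanized_by_species.items():
        for symbol, value in mapped.items():
            table[symbol][species] = value
    return dict(table)


# Toy input values, invented purely for illustration.
a_mapped, a_unmapped = humanize(
    "species_a", {"A_0001": 0.5, "A_0002": 0.25, "A_9999": 0.125}
)
b_mapped, _ = humanize("species_b", {"B_0001": 0.75, "B_0002": 0.125})

table = compare({"species_a": a_mapped, "species_b": b_mapped})
# Symbols quantified in both species can be compared directly.
shared = sorted(sym for sym, row in table.items() if len(row) == 2)
print(shared, table["ALB"], a_unmapped)
```

In this toy case only the symbol found in both species is directly comparable, while the unmapped identifier is reported so that gaps in the homolog mapping stay visible.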