Project research questions
Project research challenges
Data Quality
To have confidence in the knowledge gleaned from data and to use it for critical, informed decisions, it is essential to verify the quality of the data sources and to quantify the uncertainties that data quality introduces. In computational biology, data quality matters in two areas in particular: the quality of the measured images and the quality of the corresponding reference data (e.g., manual segmentations or cell colony labels).
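As a sketch of what a single image quality descriptor can look like, the function below computes the variance of a discrete Laplacian. This is a widely used sharpness (focus) measure: in-focus images with crisp edges give large values, while blurred or featureless images give values near zero. It is an illustrative assumption, not a descriptor named by the project. The function name `laplacian_variance` is hypothetical, and images are plain nested lists so the example runs without third-party libraries; a real pipeline would more likely use NumPy or SciPy.

```python
from statistics import pvariance


def laplacian_variance(image: list[list[float]]) -> float:
    """Hypothetical sharpness descriptor: variance of the 4-neighbour Laplacian.

    Higher values mean more high-frequency content (sharper focus);
    uniform or smoothly varying images give values at or near zero.
    """
    rows, cols = len(image), len(image[0])
    if rows < 3 or cols < 3:
        raise ValueError("image must be at least 3x3 pixels")
    responses = []
    # Only interior pixels are used, so every pixel has four neighbours.
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            lap = (image[r - 1][c] + image[r + 1][c]
                   + image[r][c - 1] + image[r][c + 1]
                   - 4 * image[r][c])
            responses.append(lap)
    # Population variance of the Laplacian responses.
    return float(pvariance(responses))


if __name__ == "__main__":
    # Synthetic 8x8 test images: a high-contrast checkerboard and a flat field.
    checkerboard = [[float((r + c) % 2) for c in range(8)] for r in range(8)]
    flat = [[5.0] * 8 for _ in range(8)]
    print("checkerboard:", laplacian_variance(checkerboard))
    print("flat field:  ", laplacian_variance(flat))
```

Restricting the computation to interior pixels keeps the sketch short; how to treat image borders is one of the choices a production descriptor would have to make explicitly.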
The objective of the data quality component in the CS-Bio-Met project is to collect and develop a repository of image quality descriptors and to analyze their sensitivity, with the goal of using them to improve data quality upstream (microscope conditions, sample preparation) and downstream (recommending segmentation methods, predicting segmentation accuracy). Automated methods for assessing image quality improve the quality of the biological analysis. Automated image analysis makes the results more objective; however, the quality of the image directly affects the accuracy of the image analysis (segmentation) and has been shown to affect the accuracy of research findings such as drug effectiveness and optimal dosage. This is especially true in applications such as High Content Screening (HCS), an automated microscopy technique that enables the evaluation of spatial and temporal effects on cells for drug discovery and other applications.
In addition to the quality of the measured images, inconsistent reference labels for those images can degrade the ability to derive biologically meaningful classes or clusters. Inconsistent labels indicate that the experts did not agree when assigning the reference label, which makes it difficult to create rules for classification or clustering. The project aims to understand the impact of reference data quality on clustering and classification uncertainty.
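Disagreement between experts can be quantified with an inter-rater agreement statistic. The sketch below uses Cohen's kappa, which compares the observed agreement between two annotators, p_o, with the agreement expected by chance from each annotator's own label frequencies, p_e: kappa = (p_o - p_e) / (1 - p_e). The choice of Cohen's kappa is an assumption made for illustration (the project text does not name a metric), and the label names in the usage example are hypothetical.

```python
from collections import Counter
from collections.abc import Hashable, Sequence


def cohens_kappa(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Chance-corrected agreement between two annotators labelling the same items.

    Returns 1.0 for perfect agreement, 0.0 for agreement no better than
    chance, and negative values for systematic disagreement.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both experts.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: probability that both experts pick the same label
    # independently, estimated from each expert's own label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    p_expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if p_expected == 1.0:
        # Both experts used one identical label for every item: kappa is 0/0.
        return float("nan")
    return (p_observed - p_expected) / (1.0 - p_expected)


if __name__ == "__main__":
    # Hypothetical colony labels assigned by two experts to the same six images.
    expert_1 = ["type_1", "type_2", "type_1", "type_3", "type_1", "type_3"]
    expert_2 = ["type_1", "type_2", "type_3", "type_3", "type_1", "type_2"]
    print(f"Cohen's kappa: {cohens_kappa(expert_1, expert_2):.3f}")
```

For pixel-level reference data such as manual segmentations, an overlap measure (for example, the Dice coefficient between two experts' masks) plays the same role as kappa does for categorical labels.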