The group’s work falls into three main categories: (1) Infrastructure or Platforms for Science-Oriented Analytics, (2) Advanced Computational Techniques and Algorithms, and (3) Foundational Capabilities. Together, these contribute to developing computationally enabled measurements with trust in computing and support for high-throughput instruments built in by design.
Infrastructure or Platforms for Science-Oriented Analytics
- CDCS (Configurable Data Curation System) focuses on data registries, the curation of document-oriented data, and the provision of persistent IDs. It provides web interfaces for retrieving and querying data, including text-oriented search, and is being augmented with advanced tools, inspired by Natural Language Processing (NLP), to provide semantic search.
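The semantic-search idea can be illustrated by ranking curated records by vector similarity between a query and each record's text. The sketch below is a toy illustration only (the record IDs and texts are made up): it uses term-frequency vectors and cosine similarity, whereas a production system would use learned embeddings from an NLP model such as BERT.

```python
import math
from collections import Counter

def embed(text):
    # Toy "embedding": a term-frequency vector over lowercase tokens.
    # A real semantic-search system would use a learned model (e.g., BERT).
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term-frequency vectors.
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = math.sqrt(sum(v * v for v in a.values())) * \
          math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

# Hypothetical curated records, keyed by made-up persistent IDs.
docs = {
    "rec-001": "optical microscopy image of a cell colony",
    "rec-002": "neutron imaging of a fuel cell stack",
    "rec-003": "curated metadata for additive manufacturing builds",
}

def search(query, docs):
    # Return record IDs ranked by similarity to the query, best first.
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(docs[d])), reverse=True)

print(search("microscopy image", docs))  # best match first
```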
- WIPP (Web Image Processing Pipeline) focuses on image analytics over terabyte-sized image collections running on distributed computational hardware (clusters and clouds). It provides web interfaces for managing and viewing images or subsets of images, and for the traceable and reproducible processing of images via workflows of software containers from a WIPP registry.
Advanced Computational Techniques and Algorithms
- Image Analytics—the group is developing approaches that combine conventional feature engineering with techniques rooted in Artificial Intelligence/Deep Learning (AI/DL) for analyzing a variety of image types: optical microscopy, electron microscopy, Cryogenic Electron Microscopy (Cryo-EM) images, neutron images, etc. Several of these image types go beyond 3D by adding a time dimension (T) or multiple channels (C) and can be very large (approaching 1 TB).
- Text Analytics—the group is applying Natural Language Processing (NLP) techniques and language models (e.g., BERT) to analyze curated scientific publications and answer more sophisticated queries than traditional Information Retrieval (IR) systems can. The group is also helping to define an emerging subdomain of NLP, Technical Language Processing (TLP), which aims to tackle text-related problems in technical domains with limited data availability (e.g., maintenance logs).
- Algorithmic Acceleration—the group is continuing its development of specialized algorithms with reduced operation counts in areas including Monte Carlo sampling for Molecular Dynamics, mixed- and reduced-precision computation, and stochastic algorithms.
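The trade-off behind mixed-precision computation can be shown with a small, hypothetical example (not the group's actual algorithms): accumulating a long float32 sum in a float32 register loses accuracy, while keeping the data in float32 but accumulating in float64 recovers it at little extra cost.

```python
import numpy as np

# One million copies of 0.1 stored in single precision.
x = np.full(1_000_000, 0.1, dtype=np.float32)

# Naive single-precision accumulation: every addition rounds to float32,
# and the error grows as the running sum dwarfs each new term.
s32 = np.float32(0.0)
for v in x:
    s32 += v

# Mixed precision: the data stays in float32, but the accumulator is
# float64, so per-step rounding error is far smaller.
s64 = float(x.sum(dtype=np.float64))

# The exact sum of the stored float32 values is ~100000.0015.
print(s32, s64)  # s32 drifts far from 100000; s64 stays close
```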
Foundational Capabilities
- Artificial Intelligence/Deep Learning for Imaging and NLP—most group members have become quite proficient in the use of DL tools. The group now uses DL approaches as a foundational building block to solve problems in multiple domains: imaging across multiple modalities, text, specialized signal processing, and computer security (trojan detection). Furthermore, the group is collaborating with groups in several NIST OUs (EL, MML, NCNR) to apply AI/DL techniques to OU-specific problems, has made code available to NIST researchers for automating training on AI-oriented hardware resources at NIST, and has given AI-related presentations to NIST researchers.
The group is administering a public competition to detect Trojans (hidden classes) in AI Deep Learning models (Neural Networks) on behalf of IARPA.
- Scalability & Performance—the group continues to extend its work on Program Execution Models for obtaining high performance in a range of applications. This work identified Data Flow Graphs as a promising execution model that makes it easy to take advantage of accelerators (e.g., GPUs). The group released Hedgehog, a library and runtime system for implementing Multi-threaded Asynchronous Data Flow Graphs on high-end single compute nodes, along with FastLoader, a companion library for the multithreaded asynchronous reading of large objects from files (e.g., very large images), to simplify the development of performance-oriented applications. The group has used this execution model to develop performance-oriented applications (e.g., analysis of very large microscopy images [100K x 50K pixels]).
At a conceptual level, the group is cooperating with a University of Utah research team, led by Prof. Martin Berzins, to extend this programming model beyond a single compute node so that it applies to a cluster. The group is also exploring Ray Tracing as a programming model to accelerate and simplify particle transport simulations, which are of interest to multiple NIST OUs.
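The data-flow execution model can be sketched, at a much smaller scale, as stages connected by queues that run concurrently and forward results downstream as soon as they are produced. Hedgehog itself is a C++ library with its own API; the Python sketch below, with made-up stage functions, illustrates only the execution-model idea, not Hedgehog's interface.

```python
import queue
import threading

STOP = object()  # sentinel marking end-of-stream

def stage(fn, inbox, outbox):
    # Each stage runs in its own thread, consuming tokens as they arrive
    # and emitting results immediately (multithreaded, asynchronous).
    while True:
        item = inbox.get()
        if item is STOP:
            outbox.put(STOP)
            break
        outbox.put(fn(item))

def run_pipeline(items, *fns):
    # Build a linear data-flow graph: one queue between consecutive stages.
    qs = [queue.Queue() for _ in range(len(fns) + 1)]
    threads = [threading.Thread(target=stage, args=(fn, qs[i], qs[i + 1]))
               for i, fn in enumerate(fns)]
    for t in threads:
        t.start()
    for item in items:
        qs[0].put(item)
    qs[0].put(STOP)
    results = []
    while True:
        out = qs[-1].get()
        if out is STOP:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results

# Hypothetical two-stage pipeline, e.g., "load tile" then "process tile";
# both stages run concurrently as data streams through.
tiles = run_pipeline(range(5), lambda i: i * 10, lambda x: x + 1)
print(tiles)
```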
- Trustworthy Computing—the group is developing or extending approaches to enhance trust in computing in three areas: (a) numerical reproducibility—by associating a numerical uncertainty with a computed result; (b) explainable AI for OMICS problems—by combining simulations of neural networks, interactive visualizations of sequencing data, and perturbation-based metrics for AI models; (c) reproducible image analysis—by organizing imaging computations as reproducible workflows using containers and tracking data and result provenance.
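One way to illustrate the idea of attaching a numerical uncertainty to a computed result (a hypothetical sketch, not necessarily the group's actual method) is to re-evaluate a floating-point sum under several random operand orderings and report the spread of the results as a rough reproducibility bound:

```python
import random

def sum_with_uncertainty(values, trials=20, seed=0):
    # Evaluate the sum under several random operand orderings; the spread
    # of the results exposes order-dependent floating-point rounding.
    rng = random.Random(seed)
    results = []
    for _ in range(trials):
        perm = values[:]
        rng.shuffle(perm)
        s = 0.0
        for v in perm:
            s += v
        results.append(s)
    center = sum(results) / len(results)
    spread = max(results) - min(results)
    return center, spread

# A contrived, ill-conditioned input: the exact sum is 200, but adding
# 1.0 to a running sum near 1e16 is absorbed by rounding, so different
# orderings produce visibly different results.
vals = [1e16, 1.0, -1e16, 1.0] * 100
center, spread = sum_with_uncertainty(vals)
print(center, spread)
```

Reporting the result as `center ± spread` makes the computation's numerical sensitivity explicit instead of hiding it behind a single value.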