High-end automated microscopes can acquire very large volumes of data, reaching into the terabyte range. Data volumes of this size impose their own constraints. Image collections from expensive experiments must be reused, and complex analytical tools must run on high-performance computing (HPC) resources to minimize waiting times for image processing and to enable data-driven discoveries. Imaging scientists therefore need computational solutions that make (a) HPC resources easy to use, (b) large image collections searchable and accessible, and (c) data processing tools reusable and interoperable, so that complex workflows of processing steps can be created.
To address these needs, we designed the Web Image Processing Pipeline (WIPP), a client-server system for constructing and executing computational workflows, monitoring executions, gathering provenance information, and delivering traceable results of computations. In addition, we introduced interoperable interfaces for containerized image processing tools so that they can be chained into workflows and registered as plugins in the WIPP system. We also deployed a registry of plugins so that users can perform a federated search over all registries to find plugins (processing tools) of interest.
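The idea of chaining containerized tools into a workflow can be illustrated with a minimal sketch. This is not the WIPP API or its plugin format: the `Step` class, the `step:<name>` reference convention, the image names, and the `execution_order` helper are all hypothetical, chosen only to show how declared inputs and outputs let a system derive a valid execution order.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass
class Step:
    """One workflow step backed by a containerized tool (hypothetical model)."""
    name: str
    image: str  # hypothetical container image reference
    # Maps a parameter name to a literal value or to "step:<name>",
    # meaning "use the output of that earlier step".
    inputs: dict = field(default_factory=dict)


def execution_order(steps):
    """Return step names in an order that respects step-to-step dependencies."""
    graph = {}
    for step in steps:
        graph[step.name] = {
            value.split(":", 1)[1]
            for value in step.inputs.values()
            if isinstance(value, str) and value.startswith("step:")
        }
    return list(TopologicalSorter(graph).static_order())


# Steps are listed out of order on purpose; all names are illustrative.
steps = [
    Step("pyramid", "example/pyramid:1.0", {"input": "step:stitch"}),
    Step("flatfield", "example/flatfield:1.0", {"input": "raw-collection"}),
    Step("stitch", "example/stitch:1.0", {"input": "step:flatfield"}),
]
order = execution_order(steps)
print(order)
```

Because each step only declares what it consumes, steps can be added or swapped without rewriting the rest of the workflow, which is the property that interoperable plugin interfaces are meant to provide.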
The goal of this project is to lower the bar for users and algorithm developers to execute and share image analyses over terabyte-sized image collections in high-throughput and high-content microscopy imaging applications.
WIPP is designed as an expandable platform leveraging containerization and orchestration technologies such as Docker and Kubernetes and is meant to be deployed on a Kubernetes cluster (see Installation instructions).
The core of the WIPP framework consists of:
Additional tools can be added to the WIPP core for a more complete system:
The main accomplishments include:
Description:
Software pointers:
Source code:
WIPP plugin registry: