The Large Hadron Collider (LHC), the world's most powerful particle accelerator, is used to test theoretical predictions in particle physics. It generates an estimated 900 petabytes of data, which are stored at more than 170 sites in the U.S. and around the world. The human genome sequencing initiative is estimated to have amassed about an exabyte of data across institutions, labs, and industry, and similar amounts of data exist for the genome sequencing of plants, animals, viruses, and bacteria. These and other large data science programs face unprecedented challenges in managing their data with limited computing, storage, and network resources.
To help, university and NIST researchers designed and implemented a new, field-tested system that efficiently distributes, caches, and provides access to data for the Large Hadron Collider’s high energy physics (HEP) experiments and other major science programs. The system, called Named Data Networking (NDN) for Data-Intensive Science Experiments (N-DISE), is described in a paper published in the Proceedings of the 9th ACM Conference on Information-Centric Networking, which took place in September 2022.
First, the researchers developed a Named Data Networking scheme for naming Large Hadron Collider data. They then built consumer and producer applications and implemented algorithms for fast packet forwarding and caching, which they integrated with the Named Data Networking router developed by NIST. The result was N-DISE, which the researchers evaluated on a wide-area network testbed spanning five sites across the U.S.
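The article does not spell out the N-DISE naming scheme or its forwarding and caching algorithms, so the following is only a minimal Python sketch of the general Named Data Networking idea: data is requested by a hierarchical name rather than a host address, and a router can answer repeat requests from its own cache instead of going back to the producer. The name prefix /ndn/example/lhc/..., the seg= segment component, and the ContentStore, Router, and producer names are hypothetical illustrations, not N-DISE's actual scheme or the API of NIST's router.

```python
from collections import OrderedDict


class ContentStore:
    """Hypothetical LRU cache keyed by a full NDN-style name (tuple of components)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._store = OrderedDict()

    def get(self, name):
        if name not in self._store:
            return None
        self._store.move_to_end(name)  # mark as most recently used
        return self._store[name]

    def put(self, name, data):
        self._store[name] = data
        self._store.move_to_end(name)
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict the least recently used entry


def parse_name(uri):
    """Split an NDN-style URI such as '/a/b/c' into a tuple of name components."""
    return tuple(c for c in uri.split("/") if c)


def segment_names(dataset_uri, num_segments):
    """Build one name per segment of a file (the 'seg=' component is hypothetical)."""
    base = parse_name(dataset_uri)
    return [base + (f"seg={i}",) for i in range(num_segments)]


def producer(name):
    """Hypothetical producer: returns the data packet for a requested name."""
    return ("payload for " + "/".join(name)).encode()


class Router:
    """Hypothetical router: serves from its cache, otherwise fetches upstream."""

    def __init__(self, upstream, capacity):
        self.upstream = upstream
        self.cs = ContentStore(capacity)
        self.upstream_fetches = 0

    def express_interest(self, name):
        data = self.cs.get(name)
        if data is None:  # cache miss: forward the request toward the producer
            self.upstream_fetches += 1
            data = self.upstream(name)
            self.cs.put(name, data)
        return data


router = Router(producer, capacity=8)
names = segment_names("/ndn/example/lhc/dataset-001/file-42", 4)
for _ in range(2):  # two consumers request the same four segments
    for n in names:
        router.express_interest(n)
print(router.upstream_fetches)  # 4: the second consumer is served from the cache
```

In this toy setup only the first request for each segment reaches the producer; the point is simply that naming the data itself is what lets any router along the path reuse a cached copy.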
The researchers reported that their tests showed strong performance in data retrieval, forwarding, and caching, and they believe the system has room for further performance gains. Planned enhancements include integrating the N-DISE data distribution network with the Large Hadron Collider network and applying it to a genomics use case.