An Infrastructure for Curating, Querying, and Augmenting Document Data: COVID-19 Case Study

Author(s)

Eswaran Subrahmanian, Guillaume Sousa Amaral, Talapady N. Bhat, Mary C. Brady, Kevin G. Brady, Jacob Collard, Sarra Chouder, Philippe Dessauw, Alden A. Dima, John T. Elliott, Walid Keyrouz, Nicolas Lelouche, Benjamin Long, Rachael Sexton, Ram D. Sriram

Abstract

With the advent of the COVID-19 pandemic, there was the hope that data science approaches could help discover means for understanding, mitigating, and treating the disease. This manifested itself in the creation of the COVID-19 Open Research Dataset (CORD-19), which aggregated COVID-19-related scientific literature for use by the data mining community. As a group of interdisciplinary informatics researchers at NIST, we embarked on an effort to use our experience and previously developed systems to explore whether we could enhance the CORD-19 dataset and facilitate its use. This effort produced a prototype scientific informatics system that extended data curation, data repository, resource registry, term extraction, and indexing systems, and resulted in a repackaging of CORD-19 as a Python data package. This paper documents our efforts, provides lessons learned, and proposes a general architecture for these types of systems.

Citation

NIST Interagency/Internal Report (NISTIR) - 8479

Report Number

8479

Keywords

CORD-19, COVID-19, curation, infrastructure, informatics, search.

Citation

Subrahmanian, E., Sousa Amaral, G., Bhat, T., Brady, M., Brady, K., Collard, J., Chouder, S., Dessauw, P., Dima, A., Elliott, J., Keyrouz, W., Lelouche, N., Long, B., Sexton, R. and Sriram, R. (2023), An Infrastructure for Curating, Querying, and Augmenting Document Data: COVID-19 Case Study, NIST Interagency/Internal Report (NISTIR), National Institute of Standards and Technology, Gaithersburg, MD, [online], https://doi.org/10.6028/NIST.IR.8479, https://tsapps.nist.gov/publication/get_pdf.cfm?pub_id=936892 (Accessed November 20, 2024)

Created August 8, 2023