ParquetDB: A Lightweight Database System Leveraging Apache Parquet for Efficient Data Storage and Retrieval

Published

Author(s)

Logan Lang, Eduardo Hernandez, Kamal Choudhary, Aldo Romero

Abstract

Traditional data storage formats and databases often introduce complexities and inefficiencies that hinder rapid iteration and adaptability. To address these challenges, we introduce ParquetDB, a Python-based database framework that leverages the Parquet file format's optimized columnar storage. ParquetDB offers efficient serialization and deserialization, native support for complex and nested data types, reduced dependency on indexing through predicate pushdown filtering, and enhanced portability due to its file-based storage system. Comprehensive benchmarks demonstrate that ParquetDB outperforms traditional databases like SQLite and MongoDB in managing large volumes of data, especially when using data formats compatible with PyArrow. We validate ParquetDB's practical utility by applying it to the Alexandria 3D Materials Database, efficiently handling approximately 4.8 million complex and nested records. By addressing the inherent limitations of existing data storage systems and continuously evolving to meet future demands, ParquetDB has the potential to significantly streamline data management processes and accelerate research development in data-driven fields.
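
The abstract credits predicate pushdown filtering with reducing the need for indexes. As a rough illustration of the idea only, the sketch below models row groups that carry per-column min/max statistics and skips any group whose statistics rule out a match. It is a toy model in plain Python, not ParquetDB's or PyArrow's implementation; the `RowGroup` class, the `scan_greater_than` function, and the `id` and `energy` fields are hypothetical names introduced for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class RowGroup:
    """Toy stand-in for a Parquet row group: columnar data plus min/max stats."""
    columns: Dict[str, List[float]]
    stats: Dict[str, Tuple[float, float]] = field(init=False)

    def __post_init__(self) -> None:
        # Statistics are computed once at write time, as a file footer would store them.
        self.stats = {name: (min(vals), max(vals)) for name, vals in self.columns.items()}


def scan_greater_than(row_groups, column, threshold, project):
    """Return projected rows where `column` > `threshold`.

    Row groups whose stored maximum cannot satisfy the predicate are skipped
    without scanning their values. Also returns how many groups were scanned.
    """
    rows = []
    groups_read = 0
    for group in row_groups:
        _, col_max = group.stats[column]
        if col_max <= threshold:
            continue  # pushdown: the statistics prove no row in this group matches
        groups_read += 1
        for i, value in enumerate(group.columns[column]):
            if value > threshold:
                # Column projection: only the requested columns are materialized.
                rows.append({name: group.columns[name][i] for name in project})
    return rows, groups_read


# Hypothetical data: three row groups with an id and an energy column.
groups = [
    RowGroup({"id": [1, 2, 3], "energy": [-1.0, -0.5, 0.2]}),
    RowGroup({"id": [4, 5, 6], "energy": [1.5, 2.5, 0.9]}),
    RowGroup({"id": [7, 8], "energy": [-3.0, -2.0]}),
]

rows, groups_read = scan_greater_than(groups, "energy", 1.0, project=["id"])
print(rows, groups_read)
```

In this model only one of the three row groups is scanned, which is the effect that lets a columnar file answer selective queries without a separate index. Real Parquet readers apply the same reasoning to statistics stored in the file metadata.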
Citation
npj Computational Materials

Citation

Lang, L., Hernandez, E., Choudhary, K. and Romero, A. (2025), ParquetDB: A Lightweight Database System Leveraging Apache Parquet for Efficient Data Storage and Retrieval, npj Computational Materials (Accessed April 12, 2025)

Created February 7, 2025, Updated March 5, 2025