Abstract
Large-scale measurements linking genetic background to biological function have driven a need for models that can incorporate these data for reliable predictions and insight into the underlying biophysical system. Recent modeling efforts, however, prioritize predictive accuracy at the expense of model interpretability. Here, we present LANTERN (landscape interpretable nonparametric model, https://github.com/usnistgov/lantern), a hierarchical Bayesian model that distills genotype–phenotype landscape (GPL) measurements into a low-dimensional feature space that represents the fundamental biological mechanisms of the system while also enabling straightforward, explainable predictions. Across a benchmark of large-scale datasets, LANTERN equals or outperforms all alternative approaches, including deep neural networks. LANTERN furthermore extracts useful insights of the landscape, including its inherent dimensionality, a latent space of additive mutational effects, and metrics of landscape structure. LANTERN facilitates straightforward discovery of fundamental mechanisms in GPLs, while also reliably extrapolating to unexplored regions of genotypic space.