Discovery and Recognition of Formula Concepts using Machine Learning

Published

Author(s)

Howard Cohl, Bela Gipp, Moritz Schubotz, Philipp Scharpf

Abstract

Citation-based Information Retrieval (IR) methods for scientific documents have proven effective for applications such as plagiarism detection and literature recommender systems in academic disciplines that use many references. In science, technology, engineering, and mathematics, researchers often employ mathematical concepts through formula notation to refer to prior knowledge. Our long-term goal is to generalize citation-based IR methods and apply the generalized method to both classical references and mathematical concepts. In this paper, we suggest how mathematical formulas could be cited and define a Formula Concept Retrieval task with two subtasks: Formula Concept Discovery (FCD) and Formula Concept Recognition (FCR). While FCD aims at the definition and exploration of a 'Formula Concept' that names bundled equivalent representations of a formula, FCR is designed to match a given formula to a previously assigned unique mathematical concept identifier. We present machine learning-based approaches to address the FCD and FCR tasks. We then evaluate these approaches on a standardized test collection (NTCIR arXiv dataset). Our FCD approach yields a precision of 68% for retrieving equivalent representations of frequent formulas and a recall of 72% for extracting the formula name from the surrounding text. FCD and FCR enable the citation of formulas within mathematical documents and facilitate semantic search and question answering, as well as document similarity assessments for plagiarism detection or recommender systems.
Citation
Scientometrics
Volume
128

Keywords

Mathematical Information Retrieval, Machine Learning, Wikidata

Citation

Cohl, H., Gipp, B., Schubotz, M. and Scharpf, P. (2023), Discovery and Recognition of Formula Concepts using Machine Learning, Scientometrics, [online], https://doi.org/10.1007/s11192-023-04667-9 (Accessed October 31, 2024)

Created July 13, 2023, Updated March 27, 2024