GeoDeepShovel: A platform for building scientific database from geoscience literature with AI assistance
Abstract
With the rapid development of big data science, the research paradigm in the field of geosciences has also begun to shift to big data-driven scientific discovery. Researchers need to read a huge amount of literature to locate, extract and aggregate relevant results and data that are published and stored in PDF format for building a scientific database to support the big data-driven discovery. In this paper, based on the findings of a study about how geoscientists annotate literature and extract and aggregate data, we proposed GeoDeepShovel, a publicly available AI-assisted data extraction system to support their needs. GeoDeepShovel leverages state-of-the-art neural network models to support researcher(s) easily and accurately annotate papers (in the PDF format) and extract data from tables, figures, maps, etc., in a human–AI collaboration manner. As a part of the Deep-Time Digital Earth (DDE) program, GeoDeepShovel has been deployed for 8 months, and there are already 400 users from 44 geoscience research teams within the DDE program using it to construct scientific databases on a daily basis, and more than 240 projects and 50,000 documents have been processed for building scientific databases.