Theme

COVID-19 Preprints and Their Publishing Rate: An Improved Method

Resource type
Report
Author/contributor
Title
COVID-19 Preprints and Their Publishing Rate: An Improved Method
Abstract
Context: As the COVID-19 pandemic persists around the world, the scientific community continues to produce and circulate knowledge on the deadly disease at an unprecedented rate. During the early stage of the pandemic, preprints represented nearly 40% of the English-language COVID-19 scientific corpus (6,000+ preprints | 16,000+ articles). As of mid-August 2020, that proportion had dropped to around 28% (13,000+ preprints | 49,000+ articles). Nevertheless, preprint servers remain a key engine in the efficient dissemination of scientific work on this infectious disease. But, given the 'uncertified' nature of the scientific manuscripts curated on preprint repositories, their integration into the global ecosystem of scientific communication creates serious tensions. This is especially the case for biomedical knowledge, since the dissemination of bad science can have widespread societal consequences.
Scope: In this paper, I propose a robust method that allows the repeated monitoring and measuring of the publication rate of COVID-19 preprints. I also introduce a new API called Upload-or-Publish. It is a free micro-API service that enables a client to query a specific preprint manuscript's publication status and associated metadata using a unique ID. The beta version is currently deployed and working.
Data: I use the COVID-19 Open Research Dataset (CORD-19) to calculate the conversion rate of the COVID-19 preprint corpus to peer-reviewed articles. The CORD-19 dataset includes 10,454 preprints from arXiv, bioRxiv, and medRxiv.
Methods: I use conditional fuzzy logic to link preprints with their published counterparts. My approach is an important departure from previous studies, which rely exclusively on the bio/medRxiv API to ascertain preprints' publication status. This reliance is problematic, since the level of false negatives in bio/medRxiv non-COVID-19 metadata could be as high as 37%. My analysis suggests that the bio/medRxiv API accurately captures only about 50% of its published preprints. My improved method achieved an F1-score of 0.96.
Findings: My analysis reveals that 19.6% (n=2048) of the COVID-19 preprint manuscripts in the CORD-19 dataset uploaded to arXiv, bioRxiv, and medRxiv between January and early September 2020 were published in peer-reviewed venues. Compared with the most recent measure available, this represents a two-fold increase over a period of two months. My discussion reviews and theorizes on potential explanations for the overall low conversion rate of COVID-19 preprints.
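The abstract names conditional fuzzy logic as the way preprints are linked to published articles but does not spell out the rules. The following is only a minimal sketch of what such a rule could look like, not the author's implementation: the use of Python's `difflib`, the two thresholds (`strict`, `loose`), and the record fields `title` and `first_author` are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def normalize(title: str) -> str:
    """Lowercase the title and replace every non-alphanumeric run with one space."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in title.lower())
    return " ".join(cleaned.split())


def title_similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized titles."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def is_match(preprint: dict, article: dict,
             strict: float = 0.95, loose: float = 0.80) -> bool:
    """Hypothetical conditional rule (thresholds are illustrative assumptions).

    A near-identical title is accepted on its own; a merely similar title
    is accepted only if the first author's surname also agrees.
    """
    score = title_similarity(preprint["title"], article["title"])
    if score >= strict:
        return True
    same_author = preprint["first_author"].lower() == article["first_author"].lower()
    return score >= loose and same_author


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def conversion_rate(published: int, total: int) -> float:
    """Share of preprints with a published counterpart, as a percentage."""
    return 100 * published / total
```

With the figures reported in the abstract, `conversion_rate(2048, 10454)` rounds to 19.6. The second condition in `is_match` is what makes the rule conditional: a lower title similarity is tolerated only when an independent field corroborates the link, which is one plausible way to limit false positives from retitled manuscripts.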
Date
2020-10-10
Pages
2020.09.04.20188771
Language
en
Short Title
COVID-19 Preprints and Their Publishing Rate
Accessed
05/10/2021, 18:36
Library Catalogue
medRxiv
Rights
© 2020, Posted by Cold Spring Harbor Laboratory. This pre-print is available under a Creative Commons License (Attribution-NoDerivs 4.0 International), CC BY-ND 4.0, as described at http://creativecommons.org/licenses/by-nd/4.0/
Extra
Company: Cold Spring Harbor Laboratory Press
DOI: 10.1101/2020.09.04.20188771
Distributor: Cold Spring Harbor Laboratory Press
Label: Cold Spring Harbor Laboratory Press
Type: article
Citation
Lachapelle, F. (2020). COVID-19 Preprints and Their Publishing Rate: An Improved Method (p. 2020.09.04.20188771). https://doi.org/10.1101/2020.09.04.20188771