Be careful using data from online repositories

Post by Silurus »

Freitas, TMS, LFA Montag, P De Marco Jr & J Hortal, 2020. How reliable are species identifications in biodiversity big data? Evaluating the records of a neotropical fish family in online repositories, Systematics and Biodiversity, doi: 10.1080/14772000.2020.1730473

Abstract

The increase of free and open online biodiversity databases is of paramount importance for current research in ecology and evolution. However, little attention is paid to using updated taxonomy in these “biodiversity big data” repositories and the quality of their taxonomic information is often questioned. Here we assess how reliable is the current use of nomenclatural classification in the distributional information available from two biodiversity information networks: GBIF and the Brazilian SpeciesLink. We use as a study case the records of Auchenipteridae, a Neotropical fish family that has been subject to recent taxonomical reviews. A data filtering procedure was applied to identify and quantify the inaccuracies in the taxonomical status of the records in three steps: assessment of identification accuracy at the family, genus or species level; current validity of species name; and assignation of inaccurate species records to different categories of classification quality. Synonyms, nonexistent combinations, and outdated combinations were reassigned to currently valid species. A total of 9148 records of Auchenipteridae fishes were analyzed, of which 4165 were from GBIF and 4983 from SpeciesLink, deriving from 46 and 31 sources, respectively. After correcting all possible records following the taxonomic data filtering steps, 6988 records (76.4% of the original) were adequate for describing species distributions, while 2160 remained inaccurate. The most inaccurate records at the species level were due to the use of outdated nomenclatures, resulting in non-valid combinations of species and genus, and synonymy. Our results evidence a large taxonomic inconsistency among records, and, most importantly, that taxonomic information obtained from repositories should be used with caution. Many inaccuracy issues may be embedded in the biodiversity databases’ records, which could lead researchers to provide an incomplete or even mistaken perspective of the variations in the natural world.
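
For anyone who wants to picture the three filtering steps in the abstract, here is a rough Python sketch. It is not the authors' code. The taxon names (`Genusa alpha` and so on), the lookup tables, and the function names are all invented placeholders, and a real check would rely on a curated taxonomic catalogue rather than hard-coded dictionaries. The last few lines only re-check the arithmetic reported in the abstract.

```python
from collections import Counter

# Hypothetical reference tables (placeholder names, not real taxa).
VALID_SPECIES = {"Genusa alpha", "Genusa beta", "Genusb gamma"}
REASSIGN = {
    # recorded name -> (category, currently valid name)
    "Genusc alpha": ("outdated combination", "Genusa alpha"),
    "Genusa delta": ("synonym", "Genusa beta"),
    "Genusb alpha": ("nonexistent combination", "Genusa alpha"),
}
OPEN_NOMENCLATURE = {"sp.", "spp.", "cf.", "aff."}


def identification_level(name):
    """Step 1: is the record identified to family, genus or species?"""
    parts = name.split()
    if len(parts) >= 2 and parts[1] not in OPEN_NOMENCLATURE:
        return "species"
    if parts and not parts[0].endswith("idae"):
        return "genus"
    return "family"


def filter_record(name):
    """Return (category, corrected name or None) for one record."""
    level = identification_level(name)
    if level != "species":
        return (f"{level}-level only", None)
    if name in VALID_SPECIES:   # Step 2: name is currently valid
        return ("valid", name)
    if name in REASSIGN:        # Step 3: categorise and reassign
        return REASSIGN[name]
    return ("unresolved", None)


# Toy records, invented for illustration only.
records = [
    "Genusa alpha", "Genusc alpha", "Genusa delta", "Genusb alpha",
    "Genusa sp.", "Auchenipteridae", "Genusd omega",
]
results = [filter_record(r) for r in records]
counts = Counter(category for category, _ in results)
usable = sum(1 for _, fixed in results if fixed is not None)

# Arithmetic check on the figures quoted in the abstract.
total = 4165 + 4983                    # GBIF + SpeciesLink
adequate = 6988
percent_adequate = round(100 * adequate / total, 1)
remaining_inaccurate = total - adequate
```

In this toy run, four of the seven records end up with a usable species name, and the abstract's own numbers are consistent with each other (9148 records in total, 76.4% adequate, 2160 left inaccurate).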

Post by bekateen »

Here's the link: https://www.tandfonline.com/doi/abs/10. ... 20.1730473

Good advice, always.
Freitas et al. (2020) wrote: A data filtering procedure was applied to identify and quantify the inaccuracies in the taxonomical status of the records in three steps: assessment of identification accuracy at the family, genus or species level; current validity of species name; and assignation of inaccurate species records to different categories of classification quality. Synonyms, nonexistent combinations, and outdated combinations were reassigned to currently valid species. A total of 9148 records of Auchenipteridae fishes were analyzed, of which 4165 were from GBIF and 4983 from SpeciesLink, deriving from 46 and 31 sources, respectively. After correcting all possible records following the taxonomic data filtering steps, 6988 records (76.4% of the original) were adequate for describing species distributions, while 2160 remained inaccurate. The most inaccurate records at the species level were due to the use of outdated nomenclatures, resulting in non-valid combinations of species and genus, and synonymy.
Since they were running "big data"-level analytics on "big data," the one thing I imagine they couldn't do systematically was validate whether the individual identifications were correct in the first place, irrespective of reclassifications, nomenclature changes, and the like. I'm thinking of things such as the not-uncommon habit of assigning undescribed species from new locations to familiar, similar-looking fish from distant locations, even when the two are in vastly different drainages.

Cheers, Eric