‘Garbage In, Garbage Out’ Revisited: What Do Machine Learning Application Papers Report About Human-Labeled Training Data?
Published in Quantitative Science Studies, 2021
Supervised machine learning (ML), in which models are automatically derived from labeled training data, is only as good as the quality of that data. We report the extent to which a random sample of ML application papers across disciplines gives specific details about whether best practices were followed in labeling training data.
