Posts by Collection

articles

‘Garbage In, Garbage Out’ Revisited: What Do Machine Learning Application Papers Report About Human-Labeled Training Data?

Published in Quantitative Science Studies, 2021

Supervised machine learning, in which models are automatically derived from labeled training data, is only as good as the quality of that data. We report to what extent a random sample of ML application papers across disciplines gives specific details about whether best practices were followed in labeling training data.

Publications


PL-CVIO: Point-Line Cooperative Visual-Inertial Odometry
Y. Zhang, P. Zhu, and W. Ren
2023 IEEE Conference on Control Technology and Applications (CCTA)

research

Garbage In, Garbage Out? Do Machine Learning Application Papers in Social Computing Report Where Human-Labeled Training Data Comes From?

Published in ACM FAT* 2020, 2020

Many machine learning projects for new application areas involve teams of humans who label data for a particular purpose, from hiring crowdworkers to the paper’s authors labeling the data themselves. Such a task is quite similar to (or a form of) structured content analysis, which is a longstanding methodology in the social sciences and humanities, with many established best practices. In this paper, we investigate to what extent a sample of machine learning application papers in social computing (specifically, papers from arXiv and traditional publications performing an ML classification task on Twitter data) gives specific details about whether such best practices were followed.

teaching