Article Details

Title	On a Comprehensive Metadata Framework for Artificial Data in Unsupervised Learning
Authors	Dangl, Rainer and Leisch, Friedrich
Year	2017
Volume	Archives of Data Science, Series A 2(1) / 2017
Abstract	Evaluating new methods and algorithms in unsupervised learning obviously requires thorough benchmarking studies on data sets that most closely reflect performance in actual usage. Designing data sets that do exactly that is quite a challenging task in itself; standing up to the challenge in comparison to other methods is another point which poses a risk of compromising the goal of an objective benchmarking study. We want to address the latter by proposing a framework that standardizes the format of artificial data, or rather its metadata. We intend to introduce a web repository that functions as an exchange for metadata of artificial data and an accompanying R package that can generate actual data from the descriptions obtained from the repository. It is therefore much simpler to find data designed by others and which has been used in previous benchmarking studies. This removes some of the temptation to specifically design artificial data in a way so that a proposed method performs significantly better than existing ones, a claim that might not hold in real life applications.