Towards a semantic repository of data mining and machine learning datasets |
|
Course URL: | http://videolectures.net/sikdd2018_kostovska_machine_learning_dat... |
Lecturer: | Ana Kostovska |
Institution: | Department of Knowledge Technologies, Jožef Stefan Institute |
Date: | 2018-11-23 |
Language: | English |
Abstract: | With the exponential growth of data in all areas of our lives, there is an increasing need to develop new approaches for effective data management. In particular, in the fields of Data Mining (DM) and Knowledge Discovery in Databases (KDD), scientists often invest a lot of time and resources in collecting data that has already been acquired. By publishing open and FAIR (Findable, Accessible, Interoperable, Reusable) data, researchers could instead reuse data that was previously collected, preprocessed and stored. Motivated by this, we conducted an extensive review of current approaches, data repositories and semantic technologies used for the annotation, storage and querying of datasets in the domain of machine learning (ML) and data mining. Finally, we identify the limitations of the existing dataset repositories and propose a design for a semantic data repository that adheres to the FAIR principles for data management and stewardship. |
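Keywords: | exponential growth of data; knowledge discovery in databases; effective data management; data repositories and semantic technologies |
Source: | VideoLectures.NET |
Data collected: | 2022-12-28:cyh |
Last reviewed: | 2023-05-15:cyh |
Views: | 14 |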