Improving the reproducibility of experiments and reusability of research outputs in complex data analysis |
|
Course URL: | http://videolectures.net/icgeb_panov_complex_data_analysis/ |
Lecturer: | Panče Panov |
Institution: | Department of Knowledge Technologies, Jožef Stefan Institute |
Date: | 2019-06-28 |
Language: | English |
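Abstract (translated from Chinese): | The predictive performance of traditional supervised methods depends heavily on the amount of labeled data. In many real-world tasks, however, obtaining labels is difficult, for example in compound screening and biomarker discovery, and often only a small amount of labeled data is available for model learning. Semi-supervised learning emerged in response to this problem. In addition to labeled data, semi-supervised methods use unlabeled data to improve the performance of supervised methods. For data mining problems with structured outputs, labeled data is even harder to obtain, because multiple labels must be determined for each example. Multi-target prediction (MTP) is a type of structured output prediction problem in which several variables must be predicted simultaneously. Despite the apparent need for semi-supervised methods that can handle MTP, only a few such methods are available, and even those are difficult to use in practice and/or their advantages over supervised MTP methods are unclear. We will present an algorithm for learning predictive models from a limited amount of labeled data that can exploit available unlabeled data to obtain models with better predictive performance. We will also show benchmark experiments that evaluate its predictive performance. Finally, we will illustrate and discuss their use in high-content screen analysis. |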
Course description: | Advances in science rest on the premise of trusted discovery: research must be performed correctly and be reproducible by other scientists. To increase the reusability of research outputs, such as developed models and produced data, they should be Findable, Accessible, Interoperable and Reusable (the FAIR principles). The main point of FAIR is to ensure that research outputs are reusable and will actually be used by others, thus becoming more valuable. Research outputs that are to fulfil the FAIR principles must be represented in a widely accepted machine-readable framework. Currently, a popular solution to data sharing that fulfils the FAIR requirements is the use of semantic web technologies and ontologies. Complex data analysis methods, originating from machine learning and data mining, are increasingly being used in applications from various domains of science (e.g., life sciences, space research, etc.). To provide reproducibility of experiments (e.g., executions of methods) and reuse of research outputs (e.g., predictive models), one needs to formally describe the entities involved in the process of analysis and store them together with their descriptions (e.g., metadata) as digital objects in a database-like structure. Having “semantically aware” stores of entities for complex data analytics, enhanced with automatic reasoning capabilities, would be beneficial for improving the reproducibility of experiments and the reuse of research outputs. In this way, we would move closer to a FAIR data analysis process. In this talk, I will show and discuss recent advances in the domain that are aimed at improving the reproducibility of experiments and the reusability of research outputs in complex data analysis. |
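Keywords: | complex data analysis; improving experimental reproducibility; semi-supervised methods for MTP; model learning; compound screening |
Source: | VideoLectures.NET |
Data collected: | 2022-10-14:cyh |
Last reviewed: | 2022-10-14:cyh |
Views: | 35 |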
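
The course description mentions formally describing the entities involved in an analysis and storing them with their metadata as digital objects in a database-like structure. The sketch below is one minimal way to picture that idea using only the Python standard library. It is not the approach presented in the talk: it uses plain dictionaries rather than ontologies or semantic web technologies, and every name in it (`describe_experiment`, `record_id`, `ExperimentStore`), along with the sample values, is hypothetical.

```python
import hashlib
import json


def describe_experiment(algorithm, parameters, dataset_bytes, metrics):
    """Build a machine-readable description of one experiment run.

    The dataset is referenced by its SHA-256 digest, so a later reader
    can check that they are re-running on exactly the same data.
    """
    return {
        "algorithm": algorithm,
        "parameters": parameters,
        "dataset_sha256": hashlib.sha256(dataset_bytes).hexdigest(),
        "metrics": metrics,
    }


def record_id(record):
    """Derive a stable identifier from the record's canonical JSON form."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


class ExperimentStore:
    """Hypothetical in-memory store; a real system would persist records."""

    def __init__(self):
        self._records = {}

    def add(self, record):
        """Store a record and return its content-derived identifier."""
        rid = record_id(record)
        self._records[rid] = record
        return rid

    def get(self, rid):
        """Return the record stored under the given identifier."""
        return self._records[rid]

    def find(self, **criteria):
        """Return identifiers of records whose top-level fields match."""
        return [
            rid
            for rid, rec in self._records.items()
            if all(rec.get(key) == value for key, value in criteria.items())
        ]


# Illustrative values only; they do not come from the course.
store = ExperimentStore()
rid = store.add(
    describe_experiment(
        algorithm="random_forest",
        parameters={"n_trees": 100},
        dataset_bytes=b"x1,x2,y\n1,2,0\n",
        metrics={"accuracy": 0.9},
    )
)
matches = store.find(algorithm="random_forest")
```

Because the identifier is derived from the record's content, two groups that describe the same run in the same way obtain the same identifier, which gives a small taste of the "findable" and "reusable" goals. Shared vocabularies and automatic reasoning, which the description attributes to ontologies, are outside the scope of this sketch.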