Functional Annotation of Human Protein Coding Isoforms via Nonconvex Multi-Instance Learning
Course URL: http://videolectures.net/kdd2017_luo_human_protein_coding/  
Lecturer: Tingjin Luo
Institution: National University of Defense Technology, China
Date: 2017-10-09
Language: English
Course description: Functional annotation of human genes is fundamentally important for understanding the molecular basis of various genetic diseases. A major challenge in determining the functions of human genes lies in the functional diversity of proteins: a gene can perform different functions because it may consist of multiple protein coding isoforms (PCIs). Differentiating the functions of PCIs can therefore significantly deepen our understanding of gene function. However, because isoform-level gold standards (ground-truth annotations) are lacking, many existing functional annotation approaches are developed at the gene level. In this paper, we propose a novel approach to differentiate the functions of PCIs by integrating sparse simplex projection (that is, a nonconvex sparsity-inducing regularizer) with the framework of multi-instance learning (MIL). Specifically, we label the genes that are annotated to the function under consideration as positive bags and the genes without the function as negative bags. Then, by sparse projections onto the simplex, we learn a mapping that embeds the original bag space into a discriminative feature space. Our framework is flexible enough to incorporate various smooth and nonsmooth loss functions, such as logistic loss and hinge loss. To solve the resulting highly nontrivial nonconvex and nonsmooth optimization problem, we further develop an efficient block coordinate descent algorithm. Extensive experiments on human genome data demonstrate that the proposed approaches significantly outperform state-of-the-art methods in terms of both functional annotation accuracy for human PCIs and efficiency.
Keywords: genes; genetic diseases; protein coding
Source: VideoLectures.NET
Data collected: 2020-05-07:zhouxj
Last reviewed: 2020-05-25:cxin
Views: 55
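
Illustrative note: the course description refers to "sparse projections onto the simplex" without giving the operator itself. The sketch below shows one common formulation of such a projection, under the assumption that sparsity is imposed as a hard limit `k` on the number of nonzero weights: keep the `k` largest entries, then apply the standard sort-based Euclidean projection onto the probability simplex. The function names `project_simplex` and `sparse_project_simplex` and the parameter `k` are hypothetical, and this is not necessarily the exact operator used in the paper.

```python
def project_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1} (sort-based method)."""
    u = sorted(v, reverse=True)
    cumsum = 0.0
    theta = 0.0
    for j, uj in enumerate(u, start=1):
        cumsum += uj
        t = (cumsum - 1.0) / j
        # The condition holds for a prefix of the sorted entries;
        # the last index that satisfies it determines the shift theta.
        if uj - t > 0:
            theta = t
    return [max(x - theta, 0.0) for x in v]


def sparse_project_simplex(v, k):
    """Assumed k-sparse variant: project only the k largest entries, zero the rest."""
    if not 1 <= k <= len(v):
        raise ValueError("k must be between 1 and len(v)")
    # Indices of the k largest entries of v.
    top = sorted(range(len(v)), key=lambda i: v[i], reverse=True)[:k]
    projected = project_simplex([v[i] for i in top])
    w = [0.0] * len(v)
    for i, p in zip(top, projected):
        w[i] = p
    return w


# Hypothetical scores for four instances in one bag, limited to two nonzero weights.
weights = sparse_project_simplex([0.5, 0.2, 0.9, -0.1], k=2)
print(weights)
```

For the sample input, the result is approximately `[0.3, 0.0, 0.7, 0.0]`: the weights are nonnegative, sum to one, and at most `k` of them are nonzero. In a multi-instance setting such weights could, for example, indicate which instances in a bag contribute to the bag's label, though how the paper applies the projection is not stated in the description above.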