Collecting aligned textual corpora from the Hidden Web

Course URL: http://videolectures.net/w3cworkshop2011_pajntar_corpora/  
Lecturer: Boštjan Pajntar
Institution: Jožef Stefan Institute
Date: 2011-04-22
Language: English
Course description: With the constant growth of web-based content, large collections of text become available. Many, if not most, professional non-English web sites offer versions of their webpages translated into English and the other languages of their clients and partners. These translations are usually professional and are abundant. We call this the Hidden Web. We intend to present possibilities, problems, and best practices for harnessing such aligned textual corpora. Such data can then be used efficiently as a translation memory, for example as an aid for human translators or as training data for machine translation algorithms.
Keywords: translation; Hidden Web; computer science
Source: VideoLectures.NET
Last reviewed: 2020-07-23:yumf
Views: 44