Collecting aligned textual corpora from the Hidden Web

Course URL: http://videolectures.net/w3cworkshop2011_pajntar_corpora/
Lecturer: Boštjan Pajntar
Institution: Jožef Stefan Institute
Date: 2011-04-22
Language: English
Course description: With the constant growth of web-based content, large collections of text are becoming available. Many, if not most, professional non-English web sites offer pages translated into English and into the other languages of their clients and partners. These translations are usually professional and abundant. We call this the Hidden Web. We intend to present possibilities, problems and best practices for harnessing such aligned textual corpora. Such data can then be used efficiently as a translation memory, for example as an aid for human translators, or as training data for machine translation algorithms.
Keywords: translation; Hidden Web; computer science
Course source: VideoLectures.NET
Last reviewed: 2020-07-23:yumf
Views: 44