Fusing domain-specific data with general data for in-domain applications

An Zi Yen, Hen Hsen Huang, Hsin Hsi Chen

Research output: Conference contribution, peer-reviewed

1 citation (Scopus)

Abstract

This paper analyzes the lexical semantics of domain-specific terms using various pre-trained specific-domain and general-domain word vectors, and addresses the semantic drift between domains. To capture lexical semantics in the specific domain, we propose a bridge mechanism that introduces domain-specific data into general data and then retrains the word vectors. We find that even a small-scale fusion can yield lexical semantics similar to those learned from a large-scale domain-specific dataset. Experiments on sentiment analysis and outlier detection show that word embeddings trained on the fusion dataset perform better than word embeddings trained on either a pure large domain-specific dataset or a pure large general dataset. This simple but effective methodology facilitates the domain adaptation of distributed word representations.
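The fusion idea in the abstract can be illustrated with a small sketch. The abstract does not specify how the bridge mechanism works, so this example assumes the simplest possible reading: a random sample of domain sentences is appended to the general corpus before vectors are rebuilt. Sparse co-occurrence counts stand in for trained word embeddings, and the function names (`fuse`, `cooccurrence_vectors`, `cosine`) and the toy corpora are hypothetical, not taken from the paper.

```python
import math
import random
from collections import Counter, defaultdict


def fuse(general, domain, fraction, seed=0):
    """Return the general corpus plus a random sample of domain sentences.

    Assumption: plain concatenation; the paper's bridge mechanism may differ.
    """
    rng = random.Random(seed)
    k = max(1, round(len(domain) * fraction))
    return general + rng.sample(domain, k)


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` tokens.

    Assumption: count vectors are a stand-in for trained word embeddings.
    """
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy general corpus: "bull" appears only in its animal sense.
general = [
    ["the", "bull", "grazed", "in", "the", "field"],
    ["a", "bull", "chased", "the", "farmer"],
    ["stocks", "rally", "after", "earnings"],
]

# Toy finance corpus: "bull" appears in its market sense.
domain = [
    ["bull", "market", "rally", "continues"],
    ["investors", "expect", "a", "bull", "rally"],
    ["the", "bull", "rally", "lifted", "stocks"],
    ["bear", "market", "follows", "the", "bull", "rally"],
]

general_vecs = cooccurrence_vectors(general)
fused_vecs = cooccurrence_vectors(fuse(general, domain, fraction=0.5))

sim_general = cosine(general_vecs["bull"], general_vecs["rally"])
sim_fused = cosine(fused_vecs["bull"], fused_vecs["rally"])
print("general only:", sim_general)
print("after fusion:", sim_fused)
```

In this toy setting, "bull" and "rally" share no context in the general corpus, while fusing in half of the domain sentences gives them overlapping contexts. A realistic replication would replace the count vectors with an embedding trainer and use corpora of the sizes the paper studies.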

Original language: English
Title of host publication: Proceedings - 2017 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2017
Publisher: Association for Computing Machinery, Inc
Pages: 566-572
Number of pages: 7
ISBN (Electronic): 9781450349512
Publication status: Published - 23 Aug 2017
Event: 16th IEEE/WIC/ACM International Conference on Web Intelligence, WI 2017 - Leipzig, Germany
Duration: 23 Aug 2017 to 26 Aug 2017

Publication series

Name: Proceedings - 2017 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2017

Conference

Conference: 16th IEEE/WIC/ACM International Conference on Web Intelligence, WI 2017
Country/Territory: Germany
City: Leipzig
Period: 23/08/17 to 26/08/17
