Measuring and Controlling Text Generation by Semantic Search

Jieh-Sheng Lee*

*Corresponding author for this work

Research output: Conference contribution, peer-reviewed

1 citation (Scopus)

Abstract

Our motivation in this work is to measure patent text generation by semantic search, particularly by textual similarity in high-dimensional space for neural network models. The objective is to control patent text generation by semantic search. Conceptually, it is an attempt to integrate two subfields of NLP: text generation and semantic search. In our previous milestone of the PatentTransformer project, a prototype based on GPT-2 was capable of generating fluent patent titles, abstracts, independent claims, and dependent claims. However, beneath the surface form, the quality of the generated patent text was less explored. How to control text generation is also a hard problem in the NLP field. We would like to address these issues in this work and experiment with different approaches. On the measurement side, this work addresses quality measurement from the perspective of textual similarity. On that basis, the approaches we propose include two embedding spaces, span-based textual similarity, and a language model for patent claim spans. On the control side, we propose a knob-turning approach for controlling text generation based on measuring a range of textual similarity. In this way, we can search for a Goldilocks zone in which the generated patent text is neither too similar to nor too far from prior patents. We hypothesize that patent novelty may exist in such a zone.
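
The knob-turning idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes that generated texts and prior patents have already been mapped to embedding vectors, that textual similarity is measured by cosine similarity, and that the "knob" is a pair of thresholds `low` and `high`. All function names, thresholds, and the toy 2-D vectors are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def max_similarity_to_prior(candidate: Vector, prior: Sequence[Vector]) -> float:
    """Similarity of a candidate to its nearest prior patent (assumed metric)."""
    return max(cosine_similarity(candidate, p) for p in prior)


def select_in_zone(
    candidates: Sequence[Vector],
    prior: Sequence[Vector],
    low: float,
    high: float,
) -> List[Tuple[int, float]]:
    """Keep candidates whose nearest-prior similarity lies in [low, high].

    Turning the hypothetical knob means moving `low` and `high`.
    Returns (candidate index, similarity) pairs.
    """
    kept = []
    for i, c in enumerate(candidates):
        s = max_similarity_to_prior(c, prior)
        if low <= s <= high:
            kept.append((i, s))
    return kept


# Toy 2-D embeddings, for illustration only (real embeddings are high-dimensional).
prior_patents = [(1.0, 0.0)]
generated = [(1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

# Similarities to the single prior patent are about 1.0, 0.707, and 0.0,
# so only the middle candidate falls inside the zone [0.5, 0.9].
zone = select_in_zone(generated, prior_patents, low=0.5, high=0.9)
```

Under these assumptions, narrowing or shifting the `[low, high]` interval changes which generated texts are retained, which is one simple way to read "measuring a range of textual similarity."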

Original language: English
Title of host publication: The Web Conference 2020 - Companion of the World Wide Web Conference, WWW 2020
Publisher: Association for Computing Machinery
Pages: 269-273
Number of pages: 5
ISBN (electronic): 9781450370240
Publication status: Published - 20 Apr 2020
Event: 29th International World Wide Web Conference, WWW 2020 - Taipei, Taiwan
Duration: 20 Apr 2020 to 24 Apr 2020

Publication series

Name: The Web Conference 2020 - Companion of the World Wide Web Conference, WWW 2020

Conference

Conference: 29th International World Wide Web Conference, WWW 2020
Country/Territory: Taiwan
City: Taipei
Period: 20/04/20 to 24/04/20
