Measuring and Controlling Text Generation by Semantic Search

Jieh-Sheng Lee*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citation

Abstract

Our motivation in this work is to measure patent text generation by semantic search, particularly by textual similarity in high-dimensional space for neural network models. The objective is to control patent text generation by semantic search. Conceptually, it is an attempt to integrate two subfields of NLP: text generation and semantic search. In the previous milestone of our PatentTransformer project, a prototype based on GPT-2 was capable of generating fluent patent titles, abstracts, independent claims, and dependent claims. However, the quality of the generated patent text beneath its surface form was less explored. How to control text generation is also a hard problem in NLP. We address these issues in this work and experiment with different approaches. On the measurement side, we address quality measurement from the perspective of textual similarity. To that end, the approaches we propose include two embedding spaces, span-based textual similarity, and a language model for patent claim spans. On the control side, we propose a knob-turning approach for controlling text generation based on measuring a range of textual similarity. In this way, we can search for a Goldilocks zone in which the generated patent text is neither too close to nor too far from prior patents. We hypothesize that patent novelty may exist in such a zone.
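The knob-turning idea can be illustrated with a minimal sketch. This is not the implementation from the paper: the toy 2-D vectors stand in for real text embeddings, the function names (`cosine_similarity`, `max_similarity`, `in_goldilocks_zone`) are hypothetical, and the band limits 0.6 and 0.95 are arbitrary assumptions chosen only to make the example concrete. It keeps a generated candidate only when its highest cosine similarity to any prior patent falls inside a chosen band.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def max_similarity(candidate, prior_embeddings):
    """Similarity of a candidate to its nearest prior patent."""
    return max(cosine_similarity(candidate, p) for p in prior_embeddings)


def in_goldilocks_zone(candidate, prior_embeddings, low, high):
    """True when the nearest-prior similarity lies in [low, high].

    `low` and `high` are the "knobs": raising `low` demands closer
    resemblance to prior patents, lowering `high` rejects near-copies.
    """
    return low <= max_similarity(candidate, prior_embeddings) <= high


# Toy embeddings (assumption: real ones would come from a text encoder).
prior = [[1.0, 0.0], [0.0, 1.0]]
candidates = {
    "near_copy": [1.0, 0.01],   # almost identical to a prior patent
    "related": [1.0, 0.5],      # similar, but not a near-copy
    "distant": [-1.0, -1.0],    # unrelated to every prior patent
}

selected = [
    name
    for name, vec in candidates.items()
    if in_goldilocks_zone(vec, prior, low=0.6, high=0.95)
]
print(selected)
```

With these toy values only the `related` candidate falls inside the band; widening or narrowing `low` and `high` changes which generated texts are retained.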

Original language: English
Title of host publication: The Web Conference 2020 - Companion of the World Wide Web Conference, WWW 2020
Publisher: Association for Computing Machinery
Pages: 269-273
Number of pages: 5
ISBN (Electronic): 9781450370240
State: Published - 20 Apr 2020
Event: 29th International World Wide Web Conference, WWW 2020 - Taipei, Taiwan
Duration: 20 Apr 2020 to 24 Apr 2020

Publication series

Name: The Web Conference 2020 - Companion of the World Wide Web Conference, WWW 2020

Conference

Conference: 29th International World Wide Web Conference, WWW 2020
Country/Territory: Taiwan
City: Taipei
Period: 20/04/20 to 24/04/20

Keywords

  • GPT-2
  • natural language generation
  • natural language processing
  • patent
  • semantic search
  • textual similarity
