Controlling Patent Text Generation by Structural Metadata

Jieh-Sheng Lee*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

6 Scopus citations

Abstract

The ultimate goal of my long-term project is "Augmented Inventing." This work is a follow-up effort toward that goal. It leverages the structural metadata in patent documents and the text-to-text mappings between metadata. The structural metadata includes patent title, abstract, independent claim, and dependent claim. By using the structural metadata, it is possible to control what kind of patent text to generate. By using the text-to-text mappings, it is possible to let a generative model generate one type of patent text from another type of patent text. Furthermore, through multiple mappings, it is possible to build a text generation flow, for example, generating from a few words to a patent title, from the title to an abstract, from the abstract to an independent claim, and from the independent claim to multiple dependent claims. The text generation flow can also go backward after training with bi-directional mappings. In addition to the above, the contributions of this work include: (1) releasing four generative models trained from scratch on a patent corpus, (2) releasing sample code that demonstrates how to generate patent text bi-directionally, and (3) measuring the performance of the models with ROUGE and the Universal Sentence Encoder as a preliminary evaluation of text generation quality.
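
The text generation flow described in the abstract can be pictured with a minimal Python sketch. It is not the released sample code: the tag format in `build_prompt`, the names `FORWARD_FLOW`, `run_flow`, and `reverse_flow`, and the `generate` callable are all hypothetical and stand in for whatever control tokens and model interface the trained models actually use. The sketch starts from a title and omits the initial "few words to title" step.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical identifiers for the four kinds of structural metadata.
TEXT_TYPES = ("title", "abstract", "independent_claim", "dependent_claim")

# Forward flow from the abstract, written as (source type, target type) pairs.
FORWARD_FLOW: List[Tuple[str, str]] = [
    ("title", "abstract"),
    ("abstract", "independent_claim"),
    ("independent_claim", "dependent_claim"),
]


def build_prompt(source_type: str, target_type: str, text: str) -> str:
    """Wrap the source text in made-up tags that name its type and the requested target type."""
    if source_type not in TEXT_TYPES or target_type not in TEXT_TYPES:
        raise ValueError(f"unknown text type: {source_type!r} or {target_type!r}")
    return f"<{source_type}>{text}</{source_type}><to:{target_type}>"


def run_flow(
    seed_text: str,
    flow: List[Tuple[str, str]],
    generate: Callable[[str], str],
) -> Dict[str, str]:
    """Chain the mappings: each generated text becomes the source of the next step."""
    outputs: Dict[str, str] = {flow[0][0]: seed_text}
    for source_type, target_type in flow:
        prompt = build_prompt(source_type, target_type, outputs[source_type])
        outputs[target_type] = generate(prompt)  # assumed model call
    return outputs


def reverse_flow(flow: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Backward flow: swap source and target in each pair and reverse the order."""
    return [(target, source) for source, target in reversed(flow)]
```

Passing `reverse_flow(FORWARD_FLOW)` to `run_flow` with a dependent claim as the seed walks the same chain backward, which only makes sense for a model trained on the bi-directional mappings.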

Original language: English
Title of host publication: CIKM 2020 - Proceedings of the 29th ACM International Conference on Information and Knowledge Management
Publisher: Association for Computing Machinery
Pages: 3241-3244
Number of pages: 4
ISBN (Electronic): 9781450368599
DOIs
State: Published - 19 Oct 2020
Event: 29th ACM International Conference on Information and Knowledge Management, CIKM 2020 - Virtual, Online, Ireland
Duration: 19 Oct 2020 → 23 Oct 2020

Publication series

Name: International Conference on Information and Knowledge Management, Proceedings

Conference

Conference: 29th ACM International Conference on Information and Knowledge Management, CIKM 2020
Country/Territory: Ireland
City: Virtual, Online
Period: 19/10/20 → 23/10/20

Keywords

  • machine learning
  • natural language generation
  • natural language processing
  • patent
