TY - GEN
T1 - Controlling Patent Text Generation by Structural Metadata
AU - Lee, Jieh-Sheng
N1 - Publisher Copyright:
© 2020 ACM.
PY - 2020/10/19
Y1 - 2020/10/19
N2 - The ultimate goal of my long-term project is "Augmented Inventing." This work is a follow-up effort toward that goal. It leverages the structural metadata in patent documents and the text-to-text mappings between metadata. The structural metadata includes the patent title, abstract, independent claim, and dependent claim. By using the structural metadata, it is possible to control what kind of patent text to generate. By using the text-to-text mappings, it is possible to let a generative model generate one type of patent text from another type of patent text. Furthermore, through multiple mappings, it is possible to build a text generation flow, for example, generating from a few words to a patent title, from the title to an abstract, from the abstract to an independent claim, and from the independent claim to multiple dependent claims. The text generation flow can also go backward after training with bi-directional mappings. In addition to the above, the contributions of this work include: (1) releasing four generative models trained from scratch on a patent corpus, (2) releasing sample code demonstrating how to generate patent text bi-directionally, and (3) measuring the performance of the models with ROUGE and the Universal Sentence Encoder as preliminary evaluations of text generation quality.
AB - The ultimate goal of my long-term project is "Augmented Inventing." This work is a follow-up effort toward that goal. It leverages the structural metadata in patent documents and the text-to-text mappings between metadata. The structural metadata includes the patent title, abstract, independent claim, and dependent claim. By using the structural metadata, it is possible to control what kind of patent text to generate. By using the text-to-text mappings, it is possible to let a generative model generate one type of patent text from another type of patent text. Furthermore, through multiple mappings, it is possible to build a text generation flow, for example, generating from a few words to a patent title, from the title to an abstract, from the abstract to an independent claim, and from the independent claim to multiple dependent claims. The text generation flow can also go backward after training with bi-directional mappings. In addition to the above, the contributions of this work include: (1) releasing four generative models trained from scratch on a patent corpus, (2) releasing sample code demonstrating how to generate patent text bi-directionally, and (3) measuring the performance of the models with ROUGE and the Universal Sentence Encoder as preliminary evaluations of text generation quality.
KW - machine learning
KW - natural language generation
KW - natural language processing
KW - patent
UR - http://www.scopus.com/inward/record.url?scp=85095863640&partnerID=8YFLogxK
U2 - 10.1145/3340531.3418503
DO - 10.1145/3340531.3418503
M3 - Conference contribution
AN - SCOPUS:85095863640
T3 - International Conference on Information and Knowledge Management, Proceedings
SP - 3241
EP - 3244
BT - CIKM 2020 - Proceedings of the 29th ACM International Conference on Information and Knowledge Management
PB - Association for Computing Machinery
T2 - 29th ACM International Conference on Information and Knowledge Management, CIKM 2020
Y2 - 19 October 2020 through 23 October 2020
ER -