TY - JOUR
T1 - Patent classification by fine-tuning BERT language model
AU - Lee, Jieh-Sheng
AU - Hsiang, Jieh
N1 - Publisher Copyright:
© 2020 Elsevier Ltd
PY - 2020/6
Y1 - 2020/6
AB - In this work we focus on fine-tuning a pre-trained BERT model and applying it to patent classification. When applied to a large dataset of over two million patents, our approach outperforms the previous state of the art, an approach using a CNN with word embeddings. Moreover, we use patent claims only, without the other parts of the patent documents. Our contributions include: (1) a new state-of-the-art result for patent classification, based on fine-tuning a pre-trained BERT model, (2) a large dataset, USPTO-3M, at the CPC subclass level, together with SQL statements that can be used by future researchers, and (3) a demonstration that, contrary to conventional wisdom, patent claims alone are sufficient to achieve state-of-the-art results on the classification task.
UR - http://www.scopus.com/inward/record.url?scp=85085650222&partnerID=8YFLogxK
U2 - 10.1016/j.wpi.2020.101965
DO - 10.1016/j.wpi.2020.101965
M3 - Article
AN - SCOPUS:85085650222
SN - 0172-2190
VL - 61
SP - 1
EP - 4
JO - World Patent Information
JF - World Patent Information
M1 - 101965
ER -