TY - JOUR
T1 - Ensemble Pre-trained Transformer Models for Writing Style Change Detection
AU - Lin, Tzu Mi
AU - Chen, Chao Yi
AU - Tzeng, Yu Wen
AU - Lee, Lung Hao
N1 - Publisher Copyright: © 2022 Copyright for this paper by its authors.
PY - 2022
Y1 - 2022
AB - This paper describes our system design for the Style Change Detection (SCD) tasks of PAN at CLEF 2022. We propose a unified architecture of ensemble neural networks to solve the three tasks of the SCD-2022 edition. We fine-tune the BERT, RoBERTa and ALBERT transformers and their connecting classifiers to measure the similarity of two given paragraphs or sentences for authorship analysis. Each transformer model is regarded as a standalone method to detect differences in the writing styles of each test pair. The final prediction is then obtained by combining the individual model outputs through a majority voting ensemble mechanism. For SCD-2022 Task 1, which requires finding the position of the single style change at the paragraph level, our approach achieves a macro F1-score of 0.7540. For SCD-2022 Task 2, which requires identifying the actual author of each paragraph, our method achieves a macro F1-score of 0.5097, a diarization error rate of 0.1941 and a Jaccard error rate of 0.3095. For SCD-2022 Task 3, which requires locating writing style changes at the sentence level, our model achieves a macro F1-score of 0.7156. In summary, our method is the winning approach among all intrinsic approaches.
KW - Authorship Analysis
KW - Ensemble Learning
KW - Plagiarism Detection
KW - Pre-trained Models
UR - http://www.scopus.com/inward/record.url?scp=85136915289&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85136915289
SN - 1613-0073
VL - 3180
SP - 2565
EP - 2573
JO - CEUR Workshop Proceedings
JF - CEUR Workshop Proceedings
T2 - 2022 Conference and Labs of the Evaluation Forum, CLEF 2022
Y2 - 5 September 2022 through 8 September 2022
ER -