TY - JOUR
T1 - Team NYCU-NLP at PAN 2024
T2 - Working Notes of the 25th Conference and Labs of the Evaluation Forum, CLEF 2024
AU - Lin, Tzu-Mi
AU - Wu, Yu-Hsin
AU - Lee, Lung-Hao
N1 - Publisher Copyright:
© 2024 Copyright for this paper by its authors.
PY - 2024
Y1 - 2024
N2 - This paper describes our NYCU-NLP system design for the multi-author writing style analysis task of the PAN Lab at CLEF 2024. We propose a unified architecture that integrates transformer-based models with similarity adjustments to identify author switches within a given multi-author document. We first fine-tune the RoBERTa, DeBERTa and ERNIE transformers to detect differences in writing style between two given paragraphs. The final prediction is then determined by an ensemble mechanism. We also apply similarity adjustments to further enhance multi-author analysis performance. The experimental data contains three difficulty levels that reflect simultaneous changes of authorship and topic. Our submission achieved macro F1-scores of 0.964, 0.857 and 0.863 for the easy, medium and hard levels, respectively, ranking first for the hard level (out of 16 participating teams) and second for the medium level (out of 17 participating teams).
AB - This paper describes our NYCU-NLP system design for the multi-author writing style analysis task of the PAN Lab at CLEF 2024. We propose a unified architecture that integrates transformer-based models with similarity adjustments to identify author switches within a given multi-author document. We first fine-tune the RoBERTa, DeBERTa and ERNIE transformers to detect differences in writing style between two given paragraphs. The final prediction is then determined by an ensemble mechanism. We also apply similarity adjustments to further enhance multi-author analysis performance. The experimental data contains three difficulty levels that reflect simultaneous changes of authorship and topic. Our submission achieved macro F1-scores of 0.964, 0.857 and 0.863 for the easy, medium and hard levels, respectively, ranking first for the hard level (out of 16 participating teams) and second for the medium level (out of 17 participating teams).
KW - Authorship Analysis
KW - Embedding Similarity
KW - Plagiarism Detection
KW - Pre-trained Language Models
UR - http://www.scopus.com/inward/record.url?scp=85201616891&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85201616891
SN - 1613-0073
VL - 3740
SP - 2716
EP - 2721
JO - CEUR Workshop Proceedings
JF - CEUR Workshop Proceedings
Y2 - 9 September 2024 through 12 September 2024
ER -