TY - JOUR
T1 - NYCU-NLP at EXIST 2024
T2 - 25th Working Notes of the Conference and Labs of the Evaluation Forum, CLEF 2024
AU - Fang, Yi Zeng
AU - Lee, Lung Hao
AU - Huang, Juinn-Dar
N1 - Publisher Copyright:
© 2024 Copyright for this paper by its authors.
PY - 2024
Y1 - 2024
AB - This paper presents a robust methodology for identifying sexism in social media texts as part of the EXIST 2024 challenge. First, we apply extensive data preprocessing, including removing redundant elements, standardizing text formats, increasing data diversity through back-translation, and augmenting texts with the AEDA approach. We then integrate annotator demographics, such as gender, age, and ethnicity, into our selected transformer-based language models. A rounding technique handles non-continuous annotation values so that the resulting probability distributions remain precise. Using hard parameter sharing, we empirically optimize the layers shared across tasks to improve generalization and computational efficiency. Rigorous evaluations were conducted with five-fold cross-validation to ensure the reliability of the findings. Our system ranked first out of 40, 35, and 33 submissions for Tasks 1, 2, and 3, respectively, in the Soft-Soft category setting. In the Hard-Hard category setting, it ranked first out of 70 submissions for Task 1, second out of 46 submissions for Task 2, and third out of 34 submissions for Task 3. This paper reports our findings on classifying sexism in social media textual content, offering substantial insights for the EXIST 2024 challenge.
KW - Pre-trained Language Models
KW - Sexism Identification
KW - Text Classification
KW - Transformers
UR - http://www.scopus.com/inward/record.url?scp=85201610497&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85201610497
SN - 1613-0073
VL - 3740
SP - 1003
EP - 1011
JO - CEUR Workshop Proceedings
JF - CEUR Workshop Proceedings
Y2 - 9 September 2024 through 12 September 2024
ER -