TY - GEN
T1 - Taming NLU Noise
T2 - 2024 IEEE Spoken Language Technology Workshop, SLT 2024
AU - Rohmatillah, Mahdin
AU - Chien, Jen Tzung
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Dialogue policy is a crucial component of dialogue systems, responsible for determining system responses based on user inputs. While reinforcement learning (RL) can effectively optimize the dialogue policy, system performance in real-world settings is heavily influenced by an earlier component, natural language understanding (NLU). Once the NLU produces incorrect information, the dialogue policy is affected and performance degrades. To enhance the robustness of the dialogue policy, this paper proposes integrating RL optimization with noisy student-teacher learning, taming the noise generated by the NLU. To prevent overconfidence during knowledge transfer from the teacher, we introduce a dual-teacher mechanism in which knowledge distillation is carried out using dynamic changes in the samples stored in the replay buffer, leveraging the exploration-exploitation paradigm of RL. Evaluations on multi-domain, multi-turn dialogue tasks demonstrate the effectiveness of this approach, showing increased robustness to noisy NLU outputs and, accordingly, improved overall system performance.
AB - Dialogue policy is a crucial component of dialogue systems, responsible for determining system responses based on user inputs. While reinforcement learning (RL) can effectively optimize the dialogue policy, system performance in real-world settings is heavily influenced by an earlier component, natural language understanding (NLU). Once the NLU produces incorrect information, the dialogue policy is affected and performance degrades. To enhance the robustness of the dialogue policy, this paper proposes integrating RL optimization with noisy student-teacher learning, taming the noise generated by the NLU. To prevent overconfidence during knowledge transfer from the teacher, we introduce a dual-teacher mechanism in which knowledge distillation is carried out using dynamic changes in the samples stored in the replay buffer, leveraging the exploration-exploitation paradigm of RL. Evaluations on multi-domain, multi-turn dialogue tasks demonstrate the effectiveness of this approach, showing increased robustness to noisy NLU outputs and, accordingly, improved overall system performance.
KW - Dialogue policy
KW - multi-domain dialogue
KW - reinforcement learning
KW - student-teacher learning
UR - http://www.scopus.com/inward/record.url?scp=85217432659&partnerID=8YFLogxK
U2 - 10.1109/SLT61566.2024.10832293
DO - 10.1109/SLT61566.2024.10832293
M3 - Conference contribution
AN - SCOPUS:85217432659
T3 - Proceedings of 2024 IEEE Spoken Language Technology Workshop, SLT 2024
SP - 849
EP - 856
BT - Proceedings of 2024 IEEE Spoken Language Technology Workshop, SLT 2024
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 2 December 2024 through 5 December 2024
ER -