TY - GEN
T1 - A Self-Supervised Solution for the Switch-Toggling Visual Task
AU - Huang, Yuehong
AU - Tseng, Yu Chee
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - How a robot can explore and interact with the real world by itself is a major research challenge. Meanwhile, causal reasoning and combinatorial generalization are indispensable parts of human intelligence for exploration and survival. This paper presents SelfSVT, a self-supervised solution for the switch-toggling visual task, whose goal is to infer the causalities of visual combinatorial effects on the environment. Specifically, when a robot takes over a new place with a set of light switches and knows nothing about the switches' functions, it has to figure out by itself how to transform the environment from the current visual state to a goal visual state by toggling these switches. SelfSVT trains efficient learning models that perform this goal-conditioned visual task by directly reasoning about the causalities of different visual states or by inferring the switch states from observations. In particular, we use the switch state to directly represent the combinatorial effect, which makes self-supervised learning possible, and our framework adopts a Siamese network with a discrete contrastive loss. It can perform causal induction and combinatorial generalization in a new environment with only a few interactions. Our solution outperforms previous methods in both simulated and real-world environments, and in both static and dynamic environments. SelfSVT achieves 100% success reasoning rates in most cases when there are sufficient interactions with the environment.
AB - How a robot can explore and interact with the real world by itself is a major research challenge. Meanwhile, causal reasoning and combinatorial generalization are indispensable parts of human intelligence for exploration and survival. This paper presents SelfSVT, a self-supervised solution for the switch-toggling visual task, whose goal is to infer the causalities of visual combinatorial effects on the environment. Specifically, when a robot takes over a new place with a set of light switches and knows nothing about the switches' functions, it has to figure out by itself how to transform the environment from the current visual state to a goal visual state by toggling these switches. SelfSVT trains efficient learning models that perform this goal-conditioned visual task by directly reasoning about the causalities of different visual states or by inferring the switch states from observations. In particular, we use the switch state to directly represent the combinatorial effect, which makes self-supervised learning possible, and our framework adopts a Siamese network with a discrete contrastive loss. It can perform causal induction and combinatorial generalization in a new environment with only a few interactions. Our solution outperforms previous methods in both simulated and real-world environments, and in both static and dynamic environments. SelfSVT achieves 100% success reasoning rates in most cases when there are sufficient interactions with the environment.
UR - http://www.scopus.com/inward/record.url?scp=85143631206&partnerID=8YFLogxK
U2 - 10.1109/ICPR56361.2022.9956480
DO - 10.1109/ICPR56361.2022.9956480
M3 - Conference contribution
AN - SCOPUS:85143631206
T3 - Proceedings - International Conference on Pattern Recognition
SP - 3429
EP - 3435
BT - 2022 26th International Conference on Pattern Recognition, ICPR 2022
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 26th International Conference on Pattern Recognition, ICPR 2022
Y2 - 21 August 2022 through 25 August 2022
ER -