Finding an effective way to represent human actions remains an open problem because it usually requires taking into account evidence extracted at various temporal resolutions. A conventional way of representing an action employs temporally ordered fine-grained movements, e.g., key poses or subtle motions. Many existing approaches model actions by directly learning the transitional relationships between those fine-grained features. Yet action data may contain many similar observations with occasional and irregular changes, which makes commonly used fine-grained features less reliable. This paper presents a set of temporal pyramid features that enrich the action representation with various levels of semantic granularity. For learning and inferring the proposed pyramid features, we adopt a discriminative model with latent variables to capture the hidden dynamics in each layer of the pyramid. Our method is evaluated on a Tai-Chi Chun dataset and a daily activities dataset, both of which we collected. Experimental results demonstrate that our approach achieves more favorable performance than existing methods.
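As a rough illustration of what a multi-resolution temporal representation can look like, the sketch below builds a pyramid by average-pooling per-frame feature vectors over progressively finer segments. This is only one common construction and is an assumption here: the abstract does not specify how the pyramid features are computed, and the names `temporal_pyramid` and `mean_vector`, the power-of-two segment counts, and the use of mean pooling are all hypothetical choices made for the example.

```python
from typing import List

Vector = List[float]


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def temporal_pyramid(frames: List[Vector], levels: int = 3) -> List[List[Vector]]:
    """Build a coarse-to-fine pyramid over a sequence of frame features.

    Level k splits the sequence into 2**k contiguous segments and stores
    the mean feature vector of each segment (an assumed pooling choice,
    not necessarily the one used in the paper).
    """
    if levels < 1:
        raise ValueError("levels must be at least 1")
    num_frames = len(frames)
    if 2 ** (levels - 1) > num_frames:
        raise ValueError("too few frames for the requested number of levels")

    pyramid = []
    for level in range(levels):
        num_segments = 2 ** level
        # Integer segment boundaries; every segment holds at least one frame.
        bounds = [i * num_frames // num_segments for i in range(num_segments + 1)]
        layer = [
            mean_vector(frames[bounds[i]:bounds[i + 1]])
            for i in range(num_segments)
        ]
        pyramid.append(layer)
    return pyramid


if __name__ == "__main__":
    # Eight toy frames with two-dimensional features.
    toy_frames = [[float(2 * t), float(2 * t + 1)] for t in range(8)]
    for level, layer in enumerate(temporal_pyramid(toy_frames, levels=3)):
        print(level, layer)
```

In a sketch like this, the coarsest layer summarizes the whole sequence while finer layers retain local changes; a per-layer sequence model, such as the latent-variable discriminative model mentioned above, could then be applied to each layer separately.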
Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
- conditional random fields
- human action recognition
- temporal pyramid representation