Spatiotemporal Activity Semantics Understanding Based on Foreground Object Segmentation: iCounter Scenario

Tzu Wei Yu, Muhammad Atif Sarwar, Yousef Awwad Daraghmi, Sheng Hsien Cheng, Tsi Ui Ik*, Yih Lang Li

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review


Foreground object segmentation, which captures the spatial and temporal information of moving objects in video, is the most fundamental task for activity understanding in many intelligent applications, such as smart stores. Recently, several methods have been proposed for activity detection and recognition based on object segmentation. However, these methods are often inaccurate because they do not keep object segments temporally consistent across frames. In this work, we propose a hierarchical approach for foreground object segmentation and activity semantics understanding from sequential video that preserves spatial and temporal connectivity across frames. The proposed system consists of two main modules: (a) a concatenated deep learning network containing PSPNet and a convolutional-GRU to segment the foreground of an object of interest; and (b) an activity mining framework that incorporates three sub-modules: (i) a RetinaNet-based frame classifier to detect and count objects of interest; (ii) a time-domain activity and event detection algorithm; and (iii) an image-based item query engine to recognize the shopping items. To evaluate the proposed approach, we designed a smart checkout box called iCounter to collect a shopping activity dataset named 'NOL-41', which is used in extensive experiments. The results show accuracies of 90.6% for foreground object segmentation, 93.4% for frame classification, 98.4% for activity event detection, and 94.3% for item query. Finally, the overall accuracy of the shopping list is 95.2%.

Original language: English
Pages (from-to): 57748-57758
Number of pages: 11
Journal: IEEE Access
State: Published - 2022


  • Activity recognition
  • Activity semantics
  • Conv-GRU
  • Foreground object segmentation
  • Image database
  • Image query
  • PSPNet
  • RetinaNet
  • Self-checkout
  • Smart store
  • Video foreground segmentation

