TY - JOUR
T1 - Learning and recognition of on-premise signs from weakly labeled street view images
AU - Tsai, Tsung Hung
AU - Cheng, Wen-Huang
AU - You, Chuang Wen
AU - Hu, Min Chun
AU - Tsui, Arvin Wen
AU - Chi, Heng Yu
PY - 2014/3/1
Y1 - 2014/3/1
N2 - Camera-enabled mobile devices are commonly used as interaction platforms for linking the user's virtual and physical worlds in numerous research and commercial applications, such as serving an augmented reality interface for mobile information retrieval. The various application scenarios give rise to a key technique of daily life visual object recognition. On-premise signs (OPSs), a popular form of commercial advertising, are widely used in our living life. The OPSs often exhibit great visual diversity (e.g., appearing in arbitrary size), accompanied with complex environmental conditions (e.g., foreground and background clutter). Observing that such real-world characteristics are lacking in most of the existing image data sets, in this paper, we first proposed an OPS data set, namely OPS-62, in which totally 4649 OPS images of 62 different businesses are collected from Google's Street View. Further, for addressing the problem of real-world OPS learning and recognition, we developed a probabilistic framework based on the distributional clustering, in which we proposed to exploit the distributional information of each visual feature (the distribution of its associated OPS labels) as a reliable selection criterion for building discriminative OPS models. Experiments on the OPS-62 data set demonstrated the outperformance of our approach over the state-of-the-art probabilistic latent semantic analysis models for more accurate recognitions and less false alarms, with a significant 151.28% relative improvement in the average recognition rate. Meanwhile, our approach is simple, linear, and can be executed in a parallel fashion, making it practical and scalable for large-scale multimedia applications.
AB - Camera-enabled mobile devices are commonly used as interaction platforms for linking the user's virtual and physical worlds in numerous research and commercial applications, such as serving an augmented reality interface for mobile information retrieval. The various application scenarios give rise to a key technique of daily life visual object recognition. On-premise signs (OPSs), a popular form of commercial advertising, are widely used in our living life. The OPSs often exhibit great visual diversity (e.g., appearing in arbitrary size), accompanied with complex environmental conditions (e.g., foreground and background clutter). Observing that such real-world characteristics are lacking in most of the existing image data sets, in this paper, we first proposed an OPS data set, namely OPS-62, in which totally 4649 OPS images of 62 different businesses are collected from Google's Street View. Further, for addressing the problem of real-world OPS learning and recognition, we developed a probabilistic framework based on the distributional clustering, in which we proposed to exploit the distributional information of each visual feature (the distribution of its associated OPS labels) as a reliable selection criterion for building discriminative OPS models. Experiments on the OPS-62 data set demonstrated the outperformance of our approach over the state-of-the-art probabilistic latent semantic analysis models for more accurate recognitions and less false alarms, with a significant 151.28% relative improvement in the average recognition rate. Meanwhile, our approach is simple, linear, and can be executed in a parallel fashion, making it practical and scalable for large-scale multimedia applications.
KW - Real-world objects
KW - learning and recognition
KW - object image data set
KW - street view scenes
UR - http://www.scopus.com/inward/record.url?scp=84893846216&partnerID=8YFLogxK
U2 - 10.1109/TIP.2014.2298982
DO - 10.1109/TIP.2014.2298982
M3 - Article
C2 - 24474374
AN - SCOPUS:84893846216
SN - 1057-7149
VL - 23
SP - 1047
EP - 1059
JO - IEEE Transactions on Image Processing
JF - IEEE Transactions on Image Processing
IS - 3
M1 - 6705667
ER -