Depth estimation and video synthesis for 2D to 3D video conversion

Chien Chih Han, Hsu-Feng Hsiao*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

2 Scopus citations


With the recent progress of multi-view devices and the corresponding signal processing techniques, stereoscopic viewing has been introduced to the public and has attracted growing interest. To create depth perception in human vision, viewers need two different video sequences, one for each eye. Those videos can be either captured by 3D-enabled cameras or synthesized as needed. The primary contribution of this paper is to establish two transformation models, one for stationary scenes and one for non-stationary objects in a given view. The models can be used to produce the corresponding stereoscopic videos as a viewer would have seen them at the original scene. The transformation model for stationary scenes estimates depth from the vanishing point and vanishing lines of the given video. The transformation model for non-stationary regions estimates depth by combining motion analysis of those regions with the transformation model for stationary scenes. The performance of the models is evaluated using subjective 3D video quality evaluation and objective quality evaluation on the synthesized views. A performance comparison with the ground truth and with VSRS, a well-known multi-view video synthesis algorithm that requires six views to complete synthesis, is also presented. It is shown that the proposed method can provide better perceptual 3D video quality with natural depth perception.
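The abstract does not give the form of the transformation models, so the following is only a minimal sketch of the general idea behind vanishing-point depth cues, not the authors' method. It assumes two vanishing lines are already known as point pairs, intersects them to get the vanishing point, and assigns each pixel a relative depth that grows toward that point. The function names (`line_through`, `intersect`, `depth_map`) and the linear distance-based depth ramp are illustrative assumptions.

```python
import math


def line_through(p, q):
    """Homogeneous line (a, b, c) through two image points, with ax + by + c = 0."""
    (x1, y1), (x2, y2) = p, q
    return (y1 - y2, x2 - x1, x1 * y2 - x2 * y1)


def intersect(l1, l2):
    """Intersection of two homogeneous lines, or None if they are parallel."""
    a1, b1, c1 = l1
    a2, b2, c2 = l2
    w = a1 * b2 - a2 * b1
    if abs(w) < 1e-12:
        return None
    return ((b1 * c2 - b2 * c1) / w, (c1 * a2 - c2 * a1) / w)


def depth_map(width, height, vp):
    """Relative depth in [0, 1]: 1 at the vanishing point (farthest),
    0 at the image corner farthest from it. This linear ramp is an
    assumption for illustration, not the model from the paper."""
    vx, vy = vp
    corners = [(0, 0), (width - 1, 0), (0, height - 1), (width - 1, height - 1)]
    max_d = max(math.hypot(cx - vx, cy - vy) for cx, cy in corners)
    if max_d == 0:
        return [[1.0] * width for _ in range(height)]
    return [
        [1.0 - math.hypot(x - vx, y - vy) / max_d for x in range(width)]
        for y in range(height)
    ]


# Hypothetical vanishing lines, e.g. the two edges of a road.
left = line_through((0, 100), (25, 75))
right = line_through((100, 100), (75, 75))
vp = intersect(left, right)
depth = depth_map(101, 101, vp)
```

A real pipeline would first detect the vanishing lines (for example from edge segments) and would handle moving objects separately, as the abstract describes for non-stationary regions.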

Original language: English
Pages (from-to): 33-46
Number of pages: 14
Journal: Journal of Signal Processing Systems
Issue number: 1
State: Published - 1 Jan 2014


  • 2D to 3D video conversion
  • Motion analysis
  • Vanishing point
  • View synthesis

