Multi-view Video Based Object Segmentation - A Tutorial
Abstract
Video-based object segmentation (VBOS) is an important step in many computer vision and multimedia tasks, such as video editing and compositing. In recent years, multi-view VBOS systems have become increasingly popular because the stereo cues in multi-view data can be incorporated efficiently to improve segmentation results and to eliminate the need for initial user input. In this paper, we review recent developments in multi-view VBOS systems and the related techniques, including data acquisition, camera calibration, depth reconstruction, object segmentation, and tracking. We also present our own system for segmenting multiple objects from multi-view video sequences, to illustrate a practical implementation of multi-view VBOS for 3D video rendering applications.
Article Details
This journal provides immediate open access to its content on the principle that making research freely available to the public supports a greater global exchange of knowledge.
- Creative Commons Copyright License
The journal allows readers to download and share all published articles provided they cite them properly; readers may not modify the articles or use them commercially. These terms correspond to the Creative Commons CC BY-NC-ND license.
- Retention of Copyright and Publishing Rights
The journal allows the authors of the published articles to hold copyrights and publishing rights without restrictions.