Full Length Article
Volume 2, Issue 2, PP: 64-73, 2020


Egocentric Performance Capture: A Review

Authors Names :  Shivam Grover 1 *, Kshitij Sidana 2, Vanita Jain 3

1  Affiliation :  Bharati Vidyapeeth's College of Engineering, INDIA

    Email :  shivumgrover@gmail

2  Affiliation :  Bharati Vidyapeeth's College of Engineering, INDIA

    Email :  kshitijsidana@gmail.com

3  Affiliation :  Bharati Vidyapeeth's College of Engineering, INDIA

    Email :  vanita.jain@bharatividyapeeth.edu

Doi   :  10.5281/zenodo.3951315

Received: March 13, 2020 Revised: April 20, 2020 Accepted: June 19, 2020

Abstract :

Performance capture of human beings has been used to animate 3D characters for movies and games for several decades. Traditional performance capture methods require a dedicated, costly setup, usually consisting of more than one sensor placed at a distance from the subject, and therefore demand a large budget and considerable space to accommodate. This greatly reduces their feasibility and portability. Egocentric (first-person/wearable) cameras, however, are attached to the body and are therefore mobile. With the growing acceptance of wearable technology by the general public, wearable cameras have also become cheaper. Their high portability can be exploited in the performance capture domain. However, working with egocentric images is a difficult task: the views are severely distorted by the first-person perspective, and body parts farther from the camera are highly prone to occlusion. In this paper, we review the existing state-of-the-art methods for performance capture from egocentric views.

Keywords :

Egocentric Performance, Image Analysis, 3D Animation
