Review on recent Computer Vision Methods for Human Action Recognition

Abstract

The topic of human action recognition has been considered an important goal in the field of computer vision since the beginning of its development, and it has reached new levels. It is also sometimes thought of as a simple procedure; however, problems arise in fast-moving and complex scenes, and the difficulty artificial intelligence (AI) methods have in handling action prediction has drawn growing attention from researchers. Several datasets, offering suitable variation in both methodology and content, have been created to support the evaluation of these methods. Human activities play an important, though hard-to-characterize, role in a variety of fields. Many applications exist in this area, such as smart homes, assistive AI, HCI (Human-Computer Interaction), and improved safety in domains such as transportation, education, security, and medication management, including fall detection and helping the elderly manage their medication. The positive impact of deep learning techniques on many vision applications has led to the deployment of these methods in video processing. The analysis of human behavioral activities involves significant challenges wherever human presence is concerned. A single individual can be represented across multiple video sequences through skeleton, motion, and/or abstract features. This work aims to address human presence by combining several of these cues and by using a new RNN structure for activities. The paper focuses on recent progress in machine-learning-assisted action recognition. Existing state-of-the-art techniques for action recognition and prediction, as well as the future scope of this analysis, are covered in detail within this review paper.
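As a rough illustration only: the following minimal PyTorch sketch shows how a recurrent network of the kind the review discusses can classify a sequence of per-frame skeleton features into action classes. The class name, joint count, layer sizes, and number of classes below are illustrative assumptions, not the architecture proposed in the reviewed work.

# A minimal sketch (not the authors' model): an LSTM that classifies a
# sequence of per-frame skeleton feature vectors into action classes.
# All sizes (25 joints, 128 hidden units, 10 classes) are assumptions.
import torch
import torch.nn as nn

class SkeletonActionRNN(nn.Module):
    def __init__(self, num_joints=25, coords=3, hidden=128, num_classes=10):
        super().__init__()
        # Each frame is flattened to a (num_joints * coords)-dim vector.
        self.lstm = nn.LSTM(input_size=num_joints * coords,
                            hidden_size=hidden,
                            num_layers=2,
                            batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, x):
        # x: (batch, frames, num_joints * coords)
        out, _ = self.lstm(x)
        # Summarize the clip with the last time step's hidden state.
        return self.head(out[:, -1, :])

# Example: a batch of 4 clips, 30 frames each, 25 joints with (x, y, z).
model = SkeletonActionRNN()
clips = torch.randn(4, 30, 25 * 3)
logits = model(clips)  # shape: (4, 10)

Here the final hidden state summarizes the whole clip; temporal attention or average pooling over time steps are common alternatives in the surveyed literature.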
