Methods for Assessing, Predicting, and Improving Data Veracity: A survey

Abstract

Data is an essential part of smart cities and can play an important role in decision-making processes. Data is generated through web applications and devices that utilize the Internet of Things (IoT) and related technologies. Thus, it is also important to be able to create big data, which has historically been defined as having three key dimensions: volume, variety, and velocity. However, recently, veracity has been added as the fourth dimension. Data veracity relates to the quality of the data. Any potential issues with the quality of the data must be corrected because low-quality data leads to poor software construction and, ultimately, bad decision making. In this work, we review the existing literature on related technical solutions that address data veracity based on the domain of its application, including social media, web, and IoT applications. The challenges or limitations and related gaps in existing work are discussed, and future research directions are proposed to address the critical issues of data veracity in the era of big data.
Assiri, F. (2020). Methods for Assessing, Predicting, and Improving Data Veracity: A survey. ADCAIJ: Advances in Distributed Computing and Artificial Intelligence Journal, 9(4), 5–30. https://doi.org/10.14201/ADCAIJ202094530

Author Biography

Fatmah Assiri

University of Jeddah, College of Computer Science and Engineering, Jeddah, Saudi Arabia