Matrix Hashing with Random Probing in 1D Array

  • Rajeev Ranjan Kumar Tripathi
    Department of Computer Science and Engineering, Madan Mohan Malaviya University of Technology, Gorakhpur, Uttar Pradesh, India, 273008 rajeevranjankumartripathi[at]gmail.com
  • Pradeep Kumar Singh
    Department of Computer Science and Engineering, Madan Mohan Malaviya University of Technology, Gorakhpur, Uttar Pradesh, India, 273008
  • Sarv Pal Singh
    Department of ITCA, Madan Mohan Malaviya University of Technology, Gorakhpur, Uttar Pradesh, India, 273008

Abstract

The current computing era enables the generation of vast amounts of data, which must be processed to extract valuable insights. This processing often requires multiple query operations, where hashing plays a crucial role in accelerating query response times. Among hashing techniques, Cuckoo Hashing has demonstrated greater efficiency than conventional methods, offering simplicity and ease of integration into various real-world applications. However, Cuckoo Hashing also has limitations, including data collisions, data loss due to collisions, and the potential for endless loops that lead to high insertion latency and frequent rehashing. To address these challenges, this work introduces a modified Matrix hashing technique. The core concept of the proposed scheme is to utilize both a 2D array and an additional 1D array with random probing to create a more robust technique that competes effectively with Cuckoo Hashing. This study also introduces degree of dexterity as a new performance metric, in addition to the traditional load factor. Furthermore, the Even-Odd hash function is proposed to ensure a more balanced load distribution. Through rigorous experimental analysis in a single-threaded environment, this modified Matrix hashing with random probing in the 1D array is shown to effectively resolve key issues associated with Cuckoo Hashing, such as excessive data migration, inefficient memory usage, and high insertion latency.
  • Referencias
  • Cómo citar
  • Del mismo autor
  • Métricas
Awad, M. A., Ashkiani, S., Porumbescu, S. D., Farach-Colton, M., & Owens, J. D. (2023). Analyzing and Implementing GPU Hash Tables. 2023 Symposium on Algorithmic Principles of Computer Systems (APOCS), 33–50. https://doi.org/10.1137/1.9781611977578.ch3.

Bai, X., Yang, H., Zhou, J., Ren, P., & Cheng, J. (2014). Data-dependent hashing based on p-stable distribution. IEEE Transactions on Image Processing, 23(12), 5033–5046. https://doi.org/10.1109/TIP.2014.2352458.

Balasundaram, R., & Sudha, G. F. (2021). Retrieval performance analysis of multibiometric database using optimized multidimensional spectral hashing-based indexing. Journal of King Saud University – Computer and Information Sciences, 33(1), 110–117. https://doi.org/10.1016/j.jksuci.2018.02.003.

Bozsolik, T. (2019). Random numbers. https://www.kaggle.com/code/timoboz/a-lot-of-random-numbers. https://doi.org/10.34740/KAGGLE/DSV/816507.

Cai, D. (2021). A Revisit of Hashing Algorithms for Approximate Nearest Neighbor Search. IEEE Transactions on Knowledge and Data Engineering, 33(6), 2337–2348. https://doi.org/10.1109/TKDE.2019.2953897.

Carter, J. L., & Wegman, M. N. (1979). Universal classes of hash functions. Journal of Computer and System Sciences, 18(2), 143–154. https://doi.org/10.1016/0022-0000(79)90044-8.

Chi, L., & Zhu, X. (2017). Hashing techniques: A survey and taxonomy. ACM Computing Surveys (CSUR), 50(1), 1–36. https://doi.org/10.1145/3047307.

Debnath, B. K., Sengupta, S., & Li, J. (2010). ChunkStash: Speeding Up Inline Storage Deduplication Using Flash Memory. USENIX Annual Technical Conference, 1–16.

Fan, B., Andersen, D. G., & Kaminsky, M. (2013). Memc3: Compact and concurrent memcache with dumber caching and smarter hashing. In 10th USENIX Symposium on Networked Systems Design and Implementation (NSDI 13), 371–384.

Fang, S.-H., Fei, Y.-X., Xu, Z., & Tsao, Y. (2017). Learning transportation modes from smartphone sensors based on deep neural network. IEEE Sensors Journal, 17(18), 6111–6118. https://doi.org/10.1109/JSEN.2017.2737825.

Ferdman, M., Lotfi-Kamran, P., Balet, K., & Falsafi, B. (2011). Cuckoo directory: A scalable directory for many-core systems. In 2011 IEEE 17th International Symposium on High Performance Computer Architecture, 169–180. https://doi.org/10.1109/HPCA.2011.5749726.

García-Peñalvo, F., Vázquez-Ingelmo, A., García-Holgado, A., Sampedro-Gómez, J., Sánchez-Puente, A., Vicente-Palacios, V., Dorado-Díaz, P. I., & Sánchez, P. L. (2024). KoopaML: A Graphical Platform for Building Machine Learning Pipelines Adapted to Health Professionals. International Journal of Interactive Multimedia and Artificial Intelligence, 8(6), 112–119. https://doi.org/10.9781/ijimai.2023.01.006.

Guibas, L. J. (1978). The analysis of hashing techniques that exhibit k-ary clustering. Journal of the ACM (JACM), 25(4), 544–555. https://doi.org/10.1145/322092.322096.

Jiménez, A., Elizalde, B., & Raj, B. (2018). Acoustic scene classification using discrete random hashing for Laplacian kernel machines. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 146–150. https://doi.org/10.1109/ICASSP.2018.8461631.

Kirsch, A., & Mitzenmacher, M. (2010). More robust hashing: Cuckoo Hashing with a stash. SIAM Journal on Computing, 39(4), 1543–1561. https://doi.org/10.1137/080728743.

Kurpicz, F., Lehmann, H.-P., & Sanders, P. (2023). Pachash: Packed and compressed hash tables. In 2023 Proceedings of the Symposium on Algorithm Engineering and Experiments (ALENEX), 162–175. https://doi.org/10.1137/1.9781611977561.ch14.

Larson, P. (1983). Analysis of uniform hashing. Journal of the ACM (JACM), 30(4), 805–819. https://doi.org/10.1145/2157.322407.

Liang, X., & Wang, G. (2017). A convolutional neural network for transportation mode detection based on smartphone platform. In Proc. MASS, 338–342. https://doi.org/10.1109/MASS.2017.81.

Maurer, W. D., & Lewis, T. G. (1975). Hash table methods. ACM Computing Surveys (CSUR), 7(1), 5–19. https://doi.org/10.1145/356643.356645.

Minaud, B., & Papamanthou, C. (2023). Generalized Cuckoo Hashing with a stash, revisited. Information Processing Letters, 181, 106356. https://doi.org/10.1016/j.ipl.2022.106356.

Pagh, R. (2001). On the cell probe complexity of membership and perfect hashing. In Proceedings of the Thirty-Third Annual ACM Symposium on Theory of Computing, 425–432. https://doi.org/10.1145/380752.380836.

Pagh, R., & Rodler, F. F. (2004). Cuckoo Hashing. Journal of Algorithms, 51(2), 122–144. https://doi.org/10.1016/j.jalgor.2003.12.002.

Patel, F. S., & Kasat, D. (2017). Hashing based indexing techniques for content-based image retrieval: A survey. In 2017 International Conference on Innovative Mechanisms for Industry Applications (ICIMIA), 279–283. https://doi.org/10.1109/ICIMIA.2017.7975619.

Pontarelli, S., Reviriego, P., & Mitzenmacher, M. (2018). Emoma: Exact match in one memory access. IEEE Transactions on Knowledge and Data Engineering, 30(11), 2120–2133. https://doi.org/10.1109/TKDE.2018.2818716.

Shi, S., & Qian, C. (2020). Ludo hashing: Compact, fast, and dynamic key-value lookups for practical network systems. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 4(2), 1–32. https://doi.org/10.1145/3392140.

Skarlatos, D., Kokolis, A., Xu, T., & Torrellas, J. (2020). Elastic Cuckoo Page Tables: Rethinking Virtual Memory Translation for Parallelism. In Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS ’20, 1093–1108. https://doi.org/10.1145/3373376.3378493.

Stevens, H. (2018). Hans Peter Luhn and the birth of the hashing algorithm. IEEE Spectrum, 55(2), 44–49. https://doi.org/10.1109/MSPEC.2018.8278136.

Sun, Y., Hua, Y., Feng, D., Yang, L., Zuo, P., Cao, S., & Guo, Y. (2016). A Collision-mitigation Cuckoo Hashing scheme for large-scale storage systems. IEEE Transactions on Parallel and Distributed Systems, 28(3), 619–632. https://doi.org/10.1109/TPDS.2016.2594763.

Suruliandi, T., Idhaya, S., & Raja, P. (2024). Drug Target Interaction Prediction Using Machine Learning Techniques – A Review. International Journal of Interactive Multimedia and Artificial Intelligence, 8(6), 86–100. https://doi.org/10.9781/ijimai.2022.11.002.

Vitter, J. S. (1983). Analysis of the search performance of coalesced hashing. Journal of the ACM (JACM), 30(2), 231–258. https://doi.org/10.1145/322374.322375.

Wu, Y., Liu, Z., Yu, X., Gui, J., Gan, H., Han, Y., Li, T., Rottenstreich, O., & Yang, T. (2021). MapEmbed: Perfect Hashing with High Load Factor and Fast Update. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, KDD ’21, 1863–1872. https://doi.org/10.1145/3447548.3467240.

Wu, X., Zhu, X., & Wu, M. (2022). The Evolution of Search: Three Computing Paradigms. ACM Transactions on Management Information Systems (TMIS), 13(2). https://doi.org/10.1145/3495214.
Tripathi, R. R. K., Singh, P. K., & Singh, S. P. (2025). Matrix Hashing with Random Probing in 1D Array. ADCAIJ: Advances in Distributed Computing and Artificial Intelligence Journal, 14, e31698. https://doi.org/10.14201/adcaij.31698

Downloads

Download data is not yet available.
+