ck-NN: A Clustered k-Nearest Neighbours Approach for Large-Scale Classification

  • Rafi Ullah
    Karachi Institute of Economics and Technology
  • Ayaz H. Khan
    Karachi Institute of Economics and Technology ayaz.hassan[at]pafkiet.edu.pk
  • S.M. Emaduddin
    Karachi Institute of Economics and Technology

Abstract

k-Nearest Neighbor (k-NN) is a non-parametric algorithm widely used for the estimation and classification of data points especially when the dataset is distributed in several classes. It is considered to be a lazy machine learning algorithm as most of the computations are done during the testing phase instead of performing this task during the training of data. Hence it is practically inefficient, infeasible and inapplicable while processing huge datasets i.e. Big Data. On the other hand, clustering techniques (unsupervised learning) greatly affect results if you do normalization or standardization techniques, difficult to determine "k" Value. In this paper, some novel techniques are proposed to be used as pre-state mechanism of state-of-the-art k-NN Classification Algorithm. Our proposed mechanism uses unsupervised clustering algorithm on large dataset before applying k-NN algorithm on different clusters that might running on single machine, multiple machines or different nodes of a cluster in distributed environment. Initially dataset, possibly having multi dimensions, is pass through clustering technique (K-Means) at master node or controller to find the number of clusters equal to the number of nodes in distributed systems or number of cores in system, and then each cluster will be assigned to exactly one node or one core and then applies k-NN locally, each core or node in clusters sends their best result and the selector choose best and nearest possible class from all options. We will be using one of the gold standard distributed framework. We believe that our proposed mechanism could be applied on big data. We also believe that the architecture can also be implemented on multi GPUs or FPGA to take flavor of k-NN on large or huge datasets where traditional k-NN is very slow.
  • Referencias
  • Cómo citar
  • Del mismo autor
  • Métricas
Barrientos, R., Gómez, J., Tenllado, C., and Prieto, M., 2010. Heap based k-nearest neighbor search on GPUs. In
Congreso Espanol de Informática (CEDI), pages 559–566.

Callahan, P. B. and Kosaraju, S. R., 1995. A Decomposition of Multidimensional Point Sets with Applications to K-nearest-neighbors and N-body Potential Fields. J. ACM, 42(1):67–90. ISSN 0004-5411. doi:
10.1145/200836.200853.

Chae, D.-K., Lee, S.-C., Lee, S.-
Y., and Kim, S.-W., 2018. On identifying k-nearest neighbors in neighborhood models for efficient and effective collaborative filtering. Neurocomputing, 278:134–143.

Connor, M. and Kumar, P., 2010. Fast construction of k-nearest neighbor graphs for point clouds. IEEE transactions on visualization and computer graphics, 16(4):599–608.

Deng, Z., Zhu, X., Cheng, D., Zong, M., and Zhang, S., 2016. Efficient kNN classification algorithm for big data.
Neurocomputing, 195:143–148.

Dong, W., Moses, C., and Li, K., 2011. Efficient k-nearest neighbor graph construction for generic similarity measures. In Proceedings of the 20th international conference on World wide web, pages 577–586. ACM.

Garcia, V., Debreuve, E., and Barlaud, M., 2008. Fast k Nearest Neighbor Search using GPU. CoRR, abs/0804.1448.

Li, H., Li, H., and Wei, K., 2018.
Automatic fast double KNN classification algorithm based on ACC and hierarchical clustering for big data. International Journal of Communication Systems, 31(16):e3488.
doi:10.1002/dac.3488. E3488 IJCS-17-0750.R1.

Liang, S., Liu, Y., Wang, C., and Jian, L., 2009. A CUDA-based parallel implementation of K-nearest neighbor algorithm. In Cyber-Enabled Distributed Computing and Knowledge Discovery, 2009. CyberC’09.
International Conference on, pages 291–296. IEEE.

Liang, S., Liu, Y., Wang, C., and Jian, L., 2010. Design and evaluation of a parallel k-nearest neighbor algorithm
on CUDA-enabled GPU. In Web Society (SWS), 2010 IEEE 2nd Symposium on, pages 53–60. IEEE.

Lu,W., Shen, Y., Chen, S., and Ooi, B. C., 2012. Efficient processing of k nearest neighbor joins using mapreduce.
Proceedings of the VLDB Endowment, 5(10):1016–1027.

Masek, J., Burget, R., Karasek, J., Uher, V., and Dutta, M. K., 2015. Multi-GPU implementation of knearest neighbor algorithm. In Telecommunications and Signal Processing (TSP), 2015 38th International
Conference on, pages 764–767. IEEE.

Patwary, M. M. A., Satish, N. R., Sundaram, N., Liu, J., Sadowski, P., Racah, E., Byna, S., Tull, C., Bhimji, W., Dubey, P. et al., 2016. PANDA: Extreme Scale Parallel K-Nearest Neighbor on Distributed Architectures.
In Parallel and Distributed Processing Symposium, 2016 IEEE International, pages 494–503. IEEE.

Pu, Y., Peng, J., Huang, L., and Chen, J., 2015. An efficient knn algorithm implemented on fpga based heterogeneous computing system using opencl. In Field-Programmable Custom Computing Machines
(FCCM), 2015 IEEE 23rd Annual International Symposium on, pages 167–170. IEEE.

Rehman, A.-u., Ullah, R., and Abdullah, F., 2015. Big Data Analysis in IoT. In Handbook of Research on Trends
and Future Directions in Big Data and Web Intelligence, pages 313–327. IGI Global.

Seidl, T. and Kriegel, H.-P., 1998. Optimal Multi-step K-nearest Neighbor Search. In Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data, SIGMOD ’98, pages 154–165. ACM,
New York, NY, USA. ISBN 0-89791-995-5. doi:10.1145/276304.276319.

Sohani, A., Ullah, R., Rai, A., and Karni, O., 2018. Closest Match Based Information Retrieval and
Recommendation Engine using Signature-Trees and Fuzzy Relevance Sorting. American Scientific Research
Journal for Engineering, Technology, and Sciences (ASRJETS), 45(1):120–134.

Wang, L., Zhu, X., Yang, B., Guo, J., Liu, S., Li, M., Zhu, J., and Abraham, A., 2018. Accelerating nearest
neighbor partitioning neural network classifier based on CUDA. Engineering Applications of Artificial
Intelligence, 68:53–62.
Ullah, R., Khan, A. H., & Emaduddin, S. (2019). ck-NN: A Clustered k-Nearest Neighbours Approach for Large-Scale Classification. ADCAIJ: Advances in Distributed Computing and Artificial Intelligence Journal, 8(3), 67–77. https://doi.org/10.14201/ADCAIJ2019836777

Downloads

Download data is not yet available.

Author Biography

Ayaz H. Khan

,
Karachi Institute of Economics and Technology
Computer Science DepartmentAssistant Professor
+