Greedy Algorithms for Approximating the Diameter of Machine Learning Datasets in Multidimensional Euclidean Space: Experimental Results

Abstract

Finding the diameter of a dataset in multidimensional Euclidean space is a well-established problem, with well-known algorithms. However, most of the algorithms found in the literature do not scale well with large values of data dimension, so the time complexity grows exponentially in most cases, which makes these algorithms impractical. Therefore, we implemented 4 simple greedy algorithms to be used for approximating the diameter of a multidimensional dataset; these are based on minimum/maximum l2 norms, hill climbing search, Tabu search and Beam search approaches, respectively. The time complexity of the implemented algorithms is near-linear, as they scale near-linearly with data size and its dimensions. The results of the experiments (conducted on different machine learning data sets) prove the efficiency of the implemented algorithms and can therefore be recommended for finding the diameter to be used by different machine learning applications when needed.
  • Referencias
  • Cómo citar
  • Del mismo autor
  • Métricas
Agarwal, P. K., Har-Peled, S. & Varadarajan, K. R., 2005. Geometric approximation via coresets. Combinatorial and computational geometry, Volume 52, pp. 1-30.

Agarwal, P. K., Matoušek, J. & Suri, S., 1992. Farthest neighbors, maximum spanning trees and related problems in higher dimensions. Computational Geometry, 1(4), pp. 189-201. - https://doi.org/10.1016/0925-7721(92)90001-9

Aggarwal, C. C., 2016. Outlier Analysis. Second Edition ed. New York: Springer.

Bespamyatnikh, S., 1998. An efficient algorithm for the three-dimensional diameter problem. s.l., s.n., p. 137-146.

Broder, A. Z., Glassman, S. C., Manasse, M. S. & Zweig, G., 1997. Syntactic clustering of the web. Computer Networks and ISDN Systems, 29(8), pp. 1157-1166. - https://doi.org/10.1016/S0169-7552(97)00031-7

Chan, T. M., 2002. Approximating the diameter, width, smallest enclosing cylinder, and minimum-width annulus. International journal of computational geometry and applications, 12(1), pp. 67-85. - https://doi.org/10.1142/S0218195902000748

Chazelle, B., Edelsbrunner, H., Guibas, L. & Sharir, M., 1992. Diameter, width, closest line pair, and parametric searching. s.l., ACM, pp. 120-129. - https://doi.org/10.1145/142675.142702

Clarkson, K. L. & Shor, P. W., 1989. Applications of random sampling in computational geometry, II. Discrete & Computational Geometry, 4(5), pp. 387-421. - https://doi.org/10.1007/BF02187740

Eg?eciog?lu, Ö. & Kalantari, B., 1989. Approximating the diameter of a set of points in the Euclidean space. Information Processing Letters, 32(4), pp. 205-211. - https://doi.org/10.1016/0020-0190(89)90045-8

Figueroa, K., Navarro, G. & Chavez, E., 2007. Metric Spaces Library. [Online]

Available at: http://www.sisap.org/Metric\_Space\_Library.html

Finocchiaro, D. V. & Pellegrini, M., 2002. On computing the diameter of a point set in high dimensional Euclidean space. Theoretical Computer Science, Volume 287, p. 501-514. - https://doi.org/10.1016/S0304-3975(01)00258-4

Gudivada, V. N. & Raghavan, V. V., 1995. Content based image retrieval systems. Computer, 28(9), pp. 18-22. - https://doi.org/10.1109/2.410145

Har-Peled, S., 2001. A practical approach for computing the diameter of a point set. s.l., ACM, pp. 177-186. - https://doi.org/10.1145/378583.378662

Hassanat, A. B. & Tarawneh, A. S., 2016. Fusion of Color and Statistic Features for Enhancing Content-Based Image Retrieval Systems. Journal of Theoretical & Applied Information Technology, 88(3).

Imanparast, M., Hashemi, S. N. & Mohades, A., 2016. An efficient approximation for point set diameter in fixed dimensions. preprint arXiv:1610.08543 [cs.CG].

Le Bourdais, F., 2015. The Farthest Neighbors Algorithm. [Online]

Available at: https://flothesof.github.io/farthest-neighbors.html

[Accessed 1 Feb 2018].

Lichman, M., 2013. [Online]

Available at: http://archive.ics.uci.edu/ml

Matoušek, J. & Schwarzkopf, O., 1993. A deterministic algorithm for the three-dimensional diameter problem. s.l., ACM, pp. 478-484. - https://doi.org/10.1145/167088.167217

Pagh, R., Silvestri, F., Sivertsen, J. & Skala, M., 2015. Approximate furthest neighbor in high dimensions. s.l., Springer, pp. 3-14. - https://doi.org/10.1007/978-3-319-25087-8_1

Pagh, R., Silvestri, F., Sivertsen, J. & Skala, M., 2017. Approximate furthest neighbor with application to annulus query. Information Systems, Volume 64, pp. 152-162. - https://doi.org/10.1016/j.is.2016.07.006

Preparata, F. & Shamos, M., 1985. Computational Geometry: an Introduction. New York: Springer. - https://doi.org/10.1007/978-1-4612-1098-6

Ramos, E., 1997. Construction of 1-d lower envelopes and applications. s.l., s.n., p. 57-66. - https://doi.org/10.1145/262839.262864

Ramos, E. A., 2000. Deterministic algorithms for 3-D diameter and some 2-D lower envelopes. s.l., ACM, pp. 290-299. - https://doi.org/10.1145/336154.336215

Supowit, K. J., 1990. New techniques for some dynamic closest-point and farthest-point problems. s.l., Society for Industrial and Applied Mathematics, pp. 84-90.

Williams, R., 2018. On the Difference Between Closest, Furthest, and Orthogonal Pairs: Nearly-Linear vs Barely-Subquadratic Complexity. s.l., Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1207-1215. - https://doi.org/10.1137/1.9781611975031.78

Yao, A. C. C., 1982. On constructing minimum spanning trees in k-dimensional spaces and related problems. SIAM Journal on Computing, 11(4), pp. 721-736. - https://doi.org/10.1137/0211059
Hassanat, A. (2018). Greedy Algorithms for Approximating the Diameter of Machine Learning Datasets in Multidimensional Euclidean Space: Experimental Results. ADCAIJ: Advances in Distributed Computing and Artificial Intelligence Journal, 7(3), 15–30. https://doi.org/10.14201/ADCAIJ2018731530

Downloads

Download data is not yet available.

Author Biography

Ahmad Hassanat

,
Mutah University
Ahmad B. A Hassanat is an associate professor of computer science, born and grew up in Jordan; he received his Ph.D. in Computer Science from the University of Buckingham at Buckingham, UK in 2010, and B.S. and M.S. degrees in Computer Science from Mutah University/Jordan and Al al-Bayt University/Jordan in 1995 and 2004, respectively. He has been a faculty member of Information Technology department at Mutah University since 2010. His main interests include computer vision; Machine learning, Big data analysis, Evolutionary algorithms and AI. 
+