Greedy Algorithms for Approximating the Diameter of Machine Learning Datasets in Multidimensional Euclidean Space: Experimental Results

Ahmad HASSANAT

Abstract


Finding the diameter of a dataset in multidimensional Euclidean space is a well-established problem, with well-known algorithms. However, most of the algorithms found in the literature do not scale well with large values of data dimension, so the time complexity grows exponentially in most cases, which makes these algorithms impractical. Therefore, we implemented 4 simple greedy algorithms to be used for approximating the diameter of a multidimensional dataset; these are based on minimum/maximum l2 norms, hill climbing search, Tabu search and Beam search approaches, respectively. The time complexity of the implemented algorithms is near-linear, as they scale near-linearly with data size and its dimensions. The results of the experiments (conducted on different machine learning data sets) prove the efficiency of the implemented algorithms and can therefore be recommended for finding the diameter to be used by different machine learning applications when needed.


Keywords


Furthest pair; computational geometry; hill climbing; Tabu search; Beam search

Full Text:

PDF

References


Agarwal, P. K., Har-Peled, S. & Varadarajan, K. R., 2005. Geometric approximation via coresets. Combinatorial and computational geometry, Volume 52, pp. 1-30.

Agarwal, P. K., Matoušek, J. & Suri, S., 1992. Farthest neighbors, maximum spanning trees and related problems in higher dimensions. Computational Geometry, 1(4), pp. 189-201.

Aggarwal, C. C., 2016. Outlier Analysis. Second Edition ed. New York: Springer.

Bespamyatnikh, S., 1998. An efficient algorithm for the three-dimensional diameter problem. s.l., s.n., p. 137–146.

Broder, A. Z., Glassman, S. C., Manasse, M. S. & Zweig, G., 1997. Syntactic clustering of the web. Computer Networks and ISDN Systems, 29(8), pp. 1157-1166.

Chan, T. M., 2002. Approximating the diameter, width, smallest enclosing cylinder, and minimum-width annulus. International journal of computational geometry and applications, 12(1), pp. 67-85.

Chazelle, B., Edelsbrunner, H., Guibas, L. & Sharir, M., 1992. Diameter, width, closest line pair, and parametric searching. s.l., ACM, pp. 120-129.

Clarkson, K. L. & Shor, P. W., 1989. Applications of random sampling in computational geometry, II. Discrete & Computational Geometry, 4(5), pp. 387-421.

Eg̃eciog̃lu, Ö. & Kalantari, B., 1989. Approximating the diameter of a set of points in the Euclidean space. Information Processing Letters, 32(4), pp. 205-211.

Figueroa, K., Navarro, G. & Chavez, E., 2007. Metric Spaces Library. [Online]

Available at: http://www.sisap.org/Metric_Space_Library.html

Finocchiaro, D. V. & Pellegrini, M., 2002. On computing the diameter of a point set in high dimensional Euclidean space. Theoretical Computer Science, Volume 287, p. 501–514.

Gudivada, V. N. & Raghavan, V. V., 1995. Content based image retrieval systems. Computer, 28(9), pp. 18-22.

Har-Peled, S., 2001. A practical approach for computing the diameter of a point set. s.l., ACM, pp. 177-186.

Hassanat, A. B. & Tarawneh, A. S., 2016. Fusion of Color and Statistic Features for Enhancing Content-Based Image Retrieval Systems. Journal of Theoretical & Applied Information Technology, 88(3).

Imanparast, M., Hashemi, S. N. & Mohades, A., 2016. An efficient approximation for point set diameter in fixed dimensions. preprint arXiv:1610.08543 [cs.CG].

Le Bourdais, F., 2015. The Farthest Neighbors Algorithm. [Online]

Available at: https://flothesof.github.io/farthest-neighbors.html

[Accessed 1 Feb 2018].

Lichman, M., 2013. [Online]

Available at: http://archive.ics.uci.edu/ml

Matoušek, J. & Schwarzkopf, O., 1993. A deterministic algorithm for the three-dimensional diameter problem. s.l., ACM, pp. 478-484.

Pagh, R., Silvestri, F., Sivertsen, J. & Skala, M., 2015. Approximate furthest neighbor in high dimensions. s.l., Springer, pp. 3-14.

Pagh, R., Silvestri, F., Sivertsen, J. & Skala, M., 2017. Approximate furthest neighbor with application to annulus query. Information Systems, Volume 64, pp. 152-162.

Preparata, F. & Shamos, M., 1985. Computational Geometry: an Introduction. New York: Springer.

Ramos, E., 1997. Construction of 1-d lower envelopes and applications. s.l., s.n., p. 57–66.

Ramos, E. A., 2000. Deterministic algorithms for 3-D diameter and some 2-D lower envelopes. s.l., ACM, pp. 290-299.

Supowit, K. J., 1990. New techniques for some dynamic closest-point and farthest-point problems. s.l., Society for Industrial and Applied Mathematics, pp. 84-90.

Williams, R., 2018. On the Difference Between Closest, Furthest, and Orthogonal Pairs: Nearly-Linear vs Barely-Subquadratic Complexity. s.l., Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1207-1215.

Yao, A. C. C., 1982. On constructing minimum spanning trees in k-dimensional spaces and related problems. SIAM Journal on Computing, 11(4), pp. 721-736.




DOI: http://dx.doi.org/10.14201/ADCAIJ2018731530





Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.

Clarivate Analytics