Greedy Algorithms for Approximating the Diameter of Machine Learning Datasets in Multidimensional Euclidean Space: Experimental Results
Abstract Finding the diameter of a dataset in multidimensional Euclidean space is a well-established problem, with well-known algorithms. However, most of the algorithms found in the literature do not scale well with large values of data dimension, so the time complexity grows exponentially in most cases, which makes these algorithms impractical. Therefore, we implemented 4 simple greedy algorithms to be used for approximating the diameter of a multidimensional dataset; these are based on minimum/maximum l2 norms, hill climbing search, Tabu search and Beam search approaches, respectively. The time complexity of the implemented algorithms is near-linear, as they scale near-linearly with data size and its dimensions. The results of the experiments (conducted on different machine learning data sets) prove the efficiency of the implemented algorithms and can therefore be recommended for finding the diameter to be used by different machine learning applications when needed.
- Referencias
- Cómo citar
- Del mismo autor
- Métricas
Agarwal, P. K., Har-Peled, S. & Varadarajan, K. R., 2005. Geometric approximation via coresets. Combinatorial and computational geometry, Volume 52, pp. 1-30.
Agarwal, P. K., Matoušek, J. & Suri, S., 1992. Farthest neighbors, maximum spanning trees and related problems in higher dimensions. Computational Geometry, 1(4), pp. 189-201. - https://doi.org/10.1016/0925-7721(92)90001-9
Aggarwal, C. C., 2016. Outlier Analysis. Second Edition ed. New York: Springer.
Bespamyatnikh, S., 1998. An efficient algorithm for the three-dimensional diameter problem. s.l., s.n., p. 137-146.
Broder, A. Z., Glassman, S. C., Manasse, M. S. & Zweig, G., 1997. Syntactic clustering of the web. Computer Networks and ISDN Systems, 29(8), pp. 1157-1166. - https://doi.org/10.1016/S0169-7552(97)00031-7
Chan, T. M., 2002. Approximating the diameter, width, smallest enclosing cylinder, and minimum-width annulus. International journal of computational geometry and applications, 12(1), pp. 67-85. - https://doi.org/10.1142/S0218195902000748
Chazelle, B., Edelsbrunner, H., Guibas, L. & Sharir, M., 1992. Diameter, width, closest line pair, and parametric searching. s.l., ACM, pp. 120-129. - https://doi.org/10.1145/142675.142702
Clarkson, K. L. & Shor, P. W., 1989. Applications of random sampling in computational geometry, II. Discrete & Computational Geometry, 4(5), pp. 387-421. - https://doi.org/10.1007/BF02187740
Eg?eciog?lu, Ö. & Kalantari, B., 1989. Approximating the diameter of a set of points in the Euclidean space. Information Processing Letters, 32(4), pp. 205-211. - https://doi.org/10.1016/0020-0190(89)90045-8
Figueroa, K., Navarro, G. & Chavez, E., 2007. Metric Spaces Library. [Online]
Available at: http://www.sisap.org/Metric\_Space\_Library.html
Finocchiaro, D. V. & Pellegrini, M., 2002. On computing the diameter of a point set in high dimensional Euclidean space. Theoretical Computer Science, Volume 287, p. 501-514. - https://doi.org/10.1016/S0304-3975(01)00258-4
Gudivada, V. N. & Raghavan, V. V., 1995. Content based image retrieval systems. Computer, 28(9), pp. 18-22. - https://doi.org/10.1109/2.410145
Har-Peled, S., 2001. A practical approach for computing the diameter of a point set. s.l., ACM, pp. 177-186. - https://doi.org/10.1145/378583.378662
Hassanat, A. B. & Tarawneh, A. S., 2016. Fusion of Color and Statistic Features for Enhancing Content-Based Image Retrieval Systems. Journal of Theoretical & Applied Information Technology, 88(3).
Imanparast, M., Hashemi, S. N. & Mohades, A., 2016. An efficient approximation for point set diameter in fixed dimensions. preprint arXiv:1610.08543 [cs.CG].
Le Bourdais, F., 2015. The Farthest Neighbors Algorithm. [Online]
Available at: https://flothesof.github.io/farthest-neighbors.html
[Accessed 1 Feb 2018].
Lichman, M., 2013. [Online]
Available at: http://archive.ics.uci.edu/ml
Matoušek, J. & Schwarzkopf, O., 1993. A deterministic algorithm for the three-dimensional diameter problem. s.l., ACM, pp. 478-484. - https://doi.org/10.1145/167088.167217
Pagh, R., Silvestri, F., Sivertsen, J. & Skala, M., 2015. Approximate furthest neighbor in high dimensions. s.l., Springer, pp. 3-14. - https://doi.org/10.1007/978-3-319-25087-8_1
Pagh, R., Silvestri, F., Sivertsen, J. & Skala, M., 2017. Approximate furthest neighbor with application to annulus query. Information Systems, Volume 64, pp. 152-162. - https://doi.org/10.1016/j.is.2016.07.006
Preparata, F. & Shamos, M., 1985. Computational Geometry: an Introduction. New York: Springer. - https://doi.org/10.1007/978-1-4612-1098-6
Ramos, E., 1997. Construction of 1-d lower envelopes and applications. s.l., s.n., p. 57-66. - https://doi.org/10.1145/262839.262864
Ramos, E. A., 2000. Deterministic algorithms for 3-D diameter and some 2-D lower envelopes. s.l., ACM, pp. 290-299. - https://doi.org/10.1145/336154.336215
Supowit, K. J., 1990. New techniques for some dynamic closest-point and farthest-point problems. s.l., Society for Industrial and Applied Mathematics, pp. 84-90.
Williams, R., 2018. On the Difference Between Closest, Furthest, and Orthogonal Pairs: Nearly-Linear vs Barely-Subquadratic Complexity. s.l., Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1207-1215. - https://doi.org/10.1137/1.9781611975031.78
Yao, A. C. C., 1982. On constructing minimum spanning trees in k-dimensional spaces and related problems. SIAM Journal on Computing, 11(4), pp. 721-736. - https://doi.org/10.1137/0211059
Agarwal, P. K., Matoušek, J. & Suri, S., 1992. Farthest neighbors, maximum spanning trees and related problems in higher dimensions. Computational Geometry, 1(4), pp. 189-201. - https://doi.org/10.1016/0925-7721(92)90001-9
Aggarwal, C. C., 2016. Outlier Analysis. Second Edition ed. New York: Springer.
Bespamyatnikh, S., 1998. An efficient algorithm for the three-dimensional diameter problem. s.l., s.n., p. 137-146.
Broder, A. Z., Glassman, S. C., Manasse, M. S. & Zweig, G., 1997. Syntactic clustering of the web. Computer Networks and ISDN Systems, 29(8), pp. 1157-1166. - https://doi.org/10.1016/S0169-7552(97)00031-7
Chan, T. M., 2002. Approximating the diameter, width, smallest enclosing cylinder, and minimum-width annulus. International journal of computational geometry and applications, 12(1), pp. 67-85. - https://doi.org/10.1142/S0218195902000748
Chazelle, B., Edelsbrunner, H., Guibas, L. & Sharir, M., 1992. Diameter, width, closest line pair, and parametric searching. s.l., ACM, pp. 120-129. - https://doi.org/10.1145/142675.142702
Clarkson, K. L. & Shor, P. W., 1989. Applications of random sampling in computational geometry, II. Discrete & Computational Geometry, 4(5), pp. 387-421. - https://doi.org/10.1007/BF02187740
Eg?eciog?lu, Ö. & Kalantari, B., 1989. Approximating the diameter of a set of points in the Euclidean space. Information Processing Letters, 32(4), pp. 205-211. - https://doi.org/10.1016/0020-0190(89)90045-8
Figueroa, K., Navarro, G. & Chavez, E., 2007. Metric Spaces Library. [Online]
Available at: http://www.sisap.org/Metric\_Space\_Library.html
Finocchiaro, D. V. & Pellegrini, M., 2002. On computing the diameter of a point set in high dimensional Euclidean space. Theoretical Computer Science, Volume 287, p. 501-514. - https://doi.org/10.1016/S0304-3975(01)00258-4
Gudivada, V. N. & Raghavan, V. V., 1995. Content based image retrieval systems. Computer, 28(9), pp. 18-22. - https://doi.org/10.1109/2.410145
Har-Peled, S., 2001. A practical approach for computing the diameter of a point set. s.l., ACM, pp. 177-186. - https://doi.org/10.1145/378583.378662
Hassanat, A. B. & Tarawneh, A. S., 2016. Fusion of Color and Statistic Features for Enhancing Content-Based Image Retrieval Systems. Journal of Theoretical & Applied Information Technology, 88(3).
Imanparast, M., Hashemi, S. N. & Mohades, A., 2016. An efficient approximation for point set diameter in fixed dimensions. preprint arXiv:1610.08543 [cs.CG].
Le Bourdais, F., 2015. The Farthest Neighbors Algorithm. [Online]
Available at: https://flothesof.github.io/farthest-neighbors.html
[Accessed 1 Feb 2018].
Lichman, M., 2013. [Online]
Available at: http://archive.ics.uci.edu/ml
Matoušek, J. & Schwarzkopf, O., 1993. A deterministic algorithm for the three-dimensional diameter problem. s.l., ACM, pp. 478-484. - https://doi.org/10.1145/167088.167217
Pagh, R., Silvestri, F., Sivertsen, J. & Skala, M., 2015. Approximate furthest neighbor in high dimensions. s.l., Springer, pp. 3-14. - https://doi.org/10.1007/978-3-319-25087-8_1
Pagh, R., Silvestri, F., Sivertsen, J. & Skala, M., 2017. Approximate furthest neighbor with application to annulus query. Information Systems, Volume 64, pp. 152-162. - https://doi.org/10.1016/j.is.2016.07.006
Preparata, F. & Shamos, M., 1985. Computational Geometry: an Introduction. New York: Springer. - https://doi.org/10.1007/978-1-4612-1098-6
Ramos, E., 1997. Construction of 1-d lower envelopes and applications. s.l., s.n., p. 57-66. - https://doi.org/10.1145/262839.262864
Ramos, E. A., 2000. Deterministic algorithms for 3-D diameter and some 2-D lower envelopes. s.l., ACM, pp. 290-299. - https://doi.org/10.1145/336154.336215
Supowit, K. J., 1990. New techniques for some dynamic closest-point and farthest-point problems. s.l., Society for Industrial and Applied Mathematics, pp. 84-90.
Williams, R., 2018. On the Difference Between Closest, Furthest, and Orthogonal Pairs: Nearly-Linear vs Barely-Subquadratic Complexity. s.l., Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1207-1215. - https://doi.org/10.1137/1.9781611975031.78
Yao, A. C. C., 1982. On constructing minimum spanning trees in k-dimensional spaces and related problems. SIAM Journal on Computing, 11(4), pp. 721-736. - https://doi.org/10.1137/0211059
Hassanat, A. (2018). Greedy Algorithms for Approximating the Diameter of Machine Learning Datasets in Multidimensional Euclidean Space: Experimental Results. ADCAIJ: Advances in Distributed Computing and Artificial Intelligence Journal, 7(3), 15–30. https://doi.org/10.14201/ADCAIJ2018731530
Downloads
Download data is not yet available.
+
−