Designing a Web Spam Classifier Based on Feature Fusion in the Layered Multi-Population Genetic Programming Framework
Abstract
Web spam pages are a critical challenge for Web retrieval systems and can drastically degrade their performance. Although these systems try to limit the impact of spam pages on their final result lists, spammers increasingly use more sophisticated techniques to raise the number of views of their target pages and so achieve greater commercial success. This paper applies the recently proposed Layered Multi-Population Genetic Programming model to the Web spam detection task, together with correlation coefficient analysis for feature space reduction. Based on our preliminary results, the designed classifier, which combines easy-to-compute features, performs reasonably well compared with similar methods.
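The abstract mentions correlation coefficient analysis for feature space reduction but does not describe the procedure. As a minimal sketch, assuming a simple redundancy filter based on the Pearson correlation between pairs of features, the Python below keeps a feature only if its absolute correlation with every already-retained feature stays below a threshold. The function names, the 0.9 threshold and the toy feature names are hypothetical illustrations, not the authors' method or the dataset's actual features.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: treat a constant feature as uncorrelated
    return cov / math.sqrt(var_x * var_y)


def reduce_features(features: Dict[str, List[float]], threshold: float = 0.9) -> List[str]:
    """Greedily keep a feature unless it is highly correlated with one already kept.

    Features are visited in insertion order; `threshold` is a hypothetical cut-off.
    """
    kept: List[str] = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


if __name__ == "__main__":
    # Toy data with hypothetical feature names; not taken from the paper or any dataset.
    data = {
        "num_links": [10.0, 20.0, 30.0, 40.0, 50.0],
        "num_links_x2": [20.0, 40.0, 60.0, 80.0, 100.0],  # perfectly correlated with num_links
        "title_length": [5.0, 3.0, 8.0, 1.0, 4.0],
    }
    print(reduce_features(data))  # prints ['num_links', 'title_length']
```

Because the filter is greedy, the order in which features are listed decides which member of a correlated pair survives. Ranking features by their correlation with the spam/non-spam label is another plausible reading of "correlation coefficient analysis"; this sketch does not claim which variant the paper uses.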
Keyhanipour, A. H., & Moshiri, B. (2013). Designing a Web Spam Classifier Based on Feature Fusion in the Layered Multi-Population Genetic Programming Framework. ADCAIJ: Advances in Distributed Computing and Artificial Intelligence Journal, 2(3), 15–27. https://doi.org/10.14201/ADCAIJ2014261527