Designing a Web Spam Classifier Based on Feature Fusion in the Layered Multi-Population Genetic Programming Framework

Amir Hosein KEYHANIPOUR, Behzad MOSHIRI

Abstract


Web spam pages are a critical challenge for Web retrieval systems and can drastically degrade their performance. Although these systems try to limit the impact of spam pages on their final result lists, spammers use increasingly sophisticated techniques to attract more views to their target pages for commercial gain. This paper applies the recently proposed Layered Multi-population Genetic Programming model to the Web spam detection task, together with correlation coefficient analysis for feature space reduction. According to our preliminary results, the designed classifier, which combines easy-to-compute features, performs reasonably well in comparison with similar methods.
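
The abstract does not specify how the correlation coefficient analysis is carried out, so the following is only a minimal sketch of one common way to reduce a feature space with it: compute the Pearson correlation between feature columns and drop any feature that is highly correlated with one already kept. The feature names, the sample values, and the 0.9 threshold are hypothetical and are not taken from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # treat a constant column as uncorrelated
    return cov / math.sqrt(var_x * var_y)


def drop_correlated(columns, threshold=0.9):
    """Keep a feature only if its |r| with every kept feature is <= threshold.

    `threshold` is an assumed cut-off, not a value reported by the authors.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical per-page content features (one value per page).
features = {
    "num_words": [120, 300, 80, 500, 220],
    "num_title_words": [5, 9, 4, 12, 7],
    "frac_anchor_text": [0.10, 0.05, 0.40, 0.02, 0.15],
}
selected = drop_correlated(features)
print(selected)
```

In this toy data the title word count closely tracks the page word count, so only one of the two survives. A greedy pass like this depends on column order; a real pipeline might instead rank features by their correlation with the spam label before filtering.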


Keywords


Web Spam; Feature Fusion; Layered Multi-Population Genetic Programming






DOI: http://dx.doi.org/10.14201/ADCAIJ2014261527





This work is licensed under a Creative Commons Attribution 3.0 License.
