Revealing non-alphabetical guises of spam-trigger vocables

Unsolicited bulk email (spam) nowadays accounts for nearly 75% of daily email traffic, a figure that speaks strongly for the need of finding better protection mechanisms against its dissemination. A clever trick recently exploited by email spammers in order to circumvent textual-based filters, invol...

Full description

Autores:
Rojas Galeano, Sergio Andres
Tipo de recurso:
Article of journal
Fecha de publicación:
2013
Institución:
Universidad Nacional de Colombia
Repositorio:
Universidad Nacional de Colombia
Idioma:
spa
OAI Identifier:
oai:repositorio.unal.edu.co:unal/42641
Acceso en línea:
https://repositorio.unal.edu.co/handle/unal/42641
http://bdigital.unal.edu.co/32738/
Palabra clave:
Uncovering of spam vocables
approximate string matching algorithm
Rights
openAccess
License
Atribución-NoComercial 4.0 Internacional
Description
Summary:Unsolicited bulk email (spam) nowadays accounts for nearly 75% of daily email traffic, a figure that speaks strongly for the need of finding better protection mechanisms against its dissemination. A clever trick recently exploited by email spammers in order to circumvent textual-based filters, involves obfuscation of black-listed words with visually equivalent text substitutions from non-alphabetic symbols, in such a way it still conveys the semantics of the original word to the human eye (e.g. masking viagra as v1@gr@ or as v-i-a-g-r-a). In this paper we discuss how a simple-yet-effective adaptation of a classical algorithm for string matching may meet this stylish challenge to effectively reveal the similarity between genuine spam-trigger terms with their disguised alpha-numeric variants.