Java Spam Filter
Posted by JackSparrow on Stack Overflow
Published on 2010-04-30T13:19:52Z
java
I'm trying to build a spam filter in Java using a Bayesian algorithm.
I read email messages from a text file, split them into tokens with a regex, and store the tokens in a HashMap.
My problem is that the regex also splits email addresses, so instead of the single token: [email protected]
the regex produces the tokens: john smith example
The same holds for IP addresses. For example, instead of: 192.55.34.322
the regex produces the tokens: 192 55 34 322
Does anybody know of a way to read the email messages and store their contents as is, without breaking up addresses like these?
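
One possible approach, sketched here under assumptions rather than taken from the original code: instead of splitting on non-word characters, match whole tokens with `Pattern`/`Matcher` and list the "keep intact" shapes (email-like, then dotted-quad) before the generic word pattern. The class name `TokenizerSketch`, the method `countTokens`, the sample address, and the deliberately loose patterns are all hypothetical; the IPv4-like pattern does not range-check octets, so it also accepts the 192.55.34.322 example from the question.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenizerSketch {
    // Alternation order matters: the regex engine tries email-like first,
    // then dotted-quad, and only then falls back to a plain word.
    private static final Pattern TOKEN = Pattern.compile(
        "[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\\.[A-Za-z0-9-]+)+" // email-like (loose, not RFC-complete)
        + "|\\d{1,3}(?:\\.\\d{1,3}){3}"                         // IPv4-like (octets not range-checked)
        + "|[A-Za-z0-9']+");                                    // ordinary word or number

    // Counts how often each token occurs; tokens are lower-cased first.
    public static Map<String, Integer> countTokens(String text) {
        Map<String, Integer> counts = new HashMap<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            counts.merge(m.group().toLowerCase(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical sample line, not taken from the original data set.
        String sample = "Mail john.smith@example.com from 192.55.34.322 now";
        System.out.println(countTokens(sample));
    }
}
```

With this kind of matching, the address and the dotted number each end up as one key in the map, while ordinary words are still counted individually. If other token shapes need to survive (URLs, for instance), they would presumably be added as further alternatives ahead of the word pattern.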
© Stack Overflow or respective owner