Today I started to play with multithreading. It's something I've been meaning to do for quite some time, but I hadn't gotten around to it.
Multithreading is basically a way for a single process to split work off into separate threads that run independently, in parallel. Here is the Wikipedia article if you want to know more about it. The reason it's so important for Twitter Spam Assassin is the way I receive tweets from Twitter. Essentially, I open a connection with Twitter asking them to send me tweets from the public timeline. Twitter sends me a tweet, my code looks at it, and when I'm done processing that tweet, Twitter sends me another one.
If I introduce multithreading, each incoming tweet would spin off into a separate thread to be analyzed. This allows me to receive the next tweet immediately and NOT wait for the analysis of the last tweet to finish (which can take up to 3 seconds). You may say to yourself that 3 seconds isn't a long time, and you'd be right. But keep in mind that I'm analyzing thousands of tweets a day and can receive around 45 tweets per second (my best estimate), so the numbers get large very quickly.
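Here's a minimal Python sketch of the difference, not the code Twitter Spam Assassin actually runs. `analyze_tweet` is a hypothetical stand-in that just sleeps to mimic a slow analysis (shortened to 0.3 seconds) and flags any tweet containing the word "spam"; the threaded version hands each tweet to a `ThreadPoolExecutor` worker. It assumes the analysis mostly waits on network calls, which is where Python threads help; CPU-heavy analysis would be limited by the GIL and would call for processes instead.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Stand-in for the real (up to 3 s) analysis, shortened for the demo.
ANALYSIS_SECONDS = 0.3


def analyze_tweet(tweet):
    """Hypothetical analyzer: sleeps to simulate waiting on network calls."""
    time.sleep(ANALYSIS_SECONDS)
    return (tweet, "spam" in tweet.lower())


def process_sequentially(tweets):
    # The current model: the next tweet waits until the last one is done.
    return [analyze_tweet(t) for t in tweets]


def process_with_threads(tweets, max_workers=10):
    # Each tweet is handed to a worker thread, so taking in the next tweet
    # does not wait on the previous analysis. Results keep input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(analyze_tweet, t) for t in tweets]
        return [f.result() for f in futures]


if __name__ == "__main__":
    tweets = [f"tweet {i}" for i in range(9)] + ["Buy followers now! spam"]
    runners = [("sequential", process_sequentially),
               ("threaded", process_with_threads)]
    for name, run in runners:
        start = time.perf_counter()
        results = run(tweets)
        elapsed = time.perf_counter() - start
        flagged = sum(1 for _, is_spam in results if is_spam)
        print(f"{name}: {elapsed:.1f}s, flagged {flagged}")
```

Run as a script, the sequential version takes roughly 10 × 0.3 = 3 seconds for the ten sample tweets, while the threaded one finishes in about the time of a single analysis, since all ten run at once.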
But as with anything, there are drawbacks. One of my concerns is that my server(s) won't be able to analyze tweets as quickly as they come in, and I'm not sure yet whether that's true. If it is, there are ways to deal with it, such as load balancing.
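To get a feel for whether the server(s) can keep up, a rough capacity estimate helps. By Little's law, the average number of tweets being analyzed at once equals the arrival rate times the time each analysis takes, so at 45 tweets per second and the worst case of 3 seconds each, about 135 analyses would be in flight at any moment (a single sequential worker, by contrast, handles at most one tweet every 3 seconds). The Python sketch below does that arithmetic; `threads_per_server=50` is a made-up figure for illustration, not a measured one.

```python
import math


def concurrent_analyses_needed(tweets_per_second, seconds_per_tweet):
    """Little's law (L = lambda * W): average number of tweets in analysis at once."""
    return tweets_per_second * seconds_per_tweet


def servers_needed(tweets_per_second, seconds_per_tweet, threads_per_server):
    """Servers required if each one can run `threads_per_server` analyses at a time."""
    in_flight = concurrent_analyses_needed(tweets_per_second, seconds_per_tweet)
    return math.ceil(in_flight / threads_per_server)


if __name__ == "__main__":
    rate = 45        # tweets per second (the post's best estimate)
    worst_case = 3   # seconds per analysis (the post's upper bound)
    print(concurrent_analyses_needed(rate, worst_case))  # 135
    # threads_per_server=50 is a hypothetical capacity, not a measurement
    print(servers_needed(rate, worst_case, threads_per_server=50))  # 3
```

If the real per-server thread capacity turns out to be lower, the same formula tells you how many machines a load balancer would need to spread the stream across.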
But that isn't the real reason I haven't explored multithreading in the past. The real reason is Twitter's API call limit. If I start processing more tweets on a server, I'll need to make more API requests to Twitter per day, and I'm already hitting my call cap several times a day. With a little testing, I found that enabling multithreading gives me enormous performance boosts. But I use up all of my API calls very quickly, and my code only gets to search for spam for 10 minutes per hour!
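The budget problem can be sketched as well. Below is a hypothetical fixed-window counter in Python, not Twitter's actual rate-limiting logic: it allows `limit` calls per window (an hour by default) and refuses the rest until the window resets. The helper `minutes_of_work_per_window` shows the trade-off: the faster the code consumes calls, the sooner the hourly budget runs out. The limits used here are invented for illustration; the real ones come from Twitter.

```python
import time


class HourlyCallBudget:
    """Hypothetical fixed-window budget: at most `limit` calls per `window` seconds."""

    def __init__(self, limit, window=3600.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the window can be tested without waiting
        self.window_start = clock()
        self.used = 0

    def try_acquire(self):
        """Return True and count the call if budget remains, else False."""
        now = self.clock()
        if now - self.window_start >= self.window:
            # A new window has begun: reset the counter.
            self.window_start = now
            self.used = 0
        if self.used < self.limit:
            self.used += 1
            return True
        return False

    def seconds_until_reset(self):
        """Time left before the current window ends and the budget refills."""
        return max(0.0, self.window - (self.clock() - self.window_start))


def minutes_of_work_per_window(limit, calls_per_minute, window_minutes=60):
    """How long the code can run before the window's budget is exhausted."""
    return min(window_minutes, limit / calls_per_minute)
```

For example, `minutes_of_work_per_window(300, 30)` returns `10.0`: a hypothetical budget of 300 calls burned at 30 calls per minute lasts 10 minutes, and doubling the rate to 60 calls per minute would cut that to 5. That is the catch with multithreading here: more throughput means the same budget is spent sooner, unless the limit itself goes up.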
The good news is I am doing my best to work with Twitter and increase my API call limit. Getting some assistance from them would really increase the efficiency of finding and reporting spam.
