Now that I’ve opened the floodgates of tweets to my code, I started running into an issue where my MySQL database (on Amazon RDS) was failing under the pressure. Simply put, I was making more queries than it could handle. To remedy that, I’ve added an LRU (Least Recently Used) cache to my code. This lets me keep a large list of Twitter accounts that I’ve already analyzed in memory, so I don’t have to ask the db whether I’ve seen an account before.
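To give a rough idea of what an LRU cache for "have I seen this account?" checks can look like, here is a minimal Python sketch. It is not my actual code: the class name `SeenAccountCache`, the `capacity` parameter, and the use of plain account IDs as keys are all assumptions made for illustration.

```python
from collections import OrderedDict


class SeenAccountCache:
    """Hypothetical LRU cache of account IDs that were already analyzed."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()  # least recently used first, newest last

    def __contains__(self, account_id):
        if account_id in self._items:
            # A lookup hit counts as a use, so mark the entry as recent.
            self._items.move_to_end(account_id)
            return True
        return False

    def add(self, account_id):
        self._items[account_id] = True
        self._items.move_to_end(account_id)
        if len(self._items) > self.capacity:
            # Over capacity: evict the least recently used entry.
            self._items.popitem(last=False)

    def __len__(self):
        return len(self._items)
```

With something like this, the check becomes `if account_id in cache:` and the database only has to be consulted on a miss. The capacity caps memory use, at the cost of occasionally forgetting accounts that have not shown up in a while.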
When my code starts up, it loads a very large list of all active Twitter accounts that I’ve already looked at into memory, and the list is maintained locally after that. This allows me to make only one query to my db instead of thousands (or even hundreds of thousands). It seems to have solved the db crashing issue I was running into. At least, so far it seems that way.
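The startup pattern described above can be sketched roughly like this in Python. Everything here is assumed for the sake of the example: the function names are made up, `fake_fetch_all_ids` stands in for the single real query against MySQL, and a plain `set` is used in place of the LRU cache to keep the sketch short.

```python
def load_seen_accounts(fetch_all_ids):
    """Run the one startup query and keep the resulting IDs in memory."""
    return set(fetch_all_ids())


def is_new_account(account_id, seen):
    """Check and update the in-memory set; no database round trip."""
    if account_id in seen:
        return False
    seen.add(account_id)
    return True


# Stand-in for the real database call (hypothetical data).
query_count = 0


def fake_fetch_all_ids():
    global query_count
    query_count += 1
    return [101, 102, 103]


seen = load_seen_accounts(fake_fetch_all_ids)
# 101 was preloaded, 104 is new the first time and known the second time.
results = [is_new_account(i, seen) for i in (101, 104, 104)]
```

However many accounts are checked afterwards, `query_count` stays at one, which is the whole point: the per-tweet lookups never touch the db.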
