Google Flu Trends And The Forgotten Facebook Lexicon
We have all heard about Google Flu Trends, released this week, which showed an impressively close relationship to data from the U.S. Centers for Disease Control and Prevention (CDC), as can be seen below:

This got me thinking about a feature released by Facebook on April 15th 2008 called Facebook Lexicon. What’s that? I wouldn’t expect you to know, as it received so little press that you probably missed it. So, what does it do? Unlike Google Trends or Technorati, which show trending keywords in searches, websites or blogs, Lexicon shows trends on the public and semi-public forums across Facebook (also known as Walls). OK, so why are you telling me this?
Hypothesis
What Google Flu Trends shows is the search frequency of anything related to “flu” typed in by Google users across the world. These could be genuine searches for symptoms, remedies and treatments, or they could just be kids searching for answers to a biology quiz. Whilst I imagine Google applies some form of filtering and smoothing to the data, there is always room for error. With a tool such as Facebook Lexicon, however, a user is posting on a friend’s wall that they can’t come out because they have the flu, or setting their status to say they have flu, so the data is potentially more accurate than the Google Flu Trends data. Why do I say this? Because of the social component. People do not use Facebook the way they use Google to search for flu symptoms, so a greater proportion of occurrences of the word “flu” should relate directly to somebody either having it or knowing a friend who does.
Methods and Results
Using Facebook Lexicon I searched for both “flu” and “cold” and generated the following:
![Facebook Lexicon chart for “flu” and “cold”]()
Unfortunately the data only goes back to October 2007 (Facebook have presumably only allowed access to a subset of their data), but can you see the same increase around January and the tail-off into April that was shown in the Google Flu Trends video and the CDC data? Coincidence? I think not.
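Eyeballing two charts is not proof, of course. If the weekly numbers behind both charts could be exported (an assumption on my part, as I have not found a way to get them out of Lexicon), one rough way to test the "coincidence" would be to compute the Pearson correlation between the two series. The sketch below is only an illustration: the function name and both lists are hypothetical, and the numbers are made-up placeholders, not real Lexicon or CDC figures.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and of squared deviations from each mean
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)


# Made-up placeholder values, NOT real data: one number per week
wall_mentions = [2, 3, 5, 9, 8, 4, 2]                # hypothetical "flu" wall posts
cdc_ili_rate = [1.1, 1.4, 2.6, 4.4, 4.1, 2.2, 1.2]   # hypothetical CDC flu activity

print(round(pearson(wall_mentions, cdc_ili_rate), 3))
```

A value close to 1 would mean the two series rise and fall together; a value near 0 would suggest the resemblance in the charts is weaker than it looks. Even a high value would only show that the series move together, not that wall posts measure actual illness.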
Further Study
I would be interested to explore this further across other tools and platforms, such as Bebo, Digg, or MySpace, to see whether trends could be compared across potentially different user demographics. For example, would MySpace users make more references to “(un)safe sex”? Would Bebo users make more references to “periods” or “masturbation”? Another potential area of study could be comparing trends across different user locations to identify global health “communities”.
Conclusions
Tools such as Google Flu Trends, Google Trends, or Technorati can be powerful at predicting trends within Internet searches. However, social tools such as Facebook Lexicon offer an even more powerful way of tracking trends in human interactions across the social web. There are massive implications for health care if data can be mined in a sensible way using these tools. For example, what impact would releasing demographic data alongside keywords have? What if we could tell who was talking about sensitive issues such as “safe sex”, “diabetes” or even “AIDS”? I predict that this is only the beginning, and as companies release more tools such as Google Flu Trends or Facebook Lexicon we will begin to uncover more knowledge about ourselves than we ever thought possible. And if we, the general public, are able to do this sort of analysis, think about what major corporations and governments will be able to do.
Update: Excellent round-up of privacy concerns and Google Flu Trends by U.S. health care lawyer Bob Coffield.

