Estimating crowd sizes with mobile phones and Twitter
Data from mobile phones and Twitter can yield accurate estimates of crowd sizes, according to new findings from the Data Science Lab at Warwick Business School.
Federico Botta, Suzy Moat and Tobias Preis of Warwick Business School analysed Twitter and mobile phone data from Milan, Italy, and found that they could estimate attendance numbers for football matches at the San Siro stadium, as well as the number of people at Linate Airport at any given time.
Their research, published in Royal Society Open Science today, could be of value in a range of emergency situations, such as evacuations and crowd disasters.
"Measuring crowd size is a difficult task, as the hugely varying estimates we see of the number of people at protests underline," Botta begins. "Given that most people now carry a mobile phone with them, we wondered if we could measure the number of people in a given location simply by analysing data on usage of these mobile phones.
"We found that this automatically generated data provides an excellent basis for estimating the size of a crowd," Botta says. "Quick and accurate measurements of crowd size could be of vital use for police and other authorities charged with avoiding crowd disasters."
In the paper, "Quantifying crowd size with mobile phone and Twitter data", the scientists analysed two months of Twitter data and mobile phone data from Milan, from November 1 to December 31, 2013. The mobile phone activity dataset was provided by Telecom Italia and reflects the volume of outgoing and incoming calls and text messages, as well as the number of active internet connections. Both datasets make it possible for the scientists to determine not only when mobile phones were active, but where their users were.
Remarkably, they found that the size of spikes in Twitter and mobile phone activity allowed them to estimate the number of attendees at football matches in the San Siro stadium, home of AC Milan and Internazionale.
"We plotted mobile phone calls, Twitter and SMS activity in the geographical area in which the San Siro is located and in all three we observed 10 distinct spikes," Tobias Preis, Associate Professor of Behavioural Science and Finance, says. "We found that the dates these spikes occurred coincided exactly with the dates on which the 10 football matches took place in the stadium.
"Furthermore, we noted that the relative sizes of the spikes strongly resembled the official attendance figure for each match. By drawing on historic internet activity in the San Siro, we were able to generate estimates of the number of attendees which fell within 13% of the true value," Preis says.
Suzy Moat, Assistant Professor of Behavioural Science, continues, "One of the key challenges we faced was to identify situations for which we had a reliable measurement of the number of people present, against which we could calibrate our method.
"The football stadium at the San Siro was ideal, as football fans need to buy a ticket to attend a match. We found that data on nine football matches was sufficient for us to generate accurate estimates of the number of people attending a 10th match," she says.
"The relationship between data on internet usage and match attendance was strongest of all – perhaps because smartphones automatically check services such as email, without the need for the user to actively intervene."
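The idea of calibrating on nine matches to estimate a 10th can be illustrated with a leave-one-out linear fit. The following is a minimal sketch, not necessarily the procedure used in the paper: it assumes a simple straight-line relationship between activity and attendance, the function names are hypothetical, and the activity and attendance numbers are invented purely for illustration.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def leave_one_out_estimates(activity, attendance):
    """Estimate each match's attendance from a line fitted to all other matches."""
    estimates = []
    for i in range(len(activity)):
        # Hold out match i and calibrate on the remaining matches.
        train_x = activity[:i] + activity[i + 1:]
        train_y = attendance[:i] + attendance[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        estimates.append(slope * activity[i] + intercept)
    return estimates


# Invented example values (NOT data from the study): one activity reading
# and one official attendance figure per match.
activity = [410.0, 520.0, 300.0, 610.0, 450.0, 380.0, 700.0, 560.0, 330.0, 480.0]
attendance = [38000, 47000, 29000, 55000, 41000, 35000, 62000, 50000, 31000, 44000]

for actual, estimate in zip(attendance, leave_one_out_estimates(activity, attendance)):
    error_pct = abs(estimate - actual) / actual * 100
    print(f"actual={actual}  estimate={estimate:.0f}  error={error_pct:.1f}%")
```

Holding out the match being estimated keeps its official attendance figure out of the calibration, which is what makes the percentage error a fair measure of accuracy.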
The researchers also investigated how mobile phone and Twitter usage related to passenger activity at Linate Airport. While exact passenger counts were not available, the researchers estimated the number of people in the airport by assuming passengers arrived two hours before their flight and those landing left an hour after touching down.
Botta says, "Again, we found that the greater phone call and SMS activity corresponded with a larger number of estimated passengers at the airport. Similarly, we discovered that greater internet activity related to a higher estimated number of passengers and the same with Twitter activity.
"The relationships are weaker than those found in the case study at San Siro, but remarkable given the coarse nature of our estimate of the number of passengers."
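The airport estimate rests on a simple rule: departing passengers are assumed to be present for the two hours before their flight, and arriving passengers for the hour after landing. A minimal sketch of that rule follows. The flight list, the per-flight passenger figures and the function name are hypothetical, since exact passenger counts were not available to the researchers; setting every passenger figure to 1 would instead count flights as a rough proxy.

```python
from datetime import datetime, timedelta

DEPARTURE_LEAD = timedelta(hours=2)  # assumed arrival at airport before take-off
ARRIVAL_LAG = timedelta(hours=1)     # assumed time spent in airport after landing


def estimated_people_present(flights, when):
    """Sum passengers whose assumed presence window contains `when`.

    `flights` is a list of (kind, scheduled_time, passengers) tuples,
    where kind is "departure" or "arrival".
    """
    total = 0
    for kind, scheduled, passengers in flights:
        if kind == "departure":
            start, end = scheduled - DEPARTURE_LEAD, scheduled
        elif kind == "arrival":
            start, end = scheduled, scheduled + ARRIVAL_LAG
        else:
            raise ValueError(f"unknown flight kind: {kind!r}")
        # Window is half-open: present from start up to, but not including, end.
        if start <= when < end:
            total += passengers
    return total


# Invented schedule for illustration only.
flights = [
    ("departure", datetime(2013, 11, 1, 10, 0), 100),
    ("arrival", datetime(2013, 11, 1, 9, 30), 80),
]
print(estimated_people_present(flights, datetime(2013, 11, 1, 9, 45)))
```

Evaluating this function at regular intervals gives a time series of estimated occupancy that can be compared with phone, internet or Twitter activity over the same intervals.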
Preis adds, "Our research provides evidence that accurate estimates of the number of people in a given location at a given time can be extrapolated from mobile phone or Twitter data."
"This shows that data generated through everyday interactions with our mobile phones could be of clear value for a range of business and policy stakeholders, potentially offering an almost instant measurement of the size of crowds," says Botta.