Lapidarium notes

Amira Skomorowska's notes

"Everything you can imagine is real."— Pablo Picasso


Sun, Sep 11th

Supercomputer predicts revolution: Forecasting large-scale human behavior using global news media tone in time and space


Figure 1. Global geocoded tone of all Summary of World Broadcasts content, 2005.

“Feeding a supercomputer with news stories could help predict major world events, according to US research.

While the analysis was carried out retrospectively, scientists say the same processes could be used to anticipate upcoming conflict. (…)

The study’s information was taken from a range of sources including the US government-run Open Source Centre and BBC Monitoring, both of which monitor local media output around the world.

News outlets which published online versions were also analysed, as was the New York Times’ archive, going back to 1945.

In total, Mr Leetaru gathered more than 100 million articles.

Reports were analysed for two main types of information: mood - whether the article represented good news or bad news, and location - where events were happening and the location of other participants in the story.

Mood detection, or “automated sentiment mining”, searched for words such as “terrible”, “horrific” or “nice”.

Location, or “geocoding”, took mentions of specific places, such as “Cairo”, and converted them into coordinates that could be plotted on a map.

Analysis of story elements was used to create an interconnected web of 100 trillion relationships. (…)

The computer event analysis model appears to give forewarning of major events, based on deteriorating sentiment.

However, in the case of this study, its analysis is applied to things that have already happened.

According to Kalev Leetaru, such a system could easily be adapted to work in real time, giving an element of foresight. (…)

“It looks like a stock ticker in many regards and you know what direction it has been heading the last few minutes and you want to know where it is heading in the next few.

“It is very similar to what economic forecasting algorithms do.” (…)

“The next iteration is going to city level and beyond and looking at individual groups and how they interact.

“I liken it to weather forecasting. It’s never perfect, but we do better than random guessing.”

Supercomputer predicts revolution, BBC News, 9 September 2011
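The two steps described in the BBC piece, scoring mood from word lists and turning place names into coordinates, can be sketched in a few lines of Python. This is only a toy illustration and not Leetaru's actual pipeline: the word lists, the `GAZETTEER` table and its approximate coordinates, and the function names `tone` and `geocode` are all assumptions made up for the example.

```python
import re

# Hypothetical word lists; a real tone dictionary would be far larger.
POSITIVE = {"nice", "good", "peaceful"}
NEGATIVE = {"terrible", "horrific", "violent"}

# Hypothetical gazetteer: place name -> approximate (latitude, longitude).
GAZETTEER = {
    "cairo": (30.04, 31.24),
    "tunis": (36.81, 10.18),
    "tripoli": (32.89, 13.19),
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def tone(text):
    """Return (positive words - negative words) / total words, or 0.0 if empty."""
    words = tokenize(text)
    if not words:
        return 0.0
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / len(words)


def geocode(text):
    """Return (place, lat, lon) for every gazetteer place mentioned in the text."""
    found = GAZETTEER.keys() & set(tokenize(text))
    return [(place, *GAZETTEER[place]) for place in sorted(found)]


story = "Terrible and horrific scenes in Cairo"
print(tone(story))     # negative value: two negative words out of six
print(geocode(story))  # the single match for "cairo" with its coordinates
```

A single-word lookup like this cannot handle multi-word place names or ambiguous ones, which a real geocoder would have to resolve.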

Culturomics 2.0: Forecasting large-scale human behavior using global news media tone in time and space

“News is increasingly being produced and consumed online, supplanting print and broadcast to represent nearly half of the news monitored across the world today by Western intelligence agencies. Recent literature has suggested that computational analysis of large text archives can yield novel insights to the functioning of society, including predicting future economic events. Applying tone and geographic analysis to a 30–year worldwide news archive, global news tone is found to have forecasted the revolutions in Tunisia, Egypt, and Libya, including the removal of Egyptian President Mubarak, predicted the stability of Saudi Arabia (at least through May 2011), estimated Osama Bin Laden’s likely hiding place as a 200–kilometer radius in Northern Pakistan that includes Abbottabad, and offered a new look at the world’s cultural affiliations. Along the way, common assertions about the news, such as “news is becoming more negative” and “American news portrays a U.S.–centric view of the world” are found to have merit.”

“The emerging field of “Culturomics” seeks to explore broad cultural trends through the computerized analysis of vast digital book archives, offering novel insights into the functioning of human society (Michel, et al., 2011). Yet, books represent the “digested history” of humanity, written with the benefit of hindsight. People take action based on the imperfect information available to them at the time, and the news media captures a snapshot of the real–time public information environment (Stierholz, 2008). News contains far more than just factual details: an array of cultural and contextual influences strongly impact how events are framed for an outlet’s audience, offering a window into national consciousness (Gerbner and Marvanyi, 1977). A growing body of work has shown that measuring the “tone” of this real–time consciousness can accurately forecast many broad social behaviors, ranging from box office sales (Mishne and Glance, 2006) to the stock market itself (Bollen, et al., 2011). (…)


Figure 2. Global geocoded tone of all Summary of World Broadcasts content, January 1979–April 2011 mentioning “bin Laden”

Most theories of civilizations feature some approximation of the degree of conflict or cooperation between each group. Figure 3 displays the average tone of all links between cities in each civilization, visualizing the overall “tone” of the relationship between each. Group 1, which roughly encompasses the Asiatic and Australian regions, has largely positive links to the rest of the world and is the only group with a positive connection to Group 4 (Middle East). Group 3 (Africa) has no positive links to any other civilization, while Group 2 (North and South America excluding Canada) has negative links to all but Group 1. As opposed to explicit measures of conflict or cooperation based on armed conflict or trade ties, this approach captures the latent view of conflict and cooperation as portrayed by the world’s news media.

            
Figure 3. Average tone of links between world “civilizations” according to SWB, 1979–2009.
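The "average tone of all links between cities in each civilization" can be illustrated with a small Python sketch. The excerpt does not say how the paper computes these averages, so this is an assumption-laden toy: the function name `average_link_tone`, the input layout, the choice to skip links inside a single group, and the sample cities and tone values are all invented for the example.

```python
from collections import defaultdict


def average_link_tone(links, group_of):
    """Average the tone of city-to-city links for each pair of groups.

    links:    iterable of (city_a, city_b, tone) tuples
    group_of: dict mapping each city to its group id
    Returns {(g1, g2): mean tone} with g1 < g2; links inside one group
    are skipped (an assumption of this sketch).
    """
    totals = defaultdict(lambda: [0.0, 0])  # pair -> [sum of tone, count]
    for city_a, city_b, link_tone in links:
        ga, gb = group_of[city_a], group_of[city_b]
        if ga == gb:
            continue
        key = (min(ga, gb), max(ga, gb))
        totals[key][0] += link_tone
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


# Made-up sample data, for illustration only.
group_of = {"Tokyo": 1, "Sydney": 1, "Lagos": 3, "Cairo": 4}
links = [
    ("Tokyo", "Cairo", 0.5),
    ("Sydney", "Cairo", 0.25),
    ("Lagos", "Cairo", -1.0),
    ("Tokyo", "Sydney", 0.9),  # same group, ignored
]
print(average_link_tone(links, group_of))
```

A positive average for a pair of groups would correspond to a "positive link" in the sense of Figure 3, and a negative average to a negative one.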

Figure 4 shows the world civilizations according to the New York Times 1945–2005. It divides the world into five civilizations, but paints a very different picture of the world, with a far greater portion of the global landmass arrayed around the United States. Geographic affinity appears to play a far lesser role in this grouping, and the majority of the world is located in a single cluster with the United States. It is clear from comparing the SWB and NYT civilization maps that even within the news media there is no one “universal” set of civilizations, but that each country’s media system may portray the world very differently to its audience. By pooling all of these varied viewpoints together, SWB’s view of the world’s civilizations offers a “crowdsourced” aggregate view of civilization, but it too is likely subject to some innate Western bias.


Figure 4. World “civilizations” according to NYT, 1945–2005.

Monitoring first broadcast then print media over the last 70 years, nearly half of the annual output of Western intelligence global news monitoring is now derived from Internet–based news, standing testament to the Web’s disruptive power as a distribution medium. Pooling together the global tone of all news mentions of a country over time appears to accurately forecast its near–term stability, including predicting the revolutions in Egypt, Tunisia, and Libya, conflict in Serbia, and the stability of Saudi Arabia.

Location plays a critical role in news reporting, and “passively crowdsourcing” the media to find the locations most closely associated with Bin Laden prior to his capture finds a 200 km–wide swath of northern Pakistan as his most likely hiding place, an area which contains Abbottabad, the city he was ultimately captured in. Finally, the geographic clustering of the news, the way in which it frames localities together, offers new insights into how the world views itself and the “natural civilizations” of the news media.

While heavily biased and far from complete, the news media captures the only cross–national real–time record of human society available to researchers. The findings of this study suggest that Culturomics, which has thus far focused on the digested history of books, can yield intriguing new understandings of human society when applied to the real–time data of news. From forecasting impending conflict to offering insights on the locations of wanted fugitives, applying data mining approaches to the vast historical archive of the news media offers promise of new approaches to measuring and understanding human society on a global scale.”
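To make the idea of "pooling together the global tone of all news mentions of a country over time" more concrete, here is a rough Python sketch. It averages tone per month and flags months where tone falls sharply below its recent history. The excerpt does not describe the paper's actual forecasting method, so the trailing-window z-score rule, the `window` and `threshold` defaults, and the function names are assumptions chosen for illustration.

```python
from statistics import mean, pstdev


def monthly_mean_tone(records):
    """records: iterable of (month, tone) pairs for one country.
    Returns a list of (month, mean tone) sorted by month."""
    by_month = {}
    for month, article_tone in records:
        by_month.setdefault(month, []).append(article_tone)
    return [(month, mean(tones)) for month, tones in sorted(by_month.items())]


def flag_drops(series, window=6, threshold=2.0):
    """Flag months whose tone lies more than `threshold` standard deviations
    below the mean of the preceding `window` months."""
    flagged = []
    for i in range(window, len(series)):
        history = [value for _, value in series[i - window:i]]
        mu, sigma = mean(history), pstdev(history)
        month, value = series[i]
        if sigma > 0 and (value - mu) / sigma < -threshold:
            flagged.append(month)
    return flagged


# Made-up tone scores: six stable months followed by a sharp drop.
records = [
    ("2010-07", 0.1), ("2010-08", -0.1), ("2010-09", 0.1),
    ("2010-10", -0.1), ("2010-11", 0.1), ("2010-12", -0.1),
    ("2011-01", -1.0),
]
series = monthly_mean_tone(records)
print(flag_drops(series))
```

Months are written as `YYYY-MM` strings so that plain sorting puts them in chronological order.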

Kalev Leetaru is Senior Research Scientist for Content Analysis at the Institute for Computing in the Humanities, Arts, and Social Science at the University of Illinois, Center Affiliate of the National Center for Supercomputing Applications, and Research Coordinator at the University of Illinois Cline Center for Democracy. His award-winning work centers on the application of high performance computing to grand challenge problems using news and open sources intelligence. He holds three US patents and more than 50 University Invention Disclosures.

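Full research: Kalev Leetaru, Culturomics 2.0: Forecasting large-scale human behavior using global news media tone in time and space, First Monday (University of Illinois at Chicago), Volume 16, Number 9, 5 September 2011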

See also:

Culturomics: Quantitative Analysis of Culture Using Millions of Digitized Books

“Construct a corpus of digitized texts containing about 4% of all books ever printed, and then analyze that corpus using advanced software and the investigatory curiosity of thousands, and you get something called “Culturomics,” a field in which cultural trends are represented quantitatively.

In this talk Erez Lieberman Aiden and Jean-Baptiste Michel — co-founders of the Cultural Observatory at Harvard and Visiting Faculty at Google — show how culturomics can provide insights about fields as diverse as lexicography, the evolution of grammar, collective memory, the adoption of technology, the pursuit of fame, censorship, and historical epidemiology.”

— E. Lieberman Aiden, Harvard Society of Fellows & Jean-Baptiste Michel, FQEB Fellow at Harvard, Culturomics: Quantitative Analysis of Culture Using Millions of Digitized Books, May 10, 2011

See also:

What we learned from 5 million books, TED.com, 2011 (video)