Global Attention Profiles – A working paper

First steps towards a quantitative approach to the study of media attention

Ethan Zuckerman

 

Results

GAP’s first aim is to provide a picture of a media source’s attention profile on a given day. Because the scrapers are constrained by the date range of the given search engine, a given picture might represent a time period from the past 14 days to the past several years.

 

The following is a picture of Reuters’ attention profile for June 11th – 25th. The coloring of the map represents what percentage of stories detected by the GAP scraper reference a particular nation. A search for Iraq turns up 1,352 stories, of 13,360 total retrieved in this time period, or 10.11% – as a result, Iraq is colored bright red. Algeria, by contrast, retrieves 2 stories, or 0.015%, and is colored deep blue. In the two-week period, there are no stories about Mauritania, Turkmenistan, Madagascar and a few others, so they are colored grey.
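The percentages behind the map coloring are simple shares of the retrieved stories. A minimal sketch in Python, using the Reuters counts quoted above; the function name `attention_share` is illustrative and is not taken from the GAP scrapers:

```python
def attention_share(story_count: int, total_stories: int) -> float:
    """Return a nation's share of retrieved stories, as a percentage."""
    if total_stories <= 0:
        raise ValueError("total_stories must be positive")
    return 100.0 * story_count / total_stories


# Counts quoted in the text for Reuters, June 11th to 25th.
TOTAL = 13360
counts = {"Iraq": 1352, "Algeria": 2, "Mauritania": 0}

for nation, count in counts.items():
    print(f"{nation}: {attention_share(count, TOTAL):.3f}%")
```

Iraq comes out at roughly 10.1% and Algeria at roughly 0.015%; a nation with no stories has a share of zero and is the kind of case the map shows in grey.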

 

 

A map for this brief a time period does a good job of revealing breaking stories. Liberia and neighboring Sierra Leone stand out in red, against neighbors in blue and grey, due to rebel activity in Liberia, some from bases in Sierra Leone. The area around Iraq is still bright red, in the aftermath of the US/UK invasion. These maps change quite quickly – a map taken two weeks later will use data that has no overlap with data plotted in this map, and it’s likely there will be some major coloration change. (Readers can check – a map approximately two weeks later is available online at http://h2odev.law.harvard.edu/ezuckerman/maps/reuthits20030711.jpg)

 

Maps of longer time periods are useful to get a clearer sense for overall media trends. The following map of CNN represents stories from 1996 to the present. As a result, it does a poor job of showing current stories, but a better job of showing overall patterns of coverage.

 

 

Generally speaking, coverage is concentrated in Western Europe, the Middle East and Southeast Asia, with good coverage in the large economies of China, Japan, Mexico and Canada. There is very little coverage in most of Africa (Kenya, with the 1998 US embassy bombings, and South Africa, the largest economy in the region, are exceptions), in Central Asia, Eastern Europe and in most of Central and South America (the large economies of Mexico, Brazil and Argentina are exceptions).

 


A map of BBC coverage for a similar time period (1997 – present) contrasts sharply:

 

 

The pastel tones imply a more even media distribution than on the CNN map. (If each country got the same number of stories, each would have 0.54% of stories and would be colored light pink.) Africa is a major contrast – while French-speaking parts of West Africa are blue, the English-speaking parts of West Africa, as well as most of East and Southern Africa, are well covered. Central Asia and Central America are still sparsely represented, and there is a surprising blue patch over Scandinavia, better represented on the CNN map. (It is possible that some of the low counts in Western Europe are a result of BBC’s tendency to refer to European cities without mentioning the country they’re located in, something American media sources do less frequently.)

 

Such different maps suggest that BBC and CNN have different criteria for story selection, place reporters differently, and generally have a different way of paying attention. Correlating story counts to population and GDP bears this suspicion out. CNN shows correlation to population (R²=0.49), but much stronger correlation to GDP (R²=0.69). BBC is just the opposite – it is loosely correlated to GDP (R²=0.38) but tightly correlated to population (R²=0.67)[1].

 

The maps thus far speak volumes about how stories are distributed, but not how they should be distributed. In every map generated thus far, Iraq has at least 1% of total stories, more than 3.2% in two of the three maps. Is Iraq receiving more attention than one would generally expect, due to the recent war, or is Iraq sufficiently important to warrant this attention?

 

To answer this question, GAP estimates likely story distributions, extrapolating from actual story distributions.

 

An extremely naïve estimator model would make the assumption that every nation should receive the same amount of attention. Thus, one would assume each nation should have 0.54% of retrieved stories, and would mark nations that received more as “high attention” and those receiving fewer as “low attention”. This model does not stand up to close examination, though – is it really reasonable to expect Tonga to receive as much attention, with a population of 100,000, as China would with a population of 1.3 billion?

 

Acknowledging this problem, one might advance the “Andy Warhol model” – an assumption that everyone will receive 15 minutes of fame – and assume that story distribution would be directly proportional to population distribution. For this to hold true, every story on Tonga would be counterbalanced by 12,700 stories on China. Obviously, this is not the case – even a small nation like Tonga appears periodically in mainstream media, if only to acknowledge its participation in UN votes or international rugby matches.
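The two naive estimators described above can be written out in a few lines. This is a sketch rather than GAP code: the count of 185 nations is an assumption inferred from the 0.54% figure, and the populations are the rounded values quoted in the text, so the ratio comes out near 13,000 rather than the 12,700 cited, which presumably rests on more precise population figures.

```python
def uniform_share(n_nations: int) -> float:
    """Equal-attention model: every nation gets the same percentage of stories."""
    return 100.0 / n_nations


def population_shares(populations: dict[str, float]) -> dict[str, float]:
    """'Andy Warhol' model: share of stories proportional to share of population."""
    total = sum(populations.values())
    return {nation: 100.0 * pop / total for nation, pop in populations.items()}


# Assumption: 185 nations, the count implied by the 0.54% figure in the text.
print(f"uniform share: {uniform_share(185):.2f}%")

# Rounded populations quoted in the text.
populations = {"Tonga": 100_000, "China": 1_300_000_000}
shares = population_shares(populations)
ratio = shares["China"] / shares["Tonga"]
print(f"stories on China per story on Tonga: {ratio:,.0f}")
```

Neither model uses any scraper data, which is why both fall apart on inspection; the estimators that follow are fitted to observed story counts instead.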

 

To create a less naïve population-based estimator model, actual results from the scrapers are examined, to look for correlation between population and story count. On news.google.com, a loose correlation exists (R²=0.45) between population and story count. Using the equation from the best fit curve, one can speculate what a story distribution would be if story count and population were perfectly correlated. The next step is to compare actual distribution to this estimation and map the differences. (This process is described in more detail in the Correlation subsection of the preceding Methodology section.)
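The estimation step can be sketched as follows. This is an illustration under stated assumptions, not the procedure from the Methodology section: it assumes the best fit curve is a power law, stories = a * population^b, fitted by ordinary least squares on log-transformed values, and it uses made-up data for five hypothetical nations. Nations with zero stories would have to be excluded or handled separately, since their logarithm is undefined.

```python
import math


def fit_power_law(xs, ys):
    """Least-squares fit of y = a * x**b on log-transformed data.

    Returns (a, b, r_squared), where r_squared is measured in log space.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((x - mean_x) ** 2 for x in lx)
    syy = sum((y - mean_y) ** 2 for y in ly)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lx, ly))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    r_squared = (sxy * sxy) / (sxx * syy)
    return a, b, r_squared


def deviation(actual, expected):
    """Ratio of actual to expected stories; 1.0 means exactly as predicted."""
    return actual / expected


# Made-up illustrative data: (population in millions, story count).
sample = {
    "A": (1, 40),
    "B": (10, 90),
    "C": (50, 400),
    "D": (200, 500),
    "E": (1000, 2500),
}
pops = [p for p, _ in sample.values()]
stories = [s for _, s in sample.values()]
a, b, r2 = fit_power_law(pops, stories)
print(f"stories ~ {a:.1f} * population^{b:.2f}, R^2 = {r2:.2f}")

for nation, (pop, count) in sample.items():
    expected = a * pop ** b
    print(f"{nation}: {deviation(count, expected):.2f}x expected")
```

A ratio above 1 marks a nation that receives more attention than the fitted curve predicts, and a ratio below 1 marks one that receives less; these ratios are what a deviation map would plot.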


 

Here is the resulting map for news.google.com on June 27, 2003:

 

 

Western Europe, Australia, New Zealand and Canada are in shades of red – each has a comparatively small population but receives a good deal of media attention, generally 2 to 4 times what one would expect based on their population. The Middle East, the Korean Peninsula and a few African nations appear in red, probably due to breaking news (the ongoing violence in Liberia, Mugabe’s struggle for power in Zimbabwe, North Korea’s nuclear threats).

 

Most of Central and South America are blue, as is most of the African continent, Eastern Europe and Central Asia. Some countries receive fewer than 1/4 of the coverage one would expect based on their population. China, Indonesia and India, three of the four most populous nations, show up medium blue – with such large populations, they would need a large number of stories to meet their expected distributions.

 

If one expects the 4,500 news sources tracked by Google News[2] to represent the world’s population evenly, this map suggests imminent disappointment, especially if one is searching for news on poor countries. Given this map’s resemblance to a map of GDP per capita (rich nations in red, poor ones in blue), a logical next step is to build a second estimate based on national GDP. Using the same technique, the following map is generated, representing deviation of actual Google News results from a GDP-based estimation on June 27, 2003:

 

The predominance of white and pastel colors reflects the fact that GDP is far more closely correlated to Google News results than population (R²=0.62 versus R²=0.45). Several large economies – Western Europe, Japan, South Korea – suddenly appear as underrepresented, while a number of African nations register as over-represented. Central Asia and South America remain blue, less represented than would be expected in either GDP or population terms.

 

CNN’s GDP and population maps give one a sense for how these variations play out in the long term, as CNN results represent over half a decade of data.


 

CNN variation from population estimates, June 27, 2003:

 

 

Over the long term, Africa, South and Central America, Eastern Europe and Central Asia receive less attention than predicted, while Western Europe, the Middle East, Russia and Oceania receive more than predicted. The picture in terms of GDP is quite different:

 

While West Africa still goes largely unwatched, Southern and Central Africa receive attention disproportionate to their economies. Parts of Central Asia now see attention proportional to their GDP, if not to their population. Southeast Asia also receives more attention than would be predicted, while parts of Western Europe receive less than anticipated.

 

Largely unchanged between the two maps is the Middle East, better represented than one should expect based on either population or GDP. South and Central America go underrepresented by both estimations. Especially interesting are Brazil and Argentina, large countries (5th and 31st in 2001 population) with large economies (11th and 17th respectively in 2001 GDP).

Since neither population nor GDP gives a fully accurate estimate of story distribution, one may ask whether any other factor provides a better picture of how stories are distributed. To answer this question, the results from all nine scrapers were correlated with 21 data sets provided by the World Bank. A chart below summarizes the correlations:


 

Values shown are the squared correlation (R²) between the data set and the power series regression equation. For all correlations, p<0.0001.

 

 

                                  AP  AltaVista   BBC   CNN  Google  NYPost  NYTimes  Reuters  WPost  Average
GDP                             0.53       0.66  0.38  0.69    0.62    0.64     0.66     0.52   0.53     0.58
Goods and service imports       0.53       0.67  0.31  0.69    0.64    0.62     0.66     0.52   0.53     0.58
Total PCs                       0.53       0.66  0.35  0.69    0.62    0.64     0.64     0.49   0.53     0.57
Urban Population                0.50       0.55  0.62  0.58    0.53    0.49     0.59     0.50   0.53     0.55
Goods and service exports       0.50       0.64  0.29  0.66    0.61    0.58     0.62     0.46   0.49     0.54
Military personnel              0.41       0.45  0.58  0.56    0.48    0.41     0.50     0.45   0.44     0.47
Internet Users                  0.44       0.58  0.24  0.59    0.55    0.53     0.52     0.40   0.44     0.48
Population                      0.42       0.45  0.67  0.49    0.45    0.37     0.50     0.44   0.44     0.47
Mobile Phones                   0.42       0.55  0.26  0.53    0.52    0.50     0.53     0.45   0.42     0.47
Literate Population             0.40       0.44  0.64  0.49    0.46    0.38     0.48     0.38   0.46     0.46
Aircraft Departures             0.37       0.52  0.23  0.53    0.50    0.42     0.42     0.35   0.41     0.42
Kilometers of road              0.37       0.41  0.45  0.44    0.41    0.36     0.44     0.36   0.34     0.40
Foreign direct investment       0.35       0.46  0.14  0.45    0.41    0.46     0.42     0.34   0.37     0.38
Tourism, arrivals               0.34       0.44  0.13  0.42    0.42    0.42     0.44     0.34   0.35     0.37
Currency transfer from abroad   0.32       0.34  0.24  0.36    0.32    0.40     0.42     0.24   0.32     0.33
Tourism, receipts               0.27       0.44  0.08  0.41    0.37    0.41     0.38     0.30   0.35     0.33
Arable Land                     0.28       0.28  0.50  0.29    0.29    0.24     0.31     0.29   0.30     0.31
Workers remittances             0.20       0.30  0.27  0.35    0.31    0.48     0.35     0.20   0.23     0.30
Surface area                    0.21       0.17  0.40  0.21    0.18    0.14     0.23     0.20   0.19     0.22
Freshwater resources            0.07       0.08  0.19  0.14    0.08    0.07     0.09     0.06   0.12     0.10
Development assistance          0.09       0.07  0.14  0.08    0.08    0.07     0.10     0.10   0.10     0.09

 

Five of the factors – total GDP, imports and exports of goods and services, total personal computers nationwide and urban population – correlate well with scraper results: their average squared correlation (R²) is 0.5 or better, which means that at least half the variation in story counts is explainable by the correlating equation. Seven factors – military personnel, Internet users, population, mobile phones, literate population, aircraft departures and kilometers of road – are loosely correlated to scraper results: their average R² is 0.4 or above. The remaining nine factors are probably not correlated to article distribution as reported by scrapers.
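The three-way grouping can be reproduced from the Average column of the chart above. The sketch below assumes the thresholds of 0.5 and 0.4 are applied inclusively; the labels "strong", "loose" and "none" are illustrative names, not terms defined by the paper.

```python
# Average R^2 per World Bank data set, copied from the Average column above.
AVERAGE_R2 = {
    "GDP": 0.58,
    "Goods and service imports": 0.58,
    "Total PCs": 0.57,
    "Urban Population": 0.55,
    "Goods and service exports": 0.54,
    "Military personnel": 0.47,
    "Internet Users": 0.48,
    "Population": 0.47,
    "Mobile Phones": 0.47,
    "Literate Population": 0.46,
    "Aircraft Departures": 0.42,
    "Kilometers of road": 0.40,
    "Foreign direct investment": 0.38,
    "Tourism, arrivals": 0.37,
    "Currency transfer from abroad": 0.33,
    "Tourism, receipts": 0.33,
    "Arable Land": 0.31,
    "Workers remittances": 0.30,
    "Surface area": 0.22,
    "Freshwater resources": 0.10,
    "Development assistance": 0.09,
}


def classify(r2: float) -> str:
    """Bucket an average R^2 using the assumed inclusive thresholds."""
    if r2 >= 0.5:
        return "strong"
    if r2 >= 0.4:
        return "loose"
    return "none"


groups = {"strong": [], "loose": [], "none": []}
for name, r2 in AVERAGE_R2.items():
    groups[classify(r2)].append(name)

for label, names in groups.items():
    print(f"{label} ({len(names)}): {', '.join(names)}")
```

With these thresholds the groups contain five, seven and nine data sets respectively, which accounts for all 21.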

 

GDP and Goods and Service Imports tie for highest correlation, with R²=0.58 on average. All data sets except BBC are most closely correlated to either GDP or goods and service imports – BBC, alone, is most closely correlated to population.

 

Dividing the 21 World Bank data sets into five categories – Economic Indicators, Population Indicators, Technology Indicators, Globalization Indicators and Physical Indicators – helps provide a sense for what types of data correlate most closely to story distribution:


 

 

                                  AP  AltaVista   BBC   CNN  Google  NYPost  NYTimes  Reuters  WPost  Average

Economic indicators
GDP                             0.53       0.66  0.38  0.69    0.62    0.64     0.66     0.52   0.53     0.58
Goods and service imports       0.53       0.67  0.31  0.69    0.64    0.62     0.66     0.52   0.53     0.58
Goods and service exports       0.50       0.64  0.29  0.66    0.61    0.58     0.62     0.46   0.49     0.54
Foreign direct investment       0.35       0.46  0.14  0.45    0.41    0.46     0.42     0.34   0.37     0.38

Population indicators
Urban Population                0.50       0.55  0.62  0.58    0.53    0.49     0.59     0.50   0.53     0.55
Military personnel              0.41       0.45  0.58  0.56    0.48    0.41     0.50     0.45   0.44     0.47
Population                      0.42       0.45  0.67  0.49    0.45    0.37     0.50     0.44   0.44     0.47
Literate Population             0.40       0.44  0.64  0.49    0.46    0.38     0.48     0.38   0.46     0.46

Technology indicators
Total PCs                       0.53       0.66  0.35  0.69    0.62    0.64     0.64     0.49   0.53     0.57
Internet Users                  0.44       0.58  0.24  0.59    0.55    0.53     0.52     0.40   0.44     0.48
Mobile Phones                   0.42       0.55  0.26  0.53    0.52    0.50     0.53     0.45   0.42     0.47

Globalization indicators
Aircraft Departures             0.37       0.52  0.23  0.53    0.50    0.42     0.42     0.35   0.41     0.42
Tourism, arrivals               0.34       0.44  0.13  0.42    0.42    0.42     0.44     0.34   0.35     0.37
Currency transfer from abroad   0.32       0.34  0.24  0.36    0.32    0.40     0.42     0.24   0.32     0.33
Tourism, receipts               0.27       0.44  0.08  0.41    0.37    0.41     0.38     0.30   0.35     0.33
Workers remittances             0.20       0.30  0.27  0.35    0.31    0.48     0.35     0.20   0.23     0.30
Development assistance          0.09       0.07  0.14  0.08    0.08    0.07     0.10     0.10   0.10     0.09

Physical indicators
Kilometers of road              0.37       0.41  0.45  0.44    0.41    0.36     0.44     0.36   0.34     0.40
Arable Land                     0.28       0.28  0.50  0.29    0.29    0.24     0.31     0.29   0.30     0.31
Surface area                    0.21       0.17  0.40  0.21    0.18    0.14     0.23     0.20   0.19     0.22
Freshwater resources            0.07       0.08  0.19  0.14    0.08    0.07     0.09     0.06   0.12     0.10

 

Three of four economic indicators show strong correlation, though foreign direct investment shows no meaningful correlation. Only one of four population indicators – urban population – shows strong correlation, though the other three show some correlation. Technology indicators fare similarly, with total PCs showing strong correlation and the other two factors showing some correlation.

 

Six indicators chosen to represent global interconnection correlate poorly to story counts. Aircraft departures shows some correlation, correlating strongly to AltaVista, Google and CNN’s results, but modestly overall. No other factors correlate strongly, and none are worse than development aid and assistance, which bears the dubious distinction of least correlated to media attention, a fact that comes as no surprise to anyone who works in the international development community. Four physical indicators – largely reflections of the size of a nation – also fail to correlate meaningfully with article distribution.

 

It is worth noting how abnormal BBC’s results appear in the comparison of nine media sources. Comparing each source’s correlation to a given World Bank data set to the average correlation to that data set, BBC is more than one standard deviation away from the norm on 20 of 21 possible indicators. By contrast, three sites are within one standard deviation on all 21 indicators, and three other sites are outside one standard deviation on only one or two indicators. CNN is outside one standard deviation on four indicators, and the New York Post is outside on five.
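The outlier test can be illustrated on two rows of the correlation chart. This sketch assumes the comparison uses the mean of the nine per-source values and the sample standard deviation; the paper does not say whether the sample or population form was used, and the helper name `outliers` is hypothetical.

```python
from statistics import mean, stdev

SOURCES = ["AP", "AltaVista", "BBC", "CNN", "Google",
           "NYPost", "NYTimes", "Reuters", "WPost"]

# Two rows copied from the correlation chart above.
R2 = {
    "GDP":        [0.53, 0.66, 0.38, 0.69, 0.62, 0.64, 0.66, 0.52, 0.53],
    "Population": [0.42, 0.45, 0.67, 0.49, 0.45, 0.37, 0.50, 0.44, 0.44],
}


def outliers(values):
    """Return the sources more than one standard deviation from the row mean."""
    centre = mean(values)
    spread = stdev(values)  # sample standard deviation; an assumption
    return {src for src, v in zip(SOURCES, values) if abs(v - centre) > spread}


for indicator, values in R2.items():
    print(indicator, sorted(outliers(values)))
```

On these two rows BBC is flagged both times, joined by CNN on GDP and by the New York Post on population. Repeating the check over all 21 rows and counting flags per source would give per-source tallies of the kind reported above.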

 

In contrast to all other sites, BBC story distribution shows no meaningful correlation to any economic indicators (it verges on a loose correlation to GDP, with R²=0.38). It shows strong correlation to all four population indicators, and is the only site to show strong correlation to a physical indicator (Arable land, R²=0.5).

 

Why does BBC present such a different statistical profile from other news media outlets? In a word: empire. BBC appears to have an editorial policy that mandates regular coverage of nations formerly in the British Empire. Many of these nations have large populations and small GDPs, and therefore the BBC attention is more closely correlated to population factors than to economic ones. Before anointing the BBC the champion of the poor, it’s worth noting that the BBC does not spend noticeably more attention on poor countries in Central Asia or Central America (areas where the British Empire was not colonially involved) than other news media outlets.

 

It is also interesting to note that the six sites with most similar correlation patterns – AP, Reuters, New York Times, Washington Post, AltaVista and Google – represent the shortest timeframes, ranging from 14 to 90 days. It’s possible that correlations over a wider timeframe show a different pattern than correlations over a short time period. In other words, if we could examine shorter time slices of CNN and New York Post data, they might look more similar to the data of the six most similar sites.

 

It remains to be seen whether such correlations will hold true over time. Early results suggest they will. Correlations performed with Google data on May 5, 2003 – a period that should have no overlapping stories with the period considered in this paper – showed 0.67 correlation to GDP (compared to 0.62 with current data) and 0.44 correlation to population data (compared to 0.45 with current data).

 

While these correlations and lack of correlations suggest something about media distribution – namely that it may have more to do with economics than with population distribution – they suggest a challenging question: should one really expect media distribution to be connected to these sorts of factors? After all, one reads very few news stories that report that Japan’s GDP is still vastly larger than Nigeria’s – shouldn’t news be closely correlated to things that occur, like natural disasters and wars?

 

Fortunately for the world’s population, and unfortunately for statisticians, all nations are not uniformly plagued with wars and natural disasters. While it would be statistically convenient to compare the coverage a war in Sudan receives to the coverage a war in similarly-sized Canada experiences, Canada has been reluctant to comply by engaging in a military conflict.

 

Instead of working from a 150-data-point World Bank set, it is useful to consider Project Ploughshares’ Armed Conflict Report[3], which lists 29 countries “hosting” armed conflicts in 2001, their most recent data set. When one examines CNN results for these 29 nations (because CNN is one of two data sets that includes all of 2001, and the other one, BBC, is not representative of the other eight sets), it becomes clear that hosting a conflict increases a nation’s visibility, but not as much as might be expected. Fifteen of the 29 have fewer stories than predicted by a population estimation, while eight have more than predicted. (The remaining six are within the predicted range.) The results are almost inverted considering attention versus GDP – 18 of the 29 have more stories than predicted by GDP, while only five have fewer.

 

 

                    Stories  Pop Estimate  GDP Estimate  Pop Variance  GDP Variance
Chad                      5           475           110       -98.95%       -95.50%
Nigeria                 623          2959           922       -78.94%       -32.40%
Myanmar                 414          1550          1213       -73.30%       -65.87%
Guinea                  152           462           166       -67.09%        -8.17%
Senegal                 201           545           221       -63.13%        -8.96%
India                  5486         11472          4555       -52.18%        20.43%
Burundi                 213           436            63       -51.14%       235.67%
Congo Dem. Rep.         815          1634           237       -50.12%       243.45%
Uganda                  491           949           252       -48.24%        95.10%
Sudan                   631          1177           422       -46.38%        49.48%
Colombia                782          1437          1446       -45.59%       -45.91%
Nepal                   534           970           248       -44.95%       115.00%
Algeria                 681          1156          1106       -41.08%       -38.41%
Angola                  432           674           352       -35.91%        22.84%
Somalia                 352           520           204       -32.30%        72.97%
Sierra Leone            363           358            67         1.39%       441.69%
Kenya                  1191          1153           397         3.26%       200.09%
Sri Lanka               902           834           494         8.09%        82.74%
Indonesia              4917          4038          2094        21.77%       134.80%
Rwanda                  634           476           115        33.23%       453.15%
Turkey                 2659          1948          2116        36.49%        25.63%
Iran                   2819          1873          1788        50.50%        57.69%
Pakistan               5286          3129          1158        68.95%       356.54%
Philippines            3670          2126          1317        72.64%       178.70%
Russian Federation    10273          3176          3435       223.43%       199.03%
Afghanistan            5866          1066           592       450.23%       891.36%
Yugoslavia             3973           577           385       588.65%       933.02%
Iraq                   7796           975          1162       699.98%       570.85%
Israel                 7456           412          1728      1709.83%       331.36%

In other words, a nation hosting a violent conflict appears likely to command more attention than it would simply based on its economic strength. This increased attention is not enough to raise the level of attention to that which a wealthy country of the same size would expect. Sudan demands more attention than Tanzania (similar size, similar size of economy) but less attention than Canada (similar size, much larger economy), despite the fact that Sudan is hosting a violent conflict.
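The variance columns in the table above appear to be the percentage difference between the actual story count and the estimate. A short sketch under that assumption, using three rows of the table; the function name `percent_variance` is illustrative. Because the published estimates are rounded to whole stories, recomputed values can differ slightly from the published percentages.

```python
def percent_variance(actual: float, estimate: float) -> float:
    """Percentage by which the actual story count differs from the estimate."""
    return 100.0 * (actual - estimate) / estimate


# (stories, population estimate, GDP estimate) for three rows of the table.
rows = {
    "Chad": (5, 475, 110),
    "Sudan": (631, 1177, 422),
    "Israel": (7456, 412, 1728),
}

for nation, (stories, pop_est, gdp_est) in rows.items():
    print(f"{nation}: pop {percent_variance(stories, pop_est):+.2f}%, "
          f"GDP {percent_variance(stories, gdp_est):+.2f}%")
```

A negative value means fewer stories than the estimator predicts and a positive value means more, so Sudan's pairing of a negative population variance with a positive GDP variance is the pattern the paragraph above describes.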

 

The results listed above are obviously not comprehensive – charts and maps of all results generated by GAP scrapers are available for download at h2odev.law.harvard.edu/ezuckerman[4].

 

Conclusions

After comparing the GAP profiles of different media outlets, it is possible to make a few broad generalizations:

•     Media attention is not homogeneous – nations are not covered equally. A small number of nations receive a large share of the attention of a given media outlet.

•     No single factor explains the distribution of media attention perfectly. If one estimates distribution based on population figures, fewer stories than expected tend to appear about poor nations and more stories than expected appear about small, wealthy nations. If one estimates based on national GDP, certain large economies, especially in South America, are underrepresented. And, with either estimation, the Middle East is overrepresented.

•     While no single factor correlates perfectly to the distribution of media attention, national GDP and imports of goods and services correlate more closely than any other factor. In general, economic and technology factors correlate more closely than population factors. Physical attributes of nations and factors related to international communications and travel do not appear to correlate to media attention distribution.

•     Some evidence exists that the relationship between media distribution and population or GDP holds true over both long and short periods of time.

•     While six of nine media outlets exhibited very similar behavior, and two others roughly similar behavior, BBC demonstrated radically different patterns. The distribution of BBC’s attention is closely correlated to population distribution and not strongly correlated – if at all – to GDP distribution.

•     Violent conflict draws attention to a nation, but less than might be expected. A nation hosting a violent conflict will receive more attention than a peaceful nation with a similarly sized economy. It will not receive more attention than a similarly sized, peaceful nation with a much larger economy, suggesting that GDP may be a more important factor in explaining media distribution than violent conflict.

 

This paper focuses on correlating factors to observed patterns, rather than trying to demonstrate causality. In particular, the intent of this paper is not to suggest that media sources consciously tailor their reporting to national GDP, with editors checking the wealth of nations before deploying reporters abroad.

 

Figuring out what actually causes media distribution likely requires investigation of entirely different factors. Where do media outlets position their reporters, and how do they make those decisions? How does the ease or difficulty of traveling to a given nation (Myanmar, for instance) influence the amount of attention a media source is able to pay to it? These questions are beyond the scope of this paper, but need to be addressed before suggesting causality of media attention distribution.

 

A final conclusion of this paper is a warning for all media consumers – caveat emptor. It is clear that all news media outlets studied in this paper have large blank spots in their global attention maps. Future GAP papers will attempt to chart these blank spots more accurately and make it possible for media consumers to make better choices or lobby their media outlets for more global coverage.

 


 



[1] It is unlikely that BBC consciously chose for its coverage to tightly track population distribution, just as it is unlikely CNN chose to closely track capital distribution. It’s more likely that BBC has an unstated policy of closely following former British colonies, which keeps it focused on Africa and South Asia.

[2] “Google News (Beta)”, http://news.google.com/intl/en_us/about_google_news.html, accessed July 31, 2003.

[3] “The Armed Conflict Report 2000”, http://www.ploughshares.ca/CONTENT/ACR/ACR00/ACR00.html, accessed July 31, 2003.

 

[4] Readers are welcome to download any or all data sets and correlate them to other factors, and this author welcomes correspondence, especially correspondence including additional results.