This website is intended to serve as a reference for researchers interested in different ways of measuring ethnicity and other types of identity-based diversity. For a discussion of issues related to definition and measurement, see Kyle L. Marquardt and Yoshiko M. Herrera. (Forthcoming 2015). "Ethnicity as a variable: An assessment of measures and datasets of ethnicity and related identities." Social Science Quarterly.

All references to the listed datasets should cite the relevant dataset authors, as indicated on their respective websites or articles.

This database can be downloaded [here], and should be cited as follows: Steven L. Wilson, Kyle L. Marquardt and Yoshiko M. Herrera. 2015. "Ethnicity as a variable: An annotated bibliography for sources on ethnic and cultural diversity."

Scholars who wish to have their work included on the website or who would like to edit the information provided should email:

Primary Related Work      Authors Dataset Name # of countries     Years*       Time series Level of analysis Fractionalization       Polarization       Weights Availability    Description
Akturk (2011) Akturk, Sener Regimes of ethnicity 173 2014 N Country/Group N N N Website

Dataset of 1) contemporary ethnic demography in 173 countries with a population over 250,000; 2) 15 dichotomous indicators of whether or not a state pursued certain policies regarding ethnic and cultural factors in these countries (e.g. whether or not government maintains record of individual-level ethnic identity, religious education in schools). The project will consult three country experts with regard to each element of the dataset. Preliminary data available on website.

Alesina and Zhuravskaya (2011) Alesina, Alberto and Ekaterina Zhuravskaya Segregation and the quality of governance 78-97 2000 N Country Y N Y (Regional segregation) Website

Regional ethnic, native-language and religious fractionalization indices; aggregated to country-level for segregation index (the degree to which identity groups are segregated by region) and fractionalization. Data from country censuses and Demographic and Health Surveys. Also includes data on different religious groups' population share.

Alesina et al. (2003) Alesina, Alberto, Romain Wacziarg, Arnaud Devleeschauwer, William Easterly and Sergio Kurlat Fractionalization 215 Not specified N Country Y N N Website

Ethnic fractionalization. Groups listed.

Ashraf and Galor (2013b); Ashraf and Galor (2013a) Ashraf, Quamrul and Oded Galor "Out of Africa" Hypothesis 145 n/a N Country N N N Website

Database of genetic diversity, directly estimated for 21 countries and estimated as a function of migratory distance from Africa for 145 countries.

Baldwin and Huber (2010) Baldwin, Kate and John Huber. Inter-ethnic inequality and political development 46 Not specified N Country Y N Y (Between-group economic inequality) Website

Data regarding between-group inequality. Data on groups based on Fearon (2003) and cross-national survey results; data on inequality from these surveys. Also includes data from Fearon (2003).

Birnir et al. (2014) Birnir, Johanna K., Jonathan Wilkenfeld, James D. Fearon, David D. Laitin, Ted Robert Gurr, Dawn Brancati, Stephen M. Saideman, Amy Pate and Agatha S. Hultquist. All minorities at risk 163 Not specified N Group N N N Website

List of ethnic groups determined to be "socially relevant" (a broader definition than the traditional Minorities at Risk inclusion criteria). Includes 1,195 groups.

Bossert, D'Ambrosio and La Ferrara (2011) Bossert, Walter, Conchita D'Ambrosio and Eliana La Ferrara. Generalized index of fractionalization 1 1990 N Country Y N Y (Composite individual-level dissimilarity with regard to race, income, employment and education) Unavailable online; complete dataset in article.

Data on diversity in American states that uses a composite fractionalization score incorporating individual-level dissimilarity on the metrics of race, income, employment and education. Data from 1990 US Census.

Cederman and Girardin (2007) Cederman, Lars-Erik and Luc Girardin. N* 88 Not specified N Country N N N Data available in Fearon et al. (2007) replication data. Data used to construct index available in paper.

Data used to construct index representing percentage of a population that an ethnic group in power represents vis-a-vis other groups in a state.

Cederman, Wimmer and Min (2010) Cederman, Lars-Erik, Brian Min and Andreas Wimmer EPR-ETH v1.1 155 1946-2005 Y (Yearly) Group N N N Website

Dataset includes group size estimates, level of access to the executive branch, and whether or not ethnic group was involved in an armed conflict. Note: Geo-coded data also available on website. Yearly data on 733 politically relevant ethnic groups in 155 countries, 1946 - 2005.

Cederman, Wimmer and Min (2010) Cederman, Lars-Erik and Manuel Vogt EPR-ETH v2.0 165 1946-2009 Y (Yearly) Group N N N Website

Dataset updates EPR v1. It codes each group's level of involvement in government, ranging from total control of the government to overt discrimination against the group. Note: Geo-coded data also available on website. Yearly data on over 790 groups based on their access to executive state power, 1946 - 2009.

Chandra and Wilkinson (2008) Wilkinson, Steven Ethnic concentration index 40 Not specified N Country N N N Unavailable online; forthcoming.

Index representing the degree to which ethnic representation in the armed forces was imbalanced.

Collier and Hoeffler (2004); Collier, et al. (2009) Collier, Paul and Anke Hoeffler Greed and grievance in civil war 215 1964, 2003 N Country Y N Y (Composite measure of ethnic and religious fractionalization) Website

Ethnolinguistic and religious fractionalization, as well as social fractionalization, a composite measure of ethnic and religious fractionalization. Ethnolinguistic fractionalization based on ANM (Atlas Narodov Mira) for 2004 article; Fearon and Laitin (2003) for 2008.

Desmet, Ortuno Ortin and Wacziarg (2012) Wacziarg, Romain, Klaus Desmet and Ignacio Ortuno-Ortin ELF and polarization 226 2005 N Country Y Y Y (Phylogenetic linguistic differences) Website 1, Website 2

Ethnic fractionalization and polarization indices, including weights based on phylogenetic linguistic difference at various levels of aggregation. Data from Ethnologue.

Desmet, Ortuno Ortin and Weber (2009) Desmet, Klaus, Ignacio Ortuno-Ortin, and Shlomo Weber Linguistic diversity and redistribution 225 1996 N Country Y Y Y (Cognate-based linguistic difference) Website

Measure of ethnic fractionalization weighted by cognate-based linguistic difference, as well as weighted distance between main language and peripheral languages; also includes Esteban-Ray, ELF and peripheral heterogeneity indices for these countries. Original indices based on Ethnologue.

Ellingsen (2000) Ellingsen, Tanja Ethnic composition 229 1945-1994 Y (Yearly) Country N N N Website

Information on the concentration of the largest and second largest linguistic (mother tongue), ethnic and religious groups; data coded by source.

Esteban, Mayoral and Ray (2012) Esteban, Joan, Laura Mayoral and Debraj Ray Ethnicity and Conflict 141 Not specified N Country Y Y Y (Phylogenetic linguistic differences) Website

Measures of ethnic polarization and fractionalization, weighted by phylogenetic linguistic difference. Data on groups from Fearon (2003).

Fearon and Laitin (2003) Fearon, James and David Laitin Ethnicity, insurgency and civil war 161 1960, 2003 N Country Y N N Website

ELF indices (based on ANM), ethnic and linguistic fractionalization indices from Fearon (2003), and population share of second-largest religious and ethnic groups in each country, as well as number of distinct languages spoken in a country. Groups enumerated; 161 countries.

Fearon (2003) Fearon, James Ethnic and cultural diversity by country 160 Not specified N Country Y N Y (Language-family resemblance) Website

Different measures of ethnic and ethnolinguistic fractionalization, as well as largest and second-largest groups in a territory. Also includes a measure of ethnic group fractionalization weighted by language-family resemblance. Groups listed.

Fearon, Kasara and Laitin (2007) Fearon, James, Kimuli Kasara and David Laitin Ethnic minority rule and the onset of civil war 161 Not specified N Country Y N N Website

Measures of fractionalization (Fearon and Laitin 2003), various codings of the Cederman and Girardin (2007) index, and a measure of whether or not a country's head of state was from a minority ethnic group.

Guiso, Sapienza and Zingales (2009) Guiso, Luigi, Paola Sapienza and Luigi Zingales Somatic distance 207 1996 N Country N N N Website

Somatic and genetic distances between populations of countries.

Lieberman and Singh (2012) Lieberman, Evan S. and Prerna Singh Institutionalized ethnicity 6 1900-2011 Y (Yearly) Country N N N Website

Database of five dichotomous indicators regarding whether or not a state makes use of different categories related to ethnicity (e.g. religion, ethnic identification) over time. Yearly data for six countries, 1900 - 2011.

Minorities at Risk Project (2009) Minorities at risk 117 2004-2006 N Group N N N Website

Relevant political and cultural data on 283 ethnic groups perceived to be at risk of involvement in a political conflict.

Nardulli et al. (2012)   Composition of religious and ethnic groups v.1.02 156 1946-2014 Y (Yearly) Group N N N Website

Database of concentration of different ethnic and religious groups at the country level. Project will eventually include data regarding ascriptive differences between groups (e.g. sensory-based traits, attitudes) and country-specific traits that could affect ethnic relations. Yearly data for 156 countries (those with a population over 500,000 in 2004) 1946 - present.

Okediji (2005) Okediji, Tade Dynamics of ethnic fragmentation 132 Not specified N Country Y N Y (Composite index of race, ethnic, linguistic and religious affiliation) Unavailable online; complete dataset in article.

Ethnolinguistic fractionalization and composite social diversity indices (a weighted index of race, ethnic, linguistic and religious affiliation).

Ostby (2008) Ostby, Gudrun Polarization and horizontal inequalities 39 1986-2004 Y (Yearly) Country N Y Y (Composite measures of ethnic and economic polarization and social and economic inequality) Website

Yearly data on 11 measures of social and economic inequality, economic and ethnic polarization, and composites of the aforementioned factors. Measures based on data from cross-national Demographic and Health Surveys.

Posner (2004) Posner, Daniel Decade values for PREG 42 1960-1990 Y (Decades) Country Y N N Unavailable online; complete dataset in article.

Politically-relevant ethnic group fractionalization for African countries.

Reynal-Querol (2002); Montalvo and Reynal-Querol (2005) Reynal-Querol, Marta Ethnic and religious fractionalization and polarization 137 Not specified N Country Y Y N Website

Ethnic and religious fractionalization and polarization; ethnic data based on the World Christian Encyclopedia and religious data from L'Etat des religions dans le monde.

Roeder (2001)   Roeder, Philip Ethnolinguistic Fractionalization (ELF) Indices 183 1961, 1985 N Country N N N Website

Ethnolinguistic fractionalization, based on Soviet sources (e.g. ANM) and Europa World Yearbook. At different levels of aggregation, based on ANM coding scheme.

Roeder (2003)   Roeder, Philip Clash of civilizations and escalation of ethnopolitical conflicts 130 1980-1999 Y (By decade) Group N N N Website

Dataset includes data on various ethnic groups in relation to majority population of their country of residence. Data on ethnic-groups-by-country, 1032 observations.

Roeder (2007) Roeder, Philip Nation-state crises worldwide 161 1955-1999 Y (Five-year increments) Group N N N Website

Dataset includes data on various ethnic groups, both related to their demographic and linguistic characteristics, as well as their historical relationship to their state (e.g. former or present regionized-homeland). Data on ethnic-groups-by-country, in five-year increments from 1955-1999. 8054 observations.

Scarritt and Mozaffar (1999); Mozaffar, Scarritt and Galaich (2003) Mozaffar, Shaheen, James R. Scarritt and Glen Galaich Electoral institutions, ethnopolitical cleavages, and party systems in Africa's emerging democracies 48 Not specified N Country Y N Y (Regional group concentration) Unavailable online

Database on ethnic fragmentation (based on politicized groups) and concentration for 48 African countries.

Selway (2011) Selway, Joel Sawat Cross-cutting cleavages dataset 155 Not specified N Country Y N Y (Cross-cutting cleavages, subgroup polarization and cross-fractionalization) Website

Indices of cross-cutting cleavages, sub-group fractionalization, sub-group polarization and cross-fractionalization. Data gathered from multiple cross-national surveys.

Spolaore and Wacziarg (2009a) Spolaore, Enrico and Romain Wacziarg Diffusion of development 206 Not specified N Country N N N Website

Genetic distance between countries, based on main ethnic groups in countries and general data on genetic differences between populations.

Taylor and Hudson (1984) Taylor, Charles Lewis and Michael C. Hudson World handbook of political and social indicators 136 1964 N Country Y N N Website

Main original source of ELF indices, based on ANM.

Vanhanen (1999) Vanhanen, Tatu Domestic Ethnic Conflict and Ethnic Nepotism 183 Not specified N Country N N N Website

Information on racial, linguistic and religious concentration of the largest groups; as well as composite score of these three concentration scores. Data compiled from multiple sources.

Cederman, Wimmer and Min (2010) Wimmer, Andreas and Philippe Duhart EPR v3.0 157 1946-2010 Y (Yearly) Group N N N Website

Dataset updates EPR v2.0. Annual data for 157 countries, 1946-2010; 758 politically relevant groups. Includes a marker identifying the trait (e.g. language, skin color) that differentiates group members, as well as each group's type of political inclusion. Fully geo-coded dataset available.

* Unless otherwise noted, the datasets include data from multiple sources, often from different years. The exact year(s) that the data represent is therefore not specified in many datasets, but is presumed to reflect levels of diversity at a time roughly contemporary to publication of the primary related work.
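
The Fractionalization, Polarization and Weights columns in the table refer to summary indices computed from group population shares. As a rough illustration only, the sketch below assumes the definitions most commonly seen in this literature: fractionalization as one minus the sum of squared group shares, polarization in the form associated with Reynal-Querol (four times the sum of s_i^2 (1 - s_i)), and a weighted fractionalization in which each pair of groups is discounted by a resemblance factor between 0 and 1. These definitions are assumptions rather than statements from the listed datasets, which may define or weight their indices differently. The function names, the example shares and the resemblance matrix are hypothetical.

```python
from typing import Sequence


def _check_shares(shares: Sequence[float]) -> None:
    """Raise ValueError unless shares are non-negative and sum to about 1."""
    if any(s < 0 for s in shares):
        raise ValueError("group shares must be non-negative")
    if abs(sum(shares) - 1.0) > 1e-6:
        raise ValueError("group shares must sum to 1")


def fractionalization(shares: Sequence[float]) -> float:
    """Assumed definition: 1 - sum(s_i^2), the probability that two
    randomly drawn individuals belong to different groups."""
    _check_shares(shares)
    return 1.0 - sum(s * s for s in shares)


def polarization(shares: Sequence[float]) -> float:
    """Assumed definition: 4 * sum(s_i^2 * (1 - s_i)), which peaks at 1
    when the population is split into two equally sized groups."""
    _check_shares(shares)
    return 4.0 * sum(s * s * (1.0 - s) for s in shares)


def weighted_fractionalization(
    shares: Sequence[float],
    resemblance: Sequence[Sequence[float]],
) -> float:
    """Assumed definition: 1 - sum_i sum_j s_i * s_j * r_ij, where r_ij
    is a resemblance factor in [0, 1] and r_ii = 1. With r_ij = 0 for
    all i != j this reduces to plain fractionalization."""
    _check_shares(shares)
    n = len(shares)
    if len(resemblance) != n or any(len(row) != n for row in resemblance):
        raise ValueError("resemblance must be an n-by-n matrix")
    return 1.0 - sum(
        shares[i] * shares[j] * resemblance[i][j]
        for i in range(n)
        for j in range(n)
    )


if __name__ == "__main__":
    # Hypothetical country with three groups.
    example = [0.5, 0.3, 0.2]
    # Hypothetical resemblance: groups 2 and 3 are linguistically close.
    r = [
        [1.0, 0.0, 0.0],
        [0.0, 1.0, 0.5],
        [0.0, 0.5, 1.0],
    ]
    print(fractionalization(example))
    print(polarization(example))
    print(weighted_fractionalization(example, r))
```

Under these assumed definitions, fractionalization keeps rising as a population splits into more and smaller groups, whereas polarization is highest with two equally sized groups, which is why the table reports the two measures in separate columns.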