Mapping the Meaning of Diversity—and Who Is Counted as Diversity—in Large Language Corpora: Variation across time (1990–2015) and culture (41 languages)
Abstract: Few words in contemporary American discourse are as contentious as “diversity.” Yet how people understand diversity remains elusive, especially beyond the U.S. context, because scholarly and public discussions predominantly reflect American perspectives. Here, we use natural language processing (word embeddings) to systematically map cultural understandings of diversity across 41 languages. Starting with contemporary English, we identify five latent dimensions underlying diversity discourse: fundamental principles, identity-based discrimination, inclusion ideals, democratic values, and institutional governance. Cross-linguistic analyses reveal substantial cultural variation. While identity-based discrimination consistently emerges as a central meaning of diversity across languages, other dimensions, such as democratic values or institutional governance, vary significantly. Similarly, the social groups associated with diversity differ markedly: race and gender dominate Western discourse (particularly within the discrimination dimension) but are far less salient in Asia, the Middle East, and the Global South. By providing the first large-scale, cross-cultural analysis of how diversity is understood, we challenge dominant U.S.-centric assumptions and highlight the importance of cultural nuance in diversity-related scholarship, policy, and practice worldwide.
Keywords: diversity, cross-cultural differences, global discourse, natural language processing
