The Endangered Languages Project

A project to support language preservation and documentation around the world

Introduction to the catalogue

Many languages of the world are at risk of becoming extinct soon. The crisis of endangered languages is one of the most serious issues facing humanity today, posing moral, practical, and scientific problems of enormous proportions. This catalogue informs users about the plight of endangered languages and encourages efforts to slow the loss. It provides information on the endangered languages of the world as a resource for the public, scholars, those whose languages are in peril, and funding agencies able to deploy limited resources.

Until this Catalogue of Endangered Languages, there was no single reliable source of information on the endangered languages of the world that described how endangered each language is and to what extent it has been documented. For many of the languages in this catalogue, little or no accessible information exists yet. For others, the existing sources are often inaccurate, unreliable, or inaccessible. For those seeking to understand where documentation efforts and resources might most effectively be directed, and where language conservation or revitalization efforts are most needed, it is important to know not only how critically endangered a language is, but also how well it has already been described, how different or unique it might be, and how further description might contribute to our understanding of human language in general. It is this kind of information on the endangered languages of the world that the Catalogue presents.

The Catalogue of Endangered Languages Project Personnel

The Catalogue of Endangered Languages is under the direction of Lyle Campbell (University of Hawai‘i at Mānoa) and Anthony Aristar and Helen Aristar-Dry (LINGUIST List/Eastern Michigan University). The team at Eastern Michigan University (EMU) is responsible for programming and other technical aspects of the Catalogue, for bibliography management, and for the languages of Africa and Australia. The University of Hawai‘i at Mānoa (UHM) team is responsible for the languages of Europe, the Caucasus, North Asia, East Asia, South Asia (the Indian subcontinent), Southeast Asia, North America, Central America, South America, and the Pacific, and for the endangerment scale and the need-for-documentation scale. The following individuals contributed to Phase I of the data collection:

EMU team:
Dr. Anthony Aristar
Dr. Helen Aristar-Dry
Anna Belew (Project Manager)
Lwin Moe
Kristen Dunkinson
Jacob Collard
Uliana Kazagasheva
Amy Brunett (2011-2012)
Brent Woo (2011-2012)

UHM team:
Dr. Lyle Campbell
Sean Simpson (Project Coordinator 2012-present)
John Van Way (Project Coordinator 2011-2012)
Raina Heaton
Eve Okura
Huiying Nala Lee
Dr. Kaori Ueki

The initial catalogue content was prepared by the members of these two teams, with some input from the Regional Directors, experts on the languages of specific regions who help correct and expand the catalogue and whose primary role begins in Phase II. The Regional Directors are:

Willem F.H. Adelaar (Central & South America)
Greg Anderson (South Asia)
I. Wayan Arka (Indonesia)
Habib Borjian (Near East)
Claire Bowern (Australia)
David Bradley (East Asia)
Matthias Brenzinger (Africa)
Lyle Campbell (the Americas)
Charles Häberl (Near East)
Alice C. Harris (Caucasus)
Brian Joseph (Europe)
Juha Janhunen (Northern and Central Eurasia)
Martin Maiden (Romance languages)
Bill Palmer (Pacific)
Keren Rice (North America)
David Solnit (East and Southeast Asia)
George Van Driem (Himalayas and adjacent)
James Woodward (sign languages)

Data entry for Phase I was accomplished with the help of the following volunteers from the University of Hawai‘i at Mānoa:

Carolina Aragon
Katie Butler
Joelle Kirtley
Stephanie Locke
Colleen O'Brien
Melody Ann Ross-Nathaniel
Sean Simpson

This is just the beginning

It is extremely important to understand that the Catalogue is a work in progress. At the launch of this website, the Catalogue is still in Phase I, which is based only on the information available in existing publications and web resources about the individual endangered languages. Bringing in more recent and local information is critical to this project and is the focus of Phase II. The second phase will continue over the next two years. It involves an international team of regional specialists (see above) reaching out to knowledgeable individuals and organizations to fill in the missing information for languages in their areas, to check the accuracy of information, and to make needed corrections. For this phase and long into the future, the goal is to modify, update, and improve the catalogue contents constantly, as new information becomes available or as the situation for a particular language changes. If users of this website have particular knowledge or information about specific languages, we encourage them to submit comments and suggestions for improving language entries via the “add information” link at the bottom of each language page. We are grateful for your help in improving the collective knowledge of endangered languages.

How the Catalogue handles tough questions

Dialects vs. Languages

There are a number of language varieties that are believed to be independent languages by some scholars but are considered only dialects of a single language by others. In cases where it is not clear whether separate languages are involved or just dialects of one language, the entity in question is given its own entry as a potentially distinct language, but with the different opinions noted. In cases where the evidence is clear that two entities are in fact dialects of the same language, these entities are joined in a single entry, with differences of opinion registered. Similarly, in cases where the evidence is clear that separate independent languages are involved, though some believe they are dialects of a single language, these are given separate entries in the catalogue, with a description of the different interpretations. The thorny issue of distinguishing dialects from closely related languages is avoided simply by giving doubtful entities their own entries with comments representing the range of opinion. As more comes to be known, it will be possible to resolve the status of many of these entities; for others, the status may just remain unclear.

This benefit-of-the-doubt approach to inclusion in the Catalogue, however, means that it is not possible just to count the total number of entries in the catalogue to get an absolute number of how many endangered languages there are in the world. Almost certainly some entities given their own entry will turn out to be dialects that need to be joined with others in a single entry as representatives of a single language, reducing the total number of entries in the Catalogue. This approach results in the total number of entries being greater than the absolute number of true languages that are endangered, though hopefully not by a very large margin.

“Extinct” languages in the Catalogue

Opinions differ over the word “extinct.” In cases where there have been no known speakers for hundreds or thousands of years, extinction is clear. However, there are cases where one source says either “extinct,” “probably extinct,” “possibly extinct,” or “no known speakers,” while another credible source reports some speakers remaining. In unclear instances, we include the language in the Catalogue, but report the conflicting designations. This means that almost certainly some languages in the Catalogue are in fact extinct—not just endangered—though definitive information is not yet available. As work on the Catalogue progresses, more accurate information on these cases will be obtained and their situations clarified. However, this means that it is not possible to take the total number of entries in the Catalogue as the absolute number of endangered languages in the world today, since some of these languages will prove to be not just endangered, but extinct. There are currently 133 entries in the Catalogue that fall into this category.

Problems with the term “extinct”

The word “extinct” raises other questions. Some scholars consider a language extinct when there are no longer any completely fluent native speakers who learned the language as children from the previous generation. Often, however, even after there are no fully fluent native speakers, there remain speakers with some aptitude in the language, others with passive knowledge, and others who have learned or are learning their heritage tongue as a second language. Many oppose calling these languages “extinct,” and recommend avoiding premature declaration of extinction. One main reason for this is that to those attempting to learn or revitalize their language, it can be demoralizing to read that their language is deemed dead. To avoid discouraging learning and revitalization efforts in these situations, many recommend reporting these languages as having “no known speakers,” or something equivalent. The practice of avoiding the word “extinct” in such situations is followed in this catalogue, and when the number of native speakers is given as 0, that is an indication that the language in question falls into this category: a language with no known speakers.

For more information on the topic of language death and obsolescence, please see the document on Silenced Tongues.

The Language Endangerment Index (LEI)

The level of endangerment presented for each language is not meant to be the final word on the matter. The scores for individual languages will change as more information becomes available. These scores are provided for practical purposes, to give a quick but rough visual indication of a language’s endangerment status. The level of certainty accompanying each endangerment score shows the degree of confidence in that score; a label of “uncertain” may indicate that the level is not yet known, or that the score has been computed but further evaluation is needed. Detailed information about how a language’s level of endangerment is calculated is given below.

Language Endangerment Scale

Each language is assigned a score of 0–5 (safe – critically endangered) for each of the four categories described below, based on how well it meets the relevant criteria. If no information relevant to one of these four categories is available for a given language, that language is not scored for that category.

Level of Endangerment: 5 = Critically Endangered, 4 = Severely Endangered, 3 = Endangered, 2 = Threatened, 1 = Vulnerable, 0 = Safe

Intergenerational Transmission
5 (Critically Endangered): There are only a few elderly speakers.
4 (Severely Endangered): Many of the grandparent generation speak the language, but younger people generally do not.
3 (Endangered): Some adults in the community are speakers, but the language is not spoken by children.
2 (Threatened): Most adults in the community are speakers, but children generally are not.
1 (Vulnerable): Most adults and some children are speakers.
0 (Safe): All members of the community, including children, speak the language.

Absolute Number of Speakers
5 (Critically Endangered): 1-9 speakers
4 (Severely Endangered): 10-99 speakers
3 (Endangered): 100-999 speakers
2 (Threatened): 1,000-9,999 speakers
1 (Vulnerable): 10,000-99,999 speakers
0 (Safe): >100,000 speakers

Speaker Number Trends
5 (Critically Endangered): A small percentage of the community speaks the language, and speaker numbers are decreasing very rapidly.
4 (Severely Endangered): Less than half of the community speaks the language, and speaker numbers are decreasing at an accelerated pace.
3 (Endangered): Only about half of community members speak the language. Speaker numbers are decreasing steadily, but not at an accelerated pace.
2 (Threatened): A majority of community members speak the language. Speaker numbers are gradually decreasing.
1 (Vulnerable): Most members of the community or ethnic group speak the language. Speaker numbers may be decreasing, but very slowly.
0 (Safe): Almost all community members or members of the ethnic group speak the language, and speaker numbers are stable or increasing.

Domains of Use of the Language
5 (Critically Endangered): Used only in a few very specific domains, such as in ceremonies, songs, prayer, proverbs, or certain limited domestic activities.
4 (Severely Endangered): Used mainly just in the home and/or with family, and may not be the primary language even in these domains for many community members.
3 (Endangered): Used mainly just in the home and/or with family, but remains the primary language of these domains for many community members.
2 (Threatened): Used in some non-official domains along with other languages, and remains the primary language used in the home for many community members.
1 (Vulnerable): Used in most domains except for official ones such as government, mass media, education, etc.
0 (Safe): Used in most domains, including official ones such as government, mass media, education, etc.

Computing Levels of Endangerment and Certainty

Level of Endangerment: The level of endangerment is calculated from the four factors listed above: Intergenerational Transmission, Absolute Number of Speakers, Speaker Number Trends, and Domains of Use. Intergenerational Transmission is weighted twice as heavily as each of the other factors, so it contributes up to 10 points while each of the others contributes up to 5. Because many languages lack reliable data for some of these factors, the total score is expressed as the percentage of points a language scores out of the total points possible for the factors that were considered.

100-81% = Critically Endangered
80-61% = Severely Endangered
60-41% = Endangered
40-21% = Threatened
20-1% = Vulnerable
0% = Safe
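
As a rough illustration of the calculation described above, the following Python sketch computes the points, the percentage, and the resulting level for one language. The names used here (`WEIGHTS`, `BANDS`, `endangerment_level`, and the factor keys) are hypothetical and are not taken from the Catalogue's own software. Rounding the percentage to the nearest whole number before applying the bands is also an assumption, made because the bands are stated only in whole percentages. The function takes raw 0-5 scores and applies the doubling for intergenerational transmission itself.

```python
# Illustrative sketch only: names and the rounding rule are assumptions,
# not the Catalogue's actual implementation.

# Weight of each factor; intergenerational transmission counts double.
WEIGHTS = {
    "transmission": 2,
    "speakers": 1,
    "trends": 1,
    "domains": 1,
}

# Lower bound (in whole percent) of each band, highest band first.
BANDS = [
    (81, "Critically Endangered"),
    (61, "Severely Endangered"),
    (41, "Endangered"),
    (21, "Threatened"),
    (1, "Vulnerable"),
    (0, "Safe"),
]


def endangerment_level(scores):
    """Return (points, points_possible, percent, label).

    `scores` maps factor names to raw 0-5 scores. Factors with no
    information are omitted or set to None and are left out of both
    the points scored and the points possible.
    """
    known = {k: v for k, v in scores.items() if v is not None}
    if not known:
        raise ValueError("at least one factor must be scored")
    points = sum(WEIGHTS[k] * v for k, v in known.items())
    possible = sum(WEIGHTS[k] * 5 for k in known)
    percent = round(100 * points / possible)  # assumed rounding rule
    label = next(name for floor, name in BANDS if percent >= floor)
    return points, possible, percent, label


# Raw scores of 3, 4, 3 and 3 give 6 + 4 + 3 + 3 = 16 of 25 points.
print(endangerment_level(
    {"transmission": 3, "speakers": 4, "trends": 3, "domains": 3}
))  # (16, 25, 64, 'Severely Endangered')
```

Leaving unscored factors out of the denominator, rather than counting them as zero, is what keeps a language with little data from appearing safer than it is.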

Level of Certainty: The level of certainty is computed simply from how many of the factors are known and entered, measured in total points available. A language with information entered for all four factors has 25 total points available: 5 each for absolute number of speakers, speaker number trends, and domains of use, and 10 for intergenerational transmission, which is weighted twice as heavily as the other individual factors. A language with information for all factors except domains of use has only 20 total points available (since domains of use is not scored if no information is available on it), leading to a certainty determination of “Mostly Certain.” The certainty levels and their corresponding numbers of available points are given below.

25 points possible = Certain
20 points possible = Mostly Certain
15 points possible = Fairly Certain
10 points possible = Mostly Uncertain
5 points possible = Uncertain
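
The certainty lookup can be sketched in the same way. This is a minimal illustration in Python, assuming hypothetical names (`POINTS_AVAILABLE`, `CERTAINTY`, `certainty_level`, and the factor keys); it is not the Catalogue's own code. It takes the factors for which information has been entered and returns the points available together with the corresponding certainty label.

```python
# Illustrative sketch only; all names here are hypothetical.

# Points each factor makes available once it has been scored.
POINTS_AVAILABLE = {
    "transmission": 10,  # weighted double
    "speakers": 5,
    "trends": 5,
    "domains": 5,
}

# Certainty label for each possible total of available points.
CERTAINTY = {
    25: "Certain",
    20: "Mostly Certain",
    15: "Fairly Certain",
    10: "Mostly Uncertain",
    5: "Uncertain",
}


def certainty_level(known_factors):
    """Map the factors that have data to (points_possible, label)."""
    possible = sum(POINTS_AVAILABLE[f] for f in set(known_factors))
    if possible == 0:
        raise ValueError("at least one factor must be scored")
    return possible, CERTAINTY[possible]


# All factors except domains of use: 10 + 5 + 5 = 20 points available.
print(certainty_level(["transmission", "speakers", "trends"]))
# (20, 'Mostly Certain')
```

Because intergenerational transmission alone makes 10 points available, a language scored only on that factor receives the same certainty label as one scored on any two of the other factors.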

Example calculations:

                Intergen. Trans. (x2)   Abs. #    Speaker Trends   Domains   Total         Levels of endangerment and certainty
Language A      6                       4         3                3         16/25 (64%)   Severely Endangered
Pts. possible   10                      5         5                5         25            Certain
Language B      8                       5         No Info          No Info   13/15 (87%)   Critically Endangered
Pts. possible   10                      5         0                0         15            Fairly Certain
Language C      No Info                 3         No Info          No Info   3/5 (60%)     Endangered
Pts. possible   0                       5         0                0         5             Uncertain


The research for the Endangered Language Catalogue project is funded by a grant from the National Science Foundation: Collaborative Research: Endangered Languages Catalog (ELCat), BCS-1058096 to the University of Hawai‘i at Mānoa (Principal Investigator, Lyle Campbell) and BCS-1057725 to Eastern Michigan University (Principal Investigators Helen Aristar-Dry and Anthony Aristar). The goals and basic organization of the Catalogue were established in an international workshop with some 50 specialists from around the world supported by National Science Foundation grant Collaborative Research: Endangered Languages Information and Infrastructure Project (NSF 0924140).