
Romania Amerindia

Languages and regions where we conduct research. All maps were produced by us on the basis of information from the open-source resource Native Land (https://native-land.ca/). Native Land is mostly run by indigenous staff and is constantly updated.

About the project

In this research area, we empirically investigate the linguistic reality of selected regions in South America where Spanish and Portuguese are spoken together with Amerindian languages1. In doing so, we pursue two main goals. On the one hand, we provide data on Amerindian languages in a way that will enable linguists worldwide to integrate these languages into their research perspectives. In this way, we want to contribute to broadening the typological basis of theory building. At the same time, these languages and the forms of life they express should be made more visible through their linguistic profiling in times of extreme threat (http://gbs.uni-koeln.de).

The second goal concerns the focus of our own descriptive and theoretical work. We want to achieve a better understanding of the interplay of linguistic means for processing presupposed common knowledge (Common Ground, Stalnaker 2002). This perspective requires a simultaneous acquisition of lexical, morphological, syntactic and prosodic techniques.

The prosodic form of an utterance is to be explained by the interplay of metrical, rhythmical and intonational rules or constraints, which are determined in different ways by the language-specific and universal competences of multilingual speakers.

1 We use the term Amerindian to refer to all languages that have evolved directly from the languages spoken in the Americas before the arrival of the first Europeans, without claiming the existence of the macrofamily proposed under this name by Greenberg (1987). The term is somewhat more precise than "indigenous" and less colonialist than "Indian".


The project was funded by the German Research Foundation (DFG) from 2015 to 2021 (DFG project number: 274614727). In the first phase of the project (2015-2017), the prosody of the Quechua of Conchucos in the Ancash province of Peru was documented empirically and analysed theoretically. In the second phase (2018-2021), the project was extended to Spanish & Guaraní in Asunción/Paraguay and Nheengatú & Portuguese in São Gabriel da Cachoeira/Brazil. We are currently preparing a new application so that we can research the complex multilingualism of this (Brazilian) region more intensively.

The creation and publication of language corpora is time-consuming and requires many steps. The experiments have to be designed, tested and adapted with the help of local experts. Materials have to be selected and created. Technically clean recordings must be produced very carefully. Then follows the linguistic processing of the raw data: transcription, translation and morphological glossing according to uniform and comprehensible criteria (as far as possible based on the Leipzig Glossing Rules2). Finally, the data must be technically processed in a way that makes it suitable for online publication. All of this requires many different participants, who all make their contributions. Our thanks go to everyone!

Language | Country | Collaborators
Guaraní and Spanish in Asunción | Paraguay | Hedy Penner, Élodie Blestel (U Paris III), Irene E. Serna Lehmann (FU Berlin)
Nheengatú and Portuguese in Rio Negro | Brazil | Antônio Lessa, André Amorim, Jan Potthoff (FU Berlin), Edson Baré, Dime Baré, Dadá Baniwa, Bárbara Heliodora Lemos de Pinheiro Santos
Quechua and Spanish in Conchucos | Peru | Raúl Italo Bendezú Araujo, Timo Buchholz, Gabriel Barreto, Leonel Menacho, Elizabeth Pankratz, Irene E. Serna Lehmann (FU Berlin)

2 The morphological glosses are always an approximation of the structural understanding at a point in time. Other readings cannot be excluded and should be communicated to us as appropriate.

News and publications

  • 2023:
    • June 5, 2023: The data from São Gabriel da Cachoeira (Nheengatú/Portuguese) is now online in the Refubium and freely available for everybody.
    • May 26, 2023: The data from Asunción (Spanish) is now online in the Refubium and freely available for everybody.
    • May 16, 2023: The data from Asunción (Guaraní) is now online in the Refubium and freely available for everybody.
    • Reich, Uli. 2023. System consistency across named languages in the Andes. In Pomino, Natascha et al. (eds.). Formal Linguistic Theory to the Art of Historical Editions: The Multifaceted Dimensions of Romance Linguistics. Wien: Vienna University Press.
  • 2022:
    • Buchholz, Timo, Raúl Bendezú & Uli Reich. Unpublished. Spanish in contact with Quechua in Northern Peru. In Cerno, Leonardo et al. (eds.). Contact varieties of Spanish and Spanish-lexified contact varieties. Berlin: Mouton de Gruyter.
    • Reich, Uli. Unpublished. Language contact and prosody. In Cerno, Leonardo et al. (eds.). Contact varieties of Spanish and Spanish-lexified contact varieties. Berlin: Mouton de Gruyter.
    • April 29, 2022: Disputation by Timo Buchholz "Intonation between Phrasing and Accent: Spanish and Quechua in Huari".
  • 2021:
    • September 13, 2021: Disputation by Raúl Italo Bendezú Araujo "Identificación y aserción en la marcación del foco del Quechua de Conchucos (Áncash, Perú)".
    • February 10, 2021: The second part of the Conchucos data (Spanish) is now online in the Refubium and freely available for everybody.
  • 2019:
    • November 27, 2019: Publication in the TAZ (in German) about the threat to Brazil's cultural diversity under Bolsonaro's government.
    • October 28, 2019: The first part of the Conchucos data (Quechua) is now online in the Refubium and freely available for everybody.
    • August - October, 2019: Guaraní recordings in Asunción, Paraguay (Uli Reich and Élodie Blestel), and Nheengatú recordings in São Gabriel da Cachoeira, Brazil (Uli Reich and Antônio Lessa).
    • April 23, 2019: Publication in the FU-Tagesspiegel-Beilage (in German) about indigenous languages and the work done by this project.
  • 2018:
    • Buchholz, Timo & Uli Reich. 2018. The realizational coefficient: Devising a method for empirically determining prominent positions in Conchucos Quechua. In Ingo Feldhausen, Jan Fliessbach & Maria d. M. Vanrell (eds.), Methods in prosody: A Romance language perspective (Studies in Laboratory Phonology 6), 123–164. Berlin: Language Science Press. (available here / publication website)
    • Reich, Uli. 2018. Presupposed Modality. In Marco García García & Melanie Uth (eds.), Focus realization in Romance and beyond (Studies in language companion series Volume 201), 203–227. Amsterdam, Philadelphia: John Benjamins. (publication website)

Type and structure of the recordings

We achieve a high degree of comparability across the data by using the same recording procedures in all communities. These procedures elicit, in a controlled way, typical conversational traits for processing the Common Ground. We prefer communicative games in which context, discourse and lexical material are controlled to procedures that are tightly controlled at the sentence level, such as the elicitation of fully predetermined utterances or discourse completion tasks: such experiments are very unfamiliar to the speakers, and we want to produce language data that are as natural as possible.

In each experiment, the speakers solve different communicative tasks in the form of a game and are recorded while doing so. All bilingual speakers perform each experiment twice, once in their local non-Romance variety and once in their local Romance variety. Through the choice of materials (see "Metrical control for experiment materials" in Spanish or English), which are carefully adapted for each language and region with the strong involvement of local experts, and through the rules of the game ( es / en / pt ), we retain some control over the content of each conversation as a whole, while the content and form of individual utterances are chosen spontaneously by the speakers.

Together with the cooperation partners in the respective countries and regions, we have agreed on a core of common experiments. Some of these are generally known, some come from the linguistic literature and some were developed specifically by us. The individual experiments are briefly presented below. More detailed experiment descriptions (in Spanish, English or Portuguese) are linked.

Common experiments

Memoria/Concurso ( es / en / pt ): The speakers play a version of the well-known game "Memory", in which they have to identify and remember the positions of cards with certain images.

Maptask ( es / en / pt ): The speakers conduct a conversation that simulates the giving and receiving of directions, but the maps they use do not match. The Maptask experiment was originally developed by Anderson et al. (1991), whose English-language corpus is located at the University of Edinburgh: http://groups.inf.ed.ac.uk/maptask/

Cuento/Story ( es / en / pt ): In an adapted version of the game "Chinese Whispers", the speakers tell and retell a story invented by the researchers.

Quién/Who ( es / en / pt ): The speakers play a version of the game "Who am I?" in which one of them has to guess the identity of a person only the other knows.

Individual experiments

Cajas/Boxes ( es / en / pt ): The speakers try to guess the contents of different boxes without opening them and discuss and negotiate until they come to an agreed solution.

Condir ( es / en / pt ): This is an adapted version of a sociolinguistic interview with a speaker.

Imagenes ( es / en / pt ): The speakers name objects that are shown to them on picture cards.

The data from all sub-projects are transcribed, translated into Spanish or Portuguese and into English, morphologically glossed and made available with metadata in English, Spanish and Portuguese in a repository and on OSF. Each individual corpus will be identified by a link. All informants have been informed by us in detail that the recordings will be published for research purposes. Consent forms have been received from all of them.

Individual corpus

Each individual corpus is represented by 4 files in the repository:

  1. The speech recording itself, usually the recording of a single experiment, in 16-bit PCM .wav format.
  2. A file in .eaf format, which contains a transcription and glossing temporally aligned with the audio recording at the utterance level, as well as a translation into Spanish and English (if the recording itself is in Spanish or Portuguese, the glossing is omitted). The .eaf format belongs to the annotation programme ELAN, which was developed by the Max Planck Institute for Psycholinguistics and is freely available here: https://tla.mpi.nl/tools/tla-tools/elan/ . There you can also find tutorials and guides on how to use the programme.
  3. A file with the same information in .TextGrid format. The TextGrid format belongs to the Praat programme, which was developed by Paul Boersma and David Weenink at the University of Amsterdam and which is the most popular software for analyzing speech data in phonetics and phonology. Praat is also freely available at http://www.fon.hum.uva.nl/praat/, where you can also find tutorials and guides on how to use the programme.
  4. A file in .pdf format with metadata on the recording, containing information on the experiment and the speakers.

The voice recordings produced as part of this research project are centrally stored in the repository of Freie Universität Berlin (Refubium), together with transcriptions, translations and morphological glosses. All of these materials are freely accessible to the public and may be used for non-commercial purposes (Creative Commons License CC BY-NC-SA 4.0). You are cordially invited to conduct your own linguistic research with our data, and we welcome any feedback (please contact: uli.reich@fu-berlin.de).
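
For readers who prefer to work with the files programmatically rather than in ELAN or Praat, the following is a minimal sketch in Python (standard library only) of how the .wav and .eaf files of one corpus might be inspected. It is not part of the project's own tooling. The function names are our own illustration, and the tier name has to be looked up in the file at hand, since tier names are not specified on this page. The XML element names (TIME_SLOT, TIER, ALIGNABLE_ANNOTATION, ANNOTATION_VALUE) follow the ELAN annotation format as we understand it.

```python
import wave
import xml.etree.ElementTree as ET


def describe_wav(path):
    """Return basic properties of a PCM .wav file (illustrative helper)."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "bits_per_sample": wav.getsampwidth() * 8,  # 2 bytes means 16-bit PCM
            "sample_rate": rate,
            "duration_s": frames / rate,
        }


def read_eaf_tier(source, tier_id):
    """Return (start_ms, end_ms, text) for each time-aligned annotation on one tier.

    `source` is a path or a binary file object. `tier_id` is an assumption:
    use whatever the tier is actually called in the .eaf file you opened.
    """
    root = ET.parse(source).getroot()
    # TIME_SLOT elements map slot ids to offsets in milliseconds.
    slots = {
        slot.get("TIME_SLOT_ID"): int(slot.get("TIME_VALUE"))
        for slot in root.iter("TIME_SLOT")
        if slot.get("TIME_VALUE") is not None
    }
    rows = []
    for tier in root.iter("TIER"):
        if tier.get("TIER_ID") != tier_id:
            continue
        # Only annotations that carry their own time alignment are collected.
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots.get(ann.get("TIME_SLOT_REF1"))
            end = slots.get(ann.get("TIME_SLOT_REF2"))
            text = (ann.findtext("ANNOTATION_VALUE") or "").strip()
            rows.append((start, end, text))
    return rows
```

Calling describe_wav on a recording should report 16 bits per sample if the file is in the format described above. Note that read_eaf_tier skips tiers whose annotations only refer to another tier (REF_ANNOTATION elements in ELAN), which may be how translations or glosses are stored. Such tiers would need an extra lookup step that this sketch leaves out.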