Romania Amerindia

Researched languages & regions

(This site is also available in German / Portuguese / Spanish)

About the project

In this research area, we empirically investigate the linguistic reality of selected regions in South America where Spanish and Portuguese are spoken together with Amerindian languages [1]. In doing so, we pursue two main goals. First, we provide data on Amerindian languages in a form that enables linguists worldwide to integrate these languages into their research perspectives. In this way, we want to help broaden the typological basis of theory building. At the same time, by profiling these languages linguistically, we want to make them and the ways of life they express more visible at a time when they are under extreme threat (http://gbs.uni-koeln.de).

The second goal concerns the focus of our own descriptive and theoretical work. We want to achieve a better understanding of how linguistic means interact in processing presupposed common knowledge (Common Ground, Stalnaker 2002). This perspective requires that lexical, morphological, syntactic and prosodic techniques be captured together.

The prosodic form of an utterance is to be explained by the interplay of metrical, rhythmical and intonational rules or constraints, which are determined in different ways by the language-specific and universal competences of multilingual speakers.

[1] We use the term Amerindian to refer to all languages that have evolved directly from the languages spoken in the Americas before the arrival of the first Europeans, without claiming the existence of the macrofamily proposed under this name by Greenberg (1987). The term is somewhat more precise than "indigenous" and less colonialist than "Indian".


Language | Country | Collaborators
Coordination and repository | | Raúl Italo Bendezú Araujo, Timo Buchholz, Elizabeth Pankratz (FU Berlin)
Nheengatú and Portuguese in Rio Negro | | Antônio Lessa, André Amorim (FU Berlin), Edson Baré
Quechua and Spanish in Conchucos | | Raúl Italo Bendezú Araujo, Timo Buchholz, Gabriel Barreto, Leonel Menacho
Guaraní and Spanish in Asunción | | Hedy Penner, Élodie Blestel (U Paris III)
 | | Aldo Olate, Jaqueline Caniguan

News and publications

  • 2022:
    • Buchholz, Timo, Raúl Bendezú & Uli Reich. Unpublished. Spanish in contact with Quechua in Northern Peru. In Cerno, Leonardo et al. (eds.). Contact varieties of Spanish and Spanish-lexified contact varieties. Berlin: Mouton de Gruyter.
    • Reich, Uli. Unpublished. Language contact and prosody. In Cerno, Leonardo et al. (eds.). Contact varieties of Spanish and Spanish-lexified contact varieties. Berlin: Mouton de Gruyter.
  • 29.04.2022: Doctoral defense (Disputation) of Timo Buchholz, "Intonation between Phrasing and Accent: Spanish and Quechua in Huari"
  • Gabriel, Christoph & Uli Reich. 2021. The phonology of Romance contact varieties. Manual of Romance Phonetics and Phonology, 462–502. https://doi.org/10.1515/9783110550283-016
  • 13.09.2021: Doctoral defense (Disputation) of Raúl Italo Bendezú Araujo, "Identificación y aserción en la marcación del foco del Quechua de Conchucos (Áncash, Perú)"
  • 10.02.2021: The second part of the Conchucos data (Spanish 1) is now online in the Refubium and freely available to everyone
  • 27.11.2019: Publication in the TAZ about the threat to Brazil's cultural diversity under Bolsonaro's government.
  • October 28, 2019: The first part of the Conchucos data (Quechua 1) is now online in the Refubium and freely available to everyone
  • August-October, 2019: Uli Reich was in Asunción (Paraguay) together with Élodie Blestel (Paris III), and in São Gabriel da Cachoeira (Brazil) with Antônio Lessa, to make recordings of Guaraní and Nheengatú for this project
  • April 23, 2019: Publication in the FU-Tagesspiegel-Beilage (in German) about indigenous languages and the work done by this project
  • Buchholz, Timo & Uli Reich. 2018. The realizational coefficient: Devising a method for empirically determining prominent positions in Conchucos Quechua. In Ingo Feldhausen, Jan Fliessbach & Maria d. M. Vanrell (eds.), Methods in prosody: A Romance language perspective (Studies in Laboratory Phonology 6), 123–164. Berlin: Language Science Press. (available here) (publication website)
  • Reich, Uli. 2018. Presupposed Modality. In Marco García García & Melanie Uth (eds.), Focus realization in Romance and beyond (Studies in language companion series Volume 201), 203–227. Amsterdam, Philadelphia: John Benjamins. (publication website)

Project staff

Timo Buchholz, Raúl Italo Bendezú Araujo, Elizabeth Pankratz

The creation and publication of language corpora is time-consuming and involves many steps. The experiments have to be designed, tested and adapted with the help of local experts. Materials have to be selected and created. Technically clean recordings must be produced with great care. The raw data is then processed linguistically: transcription, translation and morphological glossing, following uniform and transparent criteria (based as far as possible on the Leipzig Glossing Rules). Finally, the data must be processed technically so that it is suitable for online publication. All of this requires many different participants, each of whom makes a contribution. Our thanks go to everyone!

Contributors Conchucos Quechua and Spanish

Academic direction | Technical direction | Local cooperation (Huaraz and Huari) | Transcription and translation Quechua (Huaraz) | Glosses Quechua (Lima) | Transcription and translation Spanish (Berlin)

Raúl Bendezú Araujo

Timo Buchholz

Uli Reich

Elizabeth Pankratz

Gabriel Barreto

Leonel Menacho Lopez

Yuli Alicia Cadillo Tarazona

Merlín de la Cruz Huayanay

Efraín Rodolfo Montes Palacios

Leidy Felyna Rosales Gonzales

Jeny Elvira Rosas Julca

Nelson Yonatan Sánchez Evaristo

Marco Antonio Trigoso Aching

Loreta Alva Mansilla

Claudia Arbaiza Varela

Minerva Lucero Cerna Maguiña

Freyda Nisbeth Schuler Tovar

Alonso Vásquez Aguilar

Magalí Bertola

Catalina Torres Orjuela

Type and structure of the recordings

We achieve a high degree of comparability across the data by using the same recording procedures in all communities. These procedures elicit, in a controlled way, the conversational traits typically used to process the Common Ground. We prefer communicative games in which context, discourse and lexical material are controlled to procedures that tightly control individual sentences, such as the elicitation of fully predetermined utterances or discourse completion tasks. Such experiments are very unfamiliar to the speakers, and we want to obtain language data that is as natural as possible.

In each experiment, the speakers solve different communicative tasks in the form of a game and are recorded while doing so. All bilingual speakers perform each experiment twice, once in their local non-Romance variety and once in their local Romance variety. The materials (see "Metrical control for experiment materials" in Spanish or English) are carefully adapted for each language and region with the strong involvement of local experts. Through the choice of materials and the rules of each game, we retain a degree of control over the content of each conversation as a whole, while the content and form of individual utterances are chosen spontaneously by the speakers.

Together with the cooperation partners in the respective countries and regions, we have agreed on a core of common experiments. Some of these are widely known games, some come from the linguistic literature, and some were developed specifically by us. The individual experiments are briefly presented below. More detailed experiment descriptions (in Spanish or English) are linked.

Common Experiments

Imagenes (spa / eng): The speakers name objects that are shown to them on picture cards.

Memoria (spa / eng): The speakers play a version of the well-known game "Memory", in which they have to identify and remember the positions of cards with certain images.

Maptask (spa / eng): The speakers conduct a conversation that simulates the giving and receiving of directions, but the maps they use do not match. The Maptask experiment was originally developed by Anderson et al. (1991), whose English-language corpus is located at the University of Edinburgh: http://groups.inf.ed.ac.uk/maptask/

Cuento (spa / eng): In an adapted version of the game "Chinese Whispers", the speakers tell and retell a story invented by the researchers.

Quién (spa / eng): The speakers play a version of the game "Who am I?" in which one of them has to guess the identity of a person only the other knows.

The data from all sub-projects will be transcribed, translated into Spanish or Portuguese and into English, morphologically glossed and made available with metadata in English, Spanish and Portuguese in a repository and on Orcid. Each individual corpus will be identified by a link.

Available Corpora

Each individual corpus is represented by 4 files in the repository:

1. The speech recording itself, usually the recording of a single experiment, in 16-bit PCM .wav format.

2. A file in .eaf format, which contains a transcription and glossing temporally aligned with the audio recording at the utterance level, as well as a translation into Spanish and English (if the recording itself is in Spanish or Portuguese, the glossing is omitted). The .eaf format belongs to the annotation programme ELAN, which was developed by the Max Planck Institute for Psycholinguistics and is freely available here: https://tla.mpi.nl/tools/tla-tools/elan/. There you can also find tutorials and guides on how to use the programme.

3. A file with the same information in .TextGrid format. The TextGrid format belongs to the Praat programme, which was developed by Paul Boersma and David Weenink at the University of Amsterdam and which is the most popular software for analyzing speech data in phonetics and phonology. Praat is also freely available at http://www.fon.hum.uva.nl/praat/, where you can also find tutorials and guides on how to use the programme.

4. A file in .pdf format with metadata on the recording, containing information on the experiment and the speakers.
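For readers who prefer to inspect the files programmatically rather than in ELAN or Praat, the following is a minimal sketch using only the Python standard library. It assumes the usual XML layout of .eaf files (TIME_SLOT, TIER, ALIGNABLE_ANNOTATION and ANNOTATION_VALUE elements). The tier names, the sample utterance and the 16 kHz sampling rate are hypothetical placeholders and are not taken from the published corpora. Only time-aligned annotations are read; annotations on dependent tiers that refer to other annotations are not handled here.

```python
import io
import wave
import xml.etree.ElementTree as ET

# Hypothetical miniature .eaf content; real corpus files hold more tiers and metadata.
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="1500"/>
  </TIME_ORDER>
  <TIER TIER_ID="transcription">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1"
                            TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>example utterance</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""


def read_tiers(eaf_text):
    """Return {tier_id: [(start_ms, end_ms, text), ...]} for time-aligned annotations."""
    root = ET.fromstring(eaf_text)
    # Map each time slot id to its position in milliseconds.
    slots = {
        ts.get("TIME_SLOT_ID"): int(ts.get("TIME_VALUE"))
        for ts in root.iter("TIME_SLOT")
        if ts.get("TIME_VALUE") is not None
    }
    tiers = {}
    for tier in root.iter("TIER"):
        rows = []
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots.get(ann.get("TIME_SLOT_REF1"))
            end = slots.get(ann.get("TIME_SLOT_REF2"))
            text = ann.findtext("ANNOTATION_VALUE", default="")
            rows.append((start, end, text))
        tiers[tier.get("TIER_ID")] = rows
    return tiers


def describe_wav(source):
    """Report basic properties of a PCM .wav file (path or binary file object)."""
    with wave.open(source, "rb") as w:
        return {
            "channels": w.getnchannels(),
            "bits": w.getsampwidth() * 8,
            "rate": w.getframerate(),
            "seconds": w.getnframes() / w.getframerate(),
        }


def make_silent_wav(seconds=1, rate=16000):
    """Build a mono 16-bit PCM test file in memory (the rate is an assumed value)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * rate * seconds)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(read_tiers(SAMPLE_EAF))
    print(describe_wav(make_silent_wav()))
```

With a downloaded corpus, one would pass the contents of the .eaf file to `read_tiers` and the path of the .wav file to `describe_wav`; the tier identifiers actually present should be checked in the file itself.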

The following table lists all corpora published so far. Their number will steadily increase as the project progresses. Clicking on the name of a corpus takes you directly to the corresponding page in the repository of the FU, where you can download the files for each experiment individually.

Corpus | Region | Researchers | Experiment types included | Language(s)
Quechua 1 | Conchucos, Peru | Bendezú Araujo, Raúl; Reich, Uli | Memoria (7x), Maptask (7x), Cuento (7x), Quién (4x), Cajas (4x), Condir (1x) |


The project was funded by the German Research Foundation (DFG) from 2015 to 2021 (DFG project number: 274614727). In the first phase (2015-2017), the prosody of Conchucos Quechua, spoken in the Ancash province of Peru, was documented empirically and analyzed theoretically. In the second phase (2018-2021), the project was extended to Spanish and Guaraní in Asunción, Paraguay, and to Nheengatú and Portuguese in São Gabriel da Cachoeira, Brazil. We are currently preparing a new phase in order to research the complex multilingualism of this Brazilian region more intensively.

Recommendation for how to cite (using the example of Quechua 1): Bendezú Araujo, Raúl, Timo Buchholz & Uli Reich. 2019. Corpora of American languages: Interactive language games from multilingual Latin America (Quechua 1). Berlin: Freie Universität. https://refubium.fu-berlin.de/handle/fub188/25747