The following speakers have graciously agreed to give keynotes at EMNLP 2021.

Ido Dagan

Where next? Towards multi-text consumption via three inspired research lines

While the Dominican Republic is obviously my next exciting travel destination, in this talk I’ll share what I consider exciting destinations for next steps in NLP research. I’ll start by pointing to a motivating grand application challenge: supporting effective human consumption of multi-text information -- an invaluable goal which has seen very little progress since the inception of search engines. I’ll then describe three individual, yet synergetic, research lines that were inspired by pursuing this goal. First, supporting multi-text consumption is inherently an interactive process, in which the user assists and directs the system in presenting the most valuable information. As a first step, we propose a formulation of interactive summarization, turning it into a viable and measurable research task by extending summarization evaluation methods to the interactive setting. Second, presenting scattered information in a concise and consolidated manner requires extensive methods for linking cross-text information. To promote this research line, we propose several infrastructure contributions to cross-document coreference resolution, extend the scope of matching cross-text information to the interpretable levels of proposition spans and predicate-argument relations, and design a Cross-document Language Model (CDLM) geared toward the multi-text setting. Lastly, we suggest that linking and consolidating multi-text information in a refined and controllable manner can benefit from explicit, interpretable representations of textual information. Rather than following traditional formal semantic representations, we propose a middle ground between those and opaque distributed neural representations. Text information is decomposed into a set of minimal natural language question-answer pairs, providing a generally appealing semi-structured representation for propositions in a single text, as well as a basis for aligning cross-text information units.
Altogether, we advocate the promise of each individual research line for NLP progress, while suggesting human consumption of multi-text information as an inspiring research framework with huge applied value.

Ido Dagan is a Professor in the Department of Computer Science at Bar-Ilan University, Israel, the founder of the Natural Language Processing (NLP) Lab at Bar-Ilan, the founder and head of the nationally funded Bar-Ilan University Data Science Institute, and a Fellow of the Association for Computational Linguistics (ACL). His interests are in applied semantic processing, focusing on textual inference, natural open semantic representations, consolidation and summarization of multi-text information, and interactive text summarization and exploration. Dagan and colleagues initiated and promoted textual entailment recognition (RTE, later also known as NLI) as a generic empirical task. He was President of the ACL in 2010 and served on its Executive Committee during 2008-2011. In that capacity, he led the establishment of the journal Transactions of the Association for Computational Linguistics, which became one of the two premier journals in NLP. Dagan received his B.A. summa cum laude and his Ph.D. (1992) in Computer Science from the Technion. He was a research fellow at the IBM Haifa Scientific Center (1991) and a Member of Technical Staff at AT&T Bell Laboratories (1992-1994). During 1998-2003 he was co-founder and CTO of FocusEngine and VP of Technology of LingoMotors, and he has consulted regularly in industry. His academic research has involved extensive industrial collaboration, including funding from IBM, Google, Thomson-Reuters, Bloomberg, Intel and Facebook, as well as collaboration with local companies under funded projects of the Israel Innovation Authority.

Evelina Fedorenko

The language system in the human brain

The goal of my research program is to understand the representations and computations that enable us to share complex thoughts with one another via language, and their neural implementation. A decade ago, I developed a robust new approach to the study of language in the brain based on identifying language-responsive cortex functionally in individual participants. Originally developed for fMRI, this approach has since been extended to other modalities, such as intracranial recordings. Using this functional-localization approach, I identified and characterized a set of frontal and temporal brain areas that i) support language comprehension and production (spoken and written); ii) are robustly separable from the lower-level perceptual (e.g., speech processing) and motor (e.g., articulation) brain areas; iii) are spatially and functionally similar across diverse languages (>40 languages from 11 language families); and iv) form a functionally integrated system with substantial redundancy across different components. In this talk, I will highlight a few discoveries from the last decade and argue that the primary goal of language is efficient information transfer rather than enabling complex thought, as has been argued in one prominent philosophical and linguistic tradition (e.g., Wittgenstein, 1921; Berwick & Chomsky, 2016). I will use two kinds of evidence to make this argument. First, I will examine the relationship between language and other aspects of cognition, including social cognitive abilities and complex thought/reasoning. I will show that the language brain regions are highly selective for language over diverse non-linguistic processes, while also showing a deep and intriguing link with a system that supports social cognition. Second, I will examine different properties of language and argue that language has a) properties that make it well-suited for communication, and b) properties that make it ill-suited for complex thought.
Both of these lines of evidence support the communicative function of language and render unlikely the idea that language evolved to enable more complex thought.

Dr. Fedorenko is a cognitive neuroscientist who studies the human language system. She received her bachelor’s degree from Harvard University in 2002, and her Ph.D. from the Massachusetts Institute of Technology in 2007. She was then awarded a K99/R00 career development award from the National Institute of Child Health and Human Development at the U.S. National Institutes of Health. In 2014, she joined the faculty at Harvard Medical School/Massachusetts General Hospital in Boston, and in 2019 she returned to MIT, where she is currently the Frederick A. (1971) and Carole J. Middleton Career Development Associate Professor of Neuroscience in the Brain and Cognitive Sciences Department and the McGovern Institute for Brain Research. Dr. Fedorenko uses fMRI, intracranial recordings and stimulation, EEG/ERPs, MEG, as well as computational modeling, to study adults and children, including those with developmental and acquired brain disorders.

Steven Bird

LT4All!? Rethinking the Agenda

The majority of the world’s languages are oral, emergent, untranslatable, and tightly coupled to a place. Yet it seems that the agenda is to supply all languages with the technologies that have been developed for written languages. It is as though standardised writing were the optimal way to safeguard the future of any language. It is as though the function of a language were exclusively to transmit information, and the same information could be rendered into any language. It is as though we could capture and model language data independently of people, purpose, and place. What would it be like if language technologies respected the self-determination of a local speech community and supported aspirations concerning the local repertoire of speech varieties? The answer will be different in different places, but there may be value in taking a close look at an individual community and trying to discern broader themes. In this talk I will share my experience of living and working in a remote Aboriginal community in the far north of Australia. Here, local people have been teaching me participatory, relational, strengths-based approaches that my students and I have been exploring in the design of language technologies. I will reflect on five years of personal experiences in this space and share thoughts concerning an agenda for language technology in the interest of minority speech communities, and hopes for creating a world that sustains its languages.

Steven Bird has spent 25 years pursuing scalable computational methods for capturing, enriching, and analysing data from endangered languages, drawing on fieldwork in West Africa, South America, and Melanesia. Over the past 5 years he has begun to work with remote Aboriginal communities in northern Australia. Steven has held academic positions at U Edinburgh, U Pennsylvania, UC Berkeley, and U Melbourne. He currently holds the positions of professor at Charles Darwin University, linguist at Nawarddeken Academy, and producer at