Aarne Talman

PhD Student in Language Technology at University of Helsinki

I'm a PhD student in Language Technology at the University of Helsinki. My research focuses on computational semantics, natural language understanding, machine translation and machine learning. I work in Jörg Tiedemann's research group and in the ERC and Academy of Finland funded project Found in Translation: Natural Language Understanding with Cross-Lingual Grounding (FoTran). The goal of the project is to build language-independent abstract meaning representations by training neural networks on massively parallel multilingual datasets. Prior to my academic career, I worked in industry for 12 years in positions ranging from software development, product management and consulting to leading a consulting practice at a global management consultancy.

I'm also a co-founder and the CEO of Basement AI, a natural language processing start-up and consultancy.


My research focuses on Natural Language Processing and Machine Learning.

Natural Language Semantics

I study computational models of natural language meaning, especially sentence-level meaning representations and natural language inference.

Multilingual NLP

I conduct research on representation learning of natural language meaning in a multilingual setting, utilizing multilingual sentence representations in various transfer learning tasks.

Neural Machine Translation

I develop machine translation models and neural machine translation systems, including speech-to-text translation.


I'm involved in two large-scale research projects.

FoTran: Found in Translation

Found in Translation: Natural Language Understanding with Cross-Lingual Grounding is an ERC-funded project led by Jörg Tiedemann. The project proposes a line of research focused on developing novel data-driven models that can learn language-independent abstract meaning representations from the indirect supervision provided by human translations covering a substantial proportion of the world's linguistic diversity. A guiding principle is cross-lingual grounding, the effect of resolving ambiguities through translation. Eventually, this will lead to language-independent meaning representations, and we will test our ideas with multilingual machine translation and with tasks that require semantic reasoning and inference.

MeMAD: Methods for Managing Audiovisual Data

The MeMAD project provides novel methods for efficient reuse and repurposing of multilingual audiovisual content. These methodologies revolutionize video management and digital storytelling in broadcasting and media production. We go far beyond state-of-the-art automatic video description methods by making the machine learn from humans. The resulting description is thus not only a time-aligned semantic extraction of objects but also makes use of the audio and recognizes action sequences. In the MeMAD project, my role is to study multimodal and document-level machine translation.

Papers and Talks


  1. Aarne Talman, Antti Suni, Hande Celikkanat, Sofoklis Kakouros, Jörg Tiedemann and Martti Vainio. 2019. Predicting Prosodic Prominence from Text with Pre-trained Contextualized Word Representations. Proceedings of NoDaLiDa 2019. [bibtex] [pdf] [corpus and code]
  2. Aarne Talman, Umut Sulubacak, Raúl Vázquez, Yves Scherrer, Sami Virpioja, Alessandro Raganato, Arvi Hurskainen, and Jörg Tiedemann. 2019. The University of Helsinki submissions to the WMT19 news translation task. Proceedings of the Fourth Conference on Machine Translation: Shared Task Papers. [bibtex] [pdf]
  3. Aarne Talman and Stergios Chatzikyriakidis. 2019. Testing the Generalization Power of Neural Network Models Across NLI Benchmarks. Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP. [bibtex] [pdf]
  4. Aarne Talman, Anssi Yli-Jyrä and Jörg Tiedemann. 2019. Sentence Embeddings in NLI with Iterative Refinement Encoders. Natural Language Engineering 25(4). [bibtex] [pdf] [code]


  1. Predicting Prosodic Prominence from Text with Pre-trained Contextualized Word Representations, 2 October 2019, NoDaLiDa 2019, Turku. [pdf]
  2. Neural Network models of NLI fail to capture the general notion of inference, 8 March 2019, CLASP Seminar, University of Gothenburg. [pdf]
  3. State-of-the-Art Natural Language Inference Systems Fail to Capture the Semantics of Inference, 25 October 2018, Research Seminar in Language Technology, University of Helsinki. [pdf]
  4. Natural Language Inference with Hierarchical BiLSTM’s, 28 September 2018, FoTran 2018. [pdf]
  5. Natural Language Inference - Another Triumph for Deep Learning?, 23 November 2017, Research Seminar in Language Technology, University of Helsinki. [pdf]



  1. Prosody: A system for predicting prosodic prominence from written text.
  2. Natural Language Inference: Natural language inference system written in Python and PyTorch implementing the HBMP sentence encoder.


  1. Helsinki Prosody Corpus: The prosody corpus contains automatically generated, high-quality prosodic annotations for the LibriTTS corpus (Zen et al. 2019), produced with the Continuous Wavelet Transform Annotation method (Suni et al. 2017).
    • Language: English
    • License: CC BY 4.0
    • Paper


University of Helsinki






  • 2018 - present, Doctoral Candidate, Language Technology, University of Helsinki
    Working on computational semantics and natural language processing.
  • 2019 - present, Founder & CEO, Basement AI
    Basement AI is a Nordic artificial intelligence research lab and consulting company specializing in natural language processing and machine learning.
  • 2015 - 2018, Associate Director, Consulting, Gartner.
  • 2012 - 2015, Consultant, Accenture.
  • 2011 - 2012, Research Student, London School of Economics.
  • 2009 - 2011, Product Manager, Nokia.
  • 2008 - 2009, Manager, Nokia.
  • 2006 - 2008, Systems Analyst, Tieto.
  • 2006, Software Developer, Valuatum.