LEIA: Linguistic Embeddings for the Identification of Affect

Segun Taofeek Aroyehun; Lukas Malik; Hannah Metzler; Nikolas Haimerl; Anna Di Natale; David Garcia

doi:10.1140/epjds/s13688-023-00427-0

Segun Taofeek Aroyehun, Lukas Malik, Hannah Metzler, Nikolas Haimerl, Anna Di Natale, David Garcia^*

^*Corresponding author for this work

Institute of Interactive Systems and Data Science (7060)

Publication: Contribution to journal › Article › Peer-reviewed

Abstract

The wealth of text data generated by social media has enabled new kinds of analysis of emotions with language models. These models are often trained on small and costly datasets of text annotations produced by readers who guess the emotions expressed by others in social media posts. This affects the quality of emotion identification methods due to training data size limitations and noise in the production of labels used in model development. We present LEIA, a model for emotion identification in text that has been trained on a dataset of more than 6 million posts with self-annotated emotion labels for happiness, affection, sadness, anger, and fear. LEIA is based on a word masking method that enhances the learning of emotion words during model pre-training. LEIA achieves macro-F1 values of approximately 73 on three in-domain test datasets, outperforming other supervised and unsupervised methods in a strong benchmark that shows that LEIA generalizes across posts, users, and time periods. We further perform an out-of-domain evaluation on five different datasets of social media and other sources, showing LEIA’s robust performance across media, data collection methods, and annotation schemes. Our results show that LEIA generalizes its classification of anger, happiness, and sadness beyond the domain it was trained on. LEIA can be applied in future research to provide better identification of emotions in text from the perspective of the writer.

Original language	English
Article number	52
Journal	EPJ Data Science
Volume	12
Issue number	1
DOIs	https://doi.org/10.1140/epjds/s13688-023-00427-0
Publication status	Published - Dec 2023

ASJC Scopus subject areas

Modeling and Simulation
Computer Science Applications
Computational Mathematics

Access to document

10.1140/epjds/s13688-023-00427-0 (License: CC BY 4.0)

Other files and links

Link to publication in Scopus

Cite this

@article{ec84ef65ca2645bea67cc3ffec6be227,

title = "LEIA: Linguistic Embeddings for the Identification of Affect",

abstract = "The wealth of text data generated by social media has enabled new kinds of analysis of emotions with language models. These models are often trained on small and costly datasets of text annotations produced by readers who guess the emotions expressed by others in social media posts. This affects the quality of emotion identification methods due to training data size limitations and noise in the production of labels used in model development. We present LEIA, a model for emotion identification in text that has been trained on a dataset of more than 6 million posts with self-annotated emotion labels for happiness, affection, sadness, anger, and fear. LEIA is based on a word masking method that enhances the learning of emotion words during model pre-training. LEIA achieves macro-F1 values of approximately 73 on three in-domain test datasets, outperforming other supervised and unsupervised methods in a strong benchmark that shows that LEIA generalizes across posts, users, and time periods. We further perform an out-of-domain evaluation on five different datasets of social media and other sources, showing LEIA{\textquoteright}s robust performance across media, data collection methods, and annotation schemes. Our results show that LEIA generalizes its classification of anger, happiness, and sadness beyond the domain it was trained on. LEIA can be applied in future research to provide better identification of emotions in text from the perspective of the writer.",

keywords = "Emotion detection, Natural language processing, Social media, Transfer learning",

author = "Aroyehun, {Segun Taofeek} and Lukas Malik and Hannah Metzler and Nikolas Haimerl and {Di Natale}, Anna and David Garcia",

note = "Publisher Copyright: {\textcopyright} 2023, The Author(s).",

year = "2023",

month = dec,

doi = "10.1140/epjds/s13688-023-00427-0",

language = "English",

volume = "12",

journal = "EPJ Data Science",

issn = "2193-1127",

publisher = "SpringerOpen",

number = "1",

}

TY - JOUR

T1 - LEIA

T2 - Linguistic Embeddings for the Identification of Affect

AU - Aroyehun, Segun Taofeek

AU - Malik, Lukas

AU - Metzler, Hannah

AU - Haimerl, Nikolas

AU - Di Natale, Anna

AU - Garcia, David

PY - 2023/12

Y1 - 2023/12

N2 - The wealth of text data generated by social media has enabled new kinds of analysis of emotions with language models. These models are often trained on small and costly datasets of text annotations produced by readers who guess the emotions expressed by others in social media posts. This affects the quality of emotion identification methods due to training data size limitations and noise in the production of labels used in model development. We present LEIA, a model for emotion identification in text that has been trained on a dataset of more than 6 million posts with self-annotated emotion labels for happiness, affection, sadness, anger, and fear. LEIA is based on a word masking method that enhances the learning of emotion words during model pre-training. LEIA achieves macro-F1 values of approximately 73 on three in-domain test datasets, outperforming other supervised and unsupervised methods in a strong benchmark that shows that LEIA generalizes across posts, users, and time periods. We further perform an out-of-domain evaluation on five different datasets of social media and other sources, showing LEIA’s robust performance across media, data collection methods, and annotation schemes. Our results show that LEIA generalizes its classification of anger, happiness, and sadness beyond the domain it was trained on. LEIA can be applied in future research to provide better identification of emotions in text from the perspective of the writer.

AB - The wealth of text data generated by social media has enabled new kinds of analysis of emotions with language models. These models are often trained on small and costly datasets of text annotations produced by readers who guess the emotions expressed by others in social media posts. This affects the quality of emotion identification methods due to training data size limitations and noise in the production of labels used in model development. We present LEIA, a model for emotion identification in text that has been trained on a dataset of more than 6 million posts with self-annotated emotion labels for happiness, affection, sadness, anger, and fear. LEIA is based on a word masking method that enhances the learning of emotion words during model pre-training. LEIA achieves macro-F1 values of approximately 73 on three in-domain test datasets, outperforming other supervised and unsupervised methods in a strong benchmark that shows that LEIA generalizes across posts, users, and time periods. We further perform an out-of-domain evaluation on five different datasets of social media and other sources, showing LEIA’s robust performance across media, data collection methods, and annotation schemes. Our results show that LEIA generalizes its classification of anger, happiness, and sadness beyond the domain it was trained on. LEIA can be applied in future research to provide better identification of emotions in text from the perspective of the writer.

KW - Emotion detection

KW - Natural language processing

KW - Social media

KW - Transfer learning

UR - http://www.scopus.com/inward/record.url?scp=85176955571&partnerID=8YFLogxK

U2 - 10.1140/epjds/s13688-023-00427-0

DO - 10.1140/epjds/s13688-023-00427-0

M3 - Article

AN - SCOPUS:85176955571

SN - 2193-1127

VL - 12

JO - EPJ Data Science

JF - EPJ Data Science

IS - 1

M1 - 52

ER -
