Abstract
Digital libraries benefit from text classification strategies, since these enable many document management tasks such as information retrieval. The effectiveness of such strategies depends on the amount of available data and on the classifier used. The former motivates data augmentation solutions, in which new samples are added to small datasets based on the semantic similarity between existing samples and concepts defined in external linguistic resources. The latter concerns finding the learning principle best suited to designing an effective classification strategy for the problem at hand. In this work, we propose a neural architecture designed to address text classification on small datasets. Our architecture is based on BERT, extended with one additional layer that uses the sigmoid function. The hypothesis we want to verify is that, with a BERT-based architecture, the semantic vectors learned by the BERT model support effective classification on small datasets without data augmentation strategies. We observed improvements of up to 14% in accuracy and up to 23% in F-score with respect to baseline classifiers that exploit data augmentation.
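As a rough illustration of the "one further layer using the sigmoid function", the sketch below applies a single dense layer with sigmoid outputs to a pooled sentence vector. It is not the authors' implementation. The encoder is replaced by a hypothetical `placeholder_encoder` that returns a deterministic pseudo-random vector instead of a real BERT embedding, and the hidden size, number of labels, binary cross-entropy loss, and learning rate are all assumptions that the abstract does not specify.

```python
import math
import random
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def placeholder_encoder(text: str, hidden_size: int) -> List[float]:
    """Hypothetical stand-in for BERT's pooled output.

    Returns a deterministic pseudo-random vector seeded by the text.
    A real system would return the encoder's sentence representation.
    """
    rng = random.Random(text)
    return [rng.uniform(-1.0, 1.0) for _ in range(hidden_size)]


class SigmoidHead:
    """One dense layer followed by a sigmoid, one output unit per label."""

    def __init__(self, hidden_size: int, num_labels: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.weights = [
            [rng.uniform(-0.1, 0.1) for _ in range(hidden_size)]
            for _ in range(num_labels)
        ]
        self.bias = [0.0] * num_labels

    def forward(self, pooled: Sequence[float]) -> List[float]:
        """Map a pooled vector to one probability per label."""
        return [
            sigmoid(sum(w * x for w, x in zip(row, pooled)) + b)
            for row, b in zip(self.weights, self.bias)
        ]

    def sgd_step(
        self, pooled: Sequence[float], targets: Sequence[float], lr: float = 0.1
    ) -> float:
        """Run one gradient step on summed binary cross-entropy (assumed loss).

        Returns the loss measured before the update.
        """
        probs = self.forward(pooled)
        eps = 1e-12
        loss = -sum(
            t * math.log(p + eps) + (1.0 - t) * math.log(1.0 - p + eps)
            for p, t in zip(probs, targets)
        )
        for k, (p, t) in enumerate(zip(probs, targets)):
            grad = p - t  # derivative of the loss w.r.t. the k-th logit
            for j, x in enumerate(pooled):
                self.weights[k][j] -= lr * grad * x
            self.bias[k] -= lr * grad
        return loss


# Usage: score one document against three hypothetical labels.
head = SigmoidHead(hidden_size=8, num_labels=3)
pooled = placeholder_encoder("digital libraries and text classification", 8)
probs = head.forward(pooled)
```

In a full model, the head's gradients would also flow back into the BERT encoder during fine-tuning; here only the head is updated, which keeps the sketch free of external dependencies.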
Original language | English |
---|---|
Title of host publication | JCDL '20: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020 |
Publisher | Association for Computing Machinery |
Pages | 319–327 |
Number of pages | 9 |
ISBN (Electronic) | 9781450375856 |
ISBN (Print) | 9781450375856 |
Publication status | Published - Aug 2020 |
Event | 2020 ACM/IEEE Joint Conference on Digital Libraries - Virtual, China. Duration: 1 Aug 2020 → 5 Aug 2020 |
Conference
Conference | 2020 ACM/IEEE Joint Conference on Digital Libraries |
---|---|
Abbreviated title | JCDL 2020 |
Country/Territory | China |
City | Virtual |
Period | 1/08/20 → 5/08/20 |
Keywords
- Data augmentation
- Small datasets
- Text classification
ASJC Scopus subject areas
- Engineering(all)