Abstract
Extracting keyphrases and entities can be an important first step in many Natural Language Processing (NLP) and Information Retrieval (IR) tasks. Many datasets exist for training models on standard entities, but data for more domain-specific applications is hard to find.
The types of keyphrases to extract vary enormously between fields, which makes otherwise successful algorithms perform poorly on them. Physics is one such field, specifically the processing of physics publications and patents. Unlike in news articles or social media, typical entities such as Organization, Location or Person are not helpful for extracting important information from publications or patents. Few annotated datasets exist for specific domains, and those that do are not easily transferable. This work contributes an annotated dataset to facilitate information retrieval and extraction in Physics. The dataset spans both physics patents and publications to enable future work across the two document types, such as tracking inventions from their first emergence in a publication to their adoption in a patent.
Original language | English |
---|---|
Title | Proceedings of the 3rd International Open Search Symposium #ossym2021 |
Subtitle | OSSYM 2021 |
Pages | 45-49 |
ISBN (electronic) | 978-92-9083-633-9 |
DOIs | |
Publication status | Published - 2022 |
Event | 3rd International Open Search Symposium: OSSYM 2021 - Virtual, Austria. Duration: 11 Oct 2021 → 13 Oct 2021 |
Conference
Conference | 3rd International Open Search Symposium |
---|---|
Short title | OSSYM 2021 |
Country/Territory | Austria |
Location | Virtual |
Period | 11 Oct 2021 → 13 Oct 2021 |
Fields of Expertise
- Information, Communication & Computing