Creating a Dataset for Keyphrase Extraction in Physics Publications and Patents

André Rattinger, Christian Gütl

Publication: Contribution to book/report/conference proceedings; conference paper; peer-reviewed

Abstract

Extracting keyphrases and entities can be an important first step in many Natural Language Processing (NLP) and Information Retrieval (IR) tasks. There are many datasets for training models on standard entities, but it is hard to find data that can be used for more domain-specific applications. The types of keyphrases someone wants to extract vary enormously between fields, which makes otherwise successful algorithms perform poorly on them. One field where this is the case is physics, specifically the processing of physics publications and patents. Unlike in news articles or social media, typical entities such as Organization, Location or Person are not helpful when extracting important information from publications or patents. There are few dataset annotations for specific domains, and even when they exist they are not easily transferable. This work contributes an annotated dataset to facilitate information retrieval and extraction in physics. The dataset spans physics patents as well as publications, covering both document types to enable future work that links them, such as tracking inventions from their first emergence in a publication to their adoption in a patent.
Original language: English
Title: Proceedings of the 3rd International Open Search Symposium #ossym2021
Subtitle: OSSYM 2021
Pages: 45-49
ISBN (electronic): 978-92-9083-633-9
DOIs
Publication status: Published - 2022
Event: 3rd International Open Search Symposium: OSSYM 2021 - Virtual, Austria
Duration: 11 Oct. 2021 to 13 Oct. 2021

Conference

Conference: 3rd International Open Search Symposium
Short title: OSSYM 2021
Country/Territory: Austria
Location: Virtual
Period: 11/10/21 to 13/10/21

Fields of Expertise

  • Information, Communication & Computing
