Abstract
We present ATLAS-MVSNet, an end-to-end deep learning architecture that relies on local attention layers for depth map inference from multi-view images. In contrast to existing works, we introduce a novel neural network module, termed the hybrid attention block, which draws on the latest insights into attention in vision models. This allows us to reap the benefits of attention in both the carefully designed multi-stage feature extraction network and the cost volume regularization network. Our new approach shows a significant improvement over its counterpart based purely on convolutions. While many state-of-the-art methods require multiple high-end GPUs during training, we are able to train our network on a single consumer-grade GPU. ATLAS-MVSNet exhibits excellent performance, especially in terms of accuracy, on the DTU dataset. Furthermore, ATLAS-MVSNet ranks among the top published methods on the online Tanks and Temples benchmark.
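To make the idea of a hybrid attention block more concrete, the following PyTorch sketch pairs a convolutional branch with windowed (local) self-attention on 2D feature maps. The class names, window size, and layer ordering are illustrative assumptions only and do not reproduce the block published in the paper.

```python
# Illustrative sketch only: a minimal "hybrid" block combining a depthwise
# convolution with windowed (local) self-attention on 2D feature maps.
# Module names, window size, and layer ordering are assumptions, not the
# authors' published hybrid attention block.
import torch
import torch.nn as nn


class LocalAttention2d(nn.Module):
    """Multi-head self-attention restricted to non-overlapping windows."""

    def __init__(self, channels, heads=4, window=8):
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, x):                       # x: (B, C, H, W)
        b, c, h, w = x.shape
        ws = self.window
        assert h % ws == 0 and w % ws == 0, "pad inputs to a multiple of the window size"
        # partition into (B * num_windows, ws*ws, C) token sequences
        t = x.view(b, c, h // ws, ws, w // ws, ws)
        t = t.permute(0, 2, 4, 3, 5, 1).reshape(-1, ws * ws, c)
        t = self.norm(t)
        out, _ = self.attn(t, t, t)             # attention within each window only
        out = out.reshape(b, h // ws, w // ws, ws, ws, c)
        out = out.permute(0, 5, 1, 3, 2, 4).reshape(b, c, h, w)
        return x + out                          # residual connection


class HybridAttentionBlock(nn.Module):
    """Convolutional branch followed by local attention (illustrative)."""

    def __init__(self, channels, heads=4, window=8):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, groups=channels),  # depthwise
            nn.Conv2d(channels, channels, 1),                              # pointwise
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        self.local_attn = LocalAttention2d(channels, heads, window)

    def forward(self, x):
        return self.local_attn(x + self.conv(x))


if __name__ == "__main__":
    feats = torch.randn(1, 32, 64, 80)          # H and W divisible by the window size
    block = HybridAttentionBlock(32)
    print(block(feats).shape)                   # torch.Size([1, 32, 64, 80])
```

Restricting attention to local windows keeps the memory footprint linear in the number of pixels, which is consistent with the paper's claim of training on a single consumer-grade GPU.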
Original language | English |
---|---|
Title | 2022 26th International Conference on Pattern Recognition, ICPR 2022 |
Publisher | ACM/IEEE |
Pages | 3557-3563 |
Number of pages | 7 |
ISBN (electronic) | 9781665490627 |
ISBN (print) | 978-1-6654-9063-4 |
DOIs | |
Publication status | Published - 25 Aug 2022 |
Event | 26th International Conference on Pattern Recognition: ICPR 2022 - Montreal, Canada. Duration: 21 Aug 2022 → 25 Aug 2022 |
Conference
Conference | 26th International Conference on Pattern Recognition |
---|---|
Abbreviated title | ICPR 2022 |
Country/Territory | Canada |
City | Montreal |
Period | 21/08/22 → 25/08/22 |
ASJC Scopus subject areas
- Computer Vision and Pattern Recognition