Understanding Privacy Awareness in Android App Descriptions Using Deep Learning

Johannes Feichtner; Stefan Gruber

doi:10.1145/3374664.3375730

Publication: Chapter in book/report/conference proceedings › Conference paper › Peer-reviewed

Abstract

Permissions are a key factor in Android to protect users' privacy. As it is often not obvious why applications require certain permissions, developer-provided descriptions in Google Play and third-party markets should explain to users how sensitive data is processed. Reliably recognizing whether app descriptions cover permission usage is challenging due to the lack of enforced quality standards and a variety of ways developers can express privacy-related facts.

We introduce a machine learning-based approach to identify critical discrepancies between developer-described app behavior and permission usage. By combining state-of-the-art techniques in natural language processing (NLP) and deep learning, we design a convolutional neural network (CNN) for text classification that captures the relevance of words and phrases in app descriptions in relation to the usage of dangerous permissions. Our system predicts the likelihood that an app requires certain permissions and can warn about descriptions in which the requested access to sensitive user data and system features is textually not represented.
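The warning step described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the function name `find_undescribed_permissions`, the 0.5 threshold, and the example likelihood values are hypothetical, and in the paper the likelihoods would come from the trained CNN rather than a hand-written dictionary.

```python
from typing import Dict, List, Set


def find_undescribed_permissions(
    predicted: Dict[str, float],
    requested: Set[str],
    threshold: float = 0.5,
) -> List[str]:
    """Return requested permission groups that the description does not support.

    predicted: likelihood per permission group, as a text classifier might
               output for one app description (hypothetical values below).
    requested: permission groups the app actually requests.
    threshold: assumed cut-off below which a group counts as "not described".
    """
    flagged = [
        group
        for group in requested
        # A group missing from the predictions is treated as likelihood 0.0.
        if predicted.get(group, 0.0) < threshold
    ]
    return sorted(flagged)


# Hypothetical classifier output for the description of a flashlight app.
likelihoods = {"CAMERA": 0.91, "LOCATION": 0.07, "CONTACTS": 0.03}
requested_groups = {"CAMERA", "LOCATION"}

warnings = find_undescribed_permissions(likelihoods, requested_groups)
print(warnings)  # ['LOCATION']
```

Here the description supports camera access but not location access, so only the location group is flagged. The threshold would in practice need tuning per permission group.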

We evaluate our solution on 77,000 real-world app descriptions and find that we can identify individual groups of dangerous permissions with a precision between 71% and 93%. To highlight the impact of individual words and phrases, we employ a model explanation algorithm and demonstrate that our technique can successfully bridge the semantic gap between described app functionality and its access to security- and privacy-sensitive resources.
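For readers unfamiliar with the metric, per-group precision is the share of predicted permission groups that the app really uses, TP / (TP + FP). The sketch below computes it on toy data; the function name `precision_per_group` and all example values are assumptions for illustration and do not reproduce the paper's evaluation.

```python
from typing import Dict, List, Set


def precision_per_group(
    predicted: List[Set[str]],
    actual: List[Set[str]],
) -> Dict[str, float]:
    """Compute precision = TP / (TP + FP) for every predicted permission group."""
    true_pos: Dict[str, int] = {}
    false_pos: Dict[str, int] = {}
    for pred, act in zip(predicted, actual):
        for group in pred:
            if group in act:
                # Predicted and actually requested: true positive.
                true_pos[group] = true_pos.get(group, 0) + 1
            else:
                # Predicted but not requested: false positive.
                false_pos[group] = false_pos.get(group, 0) + 1
    groups = set(true_pos) | set(false_pos)
    return {
        g: true_pos.get(g, 0) / (true_pos.get(g, 0) + false_pos.get(g, 0))
        for g in sorted(groups)
    }


# Toy data: one set of permission groups per app (hypothetical).
predicted_sets = [{"CAMERA"}, {"CAMERA", "LOCATION"}, {"LOCATION"}, {"CAMERA"}]
actual_sets = [{"CAMERA"}, {"CAMERA"}, {"LOCATION"}, set()]

scores = precision_per_group(predicted_sets, actual_sets)
for group, value in scores.items():
    print(f"{group}: {value:.2f}")
```

In this toy data CAMERA is predicted three times and correct twice, and LOCATION is predicted twice and correct once.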

Original language	English
Title of host publication	CODASPY 2020 - Proceedings of the 10th ACM Conference on Data and Application Security and Privacy
Place of publication	New York
Publisher	Association for Computing Machinery
Pages	203-214
Number of pages	12
ISBN (electronic)	978-1-4503-7107-0
DOIs	https://doi.org/10.1145/3374664.3375730
Publication status	Published - 16 March 2020
Event	10th ACM Conference on Data and Application Security and Privacy - New Orleans, Virtual, United States. Duration: 3 Aug 2020 → 4 Aug 2020. Conference number: 20. http://www.codaspy.org/2020/

Publication series

Name	CODASPY 2020 - Proceedings of the 10th ACM Conference on Data and Application Security and Privacy

Conference

Conference	10th ACM Conference on Data and Application Security and Privacy
Short title	CODASPY
Country/Territory	United States
City	New Orleans, Virtual
Period	3/08/20 → 4/08/20
Internet address	http://www.codaspy.org/2020/

ASJC Scopus subject areas

Software
Computer Science Applications

Access to document

10.1145/3374664.3375730

main: Final published version, 642 KB

Other files and links

http://www.scopus.com/inward/record.url?scp=85083394819&partnerID=8YFLogxK

A-SIT - Zentrum für sichere Informationstechnologie Austria
Stranacher, K., Dominikus, S., Leitold, H., Marsalek, A., Teufl, P., Bauer, W., Aigner, M. J., Rössler, T., Neuherz, E., Dietrich, K., Zefferer, T., Mangard, S., Payer, U., Orthacker, C., Lipp, P., Reiter, A., Knall, T., Bratko, H., Bonato, M., Suzic, B., Zwattendorfer, B., Kreuzhuber, S., Oswald, M. E., Tauber, A., Posch, R., Bratko, D., Feichtner, J., Ivkovic, M., Reimair, F., Wolkerstorfer, J. & Scheibelhofer, K.
21/05/99 → 6/08/20
Project: Research area

Cite this

Feichtner, J., & Gruber, S. (2020). Understanding Privacy Awareness in Android App Descriptions Using Deep Learning. In CODASPY 2020 - Proceedings of the 10th ACM Conference on Data and Application Security and Privacy (pp. 203-214). (CODASPY 2020 - Proceedings of the 10th ACM Conference on Data and Application Security and Privacy). Association for Computing Machinery. https://doi.org/10.1145/3374664.3375730

Understanding Privacy Awareness in Android App Descriptions Using Deep Learning. / Feichtner, Johannes; Gruber, Stefan.
CODASPY 2020 - Proceedings of the 10th ACM Conference on Data and Application Security and Privacy. New York: Association for Computing Machinery, 2020. pp. 203-214 (CODASPY 2020 - Proceedings of the 10th ACM Conference on Data and Application Security and Privacy).

Feichtner, J & Gruber, S 2020, Understanding Privacy Awareness in Android App Descriptions Using Deep Learning. In CODASPY 2020 - Proceedings of the 10th ACM Conference on Data and Application Security and Privacy. CODASPY 2020 - Proceedings of the 10th ACM Conference on Data and Application Security and Privacy, Association for Computing Machinery, New York, pp. 203-214, 10th ACM Conference on Data and Application Security and Privacy, New Orleans, Virtual, Louisiana, United States, 3/08/20. https://doi.org/10.1145/3374664.3375730

Feichtner J, Gruber S. Understanding Privacy Awareness in Android App Descriptions Using Deep Learning. In CODASPY 2020 - Proceedings of the 10th ACM Conference on Data and Application Security and Privacy. New York: Association for Computing Machinery. 2020. pp. 203-214. (CODASPY 2020 - Proceedings of the 10th ACM Conference on Data and Application Security and Privacy). doi: 10.1145/3374664.3375730

Feichtner, Johannes ; Gruber, Stefan. / Understanding Privacy Awareness in Android App Descriptions Using Deep Learning. CODASPY 2020 - Proceedings of the 10th ACM Conference on Data and Application Security and Privacy. New York : Association for Computing Machinery, 2020. pp. 203-214 (CODASPY 2020 - Proceedings of the 10th ACM Conference on Data and Application Security and Privacy).

@inproceedings{7cdceee5fa004cbe9f7b1e9b173c0df1,

title = "Understanding Privacy Awareness in Android App Descriptions Using Deep Learning",

abstract = "Permissions are a key factor in Android to protect users' privacy. As it is often not obvious why applications require certain permissions, developer-provided descriptions in Google Play and third-party markets should explain to users how sensitive data is processed. Reliably recognizing whether app descriptions cover permission usage is challenging due to the lack of enforced quality standards and a variety of ways developers can express privacy-related facts. We introduce a machine learning-based approach to identify critical discrepancies between developer-described app behavior and permission usage. By combining state-of-the-art techniques in natural language processing (NLP) and deep learning, we design a convolutional neural network (CNN) for text classification that captures the relevance of words and phrases in app descriptions in relation to the usage of dangerous permissions. Our system predicts the likelihood that an app requires certain permissions and can warn about descriptions in which the requested access to sensitive user data and system features is textually not represented. We evaluate our solution on 77,000 real-world app descriptions and find that we can identify individual groups of dangerous permissions with a precision between 71% and 93%. To highlight the impact of individual words and phrases, we employ a model explanation algorithm and demonstrate that our technique can successfully bridge the semantic gap between described app functionality and its access to security- and privacy-sensitive resources.",

keywords = "Android, Machine Learning, Description, Permission, NLP, CNN",

author = "Johannes Feichtner and Stefan Gruber",

year = "2020",

month = mar,

day = "16",

doi = "10.1145/3374664.3375730",

language = "English",

series = "CODASPY 2020 - Proceedings of the 10th ACM Conference on Data and Application Security and Privacy",

publisher = "Association for Computing Machinery",

pages = "203--214",

booktitle = "CODASPY 2020 - Proceedings of the 10th ACM Conference on Data and Application Security and Privacy",

address = "United States",

note = "10th ACM Conference on Data and Application Security and Privacy, CODASPY ; Conference date: 03-08-2020 Through 04-08-2020",

url = "http://www.codaspy.org/2020/",

}

TY - GEN

T1 - Understanding Privacy Awareness in Android App Descriptions Using Deep Learning

AU - Feichtner, Johannes

AU - Gruber, Stefan

N1 - Conference code: 20

PY - 2020/3/16

Y1 - 2020/3/16

N2 - Permissions are a key factor in Android to protect users' privacy. As it is often not obvious why applications require certain permissions, developer-provided descriptions in Google Play and third-party markets should explain to users how sensitive data is processed. Reliably recognizing whether app descriptions cover permission usage is challenging due to the lack of enforced quality standards and a variety of ways developers can express privacy-related facts. We introduce a machine learning-based approach to identify critical discrepancies between developer-described app behavior and permission usage. By combining state-of-the-art techniques in natural language processing (NLP) and deep learning, we design a convolutional neural network (CNN) for text classification that captures the relevance of words and phrases in app descriptions in relation to the usage of dangerous permissions. Our system predicts the likelihood that an app requires certain permissions and can warn about descriptions in which the requested access to sensitive user data and system features is textually not represented. We evaluate our solution on 77,000 real-world app descriptions and find that we can identify individual groups of dangerous permissions with a precision between 71% and 93%. To highlight the impact of individual words and phrases, we employ a model explanation algorithm and demonstrate that our technique can successfully bridge the semantic gap between described app functionality and its access to security- and privacy-sensitive resources.

AB - Permissions are a key factor in Android to protect users' privacy. As it is often not obvious why applications require certain permissions, developer-provided descriptions in Google Play and third-party markets should explain to users how sensitive data is processed. Reliably recognizing whether app descriptions cover permission usage is challenging due to the lack of enforced quality standards and a variety of ways developers can express privacy-related facts. We introduce a machine learning-based approach to identify critical discrepancies between developer-described app behavior and permission usage. By combining state-of-the-art techniques in natural language processing (NLP) and deep learning, we design a convolutional neural network (CNN) for text classification that captures the relevance of words and phrases in app descriptions in relation to the usage of dangerous permissions. Our system predicts the likelihood that an app requires certain permissions and can warn about descriptions in which the requested access to sensitive user data and system features is textually not represented. We evaluate our solution on 77,000 real-world app descriptions and find that we can identify individual groups of dangerous permissions with a precision between 71% and 93%. To highlight the impact of individual words and phrases, we employ a model explanation algorithm and demonstrate that our technique can successfully bridge the semantic gap between described app functionality and its access to security- and privacy-sensitive resources.

KW - Android

KW - Machine Learning

KW - Description

KW - Permission

KW - NLP

KW - CNN

UR - http://www.scopus.com/inward/record.url?scp=85083394819&partnerID=8YFLogxK

U2 - 10.1145/3374664.3375730

DO - 10.1145/3374664.3375730

M3 - Conference paper

T3 - CODASPY 2020 - Proceedings of the 10th ACM Conference on Data and Application Security and Privacy

SP - 203

EP - 214

BT - CODASPY 2020 - Proceedings of the 10th ACM Conference on Data and Application Security and Privacy

PB - Association for Computing Machinery

CY - New York

T2 - 10th ACM Conference on Data and Application Security and Privacy

Y2 - 3 August 2020 through 4 August 2020

ER -
