TY - JOUR
T1 - Semantic stability in Wikipedia
AU - Stanisavljevic, Darko
AU - Hasani-Mavriqi, Ilire
AU - Lex, Elisabeth
AU - Strohmaier, Markus
AU - Helic, Denis
PY - 2017/1/1
Y1 - 2017/1/1
N2 - In this paper, we assess the semantic stability of Wikipedia by investigating the dynamics of Wikipedia articles’ revisions over time. In a semantically stable system, articles are infrequently edited, whereas in unstable systems, article content changes more frequently. In other words, in a stable system, the Wikipedia community has reached consensus on the majority of articles. In our work, we measure semantic stability using the Rank Biased Overlap method. To that end, we preprocess Wikipedia dumps to obtain a sequence of plain-text article revisions, where each revision is represented as a TF-IDF vector. To measure the similarity between consecutive article revisions, we calculate Rank Biased Overlap on the term vectors of successive revisions. We evaluate our approach on 10 Wikipedia language editions, including the five largest language editions as well as five randomly selected small language editions. Our experimental results reveal that even in policy-driven collaboration networks such as Wikipedia, semantic stability can be achieved. However, there are differences in the velocity of the semantic stability process between small and large Wikipedia editions. Small editions exhibit faster and higher semantic stability than large ones. In particular, in large Wikipedia editions, a higher number of successive revisions is needed to reach a certain semantic stability level, whereas in small Wikipedia editions, the number of successive revisions needed is much lower for the same level of semantic stability.
AB - In this paper, we assess the semantic stability of Wikipedia by investigating the dynamics of Wikipedia articles’ revisions over time. In a semantically stable system, articles are infrequently edited, whereas in unstable systems, article content changes more frequently. In other words, in a stable system, the Wikipedia community has reached consensus on the majority of articles. In our work, we measure semantic stability using the Rank Biased Overlap method. To that end, we preprocess Wikipedia dumps to obtain a sequence of plain-text article revisions, where each revision is represented as a TF-IDF vector. To measure the similarity between consecutive article revisions, we calculate Rank Biased Overlap on the term vectors of successive revisions. We evaluate our approach on 10 Wikipedia language editions, including the five largest language editions as well as five randomly selected small language editions. Our experimental results reveal that even in policy-driven collaboration networks such as Wikipedia, semantic stability can be achieved. However, there are differences in the velocity of the semantic stability process between small and large Wikipedia editions. Small editions exhibit faster and higher semantic stability than large ones. In particular, in large Wikipedia editions, a higher number of successive revisions is needed to reach a certain semantic stability level, whereas in small Wikipedia editions, the number of successive revisions needed is much lower for the same level of semantic stability.
UR - http://www.scopus.com/inward/record.url?scp=85007337958&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-50901-3_31
DO - 10.1007/978-3-319-50901-3_31
M3 - Article
AN - SCOPUS:85007337958
SN - 1860-949X
VL - 693
SP - 385
EP - 395
JO - Studies in Computational Intelligence
JF - Studies in Computational Intelligence
ER -