Plagiarism detection in SQL student assignments

Nikolai Scerbakov, Alexander Schukin, Oleg Sabinin

Research output: Chapter in Book/Report/Conference proceeding, Conference paper, peer-reviewed

Abstract

In this paper we present an original method for detecting similarity between SQL fragments. The method identifies so-called "SQL lexemes", the persistent elements of an SQL statement, and "SQL variables", the elements of a statement that are easily modified. Any SQL statement can thus be replaced with a so-called token: a sequence of SQL lexemes and SQL variables. The distance between two tokens can be calculated with a well-known algorithm such as the Levenshtein metric. A small Levenshtein distance between tokens indicates SQL statements that were built by modifying others.
We also present the first practical results of applying the algorithm and discuss further development of the method.
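The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the keyword set, the regular expression, the `VAR` placeholder, and the function names `sql_to_token` and `levenshtein` are all assumptions made for illustration. Keywords and punctuation are treated as SQL lexemes, identifiers and literals as SQL variables, and the resulting sequences are compared with a standard Levenshtein distance.

```python
import re

# Assumption: this keyword set stands in for the paper's "SQL lexemes";
# the actual set used by the authors is not given in the abstract.
SQL_KEYWORDS = {
    "SELECT", "FROM", "WHERE", "AND", "OR", "NOT", "JOIN", "ON", "AS",
    "GROUP", "ORDER", "BY", "HAVING", "IN", "INSERT", "INTO", "VALUES",
    "UPDATE", "SET", "DELETE", "DISTINCT",
}

# Lexical classes, tried in order: string literal, number, word,
# two-character operator, any other single non-space character.
PIECE_RE = re.compile(
    r"'(?:[^']|'')*'|\d+(?:\.\d+)?|[A-Za-z_][A-Za-z_0-9]*|<=|>=|<>|!=|[^\sA-Za-z_0-9]"
)


def sql_to_token(sql):
    """Reduce an SQL statement to a sequence of lexemes and VAR placeholders."""
    token = []
    for piece in PIECE_RE.findall(sql):
        upper = piece.upper()
        if upper in SQL_KEYWORDS:
            token.append(upper)      # persistent element: keep as-is
        elif piece[0].isalnum() or piece[0] in "_'":
            token.append("VAR")      # identifier or literal: easily modified
        else:
            token.append(piece)      # operator or punctuation
    return token


def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


original = "SELECT name, age FROM students WHERE age > 18"
renamed = "select n, a from pupils where a > 21"
different = "SELECT name FROM students ORDER BY name"

print(levenshtein(sql_to_token(original), sql_to_token(renamed)))
print(levenshtein(sql_to_token(original), sql_to_token(different)))
```

Under these assumptions, a statement whose identifiers and constants were merely renamed reduces to the same sequence as the original, so its distance is zero, while a structurally different statement yields a positive distance. What threshold should count as "small" is left open here, as the abstract does not specify one.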
Original language: English
Title of host publication: Proceedings of 20th International Conference on Interactive Collaborative Learning
Pages: 321-326
Number of pages: 6
Publication status: Published - 2017

ASJC Scopus subject areas

  • Computer Science (all)
