Using a modified Levenshtein distance calculation, Galileo produces a percent similarity value for the content of every pair of submissions, which gives a quick way to determine whether submissions are far too similar.
For example, if one submission's content is just the string "hello world" and another's is just the string "hello worlT", the percent similarity is 90.9%. Both strings are 11 characters long, and only 1 change is needed to make one match the other, so 10 of 11 characters agree (10/11 ≈ 90.9%). The same method scales to entire documents and can compensate for word substitutions.
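The worked example above can be reproduced with a textbook Levenshtein distance. The sketch below is not Galileo's modified algorithm. It assumes the percentage is computed as one minus the edit distance divided by the length of the longer input, which matches the 90.9% figure. The function names `levenshtein` and `percent_similarity` are hypothetical.

```python
from typing import Sequence


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Minimum number of single-element insertions, deletions or
    substitutions needed to turn sequence a into sequence b."""
    # Only the previous row of the dynamic-programming table is kept.
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def percent_similarity(a: Sequence, b: Sequence) -> float:
    """Similarity as a percentage, normalised by the longer input.
    The normalisation is an assumption chosen to reproduce the example."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0  # two empty inputs are treated as identical
    return (1 - levenshtein(a, b) / longest) * 100


print(round(percent_similarity("hello world", "hello worlT"), 1))  # 90.9

# The same functions accept lists of words, comparing whole tokens.
first = "the cat sat on the mat".split()
second = "the dog sat on the mat".split()
print(round(percent_similarity(first, second), 1))  # 83.3
```

Because the functions accept any sequence, comparing word lists instead of characters is one plausible way such a measure could extend to whole documents; this is an illustration, not a description of how Galileo handles word substitutions.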
In the Report
When you click a raised flag, a dropdown with related details appears. For content similarity, it lists the suspect submission along with a percentage indicating how similar its content is.
Process Time Options
Like many of the other checks, this check has a few options that fine-tune its behaviour at process time. These options can also be set as defaults in the Preferences section.
We are investigating ways to improve this method further, possibly by creating a new check that analyses the structure of sentences and breaks them apart so that similarity can be checked faster. We will also introduce a content removal option in 2018.2, which will allow the assignment itself to be included in the process as safe text.