Content Similarity Check

KB-016

The evaluation of a submission’s content is the most critical and time-consuming part of the entire process Galileo undertakes.

Using a modified Levenshtein distance calculation, Galileo produces a percent similarity value between every pair of submissions’ content, which gives a quick way of determining whether submissions are far too similar.
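The exact modifications Galileo makes to the Levenshtein calculation are not described in this article. As a rough sketch only, assuming the percent similarity is the standard edit distance divided by the length of the longer string, the calculation could look like this in Python (`levenshtein` and `percent_similarity` are hypothetical names, not part of Galileo):

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (standard, unmodified version)."""
    # Keep only the previous row of the dynamic-programming table.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion from a
                current[j - 1] + 1,      # insertion into a
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def percent_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; assumes normalization by the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein(a, b) / longest


print(f"{percent_similarity('hello world', 'hello worlT'):.1%}")  # prints 90.9%
```

The single-row table keeps memory proportional to the shorter dimension, but the running time still grows with the product of the two lengths, so comparing long documents is expensive.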

Explanation

If one submission’s content contained just the string "hello world", and another submission contained just the string "hello worlT", the percent similarity would be 90.9%. In other words, of the 11 characters in the string, 1 change was necessary to make it match the other. This method scales out to entire documents and is able to compensate for word substitutions.
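Written out as a formula, and assuming the edit distance d is divided by the character count of the longer string, the example works out as:

```latex
\text{similarity}
  = 1 - \frac{d(\texttt{hello world},\ \texttt{hello worlT})}{\max(11,\ 11)}
  = 1 - \frac{1}{11}
  \approx 0.909 = 90.9\%
```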

In The Report

Submissions with similar content are listed in the report for all submissions and can also be seen in the individual submission section.

When you click on a raised flag, a dropdown with related information will appear. For content similarity, it lists the suspect submission along with a percentage indicating how similar its content is.
Content comparison in submission summary

Process Time Options

Like many of the other checks, this one has a few options that can be used to fine-tune its behaviour at process time. Their defaults can also be set in the Preferences section.

Process Options Type Default
Check.Content.Enabled Boolean True
Whether content similarity is checked against other submissions (true/false).
Check.Content.MaximumLength Integer 10000
The maximum character length of content to evaluate. Any content beyond this length is ignored during processing. This limit improves performance, at the cost of checking only up to this length of each submission.
Check.Content.Threshold Float 0.9
A 0-1 value (percentage) of similarity that a comparison with another submission must reach for this check to raise a flag.
Check.Content.Weight Float 0.7
A 0-1 value (percentage) of how heavily a flag raised by this check is weighted in the report. A weight of ≥ 0.85 is considered dangerous, ≥ 0.65 a warning, and anything below that a tertiary notice.
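The table does not spell out exactly how these options interact. The sketch below assumes that content is truncated to Check.Content.MaximumLength characters before comparison, that every pair of submissions is compared once using an unmodified Levenshtein similarity, and that a pair is flagged when its similarity reaches Check.Content.Threshold; the severity bands follow the ≥ 0.85 / ≥ 0.65 cut-offs described for Check.Content.Weight. The `ContentOptions` class, the `check_content` and `severity` functions, the submission names and the label strings are all hypothetical illustrations, not Galileo’s own code.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class ContentOptions:
    # Hypothetical mirror of the Check.Content.* process options and defaults.
    enabled: bool = True          # Check.Content.Enabled
    maximum_length: int = 10000   # Check.Content.MaximumLength
    threshold: float = 0.9        # Check.Content.Threshold
    weight: float = 0.7           # Check.Content.Weight


def levenshtein(a: str, b: str) -> int:
    # Standard single-row dynamic-programming edit distance.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    # Assumed normalization: divide by the length of the longer string.
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def severity(weight: float) -> str:
    # Bands taken from the Check.Content.Weight description.
    if weight >= 0.85:
        return "dangerous"
    if weight >= 0.65:
        return "warning"
    return "notice"


def check_content(submissions: dict[str, str], options: ContentOptions):
    """Return (name, suspect, similarity, severity) for each flagged pair."""
    if not options.enabled:
        return []
    # Anything past maximum_length is ignored, trading coverage for speed.
    truncated = {name: text[:options.maximum_length]
                 for name, text in submissions.items()}
    flags = []
    for first, second in combinations(sorted(truncated), 2):
        score = similarity(truncated[first], truncated[second])
        if score >= options.threshold:
            flags.append((first, second, score, severity(options.weight)))
    return flags


submissions = {"alice": "hello world", "bob": "hello worlT", "carol": "goodbye"}
for name, suspect, score, level in check_content(submissions, ContentOptions()):
    print(f"{name} vs {suspect}: {score:.1%} ({level})")
```

With the default options, only the alice/bob pair is flagged, at 90.9% and with a "warning" severity from the 0.7 weight. Because every pair is compared and each comparison grows with the product of the two content lengths, capping the length is what keeps processing time manageable in this sketch.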

Future Developments

We have been investigating how to improve this method further, possibly by creating a new check that breaks sentences apart by their language structure so that similarity can be checked faster. We will also be introducing a content removal option in 2018.2, allowing an assignment itself to be included in the process as safe text.

Created on 2018-03-30.
