Finished the cheater checking on both assignments now... finally. It took forever.
8.5 hours on assignment 3 (using Damocles, an online comparison program for catching cheating). I had to look at each submission and ignore anything that was highlighted only because people had cut and pasted from the assignment specification. 3 hours on assignment 4 (without chasing up the original code and doing a by-hand comparison - Jon's job, thank goodness). Strangely enough, the same names came up on both of them, and many of those names I had caught before in previous subjects/pracs/assignments.
So... how do I catch these 'cheaters'?
Kinda easy with Damocles - it analyses runs of matching words (typically about 8 words in a row) between two submissions, and then highlights the other words in that paragraph/page that match. When you get a piece of work that has loads of one colour, it typically means one student has copied from another. You can select the section to get a side by side (well, top/bottom) comparison of the matching section. Then I eyeball it to see just how much it matches. Non-native English speakers are easier to catch as they often don't find and fix each other's spelling or grammar mistakes.
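A rough sketch of the run-matching idea in Python. This is not Damocles' actual algorithm, just an illustration assuming runs of 8 words; the function names (`word_runs`, `shared_runs`, `without_spec`) are made up for the example. The last function covers the step of ignoring matches that only exist because both students pasted from the assignment specification.

```python
import re

def word_runs(text, n=8):
    """Return the set of all n-word runs in text (lowercased, punctuation dropped)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def shared_runs(a, b, n=8):
    """Runs of n words that appear in both submissions."""
    return word_runs(a, n) & word_runs(b, n)

def without_spec(runs, spec, n=8):
    """Drop runs that also appear in the assignment specification."""
    return runs - word_runs(spec, n)
```

The more shared runs left over after removing the specification text, the more closely the pair deserves an eyeball.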
Also kinda easy, now that we use MOSS (provided by Berkeley). MOSS produces web pages comparing pairs of programming submissions, analysed for semantic similarity. It ignores variable names (the most common form of covering up cheating is changing variable names), and produces coloured sections similar to Damocles, with a side by side comparison of the whole of the two submissions. It also gives a 'percentage' match, which at the moment (along with the colouring) is completely wrong. Their recent version upgrade did not improve things.
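To show why renaming variables doesn't hide anything, here is a minimal Python sketch that blanks out identifiers before comparing. It is only an illustration of the idea, not how MOSS itself works internally, and it assumes the submissions are Python source; `normalise` is a made-up name.

```python
import io
import keyword
import tokenize

# Token types that carry no program structure for this comparison.
SKIP = (tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
        tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER)

def normalise(source):
    """Return the tokens of Python source with every identifier replaced by 'ID'."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in SKIP:
            continue
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            out.append("ID")          # variable, function, or other name
        else:
            out.append(tok.string)    # keywords, operators, literals
    return out
```

Two submissions that differ only in their variable names (and comments) give the same token list, so the renaming buys the cheater nothing.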
Anyhow... I look at the higher matches (one in assignment 4 had an 81%-91% match!), and read through the code, looking for implementations of difficult concepts. These types of cheater checkers don't work all that well when there is a well defined and common way of implementing something (e.g. when students are given a skeleton program, or when the solution is a one line function), as there will be a high correspondence.
Once again, it's very easy to pick the non-native English speaking cheaters, as they have strange variable names in common, spelling and grammar errors in the comments, or identical comments. The other way of noticing cheating is to look at what lines are commented out, and the actual layout of the code (indenting, blank lines, the format used for placing braces or spaces between components of an expression). It's much easier and faster to detect cheating in code, as opposed to written work.
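The layout check can be roughed out in code too. The sketch below is just one possible way to do it, under the assumption that indentation, blank lines, brace placement and comment lines are enough of a fingerprint; `layout_signature` and `layout_similarity` are hypothetical names, and the comparison uses `difflib.SequenceMatcher` from the Python standard library.

```python
import difflib

def layout_signature(source):
    """Describe each line by its layout only, ignoring what the code says."""
    sig = []
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            sig.append("blank")
            continue
        indent = line[:len(line) - len(line.lstrip())]   # exact tabs/spaces used
        kind = "comment" if stripped.startswith(("//", "#")) else "code"
        if stripped == "{":
            brace = "{"        # brace on its own line
        elif stripped.endswith("{"):
            brace = "..{"      # brace at the end of a statement
        else:
            brace = ""
        sig.append((indent, kind, brace))
    return sig

def layout_similarity(a, b):
    """Ratio between 0 and 1 of how closely the two layouts line up."""
    matcher = difflib.SequenceMatcher(None, layout_signature(a), layout_signature(b))
    return matcher.ratio()
```

A high ratio on its own proves nothing (a shared skeleton program will do it), but together with identical comments or commented-out lines it is a strong hint.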
Why am I talking about this? Well, my dreams last night were odd. I was trying to explain to a student who I had caught cheating, exactly why it was obvious they cheated. I was pointing out the matches using the techniques above. The student was impressed and did admit to it. I also tried to explain why cheating was bad (it will disadvantage them later), but I don't think that was understood. For some reason, this was happening in my primary school, in a 3rd/4th grade classroom where it was horribly noisy (and being taught by my grade 6 teacher - John F. Kennedy), because I no longer had an office or computer in CSSE..... Talk about an amalgamation of my life....