The Beauty of Plagiarism Scanners

For my current part-time job I'm investigating plagiarism scanners. Students submit their papers and reports, and before these are handed to the teacher, the software checks what the student turned in against previously submitted papers and other sources such as the internet. To get started I thought I'd just copy a piece of text from my own website, put it in a Word document and submit it to see if it actually worked. I decided to copy a piece of my podcasting tutorial. I submitted the document and waited for a couple of minutes. As expected, it found the document to be 100% plagiarized. Good!

However, it found two sources of plagiarism: one at my own website, one at another. Have a look at that page and scroll down a bit to the "How to prepare a great podcast!" section. Looks familiar? Kinda. Well... more than kinda. It's kinda completely ripped off, including all the images, and without any credit. Granted, the guy made a couple of improvements, but still.

Anyway, I e-mailed the "author" a couple of days ago, and there's been no response so far. Still, it's a funny way to find out. These pieces of software really are plagiarism detectors. It would be interesting if somebody developed an open-source or free version of such software. At its core it's not very complicated: you extract the text from a document and do a phrase search for each sentence on Google or another search engine. There's a bit more to it than that, but this is the main idea.
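
To make that main idea a bit more concrete, here is a rough sketch in Python. It is not how the scanner I tested works (I don't know its internals), just the "one phrase search per sentence" approach described above. The function names, the naive sentence splitter and the `min_words` cutoff are my own assumptions, and `search` is a hypothetical placeholder: you would have to plug in whatever search engine API you actually have access to.

```python
import re
from typing import Callable, Dict, Iterable, List


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p.strip() for p in parts if p.strip()]


def phrase_queries(text: str, min_words: int = 6) -> List[str]:
    """Turn each sufficiently long sentence into a quoted exact-phrase query.

    Short sentences are skipped (an assumption of this sketch) because they
    would match far too many unrelated pages.
    """
    queries = []
    for sentence in split_sentences(text):
        if len(sentence.split()) >= min_words:
            cleaned = sentence.replace('"', "")  # quotes would break the phrase
            queries.append(f'"{cleaned}"')
    return queries


def find_matches(
    text: str, search: Callable[[str], Iterable[str]]
) -> Dict[str, int]:
    """Count, per URL, how many sentences of the text were found there.

    `search` is a hypothetical callable: it takes a query string and returns
    the URLs of pages containing that exact phrase.
    """
    hits: Dict[str, int] = {}
    for query in phrase_queries(text):
        for url in set(search(query)):
            hits[url] = hits.get(url, 0) + 1
    return hits
```

A page that shows up for most of the sentences is a likely copy. Extracting the text from a Word document and dealing with the rate limits of a real search engine are the "bit more to it" parts that this sketch leaves out.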

It would be great to have a tool that you could just run on your website and that would check if anybody has plagiarized any of your stuff.