Open source Optical Mark Recognition: a handy tool for cross-sectional research

September 30, 2008

For a medical student, short-term questionnaire-based cross-sectional studies are a shortcut to research papers. With a little more effort in study design, we can turn one into something like a case-control study, or at least publish it in a fairly reputable journal, which is a feat for someone at our level.

While designing one such cross-sectional study, a friend wondered if we could use a Scantron-like system on our questionnaire to automate data entry. This would save us time and reduce data-entry errors compared with typing everything in by hand. Each questionnaire would have bubbles to shade for the multiple-choice questions, just like the SAT or TOEFL. Once we got the questionnaires back from our respondents, we would shade the bubbles corresponding to the choices they made. Then we would run the sheets through a document-feeding scanner (not a flatbed, which would be far more laborious). Finally, a computer program would look at each scanned file, see which bubbles are shaded, translate that into the choices the respondent made, and write the result in a format friendly to SPSS or some other statistical package. Looking at shaded areas and recognizing them is called Optical Mark Recognition, or OMR.
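To make the idea of "seeing which bubbles are shaded" concrete, here is a minimal sketch in Python. It is only an illustration, not how any particular OMR program works: it assumes the scan is already a grayscale grid of pixel values (0 is black, 255 is white) and that the position of each bubble is known in advance. The names `dark_fraction`, `read_bubbles`, the bubble labels and both thresholds are made up for this example.

```python
# Illustrative sketch only: decide which bubbles on a scanned page are shaded.
# Assumption: the page is a list of rows of grayscale values, 0 = black, 255 = white.

def dark_fraction(page, box, dark_below=128):
    """Return the share of pixels inside box that are darker than dark_below."""
    top, left, bottom, right = box  # bottom and right are exclusive
    pixels = [page[r][c] for r in range(top, bottom) for c in range(left, right)]
    return sum(1 for p in pixels if p < dark_below) / len(pixels)


def read_bubbles(page, bubbles, min_fill=0.5):
    """Map each bubble name to True if enough of its area is dark."""
    return {name: dark_fraction(page, box) >= min_fill
            for name, box in bubbles.items()}


# Tiny synthetic 4x8 "scan": the left bubble is shaded, the right one is blank.
page = [[0, 0, 0, 0, 255, 255, 255, 255] for _ in range(4)]
bubbles = {"q1_a": (0, 0, 4, 4), "q1_b": (0, 4, 4, 8)}
print(read_bubbles(page, bubbles))  # {'q1_a': True, 'q1_b': False}
```

A real scan would also need the page to be straightened and aligned before the boxes line up with the bubbles, which this sketch leaves out.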

After a little research, I found no open-source or freeware/shareware OMR solutions except one developed by Aaditeshwar Seth at Udai Waterloo Chapter. It is called the Udai OMR tool; it was initially developed for an Indian NGO and later made available for all NGO and non-profit use. The tool is written in Java, and although it is not the most user-friendly piece of software around, it is the only free OMR software I could find, and it actually works. The workflow goes like this. You make a questionnaire with the spots to shade in the right places, using the templates provided on their website. You first scan a questionnaire with all bubbles shaded, so that the program learns where to look. Then you assign variables and values to all the sites it recognized, and finally you start reading your forms. It reads one form at a time and writes the data to a text file, so for each page/form you have to run the program again, and each run makes a new text file.
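The "assign variables and values to sites" step can be pictured with a small Python sketch. This is not the Udai tool's own code or file format; the site IDs, the variable names and the `decode_form` function are all hypothetical, chosen only to show how a set of shaded sites becomes one row of coded data.

```python
# Illustrative sketch only: turn shaded sites into coded variables.
# Assumption: each site ID maps to one (variable, value) pair.

def decode_form(shaded, site_map):
    """Return {variable: value} for one form.

    A variable with no shaded site gets None (missing answer);
    a variable with more than one shaded site gets "MULTI" so it can be checked by hand.
    """
    found = {}
    for site, (variable, value) in site_map.items():
        found.setdefault(variable, [])
        if site in shaded:
            found[variable].append(value)

    record = {}
    for variable, values in found.items():
        if not values:
            record[variable] = None
        elif len(values) == 1:
            record[variable] = values[0]
        else:
            record[variable] = "MULTI"
    return record


# Hypothetical two-question form: sex (1/2) and smoker (0/1).
site_map = {
    "s1": ("sex", 1), "s2": ("sex", 2),
    "s3": ("smoker", 0), "s4": ("smoker", 1),
}
print(decode_form({"s2", "s4"}, site_map))  # {'sex': 2, 'smoker': 1}
```

Flagging blank and double-shaded questions instead of guessing keeps the ambiguous forms visible, so they can be checked against the paper copy.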

It took about 3-4 days of continuous fiddling with the tool, and then about a day to automate it so that multiple pages are read in sequence and the data is stored in a quick database. And there it is: the next study I am part of is using OMR :)
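For anyone wanting to do something similar, the batch step might look roughly like the Python sketch below. It is not the script I actually used. It assumes, purely for illustration, that each per-form text file holds one `variable=value` pair per line (the real output format of the tool may differ), and it uses SQLite as the "quick database". The names `parse_form`, `load_forms` and the `responses` table are made up.

```python
# Illustrative sketch only: collect many per-form text files into one database.
# Assumption: each file contains lines of the form "variable=value".
import sqlite3
import tempfile
from pathlib import Path


def parse_form(path):
    """Parse one per-form text file into a {variable: value} dict."""
    record = {}
    for line in Path(path).read_text().splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            record[key.strip()] = value.strip()
    return record


def load_forms(folder, connection):
    """Store every *.txt form in folder as (form, variable, value) rows."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS responses (form TEXT, variable TEXT, value TEXT)"
    )
    for path in sorted(Path(folder).glob("*.txt")):
        rows = [(path.stem, k, v) for k, v in parse_form(path).items()]
        connection.executemany("INSERT INTO responses VALUES (?, ?, ?)", rows)
    connection.commit()


# Demo with two made-up forms in a temporary folder and an in-memory database.
with tempfile.TemporaryDirectory() as folder:
    Path(folder, "form001.txt").write_text("sex=2\nsmoker=1\n")
    Path(folder, "form002.txt").write_text("sex=1\nsmoker=0\n")
    db = sqlite3.connect(":memory:")
    load_forms(folder, db)
    print(db.execute("SELECT COUNT(*) FROM responses").fetchone()[0])  # 4
```

Storing one row per form and variable keeps the table layout independent of the questionnaire; it can be pivoted to one row per respondent when exporting for SPSS.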