“Natural-language processing” makes marking fast yet accurate
Cutting-edge technology used to translate documents and chat with customers may soon help the Medical Council of Canada (MCC) mark written exam answers faster and just as accurately as traditional human marking. But candidates can rest assured they are not just being marked “by a machine.” Physicians will continue to ensure that exams are being scored fairly, correctly and according to a strict protocol.
While marking multiple-choice answers is easy to automate, “write-in” or “open-ended” exam answers are traditionally scored in a lengthy, labour-intensive process, says Dr. André De Champlain, MCC’s Director of Psychometrics and Assessment Services.
For example, answers to the Medical Council of Canada Qualifying Examination (MCCQE) Part I Clinical Decision-Making component were traditionally rated by about 50 residents working for up to three days. “That’s time-consuming and can be expensive,” says Dr. De Champlain.
But the expense is not the only concern. With so many people involved, there may be variations in how they mark answers despite pre-determined answer keys. And, with the move to offer the MCCQE Part I several times a year in Canada and around the world, this traditional approach will become impractical.
The MCC has already taken steps to automate marking of written exam answers. It developed an application, called the “Aggregator,” that groups identical responses across hundreds of response sheets so that raters need to mark each distinct answer only once. “This has helped cut down the time for the task by about 35%,” says Dr. De Champlain. “But, ultimately, we want to reduce variability from the equation as much as possible.”
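The grouping idea behind the Aggregator can be sketched in a few lines. This is a minimal illustration, not the MCC’s implementation: it assumes responses arrive as plain strings keyed by a hypothetical candidate ID, and that two answers count as “identical” once surrounding whitespace, repeated spaces and letter case are ignored. That normalization rule, and the function names `normalize`, `aggregate` and `apply_marks`, are hypothetical choices; the article does not say how the Aggregator decides that two answers match.

```python
from collections import defaultdict


def normalize(answer: str) -> str:
    """Hypothetical matching rule: collapse whitespace and ignore case."""
    return " ".join(answer.split()).lower()


def aggregate(responses: dict[str, str]) -> dict[str, list[str]]:
    """Group candidate IDs by their (normalized) write-in answer.

    Each key of the result is one distinct answer a rater marks once;
    its value lists every candidate who gave that answer.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for candidate_id, answer in responses.items():
        groups[normalize(answer)].append(candidate_id)
    return dict(groups)


def apply_marks(groups: dict[str, list[str]],
                marks_by_answer: dict[str, int]) -> dict[str, int]:
    """Copy the single mark given to each distinct answer to every candidate who wrote it."""
    scores: dict[str, int] = {}
    for answer, candidate_ids in groups.items():
        for candidate_id in candidate_ids:
            scores[candidate_id] = marks_by_answer[answer]
    return scores


if __name__ == "__main__":
    responses = {"c1": "Pulmonary embolism", "c2": "pulmonary  embolism ", "c3": "Pneumothorax"}
    groups = aggregate(responses)
    print(groups)  # {'pulmonary embolism': ['c1', 'c2'], 'pneumothorax': ['c3']}
    print(apply_marks(groups, {"pulmonary embolism": 1, "pneumothorax": 0}))
```

In this toy example, raters mark two distinct answers instead of three responses. Real savings depend on how often candidates write the same answer, and a stricter or looser matching rule would change how many responses collapse together.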
The solution lies in machine learning and natural-language processing, computer technologies that recognize spoken and written language, says Dr. De Champlain. These technologies are now widely used for computer voice recognition, automated translation, and “chatbots” on websites, and they are also being studied in many countries for applications in testing. This work draws on the fields of psychometrics (psychological measurement through testing) and computational linguistics (how computers process language). “This research has been going on for some time,” emphasizes Dr. De Champlain, so the MCC is confident that it will identify a reliable approach. The MCC is also consulting an expert in computational linguistics from the Université de Montréal as part of its efforts to automate the scoring of open-ended items.
The MCC has now developed and tested the new technology, and Dr. De Champlain describes the results so far as “incredibly positive.”
“We’ve actually marked several sets of Clinical Decision-Making write-in items in parallel with human marking. Human and high-tech marking matched more than 90% of the time.”
Dr. André De Champlain,
Director of Psychometrics and Assessment Services, MCC
A “proof-of-concept” will be presented to the Central Examination Committee in 2019 for review and approval. Then, automated marking will probably run in parallel with human marking for about a year before the MCC formally switches over. Even after implementation, the performance of automated marking will continue to be monitored through random checks of test scores and through checks in cases where a few marks might make the difference between a pass and a fail, says Dr. De Champlain.
“This solution is evidence-based. We’re not rolling something out without looking at all the pros and cons,” Dr. De Champlain says. “I would say to candidates that we’re also optimizing the way we’re doing things so that we can deliver the exam more frequently and flexibly, at their convenience.”