Using generative AI in item development for the driving theory test: What does the future hold?

Generative artificial intelligence (AI) tools have produced both excitement and uncertainty across many industries, and the global assessments landscape is no exception.

Back in July 2021, the Association of Test Publishers (ATP) released a white paper titled ‘Artificial Intelligence and the Testing Industry: A Primer’, exploring a range of potential applications for AI in the credentialing field as well as ‘the appropriate responsible use of AI’1.

Fast forward to today, and AI dominates media headlines around the world. An increasing number of exam content creators are actively exploring how cutting-edge AI technologies might be used to develop test items for a range of assessments.

As the international driving test community ‘prepares drivers for smart mobility’ at the 56th CIECA Congress 2024, AI is understandably giving everyone a lot to think about, including:

  • What differences between AI and human-driven item quality might we see with item development in the future?
  • With the integration of AI in the item writing process, what ethical or legal implications need to be considered?
  • What is the potential cost of incorporating AI into existing processes? With the level of accuracy depending largely on the quality and quantity of data, what is the cost of human effort required to refine and improve upon the generated content?

Generative AI allows users to submit written text (prompts) specifying a task.

This could include writing a multiple-choice item, crafting a scenario about driving in difficult conditions, suggesting plausible but incorrect response options, or editing existing text according to style guidelines. While AI can do these things, the obvious question is “How well?”, closely followed by “How will this fit into our existing test development processes?”
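
To make the prompting step concrete, the sketch below shows how such a task might be submitted programmatically. It is illustrative only: the article does not name a specific platform, so this assumes the OpenAI Python client, and the model name and prompt wording are hypothetical placeholders rather than details of our studies.

  # Minimal sketch: submitting an item-writing prompt to a generative AI model.
  # Assumes the OpenAI Python client; model name and prompt are illustrative.
  from openai import OpenAI

  client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable

  prompt = (
      "Write one multiple-choice driving theory item about safe following "
      "distances in wet conditions. Provide a stem, one correct option, and "
      "three plausible but incorrect options (distractors)."
  )

  response = client.chat.completions.create(
      model="gpt-4o-mini",  # placeholder model name
      messages=[{"role": "user", "content": prompt}],
  )
  print(response.choices[0].message.content)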

Opinions across the global assessments landscape vary widely around the potential for automatic item generation, whether using AI or other methods such as template-based approaches. While simple requests for items may produce flawed and relatively ‘low-level’ content, it is possible to get well-constructed items across a range of cognitive levels using the right prompt.

Incorporating the same instructions provided to human item writers regarding format, structure, distractors, and other item elements is just as necessary for generative AI item development as it is for conventional item writing. Including experts in item development and evaluation throughout the process is key, as is an organized and scientific approach to understanding the results generated.

In 2023, we conducted several studies looking into the quality and characteristics of items generated by popular, free-to-use AI platforms.

We created a series of prompts based on our item writing guidelines, which also included comprehensive instructions and examples of cognitive level, sample item formats, and style guidelines.
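
As a purely illustrative sketch of how item writing guidelines can be folded into a prompt (our actual prompt wording is not reproduced here), the structure might carry the guidelines in a system message and the specific task in a user message:

  # Illustrative structure only: the guideline text and task below are hypothetical.
  # Guidelines (format, cognitive level, style) go in a system message; the
  # task itself goes in the user message.
  GUIDELINES = """You are an item writer for a driving theory test.
  - Use a four-option multiple-choice format: one key, three distractors.
  - Distractors must be plausible but unambiguously incorrect.
  - Target the 'application' cognitive level: a realistic driving scenario.
  - Follow plain-language style: short sentences, no negative stems."""

  TASK = "Write one item on priority rules at an unmarked intersection."

  messages = [
      {"role": "system", "content": GUIDELINES},
      {"role": "user", "content": TASK},
  ]
  # `messages` can then be sent to the model as in the earlier sketch.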


1. ‘Artificial Intelligence and the Testing Industry: A Primer’, a special publication from the ATP, authored by the International Privacy Subcommittee of the ATP Security Committee, July 6, 2021, p. 3.


About Pearson VUE

Pearson VUE has been a pioneer in the computer-based testing industry for decades, delivering more than 16 million certification and licensure exams annually in every industry from academia and admissions to IT and healthcare. We are the global leader in developing and delivering high-stakes exams via the world's most comprehensive network of nearly 20,000 highly secure test centers as well as online testing in over 180 countries. Our leadership in the assessment industry is a result of our collaborative partnerships with a broad range of clients, from leading technology firms to government and regulatory agencies. For more information, please visit PearsonVUE.com.

Media contact:

Greg Forbes, Global PR & Communications Manager
+44 (0) 7824 313448
greg.forbes@pearson.com
