The BIG Problem with Using A.I. for Assessment
I distinctly remember the sound of our school’s Scantron machine grading the filled-out midterm bubble sheets.
“Rat…tat…tat.”
Then another one, and this time you could hear the score, as if the machine was yelling.
“Tat, tat, tatatatatat!”
As a teacher, that Scantron machine sure saved me time grading the big assessments like midterms and finals.
However, when I started my career as a middle school ELA teacher, I never used them for any other assessments.
But when I moved to the high school, I began using the Scantron for some in-class assessments in my first few years, like most of my colleagues.
Often, there was a line at the machine, especially when one of the two in the faculty room was broken (a frequent occurrence).
This was common practice, even though it had been 30 years since Scantron was founded. A recent article in The Atlantic breaks down its impact on the education system:
By the time Scantron was founded in 1972, machine grading had already made multiple-choice tests a key part of American education, and an enormous push for statewide tests only increased the demand for scoring technology. The company and its business model helped make those tests even more pervasive: Scantron provided scoring machines for cheap, and turned a profit by selling answer sheets to a captive market of schools and school districts. Teachers had already been borrowing the A/B/C/D format from standardized tests for years, but Scantron provided smaller, affordable scanners that made doing so even easier.
As of 2019, Scantron served 96 of what it referred to as the “top 100 school districts in the United States” and printed some 800 million sheets globally each year; their scanners can process 15,000 sheets an hour. Teachers and leaders who already believed that these tests provided neutral assessments of ability found “the technology to grade these multiple-choice exams very appealing,” Terzian said.
Just as Scantron cornered the market for cheaper, easier, and more efficient ways to grade student assessments, we now have A.I. entering the chat (see what I did there).
It’s no secret that the SAT, ACT, AP exams, and numerous state tests have already moved toward online, computer-based testing and A.I. grading, and most others will soon follow.
Maybe they aren’t calling it “A.I. grading,” but let’s be real: Computers and algorithms are doing the work.
Not everyone is happy about this change from the “Scantron + human” method of grading standardized assessments.
Computers will now score essays and other open-ended questions on the State of Texas Assessments of Academic Readiness (STAAR), at least in part.
San Antonio Independent School District officials have concerns about how the new system could impact scores and how teachers might respond when teaching students to prepare for the test in the future.
“AI scoring could result in basic writing being scored at higher rates, as we have seen from the AI-scored [Texas Success Initiative assessment],” district spokesperson Laura Short said Wednesday. “We worry that nuanced, complex writing will not receive a valid score because the AI engines may be looking for specific words, phrases, and formulaic construction.”
If that were proven true, Short said, that could put pressure on teachers “to teach formulaic writing to receive better STAAR scores, impairing our efforts to provide writing instruction that prepares students for college and careers.”
https://sanantonioreport.org/staar-essay-computer-scoring-san-antonio-school-district-concerns/
Even the business model seems eerily similar. Scantron was notorious for giving out its machines for free (or at very low cost) and then billing schools on the back end for the answer sheets needed to run assessments through the machine.
Many A.I. tools are being given away “free” to teachers, while districts are charged for services that can now be replicated and used across all classrooms.
It brings up a very big question. One that I’ve discussed with thousands of teachers and school leaders over the last few years in sessions all around the country.
In a world of Artificial Intelligence, what do we VALUE about the education experience?
How we answer this question will have massive ramifications both in and out of our school systems.
What Do We Value?
If you’ve read this far, you may be asking: So, what is the BIG problem with using A.I. for assessment?
For this question, I want to look at a statement made recently by author Joanna Maciejewska:
“I want AI to do my laundry and dishes so that I can do art and writing, not for AI to do my art and writing so that I can do my laundry and dishes.”
I posed this question online, and I want to ask you to think about it now.
What is the “laundry and dishes” in our educational systems? What is the “art and writing”?
In short, what are the foundational beliefs about what keeps education a human and social practice?
What we know for sure is that the Scantron made it very clear we are fine with computers grading standardized assessments. This practice has been happening for years with multiple-choice assessments.
However, on tests like the SAT, ACT, and the majority of state assessments, there has consistently been a “human” piece, where a real person looks at writing and grades accordingly.
Now, A.I. and computer-based systems are doing this work.
So, I’ll ask again: What do we value and want to KEEP HUMAN in our educational systems?
Aligning Our Mission, Values, and Key Principles
When I worked at Upper Perkiomen School District as Curriculum Coordinator and EdTech Specialist, we began the work of answering this question: What do we value?
To begin this work, our Leadership Council — made up of teachers, administrators, board members, community members, and students — looked at Schooling By Design by Grant Wiggins and Jay McTighe.
In their book, the authors share some key elements for building a foundation for powerful learning:
You’ll notice that the groundwork for all of the decisions around curriculum, assessment, technology, instructional programs, etc., is based on your “Mission” and “Learning Principles.”
We had a known Mission as a School District, but we did not have a set of guiding beliefs, values, and learning principles.
Our Leadership Council went through the process of identifying UPSD Learning Principles. As McTighe points out: “Since the Learning Principles reflect research and best practice, they serve to guide curriculum planning, instruction, and assessment. They provide a common language for conversations about teaching and learning, and function as criteria for a variety of school actions (e.g., textbook selection, classroom observations).”
After going through a long process, we identified six core principles and values that would serve as our guide in our work of teaching and learning:
UPSD LEARNING PRINCIPLES
1. Curricula will reflect opportunities for authentic learning. Learners will be given opportunities to use their existing skills and understandings to make connections to new learning. Learners will apply critical and creative thinking to collaboratively and individually solve authentic problems.
2. Successful learning requires individuals who know how to reflect, self-assess, and use feedback to establish personal goals for learning. The teaching and modeling of the reflective practice will be incorporated in all aspects of the learning process.
3. All curriculum, planning, design and content delivery will address the intentional development of competencies, skills, and understandings relevant to both the subject area and the broader set of authentic ‘life skills’ such as creative expression, skillful communication, self-direction, resilience, and persistence through common language and assessment practices.
4. The school district is a community of learners where ability is seen as dynamic and the environment in all classrooms allows each student to grow, develop, and engage in meaningful learning.
5. Learners will engage in work in supportive environments and receive regular and specific feedback related to their progress in order to maximize learning and develop individual persistence. The students will be given opportunities to use that feedback to remain accountable while improving their own learning.
6. All learners are capable of their highest potential when interests and strengths are recognized and accommodated in inclusive learning environments, when there is an appropriate blend of challenge, comfort, and support in those environments, and when success is seen as attainable through persistent effort.
Note: The Learning Principles should not be “set in stone.” They can (and should) be periodically revisited and refined to reflect emerging research and staff insights from their application.
We had created a set of principles that shared our VALUES with the entire community and organization.
When current practices didn’t align with our principles, we could discuss how to change them.
When deciding on new initiatives, programs, curricula, or resources — we could run those decisions through our principles and see if they aligned with our values.
You can see a video walkthrough of our entire process below.
The BIG Problem with Using A.I. for Assessment
After going through this process, and then leading various other organizations through a similar process, I see two big problems with using A.I. for assessment.
The first problem is simple to see and recognize: Assessment should be a conversation about learning.
If we value relevant and nuanced feedback, meaningful reflection, and use assessment as a conversation about learning — then we value a HUMAN on the other side of that conversation.
Can A.I. be used for formative checks for understanding? Of course.
Can A.I. be used in the same way a Scantron has been used to grade multiple-choice assessments? Sure, why not.
But, when we really want to assess a student’s understanding and see their work, their process, their mistakes, and their insights—a human needs to be on the other side of that conversation.
Look above at Learning Principles #2 and #5 — both speak to the role of assessment (both external feedback and self-reflection) in driving meaningful learning and growth.
The second problem is connected, and maybe more important in our current conversations around artificial intelligence.
As Mark so aptly points out in our conversation: If A.I. can’t actually have the metacognition to “understand,” how can we trust this technology to decipher “understanding”?
Just last week, Professor Patricia Taylor detailed her time as a fellow at the University of Southern California’s Center for Generative AI and Society:
It became clear over the course of the experiment that the AI was giving variations on the same feedback regardless of the quality of the paper. It asked for more examples or statistics in papers that didn’t need them. It continually encouraged the five-paragraph essay structure—but, unfortunately, that went against what I wanted, since I (like so many other writing professors at the college level) want students to develop arguments that go past the five-paragraph structure. When focusing on language and grammar issues, it flattened style and student voice.
The A.I. assessment and feedback can only be as good as the underlying LLM training, and in this case, you can see an educator frustrated by overly formulaic responses to human work that is undoubtedly nuanced.
Could A.I. help with some lower-level tasks and feedback? Yes, and it did fairly well when given a specific area to focus on that was more objective than subjective.
However, Taylor’s final summation gives pause to anyone thinking about using A.I. for meaningful feedback:
Over the course of this project, I was forced to spend more time trying to get the AI to produce meaningful feedback tailored to the actual paper than I did just writing the feedback on my initial pass through the paper. AI isn’t a time saver for professors if we are actually trying to give meaningful reactions to student papers that have complex issues. And its feedback on things like structure can actually do more harm than good if not carefully curated—curation that easily takes as much time as writing the feedback ourselves.
It looks as if this conversation is only getting started. More and more performance tasks, like writing essays, will be assessed with A.I.-assisted systems in the coming years. And, I’m sure, artificial intelligence will continue to improve in its ability to give feedback and move beyond the “Scantron” in its effort to make grading more efficient.
When it comes down to how we respond as educators, it will ultimately be a conversation about what we value as a human experience and what principles we run our decisions through to support our learners.
After all, if we get to a place where the A.I. is doing the student’s work, and the A.I. is assessing that work—what is there even left to value?