Jared Cooney Horvath is a globally recognized Science of Learning expert committed to helping teachers, students and parents achieve better outcomes through applied brain and behavioral science.
Is ChatGPT really revolutionizing student learning … or is the hype misleading?
In this installment of From Theory to Practice, we examine a widely circulated meta-analysis on ChatGPT’s impact on academic performance – and reveal shocking flaws in the research.
From questionable data sources to missing contradictory studies, we expose how easy it is to manipulate findings with selective reporting.
If you care about AI in education, don’t just read the headlines – watch this breakdown to find out what’s really going on.
Video Transcript
Hello, everybody, and welcome to this week's "From Theory to Practice," where I take a look at the research so you don't have to. The article I've selected this week is really popular right now and has been making the rounds. It's called "Does ChatGPT Enhance Student Learning? A Meta-Analysis" by Deng and colleagues.
Why have I selected this paper? Because if you look at the highlights, which, let's be clear, are all most people are ever going to look at, it says: ChatGPT enhances academic performance, boosts affective-motivational states, improves higher-order thinking propensities, and reduces mental effort. This is why this paper is so popular right now, especially amongst ed tech and tech gurus: finally we have "proof" via meta-analysis that ChatGPT is good for learning. I'm going to ignore the other three claims and just focus on this one: "ChatGPT enhances academic performance."
Wow, holy geez, we finally got the holy grail, the silver bullet we've been looking for. Things are about to change. Well, there's a small problem.
First and foremost, it is way too early to be doing a meta-analysis on ChatGPT. Think about it: this tool is only about two years old, maybe a little more. That is nowhere near enough time to produce meaningful research that we can then pool together into a meta-analysis and say, "Here's what the big picture is saying."
Just as a quick and dirty rule of thumb, we tend to say five to ten years of research are required before a meaningful "deep meta-analysis" can or even should be considered. So right off the bat, this study shouldn't exist. We simply don't have the data for it yet. But let's take a look at what they found anyway.
Looking at academic achievement, they found 51 papers published in the last two years on the topic, and pooling them yielded an effect size of 0.71, which is massive. That would suggest the tool helps learning significantly.
But then you take a look at the individual effect sizes. If you look closely at the 51 papers behind that 0.71 value, some things should immediately jump out at you. Here are three of them, each with an effect size greater than 2. Effect sizes that high are beyond anything you should ever find in any research, in any field. They are absolutely absurd. And once you start finding absurdities, and see that they weren't immediately dropped, you have an open door to ask: what else is going on here? By keeping these extreme values, these researchers were letting us know they really wanted a high effect size.
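To make the problem concrete, here is a minimal sketch of how a meta-analysis pools effect sizes using standard inverse-variance weighting, and what a couple of extreme values do to the pooled estimate. Every number below is hypothetical and chosen purely for illustration; these are not the actual effect sizes or variances from the Deng paper.

```python
# Illustrative sketch only: hypothetical numbers, not data from Deng et al.
# A fixed-effect meta-analysis pools studies by weighting each effect size
# by the inverse of its sampling variance (more precise studies count more).

def pooled_effect(effects, variances):
    """Inverse-variance weighted mean of study effect sizes."""
    weights = [1.0 / v for v in variances]
    total = sum(w * d for w, d in zip(weights, effects))
    return total / sum(weights)

# A cluster of modest, plausible effects...
effects   = [0.15, 0.20, 0.25, 0.30, 0.10]
variances = [0.04, 0.05, 0.04, 0.06, 0.05]   # assumed sampling variances

print(f"Pooled d without outliers: {pooled_effect(effects, variances):.2f}")

# ...plus two implausible outliers (d > 2) left in the pool.
effects_out   = effects + [2.4, 2.1]
variances_out = variances + [0.05, 0.06]

print(f"Pooled d with outliers:    {pooled_effect(effects_out, variances_out):.2f}")
```

In this toy example the pooled estimate jumps from roughly 0.20 to roughly 0.73 once the two outliers are included, which is exactly why extreme values are normally inspected, and often excluded, before pooling.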
So where else were they cutting corners?
We need to understand that not all research is created equally. In the academic literature, we publish our research in journals, and journals come in different flavors:
Q1 journals are the top quartile. These are the journals you want. These have the most stringent acceptance criteria and the most stringent peer review. This is where you find the best of the best research.
Under that is Q2 journals. These are not always great, being second quartile. You can kind of futz with these a little bit.
And as you can guess, Q3 and Q4 get even worse. This is where you really start to hit what we'd call predatory journals: if you just pay enough money, you can get your stuff published regardless of how good it is. Sometimes they don't even peer review it.
You also have unlisted journals. These are so left field that we don't even know how to rank them. They're almost certainly predatory, and you shouldn't trust anything you read in any of them.
And you also have conference papers. Conference papers are basically pilot studies that we present to other researchers in our field; they're a way of saying, "Here's what I'm working on. Get ready for my publication in the future." Presenting at a conference is cool, but we don't consider conference papers deep, meaningful data, and certainly not in this field.
When we take just the 27 papers that appeared in Q1 or Q2 journals, the effect size drops to 0.46. That's a 35% drop, since (0.71 - 0.46) / 0.71 ≈ 0.35. But that's still significant.
So what if we're a bit more stringent? What if we only look at Q1 journal research, the studies we consider to be real, deep, meaningful research? Only 19 of the studies in this paper were published in Q1 journals. And when we analyze just those, we get an effect size of 0.25 with a confidence interval that crosses zero.
For those of you who don't know what that means, when the confidence interval crosses zero, there is no statistically significant effect. So realistically, based on the data used in this paper, if you simply take the time to look at it, these researchers should have concluded that there is no effect of ChatGPT on learning.
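If you want to see that logic concretely, here is a small sketch of the 95% confidence interval check. The 0.25 effect size comes from the discussion above, but the standard error is a value I assumed for illustration; the paper's actual interval would be computed from its own data.

```python
# Illustrative sketch only: the standard error (SE) below is assumed,
# not a statistic reported in the meta-analysis.

def ci_95(effect, se):
    """95% confidence interval under a normal approximation."""
    return effect - 1.96 * se, effect + 1.96 * se

d_q1  = 0.25   # Q1-only pooled effect size, per the discussion above
se_q1 = 0.15   # hypothetical standard error

lo, hi = ci_95(d_q1, se_q1)
print(f"95% CI: [{lo:.2f}, {hi:.2f}]")   # roughly [-0.04, 0.54]

# If zero falls inside the interval, "no effect" is still consistent
# with the data, so the result is not statistically significant.
print("Statistically significant at the 5% level?", not (lo < 0 < hi))
```

The exact width of the real interval depends on the included studies' variances, but the takeaway is the same: an interval that straddles zero cannot rule out "no effect at all."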
And just to make matters worse, there's a little bit more. The fact that they were using such wonky research to bolster their conclusion made me wonder: did they miss anything?
In a five-minute Google search, I was able to find four papers on the first page of Google Scholar, all published in Q1 journals, that demonstrate ChatGPT harms learning. These are top-tier research papers that these researchers somehow omitted from their analysis.
So now, let's bring this back. What does this mean for all of us? Three things: First, it is far too early to be doing meta-analyses on ChatGPT, so treat any "big picture" claims about it with suspicion. Second, not all research is created equally, so before you trust a pooled finding, check where the underlying studies were published. Third, never judge a paper by its abstract or highlights.
And if you aren't as pedantic about AI and education as I am, you probably never would have thought to look deeply at this paper. As we said, most people just read the abstract, the highlights, and they assume they understand everything.
One of my favorite things to do, and I tell all my students to do this: the next time you read a scientific article, ignore the abstract. Just read the title, decide whether you want to read the paper, then read it. Only when you're done reading it from beginning to end do I want you to go back, read the abstract, and see where they lied.
Because believe it or not, when we talk about peer review in academia, the two things we do not peer review are the abstract and the highlights. You can say basically anything you want in those two sections, and no reviewer is going to pay attention, because we're all too focused on the bulk of the paper. So if you want to see where people are futzing the truth or rounding the edges a bit, it's always going to be in the abstract and the highlights. If you skip those, read the paper first, and then go back, you'll find that in about 50 to 60% of all papers the authors bend the truth in the abstract, knowing they're going to get away with it.
Copyright © LME Global
6119 N Scottsdale Rd, Scottsdale, AZ, 85250
(702) 970-6557