CheatGPT - A Textbook Case of Bad Research

Jared Cooney Horvath is a globally recognized Science of Learning expert committed to helping teachers, students and parents achieve better outcomes through applied brain and behavioral science.

Meta-Analysis Gone Mega-Wrong

Is ChatGPT really revolutionizing student learning … or is the hype misleading?

In this installment of From Theory to Practice, we examine a widely-circulated meta-analysis on ChatGPT’s impact on academic performance – and reveal shocking flaws in the research.

Does ChatGPT Enhance Student Learning? A Meta-analysis (Deng, 2025)

From questionable data sources to missing contradictory studies, we expose how easy it is to manipulate findings with selective reporting.

If you care about AI in education, don’t just read the headlines – watch this breakdown to find out what’s really going on.

Video Transcript

CheatGPT | A Textbook Case of Bad Research

Hello, everybody, and welcome to this week's "From Theory to Practice," where I take a look at the research so you don't have to. The article I've selected this week is really popular right now and has been making the rounds. It's called "Does ChatGPT Enhance Student Learning? A Meta-Analysis" by Deng and colleagues.

Why have I selected this paper? Because if you look at the highlights, which, let's be clear, that's all most people are ever going to look at, it says: ChatGPT enhances academic performance, boosts effective motivational states, improves higher-order thinking properties, and reduces mental effort. This is why this paper is so popular right now, especially amongst ed tech and tech gurus, because finally we have "proof" with meta-analysis that ChatGPT is good for learning. I'm going to ignore the other three claims and just focus on this idea: "ChatGPT enhances academic performance."

Wow, holy geez, we finally got the holy grail, the silver bullet we've been looking for. Things are about to change. Well, there's a small problem.

Problems with Premature Meta-Analysis

First and foremost, it is way too early to be doing a meta-analysis with ChatGPT. If you think about it, this tool is really only about two years old, a little bit older. That is nowhere near enough time for us to do meaningful research that we can then pool together into a meta-analysis and say, "Here's what the big picture is saying."

Just as a kind of a dirty rule of thumb, we tend to say five to ten years of pure research is required before a meaningful, what we'll call "deep meta-analysis," can or even should be considered. So already off the bat, this study shouldn't exist. We simply don't have the data yet for it. But let's take a look at what they found anyway.

Examining the Effect Sizes

Looking at academic achievement, they were able to find 51 papers published in the last two years looking at ChatGPT and academic achievement, and they found it had an effect size of 0.71, which is massive. This suggests the tool helps learning significantly.

But then you take a look at the individual effect sizes. If you look closely at the 51 papers that went into coming up with that 0.71 value, some things should jump immediately out to you. Here's three of them: One effect size is greater than 2. These effect sizes are higher than anything you should possibly ever find in any research across any field. Those are absolutely absurd. And once you start finding absurdities, and the fact that these weren't immediately dropped, you now have an open door to ask: what else is going on here? Clearly these researchers, by keeping these extreme values, were letting us know they really wanted a high effect size.

So where else were they cutting corners?

Journal Quality and Research Reliability

We need to understand that not all research is created equally. In the academic literature, we publish our research in journals, and journals come in different flavors:

Q1 journals are the top quartile. These are the journals you want. These have the most stringent acceptance criteria and the most stringent peer review. This is where you find the best of the best research.

Under that is Q2 journals. These are not always great, being second quartile. You can kind of futz with these a little bit.

And as you can guess, Q3 and Q4 get even worse. These are where you really start to hit what we'd call predatory journals, where really, if you just pay enough money, you can get your stuff published regardless of how good it is. Sometimes they don't even peer review it.

You also have unlisted journals. These are so left field that we don't even know how to rank them. They're almost certainly predatory, and you shouldn't trust anything you read in any of them.

And you also have conference papers. Conference papers are basically pilot studies that we present to other researchers in our field, but we don't really consider them deep research. It's basically saying, "Here's what I'm working on. Get ready for my publication in the future." So realistically, anything published or presented at a conference is cool, but we don't often consider that deep, meaningful data, and certainly not in this field.

Reanalyzing the Data by Journal Quality

So let's go back to those 51 studies and take a look. When you dive into them, 24 of those studies appear in Q3, Q4, unlisted journals, or at conferences, which means they should not be in this paper. And surprise, surprise, of those 24 papers that we can cut out of this analysis, nine of the 10 largest effect sizes fall in that drop category. Nine of the 10 largest effect sizes these researchers were using came from either predatory or unlisted journals or conferences. Do you see why I'm diving into this paper now?

When we take a look at those 27 papers that are in Q1 or Q2 journals, the effect size drops to 0.46. That's a 35% drop. But that's still significant.

So what if we're a little bit more stringent? What if we only take a look at Q1 journal research, those studies that we consider to be real, deep, meaningful research? There are only 19 studies included in Q1 journals in this paper. And when we analyze those, we get an effect size of 0.25 with a confidence interval that stretches across zero.

For those of you who don't know what that means, when the confidence interval crosses zero, there is no statistically significant effect. So realistically, based on the data used in this paper, if you simply take the time to look at it, these researchers should have concluded that there is no effect of ChatGPT on learning.

Missing Research and Selective Inclusion

And just to make matters worse, there's a little bit more. The fact that they were using such wonky research to bolster their conclusion made me think, did they miss anything?

In a five-minute Google search, I was able to find four papers on the first page of Google Scholar published in Q1 journals that demonstrate ChatGPT harms learning:

"AI-Generated Feedback on Writing: Insights into Efficacy and ENL Student Preference"
"Is It Harmful or Helpful? Examining the Causes and Consequences of Generative AI Usage Among University Students"
"Impact of AI Assistance on Student Agency"
"Generative AI Can Harm Learning"

These are top-tier research papers that these researchers somehow omitted from their analysis.

Key Takeaways for Research Consumers

So now, let's bring this back. What does this mean for all of us? Three things:

Not all data are created equal. If we're going to be doing meta-analyses or these big reviews to try and let people know what the state of the field is, you can't use every bit of data that exists out there. You have to be discerning in what you accept and what you reject. And when you have your threshold, say "I'm only going to do Q1 journals" or "I'm only going to do Q1 and Q2 journals," you then have to bring all of it in. You can't simply ignore research that goes against the point you're trying to make because that goes against the very reason for doing something like a meta-analysis.
Knowledge takes time to build. Remember I said there's about five to ten years of research you need before you can meaningfully do a meta-analysis? That's because academic research takes time. A lot of the research at conferences and such are funded or run by tech companies themselves, which is cool. They've got the money, but they've also got reason to find certain things. For academic researchers, people without vested interests, it takes more time to get money, set up experiments, analyze data, and then rerun experiments to replicate and extend that data. So if you really want to know what impact ChatGPT is having on learning and education broadly, 2030 is about when we can confidently say we're seeing good impact, no impact, or negative impact. We need the time.
Lying with data is super easy. In fact, as Winston Churchill said, "Don't trust any statistics you did not fake yourself." I'm pretty sure he didn't say that, but I thought that was a funny quote, so let's throw it in there. Just because something is quantitative or numerical, that doesn't mean there aren't still human decisions being made when we look at and analyze that data. In this meta-analysis, not only did these researchers use data they shouldn't have, they flat out ignored data that they should have included. And if you don't have deep knowledge of this field or how research works or how publication works, you would have had no clue any of this was going on.

Advice for Reading Research

And if you aren't as pedantic about AI and education as I am, you probably never would have thought to look deeply at this paper. As we said, most people just read the abstract, the highlights, and they assume they understand everything.

One of my favorite things to do, and I tell all my students to do this: the next time you read a scientific article, ignore the abstract. Just read the title, decide if you want to read it, then read the paper. Only when you're done reading the paper beginning to end do I then want you to go back and read the abstract again and see where they lied.

Because believe it or not, when we talk about peer review in academia, the two things we do not peer review are the abstract and the highlights. You can basically say anything you want in those two sections and no reviewer is going to pay attention to it because we're too focused on the bulk of the paper. So if you want to see where people are futzing the truth or rounding the edges a bit, it's always going to be in the abstract and the highlights. If you could ignore reading those, read the paper first, then go back, you'll see in about 50 to 60% of all papers, they bend the truth in their abstract, knowing they're going to get away with it.

Conclusion

So, all right, that is all I've got for you guys this week. If you like what you're hearing, you can take a look at us on lmeglobal.net. Keep up with what we're doing. Or you could take a look at our award-winning science of learning programs for both teachers and students called "The Learning Blueprint." Otherwise, thank you all so much, and I'll see you at the next one.

Did You Enjoy This Post?

Help spread the idea by sharing it with your peers and colleagues ...

Share

Post

Share

Join the LME insider community! Subscribe below to get exclusive updates on new courses, lectures, articles, books, podcasts, and TV appearances—delivered straight to your inbox before anyone else.

CLICK TO JOIN