There are many things wrong with the science and research culture in the “Science of Reading” community, where the evidence for phonics is claimed to be so compelling. Here I want to focus on a recent critique of Bowers (2020) by Brooks (2023), just published in the journal “Review of Education”. As with a previous critique of Bowers (2020) by Buckingham (2020), published in the journal “The Educational and Developmental Psychologist”, I was not asked to review it, and in both cases the article is chock-full of errors, irrelevancies, and mischaracterizations. In the case of Buckingham (2020) I wrote a response that included a table outlining the most important mistakes (see Table 1 at the end of this post). But the journal rejected my response and even refused to correct the most significant and most egregious errors. For more details see: https://jeffbowers.blogs.bristol.ac.uk/blog/buckingham-2020/. You can find the rejected article here: https://doi.org/10.31234/osf.io/f5qyu. The Brooks critique of Bowers (2020) is no better. Indeed, every point is either wrong, irrelevant, or misleading (apart from his catch of a typo in a date), and it gets embarrassing towards the end when he discusses Sherman (2007). Let’s see if the journal will publish a response.
Bowers (2020) reviewed the 12 existing meta-analyses on phonics as well as reading outcomes in England since phonics was legally mandated in state schools in 2007. The Brooks (2023) critique focuses on four of these meta-analyses and, in addition, criticizes a more recent article by Wyse and Bradbury (2022). Again, neither Wyse nor Bradbury was asked to review the Brooks article prior to publication. I’ll let Wyse et al. respond to the points directed at them and focus on my own work, going sequentially through Brooks’ points.
First, he criticizes my analysis of Galuschka et al. (2014), who found similar effect sizes for phonics (g′ = 0.32), phonemic awareness instruction (g′ = 0.28), reading fluency training (g′ = 0.30), auditory training (g′ = 0.39), and color overlays (g′ = 0.32) and nevertheless concluded that phonics was the “most” effective method because only phonics had a significant effect (due to the fact that the phonics condition included many more studies). I noted that the conclusion that phonics is the most effective requires a significant interaction – the effect of phonics needs to be larger than the effect of the other methods. The interaction was not reported (and would not be significant given the similar effect sizes).
In response, Brooks writes that Bowers “chides Galuschka et al. for not having carried out an analysis of the interaction between the various approaches—a mistaken demand as that term refers to two independent variables, whereas here there would be only one (type of approach).” This is simply incorrect. It is easy to test for an interaction by assessing whether phonics is more effective than the other interventions. Contrary to Brooks, there are multiple different independent variables in this study (the different interventions). But if you want to say that the different interventions are just different levels of one independent variable (“type of approach”), then you still need to ask whether the different levels of instruction interact, with better outcomes for phonics. Buckingham (2020) also challenged me on this most basic point of statistics. More generally, it just does not make sense to take a study that reports similar effect sizes for a wide range of interventions as evidence that phonics is best.
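To make this concrete, the test at issue is simply whether two (or more) pooled effect sizes differ from one another. Below is a minimal sketch in Python of the standard z-test for the difference between two independent meta-analytic effects; the standard errors are hypothetical illustrative values (Galuschka et al.’s are not reproduced here):

```python
import math

def z_test_two_effects(g1, se1, g2, se2):
    """Two-tailed z-test for the difference between two
    independent meta-analytic effect sizes."""
    z = (g1 - g2) / math.sqrt(se1**2 + se2**2)
    # two-tailed p-value from the standard normal distribution
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p

# Effect sizes from Galuschka et al. (2014); the standard
# errors (0.10, 0.20) are HYPOTHETICAL illustrative values.
z, p = z_test_two_effects(0.32, 0.10,   # phonics
                          0.28, 0.20)   # phonemic awareness
print(f"z = {z:.2f}, p = {p:.2f}")      # z = 0.18, p = 0.86
```

Even with these generous assumed standard errors, a difference of 0.04 between the phonics and phonemic awareness effects comes nowhere near significance – and it is this comparison, not the presence or absence of a significant effect within each condition separately, that a claim of “most effective” requires.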
Next, Greg criticizes my analysis of Han (2009). He makes his one correct point here – I did cite this dissertation as being published in 2010 rather than 2009. Below is the sum total of what I wrote in Bowers (2020), where I combined my critique of this work with another meta-analysis:
Han (2010) and Adesope, Lavin, Thompson, and Ungerleider (2011). These authors reported meta-analyses that assessed the efficacy of phonics for non-native English speakers learning English. Han (2010) included five different intervention conditions and dependent measures and reported the overall effect sizes as 0.33 for phonics, 0.41 for phonemic awareness, 0.38 for fluency, 0.34 for vocabulary, and 0.32 for comprehension. In the case of Adesope et al. (2011), the authors found that systematic phonics instruction improved performance (g = + 0.40), but they also found that an intervention they called collaborative reading produced a larger effect (g = + 0.48) as did a condition called writing (structured and diary) that produced an effect of g = + 0.54. Accordingly, ignoring all other potential issues discussed above, these studies do not provide any evidence that phonics is the most effective strategy for reading acquisition.
Brooks is critical of my analysis of Han because he claims that there is a flaw in the Han study, writing “[Han] gives a list of 11 ‘Instructional activities’ which are classified as phonics—but only one, or at most two, deserve that label.” But this is simply irrelevant. For the sake of argument, let’s accept that Han’s meta-analysis is flawed. Then fine, my conclusion stands that this study provides no evidence for phonics. Indeed, it strengthens my conclusion – if the selection of studies to include in the meta-analysis was flawed, then not only is the .33 effect for phonics not larger than the effects for the alternative methods, it is also a bogus finding.
Next, he criticizes my analysis of Sherman. This is the sum total I wrote on this meta-analysis:
Sherman (2007). Sherman compared phonemic awareness and phonics instruction with students in grades 5 through 12 who read significantly below grade-level expectations. Neither method was found to provide a significant benefit.
Greg claims that Sherman did indeed obtain a significant benefit of phonics, writing:
Bowers (2020: 695) says that Sherman (2007) found no overall effect of phonics. This is inaccurate. The top line of data in Sherman’s table 20 (p. 69) shows an ES of 0.33 for the impact of phonics on literacy overall. The confidence interval for the ES is given as 0.13 Lower, 0.52 Upper; since this does not cross zero, the ES must be significant at least at p<0.05, even though Sherman does not give a probability value or discuss this result.
There are a few problems. First, here is the table that Greg is referring to. He is indeed correct that the confidence interval does not cross zero in the analysis carried out on the full dataset, but it does in the analysis that excluded outliers:
Now, you might wonder whether outlier studies should be excluded, but here is a description of the outlier studies taken from the dissertation:
Including a study with an effect size (Cohen’s d) of 7.69 is absurd. Indeed, this effect size is so large that I expect it is a mistake in the Sherman PhD thesis. But it is not worth looking into this when you read the following from the thesis:
Because of the small number of studies and the variability of the population studied, the alpha level was relaxed to 0.25 to explore statistical significance of main effects or interaction effects at this level. The impact of group size and reading level on effect size was significant in many of the analyses at a 0.25 alpha level.
In other words, because Sherman (2007) was not obtaining significant effects at the .05 level, she decided to work with a .25 level of significance! So, even if we consider the full dataset that includes a study with a Cohen’s d of 7.69, the analysis only shows that the effect is significant at the .25 level, not at the .05 level as Brooks claims.
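Brooks’ inference – the interval does not cross zero, therefore p < .05 – only holds if the interval was constructed at the 95% level. Here is a quick sketch in Python (assuming a normal sampling distribution and treating the interval midpoint as the estimate) of how the p-value implied by the interval [0.13, 0.52] changes depending on the confidence level at which the interval was built:

```python
from statistics import NormalDist

def p_from_ci(lower, upper, level):
    """Back out the two-tailed p-value implied by a symmetric
    confidence interval built at the given confidence level."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(0.5 + level / 2)  # e.g. 1.96 for a 95% CI
    se = (upper - lower) / (2 * z_crit)   # implied standard error
    z = ((upper + lower) / 2) / se        # estimate / standard error
    return 2 * (1 - nd.cdf(abs(z)))

# Sherman (2007), Table 20: ES ~0.33, interval [0.13, 0.52]
print(round(p_from_ci(0.13, 0.52, 0.95), 3))  # if a 95% CI: 0.001
print(round(p_from_ci(0.13, 0.52, 0.75), 3))  # if a 75% CI (alpha = .25): 0.055
```

On this reconstruction, an interval built at Sherman’s relaxed alpha of .25 can exclude zero while the conventional two-tailed p-value sits above .05 – which is exactly why the interval alone, absent a stated confidence level and probability value, does not license Brooks’ conclusion.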
Finally, Greg criticizes my analysis of Camilli et al. (2003), writing:
When I pointed out the fragility of Camilli et al.’s (2003) analysis, Bowers (personal communication, 9 March 2023) replied: ‘My critique does not hinge on the Camilli et al. findings (there is little evidence for phonics even if you ignore his [sic] point).’ Despite what he says, Bowers’ argument does in fact make considerable use of ‘the Camilli et al. findings’:
This leaves the impression that I’m conceding his point regarding Camilli et al. (2003). But I am not. He has selectively quoted me in a misleading way. Here is what I wrote in that email:
“My critique does not hinge on the Camilli et al. findings (there is little evidence for phonics even if you ignore his point), but I don’t understand your criticism of the study. There are essentially no forms of instruction used in school that use NO phonics, so studies that completely ignore phonics are not appropriate to include in a control condition if you want to claim that systematic phonics is needed to improve existing classroom instruction.”
That is it! This covers all the points Brooks raised to challenge my critique of the evidence for phonics. It is an impressive repeat of Buckingham (2020): every substantive point is wrong, irrelevant, or misleading. This is why the journal “Review of Education” should have asked me to review Brooks’ manuscript. Indeed, that should be standard policy – if a journal is considering publishing a critique of an article, the authors of the published work should have a chance to review the submitted manuscript (and an opportunity to respond if the critique is published). Now we have Greg Ashman, Nate Joseph, Timothy Shanahan, Pamela Snow, Kevin Wheldall, Dylan Wiliam, and others retweeting this flawed article that should never have been published in the first place. Let’s see if any of these authors will retweet this response.
I should note that a much more reasonable response to Bowers (2020) was published by Fletcher et al. (2021) in the journal “Educational Psychology Review”. I was asked to review the submission, where I identified multiple mistakes that were fixed prior to publication, and I was invited to submit a response (Bowers, 2021) that was in turn reviewed by Fletcher. I think the criticisms by Fletcher et al. (2021) were misguided, but at least it was a constructive exchange, and I think some common confusions were clarified. A reader who read the full exchange will have learnt something – that is not the case here, other than perhaps learning how low the standards are when making pro-phonics claims.
And the unjustified scientific claims used to support the efficacy of phonics just do not stop. For example, the recently released Progress in International Reading Literacy Study (PIRLS) showed that England ranked 4th (of 61 countries). An impressive performance indeed. The problem is that many researchers are attributing this outcome to phonics. For example, Kathy Rastle tweeted:
Here are the PIRLS results that led her to this conclusion. See a problem?
In a series of tweets to Rastle I pointed out two additional problems with attributing the high PIRLS ranking to phonics. First, Singapore, Ireland, and Northern Ireland have consistently outperformed England in English despite not requiring phonics (nor the phonics screening check). Part of the reason why England went up in the most recent rankings is that Ireland and Northern Ireland were excluded from the comparison. (They again scored better but were excluded because a delay in assessing children on PIRLS – due to COVID – meant that their children were slightly older.) Second, England scored better than Italy and Spain (and many other countries) that have writing systems with consistent grapheme-phoneme correspondences, and where children would score near 100% (at a younger age) if they were presented with a phonics screening check. However effective mandated phonics has been in improving the naming of English regular words (and nonwords), English children are not as good as Italian and Spanish children at naming words (and nonwords) in their languages. Accordingly, the higher English scores in PIRLS (which measures reading comprehension) must reflect something other than phonics. That is worth exploring. But no matter, many researchers are attributing the good results to phonics. I received no response.
Adesope, O. O., Lavin, T., Thompson, T., & Ungerleider, C. (2011). Pedagogical strategies for teaching literacy to ESL immigrant students: a meta-analysis. British Journal of Educational Psychology, 81(Pt 4), 629–653.
Bowers, J.S. (2020) Reconsidering the Evidence that Systematic Phonics is more Effective than Alternative Methods of Reading Instruction. Educational Psychology Review, 32, 681-705.
Bowers, J.S. (2021). Yes children need to learn their GPCs but there really is little or no evidence that systematic or explicit phonics is effective: A response to Fletcher, Savage, and Vaughn (2020). Educational Psychology Review. https://doi.org/10.1007/s10648-021-09602-z
Bowers, J.S., & Bowers, P.N. (2021). The science of reading provides little or no support for the widespread claim that systematic phonics should be part of initial reading instruction: A response to Buckingham. https://doi.org/10.31234/osf.io/f5qyu
Brooks, G. (2023). Disputing recent attempts to reject the evidence in favour of systematic phonics instruction. Review of Education, 11(2), e3408.
Buckingham, J. (2020). Systematic phonics instruction belongs in evidence-based reading programs: A response to Bowers. The Educational and Developmental Psychologist, 1-9.
Camilli, G., Vargas, S., & Yurecko, M. (2003). Teaching children to read: the fragile link between science and federal education policy. Education Policy Analysis Archives, 11(15), 1–51.
Fletcher, J. M., Savage, R., & Vaughn, S. (2021). A commentary on Bowers (2020) and the role of phonics instruction in reading. Educational Psychology Review, 33, 1249-1274.
Galuschka, K., Ise, E., Krick, K., & Schulte-Körne, G. (2014). Effectiveness of treatment approaches for children and adolescents with reading disabilities: a meta-analysis of randomized controlled trials. PLoS One, 9(2), e89900. https://doi.org/10.1371/journal.pone.0089900.
Han, I. (2009). Evidence-based reading instruction for English language learners in preschool through sixth grades: a meta-analysis of group design studies. Retrieved from the University of Minnesota Digital Conservancy, http://hdl.handle.net/11299/54192.
Sherman, K. H. (2007). A Meta-analysis of interventions for phonemic awareness and phonics instruction for delayed older readers. University of Oregon, ProQuest Dissertations Publishing, 2007, 3285626.
Wyse, D., & Bradbury, A. (2022). Reading wars or reading reconciliation? A critical examination of robust research evidence, curriculum policy and teachers’ practices for teaching phonics and reading. Review of Education, 10(1), 1–53. https://doi.org/10.1002/rev3.3314
Here is the table from Bowers and Bowers (2021) responding to Buckingham (2020). Buckingham did not respond to any of these points, nor has anyone else that I am aware of. But Buckingham did block me on Twitter and suggested she would sue me for comments in the following blogpost, which details the sad episode: https://jeffbowers.blogs.bristol.ac.uk/blog/buckingham-2020/