Journal of Linguistics and Language Teaching
Volume 3 (2012) Issue 2
The Differential Effects of Comprehensive Corrective Feedback on L2 Writing Accuracy
K. James Hartshorn (Provo, Utah, USA) /
Norman W. Evans (Provo, Utah, USA)
Abstract
Although recent studies of focused written corrective feedback (WCF),
targeting only one or a few error types, may provide valuable insights for
building second language acquisition theory, a growing number of scholars have
been concerned with the ecological validity of these studies for the second
language (L2) classroom. While many researchers favor focused WCF to prevent
overload for L2 writers, this study examines an alternative instructional
strategy, which targets all errors simultaneously. Based on principles derived
from skill acquisition theory, this strategy avoids overload by using shorter
pieces of writing. Building on earlier research showing this method improved
overall accuracy, this study examines its effects on a variety of discrete
linguistic categories. Analyses of pretest and posttest writing in a
controlled, 15-week study suggest that the treatment positively influenced L2
writing accuracy for the mechanical, lexical, and some grammatical domains.
Theoretical and pedagogical implications are addressed along with
limitations.
Key words: written corrective feedback, L2 writing
accuracy, second language acquisition
1 Introduction
Despite extensive debate over the efficacy of written corrective
feedback (WCF) in second language (L2) writing pedagogy, a great deal of
uncertainty remains. Numerous recent studies have advocated limiting WCF to one
or a few linguistic features to prevent learner overload. Although these studies
of focused feedback have been beneficial to researchers and practitioners, some
scholars have questioned their ecological validity for classrooms where more
comprehensive feedback may be desired (e.g. Bruton 2009, 2010, Storch 2010, van
Beuningen 2010). Moreover, studies utilizing insights from skill acquisition
theory have provided evidence that extensive WCF can be both practical and effective
in improving accuracy (e.g. Evans et al. 2011; Hartshorn et al. 2010).
Nevertheless, these studies have only examined overall accuracy. Therefore, the
effects of this type of comprehensive WCF on a broader array of linguistic
domains remain unclear. Thus, the intent of this study was to identify the
effects of comprehensive WCF on a range of linguistic domains in an
ecologically valid classroom context. To do this, we analyzed a corpus of
student writing originally elicited to test the general effects of WCF based on
principles derived from skill acquisition theory (Evans et al., 2011; Hartshorn
et al. 2010).
2 Research Context
At the outset, this study needs to be contextualized. We begin by
defining WCF as any feedback
targeting grammatical, lexical, or mechanical errors in L2 writing. Further, we
define error broadly as a linguistic
choice that would “not be produced by the speaker’s native counterparts” given
the same context and conditions (Lennon 1991: 182). We use the term linguistic accuracy to refer to the
absence of these errors.
This study is situated at the intersection of the L2 writing and the
second language acquisition (SLA) research
paradigms described by scholars such as Ferris (2010), Manchón (2011), and Ortega (2011). In the tradition of L2 writing, we want to
help our students to become “more successful writers” (Ferris 2010: 188).
Nevertheless, we also hope to “facilitate interlanguage development” and to
“draw L2 learners’ attention to linguistic forms in their own output” (Sheen
2010a: 175). As scholars such as Ferris (2010) and Ortega (2011) have observed,
we believe that both of these perspectives can work synergistically; we
therefore see both the L2 writing and SLA
paradigms as highly relevant to our students’ learning and to this study.
Nevertheless, our primary interest in this study focuses on
interlanguage development as demonstrated through the analysis of new texts
rather than text revision. This is because new texts show how or whether WCF
has affected the accuracy of the learner’s production. For discussions about
text revision studies versus studies using new texts, see authors such as
Ferris (2010), Sachs and Polio (2007), Truscott (2010), and Truscott and Hsu
(2008).
Though improving linguistic accuracy may not be a priority for all L2
writers, it is crucial for many and therefore deserves our careful attention.
While we recognize the obvious impropriety of those from previous eras who
attempted to reduce writing to a “reinforcement
of grammar teaching” (Watson 1982: 6), we also see compelling evidence
from L2 writing classrooms demonstrating marked improvements in rhetorical
skills (e.g. effective content, organization, flow of ideas) without improved
linguistic accuracy (e.g. Evans et al. 2011, Hartshorn et al. 2010, Hinkel
2004, and Storch 2009). Though accuracy is not more important than rhetorical
dimensions of writing, it deserves our attention simply because it is the single
greatest struggle that many L2 writers face. Thus, rather than merely assisting
learners to produce more accurate writing, our aim is to identify ways to help
them become more accurate writers.
3 Review of Literature
3.1 Description of Terms
With this context in mind, we consider the
literature most relevant to this study. Many researchers have examined the
general efficacy of WCF. However, the conflicting results and divergent designs
of early studies made comparing or synthesizing research findings difficult
(e.g. Bitchener 2008, Ferris 2003, 2004, Guénette 2007, and Truscott 2007).
Therefore, in an attempt to provide greater focus to WCF research, many
scholars have investigated the benefits of specific types of errors or feedback
methods. Because of their relevance to this study, we briefly consider three
distinctions often used to analyze WCF:
(a) treatable and untreatable
errors,
(b) direct and indirect
feedback, and
(c) focused and unfocused
feedback.
3.1.1 Treatable and Untreatable Errors
Ferris (1999) described
treatable errors as those that could be prevented through the application of
systematic rules governing the use of linguistic features such as articles,
verb tense, verb form, subject-verb agreement, and plurals. On the other
hand, it was believed that untreatable errors resulted from ignorance of
idiosyncratic language rules that must be acquired over longer periods, such as
word choice, word order, and certain sentence structures.
Several recent studies
have been designed to target treatable errors. For example, many have examined
the effects of various types of WCF on specific article functions. In such
studies we observe a fairly consistent pattern in which those who receive the
WCF use these article functions more accurately in the writing of new texts
than those who do not (e.g. Bitchener 2008, Bitchener & Knoch 2008, 2010a,
2010b, Bitchener, Young & Cameron 2005, Ellis, Sheen, Murakami &
Takashima 2008, Sheen 2007, 2010b, Sheen et al. 2009). Though some studies have
produced mixed results (e.g. Lu, 2010), most research has supported the
treatable-untreatable distinction (e.g. Bitchener et al. 2005; Ferris 2006,
Ferris & Roberts 2001).
3.1.2 Direct and Indirect Feedback
Another distinction in the
literature is between different types of WCF. For example, scholars have
differentiated between direct feedback, in which a correction is provided (e.g.
written in the margin or between lines), and indirect feedback, where the location
of an error is indicated but without any correction. Indirect feedback can be
further classified as coded feedback, in which metalinguistic information in
the form of symbols is used to indicate the error type, or uncoded feedback,
where the error is identified through some form of marking such as underlining
or circling (e.g. Ferris & Roberts 2001, Robb, Ross, & Shortreed 1986).
Research on the benefits
of direct and indirect feedback seems inconclusive. Some studies have suggested
that indirect feedback may be most beneficial, whether uncoded (e.g. Lu, 2010)
or coded (e.g. Erel & Bulut 2007, Lalande 1982). However, other studies
found no differences between various methods of direct and indirect feedback
(Robb, Ross, & Shortreed 1986, Semke 1984), and no differences between
control groups and groups receiving various combinations of direct and indirect
feedback options (Bitchener & Knoch 2009a, Ferris & Roberts 2001).
Thus, further study is needed to clarify the effects of these types of
feedback.
3.1.3 Focused and Unfocused Feedback
A final distinction emphasized here is between what has been called
focused feedback (targeting only one or a few error types) and unfocused
feedback (targeting many or all error types). Most current WCF researchers continue
to advocate focused feedback over unfocused feedback (e.g. Bitchener 2008;
Bitchener & Knoch 2009a 2009b, Bitchener et al. 2005, Ellis et al. 2008,
Ferris 2006, Sheen 2007, Sheen et al. 2009). The primary rationale for
restricting feedback is the need to keep the “processing load manageable” to
avoid the “risk of overloading the students’ attentional capacity” (Sheen et al. 2009:
559). Such studies have provided valuable evidence of the potential benefits of
WCF on one or a limited number of error types (e.g. Bitchener 2008, Bitchener
& Knoch 2009a, 2009b, Ellis et al. 2008, Sheen 2007, Sheen et al2009).
We are aware of only a few studies specifically designed to test whether
focused or unfocused feedback results in greater accuracy. The first study by
Ellis et al. (2008) observed no statistical differences between the two types
of feedback groups. However, Sheen et al. (2009) suggested that these results
may have been affected by limitations in the research design, which they
attempted to overcome. They examined four different learner groups. Two groups
received direct WCF, including focused feedback for one group (targeting
articles) and unfocused feedback for the other group (targeting articles along
with the copula be, the regular
past tense, the irregular past tense, and locative prepositions). The remaining
groups included one which completed only the writing (without feedback) and a
control group (without writing or feedback). They found that while each group
made gains in accuracy, the focused WCF group outperformed all others and that
the accuracy of the unfocused feedback group was no greater than occurred for
the control group. They concluded that unfocused feedback tends to be
“confusing,” “inconsistent,” and may “overburden” the learners (Sheen et al.
2009: 567).
Another study by Farrokhi &
Sattarpour (2011) used direct feedback to examine the differential effects on
accuracy from focused and unfocused WCF for high and low proficiency levels.
English articles, the copula be, regular and irregular past tense, third person
possessive, and prepositions were targeted for the unfocused group. The groups
receiving focused WCF outperformed both the unfocused and control groups for
accuracy of English article use across both proficiency levels. Like Sheen et
al. (2009), they concluded that focused WCF “is more effective…than correcting
all of the existing errors” (Farrokhi & Sattarpour 2011: 1802).
Despite these conclusions
and the valuable insights which such studies may provide, they overlook a
number of important considerations. One is the desire which most learners have
for comprehensive WCF. Anderson
(2010), for instance, reported that 88% of his learners indicated a preference
for comprehensive WCF while only 26% specified a preference for focused
feedback (also see Ferris 2006, Leki 1991). Thus, before prematurely discarding
comprehensive feedback, some scholars suggest we continue to study its
viability (e.g. Ellis et al. 2008, van Beuningen 2010). Another consideration is a lack
of systematicity in the feedback process. For example, Sheen et al. (2009)
acknowledged that the unfocused feedback in their study was neither systematic
nor comprehensive. Though this lack of systematicity may have been implemented
in an attempt to reflect feedback commonly observed in practice as “arbitrary”
and “inconsistent” (Sheen et al. 2009: 566), such limitations make it
difficult, if not impossible, to weigh the actual benefits of focused feedback
compared to a systematic approach to comprehensive feedback. What has yet to be
studied is a feedback method that is comprehensive, systematic, and appropriate
to learners’ cognitive capacity. The need for such study is especially acute in
light of scholars who have called for researchers to investigate a broader
range of linguistic features using additional types of feedback (e.g. Bitchener
2009, Ellis 2008).
3.2 Expanding the WCF Research Agenda
Though carefully focused WCF may be essential for certain types of
research or theory building, it may be less ideal for the practical realities
of some classroom contexts. Not only could highly restricted feedback
inadvertently promote avoidance strategies (e.g. Truscott 2004, Xu 2009), but
it could also divert learner attention away from a broader view of accuracy
(van Beuningen 2010), possibly hindering language development in other
linguistic domains. Although most recent
research continues to extol the virtues of focused WCF, some scholars have
recognized problems associated with “strict limits on the number of errors” being treated
and “narrowly defined error categories” (Ferris 2010: 192). Others have
expressed concern over pervasive attempts to generalize about “the efficacy of
WCF” when most of the available research is based on “a limited range of
structures” (Storch 2010: 41).
Thus, while some researchers have considered “comprehensive [corrective
feedback]” as the “most authentic feedback methodology” (van Beuningen 2010: 20),
there are sound reasons for including it in SLA
research agendas as well. Though we acknowledge reasonable grounds for
isolating one linguistic feature at a time, there also is a compelling
rationale for examining the effects of a particular treatment on various
linguistic domains at the same time within the same learning condition. This is
because it is only in such a research design that we can truly observe the
differential effects of a specific treatment within a given context.
Despite the important place in our research which must be reserved for
focused feedback, its limitations justify scholars in exploring alternative
strategies for using feedback that targets multiple error types without
overloading the learner. For example, van Beuningen (2010: 19) declared, “the
learning potential of comprehensive [corrective feedback] deserves more
attention.” Similarly, Ellis et al. (2008: 367) asserted, “the question of the
extent to which [corrective feedback] needs to be focused in order to be
effective remains an important one,” and concluded, “if [corrective feedback]
is effective when it addresses a number of different errors, it would be
advantageous to adopt this approach.”
3.3 Comprehensive Feedback
3.3.1 Dilemmas
Though there are sound reasons for examining comprehensive feedback for
research and pedagogy, a few dilemmas need to be reconciled for it to be a
reasonable alternative. First, comprehensive WCF is often unmanageable for the
teacher as well as the learner. In addition to risking teacher fatigue,
students cannot benefit from more feedback than they are capable of processing.
Second, not all pedagogical practices involving WCF may be designed well enough
to ensure improved accuracy. Consider, for example, the limited efficacy of
teacher feedback if the learner must wait for several weeks before receiving
feedback or if the learner does not have adequate opportunities to process and
practice utilizing the feedback. Such limitations in the use of WCF are inconsistent
with the relevant notions from skill acquisition theory we now consider (e.g.
DeKeyser 2001, 2007).
3.3.2 Skill Acquisition Theory
Early contributors to thinking about skill acquisition could include
Hulstijn & Hulstijn (1983), who distinguished explicit knowledge (verbalizable)
from implicit knowledge (non-verbalizable), Anderson (1983), who
differentiated between declarative knowledge (what one knows) and
procedural knowledge (what one can do), and McLaughlin, Rossman, and
McLeod (1983), who described controlled processing (requiring a heavy strain
on learner attention) versus automatic processing (requiring little or
no attention). Van Patten and Benati (2010: 33) have described skill
acquisition as “a general theory” which claims “adults [learn] through largely
explicit processes”, and that with ongoing “practice and exposure,” they become
“implicit processes.” In other words, skill acquisition theory seeks to explain
or predict learner progress through the declarative, procedural, and automatic
stages of skill use. Thus, the culminating point of skill acquisition is
automaticity, which has been described as “the absence of attention control in
the execution of a cognitive activity” (Segalowitz & Hulstijn 2005: 371).
Moreover, DeKeyser (2001, 2007) has observed that skill acquisition theory
predicts that errors will decrease as a function of practice when supported by
abundant examples, explicit rule-based instruction, and frequent application.
Taken together, skill acquisition theory suggests that in order to
facilitate progress toward automaticity, instruction, practice, and feedback
should be meaningful, timely, and constant. These learning activities are
meaningful when instruction is explicit, when students understand the practice
task and its purpose, and when they understand the feedback they receive and
what they are to do with it. Instruction, practice, and feedback are timely
when instruction addresses the most germane concerns from the learner’s recent
writing, practice immediately follows instruction, and feedback is provided
promptly after practice. This process is constant when teachers and learners
continually engage in this cycle of teaching and feedback-based learning over
an extended period.
With these theoretical ideals in mind, instruction, practice, and
feedback need to be manageable if these learning activities are to be
meaningful, timely, and constant—especially if the teacher intends to provide
comprehensive WCF. One solution well suited to these principles is to shorten
the length of the writing task. While dramatic limits on the length or volume
of texts could undermine the practice and evaluation of rhetorical aspects of
writing, we reason that such constraints have little if any adverse effect on identifying
patterns of linguistic error production. On the other hand, time limits can
preserve manageability for both the teacher and the student and make it
possible for instruction, practice, and feedback to continue to be meaningful,
timely, and constant.
3.3.3 Dynamic WCF
Given the limitations of focused WCF and the call for examining the
effects of addressing multiple errors simultaneously, we have utilized what we
have termed dynamic WCF, which is simply our way of operationalizing the
principles associated with skill acquisition theory. Thus, dynamic WCF
is an instructional strategy designed to help L2 learners improve the accuracy
of writing by ensuring that instruction, practice, and feedback are manageable,
meaningful, timely, and constant. One should note that our application of
dynamic WCF simply targets linguistic accuracy—not rhetorical dimensions of
writing. It is expected that students will also need practice and feedback on
longer pieces of writing across various genres if they are to continue to
develop their rhetorical skills.
The effects of dynamic WCF on overall accuracy were analyzed in earlier
research. Though differences between a treatment group and a contrast group in
an intensive English program (IEP) were not statistically significant for
measures of rhetorical competence, fluency, or complexity, results showed a
significant difference for accuracy as measured by error-free T-unit[1] ratios (Hartshorn et al. 2010). In additional research conducted with
matriculated university students, accuracy was measured by error-free clause
ratios (Evans et al. 2011). While the findings from both studies suggest a
clear benefit to dynamic WCF on overall accuracy, they do not help us
understand whether there was greater improvement for some linguistic domains
than for others. This question has important implications for helping us
understand the specific effects of those principles which underlie skill
acquisition theory. Thus, additional research was needed.
Since utilizing skill acquisition theory to test the efficacy of
extensive feedback is a novel line of inquiry, we believed that it was
essential to first identify its global effects on a broad array of linguistic
domains before examining its effect on highly specific linguistic features.
Thus, we organized the most commonly observed errors into error types and error
families. The three families included grammatical errors, lexical errors, and
mechanical errors. The grammatical error family included sentence
structure errors, determiner errors (e.g. articles, possessive nouns and
pronouns, numbers, indefinite pronouns, and demonstrative pronouns), verb
errors (e.g. subject-verb agreement, verb tense, and other verb form problems),
numeric shift errors (e.g. count-non-count, singular-plural), and semantic
errors (e.g. awkwardness, insertion / omission, unclear meaning, and word
order). The lexical error family included word choice errors, word form
errors, and preposition errors. The mechanical error family included
errors in capitalization, indentation, non-sentence level punctuation, and
spelling. With these error families in mind, we need to emphasize that the
intent of this study was to compare the effects of comprehensive WCF with those
of a traditional approach to process writing to determine whether the
comprehensive feedback would increase the accuracy of L2 writing.
3.4 Research Question
We now articulate the
seven parts of our research question: Compared to a traditional process writing
class, to what extent will the treatment produce greater accuracy for each of
the following linguistic domains:
(a) sentence structure
accuracy,
(b) determiner accuracy,
(c) verb accuracy,
(d) numeric agreement
accuracy,
(e) semantic accuracy,
(f) lexical accuracy, and
(g) mechanical accuracy?
4 Method
4.1 Participants
4.1.1 Learners
There were 19 students in the contrast
group—4 males and 15 females—with a mean age of 25. The contrast group L1s
included Spanish (6), Korean (3), Mandarin (3), Portuguese (3), French (1),
Mongolian (1), Romanian (1), and Russian (1).
The treatment group included 28 students—16 males and 12 females—with a
mean age of 24. The treatment group L1s
included Spanish (19), Korean (6), Japanese (2), and French (1). Although this study used intact classes, a preliminary test of accuracy based on error-free T-units of student writing, administered prior to the treatment, revealed no statistically significant difference between the groups, t(45) = 0.58, p = 0.56, suggesting their comparability.
4.1.2 Teachers
Five teachers participated
in this study. Each held a relevant graduate degree and was considered to be an
effective teacher by his or her students and peers. Since some students who
were admitted into university programs during the semester did not participate
in the posttest, the number of students with complete data sets taught by each
teacher was unequal. The first two teachers each taught 10 students in the treatment group. The third and fourth teachers instructed 5 and 6 students, respectively, who were part of the contrast group. The fifth was the
the teacher of 8 students in the treatment group and 8 students in the contrast
group.
After undergoing a period of training, two of these instructors also blindly
scored the essays on the various aspects of accuracy outlined above.
4.2 Procedures
Procedures in this study fall into two categories, including the daily
activities which differentiate the treatment group from the contrast group and
the procedures used for data elicitation.
4.2.1 Treatment and Contrast
Since the principles underlying dynamic WCF might be
operationalized in a variety of ways, we briefly describe how we have applied
them in the current study. First, to ensure manageability, writing tasks were completed within a 10-minute time limit. Writing prompts were general and were usually limited to one or two
words. Topics included social issues, science, history, and popular culture. While the primary audience was the
teacher, learners were free to shape the writing task within the context of a
given topic (e.g. “The Economy,” “Friendship,” or “Global Warming”). Genres
varied but were largely descriptive, expository, narrative, or persuasive in
nature.
In order to ensure that feedback was timely and constant,
writing was completed three to four times per week. Teachers marked the short compositions with coded symbols based on the most commonly observed
error types (see Appendix A). Teachers also provided learners with a holistic
score for each student’s writing based on its linguistic accuracy and content. Writing was returned to the student by the next class
period. To ensure that tasks and feedback were meaningful, students were taught
about the purpose of the course and the writing tasks along with the codes at
the beginning of the semester. Since the literature showed no obvious benefit
for direct versus indirect feedback (whether coded or uncoded), we used coded
symbols to facilitate counts of error type by frequency. These frequency counts
functioned as an indication of performance levels over time. Rather than
use a predetermined syllabus, teachers used this
ongoing flow of information to determine or adjust classroom instruction in a
dynamic manner attuned to the changing needs
of the learners.
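To make the role of these frequency counts concrete, the following sketch (in Python) tallies coded error marks across a learner's short compositions. It is offered purely as an illustration; the codes shown are invented placeholders, not the actual symbols listed in Appendix A or any instrument used in the study.

from collections import Counter

# Hypothetical error codes (placeholders only): "D" = determiner,
# "V" = verb, "WC" = word choice, "SP" = spelling.
marked_drafts = [
    ["D", "V", "SP"],       # codes marked on Monday's ten-minute paragraph
    ["D", "WC"],            # Tuesday's paragraph
    ["V", "V", "SP", "D"],  # Wednesday's paragraph
]

def tally_errors(drafts):
    """Aggregate coded-error frequencies across a learner's short writing tasks."""
    tally = Counter()
    for codes in drafts:
        tally.update(codes)
    return tally

print(tally_errors(marked_drafts))  # e.g. Counter({'D': 3, 'V': 3, 'SP': 2, 'WC': 1})

Counts of this kind, accumulated over the semester, are what allowed teachers to adjust instruction toward the most frequent error types rather than follow a predetermined syllabus.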
Students also used a number of tools designed to facilitate linguistic
awareness. These included error tally sheets (a list
of error frequency counts from each piece of writing), error lists (a complete
inventory of all errors produced along with the surrounding text), and edit
logs (an ongoing record of the number of times the work was resubmitted before it
was deemed free of errors). Students edited successive drafts of their
writing until all the errors were corrected. If particular errors were not addressed
in a subsequent draft, the errors were marked and the writing was returned to
the student. This process was repeated until the piece of writing was deemed
error-free or until one week transpired from the time the teacher provided the
first WCF for a given piece of writing. Thus, students were engaged in editing
multiple drafts at the same time. Nevertheless, since the pieces of writing
were short, tasks remained manageable. Three or four times during the semester,
learners also wrote longer compositions with the expectation that they would
appropriately attend to all rhetorical requirements of the writing task.
We now compare the treatment group with the contrast group. Students in
both groups were enrolled in the same IEP, participating in a 15-week semester.
A battery of in-house language tests placed them into the institution’s highest
proficiency level. Monday through Thursday, they attended four 65-minute
classes. Thus, both groups participated in 13 classroom hours per week designed
to strengthen reading, listening, and speaking skills, and 6 hours per week of
related homework activities; 4 hours and 20 minutes per week were devoted to a
traditional writing class for the contrast group and a class emphasizing
dynamic WCF for the treatment group. Both groups participated in approximately
two additional hours of homework each week relating to these experimental
courses. Careful reviews of the curriculum led us to believe that nothing
outside of the treatment was likely to create a differential effect for writing
accuracy.
Students in the treatment group received dynamic WCF as described above.
Observation of classes and feedback suggested consistent patterns among
the teachers involved in this study.
Students in the contrast group were taught with a traditional approach
to process writing. They wrote four five-page papers during the course, each of
which required three or four separate drafts for which the teacher provided
thorough feedback on content and organization. Students were expected to
include effective introductions, thesis statements, topic sentences, and
conclusions. Papers were designed to demonstrate their ability to defend an
opinion, synthesize, argue, hypothesize, and propose. While most of the class
time was devoted to helping students understand and develop relevant rhetorical
skills within the context of a particular genre, a limited amount of feedback
also targeted linguistic accuracy. During the experimental period, learners
from both the treatment and contrast groups wrote three or four 30-minute
compositions similar to those used to elicit the pretest and posttest data.
4.2.2 Data Elicitation
Although most of the treatment dealt with short compositions, the intent
was to compare the accuracy of new pieces of writing which were more substantial.
Thus, we determined to analyze 30-minute essays. This format seemed appropriate
because of its pervasiveness in standardized language examinations (Ferris
2009) as well as its prevalence in university courses. Moreover, we believed
our window into language development would be more valid if we utilized
authentic, timed tests, in which student focus would gravitate toward content
rather than language (e.g. Long 2007). In a secure testing environment,
students in both groups typed responses to the same pretest and posttest
prompts (see Appendix B). Pretest data were obtained at the conclusion of one
semester and posttest data were obtained at the conclusion of the following
semester. The only word processing tools available to students during these
tests were cutting, copying, and pasting text. The software automatically ended
the test once the allotted time elapsed.
4.3 Data Analysis
We now describe the various measures of accuracy along with the scoring
guidelines used in this study.
4.3.1 Measures of L2 Writing Accuracy
In order to measure accuracy, many recent studies have used the obligatory
occasion analysis (e.g. Bitchener et al. 2005, Bitchener 2008, Bitchener &
Knoch 2009a, 2009b, 2010, Ellis et al. 2008), which involves identifying all
obligatory occasions of a particular linguistic feature in a text and then
calculating the ratio of the correctly supplied features over the total number
of obligatory occasions (Ellis & Barkhuizen 2005).
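As a minimal sketch of this calculation (using invented counts rather than data from any of the studies cited), obligatory occasion accuracy is simply the number of correct suppliances divided by the number of obligatory occasions:

def obligatory_occasion_accuracy(correct_suppliances, obligatory_occasions):
    """Ratio of correctly supplied target features to obligatory occasions
    (Ellis & Barkhuizen 2005). Undefined when a text offers no occasions."""
    if obligatory_occasions == 0:
        raise ValueError("No obligatory occasions: the measure is undefined for this text.")
    return correct_suppliances / obligatory_occasions

# Example: 14 obligatory contexts for the definite article, 11 supplied correctly.
print(round(obligatory_occasion_accuracy(11, 14), 2))  # 0.79

The guard clause anticipates one of the limitations discussed next: a writing sample may contain no obligatory occasions of the targeted feature at all.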
Although obligatory occasion analysis seems well suited for some
contexts, its limitations make it inappropriate for comprehensive feedback. For
example, it may not be possible to identify all of the obligatory occasions for
every linguistic feature; nor is it appropriate for writing samples that
include no obligatory occasions for a particular linguistic feature.
Furthermore, the method has no way of accounting for lexical errors. Due to
these limitations, we recognized the need for an alternative measure of
accuracy.
Subsequently, we adapted an approach from Wolfe-Quintero, Inagaki &
Kim (1998), who recommended the ratio of errors over the total number of
T-units because of its high correlation with written language development.
While traditionally, this has involved one overall measure of error production,
we incorporated two small innovations in this study. First, rather than using
the ratio to provide one overall measure of error production, we examined
varying performance levels among the seven different error types within the
three error families.
Second, rather than focusing on inaccuracy, the formula (1–Errors / T-units)
was used in this study to express the accuracy for each linguistic domain. For
example, if a student produced six determiner errors within 30 total T-units,
the accuracy score for determiners would be (1– 6/30), or .80. Thus, this
text-centric analysis of accuracy is represented as a proportion of the text’s
overall communicative potential for each linguistic domain. Though this method
does not distinguish between the accurate use of these linguistic features and
their absence, all errors are targeted simultaneously. Thus, this method
provides a fairly comprehensive view of accuracy since it produces one measure
for each linguistic domain. Addressing all errors simultaneously also helps
soften concerns over learner performance strategies such as avoidance (e.g.
Truscott 2004, Xu 2009), which could be more problematic in studies which
target a single error type.
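A minimal sketch of this computation follows; the error counts are invented for illustration, and the worked determiner example from the preceding paragraph is reproduced.

def domain_accuracy(error_count, t_units):
    """Accuracy for one linguistic domain, expressed as 1 - (errors / T-units),
    following the adaptation of Wolfe-Quintero, Inagaki & Kim (1998) described above."""
    return 1 - (error_count / t_units)

# The worked example from the text: six determiner errors in 30 T-units.
print(domain_accuracy(6, 30))   # 0.8

# The score can fall below zero when errors in a domain outnumber T-units,
# as with some of the mechanical accuracy means reported in Table 1.
print(domain_accuracy(35, 30))  # approximately -0.17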
4.3.2 Scoring Guidelines
In addition to understanding how accuracy was measured, we also need to
examine how the writing was scored. Most errors were assigned a value of one
with no attempt to weight egregiousness. For example, a subject-verb agreement
error or a missing determiner was counted as one error each. However, since
those in the semantic error group affected meaning to varying degrees, an
attempt was made to account for this variability. One error was counted for
every word which was inserted inappropriately or every time an obligatory word
was missing. One error was counted for every word order error where one shift (whether of one word or a group of
words) could correct the error. In addition, the notion of awkwardness was
defined as a production error that was obviously distracting, though the
meaning of the construction seemed clear to the scorer. Such productions were
also counted as one error each.
Perhaps the most complex error type in the semantic error group was unclear meaning. Since scorers had no
way of verifying learner intent, a text-based method of analysis was devised to
quantify the proportion of the text which was incomprehensible. To qualify for
a clear meaning, a particular word would need to exhibit semantic clarity with
the word preceding and following it. For example, consider the following
construction: After working all day, the
work come bed TV sleep early. The breakdown in this construction begins
with the word work. Though the word work is preceded acceptably by the
article the, the verb come that follows violates syntactic
expectations. Therefore, the error counting begins with the word work and continues through the word sleep for a total unclear meaning error
value of five. The word early is not
counted as an error because its preceding word, sleep, concatenates acceptably with the word early.
In addition to these grammar-based errors, the lexical error family
included inaccurate word choices, word forms, and prepositions. Each of these
inaccuracies was considered as one error. However, only those errors which were
correctly spelled were eligible for these categories. Otherwise, such words
were considered as spelling errors. In addition to spelling, each problem with
punctuation or capitalization was counted as a mechanical error.
4.3.3 Reliability Estimates
Two
scorers, (S1) and (S2), jointly determined the number of T-units in each essay.
S1 scored
all of the compositions (94) for each of the seven error types, and S2 scored
just over half of the essays (48). Those scored by S2 were drawn from
stratified random samples from six essay categories based on pretests and
posttests within three levels of student proficiency (low, middle, or high).
Pearson correlation coefficients included mechanical accuracy, r=.98 (p<.001), determiner accuracy, r=.94 (p<.001),
semantic accuracy, r=.92 (p<.001), verb accuracy, r=.90 (p <.001), sentence structure accuracy, r=.87 (p<.001),
numeric agreement accuracy, r=.83 (p<.001), and lexical accuracy, r=.81 (p<.001). While these results exceeded the recommendation of 70%
consistency with a 30% overlap proposed by Stemler and Tsai (2008),
coefficients for some linguistic domains were smaller than expected due to
occasional discrepancies in error categorizations. Nevertheless, we believed
that these correlations provided enough evidence of reliability to warrant the
statistical analyses needed to address our research question.
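To illustrate the form of these reliability checks (with fabricated scores rather than the study's data), the Pearson coefficient between the two scorers' accuracy scores for a given domain could be computed as follows:

from scipy.stats import pearsonr

# Hypothetical mechanical accuracy scores assigned by the two scorers to the same essays.
s1_scores = [0.12, -0.05, 0.40, 0.25, 0.08, -0.20, 0.33, 0.15]
s2_scores = [0.10, -0.02, 0.42, 0.22, 0.11, -0.18, 0.30, 0.17]

r, p = pearsonr(s1_scores, s2_scores)
print(f"r = {r:.2f}, p = {p:.3f}")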
4.3.4 Statistical Analysis
With a significance level
of .05, we computed mixed model, repeated measures analyses of variance
(ANOVA). We also measured effect sizes using the partial eta squared (ηp²). Cohen
(1988) provided guidelines for interpreting these effect sizes, noting that .01 was small, .06 was moderate, and .14 or greater was large. Moreover,
Ferguson (2009:
533) reported .04 as the “recommended minimum effect size representing a ‘practically’
significant effect,” or the point at which a factor may have utility in
practice regardless of statistical significance.
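Because partial eta squared can be recovered from an F ratio and its degrees of freedom as ηp² = (F × df1) / (F × df1 + df2), the interpretation step can be sketched as follows; the thresholds are those of Cohen (1988) cited above, and the example uses the mechanical accuracy interaction reported in the Results section.

def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared computed from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

def interpret(eta_sq):
    """Label an effect size using Cohen's (1988) benchmarks."""
    if eta_sq >= 0.14:
        return "large"
    if eta_sq >= 0.06:
        return "moderate"
    if eta_sq >= 0.01:
        return "small"
    return "negligible"

# Mechanical accuracy interaction: F(1, 45) = 14.26.
eta = partial_eta_squared(14.26, 1, 45)
print(round(eta, 2), interpret(eta))  # 0.24 large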
5 Results
The research question
dealt with whether any of the seven mean accuracy scores from the posttest
essays would be significantly greater for the treatment group when compared to
the contrast group. Descriptive statistics for the mixed model ANOVA are
displayed in Table 1. Results were mixed in that some tests suggested a
meaningful difference while others did not. We will limit our presentation of
results to the group by time interactions, each of
which is examined in order of increasing effect size.
Accuracy      Group       Pretest M   Pretest SD   Posttest M   Posttest SD
Sentence      Contrast       .961        .046         .965         .035
              Treatment      .958        .052         .979         .031
Numeric       Contrast       .953        .051         .963         .047
              Treatment      .936        .069         .933         .082
Determiner    Contrast       .866        .151         .794         .171
              Treatment      .797        .161         .848         .153
Lexical       Contrast       .812        .100         .768         .115
              Treatment      .717        .168         .793         .152
Verb          Contrast       .743        .148         .693         .254
              Treatment      .726        .186         .794         .196
Semantic      Contrast       .688        .201         .708         .139
              Treatment      .649        .253         .811         .138
Mechanical    Contrast       .150        .400        -.030         .565
              Treatment     -.145        .690         .140         .606

Tab. 1: Descriptive Statistics by Accuracy Type and Group
We begin with the numeric
agreement accuracy scores, which included the appropriate use of count and
non-count nouns as well as the accurate production of singular and plural
constructions. Mean differences were not significant, F(1,45)=.30, p=.59, and
the effect size was negligible, ηp² < .01. Similarly, mean
differences in the degree to which student writing utilized complete sentences
were not significant, as shown by the mean sentence structure accuracy scores, F(1,45)=2.31, p=.31. This analysis also produced an effect size too small to
consider meaningful (ηp²=.02). Like the previous
analyses, the test examining verb accuracy scores, which included subject-verb
agreement and verb tense, was not statistically significant, F(1,45)=3.33, p=.08. Nevertheless, this test produced a moderate effect size (ηp²=.07), suggesting a
practical benefit to the treatment group.
The next three tests were
statistically significant and produced effect sizes at the border of moderate
to large favoring the treatment group. These included determiner accuracy, F(1,45)=6.11, p=.017 (ηp²=.12); semantic accuracy, F(1,45)=6.46,
p=.015 (ηp²=.13); and lexical accuracy, F(1,45)=6.48,
p=.014 (ηp²=.13). The final analysis was of mechanical accuracy, which included
spelling, capitalization, and non-sentence level punctuation. This test
produced a statistically significant interaction effect for group by time, F(1,45)=14.26, p<.001, and resulted in a fairly large effect size favoring the
treatment group (ηp²=.24).
6 Discussion
6.1 Analysis of the Findings
One salient finding from
this study is that the treatment influenced some linguistic domains more than
others. Though the precise reasons for these differential effects are unclear,
we note that at least some aspects from each of the three error families were
affected positively by the treatment (i.e. effect sizes showed that the
treatment benefited mechanical, lexical, and some forms of grammatical
accuracy), despite no observable benefits for numeric agreement and sentence
structure accuracy.
This lack of improvement
for numeric agreement and sentence structure could have been the result of a
number of possible factors. For example, while skill acquisition theory (as
operationalized here) benefited learners for most linguistic categories, it may
be inadequate for effecting greater accuracy for these linguistic domains.
Another explanation is the possibility of a ceiling effect. For example, since
pretest means for both groups ranged from .93 to .96 (see Table 1), there was
very little room for improvement, which might account for these negligible
differences.
In contrast, the treatment
had the greatest positive impact on the mechanical accuracy family, largely
made up of errors based on spelling, punctuation, and capitalization. This
result is consistent with claims made by Truscott (2007: 258) who suggested
that such errors “are among the most correctable.” Nevertheless, viewing
grammatical structures as more complex, Truscott (2007: 258) also has maintained
“correction may have value for some non-grammatical errors but not for errors
in grammar”.
Thus, the most compelling
results may be those associated with the grammatical error family. Although
differences between the treatment and contrast groups were not statistically
significant for verb accuracy scores, the analysis produced a moderate effect
size, suggesting a practical benefit for the treatment group. While these
benefits can only be seen as marginal, they seem consistent with focused
feedback studies, showing the benefits of WCF on improved accuracy for both
past tense (Bitchener et al. 2005, Sheen, 2007, Sheen et al. 2009) and present
tense (Lu 2010).
In
addition, statistically significant improvements were observed for determiner
accuracy as well as semantic accuracy. The effect size for determiner use was
at the border between moderate and large. These findings seem consistent with
focused feedback studies, which examined the benefits of WCF on article use
(e.g. Bitchener et al. 2005, Ellis et al. 2008, Ferris & Roberts 2001, and
Sheen 2007).
The final
category from the grammatical family examined in this study was semantic
accuracy, which produced an effect size at the threshold of moderate to large.
Semantic accuracy, as measured here, encompassed the application of a complex
body of knowledge, including appropriate word order and collocations which help
writers to avoid language that is awkward,
unclear, or simply unintelligible. Such findings seem inconsistent with claims
from Truscott (2007) that WCF is not beneficial
for improving grammatical accuracy.
Moreover,
when coupled with the apparent benefits of the treatment on lexical accuracy,
these findings seem to suggest that even linguistic features that have been
considered “untreatable” may benefit from WCF. For example, while Ferris (2010:
192) observed that most recent studies have been limited to linguistic features
that are “relatively easy . . . to define, describe, and teach,” she also
mentioned the more troublesome errors that “obscure meaning and interfere with
communication” including problems with “word order, sentence boundaries, phrase
construction, word choice, or collocations” (Ferris 2010:193).
Remarkably,
these are the very kinds of errors learners must overcome in order to write
with the semantic and lexical accuracy examined in this study. Though more
research targeting the effects of WCF on semantic and lexical accuracy is
needed, such findings should be encouraging for scholars such as Xu, who
suggested,
If writing teachers are to feel
more assured of error correction practice, we need evidence from some
linguistic features which are not so teachable but prove to be treatable. (Xu
2009: 275)
At a time
when many researchers continue to advocate focused WCF to prevent learner
overload, the findings of this study suggest that focused WCF may not be the
only way to ensure manageability. Although additional research and replication are needed in this line of inquiry, this study has demonstrated that the
principles associated with skill acquisition theory produced positive results
for an array of linguistic domains. However, the fact
that improvements were not observed equally for all linguistic domains could
suggest limitations in skill acquisition theory or how it was operationalized
in this study.
6.2 Limitations and Suggestions for Further Research
While this study provided
evidence of the benefits of dynamic WCF on broad linguistic domains, future
research should examine specific linguistic features with greater precision.
Another suggestion for future research relates to how errors are defined and
how accuracy is measured. Though the internal consistency for error identification
was 98% between the scorers, the difficulty of categorizing specific error
types varied. Better training and protocols may resolve these challenges. For
example, rather than simply categorizing a particular lexical item as accurate or inaccurate, it may be more effective to determine the extent to
which a specific lexical choice may be perceived as acceptable or unacceptable.
Moreover, since there may have been a ceiling effect for some variables, it may be helpful to include less proficient learners in
future research. The treatment may only facilitate greater accuracy for certain
linguistic domains at a particular range of proficiency. Finally, since this
study only examined the effects of the treatment during a 15-week semester, future
research should also analyze longitudinal data.
6.3 Pedagogical Implications
Despite their limitations,
the findings of this study may have some practical implications for pedagogy.
First, results have shown that a systematic application of the principles behind
skill acquisition theory may have a positive effect on the accuracy of L2
writing for both non-grammatical and grammatical errors without undermining
rhetorical competence. Second, our results underscore the assertion that
focused WCF may not be the only appropriate form of feedback for every learning
context: practitioners should be encouraged to explore what may be best for
their specific learners.
Nevertheless, potential
benefits would need to be carefully considered along with practical constraints
for each unique context. For example, some classes may not be able to meet
frequently, potentially undermining the expectation for writing tasks and
feedback to be timely and constant. Though technological solutions may
effectively remedy these challenges in some settings, there may be no simple
solutions in others. Nevertheless, if scholars can continue to clarify which
factors have the greatest impact on accuracy, practitioners in contexts in
which linguistic accuracy is among the priorities will be empowered to make
informed decisions about how best to help their students to improve the
accuracy of their writing.
7 Conclusion
Although further research
is needed to help us to fully understand the most effective ways to utilize WCF
to improve L2 writing accuracy, the findings of this study should be
encouraging for practitioners who are seeking insights to guide teaching and
learning in their classrooms. We hope that more scholars and practitioners will
take interest in this line of inquiry aimed at addressing overall linguistic
accuracy in traditional classroom settings. It seems that there may be more
than one appropriate way to guide L2 writers along the path of language development
and that learners will benefit from researchers and practitioners who utilize
pedagogical practices which help students maximize their opportunities to learn
to write more accurately within their unique learning contexts.
Appendices
Appendix B
Writing Tasks Used for Data Elicitation
Students were given 30 minutes to address the following writing tasks:
Pretest writing task administered before Week 1:
Do you agree or disagree with the following
statement? Only people who earn a lot of
money are successful. Use specific
reasons and examples to support your answer.
Posttest writing task administered in Week 15:
In your opinion,
what is the most important characteristic (for example, honesty, intelligence,
a sense of humor) that a person can have to be successful in life? Use specific
reasons and examples from your experience to explain your answer.
References
Anderson, T. (2010). The effects of tiered corrective feedback on second language academic writing. Unpublished thesis, University of British Columbia, Vancouver, Canada.
Bitchener, J. (2008). Evidence in support of written corrective feedback. Journal of Second Language Writing, 17, 102–118.
Bitchener, J. (2009). Measuring the effectiveness of written corrective feedback: A response to “Overgeneralization from a narrow focus: A response to Bitchener (2008).” Journal of Second Language Writing, 18(4), 276–279.
Bitchener, J., & Knoch, U. (2008). The value of written corrective feedback for migrant and international students. Language Teaching Research, 12, 409–431.
Bitchener, J., & Knoch, U. (2009a). The relative effectiveness of different types of direct written corrective feedback. System, 37, 322–329.
Bitchener, J., & Knoch, U. (2009b). The value of a focused approach to written corrective feedback. Language Teaching Research, 12, 409–431.
Bitchener, J., & Knoch, U. (2010a). The contribution of written corrective feedback to language development: A ten month investigation. Applied Linguistics, 31(2), 193–214.
Bitchener, J., & Knoch, U. (2010b). Raising the linguistic accuracy level of advanced L2 writers with written corrective feedback. Journal of Second Language Writing, 19, 207–217.
Bitchener, J., Young, S., & Cameron, D. (2005). The effect of different types of corrective feedback on ESL student writing. Journal of Second Language Writing, 14, 191–205.
Bruton, A. (2009). Improving accuracy is not the only reason for writing, and even if it were…. System, 37, 600–613.
Bruton, A. (2010). Another reply to Truscott on error correction: Improved situated designs over statistics. System, 38, 491–498.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Hillsdale, NJ: Erlbaum.
DeKeyser, R. (2001). Automaticity and automatization. In P. Robinson (Ed.), Cognition and second language instruction (pp. 125–151). Cambridge: Cambridge University Press.
DeKeyser, R. (2007). Skill acquisition theory. In B. Van Patten & J. Williams (Eds.), Theories in second language acquisition (pp. 97–113). Mahwah, NJ: Erlbaum.
Ellis, R., & Barkhuizen, G. (2005). Analyzing learner language. Oxford: Oxford University Press.
Ellis, R., Sheen, Y., Murakami, M., & Takashima, H. (2008). The effects of focused and unfocused written corrective feedback in an English as a foreign language context. System, 36, 353–371.
Erel, S., & Bulut, D. (2007). Error treatment in L2 writing: A comparative study of direct and indirect coded feedback in Turkish EFL context. Journal of Institute of Social Sciences, Erciyes University, 23, 397–415.
Evans, N. et al. (2011). The efficacy of dynamic written corrective feedback for university-matriculated ESL learners. System, 39, 229–239.
Evans, N., Hartshorn, K. J., & Allen, E. (2010). Written corrective feedback: Practitioner perspectives. International Journal of English Studies, 10, 47–77.
Evans, N., Hartshorn, K. J., McCollum, R. M., & Wolfersberger, M. (2010). Contextualizing corrective feedback in L2 writing pedagogy. Language Teaching Research, 14, 445–463.
Farrokhi, F., & Sattarpour, S. (2011). The effects of focused and unfocused written corrective feedback on grammatical accuracy of Iranian EFL learners. Theory and Practice in Language Studies, 1, 1797–1803.
Ferris, D. (1999). The case for grammar correction in L2 writing classes: A response to Truscott (1996). Journal of Second Language Writing, 8, 1–11.
Ferris, D. (2003). Response to student writing: Implications for second language students. Mahwah, NJ: Erlbaum.
Ferris, D. (2004). The “grammar correction” debate in L2 writing: Where are we, and where do we go from here? (and what do we do in the meantime . . . ?) Journal of Second Language Writing, 13, 49–62.
Ferris, D. R. (2006). Does error feedback help student writers? New evidence on the short- and long-term effects of written error correction. In K. Hyland & F. Hyland (Eds.), Feedback in second language writing: Contexts and issues (pp. 81–104). Cambridge: Cambridge University Press.
Ferris, D. R. (2009). Teaching college writing to diverse student populations. Ann Arbor: University of Michigan Press.
Ferris, D. R. (2010). Second language writing research and written corrective feedback in SLA: Intersections and practical applications. Studies in Second Language Acquisition, 32, 181–201.
Ferris, D. R., & Roberts, B. (2001). Error feedback in L2 writing classes: How explicit does it need to be? Journal of Second Language Writing, 10, 161–184.
Guénette, D. (2007). Is feedback pedagogically correct? Research design issues in studies of feedback on writing. Journal of Second Language Writing, 16, 40–53.
Hartshorn, K. J. et al. (2010). Effects of dynamic corrective feedback on ESL writing accuracy. TESOL Quarterly, 44, 84–109.
Hinkel, E. (2004). Teaching academic ESL writing: Practical techniques in vocabulary and grammar. Mahwah, NJ: Lawrence Erlbaum.
Hunt, K. W. (1965). Grammatical structures written at three grade levels. Urbana, IL: The National Council of Teachers of English.
Lalande, J. (1982). Reducing composition errors: An experiment. The Modern Language Journal, 66, 140–149.
Leki, I. (1991). The preferences of ESL students for error correction in college-level writing classes. Foreign Language Annals, 24, 203–218.
Lennon, P. (1991). Error: Some problems of definition. Applied Linguistics, 12, 180–196.
Long, M. H. (2007). Problems in SLA. Mahwah, NJ: Lawrence Erlbaum Associates.
Lu, Y. (2010). The value of direct and indirect written corrective feedback for intermediate ESL students. Unpublished master's thesis, Auckland University of Technology, Auckland, New Zealand.
Manchón, R. M. (2011). Situating the learning-to-write and writing-to-learn dimensions of L2 writing. In R. M. Manchón (Ed.), Learning-to-Write and Writing-to-Learn in an Additional Language (pp. 3–14). Philadelphia, PA: John Benjamins.
McLaughlin, B., Rossman, T., & McLeod, B. (1983). Second language learning: An information-processing perspective. Language Learning, 36, 109–123.
Ortega, L. (2011). Reflections on the learning-to-write and writing-to-learn dimensions of second language writing. In R. M. Manchón (Ed.), Learning-to-Write and Writing-to-Learn in an Additional Language (pp. 237–250). Philadelphia, PA: John Benjamins.
Robb, T., Ross, S., & Shortreed, I. (1986). Salience of feedback on error and its effect on EFL writing quality. TESOL Quarterly, 20, 83–91.
Sachs, R., & Polio, C. (2007). Learners’ uses of two types of written feedback on a L2 writing revision task. Studies in Second Language Acquisition, 29, 67–100.
Segalowitz, N., & Hulstijn, J. (2005). Automaticity in bilingualism and second language learning. In J. F. Kroll & A. M. B. De Groot (Eds.), Handbook of bilingualism: Psycholinguistic approaches (pp. 371–388). Oxford, UK: Oxford University Press.
Semke, H. (1984). The effect of the red pen. Foreign Language Annals, 17, 195–202.
Shadish, W., Cook, T., & Campbell, D. (2002). Experimental and quasi-experimental designs for generalized causal inferences. New York: Houghton Mifflin.
Sheen, Y. (2007). The effect of focused written corrective feedback and language aptitude on ESL learners’ acquisition of articles. TESOL Quarterly, 41, 255–283.
Sheen, Y. (2010a). The role of corrective feedback in second language acquisition. Studies in Second Language Acquisition, 32, 169–179.
Sheen, Y. (2010b). Differential effects of oral and written corrective feedback in the ESL classroom. Studies in Second Language Acquisition, 32, 203–234.
Sheen, Y., Wright, D., & Moldawa, A. (2009). Differential effects of focused and unfocused written correction on the accurate use of grammatical forms by adult ESL learners. System, 37, 556–569.
Stemler, S. E., & Tsai, J. (2008). Best practices in estimating interrater reliability. In J. Osborne (Ed.), Best practices in quantitative methods (pp. 29–49). Thousand Oaks, CA: Sage Publications.
Sternberg, R. J., & Sternberg, K. (2010). The psychologist’s companion (5th ed.). New York: Cambridge University Press.
Storch, N. (2009). The impact of studying in a second language (L2) medium university on the development of L2 writing. Journal of Second Language Writing, 18, 103–118.
Storch, N. (2010). Critical feedback on written corrective feedback research. International Journal of English Studies, 10, 29–46.
Truscott, J. (1996). The case against grammar correction in L2 writing classes. Language Learning, 46, 327–369.
Truscott, J. (1999). The case for “the case for grammar correction in L2 writing classes”: A response to Ferris. Journal of Second Language Writing, 8, 111–122.
Truscott, J. (2004). Evidence and conjecture on the effects of correction: A response to Chandler. Journal of Second Language Writing, 13, 337–343.
Truscott, J. (2007). The effect of error correction on learners’ ability to write accurately. Journal of Second Language Writing, 16, 255–272.
Truscott, J. (2010). Further thoughts on Anthony Bruton’s critique of the correction debate. System, 38, 626–633.
van Beuningen, C. G. (2010). Corrective feedback in L2 writing: Theoretical perspectives, empirical insights, and future directions. International Journal of English Studies, 10, 1–27.
Watson, C. B. (1982). The use and abuse of models in the ESL writing class. TESOL Quarterly, 16, 5–14.
Wolfe-Quintero, K., Inagaki, S., & Kim, H. (1998). Second language development in writing: Measures of fluency, accuracy, and complexity. Manoa, HI: University of Hawaii at Manoa.
Xu, C. (2009). Overgeneralization from a narrow focus: A response to Ellis et al. (2008) and Bitchener (2008). Journal of Second Language Writing, 18, 270–275.
Authors:
K. James Hartshorn, PhD
Associate Coordinator
162 UPC
Tel. (801) 422-4034
Fax. (801) 422-0804
E-mail: james_hartshorn@byu.edu
Norman W. Evans, PhD
Associate Professor
Linguistics & English Language
4050 JFSB
[1] The error-free T-unit ratio is
calculated as the number of error-free T-units over the total number of T-units
(Wolfe-Quintero, Inagaki, & Kim, 1998).