3(2) p. 6

WRITER'S WORKBENCH Analysis of Holistically Scored Essays

Stephen Reid and Gilbert Findlay

In 1981, Colorado State University (CSU), through the efforts of Kathleen Kiefer and Charles Smith, began to adapt Bell Laboratories' WRITER'S WORKBENCH (WWB) programs for use in college composition classes (Kiefer and Smith, 1983, 1984; Smith and Kiefer, 1983). After using WRITER'S WORKBENCH in composition courses for three years with over 6,000 CSU students, we wanted to determine to what degree the fifteen WWB programs--including the relatively sophisticated STYLE program--could measure essay quality. The only research to date correlating the WWB programs with essay quality is an unpublished report comparing the WWB STYLE program to holistically scored essays (King and Spring, 1982). Because CSU annually places 3,000 students into composition courses on the basis of holistically scored essays, we decided to run all the WWB programs on a selected sample of those placement essays and to perform statistical analyses on the nine programs whose output was easily quantifiable (see Appendix A). Our overall purpose was to determine to what degree, if at all, the WWB analyses correlated with essay quality, as measured by our graders' holistic scores.

Our analysis of correlation had two specific objectives, both related to the integration of WWB into CSU's composition program. First, knowing which of the WWB stylistic measurements correlate significantly with essay quality would enable us to advise students how to use the WWB output more effectively. Currently, teachers tend to emphasize one or another of the WWB programs, depending primarily on their own stylistic preferences. A significant correlation between essay quality and some WWB analyses, but not others, would suggest which WWB programs might prompt students to make effective revisions in an essay draft. The WWB analyses, of course, cannot understand essay content and thus cannot judge overall value, but stylistic deficiencies or excesses often point to, or correlate with, problems in development, coherence, or clarity. Second, the standard deviations for each style measurement would help us revise the CSU/WWB ranges and limits that were originally developed to alert students to necessary revisions. These limits establish, for example, an "acceptable" percentage of passive voice or an appropriate range of readability scores for writers in a college composition course.



Initially, we selected holistically scored essays from the 1982 Colorado State University Composition Placement Examination, which is given to every entering first-year student. The holistic scoring procedures, adapted from the format the Educational Testing Service uses to score the Advanced Placement Test, followed standard guidelines to ensure maximum reliability (Cooper, 1977; Charney, 1984). The essays were written on a fixed topic designed to assess the maturity of students' argumentative prose. Students were asked to respond to a controversial argument about competition in American life, an essay written especially for the placement examination under the pseudonym Dudley Erskine Devlin. The instructions required students to summarize the main ideas of the essay and then explain why they agreed or disagreed with it. The summary-response essays were scored on a 1-9 point scale (see Appendix B for the scoring guide). The scoring guide set out the criteria for the holistic scoring sessions: a complete summary, a clear focus of agreement or disagreement, and supporting details, facts, and examples. Although stylistic matters such as spelling, punctuation, and diction influenced a reader's overall judgment, they were not the only basis for the reader's score. The readers, high school and college composition teachers with previous experience in holistic scoring, were trained to apply the scoring guide and were restandardized with sample (anchor) essays every hour during the reading sessions. The holistic scores, then, represented an evaluation of overall writing quality and thus provided a standard of quality against which to compare the WWB's numerical evaluations. Because we needed samples with reliable scores, we selected 44 essays, distributed evenly across the 1-9 point range, that had been read by at least three readers. To ensure the maximum possible reliability, each selected essay had received three independent scores that differed by no more than one point (e.g., 5, 5, 4).

After selecting the essays, we typed them into WRITER'S WORKBENCH, ran all the WWB programs on every essay, and organized the data for SPSS analysis. Because our overall purpose was to determine which stylistic measurements best predicted essay quality, we computed simple correlations between the holistic scores and 27 style measurements (e.g., sentence length, readability, vague words, spelling, and the percentages of pronouns, nouns, adjectives, and so forth) taken from nine of the fifteen WWB programs. For the statistical calculations, the holistic scores were grouped into three categories: high-range essays (7-9), middle-range essays (4-6), and low-range essays (1-3).
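To make the correlational procedure concrete, the following Python sketch computes a Pearson correlation (r) and its square (r²) between holistic scores and one style measurement, and maps scores to the three ranges used here. It is an illustration only, not the SPSS procedure used in the study; the function names and data are hypothetical, and it assumes one integer holistic score per essay (how the three readers' scores were combined is not stated).

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 items)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations, and the two sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(ss_x * ss_y)


def score_range(score):
    """Map an integer 1-9 holistic score to the study's low/middle/high ranges."""
    if 1 <= score <= 3:
        return "low"
    if 4 <= score <= 6:
        return "middle"
    if 7 <= score <= 9:
        return "high"
    raise ValueError("holistic scores run from 1 to 9")


# Hypothetical data for five essays (not from the study).
scores = [2, 4, 5, 7, 9]
lengths = [300, 380, 450, 610, 820]
r = pearson_r(scores, lengths)
print(round(r, 3), round(r * r, 3), [score_range(s) for s in scores])
```

Squaring r gives values of the kind reported in the r² column of Table 1; the sign of r (lost in r²) shows the direction of the relationship.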

The resulting statistical data must, however, be taken with several caveats. First, although our essays were selected specifically for their high inter-reader reliability, holistic scores, even when the scoring is carefully designed and executed, carry an inter-reader reliability of only about .90 (Cooper, 1977). Second, the number of samples in this test (44) is adequate, but a larger sample, perhaps drawn from a wider range of writer abilities, would improve the accuracy of the statistical analyses. To help verify the initial conclusions presented in this study, we are currently replicating this research with another group of placement essays. Third, the results obtained here are, of course, specific to this particular sample of impromptu essays. Essays written on other topics, for other rhetorical purposes, or by another population of writers might well alter the conclusions suggested by this study. Finally, the WRITER'S WORKBENCH programs themselves do not identify all style measurements with complete accuracy. The parts-of-speech program, for example, is approximately 83% accurate, and passive voice identification is 85% accurate (Cherry, 1983).


Results of the statistical analyses appear in Tables 1 and 2. Table 1 reports the statistical significance of the 27 WWB stylistic measurements; nine of them showed a statistically significant correlation with essay quality. Table 2 indicates how CSU's WWB limits and ranges might be revised, based on the standard deviations for each stylistic measurement.

Significance of Stylistic Measurements

The statistical correlation of the WWB stylistic measurements with holistic scores (Table 1) showed nine variables with significance (p < .05). However, several stylistic measurements representing syntactic maturity that we had expected to correlate with quality, such as the percentage of complex sentences or of conjunctions, did not show significant correlation with the holistic scores.

Table 1

Significance of WWB Stylistic Measurements*

Measurement                   r²      Significance of r   F Prob
----------------------------  ------  -----------------   ------
Essay length                  .4980   .00001              .0002
Spelling (SPELL)              .3270   .00002              .0019
Kincaid (readability)         .3270   .00003              .0001
Avg. word length              .2700   .00015              .0007
% Content words               .1520   .00430              .0450
Avg. sentence length          .1450   .00520              .0060
% Long sentences              .1270   .00850              .0070
% Pronouns                    .1250   .00910              .0270
% Short sentences             .0950   .02000              .0990
% Abstract words (ABSTRACT)   .0800   .03000              .0480
% To be verbs (FINDBE)        .0580   .05600              .0180
% Nouns                       .0570   .05700              .2210
% Nominalizations (NOM.)      .0550   .06100              .2500
% Vague words (VAGUENESS)     .0450   .08200              .5830
% Adjectives                  .0300   .13000              .2800
% Conjunctions                .0270   .14200              .3930
% Compound/complex sent.      .0200   .17600              .1820
Type token                    .0140   .22000              .8370
% Passive voice (PASSIVE)     .0090   .26000              .1420
% Simple sentences            .0080   .27000              .3680
% Adverbs                     .0040   .33000              .8830
% Complex sentences           .0020   .36200              .7910
Hapax legomena                .0020   .36900              .3870
% Diction (DICTION)           .0010   .40600              .9330
% Compound sentences          .0010   .40800              .9520
% Prepositions                .0001   .46700              .9840
% Subject openers             .0001   .46500              .9870

*Unless otherwise specified in parentheses, measurements are from the WWB STYLE program.
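Table 1 reports an F Prob alongside the significance of r. The report does not spell out how F was computed; on the assumption that it comes from a one-way analysis of variance across the low, middle, and high score groups (the analysis of variance is mentioned elsewhere in this report), the following Python sketch computes that F statistic. The function name and the group values are hypothetical.

```python
def one_way_anova_f(groups):
    """F statistic for a one-way analysis of variance.

    groups: one list of measurement values per score range (e.g., low,
    middle, high). Returns (F, df_between, df_within).
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or any(len(g) == 0 for g in groups) or n <= k:
        raise ValueError("need 2+ non-empty groups and more values than groups")
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of values around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = k - 1
    df_within = n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical average-word-length values for three score groups.
low = [4.2, 4.3, 4.4]
middle = [4.4, 4.5, 4.6]
high = [4.6, 4.7, 4.8]
f, df_b, df_w = one_way_anova_f([low, middle, high])
```

Turning F into a probability requires the upper tail of the F distribution with (df_between, df_within) degrees of freedom; if SciPy is available, `scipy.stats.f.sf(f, df_b, df_w)` gives that tail, and `scipy.stats.f_oneway` performs the whole test in one call.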

The factor with the highest correlation (r² = .498) to the quality of writing was essay length (calculated in the STYLE program). Because the placement examination essays were written on a controlled topic and under a time limit (60 minutes), we interpret this unsurprisingly high correlation to mean that longer essays score higher because they demonstrate development within paragraphs, structural completeness, and scribal fluency (the skill of keeping the pen on the page and the prose flowing). Though the longest essay in the sample, at 877 words, was also rated best, it is not length itself that creates quality, of course; 877 nonsense words would be the reductio ad absurdum. But without the development, organizational completeness, and fluency that accompany the longer essays, there would be little probability of achieving quality.

The second highest correlation (r² = .327) was with spelling (calculated from the SPELL program). We devised a misspelling factor by calculating the number of misspelled words (counting repetitions of the same error only once) as a percentage of the total words in the essay. The holistic scorers were specifically asked to temper the negative influence of poor spelling on their evaluations unless an essay had "serious and repeated" problems with literacy (see Appendix B, scoring guide). The correlation is consistent with this instruction: the high-range essays (7-9) show insignificant spelling problems, while the low-range essays (1-3), as a group, had significant weaknesses in spelling, though in individual essays spelling was not necessarily the cause of low quality. These results suggest that spelling may be representative of other mechanical weaknesses, many of which (such as comma splices, subject-verb agreement errors, or sentence fragments) WRITER'S WORKBENCH cannot recognize. Thus, rather than being a superficial predictor of quality ratings (Stewart & Grobe, 1979; Grobe, 1981), a low spelling-error factor may be an important index of mechanical skill in composition.
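The misspelling factor described above can be sketched in Python as follows. The word list passed in is a hypothetical stand-in for the SPELL program's dictionary, and the simple tokenizer is an assumption; the sketch only illustrates the arithmetic (distinct misspellings as a percentage of all words).

```python
import re


def misspelling_factor(text, dictionary):
    """Distinct misspelled words as a percentage of total words.

    'dictionary' is a hypothetical set of known lowercase words standing in
    for SPELL's word list. A misspelling repeated in the essay counts once,
    matching the study's rule of discounting repetitions of the same error.
    """
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)*", text)
    if not words:
        return 0.0
    misspelled = {w.lower() for w in words if w.lower() not in dictionary}
    return 100.0 * len(misspelled) / len(words)


known = {"the", "cat", "saw", "dog"}
print(misspelling_factor("Teh cat saw teh dog.", known))  # 1 distinct error in 5 words -> 20.0
```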

The third most significant correlation (r² = .327) occurred with the Kincaid readability grade (STYLE program). This readability grade is calculated from a formula that combines and weights average sentence length and average syllables per word. Although other WWB research supports the high correlation of readability scores with essay quality (King and Spring, 1982), we were surprised by the high correlation, because readability is not a direct measurement of comprehensibility (Schwartz, 1983). We expected little prediction from this factor simply because it is so mechanical, the exact opposite of content, coherence, and support, which are the focus of holistic scoring. The implication of this high positive correlation is that quality impromptu prose, as measured by our sample, does indeed have longer sentences, possibly with more development, more subordination, or both, and that those sentences are made up of longer or more difficult words, suggesting a more mature vocabulary. We interpret this to show that although content overrides style in holistic scoring, measurements of sentence length and word length (as contained in Kincaid) are indicative of the relationship between quantitative style measurements and content. The Kincaid data were supported by the analysis of variance of average sentence length and word length (potentially redundant factors, because sentence length and word length in syllables both enter into the Kincaid formula). Average sentence length (STYLE program) is significant (r² = .145) and increases from 17.9 words per sentence in the low-range group to 22.8 words per sentence in the high-range group (Table 2).
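The Kincaid grade combines the two averages linearly. The sketch below uses the standard published Flesch-Kincaid grade-level coefficients (0.39, 11.8, and -15.59); we assume, but have not confirmed, that the STYLE program uses the same ones. Syllable counting, which STYLE performs automatically, is left out here: the function takes raw counts.

```python
def kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid grade level from raw counts.

    Uses the standard published coefficients; whether WWB STYLE uses
    exactly these values is an assumption.
    """
    if words <= 0 or sentences <= 0:
        raise ValueError("need at least one word and one sentence")
    avg_sentence_length = words / sentences      # words per sentence
    avg_syllables_per_word = syllables / words   # syllables per word
    return 0.39 * avg_sentence_length + 11.8 * avg_syllables_per_word - 15.59


# A hypothetical 400-word essay with 20 sentences and 600 syllables:
# 0.39 * 20 + 11.8 * 1.5 - 15.59 = 9.91
grade = kincaid_grade(400, 20, 600)
```

Holding syllables per word constant, the rise in average sentence length from 17.9 to 22.8 words would by itself add 0.39 × 4.9 ≈ 1.9 grade levels, which illustrates why sentence length and the Kincaid grade move together.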

Average word length (STYLE program), which had the fourth highest correlation (r² = .270), was our most surprising result. We had anticipated that the two most interesting indicators of vocabulary would be type token (the ratio of distinct words, or types, to total words, or tokens) and hapax legomena (the proportion of words used only once), because they would show a writer's command and range of vocabulary (Finn, 1977; Grobe, 1981). However, neither of these factors proved statistically significant. Instead of essay quality correlating highly with these two indicators, which would reveal a broad lexicon and identify quality word choices, we found a significant correlation between increased word length and essay quality (F Prob = .0007). As Table 2 shows, the low group's mean word length was 4.3 letters per word, and the high group's mean was 4.7. Though this difference (0.4 of a letter per word) is minute to a reader, it amounts to nearly ten percent of the low group's average word length. The implication is that, in these impromptu essays, the overall weight of longer words, indicating a mature lexicon, is associated with higher essay quality. This assumption is supported by a similar pattern for content words (STYLE program): the low-group mean percentage of such words (nouns, adjectives, non-auxiliary verbs, and adverbs) is 52.3%, and the high-group mean is 55.6%. Taken together, the average word-length descriptor and the percentage of content words (r² = .152) indicate lexical quality. These factors are almost impossible for a holistic scorer to recognize directly, though a scorer may approximate them by noting unusual or precise diction choices.

Table 2

WWB Stylistic Measurements: Limits and Ranges

                      Low      High     Std.     High Mean Limits  WWB Limits &
Measurement           Mean     Mean     Dev.     +1 SD    -1 SD    Suggestions
--------------------  -------  -------  -------  -------  -------  -----------
Essay length*         345      589      122      711      467      N/A
Spelling*             1.80     0.50     0.20     7.20     3.00
Kincaid*              8.00     12.20    2.10     14.30    10.10    13>x>9.5
Avg. word length*     4.30     4.70     .30      5.03     4.43
% Content words*      52.30    55.60    3.10     58.70    52.50
Sentence length*      17.90    22.80    3.10     25.90    19.70    x>15
% Long sentences*     10.60    15.70    3.90     19.60    11.80    x>15
% Pronouns*           9.70     7.20     2.60     9.80     4.60     x>3%
% Abstract*           2.10     2.90     .67      3.57     2.23     x<2%
% Short sentences     24.80    32.00    5.60     37.60    26.40    x>15
% To be verbs         40.90    37.90    10.00    47.90    27.90    x<20%
% Nouns               23.90    25.90    2.50     28.40    23.40
% Nominalization      2.60     3.40     1.20     4.60     2.20     x>2%
% Vague words         7.20     6.20     2.50     8.80     3.69     x<3%
% Adjectives          12.50    13.90    2.20     16.10    11.70
% Conjunctions        4.00     4.20     1.30     5.80     3.20     x>2%
% Compound/complex    16.90    21.70    8.30     30.00    13.40
Type token            48.50    47.40    5.00     52.40    42.40
% Passive             7.80     9.10     5.00     14.10    4.10     x<5%##
% Simple sentences    33.30    26.50    14.10    40.60    12.40    x<50#
% Adverbs             6.00     6.04     1.00     7.04     5.04     x>2%
% Complex sentences   40.30    43.70    15.00    58.70    28.70    x<50#
Hapax legomena        2.50     34.40    5.40     39.80    29.00
% Diction             1.70     1.80     .67      2.47     1.13
% Compound sentences  9.10     8.20     6.90     15.10    1.30
% Prepositions        .85      9.82     1.80     11.60    8.00
% Subject openers     4.60     64.50    12.30    76.80    52.20    x<75%

*Significant (p < .05). Factors are listed in order of significance.
#Sentence variety formula: x = %simple sentences - %complex sentences; -46.2 < x < 11.2 (WWB suggests -40 < x < 10).
##Bell Labs suggests x < 22% for scientific writing.
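The three vocabulary measures discussed above can be sketched as follows. The definitions are assumptions based on the descriptions in this report (type token as distinct words per hundred words; hapax legomena as words used exactly once per hundred words; word length in letters); WWB STYLE's own tokenization and definitions may differ.

```python
import re
from collections import Counter


def lexical_measures(text):
    """Average word length (letters), type-token ratio (%), and hapax legomena (%)."""
    words = [w.lower() for w in re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)*", text)]
    if not words:
        raise ValueError("text contains no words")
    counts = Counter(words)  # word -> number of occurrences
    n = len(words)
    letters = sum(sum(ch.isalpha() for ch in w) for w in words)
    return {
        "avg_word_length": letters / n,
        "type_token_pct": 100.0 * len(counts) / n,
        "hapax_pct": 100.0 * sum(1 for c in counts.values() if c == 1) / n,
    }


print(lexical_measures("The cat saw the dog."))
# {'avg_word_length': 3.0, 'type_token_pct': 80.0, 'hapax_pct': 60.0}
```

With the group means in Table 2, (4.7 - 4.3) / 4.3 ≈ 0.093, which is the "nearly ten percent" difference in word length noted above.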

The long-sentence descriptor from the STYLE program measures the percentage of sentences that are ten words longer than the essay's average, and the short-sentence descriptor measures the percentage of sentences that are five words shorter than the average. The former was a statistically significant factor (significance of r = .0085), but the latter fell short of statistical significance (significance of r = .020, but F Prob = .099). We assumed that these descriptors would reveal that increased sentence variety positively influences holistic scorers (Nold & Freedman, 1977). The analysis of variance confirms this for short sentences: their percentage increases steadily from the low group to the high group.

For the long-sentence factor, the high group had a much higher percentage (15.7%, compared with an 11.15% mean for the whole sample). However, the low group had a higher mean than the middle group. We attribute this anomaly to the number of comma splices in the low-range essays, which a holistic reader would note as errors but which the WWB STYLE program counts simply as long sentences. We anticipate that a slightly more sophisticated style-analysis program, such as EPISTLE (Neuwirth, 1984), would yield more accurate measurements for the long- and short-sentence descriptors.
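A minimal sketch of the long- and short-sentence descriptors follows, assuming "long" means more than ten words above the essay's average sentence length and "short" more than five words below it. The naive splitter, which ends sentences only at periods, question marks, and exclamation points, also shows why a comma splice is counted as one long sentence; it is an illustration, not STYLE's sentence parser, and the function names are hypothetical.

```python
import re


def sentence_lengths(text):
    """Word counts of sentences, splitting only at '.', '!' and '?' (naive)."""
    pieces = re.split(r"[.!?]+", text)
    return [len(p.split()) for p in pieces if p.strip()]


def long_short_percentages(lengths, long_margin=10, short_margin=5):
    """Percent of sentences well above / well below the essay's average length."""
    if not lengths:
        raise ValueError("no sentences")
    avg = sum(lengths) / len(lengths)
    n_long = sum(1 for n in lengths if n > avg + long_margin)
    n_short = sum(1 for n in lengths if n < avg - short_margin)
    return 100.0 * n_long / len(lengths), 100.0 * n_short / len(lengths)


# The comma splice below is counted as a single four-word sentence.
print(sentence_lengths("I ran. It rained, we left."))  # [2, 4]
print(long_short_percentages([5, 15, 15, 15, 35]))      # (20.0, 20.0)
```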


The next statistically significant factor in the WRITER'S WORKBENCH STYLE output was the percentage of pronouns (r² = .125). We had expected that, among the parts of speech, pronouns and prepositions would correlate positively with holistically judged quality: pronouns because they let a writer avoid unimaginative repetition and make more mature transitional connections, and prepositions because multiple prepositional phrases increase the potential for development. However, the pronoun correlation was negative; that is, the higher the percentage of pronouns, the lower the probability of quality writing. In addition, the scattergram showed such variety of pronoun use within each group that no conclusion could be drawn. Moreover, the correlation of prepositions with the holistic score was tied with subject openers for the lowest of all 27 factors tested (r² = .0001).

Percentage of abstract words (ABSTRACT program) was the last significant WWB factor (r² = .080). The analysis of variance shows a curious set of means. The low mean is 2.16%; both the middle and the high mean are 2.92%. Though the CSU modification to the WWB limits warns student writers that any abstract-word percentage over 2% is too high, our data suggest that the better student writers' percentages fall within a range of 2.23% to 3.57% (Table 2). Our tentative conclusion is that these middle- and upper-range writers can manipulate abstractions better than the more basic writers can. Conversely, the vague-word percentage (from the VAGUENESS INDEX), though not statistically significant, is highest for the low group and lowest for the high group. As the vagueness percentage drops, the abstract percentage rises. This may suggest that the quality writer can address liberty and justice rather than "all that sort of stuff."

The other factors tested did not correlate significantly with the holistic scores. Several key measurements from the STYLE program proved disappointing predictors. For instance, we assumed that a quality writer would be a sophisticated thinker, and that sophisticated thinking in writing would manifest itself in the STYLE analysis as a high percentage of compound/complex sentences. Compound/complex sentences also might be likely to contain coherence words or transitional markers that, if used wisely, would increase overall coherence (Sloan, 1984). Our study shows, however, that a high percentage of these sentences does not correlate with the top third of the holistically scored essays. This finding does not, of course, disprove our hypothesis connecting sophisticated thinking to subordination in sentences; the data merely suggest that, in these impromptu essays, no correlation exists between the mere number of such sentences and quality prose. In addition, several parts of speech measured in the STYLE program (conjunctions, adverbs, nouns, and adjectives) showed no significant correlation. Among the structural elements calculated in the STYLE program, subject openers were among the least predictive of all 27 factors, though King and Spring (1982) found statistically significant correlations with preposition and expletive openers.

Similarly, correlations from the other WWB programs did not meet our expectations, but in some cases the data invited us to reconsider some conventional assumptions. For example, there was little correspondence between the percentage of DICTION program "hits" (matches with infelicitous word choices) and low essay quality. The upper-range essays, in other words, did not have significantly fewer occurrences of words such as "aspect," "very," or "situation." Results from the NOMINALIZATION program (percentage of nouns ending in "-ance," "-ence," "-ion," or "-ment") showed that the high essay group used a higher percentage of nominalizations than did the low group, but the range within each group undercut the predictive value of this factor.
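The suffix test described for the NOMINALIZATION program can be sketched as below. This is a naive illustration under stated assumptions: it matches any word ending in -ance, -ence, -ion, or -ment and reports hits as a percentage of all words, so it also catches words such as "fence" and "million" that are not nominalizations. The program's actual word handling is not described in this report, and the function name is hypothetical.

```python
import re

NOMINAL_SUFFIXES = ("ance", "ence", "ion", "ment")


def nominalization_percentage(text):
    """Naive suffix match: percent of all words ending in a nominal suffix."""
    words = [w.lower() for w in re.findall(r"[A-Za-z]+", text)]
    if not words:
        raise ValueError("text contains no words")
    hits = [w for w in words if w.endswith(NOMINAL_SUFFIXES)]
    return 100.0 * len(hits) / len(words), hits


pct, hits = nominalization_percentage("The decision about the fence was a million times harder.")
print(pct, hits)  # 30.0 ['decision', 'fence', 'million']
```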

Passive voice (from the PASSIVE program), though not statistically significant as a predictor, also proved interesting. The mean percentages for the low, middle, and high groups were 7.8, 5.6, and 9.2, respectively, though the standard deviation for the high group was 5.0 (Table 2). The implication is that, contrary to conventional wisdom, higher-quality impromptu prose, as judged by holistic scorers, uses more, not fewer, passive constructions than lower-quality prose does.

Finally, the "to be" verb factor (from FINDBE) was the most disappointing of all the data we generated. Nold and Freedman (1977) found that a high percentage of auxiliary "be's" and "have's" weakened essays. Our data, however, had too much variance to correlate with essay quality: the percentages in the high group ranged from 22% to 55%. The tentative conclusion from this group of poorly correlated measurements seems to be that some anathemas of composition teachers (poor diction, nominalization, passive voice, and "to be" verbs) may not influence holistic scorers' judgment as significantly as we might think. Certainly, these results need to be replicated before we can draw any firm conclusions, but the suggestions are important enough, we believe, to warrant additional research in this area.


The calculations in Table 2 are most useful for modifying the limits and ranges that determine the PROSE program's advice to students. The CSU/WWB PROSE program, for example, prints out all of a student's uses of passive voice and advises him or her to reduce passive constructions (Appendix A). The current CSU/WWB limits and ranges used by the PROSE program were derived from the best essays written for our first-year composition course (Kiefer and Smith, 1984). In our case, we wished to use the placement examination data primarily to fine-tune the CSU/WWB limits and ranges and to re-examine, where appropriate, the advisability of setting any limits for variables that do not correlate with essay quality.

Comparing the standard-deviation figures in Table 2 with the current CSU/WWB limits and suggestions shows how these limits might be revised if we wished to use placement essays, in addition to revised essays, to help set WWB standards. In some cases, as with the Kincaid recommendation, the limits probably should correspond either to these high and low standard-deviation limits or to the current ones; in other cases, teachers may agree that, even though one standard deviation around the high mean for "to be" verbs in the better first-year essays spans 27.9% to 47.9%, students should, by fiat, use fewer than 20% "to be" verbs. Certainly, we should not allow students' "lazy" tendencies to become enshrined as models to imitate; however, our beliefs about what constitutes good writing should be tempered by repeated statistical measurement.
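The limit-setting arithmetic in Table 2 (high mean plus or minus one standard deviation) and a PROSE-style check of a student's values against those limits can be sketched in Python as follows. The three norms are copied from Table 2; the data structure and function names are hypothetical, and this is not the PROSE program's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Norm:
    """High-group mean and standard deviation for one measurement (Table 2)."""
    high_mean: float
    std_dev: float

    @property
    def upper(self):
        return self.high_mean + self.std_dev

    @property
    def lower(self):
        return self.high_mean - self.std_dev


# Three rows taken from Table 2.
NORMS = {
    "kincaid": Norm(12.20, 2.10),
    "avg_sentence_length": Norm(22.80, 3.10),
    "pct_to_be_verbs": Norm(37.90, 10.00),
}


def prose_check(measurements, norms=NORMS):
    """Flag each measurement that falls outside high mean +/- one SD."""
    flags = {}
    for name, value in measurements.items():
        norm = norms.get(name)
        if norm is None:
            continue  # no limit set for this measurement
        if value > norm.upper:
            flags[name] = "above range"
        elif value < norm.lower:
            flags[name] = "below range"
    return flags


student = {"kincaid": 8.0, "avg_sentence_length": 23.0, "pct_to_be_verbs": 50.0}
print(prose_check(student))  # {'kincaid': 'below range', 'pct_to_be_verbs': 'above range'}
```

A limit set by fiat, such as fewer than 20% "to be" verbs, would simply replace the computed bounds for that entry.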

The most important question to ask, in light of the data in Tables 1 and 2, is "Should we set limits at all for stylistic measurements that do not correlate with essay quality?" For example, the percentage of passive voice in these impromptu essays does not correlate significantly with essay quality, despite the prevailing instructional belief that active voice constructions are preferable. Here too, further research is needed to investigate how the frequency of passive voice in defined rhetorical contexts affects holistic scorers. Teachers using WWB in a specific course need to decide whether WWB should suggest an acceptable limit for passives (currently less than 5%) or merely print out a list of passives for students to recognize and change if appropriate. The same is true for nominalizations: should the WWB PROSE program, in fact, advise students to reduce nominalizations to fewer than 2% if nominalizations do not correlate significantly with essay quality? We may be misleading students when we say that stylistic niceties such as active voice are important if, in our grading (or holistic scoring), we weight such stylistic features far less than content or organization.



The overall purpose of this study was to determine which WWB stylistic measurements correlate with essay quality so that students and teachers can use quantitative analyses such as the WWB programs more effectively as a guide for revising first-draft prose. To assist effectively in a student's revision process, the WWB analyses, we assume, must meet two conditions: the quantitative measurements must correlate significantly with quality, and those analyses must lead to direct and practical improvements in the essay draft.

The results in Table 1 show that only nine of the stylistic measurements do, in fact, correlate with impromptu essay quality--and certainly not all the style factors we expected. The relatively simple quantitative measurements of essay length, word and sentence length, readability, spelling, and abstract words proved significant, while measurements of more sophisticated style elements, such as sentence structure and parts of speech (except pronouns), did not. These results give only an initial indication of which WWB outputs teachers should emphasize and which, possibly, they should ignore. For the moment, however, WWB users should recognize that some of the measurements of the STYLE program do not, on the basis of this small sample of impromptu placement essays, correlate significantly with essay quality.

To meet the second condition, that the WWB analysis should lead to direct improvements in a revision, even the measurements that do correlate with essay quality need to be used with some caution. Research that studies the relationship of revised essays to holistic scores should provide more accurate guidance in this area, but our initial findings suggest some starting points. Adjusting the percentage of abstract words and reducing spelling errors can be accomplished directly and may have a measurable effect on essay quality; however, students should not expect that direct changes in word length, sentence length, or readability will improve an essay. The correlation of average word length with quality, for instance, cannot be addressed directly in revision. Substituting longer words merely for the sake of length would change the data but not necessarily the quality of the revised draft. Perhaps indirectly, through active experience in reading and practice in writing, which increase active vocabulary, a student could meaningfully raise his or her average word length from the low mean (4.3) toward the high limit of about 5.0. The fact that these WWB measurements can be improved only indirectly suggests the overall importance of scribal fluency. Because sentence length, word length, and readability are all measurements of scribal fluency, the results of this study seem to suggest that more classroom time should be spent on improving scribal behavior than on practicing discrete grammatical and stylistic elements that do not correlate significantly with essay quality.

The revisions in limits and suggestions indicated by the "high mean limits" column of Table 2 should also help teachers and students use the WWB output more effectively. For the teacher, these data are most useful as a "description" of style and give quantitative support to the correlation of elements such as readability, sentence length, and percentage of long and short sentences with essay quality.


For the student, the direct use of such data in a revision is more difficult. If a student is within the suggested readability range, for example, he or she will feel satisfied, but what should the student do if the readability level is "too high" or "too low"? As with word length, he or she certainly should not attempt to change the readability score directly (as many of our students have tried to do, with some astonishing results). These measurements are effective primarily as an evaluative tool. If, for example, students have worked on contextualized sentence combining or vocabulary exercises, the results may appear as higher quantitative scores on their next essay. In short, the style measurements that correlate with essay quality and can be revised directly should be used by teachers and students differently from the measurements (sentence and word length, readability) that are primarily evaluative and can be improved only indirectly.

Overall, the results of this study indicate which quantifiable stylistic measurements correlate with holistic scores on these impromptu placement essays, and the degree to which these countable style elements influence trained essay scorers. At the beginning of CSU's WWB experiment, we used nearly all of the WWB analyses, merely because these programs could perform some complex analysis. As a result of this study, we are now attempting to determine, as Bridwell and Ross (1984) suggest, not just which WWB programs we can use but which we should use. Correlating WWB analyses with essay quality is an important way to help teachers modify WWB and other style-analysis programs for the most efficient and valid classroom use. Without evaluation and interpretation of its analyses, WRITER'S WORKBENCH or similar quantitative analyses may risk misdirecting or misadvising the student writer.


Appendix A

Below is a list of all fifteen WRITER'S WORKBENCH programs; an asterisk (*) indicates that the program was used to produce the statistical correlations in this report. Descriptions of the programs are taken from Kiefer and Smith (1984).

ORGANIZATION prints the first and last sentence of each paragraph to give the writer an outline of an essay.

DEVELOPMENT counts words in each paragraph and compares those figures with averages drawn from sample papers.

*FINDBE capitalizes and underlines all forms of "to be."

*DICTION highlights any of about 500 wordy, overused, misused, sexist, and inflated words and phrases.

SUGGEST follows the text with possible substitutions for words and phrases highlighted by DICTION.

*VAGUENESS INDEX flags any of 140 vague or general words.

*SPELL lists typographical and spelling errors.

CHECK lists commonly confused homophones and word pairs when the writer has used one of the words in the text.

PUNCTUATION checks for missing parentheses and for patterns of punctuation, such as periods followed by capital letters.

GRAMMAR lists most split infinitives and misuses of "a" and "an."

*PROSE compares values for ten stylistic criteria in a student's paper with standards derived from the best papers written for that course. When the student's value falls outside a range of +/- one standard deviation from the mean, the program suggests improvements (see Table 2).

*PASSIVE prints out all passive sentences in a student's text.

*NOMINALIZATION prints out all sentences with nominalized words (nouns ending in "-ance," "-ence," "-ion," or "-ment").

*STYLE summarizes information about sentence length and type, sentence openers, and word-class counts.

*ABSTRACT compares words in a text to a dictionary of 314 abstract words (determined from psycholinguistic research).
9-8 The upper-range responses satisfy the following criteria: a. Summary--The summary should identify Devlin's thesis (the destructiveness of competition) and note in some detail the three areas of support (school, sports, and social life). b. Focus of agreement and/or disagreement--Agr/disagr may be complete or partial, but the writer must make clear what he/she is agreeing/disagreeing with. Specifically, upper-range papers must deal with the three areas which Devlin discusses--not just with competition generally. c. Support for agreement and/or disagreement--Support should provide relevant and concrete examples from the writer's experience/general knowledge and/or an analysis of Devlin's argument. d. Style and coherence--These papers contain few repeated errors in grammar and mechanics. They demonstrate clear
p. 28

style, overall organization, and consecutiveness of thought.
7 This grade should be used for papers which fulfill the basic requirements for the 9-8 grade but are thinner.
6-5 Middle-range papers omit or are deficient in one of these four criteria:
a. Summary--The summary is absent or contains only a sketchy reference to the article. OR
b. Focus of agreement/disagreement--What the writer is agreeing/disagreeing with is not clear or is not related to Devlin's main argument. OR
c. Support--The writer's examples are not distinguishable from examples given in the article, or the writer's analysis is only counterassertion. OR
d. Style and coherence--These papers are loosely organized or contain noticeable errors in grammar and mechanics.
4 This grade should be used for papers which fulfill the basic requirements for the 6-5 grade but are slightly weaker.
3-2 Lower-range papers are deficient in two or more of the criteria. Typically, these papers weakly paraphrase Devlin's essay, or they have serious organization/coherence
p. 29

problems, OR they have serious, repeated errors in grammar and mechanics.
1 This grade should be given to a paper which has almost no redeeming qualities.
Note: Essays written in relatively fluent, stylistic prose may be scored one point higher than the guide would normally permit.
Bridwell, L., & Ross, D. (1984). Integrating computers into a writing curriculum; or, buying, begging, and building. In W. Wresch (Ed.), The computer in composition instruction: A writer's tool. Urbana, IL: National Council of Teachers of English.

Brossell, G. (1983). Rhetorical specification in essay examination topics. College English, 45, 165-173.

Charney, D. (1984). The validity of using holistic scoring to evaluate writing: A critical overview. Research in the Teaching of English, 18, 65-81.

Cherry, L. L. (1983). A study of part's and style's performance on a real student text. Unpublished paper, Murray Hill, NJ, AT&T Bell Laboratories.

p. 30

Cherry, L. L., & Vesterman, W. (1981). Writing tools--The STYLE and DICTION programs. Computing Science Technical Report #91, Murray Hill, NJ, Bell Laboratories.

Cooper, C. (1977). Holistic evaluation of writing. In C. R. Cooper & L. Odell (Eds.), Evaluating writing: Describing, measuring, judging. Urbana, IL: National Council of Teachers of English.

Finn, P. J. (1977). Computer-aided description of mature word choices in writing. In C. R. Cooper & L. Odell (Eds.), Evaluating writing: Describing, measuring, judging. Urbana, IL: National Council of Teachers of English.

Grobe, C. (1981). Syntactic maturity, mechanics, and vocabulary as predictors of quality ratings. Research in the Teaching of English, 15, 75-85.

Kiefer, K., & Smith, C. (1983). Textual analysis with computers: Tests of Bell Laboratories' computer software. Research in the Teaching of English, 17, 201-214.

Kiefer, K., & Smith, C. (1984). Improving students' revision and editing: The WRITER'S WORKBENCH. In W. Wresch (Ed.), The computer in composition instruction: A writer's tool. Urbana, IL: National Council of Teachers of English.

p. 31

King, G. W., & Spring, C. (1982). Computer-guided revision of prose. Unpublished manuscript, Basic Skills Research Program, University of California, Davis.

Neuwirth, C., Kaufer, D., & Geisler, C. (1984). What is EPISTLE? Computers and Composition, 1(4), 1-2.

Nold, E., & Freedman, S. (1977). An analysis of readers' responses to essays. Research in the Teaching of English, 11, 164-174.

Petersen, B., Selfe, C., & Wahlstrom, B. (1984). Computer-assisted instruction and the writing process: Questions for research and evaluation. College Composition and Communication, 35, 98-102.

Schwartz, H. (1982). Monsters and mentors: Computer applications for humanistic education. College English, 44, 141-152.

Schwartz, H. (1984). Teaching writing with computer aids. College English, 46, 239-247.

Sloan, G. (1984). The frequency of transitional markers in discursive prose. College English, 46, 158-179.

Smith, C., & Kiefer, K. (1983). Using the WRITER'S WORKBENCH programs at Colorado State University. In S. K. Burton & D. D. Short (Eds.), Sixth international conference on computers and the humanities. Rockville, MD: Computer Science Press.

p. 32

Stewart, M., & Grobe, C. (1979). Syntactic maturity, mechanics of writing, and teachers' quality ratings. Research in the Teaching of English, 13, 207-215.

Wresch, W. (1982). Computers in English class: Finally beyond grammar and spelling drills. College English, 44, 483-490.

Wresch, W. (1983). Computers and composition instruction: An update. College English, 45, 794-799.

Stephen Reid and Gilbert Findlay teach at Colorado State University, Fort Collins, Colorado.