Rubric-Scored Written Products

Purpose
Rubric scoring of students' written products lets the researcher turn qualitative characterizations of student work into quantitative variables that can be analyzed statistically. As a way of assessing the effectiveness of an instructional intervention, the technique supports a wide range of analyses: it can be applied to almost any aspect or category of student thinking and learning. Moreover, because it distinguishes among levels of performance within each category, it yields more finely detailed characterizations of students' products than simple "grading" usually does.

Description
At root, scoring rubrics are nothing more than explicit evaluation criteria for assessing the outcomes of a given pedagogy or technology. Rubrics consist of: (1) a set of categories – features or aspects of student work that are of interest to the analysis, such as "use of course concept x" or "degree of reflection"; and (2) hierarchical levels of performance within each category, such as "0 – course concept x not used," "1 – course concept x inappropriately used," "2 – course concept x appropriately used but not justified," and "3 – course concept x appropriately used and justified."
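One way to make this two-part structure concrete is to store each category together with its levels in order, so that a level's position doubles as its score. The sketch below is a minimal illustration assuming Python 3.9 or later; the `Category` class and its `score_for` method are hypothetical names, not part of any standard tool. A complete rubric would simply be a list of such categories.

```python
from dataclasses import dataclass


@dataclass
class Category:
    """One rubric category: a feature of student work plus its ordered performance levels.

    Hypothetical helper for illustration; levels are listed from lowest to highest,
    so a descriptor's index in the list is its score.
    """
    name: str
    levels: list[str]

    def score_for(self, descriptor: str) -> int:
        """Return the ordinal score for a level descriptor (raises ValueError if unknown)."""
        return self.levels.index(descriptor)


concept_x = Category(
    name="use of course concept x",
    levels=[
        "course concept x not used",                              # 0
        "course concept x inappropriately used",                  # 1
        "course concept x appropriately used but not justified",  # 2
        "course concept x appropriately used and justified",      # 3
    ],
)

print(concept_x.score_for("course concept x appropriately used but not justified"))  # → 2
```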

Scoring rubrics can be quantified in two ways. First, the levels of performance can be assigned an ordered set of numbers, as in the previous example. Second, categories can be assigned weights so that more important categories account for a larger share of each student's total score.
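A minimal sketch of both steps, assuming each category is scored on ordered levels starting at 0. The category names come from the examples above; the weights (concept use counting twice as much as reflection), the 0 to 3 range for "degree of reflection," and the function name `weighted_percent` are illustrative assumptions. Reporting the weighted sum as a percentage of the highest possible weighted sum is one convenient choice, not the only one.

```python
# Illustrative weights (an assumption): concept use counts twice as much as reflection.
WEIGHTS = {"use of course concept x": 2, "degree of reflection": 1}
# Highest level in each category (assumed); levels run from 0 up to this value.
MAX_LEVEL = {"use of course concept x": 3, "degree of reflection": 3}


def weighted_percent(levels: dict[str, int]) -> float:
    """Weighted sum of a student's category levels, as a percentage of the maximum possible.

    Each level is multiplied by its category weight; dividing by the weighted sum of the
    maximum levels puts every student on the same 0-100 scale.
    """
    earned = sum(WEIGHTS[c] * levels[c] for c in WEIGHTS)
    possible = sum(WEIGHTS[c] * MAX_LEVEL[c] for c in WEIGHTS)
    return 100 * earned / possible


student = {"use of course concept x": 2, "degree of reflection": 3}
print(round(weighted_percent(student), 1))  # → 77.8  (7 of 9 weighted points)
```

With equal weights the same student would earn 5 of 6 points (83.3%); doubling the weight on concept use lets the weaker category pull the total down further.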

Rubrics can be general (applicable to many products) or specific (tailored to a particular product); they can also be analytic (scoring each category of performance separately) or holistic (assigning a single overall rating to an entire product).

General Requirements
Each category and each level of performance within the scoring rubric must be clearly defined. It is standard procedure to have a second coder score a subset of the data (typically 10–15%) to check that the rubric can be applied reliably; an interrater agreement of 80% or better is normally expected. Make sure that each category and each level of performance is meaningful. If, for example, you cannot systematically distinguish between performance that is "5 Excellent" and performance that is "4 Good," consider collapsing the two codes into one. Also, a portion of the data may be very difficult (if not impossible) to code, for example when a statement is too vague to assign to any category. In such instances, it may be better to leave the uncertain data uncoded and report this with your findings (e.g., "We coded 80% of the entire data corpus; the remaining 20% was ambiguous and could not be coded.")
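The reliability check can be sketched as follows, assuming each coder's scores are stored as a mapping from product ID to level. The function names, the 15% default, and the six pairs of scores are hypothetical. The example also shows how merging two levels that coders keep confusing (here "5 Excellent" and "4 Good") can raise agreement.

```python
import random


def draw_reliability_subset(item_ids: list, fraction: float = 0.15, seed: int = 0) -> list:
    """Randomly choose a share of products (e.g. 10-15%) for a second coder to score."""
    rng = random.Random(seed)  # fixed seed so the subset can be reproduced
    k = max(1, round(len(item_ids) * fraction))
    return sorted(rng.sample(item_ids, k))


def percent_agreement(coder_a: dict, coder_b: dict) -> float:
    """Proportion (0-1) of jointly coded products given the same level by both coders."""
    shared = coder_a.keys() & coder_b.keys()
    if not shared:
        raise ValueError("the two coders have no products in common")
    matches = sum(coder_a[i] == coder_b[i] for i in shared)
    return matches / len(shared)


def collapse(codes: dict, mapping: dict) -> dict:
    """Recode levels that cannot be told apart, e.g. {5: 4} folds "5 Excellent" into "4 Good"."""
    return {item: mapping.get(level, level) for item, level in codes.items()}


# Hypothetical 1-5 scores from two coders for the same six products.
coder_a = {1: 5, 2: 4, 3: 3, 4: 5, 5: 2, 6: 4}
coder_b = {1: 4, 2: 5, 3: 3, 4: 5, 5: 2, 6: 4}

print(draw_reliability_subset(list(range(1, 101))))  # 15 product IDs out of 100
print(percent_agreement(coder_a, coder_b))           # 4 of 6 match, below the 80% bar
merged = {5: 4}
print(percent_agreement(collapse(coder_a, merged), collapse(coder_b, merged)))  # → 1.0
```

Simple percent agreement does not correct for agreement expected by chance; Cohen's kappa is a common chance-corrected alternative when a stricter reliability measure is wanted.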

Limitations
Developing an adequate rubric for a given student product takes time and often several rounds of revision. Rubrics developed "top down," from a priori course goals or from categories of interest to the researcher, rather than "bottom up," from patterns that recur in the data itself, can miss important aspects of student performance. One compromise is to work in both directions: begin with the categories of interest, then add or revise categories as you work with the data and discover other features worth capturing.

Example Research Studies

Additional Resources