FOSS Assessment Corner: Why Code Rather than Score?

Kathy Long, FOSS Assessment Coordinator, Lawrence Hall of Science
September 09, 2014 | Assessment

In past installments of the "FOSS Assessment Corner," I've provided an overview of the FOSS Assessment System (FOSS Newsletter, Fall 2013), a more detailed description of embedded and benchmark assessments, and a peek at FOSSmap, the computer program FOSS users can access so that students can take assessments online and teachers can run reports that help guide instruction and differentiate student learning needs (FOSS Newsletter, Spring 2014). In this issue, I'll talk about some of the criteria we use to design items and describe student progress levels based on assessment responses.

How Are the Benchmark Assessments Designed?

All of the benchmark assessments for the FOSS Third Edition have the same basic item types. They usually include 8–10 multiple-choice items (mark the one best answer), multiple-answer items (mark all that apply), and short-answer items. In addition, there are two to three open-response items for which students need to write answers.

There are three levels of items on the tests. Level I items examine the pieces of knowledge students are acquiring and their command of academic vocabulary. Level II items look at how students are beginning to connect pieces of knowledge in order to demonstrate emerging conceptual understandings. Level III items require students to apply the science knowledge and practices that they have learned to answer questions or solve problems presented in new contexts.

Progress Level Chart for the FOSS Assessment System

The benchmark assessments are constructed to provide information about students' conceptual progress rather than simply show mastery of minimum competencies. This is an important distinction. The benchmark assessments include a wide range of item difficulties, so you should not expect most of your students to get 100% on the test. If they did, you wouldn't get much information about where students still need to work on understanding. Because of this design, you will need to adjust your evaluation criteria if you are going to use the benchmark assessments as a tool for giving grades. (See more about grades in a later question.)

Why Code Rather than Score?

Items on the benchmark assessments are coded, rather than scored. A coding guide, along with an answer sheet, is provided for each item in the Assessment chapter. We code rather than score in an effort to identify the level of understanding revealed by a student's response, rather than simply noting whether the answer was right or wrong or adding up points for correct answers. Coding provides information about the quality (depth and flexibility) of learning, rather than the number of correct answers. There are four levels of codes (see the Progress Level Chart). By describing these levels, we give teachers and students information about the complexity of the thinking students are able to demonstrate and about what they need to do to improve their responses (and therefore their understanding).

You may also have noticed that some of the item codes range from 0–2, others from 0–3, and still others from 0–4. The range of the codes for each question depends on the level of complexity of the question. For example, if the question is a Level II question (medium in complexity and requiring students to connect pieces of information), then the highest code possible for that question is a 3. If, on the other hand, the question is highly complex or requires that knowledge be applied to a new situation (Level III), then the highest code would be a 4. The Progress Level Chart for the FOSS Assessment System shows the framework we've used for classifying the questions and determining the range of the coding guides.
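For readers who keep their own records in a spreadsheet or script, the relationship between item level and code range can be written down in a few lines. This is only a sketch, not part of FOSSmap: the names MAX_CODE_BY_LEVEL and is_valid_code are made up for illustration, and the top code of 2 for Level I items is an assumption inferred from the 0–2 range mentioned above (the article states the top codes only for Levels II and III).

```python
# Hypothetical lookup: item level -> highest code an item at that level can receive.
# Levels II and III follow the article; the Level I entry is an assumption
# inferred from the 0-2 code range.
MAX_CODE_BY_LEVEL = {1: 2, 2: 3, 3: 4}


def is_valid_code(item_level, code):
    """Return True if `code` falls within the coding range for an item of `item_level`."""
    return 0 <= code <= MAX_CODE_BY_LEVEL[item_level]


# A Level II item can be coded 0 through 3, so a 4 would be a recording error.
print(is_valid_code(2, 3))
print(is_valid_code(2, 4))
```

A check like this is handy mainly for catching data-entry slips before looking at patterns in the codes.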

How Do I Give Grades if I Am Coding Rather than Scoring?

Because the coding structure as well as the item difficulty is somewhat different from what you would find on a typical test, you cannot simply add up points and give grades based on percentages. A better way to determine a grade would be to look at the frequency of the codes a student receives. (This is a simplified version of what FOSSmap does. FOSSmap statistically relates item difficulties with student performances to determine students' levels of progress.) If a student is getting 2s most of the time, then he or she is at the recognition level. That might translate to a B or C depending on grade level. If he or she is able to answer at the 3 and 4 levels on the questions that are more complex, then he or she would be at the conceptual or strategic levels and might warrant an A or B. As always, teacher judgment is an important part of the formula.
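The frequency approach described above can be sketched as a simple tally. This is not how FOSSmap works (it uses a statistical model); it only counts how often each code appears for one student. The function names are hypothetical, the pairing of code 3 with "conceptual" and code 4 with "strategic" is an assumption based on the wording above, and breaking ties toward the lower code is a choice made here for caution, not a FOSS rule.

```python
from collections import Counter

# Level names for codes 2-4 follow the article's wording; matching 3 to
# "conceptual" and 4 to "strategic" is an assumption. Codes 0 and 1 are left
# to the Progress Level Chart.
LEVEL_NAMES = {2: "recognition", 3: "conceptual", 4: "strategic"}


def code_frequencies(codes):
    """Count how many times each code appears in one student's responses."""
    return Counter(codes)


def most_frequent_code(codes):
    """Return the code the student received most often.

    Ties go to the lower code (an assumption made here, not a FOSS rule).
    """
    if not codes:
        raise ValueError("no codes recorded for this student")
    counts = Counter(codes)
    top = max(counts.values())
    return min(code for code, n in counts.items() if n == top)


# Made-up codes for one student across eight items.
codes = [2, 2, 3, 2, 1, 3, 2, 4]
modal = most_frequent_code(codes)
print(dict(code_frequencies(codes)))
print(modal, LEVEL_NAMES.get(modal, "see the Progress Level Chart"))
```

The tally is a starting point for the teacher's judgment, not a replacement for it: which items earned the 3s and 4s matters as much as how many there were.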

If you do need to use percentages, due to policy or practice at your school or district, then we recommend that you adjust the scale a bit. Instead of making 90% an A, use 80%. Again, the tests have been constructed to help you assess progress so that you can help all students advance, not just to check whether students have mastered the basic facts. That means that some of the items are more difficult, and therefore we expect fewer students to get those high percentages. A thoughtful shift downward solves the problem.
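If grades are computed in a script or gradebook formula, the downward shift might look like the sketch below. Only the A cutoff (80% instead of 90%) comes from the recommendation above. The conventional 90/80/70/60 scale, the idea of shifting every cutoff by the same 10 points, and the names CONVENTIONAL_CUTOFFS and letter_grade are all assumptions for illustration; your school's scale may differ.

```python
# Assumed conventional scale; only the shifted A cutoff (90 -> 80) is taken
# from the article. Shifting the other cutoffs by the same amount is an
# assumption.
CONVENTIONAL_CUTOFFS = [("A", 90), ("B", 80), ("C", 70), ("D", 60)]


def letter_grade(percent, shift=10):
    """Map a percentage to a letter grade after lowering each cutoff by `shift` points."""
    for letter, cutoff in CONVENTIONAL_CUTOFFS:
        if percent >= cutoff - shift:
            return letter
    return "F"


# With the shift, 82% earns an A; on the unshifted scale it would be a B.
print(letter_grade(82))
print(letter_grade(82, shift=0))
```

Keeping the shift as a parameter makes it easy to try a different adjustment if 10 points turns out to be too much or too little for a particular class.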

If you have further questions about this aspect of the assessment system or would like to provide feedback, we're always happy to hear from you. Please send your comments to