All test questions (called items) within Georgia assessments are carefully written, reviewed, and evaluated to ensure they are valid, reliable, and fair. The Standards for Educational and Psychological Testing (2014) hold validity, reliability, and fairness as the most important considerations for judging assessment quality. These three criteria must be considered together because, for example, an assessment could be reliable but unfair to certain groups of students, and such a finding would weaken claims of assessment quality. For this reason, the GaDOE employs multiple steps to ensure all items are as fair as possible to all students and all student groups.
The GaDOE requires its test developers to use the Universal Design for Learning (UDL) framework, which emphasizes maximizing access and minimizing barriers so all students have an opportunity to show what they know. This approach specifies criteria such as avoiding unnecessary or overly complex wording and removing confusing graphics or other potential barriers to accessing the content of the assessment.
In addition to UDL, the GaDOE engages in an extensive item review process with committees of Georgia educators. During this review, all potential reading passages, as well as test items for all content areas, are examined to ensure they are not offensive or culturally inappropriate for any student group. Items are also reviewed for alignment to the state-adopted content standards, cognitive complexity, and accessibility for all students.
Items that pass the item review process are then placed on an operational form, where they are administered to students in a field test capacity. Field test items do not contribute to student scores, but the field test data on each item is collected and used to further evaluate the quality of the item. After test items are field tested, additional reviews are conducted based on the field test data. At this time, items are evaluated for what is called Differential Item Functioning (DIF). DIF evaluates whether an item performs differently for one subgroup of students than for another subgroup after the groups have first been matched on ability level. Items that appear to disadvantage certain groups of students are flagged for another round of review by Georgia educators and may ultimately be removed from the item pool. Field test item data is also reviewed to ensure each item reliably provides useful information about student performance. Once an item passes field test evaluation, it is included in the item pool and may be selected for placement on an operational test form. Operational items contribute to student scores and are further evaluated against technical quality criteria during the scoring process.
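To make the idea of a DIF check concrete, the sketch below uses the Mantel-Haenszel procedure, one common way of comparing matched groups on a single item. The GaDOE documentation above does not specify which DIF method, software, or flagging thresholds are used operationally, so the method choice, the Python/NumPy implementation, and the threshold mentioned in the comments are illustrative assumptions only, not the actual review procedure.

```python
import numpy as np

def mantel_haenszel_dif(item_correct, group, total_score):
    """Illustrative Mantel-Haenszel DIF check for one dichotomously scored item.

    item_correct : 0/1 array, whether each student answered the item correctly
    group        : 0/1 array, reference group (0) vs. focal group (1)
    total_score  : array of total test scores used to match students on ability
    Returns the MH common odds ratio and the delta-scale statistic.
    """
    item_correct = np.asarray(item_correct)
    group = np.asarray(group)
    total_score = np.asarray(total_score)

    num = 0.0  # running sum of A_k * D_k / N_k across ability strata
    den = 0.0  # running sum of B_k * C_k / N_k across ability strata
    for s in np.unique(total_score):      # each stratum = students matched on total score
        mask = total_score == s
        ref, foc = (group[mask] == 0), (group[mask] == 1)
        right, wrong = (item_correct[mask] == 1), (item_correct[mask] == 0)
        a = np.sum(ref & right)   # reference group, correct
        b = np.sum(ref & wrong)   # reference group, incorrect
        c = np.sum(foc & right)   # focal group, correct
        d = np.sum(foc & wrong)   # focal group, incorrect
        n = a + b + c + d
        if n > 0:
            num += a * d / n
            den += b * c / n

    odds_ratio = num / den                 # values above 1 favor the reference group
    mh_delta = -2.35 * np.log(odds_ratio)  # delta scale; large |delta| (e.g., >= 1.5) is often flagged
    return odds_ratio, mh_delta
```

In this sketch, students are matched on total test score, and an item whose delta-scale value is large in magnitude would be the kind of item flagged for an additional round of educator review.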
Items are field tested to allow for a thorough review of how well each item measures achievement before that item is used operationally. All items included on the Georgia Milestones assessment and GAA 2.0 undergo this rigorous review process, including review by Georgia educators and field testing with Georgia students. Remember – these test items are developed specifically for Georgia assessments.
Items are field tested with Georgia students to:
- ensure the items measure achievement as intended with the population of students for which they were designed; and
- obtain information regarding student performance, which is later used in scoring constructed-response items and in constructing valid and reliable test forms.
Field test items are embedded within the overall test, and not identified as field test items, to ensure that student performance on the items is not impacted by students' knowledge that the item does not count towards their score.
All field test responses (for all item types) are scored, and student performance data is analyzed for various statistical properties: item difficulty, discrimination, and differential item functioning (DIF). Additionally, student responses for any constructed-response field test items are used to establish the scoring guidelines for all future administrations of those items.
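For readers curious about what "item difficulty" and "discrimination" mean statistically, the sketch below shows the classical definitions: the proportion of students answering correctly, and the correlation between success on the item and performance on the rest of the test. The Python/NumPy code and the assumption of 0/1-scored items are illustrative only; this is not the GaDOE's actual analysis software, and operational analyses may use different models.

```python
import numpy as np

def classical_item_stats(responses):
    """Illustrative classical item statistics for a matrix of 0/1 scored responses.

    responses : 2-D array, rows = students, columns = items, entries = 1 (correct) or 0 (incorrect)
    Returns per-item difficulty (proportion correct) and discrimination
    (corrected item-total correlation).
    """
    responses = np.asarray(responses, dtype=float)
    n_students, n_items = responses.shape
    total = responses.sum(axis=1)

    difficulty = responses.mean(axis=0)      # proportion correct; lower values = harder items
    discrimination = np.empty(n_items)
    for j in range(n_items):
        rest = total - responses[:, j]       # total score excluding the item itself
        # correlation between answering this item correctly and doing well on the rest of the test
        discrimination[j] = np.corrcoef(responses[:, j], rest)[0, 1]
    return difficulty, discrimination
```

In this framing, an item with very low difficulty values or near-zero (or negative) discrimination would warrant a closer look before being added to the operational item pool.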
Spotlight On: Question Types
Each of Georgia's assessments has different item types specifically designed to assess content knowledge in the same way content knowledge is taught. On the GAA 2.0, this involves task-based single-select items designed to maximize accessibility for students with the most significant cognitive disabilities; in Keenville, this looks like innovative game-based interactions designed to engage early learners. On Georgia Milestones, in addition to multiple-choice items and items where students have the opportunity to write a response, technology-enhanced items allow students to demonstrate the depth of what they know and can do using approaches that mirror classroom instruction. These technology-enhanced items increase cognitive rigor, moving from identification to application of student skills. For more information on item types on the Georgia Milestones, visit: Welcome to Experience Online Testing Georgia! (gaexperienceonline.com). Some examples from Georgia Milestones are below: