Monday, November 5, 2012

Reliability and Validity

Meyer et al. (2001) argued that, other than psychotherapy, little else is as important as assessment. Because testing is a vital part of the psychological professions, its results must be reliable and valid to the same extent as those of medical tests (Meyer et al., 2001). Reliability and validity are the most fundamental attributes that give strength and value to an assessment. Although an assessment can be reliable without being valid, it cannot be valid if it is not reliable (Hogan, 2007). A test with sound reliability and validity is consistent, replicable, and dependable. Furthermore, the reliability and validity of assessments provide a level of usability for empirical investigation and practical application (Whiston, 2009).


Reliability refers to the replicability of a measurement: whether it is stable enough to provide similar results again in the same individual, within a specified margin of error (Little, n.d.). When determining the reliability of an assessment, a reliability coefficient of at least .80 indicates a trustworthy level of reliability (Little, n.d.). As described by Whiston (2009), the reliability of a test estimates how much of the variance in responses is a result of real variance and how much can be attributed to measurement error (for example, inaccurate or inappropriate planning, implementation, or administration of the test). In essence, reliability measures the extent to which the results of a test are true and stable (Whiston, 2009).
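The variance description above is usually written in the notation of classical test theory. The sources cited here do not give this formula, so it is offered as the standard textbook framing, assuming that each observed score X is the sum of a true score T and an error E that is uncorrelated with T:

```latex
X = T + E, \qquad
\sigma_X^2 = \sigma_T^2 + \sigma_E^2, \qquad
r_{XX'} = \frac{\sigma_T^2}{\sigma_X^2} = 1 - \frac{\sigma_E^2}{\sigma_X^2}
```

Under these assumptions, a reliability coefficient of .80 would mean that roughly 80% of the variance in observed scores reflects real differences between individuals and roughly 20% reflects error.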

Reliability can be estimated in several ways. In the test-retest method, the same test is given to the same individual twice, with 2-4 weeks between administrations. The alternate or parallel forms method entails giving two different forms of a test to the same individual and then correlating the results. Measures of internal consistency divide the test into different sections and correlate the scores from the various portions. The goal is for each section to yield highly correlated results (Whiston, 2009).
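The methods above all come down to correlating two sets of scores. The following is a minimal sketch in Python, not a procedure taken from the cited sources. The scores are invented for illustration, the function names are hypothetical, and the odd-even split with the Spearman-Brown correction is only one common way to estimate internal consistency.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def retest_reliability(first, second):
    """Test-retest (or alternate-forms) estimate: correlate two administrations."""
    return pearson_r(first, second)


def split_half_reliability(item_scores):
    """Internal-consistency estimate from an odd-even split.

    item_scores holds one list of item scores per person. The half-test
    correlation is stepped up to full length with the Spearman-Brown
    formula (an addition here, not something the source describes).
    """
    odd_half = [sum(person[0::2]) for person in item_scores]
    even_half = [sum(person[1::2]) for person in item_scores]
    r_half = pearson_r(odd_half, even_half)
    return 2 * r_half / (1 + r_half)


# Hypothetical total scores for six people, tested twice a few weeks apart.
first_administration = [12, 18, 25, 30, 22, 15]
second_administration = [14, 17, 27, 29, 21, 16]
r_tt = retest_reliability(first_administration, second_administration)

# Hypothetical item-level scores (four items per person) for the split-half estimate.
items = [[1, 2, 1, 2], [3, 3, 4, 3], [5, 4, 5, 5], [2, 2, 3, 2], [4, 5, 4, 4]]
r_sh = split_half_reliability(items)

print(f"test-retest r = {r_tt:.2f}, split-half (corrected) = {r_sh:.2f}")
```

For alternate or parallel forms, the same `retest_reliability` call applies, with the second list holding scores from the other form rather than from a second sitting.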


Without validity, a test is pointless for its intended purpose. When determining validity, one must consider whether the test measures what it was designed to measure. Rather than asking whether a test is valid in general, it is essential to establish whether a test is valid for a particular purpose (Whiston, 2009). Establishing appropriateness between the assessment and its results is essential so that counselors can make accurate inferences based on those results. Validity lets counselors assume the results are true to a certain degree, provided the test is used in the manner for which it was designed. Goodwin and Leech (2003) described how validity in psychometrics has evolved into a process of accumulating scientific support for the interpretations derived from test results, rather than simply a validation of various aspects of the test. Contemporary standards add a further requirement: that the interpretations be valid for the test's intended purpose.

When choosing an assessment for use in practice, the counselor first determines reliability. Once reliability is established, the next step is to determine what the test measures and how well it measures it (Whiston, 2009). As an example of validity, if I want to determine a client's fluency in French, using a Rorschach test would be inappropriate. The test is not a valid measure of French fluency, and I could not make inferences about fluency based on the results. A Rorschach test may indeed be reliable, but it will never be a valid indication of French fluency.

Reliable, but not Valid; No Reliability, No Validity

A test can be reliable, but not valid. In other words, a test may yield similar results in the same individual, but the reproducibility of those results does not imply the test is valid. For example, consider a test designed to determine the presence and severity of anxiety in high school students. It is reliable, and has provided similar results in the same students on three different occasions. It is so reliable, in fact, that its reliability coefficient is .92. However, it became apparent that the construct the assessment was measuring was the students' motivation, rather than anxiety. So, although it was a highly reliable test, it did not measure the construct it set out to measure.
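The anxiety example can be made concrete with a small numerical sketch. Everything in it is hypothetical: the scores are invented, and comparing the new test against an established anxiety measure is just one assumed way of gathering validity evidence, not a method described in the sources cited here.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Invented scores for five students on the new "anxiety" test, taken twice.
new_test_time1 = [10, 20, 30, 40, 50]
new_test_time2 = [12, 19, 31, 41, 48]

# Invented scores for the same students on an established anxiety measure
# (a hypothetical criterion used only for this illustration).
established_anxiety = [40, 10, 50, 20, 30]

# High agreement between the two sittings suggests the test is reliable.
retest_r = pearson_r(new_test_time1, new_test_time2)

# Near-zero agreement with the criterion suggests the scores do not reflect anxiety.
criterion_r = pearson_r(new_test_time1, established_anxiety)

print(f"test-retest r = {retest_r:.2f}")
print(f"correlation with established anxiety measure = {criterion_r:.2f}")
```

With these made-up numbers, the two sittings agree closely while the scores show almost no relationship to the established measure, which is the pattern expected from a test that is reliable but not valid for its intended purpose.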

Another usable example regarding reliability and validity is a statistics instructor who sets out to test his students' ability to accurately perform an ANOVA. Most of the questions on the test concerned the history of Russia during the 18th century. The students produced similar results each time they took the test, so it was reliable; however, the test did not measure the students' ability to do an ANOVA. It measured an entirely different body of knowledge. Furthermore, an instrument that is not reliable cannot measure anything consistently, so if a test is not reliable, its validity is of no consequence and is unlikely to be measurable (Whiston, 2009).


Goodwin, L. D., & Leech, N. L. (2003). The meaning of validity in the new Standards for Educational and Psychological Testing: Implications for measurement courses. Measurement and Evaluation in Counseling and Development, 36(3), 181-191.

Hogan, T. P. (2007). Psychological testing: A practical introduction (2nd ed.). Hoboken, NJ: John Wiley & Sons.

Little, S. G. (n.d.). Reliability and validity [PowerPoint slides]. Retrieved from

Meyer, G. J., Finn, S. E., Eyde, L. D., Kay, G. G., Moreland, K. L., Dies, R. R., ... Read, G. M. (2001). Psychological testing and psychological assessment: A review of evidence and issues. American Psychologist, 56(2), 128-165. doi: 10.1037/0003-066X.56.2.128

Whiston, S. C. (2009). Principles and applications of assessment in counseling (3rd ed.). Belmont, CA: Brooks/Cole, Cengage Learning.
