City of Boston Loses on Adverse Impact in Promotion Part 2

The case is Smith et al. v. City of Boston, decided on November 16, 2015 by Judge William G. Young of the District of Massachusetts [2015 U.S. Dist. LEXIS 154468]. The challenge was to a multiple-choice test for promotion from police sergeant to lieutenant. In Part 1 on this case, we reported that the plaintiffs successfully proved adverse impact of the test. They also won in the second phase of the case, as the defendants were unable to prove that the test was content valid. Part 2 below discusses why the test was deemed invalid.

Judge Young cited the Uniform Guidelines as the basis for what constitutes content validity. Although he did not mention it by name, his analysis closely followed a five-prong analysis initially established by the 2nd Circuit in Guardians v. CSC (1980) [630 F.2d 79]. These five steps, which have been reported in several prior Alerts, are as follows:

1. The test-makers must have conducted a suitable job analysis;
2. They must have used reasonable competence in constructing the test itself;
3. The content of the test must be related to the content of the job;
4. The content of the test must be representative of the content of the job; and
5. There must be a scoring system that usefully selects from among the applicants those who can better perform the job.

Judge Young’s ruling was based on failure of Steps 4 and 5.

Regarding Step 1, detailed job analyses were conducted in 1991 and 2000, and “mini-job analyses” were performed prior to tests conducted in 2005 and 2008. Dr. Wiesen, the plaintiffs’ expert, challenged the suitability of these analyses on several grounds, but Dr. Campion, the defense expert, testified that the lieutenant position had not changed much across the years. The mini-job analyses ultimately identified 145 critical knowledge areas, skills, and abilities (KSAs) for the lieutenant position, 91 of which were skills and abilities.

Despite what were arguable errors in the mini-analyses, Judge Young deemed them suitable, ruling:

While acknowledging that the 2000 job analysis may have some errors and is not a model of perfection, the Court concludes that the robust job analyses performed in 1991 and 2000, and the mini-job analyses performed in 2005 and 2008, were adequate. As detailed above, for the 1991 job analysis, DPA identified important work behaviors by gathering information from various sources, created a list of tasks, asking police officers to rate the tasks, creating a list of potentially important KSAs, and asking SMEs to link KSAs to tasks. The same was true for the 2000 job analysis for which Morris & McDaniel, in part using the 1991 report, created a list of 302 possibly relevant tasks and KSAs, which SMEs rated. In 2005 and 2008, HRD asked SMEs to again rank tasks and KSAs that had been identified in the older reports.

Regarding Step 2, the exam was created by a well-known consulting company (EB Jacobs) and there was no question as to their competency in test construction.

Regarding Step 3, the test was deemed linked to a large number of knowledge areas, but only two skills and no abilities. The knowledge areas and skills were deemed to be job related.

Regarding Step 4, Judge Young accepted Dr. Campion’s report that the knowledge areas were representative of the full range of knowledge required. However, he also accepted Dr. Wiesen’s argument that the test was not fully representative of the lieutenant position because it failed to test any abilities.

The failure at Step 4 alone would be sufficient to render the test invalid. Judge Young also found, however, that the test failed Step 5: there was no proof that it would successfully predict job performance because there were no measures of reliability (internal consistency in this case). Dr. Campion defended this deficiency by pointing to reliability statistics from prior tests, a defense Judge Young did not accept.

The moral of Judge Young’s ruling is plain: make sure that all KSAs are adequately represented on a test, and present proof that the test is at least internally consistent (e.g., using Cronbach’s alpha statistics).
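
For readers who want to see what an internal consistency check involves, the following is a minimal sketch of the standard Cronbach's alpha calculation: alpha = k/(k-1) x (1 - sum of item variances / variance of total scores), where k is the number of items. The function name `cronbach_alpha` and the response data are hypothetical and purely illustrative; they are not drawn from the Boston exam or from the record in this case.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a table of item scores.

    scores: one row per test-taker, one column per test item
    (hypothetical layout, e.g. 1 = correct, 0 = incorrect).
    """
    n_takers = len(scores)
    k = len(scores[0])
    if n_takers < 2 or k < 2:
        raise ValueError("need at least two test-takers and two items")

    # Sample variance of each item (column) across test-takers.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]

    # Sample variance of each test-taker's total score.
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores do not vary; alpha is undefined")

    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Made-up answers from five test-takers on a four-item test.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

alpha = cronbach_alpha(responses)
print(round(alpha, 2))  # prints 0.8
```

In practice the calculation would be run on the actual item-level responses from the test administration being defended, which is the kind of evidence that was missing here.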

By Art Gutman, Ph.D., Professor, Florida Institute of Technology