United States v. City of New York: A Replay With a Questionable Outcome

by Art Gutman Ph.D., Professor, Florida Institute of Technology

In US v. City of New York (2009) [F. Supp. 2d 419], Judge Nicholas G. Garaufis of the District Court for the Eastern District of New York evaluated an entry-level firefighter test against five criteria for content validity established by the 2nd Circuit in Guardians v. Civil Service (1980) [630 F.2d 79] and found the test invalid on all five. These criteria are:

(1) suitable job analysis
(2) reasonable competence in test construction
(3) test content related to job content
(4) test content representative of job content
(5) scoring systems selecting applicants that are better job performers.

This ruling captured the attention of a wide audience because it was decided shortly after the Supreme Court’s ruling in Ricci v. DeStefano (2009), which Judge Garaufis found not applicable to the case. Indeed, Judge Garaufis wrote:

I reference Ricci not because the Supreme Court’s ruling controls the outcome in this case; to the contrary, I mention Ricci precisely to point out that it does not. In Ricci, the City of New Haven had set aside the results of a promotional examination, and the Supreme Court confronted the narrow issue of whether New Haven could defend a violation of Title VII’s disparate treatment provision by asserting that its challenged employment action was an attempt to comply with Title VII’s disparate impact provision.… In contrast, this case presents the entirely separate question of whether Plaintiffs have shown that the City’s use of Exams 7029 and 2043 has actually had a disparate impact upon black and Hispanic applicants for positions as entry-level firefighters. Ricci did not confront that issue.

Based on the judge’s ruling, the fire department went back to the drawing board and developed a new test. On August 4, 2010, Judge Garaufis ruled that the first criterion (suitable job analysis) was satisfied, but the other four criteria were not [see 2010 U.S. Dist. LEXIS 78641]. Two experts, one internal and one external, conducted an extensive job analysis and, based on the results, created a new exam written by incumbent firefighter subject matter experts (SMEs) under the direction of the internal expert. The main problems cited by the judge were that the SMEs were not experts at test construction and that the final test was deemed lacking by independent firefighter SMEs who examined it.

There are two other issues worth noting. First, the alpha coefficient for internal consistency for the entire test exceeded .90, but as depicted in this table, 12 of the 18 components had coefficients of less than .50. However, one could argue it is difficult to obtain high alphas when the number of items is low, and that the one component with a substantial number of items (perceptual speed) had an alpha coefficient of .95.
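To illustrate the point about test length, here is a minimal sketch using the standardized form of coefficient alpha, which depends only on the number of items and the average inter-item correlation. The function name and the average correlation of .15 are my own assumptions for illustration; they are not values taken from the exam or from the court record.

```python
def standardized_alpha(k: int, mean_r: float) -> float:
    """Standardized coefficient alpha for k items whose average
    inter-item correlation is mean_r: k*r / (1 + (k - 1)*r)."""
    if k < 2:
        raise ValueError("alpha requires at least two items")
    return k * mean_r / (1 + (k - 1) * mean_r)


# Hypothetical average inter-item correlation (an assumption, not case data).
MEAN_R = 0.15

# Hold item quality constant and vary only the number of items.
for k in (3, 5, 10, 50, 100):
    print(f"{k:>3} items: alpha = {standardized_alpha(k, MEAN_R):.2f}")
```

Under this assumption, a component with only a handful of items falls below .50 while a long scale built from items of the same quality exceeds .90, which is why low component alphas alongside a high total-test alpha are not surprising in themselves.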

Second, and perhaps more importantly, Judge Garaufis invoked Section 1607.14C(1) of the Uniform Guidelines, which states:

A selection procedure based upon inferences about mental processes cannot be supported solely or primarily on the basis of content validity. Thus, a content strategy is not appropriate for demonstrating the validity of selection procedures which purport to measure traits or constructs, such as intelligence, aptitude, personality, commonsense, judgment, leadership, and spatial ability.

Based on this Guideline, the judge concluded:

Here, it is clear that Exam 6019 seeks to measure abstract, unobservable mental constructs such as Flexibility of Closure, Speed of Closure, and Problem Sensitivity, as well as personality characteristics like Integrity, Adaptability, Tenacity, Work Standards, and Resilience. These are precisely the sort of "traits or constructs" that render an exam unfit for content-based validation. The City's exclusive reliance on a content validation approach necessarily means that it has failed to demonstrate that the content of Exam 6019 is related to the content of the job of entry-level firefighter.

This part of the ruling is strange (at least to me). The guidance in Section 1607.14C(1) was actually struck down in Guardians v. CSC (1980) itself, and in virtually all subsequent cases employing a content validity strategy for defending a test.

Don’t get me wrong: I think the judge hit on two critical issues, namely the failure to use expert test writers and the failure to find agreement among independent SMEs. However, I think the manner in which he assessed reliability and relied on ancient guidelines that have been struck down in many courts is questionable.