Choose Your Own Comparator: The Importance of Race Coding Strategy in Compensation Equity Analyses

In a prior blog post, we indicated that OFCCP has taken a position that the specific strategy for dummy coding races in a regression analysis does not matter. Specifically, they have argued that the group that one chooses as the referent group is inconsequential. DCI disagrees and provides an illustration of how conclusions based on the regression analysis results may differ substantially based on the dummy coding scheme one adopts.

First Set of Examples

As an initial basis for our illustration, we use the small data set below.

| Employee ID | Salary | Time-in-Job | Race |
|---|---|---|---|
| 919 | $37,001.00 | 1.00 | Asian |
| 527 | $39,420.00 | 2.33 | Asian |
| 287 | $38,510.00 | 1.33 | Asian |
| 435 | $38,376.00 | 1.67 | Asian |
| 96 | $39,796.00 | 0.33 | Asian |
| 480 | $36,174.00 | 0.67 | Black |
| 184 | $36,801.00 | 1.67 | Black |
| 77 | $35,259.00 | 1.67 | Black |
| 660 | $36,682.00 | 1.00 | Black |
| 945 | $34,100.00 | 0.67 | Black |
| 620 | $36,815.00 | 1.67 | Hispanic |
| 810 | $33,459.00 | 1.00 | Hispanic |
| 700 | $33,017.00 | 0.67 | Hispanic |
| 144 | $34,429.00 | 1.00 | Hispanic |
| 28 | $34,884.00 | 0.33 | Hispanic |
| 225 | $36,855.00 | 2.00 | White |
| 404 | $35,737.00 | 2.33 | White |
| 316 | $33,266.00 | 0.67 | White |
| 685 | $33,808.00 | 1.67 | White |
| 369 | $34,331.00 | 1.00 | White |

Imagine that the data set represents 20 employees for a particular job. Five of the employees are Asian, five are Black, five are Hispanic, and five are White. The compensation team identified Time-in-Job as the only factor that should influence salary in the job. See the summary table below for average salary and Time-in-Job (expressed as years) for each racial/ethnic group.

| Race Group | Average Salary | Average Time-in-Job (years) |
|---|---|---|
| Asian | $38,620.60 | 1.33 |
| Black | $35,803.20 | 1.13 |
| Hispanic | $34,520.80 | 0.93 |
| White | $34,799.40 | 1.53 |

According to the summary table, the Asian Group on average has the highest salary, and the White Group on average has the highest Time-in-Job. Using a regression equation, we can account for the influence of Time-in-Job on Salary and then evaluate whether any statistically significant salary differences across racial/ethnic categories remain. That is, we can evaluate whether there are unexplained salary differences across racial/ethnic categories after accounting for how long each individual has been in his or her job.
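The group averages in the summary table can be reproduced from the raw data with a few lines of Python. This is only a sketch. It stores Time-in-Job in whole months, which is our assumption rather than something the data table states: the printed values look like rounded thirds of a year (0.33 = 4 months), and exact thirds are what reproduce the published Black average of 1.13 (the rounded values as printed would give 1.14). The names `EMPLOYEES`, `group_means`, and `SUMMARY` are illustrative.

```python
from collections import defaultdict

# (race, salary in dollars, time-in-job in months) for the 20 employees in the
# first data table. Months are an assumption: 1.00 -> 12, 2.33 -> 28, 0.33 -> 4, etc.
EMPLOYEES = [
    ("Asian", 37001, 12), ("Asian", 39420, 28), ("Asian", 38510, 16),
    ("Asian", 38376, 20), ("Asian", 39796, 4),
    ("Black", 36174, 8), ("Black", 36801, 20), ("Black", 35259, 20),
    ("Black", 36682, 12), ("Black", 34100, 8),
    ("Hispanic", 36815, 20), ("Hispanic", 33459, 12), ("Hispanic", 33017, 8),
    ("Hispanic", 34429, 12), ("Hispanic", 34884, 4),
    ("White", 36855, 24), ("White", 35737, 28), ("White", 33266, 8),
    ("White", 33808, 20), ("White", 34331, 12),
]


def group_means(rows):
    """Return {race: (mean salary, mean time-in-job in years)}."""
    totals = defaultdict(lambda: [0, 0, 0])  # salary sum, month sum, head count
    for race, salary, months in rows:
        totals[race][0] += salary
        totals[race][1] += months
        totals[race][2] += 1
    return {race: (s / n, m / n / 12) for race, (s, m, n) in totals.items()}


SUMMARY = group_means(EMPLOYEES)
for race, (salary, years) in SUMMARY.items():
    print(f"{race:<9} ${salary:>10,.2f}  {years:.2f}")
```

These raw averages are the starting point only; they do not yet adjust salary for Time-in-Job, which is what the regression is for.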

Although the procedure is fairly straightforward on its face, the racial/ethnic categories must be coded using a scheme called dummy coding in order for race/ethnicity to be included in the regression equation. A dummy code is simply a variable representing a particular race/ethnicity such that everybody in that racial/ethnic category receives a 1 for the variable and everybody in a different category receives a 0. The complication is that, to fully represent race/ethnicity in the regression analysis, one does not create a dummy code for every racial/ethnic category; a full set would be redundant with the intercept of the regression equation. Instead, one creates a dummy code for every category but one. The category that does not receive a dummy code is called the referent group. The referent group is the category to which all races with dummy codes are compared in the regression equation. That is, the regression result for each dummy code reflects the difference in salary between the race with that dummy code and the race without a dummy code (the referent group). So the choice of referent group dictates which specific comparisons one is making. We provide two examples below to illustrate.
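As a minimal sketch of the coding rule just described, the function below builds the dummy codes for one employee given a chosen referent group. The function name `dummy_codes` and the `"<race> Dcode"` key format are illustrative choices, not part of any particular statistics package.

```python
RACES = ["Asian", "Black", "Hispanic", "White"]


def dummy_codes(race, referent, categories=RACES):
    """Return one 0/1 dummy code per category, skipping the referent group.

    An employee in the referent group receives 0 on every dummy code.
    """
    if race not in categories or referent not in categories:
        raise ValueError("race and referent must both be known categories")
    return {f"{c} Dcode": int(race == c) for c in categories if c != referent}


# White as the referent group: three dummy codes, all zero for White employees.
for race in RACES:
    print(f"{race:<9}", dummy_codes(race, referent="White"))
```

Changing `referent="White"` to `referent="Asian"` swaps the Asian dummy code for a White dummy code; the data are unchanged, but each coefficient then answers a different comparison.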

In the first example, let us assume that our theory of discrimination is that at least one race category is potentially underpaid compared to the White category. If this is our theory of discrimination, then the coding in the table below would allow us to create a regression model to test the theory.[1]
Example 1

| Employee ID | Salary | Time-in-Job | Race | Black Dcode | Hispanic Dcode | Asian Dcode |
|---|---|---|---|---|---|---|
| 919 | $37,001.00 | 1.00 | Asian | 0 | 0 | 1 |
| 527 | $39,420.00 | 2.33 | Asian | 0 | 0 | 1 |
| 287 | $38,510.00 | 1.33 | Asian | 0 | 0 | 1 |
| 435 | $38,376.00 | 1.67 | Asian | 0 | 0 | 1 |
| 96 | $39,796.00 | 0.33 | Asian | 0 | 0 | 1 |
| 480 | $36,174.00 | 0.67 | Black | 1 | 0 | 0 |
| 184 | $36,801.00 | 1.67 | Black | 1 | 0 | 0 |
| 77 | $35,259.00 | 1.67 | Black | 1 | 0 | 0 |
| 660 | $36,682.00 | 1.00 | Black | 1 | 0 | 0 |
| 945 | $34,100.00 | 0.67 | Black | 1 | 0 | 0 |
| 620 | $36,815.00 | 1.67 | Hispanic | 0 | 1 | 0 |
| 810 | $33,459.00 | 1.00 | Hispanic | 0 | 1 | 0 |
| 700 | $33,017.00 | 0.67 | Hispanic | 0 | 1 | 0 |
| 144 | $34,429.00 | 1.00 | Hispanic | 0 | 1 | 0 |
| 28 | $34,884.00 | 0.33 | Hispanic | 0 | 1 | 0 |
| 225 | $36,855.00 | 2.00 | White | 0 | 0 | 0 |
| 404 | $35,737.00 | 2.33 | White | 0 | 0 | 0 |
| 316 | $33,266.00 | 0.67 | White | 0 | 0 | 0 |
| 685 | $33,808.00 | 1.67 | White | 0 | 0 | 0 |
| 369 | $34,331.00 | 1.00 | White | 0 | 0 | 0 |

In the regression model, we would predict salary by including Time-in-Job, Black Dcode, Hispanic Dcode, and Asian Dcode[2]. If the regression coefficient associated with a particular dummy code is statistically significant, then there is support that the average salary for that particular race category is significantly different from the White category, after accounting for the influence of Time-in-Job on salary. The sign in front of the statistically significant regression coefficient indicates whether the average salary for the group is higher or lower than the White group, after accounting for the influence of Time-in-Job on salary. The table below presents the different results and interpretations of the regression coefficients associated with the dummy codes.

 
| Race Category | Regression Coefficient Sign | Statistically Significant | Not Statistically Significant |
|---|---|---|---|
| Black | Positive | The average salary for the Black group is significantly higher than that for the White group, after accounting for the influence of Time-in-Job on Salary | No significant difference in salary from the White group |
| Black | Negative | The average salary for the Black group is significantly lower than that for the White group, after accounting for the influence of Time-in-Job on Salary | No significant difference in salary from the White group |
| Hispanic | Positive | The average salary for the Hispanic group is significantly higher than that for the White group, after accounting for the influence of Time-in-Job on Salary | No significant difference in salary from the White group |
| Hispanic | Negative | The average salary for the Hispanic group is significantly lower than that for the White group, after accounting for the influence of Time-in-Job on Salary | No significant difference in salary from the White group |
| Asian | Positive | The average salary for the Asian group is significantly higher than that for the White group, after accounting for the influence of Time-in-Job on Salary | No significant difference in salary from the White group |
| Asian | Negative | The average salary for the Asian group is significantly lower than that for the White group, after accounting for the influence of Time-in-Job on Salary | No significant difference in salary from the White group |

Note, there is no statistical test of whether the average salary for the Black category differs from that of the Hispanic category or that of the Asian category, after accounting for Time-in-Job. Similarly, there is no statistical test of whether the average salary for the Hispanic category differs from that of the Asian category, after accounting for Time-in-Job.
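To see why only comparisons against the referent group are tested, it may help to write out the Example 1 equation from footnote [2] separately for each group. The block below is a sketch using the footnote's own symbols (b1 through b4); the hat notation for predicted salary is ours.

```latex
\begin{align*}
\widehat{\text{Salary}}_{\text{White}}    &= \text{Intercept} + b_1(\text{Time-in-Job}) \\
\widehat{\text{Salary}}_{\text{Black}}    &= \text{Intercept} + b_1(\text{Time-in-Job}) + b_2 \\
\widehat{\text{Salary}}_{\text{Hispanic}} &= \text{Intercept} + b_1(\text{Time-in-Job}) + b_3 \\
\widehat{\text{Salary}}_{\text{Asian}}    &= \text{Intercept} + b_1(\text{Time-in-Job}) + b_4 \\[4pt]
\widehat{\text{Salary}}_{\text{Black}} - \widehat{\text{Salary}}_{\text{White}}    &= b_2 \\
\widehat{\text{Salary}}_{\text{Black}} - \widehat{\text{Salary}}_{\text{Hispanic}} &= b_2 - b_3
\end{align*}
```

At equal Time-in-Job, each dummy-code coefficient is the predicted salary gap between that group and the White group, and the regression output tests each coefficient against zero. The Black versus Hispanic gap is the difference between two coefficients, b2 - b3, which the standard output does not test unless the contrast is requested separately or the referent group is changed.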

In the second example, let us assume that our theory of discrimination is that at least one race category is potentially underpaid compared to the Asian category (i.e., the Race group with the highest average salary, merit variables notwithstanding). If this is our theory of discrimination, then the coding in the table below would allow us to create a regression model to test the theory.[3]
Example 2

| Employee ID | Salary | Time-in-Job | Race | Black Dcode | Hispanic Dcode | White Dcode |
|---|---|---|---|---|---|---|
| 919 | $37,001.00 | 1.00 | Asian | 0 | 0 | 0 |
| 527 | $39,420.00 | 2.33 | Asian | 0 | 0 | 0 |
| 287 | $38,510.00 | 1.33 | Asian | 0 | 0 | 0 |
| 435 | $38,376.00 | 1.67 | Asian | 0 | 0 | 0 |
| 96 | $39,796.00 | 0.33 | Asian | 0 | 0 | 0 |
| 480 | $36,174.00 | 0.67 | Black | 1 | 0 | 0 |
| 184 | $36,801.00 | 1.67 | Black | 1 | 0 | 0 |
| 77 | $35,259.00 | 1.67 | Black | 1 | 0 | 0 |
| 660 | $36,682.00 | 1.00 | Black | 1 | 0 | 0 |
| 945 | $34,100.00 | 0.67 | Black | 1 | 0 | 0 |
| 620 | $36,815.00 | 1.67 | Hispanic | 0 | 1 | 0 |
| 810 | $33,459.00 | 1.00 | Hispanic | 0 | 1 | 0 |
| 700 | $33,017.00 | 0.67 | Hispanic | 0 | 1 | 0 |
| 144 | $34,429.00 | 1.00 | Hispanic | 0 | 1 | 0 |
| 28 | $34,884.00 | 0.33 | Hispanic | 0 | 1 | 0 |
| 225 | $36,855.00 | 2.00 | White | 0 | 0 | 1 |
| 404 | $35,737.00 | 2.33 | White | 0 | 0 | 1 |
| 316 | $33,266.00 | 0.67 | White | 0 | 0 | 1 |
| 685 | $33,808.00 | 1.67 | White | 0 | 0 | 1 |
| 369 | $34,331.00 | 1.00 | White | 0 | 0 | 1 |

In the regression model, we would predict salary by including Time-in-Job, Black Dcode, Hispanic Dcode, and White Dcode[4]. If the regression coefficient associated with a particular dummy code is statistically significant, then there is support that the average salary for that particular race category is significantly different from the Asian category, after accounting for the influence of Time-in-Job on salary. The sign in front of the statistically significant regression coefficient indicates whether the average salary for the group is higher or lower than the Asian group, after accounting for the influence of Time-in-Job on salary. The table below presents the different results and interpretations of the regression coefficients associated with the dummy codes.

Note, there is no statistical test of whether the average salary for the Black category differs from that of the Hispanic category or that of the White category, after accounting for Time-in-Job. Similarly, there is no statistical test of whether the average salary for the Hispanic category differs from that of the White category, after accounting for Time-in-Job.

| Race Category | Regression Coefficient Sign | Statistically Significant | Not Statistically Significant |
|---|---|---|---|
| Black | Positive | The average salary for the Black group is significantly higher than that for the Asian group, after accounting for the influence of Time-in-Job on Salary | No significant difference in salary from the Asian group |
| Black | Negative | The average salary for the Black group is significantly lower than that for the Asian group, after accounting for the influence of Time-in-Job on Salary | No significant difference in salary from the Asian group |
| Hispanic | Positive | The average salary for the Hispanic group is significantly higher than that for the Asian group, after accounting for the influence of Time-in-Job on Salary | No significant difference in salary from the Asian group |
| Hispanic | Negative | The average salary for the Hispanic group is significantly lower than that for the Asian group, after accounting for the influence of Time-in-Job on Salary | No significant difference in salary from the Asian group |
| White | Positive | The average salary for the White group is significantly higher than that for the Asian group, after accounting for the influence of Time-in-Job on Salary | No significant difference in salary from the Asian group |
| White | Negative | The average salary for the White group is significantly lower than that for the Asian group, after accounting for the influence of Time-in-Job on Salary | No significant difference in salary from the Asian group |

To illustrate the issue further, we conducted actual regression analyses using the data provided in the first table. We conducted analyses under the first theory of discrimination (at least one group is underpaid compared to the White category, after accounting for Time-in-Job). We also conducted analyses under the second theory of discrimination (at least one group is underpaid compared to the Asian category, after accounting for Time-in-Job). The results from each analysis are below.

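| Example | R² | TiJ b-weight | TiJ t-value | Black Dcode b-weight | Black Dcode t-value | Hispanic Dcode b-weight | Hispanic Dcode t-value | Asian Dcode b-weight | Asian Dcode t-value |
|---|---|---|---|---|---|---|---|---|---|
| 1 (White Referent) | .73 | 941.32 | 1.93 | 1380.33 | 1.75 | 286.19 | 0.35 | 4009.46 | 5.21* |

| Example | R² | TiJ b-weight | TiJ t-value | Black Dcode b-weight | Black Dcode t-value | Hispanic Dcode b-weight | Hispanic Dcode t-value | White Dcode b-weight | White Dcode t-value |
|---|---|---|---|---|---|---|---|---|---|
| 2 (Asian Referent) | .73 | 941.32 | 1.93 | -2629.14 | -3.42* | -3723.27 | -4.73* | -4009.46 | -5.21* |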

*Statistically significant at p < .05

As is evident from the regression results, the dummy coding scheme does matter. In the first example, one would conclude that the Asian group has a significantly higher salary than the White group, after accounting for Time-in-Job. In the second example, one would conclude that the Black, Hispanic, and White groups ALL have significantly lower salaries than the Asian group, after accounting for Time-in-Job. Moreover, it is clear from the identical R² values and the identical Time-in-Job b-weights and t-values that the two models are exactly the same in terms of how much overall variance is accounted for as well as the contribution of Time-in-Job. This is not a surprise, as exactly the same variables are included in each model: Time-in-Job and a set of dummy codes fully representing race. The models differ only in which races are being compared to one another.
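The two sets of results above can be checked with a short ordinary least squares sketch in plain Python (no third-party libraries, so the arithmetic stays visible; in practice one would use a statistics package). The helper names `invert` and `fit` are illustrative. Time-in-Job is entered in whole months, which is our assumption: treating the printed values as rounded thirds of a year appears to be what reproduces the published b-weights for this data set.

```python
import math

RACES = ["Asian", "Black", "Hispanic", "White"]

# (salary, time-in-job in months, race) from the first data table.
# Months are an assumption: 1.00 -> 12, 2.33 -> 28, 0.33 -> 4, and so on.
DATA = [
    (37001, 12, "Asian"), (39420, 28, "Asian"), (38510, 16, "Asian"),
    (38376, 20, "Asian"), (39796, 4, "Asian"),
    (36174, 8, "Black"), (36801, 20, "Black"), (35259, 20, "Black"),
    (36682, 12, "Black"), (34100, 8, "Black"),
    (36815, 20, "Hispanic"), (33459, 12, "Hispanic"), (33017, 8, "Hispanic"),
    (34429, 12, "Hispanic"), (34884, 4, "Hispanic"),
    (36855, 24, "White"), (35737, 28, "White"), (33266, 8, "White"),
    (33808, 20, "White"), (34331, 12, "White"),
]


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit(data, referent):
    """OLS of salary on Time-in-Job (years) plus dummy codes for non-referent races."""
    coded = [r for r in RACES if r != referent]
    names = ["Intercept", "TiJ"] + [f"{r} Dcode" for r in coded]
    X = [[1.0, months / 12] + [float(race == r) for r in coded] for _, months, race in data]
    y = [float(salary) for salary, _, _ in data]
    k = len(names)
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    inv = invert(xtx)
    b = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    sse = sum((yi - sum(bi * xi for bi, xi in zip(b, row))) ** 2 for row, yi in zip(X, y))
    mean_y = sum(y) / len(y)
    sst = sum((yi - mean_y) ** 2 for yi in y)
    mse = sse / (len(y) - k)  # residual variance with n - k degrees of freedom
    t = [b[i] / math.sqrt(mse * inv[i][i]) for i in range(k)]
    return {"r2": 1 - sse / sst, "b": dict(zip(names, b)), "t": dict(zip(names, t))}


EX1 = fit(DATA, referent="White")
EX2 = fit(DATA, referent="Asian")
for label, res in (("White referent", EX1), ("Asian referent", EX2)):
    print(label, f"R2={res['r2']:.2f}")
    for name in res["b"]:
        print(f"  {name:<15} b={res['b'][name]:>9.2f}  t={res['t'][name]:>6.2f}")
```

The two fits are the same model expressed against different baselines. For instance, the Black coefficient with the Asian referent equals the Black coefficient minus the Asian coefficient from the White-referent model: 1380.33 - 4009.46 = -2629.13, which matches the published -2629.14 up to rounding. What changes is which of those differences receives a significance test.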

Second Set of Examples

We present a slightly different data set and new regression results to highlight a different scenario in which under one theory of discrimination, there are no statistical flags, but under an alternative theory of discrimination, there are statistical flags. The presented results are based on the small data set below.

| Employee ID | Salary | Time-in-Job | Race |
|---|---|---|---|
| 919 | $37,001 | 1.00 | Asian |
| 527 | $39,420 | 2.33 | Asian |
| 287 | $38,510 | 1.33 | Asian |
| 435 | $38,376 | 1.67 | Asian |
| 96 | $39,796 | 0.33 | Asian |
| 480 | $36,174 | 0.67 | Black |
| 184 | $34,801 | 1.67 | Black |
| 77 | $35,259 | 1.67 | Black |
| 660 | $36,682 | 1.00 | Black |
| 945 | $32,100 | 0.67 | Black |
| 620 | $36,815 | 1.67 | Hispanic |
| 810 | $33,459 | 1.00 | Hispanic |
| 700 | $33,017 | 0.67 | Hispanic |
| 144 | $34,429 | 4.00 | Hispanic |
| 28 | $34,884 | 0.33 | Hispanic |
| 225 | $42,855 | 2.00 | White |
| 404 | $40,737 | 2.33 | White |
| 316 | $34,266 | 0.67 | White |
| 685 | $33,808 | 1.67 | White |
| 369 | $33,331 | 1.00 | White |

See the summary table below for average salary and Time-in-Job (expressed as years) for each racial/ethnic group.

| Race Group | Average Salary | Average Time-in-Job (years) |
|---|---|---|
| Asian | $38,621 | 1.33 |
| Black | $35,003 | 1.13 |
| Hispanic | $34,521 | 1.53 |
| White | $36,999 | 1.53 |

Using the theory of discrimination presented in the first example, we hypothesize that at least one race category is potentially underpaid compared to the White category. The dummy coding scheme would follow that presented earlier for Example 1. The results presented in the “Example 3” row found in the table below highlight that the average salaries for the Black, Hispanic, and Asian groups are not significantly different from the average salary for the White group, after accounting for the influence of Time-in-Job.

Alternatively, suppose our theory of discrimination is consistent with that used in the second example, in which we hypothesize that at least one race category is potentially underpaid compared to the Asian category (i.e., the Race group with the highest average salary, merit variables notwithstanding). Again, our dummy coding scheme would follow that presented for the second example. The results presented in the “Example 4” row found in the table below indicate that the average salary for the White group is not significantly different from the average salary for the Asian group, after accounting for Time-in-Job. Conversely, however, the results indicate that the average salaries for the Black and Hispanic groups are significantly lower than the average salary for the Asian group, after accounting for Time-in-Job.

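| Example | R² | TiJ b-weight | TiJ t-value | Black Dcode b-weight | Black Dcode t-value | Hispanic Dcode b-weight | Hispanic Dcode t-value | Asian Dcode b-weight | Asian Dcode t-value |
|---|---|---|---|---|---|---|---|---|---|
| 3 (White Referent) | .40 | 858.59 | 1.26 | -1654.48 | -1.02 | -2478.60 | -1.55 | 1794.63 | 1.12 |

| Example | R² | TiJ b-weight | TiJ t-value | Black Dcode b-weight | Black Dcode t-value | Hispanic Dcode b-weight | Hispanic Dcode t-value | White Dcode b-weight | White Dcode t-value |
|---|---|---|---|---|---|---|---|---|---|
| 4 (Asian Referent) | .40 | 858.59 | 1.26 | -3449.12 | -2.15* | -4273.23 | -2.67* | -1794.63 | -1.12 |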

*Statistically significant at p < .05

Such results provide another illustration that the dummy coding scheme does matter. In the third example, one would conclude that no racial/ethnic groups have significantly different salaries from the White group, after accounting for Time-in-Job. In the fourth example, one would conclude that the Black and Hispanic groups have significantly lower salaries than the Asian group, after accounting for Time-in-Job.
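The contrast between Examples 3 and 4 can be checked the same way. The sketch below refits the second data set under each referent group and lists which dummy codes would be flagged. Two things here are our assumptions rather than statements from the analysis above: Time-in-Job is entered exactly as printed (which appears to be what reproduces the published b-weights for this data set), and the flag uses 2.131, the standard two-tailed .05 critical t-value for 15 degrees of freedom (20 employees minus 5 estimated parameters). The names `invert`, `fit`, and `flagged` are illustrative.

```python
import math

RACES = ["Asian", "Black", "Hispanic", "White"]
CRITICAL_T = 2.131  # two-tailed .05 critical value, 15 degrees of freedom

# (salary, time-in-job in years as printed, race) from the second data table.
DATA2 = [
    (37001, 1.00, "Asian"), (39420, 2.33, "Asian"), (38510, 1.33, "Asian"),
    (38376, 1.67, "Asian"), (39796, 0.33, "Asian"),
    (36174, 0.67, "Black"), (34801, 1.67, "Black"), (35259, 1.67, "Black"),
    (36682, 1.00, "Black"), (32100, 0.67, "Black"),
    (36815, 1.67, "Hispanic"), (33459, 1.00, "Hispanic"), (33017, 0.67, "Hispanic"),
    (34429, 4.00, "Hispanic"), (34884, 0.33, "Hispanic"),
    (42855, 2.00, "White"), (40737, 2.33, "White"), (34266, 0.67, "White"),
    (33808, 1.67, "White"), (33331, 1.00, "White"),
]


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit(data, referent):
    """OLS of salary on Time-in-Job plus dummy codes for the non-referent races."""
    coded = [r for r in RACES if r != referent]
    names = ["Intercept", "TiJ"] + [f"{r} Dcode" for r in coded]
    X = [[1.0, years] + [float(race == r) for r in coded] for _, years, race in data]
    y = [float(salary) for salary, _, _ in data]
    k = len(names)
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    inv = invert(xtx)
    b = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    sse = sum((yi - sum(bi * xi for bi, xi in zip(b, row))) ** 2 for row, yi in zip(X, y))
    mean_y = sum(y) / len(y)
    sst = sum((yi - mean_y) ** 2 for yi in y)
    mse = sse / (len(y) - k)  # residual variance with n - k degrees of freedom
    t = [b[i] / math.sqrt(mse * inv[i][i]) for i in range(k)]
    return {"r2": 1 - sse / sst, "b": dict(zip(names, b)), "t": dict(zip(names, t))}


def flagged(result):
    """Dummy codes whose t-value exceeds the critical value in absolute terms."""
    return sorted(n for n, t in result["t"].items()
                  if n.endswith("Dcode") and abs(t) > CRITICAL_T)


EX3 = fit(DATA2, referent="White")
EX4 = fit(DATA2, referent="Asian")
print("White referent flags:", flagged(EX3))
print("Asian referent flags:", flagged(EX4))
```

Under these assumptions, the same 20 records produce no flagged dummy codes with the White group as referent and two with the Asian group as referent, which is the pattern reported in the tables above.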

Conclusion

We hope that this white paper helps to clarify DCI’s concerns about the lack of agency guidance on the approach that federal contractors should take when conducting compensation analyses. It is our hope that the OFCCP considers the problems resulting from the absence of a central strategy on the issue. We look forward to publication of official guidance from OFCCP to help contractors conduct proactive analyses in a manner consistent with the agency’s approach.

_________________________________________________________________

[1] It is evident by looking at the patterns of 1s and 0s across the three dummy codes that the set of dummy codes will fully represent race when entered into the regression equation. The Asian category has a pattern of 001 across dummy codes, the Black category has a pattern of 100 across dummy codes, the Hispanic category has a pattern of 010 across dummy codes, and the White category has a pattern of 000 across dummy codes. 

[2] The regression equation would be Salary = Intercept + b1(Time-in-Job) + b2(Black Dcode) + b3(Hispanic Dcode) + b4(Asian Dcode) 

[3] It is evident by looking at the patterns of 1s and 0s across the three dummy codes that the set of dummy codes will fully represent race when entered into the regression equation. The Asian category has a pattern of 000 across dummy codes, the Black category has a pattern of 100 across dummy codes, the Hispanic category has a pattern of 010 across dummy codes, and the White category has a pattern of 001 across dummy codes.

[4] The regression equation would be Salary = Intercept + b1(Time-in-Job) + b2(Black Dcode) + b3(Hispanic Dcode) + b4(White Dcode)

by Kayo Sady, Ph.D., Consultant, DCI Consulting Group