In a prior blog post, we indicated that OFCCP has taken a position that the specific strategy for dummy coding races in a regression analysis does not matter. Specifically, they have argued that the group that one chooses as the referent group is inconsequential. DCI disagrees and provides an illustration of how conclusions based on the regression analysis results may differ substantially based on the dummy coding scheme one adopts.
As an initial basis for our illustration, we use the small data set below.
| Employee ID | Salary | Time-in-Job | Race |
|---|---|---|---|
| 919 | $37,001.00 | 1.00 | Asian |
| 527 | $39,420.00 | 2.33 | Asian |
| 287 | $38,510.00 | 1.33 | Asian |
| 435 | $38,376.00 | 1.67 | Asian |
| 96 | $39,796.00 | 0.33 | Asian |
| 480 | $36,174.00 | 0.67 | Black |
| 184 | $36,801.00 | 1.67 | Black |
| 77 | $35,259.00 | 1.67 | Black |
| 660 | $36,682.00 | 1.00 | Black |
| 945 | $34,100.00 | 0.67 | Black |
| 620 | $36,815.00 | 1.67 | Hispanic |
| 810 | $33,459.00 | 1.00 | Hispanic |
| 700 | $33,017.00 | 0.67 | Hispanic |
| 144 | $34,429.00 | 1.00 | Hispanic |
| 28 | $34,884.00 | 0.33 | Hispanic |
| 225 | $36,855.00 | 2.00 | White |
| 404 | $35,737.00 | 2.33 | White |
| 316 | $33,266.00 | 0.67 | White |
| 685 | $33,808.00 | 1.67 | White |
| 369 | $34,331.00 | 1.00 | White |
Imagine that the data set represents 20 employees for a particular job. Five of the employees are Asian, five are Black, five are Hispanic, and five are White. The compensation team identified Time-in-Job as the only factor that should influence salary in the job. See the summary table below for average salary and Time-in-Job (expressed as years) for each racial/ethnic group.
| Race Group | Average Salary | Average Time-in-Job (years) |
|---|---|---|
| Asian | $38,620.60 | 1.33 |
| Black | $35,803.20 | 1.13 |
| Hispanic | $34,520.80 | 0.93 |
| White | $34,799.40 | 1.53 |
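The group averages in the summary table can be re-derived from the 20 employee records. The following is a minimal sketch in plain Python; the `ROWS` list and the `group_means` helper are names introduced here for illustration and are not part of the original analysis.

```python
from collections import defaultdict

# (race, salary, time-in-job in years), copied from the employee table above.
ROWS = [
    ("Asian", 37001, 1.00), ("Asian", 39420, 2.33), ("Asian", 38510, 1.33),
    ("Asian", 38376, 1.67), ("Asian", 39796, 0.33),
    ("Black", 36174, 0.67), ("Black", 36801, 1.67), ("Black", 35259, 1.67),
    ("Black", 36682, 1.00), ("Black", 34100, 0.67),
    ("Hispanic", 36815, 1.67), ("Hispanic", 33459, 1.00), ("Hispanic", 33017, 0.67),
    ("Hispanic", 34429, 1.00), ("Hispanic", 34884, 0.33),
    ("White", 36855, 2.00), ("White", 35737, 2.33), ("White", 33266, 0.67),
    ("White", 33808, 1.67), ("White", 34331, 1.00),
]


def group_means(rows):
    """Return {race: (mean salary, mean time-in-job)} for the given rows."""
    totals = defaultdict(lambda: [0.0, 0.0, 0])  # salary sum, TiJ sum, count
    for race, salary, tij in rows:
        acc = totals[race]
        acc[0] += salary
        acc[1] += tij
        acc[2] += 1
    return {race: (s / n, t / n) for race, (s, t, n) in totals.items()}


MEANS = group_means(ROWS)
for race in sorted(MEANS):
    salary, tij = MEANS[race]
    print(f"{race:<9} ${salary:,.2f}  {tij:.2f}")
```

Because the Time-in-Job values in the table are rounded to two decimals, a recomputed average may differ from the published one in the last digit.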
According to the summary table, the Asian Group on average has the highest salary, and the White Group on average has the highest Time-in-Job. Using a regression equation, we can account for the influence of Time-in-Job on Salary and then evaluate whether any statistically significant salary differences across racial/ethnic categories remain. That is, we can evaluate whether there are unexplained salary differences across racial/ethnic categories after accounting for how long each individual has been in his or her job.
Although the procedure is fairly straightforward on its face, the racial/ethnic categories must be coded using a scheme called dummy coding in order for race/ethnicity to be included in the regression equation. A dummy code is simply a variable representing a particular race/ethnicity such that everybody in the particular racial/ethnic category receives a 1 for the variable and everybody in a different race/ethnicity category receives a 0 for the variable. The complication is that to fully represent race/ethnicity in the regression analysis, one does not have to create a dummy code for every racial/ethnic category. One only needs to create a dummy code for every category but one. The category that does not receive a dummy code is called the referent group. The referent group is the category to which all other races with dummy codes are compared in the regression equation. That is, the regression results for each dummy code included in the regression analysis reflect the difference in salary between the race with the dummy code and the race without the dummy code (the referent group). So, determination of which race serves as the referent group will dictate what specific comparisons one is making. We provide two examples below to illustrate.
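The coding rule described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the `dummy_code` function and its argument names are hypothetical and do not come from any statistics package.

```python
def dummy_code(races, referent):
    """Return (coded categories, one row of 0/1 dummy codes per observation).

    Every category except the referent gets its own column; members of the
    referent group receive 0 in every column.
    """
    categories = sorted(set(races))
    if referent not in categories:
        raise ValueError(f"referent {referent!r} does not appear in the data")
    coded = [c for c in categories if c != referent]
    rows = [[1 if race == c else 0 for c in coded] for race in races]
    return coded, rows


# One employee per group, White as the referent group.
columns, codes = dummy_code(["Asian", "Black", "Hispanic", "White"], referent="White")
print(columns)  # ['Asian', 'Black', 'Hispanic']
print(codes)    # [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0]]
```

Calling the same function with `referent="Asian"` yields Black, Hispanic, and White columns instead, which is the only difference between the two coding schemes discussed in this post.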
| Employee ID | Salary | Time-in-Job | Race | Black Dcode | Hispanic Dcode | Asian Dcode |
|---|---|---|---|---|---|---|
| 919 | $37,001.00 | 1.00 | Asian | 0 | 0 | 1 |
| 527 | $39,420.00 | 2.33 | Asian | 0 | 0 | 1 |
| 287 | $38,510.00 | 1.33 | Asian | 0 | 0 | 1 |
| 435 | $38,376.00 | 1.67 | Asian | 0 | 0 | 1 |
| 96 | $39,796.00 | 0.33 | Asian | 0 | 0 | 1 |
| 480 | $36,174.00 | 0.67 | Black | 1 | 0 | 0 |
| 184 | $36,801.00 | 1.67 | Black | 1 | 0 | 0 |
| 77 | $35,259.00 | 1.67 | Black | 1 | 0 | 0 |
| 660 | $36,682.00 | 1.00 | Black | 1 | 0 | 0 |
| 945 | $34,100.00 | 0.67 | Black | 1 | 0 | 0 |
| 620 | $36,815.00 | 1.67 | Hispanic | 0 | 1 | 0 |
| 810 | $33,459.00 | 1.00 | Hispanic | 0 | 1 | 0 |
| 700 | $33,017.00 | 0.67 | Hispanic | 0 | 1 | 0 |
| 144 | $34,429.00 | 1.00 | Hispanic | 0 | 1 | 0 |
| 28 | $34,884.00 | 0.33 | Hispanic | 0 | 1 | 0 |
| 225 | $36,855.00 | 2.00 | White | 0 | 0 | 0 |
| 404 | $35,737.00 | 2.33 | White | 0 | 0 | 0 |
| 316 | $33,266.00 | 0.67 | White | 0 | 0 | 0 |
| 685 | $33,808.00 | 1.67 | White | 0 | 0 | 0 |
| 369 | $34,331.00 | 1.00 | White | 0 | 0 | 0 |
In the regression model, we would predict salary by including Time-in-Job, Black Dcode, Hispanic Dcode, and Asian Dcode[2]. If the regression coefficient associated with a particular dummy code is statistically significant, then there is support that the average salary for that particular race category is significantly different from the White category, after accounting for the influence of Time-in-Job on salary. The sign in front of the statistically significant regression coefficient indicates whether the average salary for the group is higher or lower than the White group, after accounting for the influence of Time-in-Job on salary. The table below presents the different results and interpretations of the regression coefficients associated with the dummy codes.
| Race Category | Regression Coefficient Sign | Statistically Significant | Not Statistically Significant |
|---|---|---|---|
| Black | Positive | The average salary for the Black group is significantly higher than that for the White group, after accounting for the influence of Time-in-Job on Salary | No significant difference in salary from the White group |
| Black | Negative | The average salary for the Black group is significantly lower than that for the White group, after accounting for the influence of Time-in-Job on Salary | No significant difference in salary from the White group |
| Hispanic | Positive | The average salary for the Hispanic group is significantly higher than that for the White group, after accounting for the influence of Time-in-Job on Salary | No significant difference in salary from the White group |
| Hispanic | Negative | The average salary for the Hispanic group is significantly lower than that for the White group, after accounting for the influence of Time-in-Job on Salary | No significant difference in salary from the White group |
| Asian | Positive | The average salary for the Asian group is significantly higher than that for the White group, after accounting for the influence of Time-in-Job on Salary | No significant difference in salary from the White group |
| Asian | Negative | The average salary for the Asian group is significantly lower than that for the White group, after accounting for the influence of Time-in-Job on Salary | No significant difference in salary from the White group |
Note, there is no statistical test of whether the average salary for the Black category differs from that of the Hispanic category or that of the Asian category, after accounting for Time-in-Job. Similarly, there is no statistical test of whether the average salary for the Hispanic category differs from that of the Asian category, after accounting for Time-in-Job.
| Employee ID | Salary | Time-in-Job | Race | Black Dcode | Hispanic Dcode | White Dcode |
|---|---|---|---|---|---|---|
| 919 | $37,001.00 | 1.00 | Asian | 0 | 0 | 0 |
| 527 | $39,420.00 | 2.33 | Asian | 0 | 0 | 0 |
| 287 | $38,510.00 | 1.33 | Asian | 0 | 0 | 0 |
| 435 | $38,376.00 | 1.67 | Asian | 0 | 0 | 0 |
| 96 | $39,796.00 | 0.33 | Asian | 0 | 0 | 0 |
| 480 | $36,174.00 | 0.67 | Black | 1 | 0 | 0 |
| 184 | $36,801.00 | 1.67 | Black | 1 | 0 | 0 |
| 77 | $35,259.00 | 1.67 | Black | 1 | 0 | 0 |
| 660 | $36,682.00 | 1.00 | Black | 1 | 0 | 0 |
| 945 | $34,100.00 | 0.67 | Black | 1 | 0 | 0 |
| 620 | $36,815.00 | 1.67 | Hispanic | 0 | 1 | 0 |
| 810 | $33,459.00 | 1.00 | Hispanic | 0 | 1 | 0 |
| 700 | $33,017.00 | 0.67 | Hispanic | 0 | 1 | 0 |
| 144 | $34,429.00 | 1.00 | Hispanic | 0 | 1 | 0 |
| 28 | $34,884.00 | 0.33 | Hispanic | 0 | 1 | 0 |
| 225 | $36,855.00 | 2.00 | White | 0 | 0 | 1 |
| 404 | $35,737.00 | 2.33 | White | 0 | 0 | 1 |
| 316 | $33,266.00 | 0.67 | White | 0 | 0 | 1 |
| 685 | $33,808.00 | 1.67 | White | 0 | 0 | 1 |
| 369 | $34,331.00 | 1.00 | White | 0 | 0 | 1 |
In the regression model, we would predict salary by including Time-in-Job, Black Dcode, Hispanic Dcode, and White Dcode[4]. If the regression coefficient associated with a particular dummy code is statistically significant, then there is support that the average salary for that particular race category is significantly different from the Asian category, after accounting for the influence of Time-in-Job on salary. The sign in front of the statistically significant regression coefficient indicates whether the average salary for the group is higher or lower than the Asian group, after accounting for the influence of Time-in-Job on salary. The table below presents the different results and interpretations of the regression coefficients associated with the dummy codes.
Note, there is no statistical test of whether the average salary for the Black category differs from that of the Hispanic category or that of the White category, after accounting for Time-in-Job. Similarly, there is no statistical test of whether the average salary for the Hispanic category differs from that of the White category, after accounting for Time-in-Job.
| Race Category | Regression Coefficient Sign | Statistically Significant | Not Statistically Significant |
|---|---|---|---|
| Black | Positive | The average salary for the Black group is significantly higher than that for the Asian group, after accounting for the influence of Time-in-Job on Salary | No significant difference in salary from the Asian group |
| Black | Negative | The average salary for the Black group is significantly lower than that for the Asian group, after accounting for the influence of Time-in-Job on Salary | No significant difference in salary from the Asian group |
| Hispanic | Positive | The average salary for the Hispanic group is significantly higher than that for the Asian group, after accounting for the influence of Time-in-Job on Salary | No significant difference in salary from the Asian group |
| Hispanic | Negative | The average salary for the Hispanic group is significantly lower than that for the Asian group, after accounting for the influence of Time-in-Job on Salary | No significant difference in salary from the Asian group |
| White | Positive | The average salary for the White group is significantly higher than that for the Asian group, after accounting for the influence of Time-in-Job on Salary | No significant difference in salary from the Asian group |
| White | Negative | The average salary for the White group is significantly lower than that for the Asian group, after accounting for the influence of Time-in-Job on Salary | No significant difference in salary from the Asian group |
To illustrate the issue further, we conducted actual regression analyses using the data provided in the first table. We conducted analyses under the first theory of discrimination (all groups are underpaid compared to the White category, after accounting for Time-in-Job). We also conducted analyses under the second theory of discrimination (all groups are underpaid compared to the Asian category, after accounting for Time-in-Job). The results from each analysis are below.
| Example | R² | TiJ b-weight | TiJ t-value | Black Dcode b-weight | Black Dcode t-value | Hispanic Dcode b-weight | Hispanic Dcode t-value | Asian Dcode b-weight | Asian Dcode t-value |
|---|---|---|---|---|---|---|---|---|---|
| 1 (White Referent) | .73 | 941.32 | 1.93 | 1380.33 | 1.75 | 286.19 | 0.35 | 4009.46 | 5.21* |
| Example | R² | TiJ b-weight | TiJ t-value | Black Dcode b-weight | Black Dcode t-value | Hispanic Dcode b-weight | Hispanic Dcode t-value | White Dcode b-weight | White Dcode t-value |
|---|---|---|---|---|---|---|---|---|---|
| 2 (Asian Referent) | .73 | 941.32 | 1.93 | -2629.14 | -3.42* | -3723.27 | -4.73* | -4009.46 | -5.21* |
*Statistically significant at p < .05
As is evident from the regression results, the dummy coding scheme does matter. In the first example, one would conclude that the Asian group has a significantly higher salary than the White group, after accounting for Time-in-Job. In the second example, one would conclude that the Black, Hispanic, and White groups ALL have significantly lower salaries than the Asian group, after accounting for Time-in-Job. Moreover, the identical R² values and Time-in-Job coefficients show that the two models are exactly the same in terms of how much overall variance is accounted for and in the contribution of Time-in-Job. This is not a surprise, as exactly the same variables are included in each model: Time-in-Job and a set of dummy codes fully representing race. Indeed, each coefficient in the second example can be recovered from the first: the Black coefficient relative to the Asian group (-2629.14) equals, within rounding, the Black coefficient relative to the White group minus the Asian coefficient relative to the White group (1380.33 - 4009.46). The models only differ in terms of which races are being compared to one another, and therefore in which comparisons receive a significance test.
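The two fits can be reproduced approximately with ordinary least squares. The sketch below uses only the Python standard library and solves the normal equations directly; the `solve` and `fit` helpers are illustrative names introduced here, not the software used for the published results, and t-values are deliberately left out to keep the example short.

```python
# (race, salary, time-in-job in years), copied from the first data table.
DATA = [
    ("Asian", 37001, 1.00), ("Asian", 39420, 2.33), ("Asian", 38510, 1.33),
    ("Asian", 38376, 1.67), ("Asian", 39796, 0.33),
    ("Black", 36174, 0.67), ("Black", 36801, 1.67), ("Black", 35259, 1.67),
    ("Black", 36682, 1.00), ("Black", 34100, 0.67),
    ("Hispanic", 36815, 1.67), ("Hispanic", 33459, 1.00), ("Hispanic", 33017, 0.67),
    ("Hispanic", 34429, 1.00), ("Hispanic", 34884, 0.33),
    ("White", 36855, 2.00), ("White", 35737, 2.33), ("White", 33266, 0.67),
    ("White", 33808, 1.67), ("White", 34331, 1.00),
]


def solve(a, b):
    """Solve the linear system a @ x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit(data, referent):
    """OLS of salary on Time-in-Job plus dummy codes for every non-referent race.

    Returns ({term: coefficient}, R squared).
    """
    groups = [g for g in sorted({race for race, _, _ in data}) if g != referent]
    names = ["Intercept", "TiJ"] + groups
    X = [[1.0, tij] + [1.0 if race == g else 0.0 for g in groups]
         for race, _, tij in data]
    y = [float(salary) for _, salary, _ in data]
    k = len(names)
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return dict(zip(names, beta)), 1.0 - ss_res / ss_tot


white_ref, r2_white = fit(DATA, referent="White")
asian_ref, r2_asian = fit(DATA, referent="Asian")
print(white_ref, round(r2_white, 2))
print(asian_ref, round(r2_asian, 2))
```

The coefficients should land close to, but not exactly on, the published values, since the Time-in-Job figures in the table are rounded to two decimals. What holds exactly is the structure: both fits share the same R² and Time-in-Job coefficient, the White coefficient under the Asian referent is the negative of the Asian coefficient under the White referent, and every other coefficient is a difference of coefficients from the alternative coding.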
We present a slightly different data set and new regression results to highlight a different scenario in which under one theory of discrimination, there are no statistical flags, but under an alternative theory of discrimination, there are statistical flags. The presented results are based on the small data set below.
| Employee ID | Salary | Time-in-Job | Race |
|---|---|---|---|
| 919 | $37,001 | 1.00 | Asian |
| 527 | $39,420 | 2.33 | Asian |
| 287 | $38,510 | 1.33 | Asian |
| 435 | $38,376 | 1.67 | Asian |
| 96 | $39,796 | 0.33 | Asian |
| 480 | $36,174 | 0.67 | Black |
| 184 | $34,801 | 1.67 | Black |
| 77 | $35,259 | 1.67 | Black |
| 660 | $36,682 | 1.00 | Black |
| 945 | $32,100 | 0.67 | Black |
| 620 | $36,815 | 1.67 | Hispanic |
| 810 | $33,459 | 1.00 | Hispanic |
| 700 | $33,017 | 0.67 | Hispanic |
| 144 | $34,429 | 4.00 | Hispanic |
| 28 | $34,884 | 0.33 | Hispanic |
| 225 | $42,855 | 2.00 | White |
| 404 | $40,737 | 2.33 | White |
| 316 | $34,266 | 0.67 | White |
| 685 | $33,808 | 1.67 | White |
| 369 | $33,331 | 1.00 | White |
See the summary table below for average salary and Time-in-Job (expressed as years) for each racial/ethnic group.
| Race Group | Average Salary | Average Time-in-Job (years) |
|---|---|---|
| Asian | $38,621 | 1.33 |
| Black | $35,003 | 1.13 |
| Hispanic | $34,521 | 1.53 |
| White | $36,999 | 1.53 |
Using the theory of discrimination presented in the first example, we hypothesize that at least one race category is potentially underpaid compared to the White category. The dummy coding scheme would follow that presented earlier for Example 1. The results presented in the “Example 3” row found in the table below show that the average salaries for the Black, Hispanic, and Asian groups are not significantly different from the average salary for the White group, after accounting for the influence of Time-in-Job.
Alternatively, suppose our theory of discrimination is consistent with that used in the second example, in which we hypothesize that at least one race category is potentially underpaid compared to the Asian category (i.e., the race group with the highest average salary, merit variables notwithstanding). Again, our dummy coding scheme would follow that presented for the second example. The results presented in the “Example 4” row found in the table below indicate that the average salary for the White group is not significantly different from the average salary for the Asian group, after accounting for Time-in-Job. By contrast, the results indicate that the average salaries for the Black and Hispanic groups are significantly lower than the average salary for the Asian group, after accounting for Time-in-Job.
| Example | R² | TiJ b-weight | TiJ t-value | Black Dcode b-weight | Black Dcode t-value | Hispanic Dcode b-weight | Hispanic Dcode t-value | Asian Dcode b-weight | Asian Dcode t-value |
|---|---|---|---|---|---|---|---|---|---|
| 3 (White Referent) | .40 | 858.59 | 1.26 | -1654.48 | -1.02 | -2478.60 | -1.55 | 1794.63 | 1.12 |
| Example | R² | TiJ b-weight | TiJ t-value | Black Dcode b-weight | Black Dcode t-value | Hispanic Dcode b-weight | Hispanic Dcode t-value | White Dcode b-weight | White Dcode t-value |
|---|---|---|---|---|---|---|---|---|---|
| 4 (Asian Referent) | .40 | 858.59 | 1.26 | -3449.12 | -2.15* | -4273.23 | -2.67* | -1794.63 | -1.12 |
*Statistically significant at p < .05
Such results provide another illustration that the dummy coding scheme does matter. In the third example, one would conclude that no racial/ethnic groups have significantly different salaries than the White group, after accounting for Time-in-Job. In the fourth example, one would conclude that the Black and Hispanic groups have significantly lower salaries than the Asian group, after accounting for Time-in-Job.
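The asterisks in the Example 3 and Example 4 rows can be checked against a critical t-value. The sketch below assumes a two-tailed test at the .05 level with 15 residual degrees of freedom (20 employees minus the intercept, Time-in-Job, and three dummy codes), for which the critical value is about 2.131. The post does not state the test's tails or degrees of freedom, so both are assumptions, and the `flagged` helper is a hypothetical name.

```python
# Assumed: two-tailed test, alpha = .05, df = 20 - 5 = 15 (not stated in the post).
T_CRITICAL = 2.131

# t-values for the dummy codes, copied from the Example 3 and Example 4 rows.
REPORTED_T = {
    "Example 3 (White referent)": {"Black": -1.02, "Hispanic": -1.55, "Asian": 1.12},
    "Example 4 (Asian referent)": {"Black": -2.15, "Hispanic": -2.67, "White": -1.12},
}


def flagged(t_values, t_critical=T_CRITICAL):
    """Return the groups whose |t| exceeds the critical value."""
    return [group for group, t in t_values.items() if abs(t) > t_critical]


for example, t_values in REPORTED_T.items():
    print(example, "->", flagged(t_values) or "no statistical flags")
```

Under these assumptions the same data produce no flags with the White referent and two flags with the Asian referent, matching the asterisks in the tables.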
Conclusion
We hope that this white paper helps to clarify DCI’s concerns about the lack of agency guidance on the approach that federal contractors should take when conducting compensation analyses. It is our hope that the OFCCP considers the problems that result from the absence of a central strategy on the issue. We look forward to publication of official guidance from OFCCP to help contractors conduct proactive analyses in a manner consistent with the agency’s approach.
_________________________________________________________________
[1] It is evident by looking at the patterns of 1s and 0s across the three dummy codes that the set of dummy codes will fully represent race when entered into the regression equation. The Asian category has a pattern of 001 across dummy codes, the Black category has a pattern of 100 across dummy codes, the Hispanic category has a pattern of 010 across dummy codes, and the White category has a pattern of 000 across dummy codes.
[2] The regression equation would be Salary = Intercept + b1(Time-in-Job) + b2(Black Dcode) + b3(Hispanic Dcode) + b4(Asian Dcode)
[4] The regression equation would be Salary = Intercept + b1(Time-in-Job) + b2(Black Dcode) + b3(Hispanic Dcode) + b4(White Dcode)
by Kayo Sady, Ph.D., Consultant, DCI Consulting Group