In a prior blog post, we indicated that OFCCP has taken a position that the specific strategy for dummy coding races in a regression analysis does not matter. Specifically, they have argued that the group that one chooses as the referent group is inconsequential. DCI disagrees and provides an illustration of how conclusions based on the regression analysis results may differ substantially based on the dummy coding scheme one adopts.
As an initial basis for our illustration, we use the small data set below.
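| Employee ID | Salary | TimeinJob | Race |
|---|---|---|---|
| 919 | $37,001.00 | 1.00 | Asian |
| 527 | $39,420.00 | 2.33 | Asian |
| 287 | $38,510.00 | 1.33 | Asian |
| 435 | $38,376.00 | 1.67 | Asian |
| 96 | $39,796.00 | 0.33 | Asian |
| 480 | $36,174.00 | 0.67 | Black |
| 184 | $36,801.00 | 1.67 | Black |
| 77 | $35,259.00 | 1.67 | Black |
| 660 | $36,682.00 | 1.00 | Black |
| 945 | $34,100.00 | 0.67 | Black |
| 620 | $36,815.00 | 1.67 | Hispanic |
| 810 | $33,459.00 | 1.00 | Hispanic |
| 700 | $33,017.00 | 0.67 | Hispanic |
| 144 | $34,429.00 | 1.00 | Hispanic |
| 28 | $34,884.00 | 0.33 | Hispanic |
| 225 | $36,855.00 | 2.00 | White |
| 404 | $35,737.00 | 2.33 | White |
| 316 | $33,266.00 | 0.67 | White |
| 685 | $33,808.00 | 1.67 | White |
| 369 | $34,331.00 | 1.00 | White |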

Imagine that the data set represents 20 employees for a particular job. Five of the employees are Asian, five are Black, five are Hispanic, and five are White. The compensation team identified TimeinJob as the only factor that should influence salary in the job. See the summary table below for average salary and TimeinJob (expressed as years) for each racial/ethnic group.
| Race Group | Salary | TimeinJob |
|---|---|---|
| Asian | $38,620.60 | 1.33 |
| Black | $35,803.20 | 1.13 |
| Hispanic | $34,520.80 | 0.93 |
| White | $34,799.40 | 1.53 |

According to the summary table, the Asian Group on average has the highest salary, and the White Group on average has the highest TimeinJob. Using a regression equation, we can account for the influence of TimeinJob on Salary and then evaluate whether any statistically significant salary differences across racial/ethnic categories remain. That is, we can evaluate whether there are unexplained salary differences across racial/ethnic categories after accounting for how long each individual has been in his or her job.
Although the procedure is fairly straightforward on its face, the racial/ethnic categories must be coded using a scheme called dummy coding in order for race/ethnicity to be included in the regression equation. A dummy code is simply a variable representing a particular race/ethnicity such that everybody in the particular racial/ethnic category receives a 1 for the variable and everybody in a different race/ethnicity category receives a 0 for the variable. The complication is that to fully represent race/ethnicity in the regression analysis, one does not have to create a dummy code for every racial/ethnic category. One only needs to create a dummy code for every category but one. The category that does not receive a dummy code is called the referent group. The referent group is the category to which all other races with dummy codes are compared in the regression equation. That is, the regression results for each dummy code included in the regression analysis reflect the difference in salary between the race with the dummy code and the race without the dummy code (the referent group). So, determination of which race serves as the referent group will dictate what specific comparisons one is making. We provide two examples below to illustrate.
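Before turning to the examples, the coding rule itself can be sketched in a few lines of Python. This is only an illustration: the function name `dummy_code` and the "<race> Dcode" column labels are our own choices, not part of any statistical package.

```python
def dummy_code(races, referent):
    """Build one 0/1 dummy code per category, skipping the referent group."""
    categories = sorted(set(races))
    if referent not in categories:
        raise ValueError(f"referent {referent!r} not found in data")
    return {
        f"{category} Dcode": [1 if race == category else 0 for race in races]
        for category in categories
        if category != referent
    }


# One illustrative employee per category
races = ["Asian", "Black", "Hispanic", "White"]

# White as referent: White employees receive 0 on every dummy code
white_ref = dummy_code(races, referent="White")
# Asian as referent: Asian employees receive 0 on every dummy code
asian_ref = dummy_code(races, referent="Asian")

for name, column in white_ref.items():
    print(name, column)
```

Changing the `referent` argument is the only difference between the two coding schemes discussed in this post; the number of dummy codes is always one fewer than the number of categories.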
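| Employee ID | Salary | TimeinJob | Race | Black Dcode | Hispanic Dcode | Asian Dcode |
|---|---|---|---|---|---|---|
| 919 | $37,001.00 | 1.00 | Asian | 0 | 0 | 1 |
| 527 | $39,420.00 | 2.33 | Asian | 0 | 0 | 1 |
| 287 | $38,510.00 | 1.33 | Asian | 0 | 0 | 1 |
| 435 | $38,376.00 | 1.67 | Asian | 0 | 0 | 1 |
| 96 | $39,796.00 | 0.33 | Asian | 0 | 0 | 1 |
| 480 | $36,174.00 | 0.67 | Black | 1 | 0 | 0 |
| 184 | $36,801.00 | 1.67 | Black | 1 | 0 | 0 |
| 77 | $35,259.00 | 1.67 | Black | 1 | 0 | 0 |
| 660 | $36,682.00 | 1.00 | Black | 1 | 0 | 0 |
| 945 | $34,100.00 | 0.67 | Black | 1 | 0 | 0 |
| 620 | $36,815.00 | 1.67 | Hispanic | 0 | 1 | 0 |
| 810 | $33,459.00 | 1.00 | Hispanic | 0 | 1 | 0 |
| 700 | $33,017.00 | 0.67 | Hispanic | 0 | 1 | 0 |
| 144 | $34,429.00 | 1.00 | Hispanic | 0 | 1 | 0 |
| 28 | $34,884.00 | 0.33 | Hispanic | 0 | 1 | 0 |
| 225 | $36,855.00 | 2.00 | White | 0 | 0 | 0 |
| 404 | $35,737.00 | 2.33 | White | 0 | 0 | 0 |
| 316 | $33,266.00 | 0.67 | White | 0 | 0 | 0 |
| 685 | $33,808.00 | 1.67 | White | 0 | 0 | 0 |
| 369 | $34,331.00 | 1.00 | White | 0 | 0 | 0 |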

In the regression model, we would predict salary by including TimeinJob, Black Dcode, Hispanic Dcode, and Asian Dcode[2]. If the regression coefficient associated with a particular dummy code is statistically significant, then there is support that the average salary for that particular race category is significantly different from the White category, after accounting for the influence of TimeinJob on salary. The sign in front of the statistically significant regression coefficient indicates whether the average salary for the group is higher or lower than the White group, after accounting for the influence of TimeinJob on salary. The table below presents the different results and interpretations of the regression coefficients associated with the dummy codes.
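| Race Category | Regression Coefficient Sign | Statistically Significant | Not Statistically Significant |
|---|---|---|---|
| Black | Positive | The average salary for the Black group is significantly higher than that for the White group, after accounting for the influence of TimeinJob on Salary | No significant difference in salary from the White group |
| Black | Negative | The average salary for the Black group is significantly lower than that for the White group, after accounting for the influence of TimeinJob on Salary | No significant difference in salary from the White group |
| Hispanic | Positive | The average salary for the Hispanic group is significantly higher than that for the White group, after accounting for the influence of TimeinJob on Salary | No significant difference in salary from the White group |
| Hispanic | Negative | The average salary for the Hispanic group is significantly lower than that for the White group, after accounting for the influence of TimeinJob on Salary | No significant difference in salary from the White group |
| Asian | Positive | The average salary for the Asian group is significantly higher than that for the White group, after accounting for the influence of TimeinJob on Salary | No significant difference in salary from the White group |
| Asian | Negative | The average salary for the Asian group is significantly lower than that for the White group, after accounting for the influence of TimeinJob on Salary | No significant difference in salary from the White group |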

Note, there is no statistical test of whether the average salary for the Black category differs from that of the Hispanic category or that of the Asian category, after accounting for TimeinJob. Similarly, there is no statistical test of whether the average salary for the Hispanic category differs from that of the Asian category, after accounting for TimeinJob.
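To make the White-referent model concrete, the following is a minimal sketch that fits it by ordinary least squares using only the Python standard library. The helper names (`invert`, `fit`) are ours, and the data are typed in from the table above with TimeinJob rounded to two decimals, so the output should land close to, but not necessarily exactly on, figures computed from unrounded values.

```python
# (salary, TimeinJob, race) rows copied from the data table above
DATA = [
    (37001, 1.00, "Asian"), (39420, 2.33, "Asian"), (38510, 1.33, "Asian"),
    (38376, 1.67, "Asian"), (39796, 0.33, "Asian"),
    (36174, 0.67, "Black"), (36801, 1.67, "Black"), (35259, 1.67, "Black"),
    (36682, 1.00, "Black"), (34100, 0.67, "Black"),
    (36815, 1.67, "Hispanic"), (33459, 1.00, "Hispanic"), (33017, 0.67, "Hispanic"),
    (34429, 1.00, "Hispanic"), (34884, 0.33, "Hispanic"),
    (36855, 2.00, "White"), (35737, 2.33, "White"), (33266, 0.67, "White"),
    (33808, 1.67, "White"), (34331, 1.00, "White"),
]
GROUPS = ("Black", "Hispanic", "Asian", "White")


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry onto the diagonal
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit(data, referent):
    """Regress salary on TimeinJob plus a dummy code for each non-referent group."""
    coded = [g for g in GROUPS if g != referent]
    names = ["Intercept", "TimeinJob"] + [f"{g} Dcode" for g in coded]
    X = [[1.0, tij] + [1.0 if race == g else 0.0 for g in coded]
         for _, tij, race in data]
    y = [float(salary) for salary, _, _ in data]
    n, k = len(X), len(names)
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(X, y)) for i in range(k)]
    inv = invert(xtx)
    b = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    sse = sum((v - sum(bi * xi for bi, xi in zip(b, r))) ** 2 for r, v in zip(X, y))
    sst = sum((v - sum(y) / n) ** 2 for v in y)
    # t value = b weight / standard error, with n - k residual degrees of freedom
    t = [b[i] / (sse / (n - k) * inv[i][i]) ** 0.5 for i in range(k)]
    return dict(zip(names, zip(b, t))), 1 - sse / sst


results, r_squared = fit(DATA, referent="White")
print(f"R^2 = {r_squared:.2f}")
for name, (b_weight, t_value) in results.items():
    print(f"{name:<15} b = {b_weight:10.2f}   t = {t_value:6.2f}")
```

With 20 employees and five estimated parameters there are 15 residual degrees of freedom, so a two-tailed test at p < .05 requires an absolute t value of roughly 2.13. Calling `fit(DATA, referent="Asian")` runs the alternative coding scheme on the same data.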
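| Employee ID | Salary | TimeinJob | Race | Black Dcode | Hispanic Dcode | White Dcode |
|---|---|---|---|---|---|---|
| 919 | $37,001.00 | 1.00 | Asian | 0 | 0 | 0 |
| 527 | $39,420.00 | 2.33 | Asian | 0 | 0 | 0 |
| 287 | $38,510.00 | 1.33 | Asian | 0 | 0 | 0 |
| 435 | $38,376.00 | 1.67 | Asian | 0 | 0 | 0 |
| 96 | $39,796.00 | 0.33 | Asian | 0 | 0 | 0 |
| 480 | $36,174.00 | 0.67 | Black | 1 | 0 | 0 |
| 184 | $36,801.00 | 1.67 | Black | 1 | 0 | 0 |
| 77 | $35,259.00 | 1.67 | Black | 1 | 0 | 0 |
| 660 | $36,682.00 | 1.00 | Black | 1 | 0 | 0 |
| 945 | $34,100.00 | 0.67 | Black | 1 | 0 | 0 |
| 620 | $36,815.00 | 1.67 | Hispanic | 0 | 1 | 0 |
| 810 | $33,459.00 | 1.00 | Hispanic | 0 | 1 | 0 |
| 700 | $33,017.00 | 0.67 | Hispanic | 0 | 1 | 0 |
| 144 | $34,429.00 | 1.00 | Hispanic | 0 | 1 | 0 |
| 28 | $34,884.00 | 0.33 | Hispanic | 0 | 1 | 0 |
| 225 | $36,855.00 | 2.00 | White | 0 | 0 | 1 |
| 404 | $35,737.00 | 2.33 | White | 0 | 0 | 1 |
| 316 | $33,266.00 | 0.67 | White | 0 | 0 | 1 |
| 685 | $33,808.00 | 1.67 | White | 0 | 0 | 1 |
| 369 | $34,331.00 | 1.00 | White | 0 | 0 | 1 |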

In the regression model, we would predict salary by including TimeinJob, Black Dcode, Hispanic Dcode, and White Dcode[4]. If the regression coefficient associated with a particular dummy code is statistically significant, then there is support that the average salary for that particular race category is significantly different from the Asian category, after accounting for the influence of TimeinJob on salary. The sign in front of the statistically significant regression coefficient indicates whether the average salary for the group is higher or lower than the Asian group, after accounting for the influence of TimeinJob on salary. The table below presents the different results and interpretations of the regression coefficients associated with the dummy codes.
Note, there is no statistical test of whether the average salary for the Black category differs from that of the Hispanic category or that of the White category, after accounting for TimeinJob. Similarly, there is no statistical test of whether the average salary for the Hispanic category differs from that of the White category, after accounting for TimeinJob.
| Race Category | Regression Coefficient Sign | Statistically Significant | Not Statistically Significant |
|---|---|---|---|
| Black | Positive | The average salary for the Black group is significantly higher than that for the Asian group, after accounting for the influence of TimeinJob on Salary | No significant difference in salary from the Asian group |
| Black | Negative | The average salary for the Black group is significantly lower than that for the Asian group, after accounting for the influence of TimeinJob on Salary | No significant difference in salary from the Asian group |
| Hispanic | Positive | The average salary for the Hispanic group is significantly higher than that for the Asian group, after accounting for the influence of TimeinJob on Salary | No significant difference in salary from the Asian group |
| Hispanic | Negative | The average salary for the Hispanic group is significantly lower than that for the Asian group, after accounting for the influence of TimeinJob on Salary | No significant difference in salary from the Asian group |
| White | Positive | The average salary for the White group is significantly higher than that for the Asian group, after accounting for the influence of TimeinJob on Salary | No significant difference in salary from the Asian group |
| White | Negative | The average salary for the White group is significantly lower than that for the Asian group, after accounting for the influence of TimeinJob on Salary | No significant difference in salary from the Asian group |

To illustrate the issue further, we conducted actual regression analyses using the data provided in the first table. We conducted analyses under the first theory of discrimination (all groups are underpaid compared to the White category, after accounting for TimeinJob). We also conducted analyses under the second theory of discrimination (all groups are underpaid compared to the Asian category, after accounting for TimeinJob). The results from each analysis are below.
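| Example | R^{2} | TiJ b weight | TiJ t value | Black Dcode b weight | Black Dcode t value | Hispanic Dcode b weight | Hispanic Dcode t value | Asian Dcode b weight | Asian Dcode t value |
|---|---|---|---|---|---|---|---|---|---|
| 1 (White Referent) | .73 | 941.32 | 1.93 | 1380.33 | 1.75 | 286.19 | 0.35 | 4009.46 | 5.21* |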

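| Example | R^{2} | TiJ b weight | TiJ t value | Black Dcode b weight | Black Dcode t value | Hispanic Dcode b weight | Hispanic Dcode t value | White Dcode b weight | White Dcode t value |
|---|---|---|---|---|---|---|---|---|---|
| 2 (Asian Referent) | .73 | 941.32 | 1.93 | -2629.14 | -3.42* | -3723.27 | -4.73* | -4009.46 | -5.21* |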

*Statistically significant at p < .05
As is evident from the regression results, the dummy coding scheme does matter. In the first example, one would conclude only that the Asian group has a significantly higher salary than the White group, after accounting for TimeinJob. In the second example, one would conclude that the Black, Hispanic, and White groups ALL have significantly lower salaries than the Asian group, after accounting for TimeinJob. Moreover, a comparison of the R^{2} values and the TiJ results shows that the two models are exactly the same in terms of how much overall variance is accounted for and in the contribution of TimeinJob. This is not a surprise, as exactly the same variables are included in each model: TimeinJob and a set of dummy codes fully representing race. The models differ only in which races are being compared to one another.
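The link between the two coding schemes can be written out explicitly. The following is a sketch in our own notation: the superscripts (W) and (A), marking the White and Asian referent models, are added here for illustration and do not appear in the footnoted equations. Because both models produce the same predicted salary for every employee, the coefficients must satisfy:

```latex
\[
\begin{aligned}
\text{White referent:}\quad \text{Salary} &= \text{Intercept}^{(W)} + b_1(\text{TimeinJob}) + b_2^{(W)}(\text{Black}) + b_3^{(W)}(\text{Hispanic}) + b_4^{(W)}(\text{Asian}) \\
\text{Asian referent:}\quad \text{Salary} &= \text{Intercept}^{(A)} + b_1(\text{TimeinJob}) + b_2^{(A)}(\text{Black}) + b_3^{(A)}(\text{Hispanic}) + b_4^{(A)}(\text{White}) \\[4pt]
\text{Intercept}^{(A)} &= \text{Intercept}^{(W)} + b_4^{(W)} \\
b_2^{(A)} &= b_2^{(W)} - b_4^{(W)} = 1380.33 - 4009.46 = -2629.13 \\
b_3^{(A)} &= b_3^{(W)} - b_4^{(W)} = 286.19 - 4009.46 = -3723.27 \\
b_4^{(A)} &= -\,b_4^{(W)} = -4009.46
\end{aligned}
\]
```

Up to rounding, these values agree in magnitude with the b weights reported for Example 2, and the negative signs correspond to the Black, Hispanic, and White groups being paid less than the Asian group after accounting for TimeinJob. What the White-referent output does not supply is a standard error for the differences b2 - b4 and b3 - b4, which is why the Black versus Asian and Hispanic versus Asian comparisons go untested in Example 1.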
We present a slightly different data set and new regression results to highlight a different scenario in which under one theory of discrimination, there are no statistical flags, but under an alternative theory of discrimination, there are statistical flags. The presented results are based on the small data set below.
| Employee ID | Salary | TimeinJob | Race |
|---|---|---|---|
| 919 | $37,001 | 1.00 | Asian |
| 527 | $39,420 | 2.33 | Asian |
| 287 | $38,510 | 1.33 | Asian |
| 435 | $38,376 | 1.67 | Asian |
| 96 | $39,796 | 0.33 | Asian |
| 480 | $36,174 | 0.67 | Black |
| 184 | $34,801 | 1.67 | Black |
| 77 | $35,259 | 1.67 | Black |
| 660 | $36,682 | 1.00 | Black |
| 945 | $32,100 | 0.67 | Black |
| 620 | $36,815 | 1.67 | Hispanic |
| 810 | $33,459 | 1.00 | Hispanic |
| 700 | $33,017 | 0.67 | Hispanic |
| 144 | $34,429 | 4.00 | Hispanic |
| 28 | $34,884 | 0.33 | Hispanic |
| 225 | $42,855 | 2.00 | White |
| 404 | $40,737 | 2.33 | White |
| 316 | $34,266 | 0.67 | White |
| 685 | $33,808 | 1.67 | White |
| 369 | $33,331 | 1.00 | White |

See the summary table below for average salary and TimeinJob (expressed as years) for each racial/ethnic group.
| Race Group | Salary | TimeinJob |
|---|---|---|
| Asian | $38,621 | 1.33 |
| Black | $35,003 | 1.13 |
| Hispanic | $34,521 | 1.53 |
| White | $36,999 | 1.53 |

Using the theory of discrimination presented in the first example, we hypothesize that at least one race category is potentially underpaid compared to the White category. The dummy coding scheme would follow that presented earlier for Example 1. The results presented in the “Example 3” row of the table below show that the average salaries for the Black, Hispanic, and Asian groups are not significantly different from the average salary for the White group, after accounting for the influence of TimeinJob.
Alternatively, suppose our theory of discrimination is consistent with that used in the second example, in which we hypothesize that at least one race category is potentially underpaid compared to the Asian category (i.e., the race group with the highest average salary, merit variables notwithstanding). Again, our dummy coding scheme would follow that presented for the second example. The results presented in the “Example 4” row of the table below indicate that the average salary for the White group is not significantly different from the average salary for the Asian group, after accounting for TimeinJob. However, the results also indicate that the average salaries for the Black and Hispanic groups are significantly lower than the average salary for the Asian group, after accounting for TimeinJob.
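| Example | R^{2} | TiJ b weight | TiJ t value | Black Dcode b weight | Black Dcode t value | Hispanic Dcode b weight | Hispanic Dcode t value | Asian Dcode b weight | Asian Dcode t value |
|---|---|---|---|---|---|---|---|---|---|
| 3 (White Referent) | .40 | 858.59 | 1.26 | -1654.48 | -1.02 | -2478.60 | -1.55 | 1794.63 | 1.12 |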

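| Example | R^{2} | TiJ b weight | TiJ t value | Black Dcode b weight | Black Dcode t value | Hispanic Dcode b weight | Hispanic Dcode t value | White Dcode b weight | White Dcode t value |
|---|---|---|---|---|---|---|---|---|---|
| 4 (Asian Referent) | .40 | 858.59 | 1.26 | -3449.12 | -2.15* | -4273.23 | -2.67* | -1794.63 | -1.12 |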
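The b weights for the second data set can be cross-checked without fitting a full regression. In a model with one common TimeinJob slope plus race dummy codes, the slope equals the pooled within-group slope, and each dummy-code coefficient equals the difference in group average salaries minus the slope times the difference in group average TimeinJob. The sketch below applies that identity; the names `DATA2` and `adjusted_gaps` are ours, and the rows are typed in from the second data table as printed.

```python
from statistics import mean

# (salary, TimeinJob, race) rows copied from the second data set
DATA2 = [
    (37001, 1.00, "Asian"), (39420, 2.33, "Asian"), (38510, 1.33, "Asian"),
    (38376, 1.67, "Asian"), (39796, 0.33, "Asian"),
    (36174, 0.67, "Black"), (34801, 1.67, "Black"), (35259, 1.67, "Black"),
    (36682, 1.00, "Black"), (32100, 0.67, "Black"),
    (36815, 1.67, "Hispanic"), (33459, 1.00, "Hispanic"), (33017, 0.67, "Hispanic"),
    (34429, 4.00, "Hispanic"), (34884, 0.33, "Hispanic"),
    (42855, 2.00, "White"), (40737, 2.33, "White"), (34266, 0.67, "White"),
    (33808, 1.67, "White"), (33331, 1.00, "White"),
]


def adjusted_gaps(data, referent):
    """Return the TimeinJob slope and each group's adjusted gap to the referent."""
    by_race = {}
    for salary, tij, race in data:
        by_race.setdefault(race, []).append((tij, salary))
    # Group averages of (TimeinJob, salary)
    means = {
        race: (mean(x for x, _ in rows), mean(y for _, y in rows))
        for race, rows in by_race.items()
    }
    # Pooled within-group slope of salary on TimeinJob
    sxy = sum((x - means[race][0]) * (y - means[race][1])
              for race, rows in by_race.items() for x, y in rows)
    sxx = sum((x - means[race][0]) ** 2
              for race, rows in by_race.items() for x, _ in rows)
    slope = sxy / sxx
    ref_x, ref_y = means[referent]
    # Gap = group minus referent, so a negative value means paid less than the referent
    gaps = {
        race: (my - ref_y) - slope * (mx - ref_x)
        for race, (mx, my) in means.items()
        if race != referent
    }
    return slope, gaps


for referent in ("White", "Asian"):
    slope, gaps = adjusted_gaps(DATA2, referent)
    print(f"{referent} referent: TimeinJob slope = {slope:.2f}")
    for race, gap in gaps.items():
        print(f"  {race:<9} {gap:10.2f}")
```

The magnitudes produced this way line up with the b weights reported for Examples 3 and 4. The sketch does not compute t values, so it shows how the comparisons shift with the referent group but not which of them are statistically significant.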

We hope that this white paper helps to clarify DCI’s concerns about the lack of agency guidance on the approach that federal contractors should take when conducting compensation analyses. It is our hope that OFCCP considers the problems resulting from the absence of a central strategy on the issue. We look forward to publication of official guidance from OFCCP to help contractors conduct proactive analyses in a manner consistent with the agency’s approach.
_________________________________________________________________
[1] It is evident by looking at the patterns of 1s and 0s across the three dummy codes that the set of dummy codes will fully represent race when entered into the regression equation. The Asian category has a pattern of 001 across dummy codes, the Black category has a pattern of 100 across dummy codes, the Hispanic category has a pattern of 010 across dummy codes, and the White category has a pattern of 000 across dummy codes.
[2] The regression equation would be Salary = Intercept + b1(TimeinJob) + b2(Black Dcode) + b3(Hispanic Dcode) + b4(Asian Dcode)
[3] It is evident by looking at the patterns of 1s and 0s across the three dummy codes that the set of dummy codes will fully represent race when entered into the regression equation. The Asian category has a pattern of 000 across dummy codes, the Black category has a pattern of 100 across dummy codes, the Hispanic category has a pattern of 010 across dummy codes, and the White category has a pattern of 001 across dummy codes.
[4] The regression equation would be Salary = Intercept + b1(TimeinJob) + b2(Black Dcode) + b3(Hispanic Dcode) + b4(White Dcode)
by Kayo Sady, Ph.D., Consultant, DCI Consulting Group