compare slopes in linear regression

Oct 2, 2020 3 min read statistics, rstats

I asked this question on twitter.

load the package

library(tidyverse)

make some dummy data

The dummy example: We have two groups of samples: disease and health. We treat those cells in vitro with different dosages (0, 1, 5) of a chemical X and count the cell number after 3 hours.

x <- tibble(
  '0' = c(8.66,
               11.50,
               7.01,
               13.40,
               11.30,
               8.13,
               5.92,
               7.54),
  '1' = c(22.10,
               23.00,
               22.00,
               35.70,
               32.60,
               26.90,
               23.30,
               21.60),
  '5' = c(31.20,
               30.60,
               28.40,
               37.30,
               41.40,
               40.20,
               29.20,
               35.40))

y <- tibble(
  '0' = c(13.90,
               5.67,
               2.59,
               9.77,
               5.63,
               4.59,
               7.35,
               5.92),
  '1' = c(12.60,
               8.48,
               5.59,
               5.43,
               13.10,
               18.00,
               23.90,
               14.50),
  '5' = c(20.00,
               10.70,
               9.73,
               16.20,
               15.10,
               21.30,
               25.60,
               18.40))

x<- x %>% tidyr::gather(1:3, key= "dose", value = "cells") %>%
  mutate(group = "Health")

y<- y %>% tidyr::gather(1:3, key= "dose", value = "cells") %>%
  mutate(group = "Disease")

z<- rbind(x,y)

ggplot(z, aes(x= dose, y = cells)) +
  geom_boxplot(aes(color = dose)) +
  geom_point() +
  facet_wrap(~group) +
  theme_bw(base_size = 14)

We see that as the dosage increases, the cell number increases in both disease and healthy groups. However, the question is that if the cell number increases faster in the healthy group than the disease group or not? How can I get a p-value for that?

As Chenxin Li pointed out. The dose has to be numeric:

z<- z %>%
  mutate(dose = as.numeric(dose)) 

lm_res<- lm(cells ~  dose *group, data = z)
summary(lm_res)

## 
## Call:
## lm(formula = cells ~ dose * group, data = z)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -8.839 -4.918 -1.357  3.801 16.771 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        8.7048     1.7367   5.012 9.26e-06 ***
## dose               1.7737     0.5899   3.007  0.00435 ** 
## groupHealth        6.0546     2.4560   2.465  0.01766 *  
## dose:groupHealth   2.3958     0.8343   2.872  0.00626 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.243 on 44 degrees of freedom
## Multiple R-squared:  0.6839, Adjusted R-squared:  0.6624 
## F-statistic: 31.74 on 3 and 44 DF,  p-value: 4.443e-11

Interpret the result

\(beta 0 = 8.7\) is the intercept.
\(beta 1 = 1.77\) is the coefficient of the dose. It is interpreted as the unique effect of dose when group is disease (reference group).
\(beta 2 = 6.05\) is the coefficient of the group. It is interpreted as the effect of group when the dose is 0.
\(beta 3 = 2.39\) is the coefficient of the interaction term dose:group. \(beta3\) is the difference in the slopes of non-reference minus reference (reference is the Disease group chosen by R alphabetically).

In other words, the slope of Healthy group - Disease group = 2.39, which means the cell number in the healthy group increases faster. The p-value is 0.00626

I want to thank Collin in Shirley’s lab and other tweeps for helping out for my question.

I found this explanation of how to inteprete the interaction term is very good. I highly recommend reading through this as well: https://lindeloev.github.io/tests-as-linear/#62_two-way_anova_(plot_in_progress)

statistics rstats linear-regresssion

compare slopes in linear regression

load the package

make some dummy data

Interpret the result

Related