Last updated: 2024-12-18

Checks: 7 0

Knit directory: survival-susie/

This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

The command set.seed(20230201) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version e6ba60e. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .DS_Store
    Ignored:    analysis/.DS_Store

Untracked files:
    Untracked:  survival_demo.RData

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/coxph_susie_intro.Rmd) and HTML (docs/coxph_susie_intro.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File Version Author Date Message
Rmd e6ba60e Peter Carbonetto 2024-12-18 workflowr::wflow_publish("analysis/coxph_susie_intro.Rmd", verbose = TRUE,
html ede6168 Peter Carbonetto 2024-12-18 Added prior variance estimation and the SER BFs to the
Rmd 0789339 Peter Carbonetto 2024-12-18 workflowr::wflow_publish("analysis/coxph_susie_intro.Rmd", verbose = TRUE,
html 9189825 Peter Carbonetto 2024-12-13 In coxph_susie_intro demo, added call to ibss_from_ser() with
Rmd e483e17 Peter Carbonetto 2024-12-13 workflowr::wflow_publish("analysis/coxph_susie_intro.Rmd", verbose = TRUE)
html dd1fd56 Peter Carbonetto 2024-12-10 Updated the coxph_susie_intro page.
Rmd 0dec649 Peter Carbonetto 2024-12-10 Updated authors and date for coxph_susie_intro vignette.
html 183f1fb Peter Carbonetto 2024-12-10 Added comparison of coefs to coxph_susie_intro demo.
Rmd 991bd75 Peter Carbonetto 2024-12-10 workflowr::wflow_publish("coxph_susie_intro.Rmd", verbose = TRUE,
html de9be10 Peter Carbonetto 2024-12-09 Small fix to one of the plots in the introductory vignette.
Rmd 35b0289 Peter Carbonetto 2024-12-09 workflowr::wflow_publish("coxph_susie_intro.Rmd", verbose = TRUE)
html 7442aeb Peter Carbonetto 2024-12-09 Added more explanatory text to the introductory vignette.
Rmd 10c35d0 Peter Carbonetto 2024-12-09 workflowr::wflow_publish("coxph_susie_intro.Rmd", verbose = TRUE)
Rmd caed3b0 Peter Carbonetto 2024-12-09 Made a few improvements to the text of the intro vignette.
Rmd b323562 Peter Carbonetto 2024-12-09 Added fine-mapping step and fine-mapping plots to coxph_susie_intro Rmd.
Rmd 0328388 Peter Carbonetto 2024-12-09 Added a bit of text to the coxph_susie_intro vignette.
html b849946 Peter Carbonetto 2024-12-09 Added some material to the coxph-susie demo.
html 92e5671 Peter Carbonetto 2024-12-09 Added link to the overview page.
Rmd 32e28c5 Peter Carbonetto 2024-12-09 workflowr::wflow_publish("analysis/index.Rmd")
html d421a83 Peter Carbonetto 2024-12-06 Build site.
Rmd 2a72a02 Peter Carbonetto 2024-12-06 workflowr::wflow_publish("analysis/coxph_susie_intro.Rmd")
html be527ce Peter Carbonetto 2024-12-06 First build of the coxph_susie_intro vignette.
Rmd f62c839 Peter Carbonetto 2024-12-06 workflowr::wflow_publish("analysis/coxph_susie_intro.Rmd")

This short vignette is intended to (i) introduce CoxPH-SuSiE for fine-mapping time-to-event (TTE) phenotypes, and (ii) illustrate how to use CoxPH-SuSiE on an example genetic data set.

Initial setup

To run the code below, you will need the survival, susieR and logisticsusie R packages, which can be installed with these commands in R:

install.packages("remotes")
install.packages("survival")
remotes::install_github("stephenslab/susieR")
remotes::install_github("karltayeb/logisticsusie")

Also, we will use the ggplot2 and cowplot packages to create some helpful visualizations of the data and results:

install.packages("ggplot2")
install.packages("cowplot")

Once the packages are installed, load them.

library(survival)
library(susieR)
library(logisticsusie)
library(ggplot2)
library(cowplot)

Set the seed for reproducibility.

set.seed(1)

The data set

The example data set is found in the “data” directory of this GitHub repository, or can be downloaded by running these commands in R:

data_url <- paste0("https://raw.githubusercontent.com/",
                   "yunqiyang0215/survival-susie/",
                   "refs/heads/master/data/survival_demo.RData")
download.file(data_url,"survival_demo.RData")
load("survival_demo.RData")

In this example, the TTE data are “censored” disease diagnosis times, meaning that the observations are either: the time of diagnosis (for uncensored samples); or a lower limit on the time of diagnosis for censored samples. The censored samples are indicated by the “+” (this is called “right censoring”):

pheno <- Surv(time,status)
head(pheno,n = 10)
#  [1] 0.3085771  0.4654124  2.7448602+ 1.3071323+ 0.2252482  3.3300303+
#  [7] 1.0663338+ 7.4103627+ 0.9918607  0.0637775

Among the 574 diagnosis times in this data set, 330 are observed, and the remaining 244 are right-censored:

table(status)
# status
#   0   1 
# 244 330

The “geno” matrix contains the (imputed) genotype data for 1,000 candidate genetic variants (SNPs) located near each other on the chromosome.

nrow(geno)
ncol(geno)
# [1] 574
# [1] 1000

Association analysis using the CoxPH model

Association tests performed using the Cox proportional hazards (CoxPH) model show that many of thee genetic variants are associated with diagnosis time:

pvals <-
  apply(geno,2,
        function (x) summary(coxph(pheno ~ x))$coefficients[1,"Pr(>|z|)"])
pdat <- data.frame(pos = 1:1000,pval = pvals,causal = (b != 0))
ggplot(pdat,aes(x = pos,y = -log10(pval),color = causal,shape = causal)) +
  geom_point(size = 1.25) +
  scale_color_manual(values = c("black","darkorange")) +
  scale_shape_manual(values = c(1,17)) +
  labs(x = "chromosomal position",y = "-log10 p-value") +
  theme_cowplot(font_size = 10)

Version Author Date
de9be10 Peter Carbonetto 2024-12-09
7442aeb Peter Carbonetto 2024-12-09
92e5671 Peter Carbonetto 2024-12-09

Since we simulated these data, we know that 3 of the 1,000 SNPs truly affect the TTE phenotype (drawn in orange in the plot above). All three of these “causal” SNPs show evidence for association from the CoxPH-based association tests, but the strength of support for association varies quite a bit among the three SNPs. Further, many more SNPs that do not affect the phenotype are also strongly associated with the phenotype. This happens because many of the SNPs are strongly correlated with each other (in strong linkage disequilibrium, or “LD”).

Next we will run the CoxPH-SuSiE fine-mapping method to try and narrow the association signal down to a smaller set of causal candidates.

Fine-mapping analysis using the CoxPH model

To run the fine-mapping, we first need to define a function that performs a (Bayesian) univariate association analysis (here, using the CoxPH model). More precisely, this function takes as input the regression data (X, Y, as well as an additional “offset” term), and the prior variance of the regression coefficient (v0), and outputs four quantities: (1) the posterior mean of the regression coefficient; (2) its posterior variance; (3) the log-Bayes factor comparing the model against the “null” model (when the coefficient is zero); and (4) an estimate of the prior variance parameter (here we use the maximum-likelihood estimate, which is simply the posterior second moment of the regression coefficient. (This last quantity can be a constant if you want to keep the prior variance at a fixed value.)

surv_uni_fun <- function (x, y, e, prior_variance,
                          estimate_intercept = 0, ...) {
  v0   <- prior_variance                         
  fit  <- coxph(y ~ x + offset(e))
  out  <- summary(fit)$coefficients
  bhat <- out[1,"coef"]
  s    <- out[1,"se(coef)"]
  z    <- bhat/s
  lbf  <- log(s^2/(v0 + s^2))/2 + z^2/2*v0/(v0 + s^2)
  lbf  <- lbf - z^2/2 + summary(fit)$logtest["test"]/2
  v1   <- 1/(1/v0 + 1/s^2)
  mu1  <- v1*bhat/s^2
  return(list(mu = mu1,var = v1,lbf = lbf,
              prior_variance = mu1^2 + v1,
              intercept = 0))
}

Now we can fit the CoxPH-based SuSiE model using the ibss_from_ser() function from the logisticsusie package. Note that this step may take a minute or two to run depending on the computing platform:

t0 <- proc.time()
fit <- ibss_from_ser(geno,pheno,L = 3,tol = 0.0001,maxit = 100,
                     ser_function = ser_from_univariate(surv_uni_fun))
t1 <- proc.time()
print(t1 - t0)
# 7.216 sec elapsed
#    user  system elapsed 
#   7.052   0.169   7.222

(As an aside, note that the univariate regression computations are trivially parallelizable, so you have a multicore computing platform, you can make the method run faster, e.g.,

t0 <- proc.time()
fit <- ibss_from_ser(geno,pheno,L = 3,tol = 0.0001,maxit = 100,
                     ser_function = ser_from_univariate(surv_uni_fun),
                     num_cores = 4)
t1 <- proc.time()
print(t1 - t0)
# 2.586 sec elapsed
#    user  system elapsed 
#   8.151   1.504   2.590

In this run we were able to achieve about a 3x speedup with 4 threads.)

Now we use the susie_get_cs() function from the susieR package to compute credible sets (CSs):

class(fit) <- c("susie","list")
out <- susie_get_cs(fit,geno,coverage = 0.95,min_abs_cor = 0.1)
out$cs
# $L2
# [1] 994
# 
# $L1
# [1] 550 555 561 566 582 598

Interestingly, in one credible set, CoxPH-SuSiE is able to identify a single causal SNP with very high confidence (shown in blue in the plot). Additionally, a second set of causal candidates is identified with 6 plausible candidates (shown in orange). Both of these CSs contain the true causal SNPs.

pdat <- cbind(pdat,CS = 0)
pdat[out$cs$L1,"CS"] <- 1
pdat[out$cs$L2,"CS"] <- 2
pdat <- transform(pdat,CS = factor(CS))
cs_colors <- c("black","darkorange","dodgerblue")
ggplot(pdat,aes(x = pos,y = -log10(pval),color = CS,shape = causal)) +
  geom_point(size = 1.25) +
  scale_color_manual(values = cs_colors) +
  scale_shape_manual(values = c(1,17)) +
  labs(x = "chromosomal position",y = "-log10 p-value") +
  theme_cowplot(font_size = 10)

Version Author Date
7442aeb Peter Carbonetto 2024-12-09

The PIPs (“posterior inclusion probabilities”) measure the relative support for the SNPs within each CS. Plotting the PIPs for all SNPs is helpful for understanding precisely which SNPs are the top candidates:

pdat <- cbind(pdat,pip = fit$pip)
ggplot(pdat,aes(x = pos,y = pip,color = CS,shape = causal)) +
  geom_point(size = 1.25) +
  scale_color_manual(values = cs_colors) +
  scale_shape_manual(values = c(1,17)) +
  labs(x = "chromosomal position",y = "PIP") +
  theme_cowplot(font_size = 10)

Version Author Date
9189825 Peter Carbonetto 2024-12-13
7442aeb Peter Carbonetto 2024-12-09

For example, in CS 1 (orange), 3 out of the 6 SNPs are much stronger candidates than the others.

It is also interesting that the third causal SNPs that was not identified does show some modest evidence for being a causal SNP in the PIP plot.

Let’s compare the coefficient estimates against the ground-truth estimates. These are contained in the “mu” output. Here we see that the first CS (L1) accurately estimates the coefficient for the second causal SNP (the true value is -2), and the second CS (L2) accurately estimates the coefficient for the third causal SNP (the true value is 2).

rownames(fit$mu) <- c("L1","L2","L3")
causal_snps <- which(b != 0)
round(rbind(b,fit$mu)[,causal_snps],digits = 3)
#     [,1]   [,2]  [,3]
# b  1.000 -2.000 2.000
# L1 1.215 -2.088 0.252
# L2 0.834 -0.076 1.927
# L3 0.988 -0.063 0.127

Finally, note that CoxPH-SuSiE is robust to overestimating the number of causal SNPs. For example, suppose we set the number of causal SNPs to 10:

fit <- ibss_from_ser(geno,pheno,L = 10,tol = 0.0001,maxit = 100,
                     ser_function = ser_from_univariate(surv_uni_fun))

The log-Bayes factors quantifying support for each CS (i.e., each causal SNP) are in the “lbf_ser” output:

fit$lbf_ser
#         L1         L2         L3         L4         L5         L6         L7 
# 113.309194   8.742860  -1.467698  -1.465683  -1.463014  -1.460718  -1.459312 
#         L8         L9        L10 
#  -1.458954  -1.459594  -1.461082

After the first two CSs, the remaining 8 CSs provide very little improvement in fit. This result tells us that we can focus on the results for the first two CSs and ignore the rest. (Note these log-Bayes factors are computed using the natural logarithm.)


sessionInfo()
# R version 4.3.3 (2024-02-29)
# Platform: aarch64-apple-darwin20 (64-bit)
# Running under: macOS Sonoma 14.7.1
# 
# Matrix products: default
# BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
# LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0
# 
# locale:
# [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
# 
# time zone: America/Chicago
# tzcode source: internal
# 
# attached base packages:
# [1] stats     graphics  grDevices utils     datasets  methods   base     
# 
# other attached packages:
# [1] cowplot_1.1.3            ggplot2_3.5.0            logisticsusie_0.0.0.9007
# [4] susieR_0.12.42           survival_3.5-8           workflowr_1.7.1         
# 
# loaded via a namespace (and not attached):
#  [1] gtable_0.3.4       xfun_0.42          bslib_0.6.1        processx_3.8.3    
#  [5] lattice_0.22-5     callr_3.7.5        vctrs_0.6.5        tools_4.3.3       
#  [9] ps_1.7.6           generics_0.1.3     parallel_4.3.3     tibble_3.2.1      
# [13] fansi_1.0.6        highr_0.10         pkgconfig_2.0.3    Matrix_1.6-5      
# [17] RcppParallel_5.1.7 lifecycle_1.0.4    compiler_4.3.3     farver_2.1.1      
# [21] stringr_1.5.1      git2r_0.33.0       tictoc_1.2.1       munsell_0.5.0     
# [25] getPass_0.2-4      httpuv_1.6.14      htmltools_0.5.7    sass_0.4.8        
# [29] yaml_2.3.8         later_1.3.2        pillar_1.9.0       crayon_1.5.2      
# [33] jquerylib_0.1.4    whisker_0.4.1      cachem_1.0.8       tidyselect_1.2.1  
# [37] digest_0.6.34      stringi_1.8.3      purrr_1.0.2        dplyr_1.1.4       
# [41] labeling_0.4.3     splines_4.3.3      rprojroot_2.0.4    fastmap_1.1.1     
# [45] grid_4.3.3         colorspace_2.1-0   cli_3.6.2          magrittr_2.0.3    
# [49] Rfast_2.1.0        utf8_1.2.4         withr_3.0.0        scales_1.3.0      
# [53] promises_1.2.1     RcppZiggurat_0.1.6 rmarkdown_2.26     httr_1.4.7        
# [57] matrixStats_1.2.0  evaluate_0.23      knitr_1.45         irlba_2.3.5.1     
# [61] rlang_1.1.3        Rcpp_1.0.12        mixsqp_0.3-54      glue_1.7.0        
# [65] rstudioapi_0.15.0  reshape_0.8.9      jsonlite_1.8.8     R6_2.5.1          
# [69] plyr_1.8.9         fs_1.6.3