Skip to main content Link Search Menu Expand Document (external link)
Table of contents
  1. 03b_R_association_testing.R
    1. Example - Risk score testing
    2. Example - Non-risk score testing

03b_R_association_testing.R

This script is an alternative to the 03a_PLINK_association_testing.R script for running association analyses using R. While not streamlined for large-scale PheWAS, this script offers greater flexibility in the kind of analyses that can be run, such as fitting non-standard models or if the genetic data are not in a format readable by standard software. Further, this script allows for analyses of genomic risk scores (GRS), polygenic risk score (PRS), and multi-allelic copy number variants (CNV).

The script has slightly different inputs when testing risk scores vs non-risk scores (CNVs or similar).

Risk score testing

If testing GRS or PRS a guide must be provided as a csv file that acts to point the script to the correct data. A template GRS_PRS_template.csv.gz can be found here $package_folder/extdata/GRS_PRS_template.csv.gz and should be copied to another location to allow editing. The file contains five columns.

  • trait: Name of the trait that the GRS/PRS has been created from.
  • group: Grouping variable that the analysis is subset to, if no grouping has been used the default is all.
  • phenotype_data: Full file path for the phenotype data that will be used in the association analysis created using 01_phenotype_preparation.R.
  • genetic_data: Full file path to the genetic data for the GRS/PRS.
  • column_name: Column name in the genetic data file containing the relevant GRS genetic data.

GRS/PRS data is analysed per trait-group combination, results are saved as a R list object per trait that contain all trait-group combinations, this allows multiple traits to be analysed across multiple groups via a single input to the script.

Non-GRS/PRS testing

Non-GRS/PRS data is assumed to be an alternative genetic measurement with a column of IDs followed by 1-n columns of genetic data, example CNVs. Results are per column of the input file (minus ID column) with a single R object saved containing a list of results per-group, such that each non-ID column will be tested for association with all available phenotypes in each of the inputted groups. Unlike the GRS/PRS input the full location of all the phenotype files is required as input.

Covariates

If a covariate file is included then it must contain a column age, when selecting covariates note currently only participants with full covariate data will be analysed.

Usage Risk score testing

03b_R_association_testing.R --analysis_folder=<folder> --GRS_input=<FILE> 
[--covariates=<FILE> --N_cores=<number> --PheWAS_manifest_overide=<FILE> 
--use_existing_results_file=<FILE>] 
[--phenotype_inclusion_file=<FILE> | --phenotype_exclusion_file=<FILE>]

Non-risk score testing

03b_R_association_testing.R --analysis_folder=<folder> --non_GRS_data=<FILE> 
--phenotype_files=<FILE> --analysis_name=<name> 
[--covariates=<FILE> --group_name_overide=<text> --N_cores=<number> 
--PheWAS_manifest_overide=<FILE> --use_existing_results_file=<FILE>] 
[--phenotype_inclusion_file=<FILE> | --phenotype_exclusion_file=<FILE>]

For full details of all of the arguments please use the help argument -h.


Example - Risk score testing

For the example data we have provided a separate template for the --GRS_input found in $package_folder/extdata/worked_example/GRS_guide and produced simulated genetic data here $package_folder/extdata/worked_example/GRS_example_data.gz. First copy $package_folder/extdata/worked_example/GRS_guide to $phewas_folder/data/GRS_guide, edit the third column by selecting any of the complete phenotype tables found in $phewas_folder/phenotype_table and fourth column by adding in the full path to the $package_folder at the beginning of the existing text in the column.

./03b_R_association_testing.R \
--analysis_folder $phewas_folder/analysis/example_GRS \
--GRS_input /$phewas_folder/data/GRS_guide \
--PheWAS_manifest_overide $package_folder/extdata/worked_example/example_manifest.gz

One R object is generated holding the results found here /$phewas_folder/analysis/example_GRS/example/example_results_list.RDS.


Example - Non-risk score testing

Here we have used the initial example phenotype table created by the 01_phenotype_preparation.R, feel free to change to any of the other examples provided.

./03b_R_association_testing.R \
--analysis_folder $phewas_folder/analysis/example_CNV \
--non_GRS_data $package_folder/extdata/worked_example/CNV_example.gz \
--analysis_name CNV_example \
--phenotype_files  $phewas_folder/phenotype_table/all_pop_example_table.gz \
--PheWAS_manifest_overide $package_folder/extdata/worked_example/example_manifest.gz

One R object is generated holding the results found here /$phewas_folder/analysis/example_CNV/CNV_example_all_group_results_list.RDS.