Learn R Programming

SIP R Package

The SIP R Package is designed to help researchers implement the single-iteration permutation (SIP) method for large-scale biobank data (Annis et. al. “False discovery rates for genome-wide association tests in biobanks with thousands of phenotypes.” https://doi.org/10.21203/rs.3.rs-873449/v1).

Briefly, the SIP method takes advantage of analyzing many phenotypes across a biobank simultaneously. When a large number of phenotypes are analyzed in parallel, a single permutation across all phenotypes followed by genetic association analyses of the permuted data enables estimation of false discovery rates (FDRs) across the phenome. FDR estimates in turn help with interpretation of genetic associations in a biobank context.

SIP Example

The basic permutation procedure can be used with unrelated samples. SIP requires a sample-by-variable input file containing an ID, genotypic covariates, phenotypes, and phenotypic covariates:

sip_exampleData[1:5, 1:15]
##        FID     IID ANCESTRY SEX AGE BATCH STUDY         PC1       PC2
## 1 FAM00001 ID00001      AFR   1  78     1  STU1 -0.00997818 0.0363497
## 2 FAM00002 ID00002      AFR   1  86     1  STU1 -0.01303270 0.0349262
## 3 FAM00003 ID00003      AFR   1  71     1  STU1 -0.01071080 0.0345597
## 4 FAM00004 ID00004      EUR   2  54     1  STU1 -0.01244230 0.0364125
## 5 FAM00005 ID00005      EUR   2  74     1  STU1 -0.01109950 0.0370039
##            PC3        PC4 PHENO1 PHENO2 PHENO3 PHENO4
## 1 -0.002534320 0.00362118      0      0      0      0
## 2  0.002535730 0.00291765      1      1      0      0
## 3 -0.002234060 0.00399563      1      1      0      1
## 4 -0.000127672 0.00474610      1      1      0      1
## 5 -0.001726030 0.00579052      1      1      1      1

The default permutation method will permute within biological sex and requires specification of a sex variable and male and female values as well as ID and genotypic covariates. Additional ID variables (e.g., family ID) can be placed within the genotypic covariate vector. For example:

sip(df = sip_exampleData, id.var = "IID", sex.var = "SEX", male.val = 1, female.val = 2, geno.vars = c("FID","ANCESTRY","BATCH",paste0("PC",1:4)))

Users may specify a seed. If not specified, a random seed will be chosen:

sip(df = sip_exampleData, id.var = "IID", sex.var = "SEX", male.val = 1, female.val = 2, geno.vars = c("FID","ANCESTRY","BATCH",paste0("PC",1:4)), seed = 123)

If permutation without regard to sex is desired, the within.sex argument can be set to FALSE and the sex variable and male and female values are no longer required:

sip(df = sip_exampleData, id.var = "IID", within.sex = FALSE, geno.vars = c("FID","ANCESTRY","BATCH",paste0("PC",1:4)))

SIP-pair Example

The paired permutation procedure should be used with related samples:

sipPair_exampleData[1:5, 1:15]
##       FID     IID BATCH SEX AGE     PC1     PC2    PC3     PC4 PHENO1 PHENO2
## 1 FAM0001 ID00001     1   M  72 -0.0006 -0.0006 0.0034 -0.0010      0      0
## 2 FAM0001 ID00002     1   F  80 -0.0006 -0.0010 0.0033  0.0007      0      0
## 3 FAM0002 ID00003     1   F  82  0.0052 -0.0001 0.0022  0.0001      0      0
## 4 FAM0002 ID00004     1   M  80  0.0055 -0.0006 0.0020  0.0015      1      0
## 5 FAM0003 ID00005     1   F  78 -0.0013 -0.0012 0.0007 -0.0002      0      0
##   PHENO3 PHENO4 PHENO5 PHENO6
## 1      1      0      0      1
## 2      1      1      0      1
## 3      1      0      0      0
## 4      1      1      1      0
## 5      1      0      0      0

SIP-pair requires a data frame with identity by descent (IBD) relatedness information between individuals:

head(sipPair_relatednessData)
##         IID1    IID2 PropIBD
## 8343 ID04318 ID06117  0.0016
## 5839 ID06013 ID06015  0.4766
## 3122 ID02006 ID02007  0.5000
## 9641 ID04318 ID07882  0.0000
## 1989 ID06986 ID06987  0.5114
## 2450 ID04400 ID04401  0.5029

The default permutation method will permute within biological sex and requires specification of a sex variable and male and female values as well as ID and genotypic covariates. Additional ID variables (e.g., family ID) can be placed within the genotypic covariate vector. SIP-pair also requires specification of the relatedness data frame, pairs of ID variables within the relatedness data frame, and an IBD variable:

sip_pair(df = sipPair_exampleData, id.var = "IID", sex.var = "SEX", male.val = "M", female.val = "F", geno.vars = c("FID","BATCH",paste0("PC",1:4)), rel.df = sipPair_relatednessData, rid.vars=c("IID1","IID2"), ibd.var="PropIBD")

Users may specify a seed. If not specified, a random seed will be chosen:

sip_pair(df = sipPair_exampleData, id.var = "IID", sex.var = "SEX", male.val = "M", female.val = "F", geno.vars = c("FID","BATCH",paste0("PC",1:4)), rel.df = sipPair_relatednessData, rid.vars=c("IID1","IID2"), ibd.var="PropIBD", seed = 123)

If permutation without regard to sex is desired, the within.sex argument can be set to FALSE and the sex variable and male and female values are no longer required:

sip_pair(df = sipPair_exampleData, id.var = "IID", geno.vars = c("FID","BATCH",paste0("PC",1:4)), rel.df = sipPair_relatednessData, rid.vars=c("IID1","IID2"), ibd.var="PropIBD", within.sex = FALSE)

Phenotype Correlation Plots

The SIP method maintains correlation among phenotypes:

  1. Permute data
sip_exampleData_permuted <- sip(df = sip_exampleData, id.var = "IID", sex.var = "SEX", male.val = 1, female.val = 2, geno.vars = c("FID","ANCESTRY","BATCH",paste0("PC",1:4)))
## [1] "Seed: 453014"
sip_exampleData_permuted[1:5, 1:15]
##        FID     IID ANCESTRY SEX AGE BATCH STUDY         PC1       PC2
## 1 FAM00001 ID00001      AFR   1  41     1  STU1 -0.00997818 0.0363497
## 2 FAM00002 ID00002      AFR   1  59     1  STU1 -0.01303270 0.0349262
## 3 FAM00003 ID00003      AFR   1  67     1  STU1 -0.01071080 0.0345597
## 4 FAM00004 ID00004      EUR   2  36     1  STU2 -0.01244230 0.0364125
## 5 FAM00005 ID00005      EUR   2  64     1  STU1 -0.01109950 0.0370039
##            PC3        PC4 PHENO1 PHENO2 PHENO3 PHENO4
## 1 -0.002534320 0.00362118      1      1      1      0
## 2  0.002535730 0.00291765      0      0      1      0
## 3 -0.002234060 0.00399563      0      1      0      0
## 4 -0.000127672 0.00474610      1      1      0      1
## 5 -0.001726030 0.00579052      1      1      1      1
  1. Plot correlations between phenotypes in the primary data
plot_phenotype_correlations(df = sip_exampleData, pheno.vars = paste0("PHENO",1:500))

  1. Plot correlations between phenotypes in the permuted data
plot_phenotype_correlations(df = sip_exampleData_permuted, pheno.vars = paste0("PHENO",1:500))

Phenotype-IBD Correlation

The SIP-pair method preserves correlation among related samples:

  1. Permute data without pairing
sipPair_exampleData_permuted <- sip(df = sipPair_exampleData, id.var = "IID", sex.var = "SEX", male.val = "M", female.val = "F", geno.vars = c("FID","BATCH",paste0("PC",1:4)))
## [1] "Seed: 821244"
sipPair_exampleData_permuted[1:5, 1:15]
##       FID     IID BATCH SEX AGE     PC1     PC2    PC3     PC4 PHENO1 PHENO2
## 1 FAM0001 ID00001     1   M  75 -0.0006 -0.0006 0.0034 -0.0010      0      0
## 2 FAM0001 ID00002     1   F  62 -0.0006 -0.0010 0.0033  0.0007      1      0
## 3 FAM0002 ID00003     1   F  67  0.0052 -0.0001 0.0022  0.0001      0      0
## 4 FAM0002 ID00004     1   M  78  0.0055 -0.0006 0.0020  0.0015      1      0
## 5 FAM0003 ID00005     1   F  67 -0.0013 -0.0012 0.0007 -0.0002      1      0
##   PHENO3 PHENO4 PHENO5 PHENO6
## 1      0      0      1      0
## 2      1      0      0      0
## 3      0      0      0      0
## 4      0      0      0      0
## 5      0      0      1      0
  1. Permute data with pairing
sipPair_exampleData_pairPermuted <- sip_pair(df = sipPair_exampleData, id.var = "IID", sex.var = "SEX", male.val = "M", female.val = "F", geno.vars = c("FID","BATCH",paste0("PC",1:4)), rel.df = sipPair_relatednessData, rid.vars=c("IID1","IID2"), ibd.var="PropIBD")
## [1] "Seed: 903890"
sipPair_exampleData_pairPermuted[1:5, 1:15]
##       FID     IID BATCH SEX AGE     PC1     PC2    PC3     PC4 PHENO1 PHENO2
## 1 FAM0001 ID00001     1   M  84 -0.0006 -0.0006 0.0034 -0.0010      0      0
## 2 FAM0001 ID00002     1   F  85 -0.0006 -0.0010 0.0033  0.0007      0      0
## 3 FAM0002 ID00003     1   F  65  0.0052 -0.0001 0.0022  0.0001      1      0
## 4 FAM0002 ID00004     1   M  82  0.0055 -0.0006 0.0020  0.0015      0      0
## 5 FAM0003 ID00005     1   F  60 -0.0013 -0.0012 0.0007 -0.0002      1      0
##   PHENO3 PHENO4 PHENO5 PHENO6
## 1      1      1      1      0
## 2      0      0      0      0
## 3      0      0      0      0
## 4      0      0      0      0
## 5      0      0      0      0
  1. Calculate pairwise correlations between samples in primary, permuted, and pair-permuted phenotype data. Then calculate correlation between pairwise phenotype correlations and pairwise IBD status for samples.
phenotype_IBD_correlation(df = sipPair_exampleData, id.var = "IID", rel.df = sipPair_relatednessData, rid.vars=c("IID1","IID2"), ibd.var="PropIBD", pheno.vars = paste0("PHENO",1:300))
phenotype_IBD_correlation(df = sipPair_exampleData_permuted, id.var = "IID", rel.df = sipPair_relatednessData, rid.vars=c("IID1","IID2"), ibd.var="PropIBD", pheno.vars = paste0("PHENO",1:300))
phenotype_IBD_correlation(df = sipPair_exampleData_pairPermuted, id.var = "IID", rel.df = sipPair_relatednessData, rid.vars=c("IID1","IID2"), ibd.var="PropIBD", pheno.vars = paste0("PHENO",1:300))

Multiple paired permutations show preserved correlation between pairwise phenotypes and IBD status. Multiple non-paired permutations show correlations around 0 for pairwise phenotypes and IBD status.

## [1] "Seed: 843387"
## [1] "Seed: 929772"
## [1] "Seed: 70567"
## [1] "Seed: 983156"
## [1] "Seed: 823200"
## [1] "Seed: 105573"
## [1] "Seed: 588627"
## [1] "Seed: 156553"
## [1] "Seed: 57171"
## [1] "Seed: 540842"
## [1] "Seed: 899929"
## [1] "Seed: 728781"
## [1] "Seed: 752220"
## [1] "Seed: 39710"
## [1] "Seed: 897873"
## [1] "Seed: 710215"
## [1] "Seed: 705753"
## [1] "Seed: 376279"

##    Permutation     Permuted Pair_Permuted
## 1            1  0.005031483     0.1641810
## 2            2 -0.032301401     0.1395814
## 3            3 -0.041537181     0.1479389
## 4            4  0.017517838     0.1629744
## 5            5 -0.007197936     0.1560678
## 6            6  0.013668617     0.1850219
## 7            7 -0.001806165     0.1471663
## 8            8 -0.012147743     0.1359532
## 9            9  0.007154194     0.1469262
## 10          10  0.010456928     0.1118559

Additional Data

Summary statistics for associations with p<5e-6 in our primary and permuted UKB analyses (Annis et. al. “False discovery rates for genome-wide association tests in biobanks with thousands of phenotypes.” https://doi.org/10.21203/rs.3.rs-873449/v1) are provided in data/.

Copy Link

Version

Install

install.packages('SIP')

Version

0.1.0

License

MIT + file LICENSE

Issues

Pull Requests

Stars

Forks

Maintainer

Aubrey Annis

Last Published

December 9th, 2025

Functions in SIP (0.1.0)

get_singSamp

This function checks for uneven numbers of individuals within relatedness category and creates a data set for permuting single samples that will not be pair permuted (sip_pair.R).
phenotype_IBD_correlation

Phenotype-IBD correlation
plot_phenotype_correlations

Phenotype correlation plot
sipPair_relatednessData

Single-iteration paired permutation relatedness data
order_permDF

This function reorders the permuted sample-by-variable data frame to match the ID order of the primary sample-by-variable data frame.
sip_pair

Single-iteration paired permutation for large-scale biobank data with relatedness
sip

Single-iteration permutation for large-scale biobank data
sipPair_exampleData

Single-iteration paired permutation phenotype and covariate data
sip_exampleData

Single-iteration permutation phenotype and covariate data
get_fixed

This function subsets the sample-by-variable dataframe to the "fixed" (i.e., non-permuted) data. This subset should include ID variables, sex, and genotypic covariates.
get_pairPerm

This function samples individuals by pairs for permuting phenotype vectors in the paired-permutation function (sipPair.R)
SIP-package

SIP-package internal setup
get_relCat

This function infers relatedness categories in the paired-permutation function (sip_pair.R) prior to permutation.
get_permDF

This function recombines the fixed data and permuted data into a permuted sample-by-variable data frame.
get_relPairs

This function pairs individuals in the paired-permutation function (sip_pair.R) into the most highly-related pairs possible.
get_permute

This function subsets the sample-by-variable data frame to the permutable data. This subset should include phenotypes and phenotypic covariates.
get_pairPermDF

This function recombines the fixed data and permuted data in the paired-permutation function (sipPair.R) into a permuted sample-by-variable data frame.
get_sexDF

This function divides the sample-by-variable data frame into males and females prior to permutation within sexes.
get_permIdx

This function samples indices for permuting phenotype vectors in the basic permutation function (sip.R) and for permuting leftover unpaired individuals in the paired-permutation function (sipPair.R).