PhenoExplorer: An Interactive Web-based Platform for Exploring (Epi)Genome-Wide Associations Using a Swiss Population-based Study

: The recent advent of high-throughput sequencing technologies has allowed the exploration of the contribution of thousands of genomic, epigenomic, transcriptomic, or proteomic variants to complex phenotypic traits. Here, we sought to conduct large-scale (Epi)Genome-Wide Association Studies (GWAS/EWAS) to investigatetheassociationsbetweengenomic(SingleNucleotidePolymorphism;SNP)andepigenomic(Cytosine-Phospho-Guanine;CpG)markers,withmultiplephenotypictraitsinapopulation-basedcontext.Weuseddata fromSKIPOGH,afamily-andpopulation-basedcohortconductedinthecitiesofLausanne,Geneva,andBern (N=1100).Weused7,577,572SNPs,420,444CpGs,and825phenotypes,includinganthropometric,clinical, blood,urine,metabolite,andmetalmeasures.GWASanalysesassessedtheassociationsbetweenSNPsand metabolitesandmetals(N=279),usingregressionmodelsadjustedforage,sex,recruitmentcenter,andfamilial structure,whereasEWASanalysesexploredtherelationsbetweenCpGsand825phenotypes,additionally adjustingfortheseasonalityofbloodsamplingandtechnicalnuisance.FollowingtheimplementationofGWAS andEWASanalyses,wedevelopedaweb-basedplatform,PhenoExplorer,aimedatprovidinganopenaccess totheobtainedresults.Ofthe279phenotypesincludedinGWAS,103displayedsignificantassociationswith 2804SNPs(2091uniqueSNPs)atBonferronithreshold,whereas109ofthe825phenotypesincludedinEWAS analyseswereassociatedwith4893CpGs(2578uniqueCpGs).AlloftheobtainedGWASandEWASresults wereeventuallymadeavailableusingthein-housebuiltweb-basedPhenoExplorerplatform,withthepurpose ofprovidinganopen-accesstothetestedassociations.Inconclusion,weprovideacomprehensiveoutlineof GWASandEWASassociationsperformedinaSwisspopulation-basedstudy.Further,wesetupaweb-based PhenoExplorerplatformwiththepurposeofcontributingtotheoverallunderstandingoftheroleofmolecular variantsinregulatingcomplexphenotypes.

markers, genes, and phenotypes to TransCure groups, with the purpose of establishing a multi-disciplinary approach in studying biological mechanisms related to membrane transporters.

Study Population
We used data from the Swiss Kidney Project on Genes in Hypertension (SKIPOGH), a Swiss family-and population-based multicentric cohort investigating the genetic and environmental determinants of health-related outcomes in the Swiss population ( Fig. 1). [23,26] Study participants were recruited in the city of Lausanne and the cantons of Geneva and Bern between 2009 and 2013 (SKIPOGH 1, baseline visit), and came for a followup visit three years later (SKIPOGH 2: 2013-2016). [23] Inclusion criteria were: (1) written informed consent; (2) ≥18 years of age; (3) Caucasian origin; (4) at least one first-degree family member willing to participate to the study. We excluded women who reported being pregnant. Upon recruitment, SKIPOGH 1 baseline visit eventually included 1129 participants grouped within 275 families, SKIPOGH 2 follow-up visit included 1034 participants coming from 270 families, whereas 983 individuals participated to both study waves. At both visits, participants attended in-depth medical and anthropometric examinations after an overnight fast, provided blood and urine samples, and completed a selfadministered questionnaire inquiring about their living standards, socioeconomic and financial circumstances across the life-course, lifestyle factors, and medical history. All participants provided written informed consent.

SNP Data Collection and Pre-processing
Genome-wide Single Nucleotide Polymorphism (SNP) data was obtained from white blood cells' DNA (SKIPOGH 1), and generated using the Illumina Human Omni 2.5 platform. [27] We subsequently performed SNP data pre-processing and quality control checks by applying an in-house built algorithm. [28][29][30][31] Briefly, after an automatic clustering in GenomeStudio (Illumina Inc. San Diego), we selected samples (participants) with a call rate (proportion of non-missing SNPs) >0.99 to update the SNP statistics. Once re-clustered, we retained markers with a call rate >0.95. The quality control and pre-processing procedures yielded 979 samples with 1,637,659 SNPs, whose call rate was >95%, whose minor allele frequency (MAF) was >2%, and whose Hardy-Weinberg Equilibrium P value was >0.001. We then performed multiple imputation for SNP missing values using the Minimac3 imputation algorithm, yielding a final set of 7,577,572 SNPs available for analyses. [30,32,33]

Introduction
The recent advent of high-throughput 'OMICS' technologies has provided a new conceptual framework for characterizing genetic, biological, and pathophysiological processes occurring in human populations. [1,2] Unlike the classical 'gene candidate approach', which investigates the relation between a candidate gene and a phenotype of interest in an experimental setting, largescale OMICS technologies allow implementing a 'hypothesisfree' or 'agnostic' approach, whereby the associations between a large number of pre-identified molecular variants (i.e. >30'000 variants) and a phenotype of interest are explored in a populationbased setting. [2,3] Typical examples of OMICS analyses include Genome-Wide Association Studies (GWAS), which investigate the associations between Single Nucleotide Polymorphisms (SNP) on one hand, and phenotypes of interest on the other hand (i.e. presence of disease, life-span, anthropometric traits). [1,4] To date, the overall impact of OMICS-based GWAS studies has been substantial in biomedical research, highlighting common genomic (or genetic) variants related to heart disease, age-related macular degeneration, chronic kidney disease, neurodegenerative disorders, as well as many other diseases and phenotypic traits. [1,[4][5][6] Genome-wide polygenic risk scores, which combine the effects of millions of markers across the human genome, have now reached potential for clinical utility in people of European and Asian ancestries to predict risk of future common chronic diseases, yet they currently have limited transferability and clinical utility in people of African ancestry. [7,8] In addition to genomic molecular variants, OMICS-Wide Association Studies further include transcriptomic, proteomic, metabolomic, as well as epigenomic molecular markers, which have been the object of keen interest in recent years, although their tissue-specific and time-dependent characteristics make them much more complex to explore. Epigenomic (or epigenetic) changes include histone modifications, chromatin remodeling, microRNAs, as well as DNA methylation, which constitutes the most well-described epigenetic process. [9] DNA methylation changes refer to the addition or removal of methyl groups to CpG dinucleotides across the genome, occurring either naturally (i.e. development, senescence), or as a result of environmental exposures, such as lifestyle factors, nutrition, adversity, or pollution. [9][10][11][12] In particular, previous studies have suggested that DNA methylation changes may constitute an intermediate process through which the external environment gets biologically embedded, whereby various environmental exposures lead to DNA. [13] Epigenome-wide association studies (EWAS), based on easily accessible white blood cells DNA methylation data can provide insight into the molecular and functional understanding of GWAS loci and further our understanding of complex genotypephenotype relationships. [14,15] In this research, we sought to conduct a large-scale multi-OMICS wide analysis using data from SKIPOGH, a multicentric population-based cohort conducted in Switzerland, mainly focusing on kidney and blood pressure related phenotypes. [16][17][18][19][20][21][22][23][24][25] The overarching objectives of this work are to document the associations between genomic/epigenomic variants available in SKIPOGH, including 7M SNPs, 420K CpGs, and over 800 different phenotypes (i.e. plasma and urinary solutes, lipids, steroids, metabolites, metals, clinical outcomes), as well as to develop a web-based platform, PhenoExplorer, providing open access to the obtained GWAS and EWAS associations results, thereby promoting the findability, accessibility and use of this publicly funded resource. Specifically focusing on the TransCure project, the goal of this work is to provide candidate (epi)genetic distribution. Applied non-linear transformations included squareroot, log 10 , inverse, inverse square-root, square, inverse square, cubic, and inverse cubic transformations. [40,41]

Genome-Wide Association Study
We tested the associations between genome-wide SNP markers (predictor variables; 0: homozygous minor allele genotype, 1: heterozygous genotype, 2: homozygous major allele genotype) and 279 phenotypes from SKIPOGH 1 using linear regression models, adjusting for age, sex, and recruitment center. We accounted for random-effect familial relations by using a kinship matrix generated based on imputation genotypes using PLINK. [42] After discarding participants with missing data for the main outcome and covariate variables (N = 241), the final analytical set included 738 individuals. We accounted for multiple testing by applying the Bonferroni correction method ensuring a control of the family wise error rate below 0.05 (P<5E-08). [43] The GWAS statistical analyses were carried out using EPACTS, Efficient and Parallelizable Association Container Toolbox (University of Michigan, Michigan, United States), while data preparation and pre-processing was conducted using the R statistical software. [44] For each GWAS, Manhattan plot and corresponding regional plots were generated, the latter including every putative local association peak under association strength of P≤1.0E-5 (chromosomewide significance). Lookup of genetic functional consequence and possible known related clinical phenotypes was performed querying ENSEMBL. [45]

Epigenome-Wide Association Study (EWAS)
We applied fixed-effect linear regression models for the EWAS analysis. [46] We ran 825 multiple linear regression models, using SKIPOGH 2 phenotypes as successive, independent variables, and pre-processed CpG DNA methylation data as a combined set of dependent variables (N = 420,444 CpG markers). Regression models were adjusted for age, sex, recruitment center (Lausanne, Geneva, Bern), seasonality of blood sampling (spring, summer, fall, winter), chip type (HM450, EPIC), 30 CPACOR principal components, and Houseman-estimated white blood cell composition (CD8, CD4, NK, B cells, Monocytes, Granulocytes). [47] The EWAS statistical analyses were carried out using the R statistical software and relevant CRAN and Bioconductor packages (R Foundation for Statistical Computing, Vienna, Austria). For each phenotype, EWAS results were represented using a Vo lcano plot (x-axis: beta regression coefficient, y-axis: P value), a Manhattan plot (x-axis: CpG chromosomal position, y-axis: P value), and a summary table including beta, standard-error (SE) and P value coefficients between each of the 420,444 CpG marker and the phenotype of interest (Supplementary Information SI_IV). Statistical significance threshold were set at P<0.05/420,444 following Bonferroni correction for multiple testing, as well as the Benjamini-Hochberg (BH) correction.

PhenoExplorer
We developed PhenoExplorer, a web-based, interactive platform leveraging EPACTS, and 'LocusZoom Standalone' pipelines, and RStudio's 'Shiny' web app, with the purpose of visualizing GWAS/ EWAS association results introduced previously. [44,48] Briefly, the first development step comprised GWAS and EWAS analyses ran on local machines aimed at generating summary statistics along with Manhattan and Vo lcano plots (7,577,572 SNPs, EWAS: 420,444 CpGs). The second development step included summary statistics filtering in order to obtain local association minima and generating regional plots for every filtered local minimum SNP (GWAS only). Third, annotated lookup tables with genetic and clinical consequences were eventually generated from the raw SKIPOGH 2 participants (follow-up visit) using the Infinium HumanMethylation450 BeadChip microarray of Illumina (HM450), assessing the methylation status at 485,512 CpG sites. For a different set of 442 SKIPOGH 2 participants, epigenomewide DNA methylation was measured using a more recent Infinium MethylationEPIC v1.0 microarray (EPIC), including >90% of the CpG sites from the HM450 and an additional 413,743 CpGs (865,859 CpGs in total). [34] For both arrays, CpG methylation data were summarized as β coefficients representing a ratio of the average signal for methylated CpG sites to the sum of methylated and unmethylated sites. CpG data pre-processing included multiple imputation of missing data using the nearest averaging multiple imputation method, [35] logit transformation of imputed values, and a 'denoising' procedure aimed at accounting for the variance introduced by participant's familial structure (random-effect confounder), whereby the resulting residuals were directly added to the transformed CpG methylation data, enabling the implementation of fixed-effect regression models. To control for the nuisance introduced by technical factors (methylation array type, array position, and plate level), the CPACOR procedure was applied, yielding 30 principal components to be used as fixed-effect covariates in the regression models. [36] The data pre-processing procedures eventually yielded 420,444 CpG sites available for Epigenome-Wide Association Study analyses (EWAS) in the SKIPOGH 2 sample.

SKIPOGH 1 GWAS Phenotypes
In SKIPOGH 1, 981 participants had available samples for plasma metabolites measurement. Samples were analyzed using Liquid Chromatography-Multiple Reaction Monitoring/Mass Spectrometry (LC-MRM/MS -100µL samples), with measures performed in positive and negative electrospray ionization, and using the Mass Spectrometry Metabolite Library as reference material for standard metabolites. [37] Of the 606 targeted raw metabolites, 232 were eventually measured and used for GWAS analyses following quality control and correction procedures to account for the correlated nature of the data. [37,38] In addition to plasma metabolites, GWAS analyses included associations for metal elements measured in 24 h urine collections ('metallomics' phenotypes). Briefly, 24 elements (Ag, Al, As, Be, Bi, Cd, Co, Cr, Cu, Hg, I, Li, Mn, Mo, Ni, Pb, Pd, Pt, Sb, Se, Sn, Ti, V, Zn) were measured in day and night urine samples using the imaging mass spectrometry method (ICP-MS), enabling the detection and quantification of metals in biological samples (N = 47 additional phenotypes). [39] To account for the non-normal distribution of plasma metabolites and urinary metal measures, phenotypes were transformed using the log10 transformation.

SKIPOGH 2 EWAS Phenotypes
We included a total of 825 phenotype variables in EWAS analyses ( Fig. 1 and Supplementary Information SI_I). We subdivided phenotype variables into nine categories: . Considering that physiological phenotypes may display a non-normal distribution and potentially yield spurious associations with CpG markers, we assessed the distribution of each phenotype and performed a non-linear transformation where necessary, in order to obtain normal, or close-to-normal data querying ENSEMBL database, [45] whilst the fourth and final development step included results grouping, structuring, and web publishing via 'Shiny' RStudio app.

General Characteristics
We present the GWAS and EWAS samples characteristics in Table 1. Overall, 979 individuals (87%) were included in GWAS analyses (SKIPOGH 1: 2009-2012), of which 52% were women. The mean age was 48.3 years, with 41% of the study sample being recruited in Lausanne, 43% in Geneva, and 16% in Bern. 65% of included participants were actively working at the time when the interview was conducted, while 5% were students, 20% were retired or disabled, and 10% were either unemployed or stay-athome. The proportion of current smokers was 24%, the average BMI 25.2 kg/m 2 , whereas the proportion of individuals selfreporting chronic diseases was 12% for heart disease, 13% for kidney disease, 5% for diabetes, 32% for hypertension, and 1% for cancer (any type of malignancy). A subset of 684 participants were included in EWAS analyses using SKIPOGH 2 data (2013-2016), with a mean age of 52.6 years. Whilst we found similar distributions for sex, recruitment center, professional activity, and tobacco smoking when compared to GWAS participants, the EWAS sample included more individuals reporting heart disease (19%), and fewer individuals reporting kidney disease (4%).

GWAS
Of the 279 phenotypes included in the GWAS analyses, 103 were significantly associated with 2804 SNPs (2091 unique SNPs) at the genome-wide Bonferroni threshold (P<5.0E-08), including 90 plasma metabolites and 13 urine metal phenotypes (Supplementary Information SI_II and SI_III). In Table 2, we show a list of non-exhaustive GWAS associations, displaying 26 phenotypes along with their top-associated SNPs (threshold: P<9.4E-10). Further, among the SNPs displayed in Supplementary Information SI_III, we found that none of the highlighted markers were located within the genes of interest for the TransCure project (SLC17, SLC9, DMT, FPN, SLC11, SLC40, ABCG, FABP, TRPM4, SLC7). For each GWAS analysis, the obtained results are represented using a Manhattan plot and a summary table reporting the regression coefficients (beta, standard error, P value) as displayed in Fig. 2 (Phenotype: Bilirubin (plasma metabolite)).

PhenoExplorer
We present the inner structure of PhenoExplorer as well as the user's layout in Fig. 4. As described in the Methods section, the development process included four main steps, including running GWAS/EWAS regressions on a local server (R, EPACTS), Manhattan/Volcano/regional plot generation using summary statistics, and web-based platform organization and structuring using 'Shiny'. From the user's perspective, PhenoExplorer allows performing queries using four different criteria: (1) Phenotype of interest (i.e. urinary zinc levels, blood glucose, CRP, smoking as a result of the modular conception of PhenoExplorer (use case scenario given in the Supplementary Information SI-Video). The PhenoExplorer users' URL shall be communicated upon motivated request addressed to the authors (jean-pierre.ghobril@ unisante.ch).

Discussion
In this research, we conducted two large-scale multi-OMICS analyses using data from the SKIPOGH population-based cohort, including over 7M genetic variants, 420K epigenetic markers, and 825 phenotypes. We found that 2804 SNPS (2091 unique status, history of heart disease); (2) Genetic markers (CpG, SNP, gene name); (3) Statistical significances; and (4) genomic position. The resulting overview includes a Manhattan plot, a summary table (beta, standard error, P value, chromosome, chromosomal position, and location description (i.e. intragenic, gene body, transcription start site, exon, intron)), a Vo lcano plot (EWAS only), as well as regional plots displaying local minima of interest on a 1000bp-wide sliding window (GWAS only). Whilst PhenoExplorer includes >800 SKIPOGH phenotypes, additional GWAS and EWAS associations results (summary tables, Manhattan, Vo lcano, regional plots) can be directly added Ta ble 1. General characteristics of GWAS/EWAS included participants. Whilst the purpose of this research was to conduct large-scale GWAS and EWAS analyses without specifically interpreting the obtained results, we used some of the previously established associations to verify whether the models implemented here could partially replicate former findings. The large-scale exploratory GWAS yielded associations in line with previous investigations, but also novel findings. In particular, we reproduced known associations involving plasma biliverdin and bilirubin with rs887829, [49] rs13538 with N-acetyl-phenylalanine, [50] rs1061337 with hexanoylcarnitine, [51] as well SLC2A9 variants and uric acid. [52,53] Using tobacco smoking in EWAS analyses, we found that 109 of the 120 (91%) identified CpGs were found to be associated with smoking status in former smoking-related EWAS studies, with top hits consistently including AHRR-cg05575921, F2RL3-cg03636183, or GFI1-cg09935388 in SKIPOGH. [12,54,55] Finally, all of the GWAS and EWAS analyses presented in this research are comprehensively reported and organized within PhenoExplorer, a web-based platform providing a fast turnover and visualization of millions of associations, and offering a modular structure for the incorporation of additional results.

Strengths and Limitations
This study has several strengths, the first being the richness of the physiological, clinical, lifestyle, and (epi)genetic data available in the SKIPGOH cohort, allowing to investigate multiple research questions related to the contribution of molecular variants to complex phenotypes. Second, we developed a userfriendly, web-based platform with the purpose of openly sharing GWAS and EWAS results with researchers interested in either specific genetic or epigenetic markers, or phenotypes available in SKIPOGH. The added values of PhenoExplorer is the real time interaction with the data as well as a rapid visualization of GWAS/ EWAS associations results, whereby user queries may include phenotypes, (epi)genetic marker identifiers (SNPs, CpGs), and gene names. Finally, the modular structure of PhenoExplorer allows a rapid integration of additional GWAS/EWAS/XWAS results, further allowing the platform to grow.
This study also has important limitations to acknowledge. First, the small sample size (N GWAS = 979, N EWAS = 684) restricts the ability to detect smaller effect-size associations and limits the overall statistical power. Second, GWAS and EWAS models were minimally adjusted for potential confounders (age, sex, recruitment center, familial structure, technical nuisance), but did not include covariates that may specifically confound relations Ta ble 2. Non-exhaustive selection of SNPs associated with plasma metabolite phenotypes (SKIPOGH 1). a Linear regression model for the association between SNP marker and phenotypes of interest, adjusting for age, sex, recruitment center, and familial structure (random-effect covariable) b Significance threshold was set using the Bonferroni correction method (P <5.0E-08).

Phenotype
between (epi)genetic variants and given phenotypes, eventually leading to spurious associations. Disentangling these effects would thus require repeating GWAS/EWAS analyses by including additional covariates, depending on the choice of phenotypes and the research question. Third, unlike genetic variants, epigenetic modificationsaretissue-specificandinfluencedbymultiplefactors, such as genetic makeup, aging, and environmental exposures, and constitute a dynamic process over the life-course. [55,56] Consequently, the present EWAS analyses preclude directly establishing a cause-to-effect relation between CpG markers and included phenotypes, and may be affected by time discrepancies between the collection periods of the phenotype on one hand, and the epigenetic profile on the other hand (i.e. metabolites/metals were measured at SKIPOGH 1 while CpG data was measured at SKIPOGH 2). Fourth, although multi GWAS and EWAS analyses tend to be appointed as agnostic or hypotheses-free, they actually imply hidden hypotheses that need to be accounted for while conducting OMICS studies. [2,3] In a previous research by Barker, [3] the authors have identified three hidden hypotheses underlying EWAS studies: (1) insufficient EWAS coverage: implying that not all epigenetic markers are included in available EWAS arrays; (2) biological relevance of the tissue in which epigenetic markers are measured: whereby the tissues in which epigenetic measures are performed may not be relevant for the assessed phenotypes (i.e. DNA methylation measured in blood and tested in relation to complex neurological disease); (3) biological relevance for the phenotype of interest: postulating that the biological nature of diseases and most phenotypes is often complex, with individual epigenetic modifications generally playing a modest role in the global understanding of the trait or the pathophysiological process of interest.

Conclusion
In conclusion, this work provides a comprehensive outline of GWAS and EWAS-based associations performed in the context of a population-based study conducted in adults of European ancestry living in Switzerland, including a large number of physiological, clinical, and lifestyle-related phenotypes. Further, by providing the obtained results through the development of the PhenoExplorer web-based platform, we aim to contribute to the overall understanding of the role of genetic and epigenetic variants in the regulation of multiple phenotypes, assessed in a population-based observational setting. We believe that the use of PhenoExplorer as part of the TransCure project shall allow highlighting potential associations of interest between markers located within or in the vicinity of TransCure genes/transporters of interest and phenotypes sampled in the SKIPOGH populationbased cohort. We show here a practical example of how publicly funded results can be made easily findable, accessible and usable for future projects by the researchers with interest in the phenotypes available, in particular kidney-related phenotypes.

Conflicts of interest
None to declare.   a Linear regression model for the association between CpG markers and phenotypes of interest, adjusting for age, sex, recruitment center, seasonality of blood sampling, and familial structure (random-effect covariable) b Significance threshold was set using the Bonferroni correction method (P < 0.05/420'444)