About the TreeWAS Database

TreeWAS refers to a Bayesian approach for mapping genetic risk across disease classification codes within a hierarchical ontology [1]. The method uses the ontology to shape prior beliefs about the profile of pleiotropy. It allows shared signal across related codes (for example, subtypes of a disease) to be combined effectively, while still allowing for distinct patterns of risk (or absence of risk) in other parts of the ontology. The tree Bayes Factor quantifies the evidence that a variant affects any disease classification code, and posterior decoding identifies which nodes within the ontology are affected. Throughout, Bayes Factors are reported as log Bayes Factors (lBF), using base-10 logarithms.
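
Because lBF is a base-10 logarithm, an lBF of 2 corresponds to a Bayes Factor of 100. Under Bayes' theorem, posterior odds = prior odds × BF. The sketch below converts an lBF into a posterior probability of association. The prior probability is an assumption the reader must supply; it is not a value taken from the TreeWAS analysis. The function names are hypothetical. The arithmetic is done in log10 space so that very large or very small lBF values do not overflow.

```python
import math


def lbf_to_bf(lbf: float) -> float:
    """Convert a log10 Bayes Factor (lBF) back to a Bayes Factor."""
    return 10.0 ** lbf


def posterior_probability(lbf: float, prior_prob: float) -> float:
    """Posterior probability of association, given an lBF and an ASSUMED prior.

    Uses posterior odds = prior odds * BF, evaluated in log10 space.
    """
    if not 0.0 < prior_prob < 1.0:
        raise ValueError("prior_prob must lie strictly between 0 and 1")
    prior_odds = prior_prob / (1.0 - prior_prob)
    log10_post_odds = lbf + math.log10(prior_odds)
    # Branch on sign so that 10 ** x is only evaluated for x <= 0 (no overflow).
    if log10_post_odds >= 0.0:
        return 1.0 / (1.0 + 10.0 ** (-log10_post_odds))
    t = 10.0 ** log10_post_odds
    return t / (1.0 + t)


# lBF = 2 (BF = 100) with an assumed prior of 1% gives posterior = 100/199.
print(lbf_to_bf(2.0), posterior_probability(2.0, 0.01))
```

The same lBF can therefore imply very different posterior probabilities depending on the prior one is willing to assume.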

The TreeWAS Database provides access to results for 36,212 SNPs genotyped or imputed in the UK Biobank (UKB) [2, 3], which has collected genetic and routine healthcare data from over 500,000 participants. The method was applied to the available genetic variation data across 19,155 diagnostic terms (clinical codes) from Hospital Episode Statistics (HES), defined by the International Classification of Diseases, Tenth Revision (ICD-10) [4]. This ontology is not intended to reflect biological processes, but it nevertheless captures many important relationships between related disorders, subtypes and complications. We also provide results, for a subset of SNPs, from an analysis of self-reported statistics (SRS), which cover fewer diagnostic terms.
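
To illustrate how the ICD-10 hierarchy relates subtypes of a disease, the sketch below stores a small fragment of it as a child-to-parent map and finds the deepest node two codes share. The codes and labels are real ICD-10 entries. However, the exact tree used by TreeWAS (root, chapter and block levels) is an assumption here, and the helper names are hypothetical.

```python
from typing import Dict, List, Optional

# ASSUMED fragment of the ICD-10 hierarchy, stored as child -> parent.
# The tree used by TreeWAS may be organised differently.
PARENT: Dict[str, Optional[str]] = {
    "ROOT": None,
    "Chapter IV": "ROOT",     # Endocrine, nutritional and metabolic diseases
    "E10-E14": "Chapter IV",  # Diabetes mellitus
    "E10": "E10-E14",         # Insulin-dependent diabetes mellitus
    "E11": "E10-E14",         # Non-insulin-dependent diabetes mellitus
    "E10.9": "E10",           # E10 without complications
    "E11.9": "E11",           # E11 without complications
}


def ancestors(code: str) -> List[str]:
    """Return the path from a code's parent up to (and including) the root."""
    path: List[str] = []
    node = PARENT[code]
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def lowest_common_ancestor(a: str, b: str) -> str:
    """Deepest node on both codes' paths to the root (the codes themselves count)."""
    path_a = [a] + ancestors(a)
    on_path_b = set([b] + ancestors(b))
    for node in path_a:  # path_a runs from deepest to shallowest
        if node in on_path_b:
            return node
    raise ValueError("codes do not share a root")


print(ancestors("E10.9"))
print(lowest_common_ancestor("E10.9", "E11.9"))
```

Two diabetes subtypes meet at the "Diabetes mellitus" block. In this hypothetical layout, that is the level at which a shared signal across related codes could be pooled.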

In addition to the genetic risk profiles measured for each variant, a clustering method was applied to identify distinct risk profiles shared across variants. These clusters help clarify the relationships among variants and, in turn, the structure of genetic pleiotropy.
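
The text does not describe which clustering method was used. The sketch below is therefore a stand-in, not the TreeWAS procedure. It shows one simple way to group variants by risk profile. Each variant's profile is assumed to be a vector of per-node posterior probabilities, and variant IDs, values and the similarity threshold are hypothetical. The technique is single-linkage grouping on cosine similarity, implemented with union-find.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two profile vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def cluster_profiles(profiles: Dict[str, Sequence[float]],
                     threshold: float) -> List[List[str]]:
    """Join variants whose profiles have cosine similarity >= threshold,
    transitively (single linkage), and return the resulting groups."""
    names = sorted(profiles)
    parent = {n: n for n in names}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(profiles[a], profiles[b]) >= threshold:
                parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())


# Hypothetical profiles over three ontology nodes.
profiles = {
    "rsA": [0.9, 0.8, 0.0],
    "rsB": [0.85, 0.9, 0.05],
    "rsC": [0.0, 0.1, 0.95],
}
print(cluster_profiles(profiles, threshold=0.9))
```

Here rsA and rsB affect the same nodes and fall into one group, while rsC, whose risk lies elsewhere in the ontology, forms its own.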

  1. Cortes, A. et al. Bayesian analysis of genetic association across tree-structured routine healthcare data in the UK Biobank. Nat. Genet. 49, 1311–1318 (2017).
  2. Sudlow, C. et al. UK Biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 12, e1001779 (2015).
  3. Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).
  4. World Health Organization. International Statistical Classification of Diseases and Related Health Problems, Tenth Revision (ICD-10).