Principal Investigator  
Principal Investigator's Name: Matthew Sinnett
Institution: PrecisionLife
Department: Disease Portfolio
Country:
Proposed Analysis: Alzheimer's disease (AD), like other complex diseases, is characterized by a high degree of heterogeneity across the patient population, reflected in a wide range of disease presentations and therapy responses. Genome-wide association studies (GWAS) have identified several disease-associated genes, but these findings have not translated into progress in bringing effective treatments to patients, as evidenced by the high clinical attrition rate. This likely reflects the limitation that GWAS identify only single variants with large effect sizes in a population. Complex diseases such as AD are influenced by multiple genetic loci and by epidemiological and/or environmental factors, so the key to understanding them is to find combinations of these disease-associated factors that distinguish one patient subgroup from another. Novel approaches are therefore required to give deeper insight into the underlying biology of the disease, enable new therapies, and create high-quality evidence to help improve patient care. A greater understanding of the clinically relevant differences between the different AD phenotypes, and the consequent development of more targeted and efficient treatment strategies, may improve the chances of success in downstream clinical trials by selecting AD patients more likely to respond to the novel treatments under development. The PrecisionLife platform uses a hypothesis-free method to detect combinations of features that together are strongly associated with variations in disease risk, mechanistic etiology, symptoms and progression rates often observed in patients with cognitive decline. This combinatorial approach allows complex chronic diseases like Alzheimer’s to be stratified into multiple patient subgroups that have corresponding genetic biomarkers, enabling us to identify and understand more deeply the biological mechanisms that are driving disease within those subgroups.
This precision neuroscience approach will allow for the development of more effective novel therapeutic strategies based on identification of novel targets with strong genetic evidence within patient subgroups and accompanying stratification biomarkers. This will bring additional clinical utility in diagnosing patients faster and in more accurately predicting their disease trajectory and their response to treatment. To this end, we are seeking access to ADNI data to further scientific research into Alzheimer’s disease and mild cognitive impairment (MCI), specifically investigating the genetic and biological mechanisms underpinning the rate of progression from MCI to Alzheimer’s disease. This study is expected to identify novel targets for developing treatments that are aimed at decelerating disease progression from MCI to Alzheimer’s, and new biomarkers that can be used to more accurately, more cheaply and more quickly diagnose or stratify AD patients for clinical trial recruitment and for effective positioning within the healthcare system. PrecisionLife operates under ISO 27001 and ISO 27701 accreditation and all personnel are trained in, and used to working under, the US Health Insurance Portability and Accountability Act (HIPAA). To meet our research aim we propose the following studies utilizing ADNI data:
The PrecisionLife Platform
The PrecisionLife platform can be applied to case-control datasets to identify combinations of SNP genotypes that, when observed together in a sample, are strongly associated with the case population. The PrecisionLife platform uses a unique data analytics framework that enables efficient combinatorial analysis of large, multi-dimensional participant datasets. Navigating this data space allows for the identification of combinations of features that are significantly associated with groups of cases in a case-control dataset.
The PrecisionLife combinatorial analysis is hypothesis-free, involving a four-stage mining, validation, evaluation and annotation process. The PrecisionLife platform identifies combinations of feature states in ‘layers’ of increasing combinatorial complexity, i.e., singletons, pairs, triplets, etc. A feature could for example be a SNP, and a feature state would consist of the SNP’s base index and its genotype, which would typically be encoded ordinally as {0, 1, 2} for homozygous major allele, heterozygous, and homozygous minor allele respectively. The platform has considerably more flexibility of representation (including alternate genotype encodings, extended genetic models, polyploidy and quantitative values) if required by the feature or dataset being analyzed. In the mining phase, combinations of feature states that are overrepresented in cases (using a Z-score or Fisher’s exact test) are identified and validated. Multiple feature states are combined iteratively until no additional features can be added that will improve the score. Combinations of feature states that have high odds ratios, low p-values (p < 0.05) and high prevalence (>5%) in cases are prioritized. The mining process is repeated across up to 1,000 cycles of fully randomized permutation of the case-control labels of all individuals in the dataset, keeping the same parameters and case-control ratio. In the validation phase, all combinations generated by the original mining run and each of the random permutation iterations of the dataset are compared. These combinations are validated using network properties such as minimum prevalence (number of cases represented, in this case >5%) as the null hypothesis when compared with the combinations generated by the random permutations. Combinations that appear in the random permutations above a specified FDR threshold (Benjamini-Hochberg FDR of 0.05) after multiple testing correction are considered to be random and eliminated.
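The scoring and correction steps described above can be illustrated with a minimal Python sketch. It is not the PrecisionLife implementation, which is not described at this level of detail. It assumes a simplified setting in which one fixed combination of SNP genotype states is tested with a one-sided Fisher's exact test, the case-control labels are shuffled to build a null distribution for that same combination (the platform instead re-runs mining on each permuted dataset), and a Benjamini-Hochberg cutoff is applied to a list of p-values. All function names, the sample layout (one dict of SNP to genotype per individual) and the SNP identifiers are hypothetical.

```python
import random
from math import comb


def fisher_one_sided(a, b, c, d):
    """P(X >= a) under the hypergeometric null for the 2x2 table
    [[a, b], [c, d]]: a/b = cases with/without the combination,
    c/d = controls with/without it."""
    n_cases, n_with, n_total = a + b, a + c, a + b + c + d
    upper = min(n_cases, n_with)
    tail = sum(comb(n_cases, k) * comb(n_total - n_cases, n_with - k)
               for k in range(a, upper + 1))
    return tail / comb(n_total, n_with)


def carries(genotypes, combo):
    """True if an individual has every (snp, state) pair in the combination.
    States use the ordinal {0, 1, 2} genotype encoding."""
    return all(genotypes.get(snp) == state for snp, state in combo)


def score_combo(samples, labels, combo):
    """Return (one-sided Fisher p-value, prevalence among cases)."""
    a = b = c = d = 0
    for genotypes, is_case in zip(samples, labels):
        hit = carries(genotypes, combo)
        if is_case:
            a, b = a + hit, b + (not hit)
        else:
            c, d = c + hit, d + (not hit)
    prevalence = a / (a + b) if (a + b) else 0.0
    return fisher_one_sided(a, b, c, d), prevalence


def permutation_p(samples, labels, combo, n_perm=1000, seed=0):
    """Empirical p-value from shuffling labels; the case-control ratio is
    preserved because labels are only reordered."""
    rng = random.Random(seed)
    observed, _ = score_combo(samples, labels, combo)
    shuffled = list(labels)
    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        p, _ = score_combo(samples, shuffled, combo)
        as_extreme += p <= observed
    return (as_extreme + 1) / (n_perm + 1)


def benjamini_hochberg(pvalues, alpha=0.05):
    """Boolean list: True where the p-value passes BH at FDR level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / m:
            cutoff = rank
    keep = [False] * m
    for rank, i in enumerate(order, start=1):
        keep[i] = rank <= cutoff
    return keep


# Toy data: three cases carry the pair (rs1=1, rs2=2), three controls do not.
samples = [{"rs1": 1, "rs2": 2}] * 3 + [{"rs1": 0, "rs2": 2}] * 3
labels = [True] * 3 + [False] * 3
combo = (("rs1", 1), ("rs2", 2))
p_value, prevalence = score_combo(samples, labels, combo)
```

In a real analysis the prevalence and p-value thresholds quoted above (>5% and p < 0.05) would be applied to each candidate combination before the permutation comparison; the toy dataset here is far too small for that to be meaningful.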
Combinations passing these tests are reported as validated disease signatures. The validated disease signatures are then evaluated. The features shared by multiple disease signatures (known as ‘critical’ SNPs, since in this case the features consist only of SNPs due to the limited available dataset) are identified. Critical SNPs, which can be thought of as the canonical features of a cluster composed of overlapping disease signatures, are then scored using a Random Forest (RF) algorithm in a 5-fold cross-validation framework to evaluate the accuracy with which they predict the observed case-control split in a dataset (minimizing Gini impurity or the probability of misclassification). RF scores are used in a similar way to rank critical SNPs and, by association, their genes. Disease signatures comprising high RF scoring critical SNPs (and their genes) are then mapped to the cases in which they were found, and additional clinical data (such as blood biochemistry data, comorbidity ICD-10 codes and medication history) is used to generate a patient profile for each combinatorial disease signature. Finally, a merged network (disease architecture) view is generated by clustering all validated disease signatures based on their co-occurrence in patients in the dataset, and annotation of the validated SNPs, genes, and the druggability of targets is performed using a semantic knowledge graph. This methodology has been validated and replicated across multiple chronic, complex diseases including COVID-19 and ME/CFS (Taylor et al., 2020; Das et al., 2022).
Study 1
In Study 1, PrecisionLife will perform a combinatorial analysis of genotype data from all four ADNI cohorts, comparing subjects with Alzheimer’s disease and MCI against a matched dataset of healthy controls. This will generate a disease architecture for the combined Alzheimer’s and MCI population and enable a hypothesis-free evaluation of the genetic differences that underlie different subgroups of Alzheimer’s and MCI patients.
A post-hoc analysis will then be performed using clinical and phenotypic data to identify patient subgroups that are highly associated with different progression rates, cognitive function test scores (MMSE, ADAS-Cog), and other clinically relevant phenotypes, such as patients with mild cognitive impairment that progresses to Alzheimer’s. Additionally, patient subgroups that have other neurological or neuropsychiatric conditions will be identified using data such as the Neuropsychiatric Inventory (NPI) and Geriatric Depression Scale (GDS).
Study 2
Study 2 will be split into two parts with two distinct case cohorts derived from all four ADNI groups, dividing the entire MCI subject population based on disease progression:
  • 2A: MCI fast progressors
  • 2B: MCI slow progressors
Longitudinal cognitive function test scores for the MCI population will be used to define slow and fast progression populations. The rate of decrease in cognitive function scores (MMSE, ADAS-Cog) between different intervals will be used to identify and divide MCI subjects into fast or slow progression groups. Both sets of cases (2A and 2B) will be run independently through the PrecisionLife platform against the same cohort of healthy controls from ADNI and the outputs compared, looking to identify genetic and biological pathway differences between MCI fast and slow progressor groups.
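As an illustration of how the fast and slow progression groups in Study 2 might be derived, the sketch below fits a least-squares slope to each subject's longitudinal scores and splits subjects at the cohort median. The proposal does not state the actual rule, so the median split, the function names and the data layout (subject ID mapped to a list of (years since baseline, score) pairs) are all assumptions made for this example. The scores are treated as MMSE, where a falling score means decline; for ADAS-Cog, where higher scores indicate greater impairment, the comparison would be reversed.

```python
from statistics import median


def annual_slope(visits):
    """Ordinary least-squares slope of score against time in years.
    visits: list of (years_since_baseline, score) pairs."""
    n = len(visits)
    if n < 2:
        raise ValueError("need at least two visits to estimate a slope")
    mean_t = sum(t for t, _ in visits) / n
    mean_s = sum(s for _, s in visits) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in visits)
    if sxx == 0:
        raise ValueError("visits must span more than one time point")
    sxy = sum((t - mean_t) * (s - mean_s) for t, s in visits)
    return sxy / sxx


def split_progressors(subjects):
    """Label each subject 'fast' or 'slow' by comparing their slope with the
    cohort median (hypothetical rule). A more negative MMSE slope means
    faster decline; subjects at the median are labelled 'slow'."""
    slopes = {sid: annual_slope(v) for sid, v in subjects.items()}
    cut = median(slopes.values())
    return {sid: "fast" if slope < cut else "slow"
            for sid, slope in slopes.items()}


# Toy MMSE trajectories for three hypothetical MCI subjects.
subjects = {
    "A": [(0, 28), (1, 26), (2, 24)],
    "B": [(0, 28), (1, 28), (2, 27)],
    "C": [(0, 27), (1, 27), (2, 27)],
}
groups = split_progressors(subjects)
```

A fixed clinical threshold (for example, a set number of points lost per year) could replace the median split; the choice affects group sizes and therefore the power of each case-control run.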
Additional Investigators  
Investigator's Name: Sayoni Das
Investigator's Name: James Kozubek
Investigator's Name: Marianna Sanna
Proposed Analysis: Alzheimer's disease (AD), like other complex diseases, is characterized by a high degree of heterogeneity across the patient population, reflected in a wide range of disease presentations and therapy responses. GWAS have identified several disease-associated genes, but these findings have not translated into progress in bringing effective treatments to patients as evidenced by the high clinical attrition rate. This likely reflects the limitations of GWAS in only identifying single variants with large effect sizes in a population, while the key to understanding complex diseases such as AD that are influenced by multiple genetic loci, epidemiological and/or environmental factors is to find combinations of these disease associated factors that distinguish one patient subgroup from another. Novel approaches are therefore required to give deeper insight into the underlying biology of the disease, enable new therapies, and create high quality evidence to help improve patient care. A greater understanding of the clinically relevant differences between the different AD phenotypes, and the consequential development of more targeted and efficient treatment strategies, may improve the chances of success in downstream clinical trials by selecting AD patients more likely to respond to the novel treatments under development. The PrecisionLife platform utilizes a hypothesis-free method for the detection of combinations of features that together are strongly associated with variations in disease risk, mechanistic etiology, symptoms and progression rates often observed in patients with cognitive decline. This combinatorial approach allows for the stratification of complex chronic diseases like Alzheimer’s into multiple patient subgroups that have corresponding genetic biomarkers, enabling us to identify and understand more deeply the biological mechanisms that are driving disease within those subgroups. 
This precision neuroscience approach will allow for the development of more effective novel therapeutic strategies based on identification of novel targets with strong genetic evidence within patient subgroups and accompanying stratification biomarkers. This will bring additional clinical utility in diagnosing patients faster, more accurately predicting their disease trajectory, and their response to treatment. To this end, we are seeking access to ADNI data to further scientific research into Alzheimer’s disease and mild cognitive impairment (MCI), specifically investigating the genetic and biological mechanisms underpinning the rate of progression from MCI to Alzheimer’s disease. This study is expected to identify novel targets for developing treatments that are aimed at decelerating disease progression from MCI to Alzheimer’s and new biomarkers that can be used to more accurately, more cheaply and more quickly diagnose or stratify AD patients for clinical trial recruitment and for effective positioning within the healthcare system. PrecisionLife operates under ISO27001 and ISO 27701 accreditation and all personnel are trained in, and used to working under, the US Health Insurance Portability and Accountability Act (HIPAA). To meet our research aim we propose the following studies utilizing ADNI data: The PrecisionLife Platform The PrecisionLife platform can be applied to case-control datasets to identify combinations of SNP genotypes that when observed together in a sample were strongly associated with the case population. The PrecisionLife platform uses a unique data analytics framework that enables efficient combinatorial analysis of large, multi-dimensional participant datasets. Navigating this data space allows for the identification of combinations of features that are significantly associated with groups of cases in a case-control dataset. 
The PrecisionLife combinatorial analysis is hypothesis free and involves a four-stage process of mining, validation, evaluation and annotation. The platform identifies combinations of feature states in ‘layers’ of increasing combinatorial complexity, i.e., singletons, pairs, triplets, etc. A feature could, for example, be a SNP, and a feature state would consist of the SNP’s base index and its genotype, which would typically be encoded ordinally as {0, 1, 2} for homozygous major allele, heterozygous, and homozygous minor allele, respectively. The platform has considerably more flexibility of representation (including alternate genotype encodings, extended genetic models, polyploidy and quantitative values) if required by the feature or dataset being analyzed.

In the mining phase, combinations of feature states that are overrepresented in cases (using a Z-score or Fisher’s Exact test) are identified and validated. Multiple feature states are combined iteratively until no additional features can be added that will improve the score. Combinations of feature states that have high odds ratios, low p-values (p < 0.05) and high prevalence (>5%) in cases are prioritized. The mining process is repeated across up to 1,000 cycles of fully randomized permutation of the case:control labels of all individuals in the dataset, keeping the same parameters and case-control ratio.

In the validation phase, all combinations generated by the original mining run and by each of the random permutation iterations of the dataset are compared. The combinations from the original run are validated against those generated by the random permutations, using network properties such as minimum prevalence (number of cases represented, in this case >5%) as the null hypothesis. Combinations that appear in the random permutations above a specified FDR threshold (Benjamini-Hochberg FDR of 0.05) after multiple testing correction are considered to be random and are eliminated.
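The mining and validation steps above can be illustrated with a minimal sketch. This is not the PrecisionLife implementation, which is not specified in this proposal. It assumes genotypes encoded as {0, 1, 2}, scores a single fixed combination of feature states with a one-sided Fisher's exact test, estimates an empirical p-value by permuting case:control labels, and applies a Benjamini-Hochberg cut-off. All function names and the toy data are hypothetical, and the iterative growth of combinations is omitted.

```python
import random
from math import comb

def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact p-value, P(X >= a), for the 2x2 table
    [[a, b], [c, d]]: rows = cases/controls, columns = with/without combo."""
    n_cases, n_with, total = a + b, a + c, a + b + c + d
    upper = min(n_cases, n_with)
    tail = sum(comb(n_cases, k) * comb(total - n_cases, n_with - k)
               for k in range(a, upper + 1))
    return tail / comb(total, n_with)

def combo_counts(genotypes, labels, combo):
    """Count subjects with/without the combination, split by case status.
    combo maps a SNP column index to the required genotype state."""
    a = b = c = d = 0
    for row, is_case in zip(genotypes, labels):
        has_combo = all(row[i] == state for i, state in combo.items())
        if is_case and has_combo:
            a += 1
        elif is_case:
            b += 1
        elif has_combo:
            c += 1
        else:
            d += 1
    return a, b, c, d

def combo_p_value(genotypes, labels, combo):
    return fisher_one_sided(*combo_counts(genotypes, labels, combo))

def permutation_p_value(genotypes, labels, combo, n_perm=200, seed=0):
    """Empirical p-value: how often shuffled labels score at least as well.
    Shuffling preserves the case-control ratio."""
    rng = random.Random(seed)
    observed = combo_p_value(genotypes, labels, combo)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if combo_p_value(genotypes, shuffled, combo) <= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the combination is kept
    under the Benjamini-Hochberg procedure at FDR level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            max_rank = rank
    keep = [False] * m
    for rank, i in enumerate(order, start=1):
        keep[i] = rank <= max_rank
    return keep

# Toy data (hypothetical): 20 cases then 20 controls, two SNPs per subject.
genotypes = [(1, 2)] * 10 + [(0, 0)] * 10 + [(1, 2)] * 1 + [(0, 0)] * 19
labels = [True] * 20 + [False] * 20
combo = {0: 1, 1: 2}  # SNP 0 heterozygous AND SNP 1 homozygous minor
```

In this toy dataset the pair is carried by 10 of 20 cases and 1 of 20 controls, so `combo_p_value(genotypes, labels, combo)` is small and the combination would pass the prevalence filter (>5% of cases). A real analysis would test many combinations and pass all of their p-values to `benjamini_hochberg` together.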
Combinations passing these tests are reported as validated disease signatures. The validated disease signatures are then evaluated. The features (which in this case consist only of SNPs because of the limited dataset available) shared by multiple disease signatures, known as ‘critical’ SNPs, are identified. Critical SNPs, which can be thought of as the canonical features of a cluster composed of overlapping disease signatures, are then scored using a Random Forest (RF) algorithm in a 5-fold cross-validation framework to evaluate the accuracy with which they predict the observed case-control split in a dataset (minimizing Gini impurity, or the probability of misclassification). We use the RF scores to rank critical SNPs and, by association, their genes. Disease signatures comprising high RF scoring critical SNPs (and their genes) are then mapped to the cases in which they were found, and additional clinical data (such as blood biochemistry data, comorbidity ICD-10 codes and medication history) is used to generate a patient profile for each combinatorial disease signature. Finally, a merged network (disease architecture) view is generated by clustering all validated disease signatures based on their co-occurrence in patients in the dataset, and annotation of the validated SNPs, genes, and the druggability of targets is performed using a semantic knowledge graph. This methodology has been validated and replicated across multiple chronic, complex diseases including COVID-19 and ME/CFS (Taylor et al., 2020; Das et al., 2022).

Study 1

In Study 1, PrecisionLife will perform a combinatorial analysis of genotype data from all four ADNI cohorts, comparing subjects with Alzheimer’s disease and MCI against a matched dataset of healthy controls. This will generate a disease architecture for the combined Alzheimer’s and MCI population and enable a hypothesis-free evaluation of the genetic differences that underlie different subgroups of Alzheimer’s and MCI patients.
A post-hoc analysis will then be performed using clinical and phenotypic data to identify patient subgroups that are highly associated with different progression rates, cognitive function test scores (MMSE, ADAS-Cog), and other clinically relevant phenotypes, such as mild cognitive impairment that progresses to Alzheimer’s. Additionally, patient subgroups with other neurological or neuropsychiatric conditions will be identified using data such as the Neuropsychiatric Inventory (NPI) and Geriatric Depression Scale (GDS).

Study 2

Study 2 will be split into two parts with two distinct case cohorts derived from all four ADNI groups, dividing the entire MCI subject population based on disease progression:
• 2A: MCI fast progressors
• 2B: MCI slow progressors

Longitudinal cognitive function test scores for the MCI population will be used to define slow and fast progression populations. The rate of decline in cognitive function scores (MMSE, ADAS-Cog) over different intervals will be used to divide MCI subjects into fast or slow progression groups. Both sets of cases (2A and 2B) will be run independently through the PrecisionLife platform against the same cohort of healthy controls from ADNI, and the outputs will be compared to identify genetic and biological pathway differences between the MCI fast and slow progressor groups.
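One possible way to derive the fast and slow progression groups is sketched below. It assumes each subject's decline is summarized as a least-squares slope of MMSE score against years since baseline, and that a single cut-off separates the groups. The proposal does not state the actual rule, so the threshold of -1.0 points per year, the function names and the subject IDs are all hypothetical.

```python
def annual_slope(visits):
    """Ordinary least-squares slope (score points per year) for one subject.
    visits is a list of (years_since_baseline, score) pairs."""
    n = len(visits)
    if n < 2:
        raise ValueError("need at least two visits to estimate a slope")
    mean_t = sum(t for t, _ in visits) / n
    mean_s = sum(s for _, s in visits) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in visits)
    if sxx == 0:
        raise ValueError("visits must span more than one time point")
    sxy = sum((t - mean_t) * (s - mean_s) for t, s in visits)
    return sxy / sxx

def classify_progressors(subjects, threshold=-1.0):
    """Split subjects into fast and slow progressors by MMSE slope.
    MMSE falls as cognition declines, so a slope at or below the
    (assumed) threshold counts as fast. ADAS-Cog rises with decline,
    so its sign convention would need to be reversed."""
    groups = {"fast": [], "slow": []}
    for subject_id, visits in subjects.items():
        key = "fast" if annual_slope(visits) <= threshold else "slow"
        groups[key].append(subject_id)
    return groups

# Toy longitudinal MMSE data (hypothetical subjects and scores).
subjects = {
    "S1": [(0, 28), (1, 25), (2, 22)],  # loses 3 points per year
    "S2": [(0, 27), (1, 27), (2, 26)],  # loses about half a point per year
}
groups = classify_progressors(subjects)
```

With these toy values, S1 falls into the fast group (cohort 2A) and S2 into the slow group (cohort 2B). Other definitions, such as quantile-based splits of the slope distribution, would fit the description equally well.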