
Joint estimation of multiple graphical models / Plaksienko, Anna. - (2021 Apr 30).

Joint estimation of multiple graphical models

PLAKSIENKO, ANNA
2021

Abstract

The rapid development of high-throughput technologies such as microarrays and next-generation sequencing, and the consequent in-depth investigation of the genome in several large-scale international projects, have led to the generation of large amounts of high-dimensional omics data. Scientists can use such data to gain a deep understanding of complex cellular mechanisms, the molecular basis of disease development, and more. In particular, relationships between genes (or other similar units) can reveal regulatory mechanisms whose disruption may be associated with disease. Network inference methods and, more specifically, graphical model estimation can be used to identify gene relationships and direct interactions that are not mediated by other factors. Simply speaking, a graphical model is a graph whose vertices correspond to random variables and whose edges denote conditional dependence relationships between them. Many methods exist for inferring a graphical model from a given dataset, even in the high-dimensional setting where the number of variables is much larger than the number of samples (a common situation in omics studies, given the enormous number of genes involved and the limited number of samples collected). Nowadays, however, it is common to collect and analyze more than one dataset. Multiple datasets can be obtained in different laboratories or with different technologies, arise from different studies, or be of different omics types. Their joint analysis can lead to a more accurate characterization of the underlying biological system, but it also requires specific techniques. In this thesis, we propose jewel, a novel method for the joint analysis of multiple datasets under the assumption that they are drawn from Gaussian distributions that share the same network dependency structure. In this context, the conditional dependence relationships between variables (genes) are encoded by the inverse covariance matrix: two variables are conditionally independent given all the others exactly when the corresponding off-diagonal entry of this matrix is zero.
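For a Gaussian vector, the link between the graph and the inverse covariance (precision) matrix can be checked numerically. The following Python sketch uses a hypothetical three-gene chain with made-up values (not data from the thesis): the covariance matrix is dense, yet the zero entry of the precision matrix signals that genes 0 and 2 are conditionally independent given gene 1. The helper names `edges_from_precision` and `partial_correlations` are illustrative, not part of any package.

```python
import numpy as np

def edges_from_precision(theta, tol=1e-10):
    """Edges (i, j), i < j, of the Gaussian graphical model with precision matrix theta."""
    p = theta.shape[0]
    return sorted((i, j) for i in range(p) for j in range(i + 1, p)
                  if abs(theta[i, j]) > tol)

def partial_correlations(theta):
    """Partial correlations rho_ij = -theta_ij / sqrt(theta_ii * theta_jj)."""
    d = np.sqrt(np.diag(theta))
    rho = -theta / np.outer(d, d)
    np.fill_diagonal(rho, 1.0)
    return rho

# Hypothetical chain 0 - 1 - 2: genes 0 and 2 interact only through gene 1.
theta = np.array([[2.0, -0.8, 0.0],
                  [-0.8, 2.0, -0.8],
                  [0.0, -0.8, 2.0]])
sigma = np.linalg.inv(theta)              # the covariance matrix is dense
rho = partial_correlations(theta)

print(edges_from_precision(theta))        # [(0, 1), (1, 2)]
print(round(float(sigma[0, 2]), 4))       # 0.1176: genes 0 and 2 are marginally correlated
print(abs(round(float(rho[0, 2]), 4)))    # 0.0: conditionally independent given gene 1
```

A graphical model estimated from data therefore aims at the zero pattern of the precision matrix rather than of the covariance matrix.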
Although we assume that the conditional dependence structure is the same across conditions, we allow the covariance matrices to differ, to account for the different origins of the data. In this setting, pooling the individual datasets into a single one and estimating one graphical model would mask the heterogeneity of the covariance matrices, while estimating a separate model for each dataset would not take advantage of the common underlying structure. A joint analysis of the datasets is therefore preferable, and this is what jewel provides. It extends the regression-based approach of Meinshausen and Bühlmann to the case of multiple datasets by means of a group lasso penalty that guarantees the symmetry of the solution. We design a fast algorithm to implement the method, combining the smart active shooting approach for a fixed regularization parameter with a warm start approach for an entire grid of regularization parameters. We also state a consistency theorem for jewel, which provides upper and lower bounds for the regularization parameter. Moreover, we extend the Bayesian information criterion and cross-validation procedures to the multiple-dataset framework, providing a practical tool for real applications. We explore the behavior of jewel in different simulation settings, analyzing the influence of various input parameters and comparing the method with other available joint estimation methods; jewel shows good, competitive performance. Finally, we illustrate the method on a real-data example concerning transcriptional regulatory networks inferred from gene expression data. The proposed method is implemented in the new R package jewel.
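To make the group lasso idea concrete, here is a minimal Python sketch; it is not the jewel R package and not its algorithm. It assumes a regression formulation in the spirit of Meinshausen and Bühlmann: for K centered data matrices X_k (n_k × p), each with a p × p coefficient matrix Θ_k with zero diagonal, minimize Σ_k ||X_k − X_k Θ_k||_F² / (2 n_k) + λ Σ_{i<j} sqrt(Σ_k [(θ_ij^(k))² + (θ_ji^(k))²]). The exact scaling is an assumption and may differ from the thesis. Because θ_ij and θ_ji in all K datasets form one group, they are zeroed or kept together, so the estimated graph is symmetric and shared by all datasets. As a swapped-in technique, the solver is plain proximal gradient descent (ISTA) rather than active shooting, warm-started along a decreasing λ grid. The names `fit_joint` and `objective` and the simulated data are hypothetical.

```python
import numpy as np

def _pair_norms(stack):
    """Norm of each group {theta_ij^(k), theta_ji^(k) : k = 1..K}; symmetric (p, p) array."""
    sq = stack ** 2
    return np.sqrt((sq + sq.transpose(0, 2, 1)).sum(axis=0))

def objective(Xs, stack, lam):
    """Least-squares loss plus group penalty over pairs i < j (assumed scaling)."""
    loss = sum(np.sum((X - X @ T) ** 2) / (2 * X.shape[0]) for X, T in zip(Xs, stack))
    iu = np.triu_indices(stack.shape[1], k=1)
    return loss + lam * _pair_norms(stack)[iu].sum()

def fit_joint(Xs, lam, theta_init=None, n_iter=300):
    """ISTA for the objective above with diag(Theta_k) = 0.
    Illustrative only: this is not jewel's active-shooting algorithm."""
    K, p = len(Xs), Xs[0].shape[1]
    S = np.stack([X.T @ X / X.shape[0] for X in Xs])           # (K, p, p) second moments
    step = 1.0 / max(np.linalg.eigvalsh(Sk)[-1] for Sk in S)   # 1 / Lipschitz constant
    theta = np.zeros((K, p, p)) if theta_init is None else theta_init.copy()
    diag = np.arange(p)
    for _ in range(n_iter):
        z = theta - step * (S @ theta - S)                     # gradient step on the loss
        norms = np.maximum(_pair_norms(z), 1e-12)
        theta = z * np.clip(1.0 - step * lam / norms, 0.0, None)  # group soft-thresholding
        theta[:, diag, diag] = 0.0                              # no gene regressed on itself
    return theta

# Hypothetical data: two datasets sharing a chain-shaped graph but with different covariances.
rng = np.random.default_rng(1)
p, n = 6, 300
base = np.eye(p) + np.diag(np.full(p - 1, -0.4), 1) + np.diag(np.full(p - 1, -0.4), -1)
precisions = [base, 1.5 * base + 0.5 * np.eye(p)]
Xs = [rng.multivariate_normal(np.zeros(p), np.linalg.inv(P), size=n) for P in precisions]

theta, path = None, {}
for lam in [0.5, 0.2, 0.1, 0.05]:            # decreasing grid; each fit warm-starts from the last
    theta = fit_joint(Xs, lam, theta_init=theta)
    support = np.abs(theta[0]) > 0
    path[lam] = [(int(i), int(j)) for i, j in zip(*np.nonzero(np.triu(support, 1)))]
for lam, edges in path.items():
    print(lam, edges)
```

Warm starting lets each λ on the grid begin from the previous, usually nearby, solution; in this sketch the iteration count is simply fixed instead of using a convergence check.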
Keywords: graphical models; joint modelling; lasso; regression
File in this item: 2021_PhDThesis_Plaksienko.pdf (PhD thesis; open access, free of charge; Adobe PDF, 3.87 MB)

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12571/21632