Semantic modelling in classification and Boolean networks
Referee: Kestler, Hans A.
Institutions: Institut für Medizinische Systembiologie
Institut für Allgemeine Physiologie
Biological systems comprise many components that interact with each other in complex networks. Furthermore, a large body of high-dimensional gene expression data is available, but machine-aided analysis of these data often yields results that are difficult to interpret. This work addresses both the use of signalling networks as models of biological processes and the use of domain knowledge in feature selection to create interpretable models. Ageing affects nearly all living organisms. The wingless (Wnt), insulin-like growth factor (IGF) and nuclear factor kappa-light-chain-enhancer of activated B-cells (NF-κB) signalling pathways are known to be involved in ageing, and there are known interactions (crosstalk) between IGF and Wnt signalling. In this work, a Boolean network of the crosstalk between Wnt and IGF signalling, with Boolean functions compiled from literature statements on single interactions, is presented and used to successfully simulate the ageing process, including ageing-associated deregulations. Further analysis comparing the crosstalk network with its individual signalling subnetworks showed that the crosstalk was required to simulate ageing. Robustness analyses also showed that the crosstalk stabilised the network against potential stressors. To study NF-κB signalling in ageing, Boolean networks of aged and young phenotypes were reconstructed from data. All interactions derived from the data could be verified using the STRING database. The networks of the young phenotype were shown to be more robust to perturbation and to have more interconnections and redundancies than those of the aged phenotype. In addition, four genes were deregulated in the aged phenotype and could be linked to ageing-associated processes. The presented Boolean networks accurately model ageing, which was corroborated by the literature, and display similar ageing-typical traits independently of the model setup.
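Two operations underlie the Boolean network analyses summarised above: simulating a network until it settles into an attractor, and probing how robust that attractor is to perturbation. The Python sketch below assumes synchronous updates and defines robustness as the fraction of single-node flips of attractor states that lead back to the same attractor; both choices are assumptions made for illustration. The three nodes `A`, `B`, `C` and their rules are hypothetical and are not the Wnt/IGF crosstalk or NF-κB networks from the thesis.

```python
from itertools import product

# Hypothetical three-node toy network (a negative feedback loop).
# These rules are illustrative only, NOT the thesis's literature-derived functions.
RULES = {
    "A": lambda s: not s["C"],
    "B": lambda s: s["A"],
    "C": lambda s: s["B"],
}
NODES = sorted(RULES)  # fixed node order for state tuples: ("A", "B", "C")


def step(state):
    """Synchronous update: every node is recomputed from the previous state."""
    s = dict(zip(NODES, state))
    return tuple(bool(RULES[n](s)) for n in NODES)


def attractor(state):
    """Iterate until a state repeats and return the cycle as a frozenset of states."""
    seen = []
    while state not in seen:
        seen.append(state)
        state = step(state)
    return frozenset(seen[seen.index(state):])


def basins():
    """Map each attractor to the number of start states (out of all 2^n) that reach it."""
    counts = {}
    for start in product([False, True], repeat=len(NODES)):
        att = attractor(start)
        counts[att] = counts.get(att, 0) + 1
    return counts


def robustness(att):
    """Assumed measure: share of single-node flips of attractor states that return to att."""
    hits = total = 0
    for state in att:
        for i in range(len(state)):
            flipped = state[:i] + (not state[i],) + state[i + 1:]
            total += 1
            hits += attractor(flipped) == att
    return hits / total


if __name__ == "__main__":
    for att, size in basins().items():
        print(f"cycle length {len(att)}, basin size {size}, robustness {robustness(att):.3f}")
```

For this toy loop the eight possible states split into a six-state cycle and a two-state cycle, and the larger cycle is the more robust one under the assumed measure. A real model would replace `RULES` with its own Boolean functions and might use asynchronous updates or other perturbation schemes instead.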
Semantic feature selection using domain knowledge can be used to guide classifier training, to create interpretable models and to generate high-level hypotheses on the underlying data. In precision medicine, different subtypes of a disease often have to be distinguished. This work describes a newly developed semantic multi-class classifier system (SMCCS), which combines semantic feature selection and multi-class discrimination. The Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database was used as domain knowledge. The new method was tested on six datasets using different classifiers and classifier ensembles. The SMCCS achieved classification results comparable to those of its reference experiments and also provided stable semantic feature signatures. Most of these signatures could be connected to the classification task, while a few seemingly unconnected signatures provided new high-level hypotheses on the data. To further demonstrate the capacity of semantic feature selection, semantic multi-classifier systems (S-MCS) were used to create feature signatures for heart failure in zebrafish, with KEGG and Gene Ontology as domain knowledge. S-MCS achieved the best classification results compared to similar classifiers. The feature signatures were evaluated in the context of heart failure, and there was ample evidence of such connections. To provide further evidence for the validity of semantic feature selection, the selected signatures were successfully transferred to classify a dataset of rat origin. Both established approaches, SMCCS and S-MCS, performed as well as or better than previous classifiers while also providing stable, biologically relevant feature signatures.
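The core idea of semantic feature selection is to restrict each candidate model to the genes of one domain-knowledge category (for example a KEGG pathway) and to rank categories by how well they discriminate the classes. As a minimal sketch, assuming a nearest-centroid classifier scored by leave-one-out accuracy, the Python code below picks the better of two gene sets for a three-class problem. The gene sets, gene names (`PATHWAY_A`, `GENE1`, ...) and expression values are hypothetical toy data, and this single-classifier ranking is a simplified stand-in, not the SMCCS or S-MCS procedure itself.

```python
from math import dist
from statistics import fmean

# Hypothetical toy expression matrix: rows are samples, columns are genes.
GENE_INDEX = {"GENE1": 0, "GENE2": 1, "GENE3": 2, "GENE4": 3}
X = [
    [0.0, 0.1, 5.0, 1.0], [0.2, 0.0, 1.0, 5.0],     # class "a"
    [5.0, 5.1, 4.9, 1.1], [5.2, 4.9, 1.2, 4.8],     # class "b"
    [10.0, 9.9, 5.1, 0.9], [9.8, 10.1, 0.8, 5.2],   # class "c"
]
y = ["a", "a", "b", "b", "c", "c"]

# Hypothetical domain knowledge: gene sets standing in for pathways (not real KEGG entries).
GENE_SETS = {"PATHWAY_A": ["GENE1", "GENE2"], "PATHWAY_B": ["GENE3", "GENE4"]}


def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of a nearest-centroid classifier on the given columns."""
    correct = 0
    for i in range(len(X)):
        train = [(X[j], y[j]) for j in range(len(X)) if j != i]
        centroids = {}
        for label in sorted({lab for _, lab in train}):
            rows = [[x[f] for f in features] for x, lab in train if lab == label]
            centroids[label] = [fmean(col) for col in zip(*rows)]
        test = [X[i][f] for f in features]
        pred = min(centroids, key=lambda lab: dist(test, centroids[lab]))
        correct += pred == y[i]
    return correct / len(X)


def rank_gene_sets(X, y, gene_sets, gene_index):
    """Score each gene set by LOO accuracy and return (name, score) pairs, best first."""
    scores = {}
    for name, genes in gene_sets.items():
        features = [gene_index[g] for g in genes if g in gene_index]
        if features:  # skip gene sets with no measured genes
            scores[name] = loo_accuracy(X, y, features)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    for name, acc in rank_gene_sets(X, y, GENE_SETS, GENE_INDEX):
        print(name, round(acc, 3))
```

Because the selected features form a named category rather than an anonymous gene list, the winning set can be read directly as a high-level hypothesis ("this pathway separates the classes"), which is the interpretability argument made for semantic feature selection.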
Supplemented by: https://doi.org/10.3390/biom8040158
Keywords [GND]: Classification | Bioinformatics | Network | Gene regulation
[LCSH]: Semantic web | Gene regulatory networks
[MeSH]: Computational biology; Classification
[Free keywords]: Semantic classification | Boolean networks | Gene regulatory network | Boolean models
[DDC Sachgruppe]: DDC 610 / Medicine & health
License: CC BY 4.0 International
DOI & citation
Please use this identifier for citations and links: http://dx.doi.org/10.18725/OPARU-39970