impuSARS: SARS-CoV-2 whole-genome Imputation
impuSARS is a command-line application to impute missing genomic data in viral genomes, especially SARS-CoV-2. It has been evaluated under a wide range of conditions (missing continuous fragments, amplicons or sparse individual positions), showing high fidelity when reconstructing the original genomic sequences and recovering the lineage with 100% precision for almost all lineages, even in very poorly covered genomes (
Imputation can improve the pace of SARS-CoV-2 sequencing production by recovering many incomplete or low-quality sequences that would otherwise be discarded. impuSARS can be incorporated into any primary data processing pipeline for SARS-CoV-2 WGS.
impuSARS is developed in Python and distributed as a Docker image; it can also be installed as a conda environment. The tool is based on Minimac, the most widely used strategy for genomic data imputation, and takes advantage of the enormous number of SARS-CoV-2 whole-genome sequences available in GISAID to create a detailed reference panel. The most recently used reference panel (v3.0) includes >900K SARS-CoV-2 sequences. After imputation, impuSARS also reports the lineage of the recovered sequence as assigned by Pangolin.
HiPathia: Mechanistic models of signaling pathways
What is a mechanistic model?
Mechanistic models aim to bridge the gap between easily available genomic data, which account for gene activity (transcriptomics) or gene integrity (genome/exome sequencing), and the complex cell or organism phenotype (cell functional decisions or fate, ultimately responsible for the observed conditions, e.g. disease, drug response, etc.).
Genome-scale mechanistic models rely on the knowledge of cell signaling and metabolism already available (e.g. KEGG, Reactome, etc.), over which a mathematical model is built. These models can quantify the intensity of signal transduction from the original measurements of gene expression, and consequently the activity of the different signaling circuits. They are called mechanistic models because they model the molecular mechanisms that dictate cell action and fate. Since they convey the notion of causality, these models can be used not only to understand disease mechanisms in detail but also to simulate the effects of interventions (e.g. drug inhibitions).
Here we present the mechanistic model HiPathia (Hidalgo et al, 2017), a model that simulates the transduction of the signal along signaling circuits in the pathways (see Figure 1), taking the gene expression values as proxies of the corresponding protein activities and considering distinct types of activities (inhibitions and activations). HiPathia is an improvement of a previous algorithm (Sebastian-Leon et al., 2013, 2014) that overcomes some limitations of the probabilistic approach.
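As a rough illustration of what "transduction of the signal along a circuit" means, the sketch below propagates a signal through a toy circuit. The update rule follows the general form described for HiPathia in Hidalgo et al. (2017) as we understand it: a node passes on its own normalized expression value, scaled up by incoming activations and scaled down by incoming inhibitions. The function names, the four-node circuit and the expression values are hypothetical, and this is not the code of the HiPathia package, which also handles pathway layout, normalization and loops.

```python
from math import prod


def node_signal(expression, activator_signals, inhibitor_signals):
    """Signal leaving a node, given its normalized expression (0..1)
    and the signals arriving through activating and inhibiting edges."""
    if activator_signals:
        # Combined activation: 1 minus the probability that no activator fires.
        activation = 1.0 - prod(1.0 - s for s in activator_signals)
    else:
        # Assumption: a node with no upstream activator starts the signal.
        activation = 1.0
    # Each inhibitor attenuates the signal; an empty product is 1 (no effect).
    inhibition = prod(1.0 - s for s in inhibitor_signals)
    return expression * activation * inhibition


def circuit_signals(nodes):
    """Propagate the signal through nodes listed in topological order.

    Each node is a dict with keys 'name', 'expr', 'activators', 'inhibitors'.
    Returns a dict mapping node name to its outgoing signal."""
    signals = {}
    for node in nodes:
        act = [signals[n] for n in node["activators"]]
        inh = [signals[n] for n in node["inhibitors"]]
        signals[node["name"]] = node_signal(node["expr"], act, inh)
    return signals


def toy_circuit(inhibitor_expr=0.6):
    """Hypothetical circuit: receptor R activates kinase K, which activates
    effector E; an independent node I inhibits E."""
    return [
        {"name": "R", "expr": 0.8, "activators": [], "inhibitors": []},
        {"name": "K", "expr": 0.5, "activators": ["R"], "inhibitors": []},
        {"name": "I", "expr": inhibitor_expr, "activators": [], "inhibitors": []},
        {"name": "E", "expr": 0.9, "activators": ["K"], "inhibitors": ["I"]},
    ]


baseline = circuit_signals(toy_circuit())
# Simulated knock-out of the inhibitor: set its expression to zero.
knockout = circuit_signals(toy_circuit(inhibitor_expr=0.0))
print(f"E baseline: {baseline['E']:.3f}, E after I knock-out: {knockout['E']:.3f}")
```

The signal at the last node (E here) is what is read as the activity of the circuit. Setting a gene's expression to zero and recomputing is the simplest way to picture a simulated knock-out.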
A recent benchmarking study has demonstrated that the HiPathia algorithm outperforms other competing algorithms for modeling signaling pathways. The mechanistic model implemented in HiPathia has been successfully used to understand the disease mechanisms behind different cancers (including neuroblastoma), cancer-prone rare genodermatoses and common diseases such as diabetes, as well as the response of cell lines to drugs (Amadoz et al, 2015), drug repositioning (Esteban-Medina et al, 2019, Loucera et al., 2020) and other biologically interesting scenarios, such as the molecular mechanisms that explain how stress-induced activation of brown adipose tissue prevents obesity (Razzoli et al, 2016) or the mechanisms of death and the post-mortem ischemia of a tissue. Moreover, mechanistic models have recently been used to deconvolute the functional landscape of glioblastoma at the single-cell level.
Currently three implementations of the HiPathia mechanistic model of signaling pathways are available:
R/Bioconductor package, for experienced users interested in programmatic use of the algorithm.
Cytoscape plugin, which offers a graphical environment for end users in the Cytoscape community.
Web tool, with a dynamic, intuitive graphical interface, useful for inexperienced users. The web interface implements extra functionalities beyond the classical differential circuit activity analysis for two-class comparisons. These include the analysis of the impact of simulated interventions (inhibitions, namely knock-outs or knock-downs, over-expressions, etc.) on the activity of the pathways and the evaluation of the potential consequences of mutations on signaling. Moreover, the web interface allows building predictors using signaling circuit activities as features. Interestingly, the features selected by the predictor as relevant for class discrimination also provide valuable insights into the molecular mechanisms that explain the differences between the conditions being discriminated, namely diseases, drug action mechanisms, etc.
Hidalgo MR, Cubuk C, Amadoz A, Salavert F, Carbonell-Caballero J, Dopazo J: High throughput estimation of functional cell activities reveals disease mechanisms and predicts relevant clinical outcomes. Oncotarget 2017, 8:5160-5178
CoV-HiPathia: mechanistic models of the COVID-19 disease
CoV-Hipathia (Rian et al., 2021)[https://biodatamining.biomedcentral.com/articles/10.1186/s13040-021-0023... is a web interface that implements a comprehensive mechanistic model of the SARS-CoV-2 disease map (Ostaszewski et al., 2020)[https://www.nature.com/articles/s41597-020-0477-8]. In this framework, the detailed activity of the human signaling circuits related to the viral infection, from the entry and replication mechanisms to downstream consequences such as inflammation and antigenic response, can be inferred from gene expression experiments. Moreover, the effect of potential interventions, such as knock-downs or drug effects (the system currently models the effect of more than 8000 DrugBank drugs), can be studied. This freely available tool not only provides an unprecedentedly detailed view of the mechanisms of viral invasion and its consequences in the cell, but also has the potential to become an invaluable asset in the search for efficient antiviral treatments.
CoV-Hipathia is available here: http://hipathia.babelomics.org/covid19/.
Rian K, Esteban-Medina M, Hidalgo MR, Çubuk C, Falco MM, Loucera C, Gunyel D, Ostaszewski M, Peña-Chilet M, Dopazo J. Mechanistic modeling of the SARS-CoV-2 disease map[https://biodatamining.biomedcentral.com/articles/10.1186/s13040-021-0023.... BioData Min. 2021 Jan 21;14(1):5. doi: 10.1186/s13040-021-00234-1.
CSVS: A crowdsourcing database of the Spanish population genetic variability
Knowledge of the genetic variability of the local population is of utmost importance in personalized medicine and has proved to be a critical factor in the discovery of new disease variants. Here, we present the Collaborative Spanish Variability Server (CSVS) (Peña-Chilet et al., 2020), which currently contains more than 2000 genomes and exomes of unrelated Spanish individuals. This database has been generated in a collaborative crowdsourcing effort, collecting sequencing data produced by local genomic projects and for other purposes, such as the MGP (Dopazo et al., 2016). Sequences have been grouped by ICD10 upper categories. A web interface allows querying the database while removing one or more ICD10 categories. In this way, aggregated allele counts and frequencies of the pseudo-control Spanish population can be obtained for diseases belonging to the removed category. In addition to pseudo-control studies, some population studies can be carried out, for example on the prevalence of pharmacogenomic variants. These genomic data have also been used to define the first Spanish Genome Reference Panel (SGRP1.0) for imputation. This is the first local repository of variability entirely produced by a crowdsourcing effort and constitutes an example for future initiatives to characterize local variability worldwide. CSVS is also part of the GA4GH Beacon network.
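The pseudo-control idea can be pictured with a small sketch: allele counts are stored per ICD10 category, and a query sums them over every category except those removed. The data layout, category labels and counts below are hypothetical and only illustrate the aggregation; they do not reflect the CSVS schema or API.

```python
def pseudo_control_frequency(counts_by_category, excluded_categories):
    """Aggregate allele frequency of one variant over all ICD10 categories
    except the excluded ones.

    counts_by_category maps a category label to a tuple
    (alternate_allele_count, total_alleles_observed).
    Returns None when no alleles remain after exclusion."""
    alt_total = 0
    allele_total = 0
    for category, (alt_count, allele_count) in counts_by_category.items():
        if category in excluded_categories:
            continue  # samples with the studied disease cannot act as controls
        alt_total += alt_count
        allele_total += allele_count
    if allele_total == 0:
        return None
    return alt_total / allele_total


# Hypothetical counts for a single variant.
variant_counts = {
    "Healthy controls": (3, 400),
    "IX Circulatory system": (10, 200),
    "VI Nervous system": (2, 300),
}

# Studying a circulatory disease: drop that category from the controls.
freq = pseudo_control_frequency(variant_counts, {"IX Circulatory system"})
print(f"Pseudo-control allele frequency: {freq:.5f}")
```

Excluding the category under study avoids inflating the control frequency with individuals who may carry the very variants being investigated.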
CSVS is available here: http://csvs.babelomics.org/
Peña-Chilet M, Roldán G, Perez-Florido J, Ortuño FM, Carmona R, Aquino V, Lopez-Lopez D, Loucera C, Fernandez-Rueda JL, Gallego A, García-Garcia F, González-Neira A, Pita G, Núñez-Torres R, Santoyo-López J, Ayuso C, Minguez P, Avila-Fernandez A, Corton M, Moreno-Pelayo MÁ, Morin M, Gallego-Martinez A, Lopez-Escamez JA, Borrego S, Antiñolo G, Amigo J, Salgado-Garrido J, Pasalodos-Sanchez S, Morte B; Spanish Exome Crowdsourcing Consortium, Carracedo Á, Alonso Á, Dopazo J. CSVS, a crowdsourcing database of the Spanish population genetic variability. Nucleic Acids Res. 2021 Jan 8;49(D1):D1130-D1137. doi: 10.1093/nar/gkaa794.
SPACNACS: A crowdsourcing initiative to provide information about Copy Number Variations of the Spanish population to the scientific/medical community
SPACNACS is a crowdsourcing initiative to provide information about Copy Number Variations of the Spanish population to the scientific/medical community. We accept submissions of WES or WGS data, regardless of whether they come from healthy or diseased individuals.
The sequences were contributed by different consortia and projects, including groups from the Spanish Network for Research in Rare Diseases, CIBERER, results from the EnoD, the Project Genome 1000 Navarra and other research groups and initiatives across Spain.
SPACNACS is an open resource available at http://csvs.clinbioinfosspa.es/spacnacs
SMAca: SMA Carrier Analysis tool
Spinal Muscular Atrophy (SMA) is a severe neuromuscular autosomal recessive disorder affecting 1/10,000 live births. Most SMA patients present homozygous deletion of SMN1, while most SMA carriers present only a single SMN1 copy. The sequence similarity between SMN1 and SMN2, and the complexity of the SMN locus, make the estimation of the SMN1 copy-number difficult by next generation sequencing (NGS).
SMAca is a Python tool to detect putative SMA carriers and estimate the absolute SMN1 copy-number in a population. Moreover, SMAca takes advantage of the knowledge of certain variants specific to SMN1 duplication to also identify the so-called “silent carriers” (i.e. individuals with two copies of SMN1 on one chromosome, but none on the other).
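To make the copy-number problem concrete, here is a deliberately simplified, depth-based sketch. It is not SMAca's algorithm: it assumes that reads from SMN1 and SMN2 are pooled, that the pooled depth is compared with the depth of a two-copy control region, and that the share belonging to SMN1 is read off a single position where the two paralogs differ. All names and numbers are hypothetical.

```python
def estimate_smn1_copies(pooled_smn_depth, control_depth,
                         smn1_reads, smn2_reads, control_copies=2):
    """Rough SMN1 copy-number estimate from read depth.

    pooled_smn_depth: summed mean depth over the SMN1 and SMN2 loci
    control_depth:    mean depth over a region assumed to have control_copies
    smn1_reads / smn2_reads: reads carrying the SMN1- or SMN2-specific base
                             at a position that distinguishes the paralogs"""
    if control_depth <= 0:
        raise ValueError("control_depth must be positive")
    informative = smn1_reads + smn2_reads
    if informative == 0:
        raise ValueError("no reads at the paralog-distinguishing position")
    # Total SMN1 + SMN2 copies, scaled against the control region.
    total_copies = control_copies * pooled_smn_depth / control_depth
    # Fraction of those copies attributed to SMN1.
    smn1_fraction = smn1_reads / informative
    return total_copies * smn1_fraction


def is_putative_carrier(smn1_copies):
    """Flag samples whose estimate rounds to a single SMN1 copy."""
    return round(smn1_copies) == 1


copies = estimate_smn1_copies(pooled_smn_depth=60, control_depth=40,
                              smn1_reads=10, smn2_reads=20)
print(f"Estimated SMN1 copies: {copies:.2f}, carrier: {is_putative_carrier(copies)}")
```

A depth-only estimate like this would report two SMN1 copies for a silent carrier, since both copies are present but sit on the same chromosome. That limitation is why variant information associated with SMN1 duplication is needed to flag such cases.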
The tool supports multithreading for high performance and is designed for easy installation, which makes it especially attractive for integration into production NGS pipelines.
SMAca is an open resource available at https://github.com/babelomics/SMAca
Lopez‐Lopez, D, Loucera, C, Carmona, R, et al. SMN1 copy‐number and sequence variant analysis from next‐generation sequencing data. Human Mutation. 2020; 1–5. doi: 10.1002/humu.24120.
TFTEA: Transcription Factor Target Enrichment Analysis
You can find TFTEA documentation and tutorials at: https://github.com/babelomics/tftea/wiki
You can report bugs or request new features through the GitHub issue tracker.
Release Notes and Roadmap
Release notes are available at GitHub releases.
TFTEA is available here: https://github.com/babelomics/TFTEA
Metabolizer: Differential metabolic activity and discovery of therapeutic targets using summarized metabolic pathway models
Metabolizer is a web-based application that offers an intuitive, easy-to-use interactive interface to analyze differences in the activity of metabolic pathway modules. These activities can also be used for class prediction and for in silico prediction of knock-out (KO) effects (Cubuk et al., 2019). Moreover, Metabolizer can automatically predict the optimal KO intervention for restoring a diseased phenotype. We provide different types of validation for some of the predictions made by Metabolizer. The tool helps to understand the molecular mechanisms of disease or the mechanism of action (MoA) of drugs within the context of metabolism by using gene expression measurements. In addition, it automatically suggests potential therapeutic targets for individualized therapeutic interventions (Cubuk et al., 2018).
Metabolizer is available here: http://metabolizer.babelomics.org/
Çubuk C, Hidalgo MR, Amadoz A, Rian K, Salavert F, Pujana MA, Mateo F, Herranz C, Carbonell-Caballero J, Dopazo J. Differential metabolic activity and discovery of therapeutic targets using summarized metabolic pathway models. NPJ Syst Biol Appl. 2019 Mar 1;5:7. doi: 10.1038/s41540-019-0087-2. eCollection 2019.
MIGNON: A versatile workflow to integrate RNA-seq genomic and transcriptomic data into mechanistic models of signaling pathways
MIGNON is a workflow for the analysis of RNA-Seq experiments, which not only efficiently manages the estimation of gene expression levels from raw sequencing reads, but also calls genomic variants present in the transcripts analyzed. Moreover, this is the first workflow that provides a framework for the integration of transcriptomic and genomic data based on a mechanistic model of signaling pathway activities, which allows a detailed biological interpretation of the results, including a comprehensive functional profiling of cell activity. MIGNON covers the whole process, from reads to signaling circuit activity estimations, using state-of-the-art tools. It is easy to use and can be deployed in different computational environments, allowing an optimized use of the available resources.
MIGNON is available here: https://github.com/babelomics/MIGNON/
The documentation can be found at https://babelomics.github.io/MIGNON/
Instructions to run a bash script to perform a dry run can be found at https://babelomics.github.io/MIGNON/1_installation.html
Garrido-Rodriguez M, Lopez-Lopez D, Ortuno FM, Peña-Chilet M, Muñoz E, Calzado MA, Dopazo J. A versatile workflow to integrate RNA-seq genomic and transcriptomic data into mechanistic models of signaling pathways. PLoS Comput Biol. 2021 Feb 11;17(2):e1008748.
ngsCAT: Next Generation Sequencing data Capture Assessment Tool
ngsCAT (Lopez-Domingo et al., 2014) is a command-line application written in Python which facilitates a comprehensive evaluation of the performance of the capture step in targeted high-throughput sequencing experiments in terms of:
Sensitivity, which assesses the quality of the coverage on target regions. It is also important to provide a means of estimating how this coverage would improve by increasing sequencing depth.
Specificity, which measures how much of the sequencing effort is wasted on sequencing off-target bases.
Uniformity, which assesses sequencing biases due to specific genomic locations or nucleotide composition.
ngsCAT is an easy-to-use tool that can be run with a single command line on a standard computer, generating a detailed HTML report with metrics, summary tables, figures and plots that evaluate the efficiency of targeted enrichment sequencing.
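The three criteria above can be pictured with simple summary statistics. The sketch below is only an illustration under stated assumptions, not ngsCAT's implementation: sensitivity is taken as the fraction of target bases covered at or above a chosen depth, specificity as the fraction of reads falling on target, and uniformity as the coefficient of variation of per-base depth. The function name, the threshold and the inputs are hypothetical.

```python
from statistics import mean, pstdev


def capture_metrics(target_depths, on_target_reads, total_reads, min_depth=10):
    """Toy capture-performance summary.

    target_depths:   per-base read depth over the target regions
    on_target_reads: number of reads overlapping a target region
    total_reads:     number of mapped reads in the experiment
    min_depth:       depth at which a base counts as adequately covered"""
    if not target_depths or total_reads <= 0:
        raise ValueError("need at least one target base and one read")
    covered = sum(1 for depth in target_depths if depth >= min_depth)
    on_target_fraction = on_target_reads / total_reads
    average_depth = mean(target_depths)
    # Coefficient of variation: lower values mean more even coverage.
    if average_depth > 0:
        depth_cv = pstdev(target_depths) / average_depth
    else:
        depth_cv = float("inf")
    return {
        "sensitivity": covered / len(target_depths),
        "on_target_fraction": on_target_fraction,
        "off_target_fraction": 1.0 - on_target_fraction,
        "depth_cv": depth_cv,
    }


metrics = capture_metrics(target_depths=[0, 10, 20, 30],
                          on_target_reads=800, total_reads=1000)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Recomputing the sensitivity at several depth thresholds, or on subsampled reads, gives a rough picture of how coverage would change with sequencing depth.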
ngsCAT is available at http://ngscat.clinbioinfosspa.es/.
Francisco J. López-Domingo, Javier P. Florido, Antonio Rueda, Joaquín Dopazo and Javier Santoyo-López (2014) ngsCAT: a tool to assess the efficiency of targeted enrichment sequencing, Bioinformatics, vol.30, no.12, pp.1767-1768, 2014; doi:10.1093/bioinformatics/btu108.