TY - JOUR
T1 - Community Assessment of the Predictability of Cancer Protein and Phosphoprotein Levels from Genomics and Transcriptomics.
JF - Cell Syst
Y1 - 2020
A1 - Yang, Mi
A1 - Petralia, Francesca
A1 - Li, Zhi
A1 - Li, Hongyang
A1 - Ma, Weiping
A1 - Song, Xiaoyu
A1 - Kim, Sunkyu
A1 - Lee, Heewon
A1 - Yu, Han
A1 - Lee, Bora
A1 - Bae, Seohui
A1 - Heo, Eunji
A1 - Kaczmarczyk, Jan
A1 - Stępniak, Piotr
A1 - Warchoł, Michał
A1 - Yu, Thomas
A1 - Calinawan, Anna P
A1 - Boutros, Paul C
A1 - Payne, Samuel H
A1 - Reva, Boris
A1 - Boja, Emily
A1 - Rodriguez, Henry
A1 - Stolovitzky, Gustavo
A1 - Guan, Yuanfang
A1 - Kang, Jaewoo
A1 - Wang, Pei
A1 - Fenyö, David
A1 - Saez-Rodriguez, Julio
KW - Crowdsourcing
KW - Female
KW - Genomics
KW - Humans
KW - Machine Learning
KW - Male
KW - Neoplasms
KW - Phosphoproteins
KW - Proteins
KW - Proteomics
KW - Transcriptome
AB -

Cancer is driven by genomic alterations, but the processes causing this disease are largely performed by proteins. However, proteins are harder and more expensive to measure than genes and transcripts. To catalyze the development of methods to infer protein levels from other omics measurements, we leveraged crowdsourcing via the NCI-CPTAC DREAM proteogenomic challenge. We asked for methods to predict protein and phosphorylation levels from genomic and transcriptomic data in cancer patients. The best performance was achieved by an ensemble of models, including as predictors transcript level of the corresponding genes, interaction between genes, conservation across tumor types, and phosphosite proximity for phosphorylation prediction. Proteins from metabolic pathways and complexes were the best and worst predicted, respectively. The performance of even the best-performing model was modest, suggesting that many proteins are strongly regulated through translational control and degradation. Our results set a reference for the limitations of computational inference in proteogenomics. A record of this paper's transparent peer review process is included in the Supplemental Information.

VL - 11
IS - 2
U1 - https://www.ncbi.nlm.nih.gov/pubmed/32710834?dopt=Abstract
ER -

TY - JOUR
T1 - Drug repurposing for COVID-19 using machine learning and mechanistic models of signal transduction circuits related to SARS-CoV-2 infection.
JF - Signal Transduct Target Ther
Y1 - 2020
A1 - Loucera, Carlos
A1 - Esteban-Medina, Marina
A1 - Rian, Kinza
A1 - Falco, Matias M
A1 - Dopazo, Joaquin
A1 - Peña-Chilet, Maria
KW - Computational Chemistry
KW - COVID-19
KW - drug repositioning
KW - Humans
KW - Machine Learning
KW - Molecular Docking Simulation
KW - Molecular Targeted Therapy
KW - Proteins
KW - SARS-CoV-2
KW - Signal Transduction
VL - 5
IS - 1
U1 - https://www.ncbi.nlm.nih.gov/pubmed/33311438?dopt=Abstract
ER -

TY - JOUR
T1 - Towards Improving Skin Cancer Diagnosis by Integrating Microarray and RNA-Seq Datasets.
JF - IEEE J Biomed Health Inform
Y1 - 2020
A1 - Galvez, Juan M
A1 - Castillo-Secilla, Daniel
A1 - Herrera, Luis J
A1 - Valenzuela, Olga
A1 - Caba, Octavio
A1 - Prados, Jose C
A1 - Ortuno, Francisco M
A1 - Rojas, Ignacio
KW - Biomarkers, Tumor
KW - Computational Biology
KW - Diagnosis, Computer-Assisted
KW - Gene Expression Profiling
KW - Humans
KW - Machine Learning
KW - RNA-seq
KW - Skin Neoplasms
AB -

Many clinical studies have revealed high biological similarities among different skin pathological states. These similarities make the efficient diagnosis of skin cancer difficult, and encourage the study and design of new intelligent clinical decision support systems. In this sense, gene expression analysis can help find differentially expressed genes (DEGs) that simultaneously discern multiple skin pathological states in a single test. The integration of multiple heterogeneous transcriptomic datasets requires different pipeline stages to be properly designed: from suitable batch merging and efficient biomarker selection to automated classification assessment. This article presents a novel approach addressing all these technical issues, with the intention of providing new insights into skin cancer diagnosis. Although further efforts will have to be made in the search for better biomarkers recognizing specific skin pathological states, our study found a panel of 8 highly relevant multiclass DEGs for discerning up to 10 skin pathological states: 2 healthy skin conditions a priori, 2 cataloged precancerous skin diseases and 6 cancerous skin states. Their diagnostic power on new samples was widely tested with previously trained classification models. Robust performance metrics, namely the overall and mean multiclass F1-score, exceeded 94% and 80%, respectively. Clinicians should give special attention to the highlighted multiclass DEGs that show large gene expression changes among them, and understand their biological relationship to different skin pathological states.

VL - 24
IS - 7
U1 - https://www.ncbi.nlm.nih.gov/pubmed/31871000?dopt=Abstract
ER -

TY - JOUR
T1 - Antibiotic resistance and metabolic profiles as functional biomarkers that accurately predict the geographic origin of city metagenomics samples.
JF - Biol Direct
Y1 - 2019
A1 - Casimiro-Soriguer, Carlos S
A1 - Loucera, Carlos
A1 - Perez Florido, Javier
A1 - López-López, Daniel
A1 - Dopazo, Joaquin
KW - biomarkers
KW - Cities
KW - Drug Resistance, Microbial
KW - Machine Learning
KW - Metabolome
KW - Metagenome
KW - metagenomics
KW - Microbiota
AB -

BACKGROUND: The availability of hundreds of city microbiome profiles allows the development of increasingly accurate predictors of the origin of a sample based on its microbiota composition. Typical microbiome studies involve the analysis of bacterial abundance profiles.

RESULTS: Here we use a transformation of the conventional bacterial strain or gene abundance profiles to functional profiles that account for bacterial metabolism and other cell functionalities. These profiles are used as features for city classification in a machine learning algorithm that allows the extraction of the most relevant features for the classification.

CONCLUSIONS: We demonstrate here that the use of functional profiles not only accurately predicts the most likely origin of a sample but also provides an interesting functional point of view of the biogeography of the microbiota. Interestingly, we show how cities can be classified based on the observed profile of antibiotic resistances.

REVIEWERS: Open peer review: Reviewed by Jin Zhuang Dou, Jing Zhou, Torsten Semmler and Eran Elhaik.

VL - 14
IS - 1
U1 - https://www.ncbi.nlm.nih.gov/pubmed/31429791?dopt=Abstract
ER -

TY - JOUR
T1 - Exploring the druggable space around the Fanconi anemia pathway using machine learning and mechanistic models.
JF - BMC Bioinformatics
Y1 - 2019
A1 - Esteban-Medina, Marina
A1 - Peña-Chilet, Maria
A1 - Loucera, Carlos
A1 - Dopazo, Joaquin
KW - Databases, Factual
KW - Fanconi Anemia
KW - Genomics
KW - Humans
KW - Machine Learning
KW - Phenotype
KW - Proteins
KW - Signal Transduction
AB -

BACKGROUND: In spite of the abundance of genomic data, predictive models that describe phenotypes as a function of gene expression or mutations are difficult to obtain because they are affected by the curse of dimensionality, given the imbalance between the number of samples and the number of candidate genes. This is especially acute in scenarios in which samples are hard to obtain, such as rare diseases.

RESULTS: The application of multi-output regression machine learning methodologies to predict the potential effect of external proteins over the signaling circuits that trigger Fanconi anemia related cell functionalities, inferred with a mechanistic model, allowed us to detect over 20 potential therapeutic targets.

CONCLUSIONS: The use of artificial intelligence methods for the prediction of potentially causal relationships between proteins of interest and cell activities linked to disease-related phenotypes opens promising avenues for the systematic search of new targets in rare diseases.

VL - 20
IS - 1
U1 - https://www.ncbi.nlm.nih.gov/pubmed/31266445?dopt=Abstract
ER -