%0 Journal Article %J Viruses %D 2022 %T Assessing the Impact of SARS-CoV-2 Lineages and Mutations on Patient Survival. %A Loucera, Carlos %A Perez-Florido, Javier %A Casimiro-Soriguer, Carlos S %A Ortuno, Francisco M %A Carmona, Rosario %A Bostelmann, Gerrit %A Martínez-González, L Javier %A Muñoyerro-Muñiz, Dolores %A Villegas, Román %A Rodríguez-Baño, Jesús %A Romero-Gómez, Manuel %A Lorusso, Nicola %A Garcia-León, Javier %A Navarro-Marí, Jose M %A Camacho-Martinez, Pedro %A Merino-Diaz, Laura %A Salazar, Adolfo de %A Viñuela, Laura %A Lepe, Jose A %A García, Federico %A Dopazo, Joaquin %K COVID-19 %K Genome, Viral %K Humans %K mutation %K Pandemics %K Phylogeny %K SARS-CoV-2 %X

OBJECTIVES: More than two years into the COVID-19 pandemic, SARS-CoV-2 remains a global public health problem. Successive waves of infection have produced new SARS-CoV-2 variants with new mutations whose impact on COVID-19 severity and patient survival is uncertain.

METHODS: A total of 764 SARS-CoV-2 genomes, sequenced from COVID-19 patients hospitalized from 19 February 2020 to 30 April 2021, along with their clinical data, were used for survival analysis.

RESULTS: A significant association of B.1.1.7, the alpha lineage, with patient mortality (log hazard ratio (LHR) = 0.51, C.I. = [0.14,0.88]) was found upon adjustment by all the covariates known to affect COVID-19 prognosis. Moreover, survival analysis of mutations in the SARS-CoV-2 genome revealed 27 of them were significantly associated with higher mortality of patients. Most of these mutations were located in the genes coding for the S, ORF8, and N proteins.

CONCLUSIONS: This study illustrates how a combination of genomic and clinical data can provide solid evidence for the impact of viral lineage on patient survival.

%B Viruses %V 14 %8 2022 Aug 27 %G eng %N 9 %R 10.3390/v14091893 %0 Journal Article %J Clin Microbiol Infect %D 2020 %T Association of a single nucleotide polymorphism in the ubxn6 gene with long-term non-progression phenotype in HIV-positive individuals. %A Díez-Fuertes, F %A De La Torre-Tarazona, H E %A Calonge, E %A Pernas, M %A Bermejo, M %A García-Pérez, J %A Álvarez, A %A Capa, L %A García-García, F %A Saumoy, M %A Riera, M %A Boland-Auge, A %A López-Galíndez, C %A Lathrop, M %A Dopazo, J %A Sakuntabhai, A %A Alcamí, J %K Adaptor Proteins, Vesicular Transport %K Autophagy-Related Proteins %K Caveolin 1 %K Cohort Studies %K Dendritic Cells %K Disease Progression %K Gene Frequency %K Gene Knockdown Techniques %K Genetic Association Studies %K HeLa Cells %K HIV Infections %K HIV Long-Term Survivors %K HIV-1 %K Humans %K Macrophages %K Oligonucleotide Array Sequence Analysis %K Phenotype %K Polymorphism, Single Nucleotide %K whole exome sequencing %X

OBJECTIVES: The long-term non-progressors (LTNPs) are a heterogeneous group of HIV-positive individuals characterized by their ability to maintain high CD4 T-cell counts and partially control viral replication for years in the absence of antiretroviral therapy. The present study aims to identify host single nucleotide polymorphisms (SNPs) associated with non-progression in a cohort of 352 individuals.

METHODS: DNA microarrays and exome sequencing were used for genotyping about 240 000 functional polymorphisms throughout more than 20 000 human genes. The allele frequencies of 85 LTNPs were compared with a control population. SNPs associated with LTNPs were confirmed in a population of typical progressors. Functional analyses in the affected gene were carried out through knockdown experiments in HeLa-P4, macrophages and dendritic cells.

RESULTS: Several SNPs located within the major histocompatibility complex region previously related to LTNPs were confirmed in this new cohort. The SNP rs1127888 (UBXN6) surpassed the statistical significance of these markers after Bonferroni correction (q = 2.11 × 10). An uncommon allelic frequency of rs1127888 among LTNPs was confirmed by comparison with typical progressors and other publicly available populations. UBXN6 knockdown experiments caused an increase in CAV1 expression and its accumulation in the plasma membrane. In vitro infection of different cell types with HIV-1 replication-competent recombinant viruses caused a reduction of the viral replication capacity compared with their corresponding wild-type cells expressing UBXN6.

CONCLUSIONS: A higher prevalence of Ala31Thr in UBXN6 was found among LTNPs within its N-terminal region, which is crucial for UBXN6/VCP protein complex formation. UBXN6 knockdown affected CAV1 turnover and HIV-1 replication capacity.

%B Clin Microbiol Infect %V 26 %P 107-114 %8 2020 Jan %G eng %N 1 %1 https://www.ncbi.nlm.nih.gov/pubmed/31158522?dopt=Abstract %R 10.1016/j.cmi.2019.05.015 %0 Journal Article %J Biol Direct %D 2019 %T Antibiotic resistance and metabolic profiles as functional biomarkers that accurately predict the geographic origin of city metagenomics samples. %A Casimiro-Soriguer, Carlos S %A Loucera, Carlos %A Perez Florido, Javier %A López-López, Daniel %A Dopazo, Joaquin %K biomarkers %K Cities %K Drug Resistance, Microbial %K Machine Learning %K Metabolome %K Metagenome %K metagenomics %K Microbiota %X

BACKGROUND: The availability of hundreds of city microbiome profiles allows the development of increasingly accurate predictors of the origin of a sample based on its microbiota composition. Typical microbiome studies involve the analysis of bacterial abundance profiles.

RESULTS: Here we use a transformation of the conventional bacterial strain or gene abundance profiles to functional profiles that account for bacterial metabolism and other cell functionalities. These profiles are used as features for city classification in a machine learning algorithm that allows the extraction of the most relevant features for the classification.

CONCLUSIONS: We demonstrate here that the use of functional profiles not only accurately predicts the most likely origin of a sample but also provides an interesting functional point of view of the biogeography of the microbiota. Interestingly, we show how cities can be classified based on the observed profile of antibiotic resistances.

REVIEWERS: Open peer review: Reviewed by Jin Zhuang Dou, Jing Zhou, Torsten Semmler and Eran Elhaik.

%B Biol Direct %V 14 %P 15 %8 2019 08 20 %G eng %N 1 %1 https://www.ncbi.nlm.nih.gov/pubmed/31429791?dopt=Abstract %R 10.1186/s13062-019-0246-9 %0 Journal Article %J BMC Bioinformatics %D 2017 %T ATGC transcriptomics: a web-based application to integrate, explore and analyze de novo transcriptomic data. %A Gonzalez, Sergio %A Clavijo, Bernardo %A Rivarola, Máximo %A Moreno, Patricio %A Fernandez, Paula %A Dopazo, Joaquin %A Paniego, Norma %K Animals %K Databases, Genetic %K Gene Expression Profiling %K High-Throughput Nucleotide Sequencing %K Internet %K Sequence Analysis, RNA %K Transcriptome %K User-Computer Interface %X

BACKGROUND: In recent years, applications based on massively parallelized RNA sequencing (RNA-seq) have become valuable approaches for studying non-model species, e.g., those without a fully sequenced genome. RNA-seq is a useful tool for detecting novel transcripts and genetic variations and for evaluating differential gene expression by digital measurements. The large and complex datasets resulting from functional genomic experiments represent a challenge in data processing, management, and analysis. This problem is especially significant for small research groups working with non-model species.

RESULTS: We developed a web-based application, called ATGC transcriptomics, with a flexible and adaptable interface that allows users to work with next-generation sequencing (NGS) transcriptomic analysis results using an ontology-driven database. This new application simplifies data exploration, visualization, and integration for a better comprehension of the results.

CONCLUSIONS: ATGC transcriptomics provides non-expert computer users and small research groups with access to a scalable storage option and simple data integration, including database administration and management. The software is freely available under the terms of GNU public license at http://atgcinta.sourceforge.net.

%B BMC Bioinformatics %V 18 %P 121 %8 2017 Feb 22 %G eng %N 1 %1 https://www.ncbi.nlm.nih.gov/pubmed/28222698?dopt=Abstract %R 10.1186/s12859-017-1494-2 %0 Journal Article %J Nucleic acids research %D 2016 %T Actionable pathways: interactive discovery of therapeutic targets using signaling pathway models. %A Salavert, Francisco %A Hidalgo, Marta R %A Amadoz, Alicia %A Cubuk, Cankut %A Medina, Ignacio %A Crespo, Daniel %A Carbonell-Caballero, José %A Joaquín Dopazo %K actionable genes %K Disease mechanism %K drug action mechanism %K Drug discovery %K pathway analysis %K personalized medicine %K signalling %K therapeutic targets %X The discovery of actionable targets is crucial for targeted therapies and is also a constituent part of the drug discovery process. The success of an intervention over a target depends critically on its contribution, within the complex network of gene interactions, to the cellular processes responsible for disease progression or therapeutic response. Here we present PathAct, a web server that predicts the effect that interventions over genes (inhibitions or activations that simulate knock-outs, drug treatments or over-expressions) can have over signal transmission within signaling pathways and, ultimately, over the cell functionalities triggered by them. PathAct implements an advanced graphical interface that provides a unique interactive working environment in which the suitability of potentially actionable genes, that could eventually become drug targets for personalized or individualized therapies, can be easily tested. The PathAct tool can be found at: http://pathact.babelomics.org. %B Nucleic acids research %8 2016 May 2 %G eng %U http://nar.oxfordjournals.org/content/early/2016/05/02/nar.gkw369.full %R 10.1093/nar/gkw369 %0 Journal Article %J The Journal of molecular diagnostics : JMD %D 2016 %T Assessment of Targeted Next-Generation Sequencing as a Tool for the Diagnosis of Charcot-Marie-Tooth Disease and Hereditary Motor Neuropathy. 
%A Lupo, Vincenzo %A Garcia-Garcia, Francisco %A Sancho, Paula %A Tello, Cristina %A García-Romero, Mar %A Villarreal, Liliana %A Alberti, Antonia %A Sivera, Rafael %A Joaquín Dopazo %A Pascual-Pascual, Samuel I %A Márquez-Infante, Celedonio %A Casasnovas, Carlos %A Sevilla, Teresa %A Espinós, Carmen %K Charcot-Marie-Tooth %K CMT %K Diagnostic %K NGS %K Panels %K rare diseases %K Targeted resequencing %X Charcot-Marie-Tooth disease is characterized by broad genetic heterogeneity with >50 known disease-associated genes. Mutations in some of these genes can cause a pure motor form of hereditary motor neuropathy, the genetics of which are poorly characterized. We designed a panel comprising 56 genes associated with Charcot-Marie-Tooth disease/hereditary motor neuropathy. We validated this diagnostic tool by first testing 11 patients with pathological mutations. A cohort of 33 affected subjects was selected for this study. The DNAJB2 c.352+1G>A mutation was detected in two cases; novel changes and/or variants with low frequency (<1%) were found in 12 cases. There were no candidate variants in 18 cases, and amplification failed for one sample. The DNAJB2 c.352+1G>A mutation was also detected in three additional families. On haplotype analysis, all of the patients from these five families shared the same haplotype; therefore, the DNAJB2 c.352+1G>A mutation may be a founder event. Our gene panel allowed us to perform a very rapid and cost-effective screening of genes involved in Charcot-Marie-Tooth disease/hereditary motor neuropathy. Our diagnostic strategy was robust in terms of both coverage and read depth for all of the genes and patient samples. These findings demonstrate the difficulty in achieving a definitive molecular diagnosis because of the complexity of interpreting new variants and the genetic heterogeneity that is associated with these neuropathies. 
%B The Journal of molecular diagnostics : JMD %8 2016 Jan 2 %G eng %U http://www.sciencedirect.com/science/article/pii/S1525157815002615 %R 10.1016/j.jmoldx.2015.10.005 %0 Journal Article %J Nucleic acids research %D 2015 %T Assessing the impact of mutations found in next generation sequencing data over human signaling pathways. %A Hernansaiz-Ballesteros, Rosa D %A Salavert, Francisco %A Sebastián-Leon, Patricia %A Alemán, Alejandro %A Medina, Ignacio %A Joaquín Dopazo %K NGS %K pathways %K signalling %K Systems biology %X Modern sequencing technologies produce increasingly detailed data on genomic variation. However, conventional methods for relating either individual variants or mutated genes to phenotypes present known limitations given the complex, multigenic nature of many diseases or traits. Here we present PATHiVar, a web-based tool that integrates genomic variation data with gene expression tissue information. PATHiVar constitutes a new generation of genomic data analysis methods that allow studying variants found in next generation sequencing experiments in the context of signaling pathways. Simple Boolean models of pathways provide detailed descriptions of the impact of mutations on cell functionality, so that recurrences in functionality failures can easily be related to diseases, even if they are produced by mutations in different genes. Patterns of changes in signal transmission circuits, often unpredictable from individual genes mutated, correspond to patterns of affected functionalities that can be related to complex traits such as disease progression, drug response, etc. PATHiVar is available at: http://pathivar.babelomics.org. %B Nucleic acids research %V 43 %P W270-W275 %8 2015 Apr 16 %G eng %U http://nar.oxfordjournals.org/content/43/W1/W270 %R 10.1093/nar/gkv349 %0 Journal Article %J Bioinformatics (Oxford, England) %D 2014 %T Acceleration of short and long DNA read mapping without loss of accuracy using suffix array. 
%A Tárraga, Joaquín %A Arnau, Vicente %A Martinez, Hector %A Moreno, Raul %A Cazorla, Diego %A Salavert-Torres, José %A Blanquer-Espert, Ignacio %A Joaquín Dopazo %A Medina, Ignacio %K NGS %K short read mapping %K HPC %K suffix arrays %X HPG Aligner applies suffix arrays for DNA read mapping. This implementation produces a highly sensitive and extremely fast mapping of DNA reads that scales up almost linearly with read length. The approach presented here is faster (over 20x for long reads) and more sensitive (over 98% in a wide range of read lengths) than the current, state-of-the-art mappers. HPG Aligner is not only an optimal alternative for current sequencers but also the only solution available to cope with longer reads and growing throughputs produced by forthcoming sequencing technologies. %B Bioinformatics (Oxford, England) %V 30 %P 3396-3398 %8 2014 Aug 20 %G eng %U http://bioinformatics.oxfordjournals.org/content/early/2014/08/19/bioinformatics.btu553.long %R 10.1093/bioinformatics/btu553 %0 Journal Article %J Front Oncol %D 2014 %T The Activation of the Sox2 RR2 Pluripotency Transcriptional Reporter in Human Breast Cancer Cell Lines is Dynamic and Labels Cells with Higher Tumorigenic Potential. %A Iglesias, Juan Manuel %A Leis, Olatz %A Pérez Ruiz, Estíbaliz %A Gumuzio Barrie, Juan %A Garcia-Garcia, Francisco %A Aduriz, Ariane %A Beloqui, Izaskun %A Hernandez-Garcia, Susana %A Lopez-Mato, Maria Paz %A Dopazo, Joaquin %A Pandiella, Atanasio %A Menendez, Javier A %A Martin, Angel Garcia %X

The striking similarity displayed at the mechanistic level between tumorigenesis and the generation of induced pluripotent stem cells, and the fact that genes and pathways relevant for embryonic development are reactivated during tumor progression, highlight the link between pluripotency and cancer. Based on these observations, we tested whether it is possible to use a pluripotency-associated transcriptional reporter, whose activation is driven by the SRR2 enhancer from the Sox2 gene promoter (named S4+ reporter), to isolate cancer stem cells (CSCs) from breast cancer cell lines. The S4+ pluripotency transcriptional reporter allows the isolation of cells with enhanced tumorigenic potential and its activation was switched on and off in the cell lines studied, reflecting a plastic cellular process. Microarray analysis comparing the populations in which the reporter construct is active versus inactive showed that positive cells expressed higher mRNA levels of cytokines (IL-8, IL-6, TNF) and genes (such as ATF3, SNAI2, and KLF6) previously related to the CSC phenotype in breast cancer.

%B Front Oncol %V 4 %P 308 %8 2014 %G eng %1 https://www.ncbi.nlm.nih.gov/pubmed/25414831?dopt=Abstract %R 10.3389/fonc.2014.00308 %0 Journal Article %J Nature communications %D 2014 %T Assessing technical performance in differential gene expression experiments with external spike-in RNA control ratio mixtures. %A Munro, Sarah A %A Lund, Steven P %A Pine, P Scott %A Binder, Hans %A Clevert, Djork-Arné %A Ana Conesa %A Dopazo, Joaquin %A Fasold, Mario %A Hochreiter, Sepp %A Hong, Huixiao %A Jafari, Nadereh %A Kreil, David P %A Labaj, Paweł P %A Li, Sheng %A Liao, Yang %A Lin, Simon M %A Meehan, Joseph %A Mason, Christopher E %A Santoyo-López, Javier %A Setterquist, Robert A %A Shi, Leming %A Shi, Wei %A Smyth, Gordon K %A Stralis-Pavese, Nancy %A Su, Zhenqiang %A Tong, Weida %A Wang, Charles %A Wang, Jian %A Xu, Joshua %A Ye, Zhan %A Yang, Yong %A Yu, Ying %A Salit, Marc %K RNA-seq %X There is a critical need for standard approaches to assess, report and compare the technical performance of genome-scale differential gene expression experiments. Here we assess technical performance with a proposed standard ’dashboard’ of metrics derived from analysis of external spike-in RNA control ratio mixtures. These control ratio mixtures with defined abundance ratios enable assessment of diagnostic performance of differentially expressed transcript lists, limit of detection of ratio (LODR) estimates and expression ratio variability and measurement bias. The performance metrics suite is applicable to analysis of a typical experiment, and here we also apply these metrics to evaluate technical performance among laboratories. An interlaboratory study using identical samples shared among 12 laboratories with three different measurement processes demonstrates generally consistent diagnostic power across 11 laboratories. Ratio measurement variability and bias are also comparable among laboratories for the same measurement process. 
We observe different biases for measurement processes using different mRNA-enrichment protocols. %B Nature communications %V 5 %P 5125 %8 2014 %G eng %U http://www.nature.com/ncomms/2014/140925/ncomms6125/full/ncomms6125.html %R 10.1038/ncomms6125 %0 Journal Article %J Omics : a journal of integrative biology %D 2013 %T Assessing Differential Expression Measurements by Highly Parallel Pyrosequencing and DNA Microarrays: A Comparative Study. %A Ariño, Joaquín %A Casamayor, Antonio %A Pérez, Julián Perez %A Pedrola, Laia %A Alvarez-Tejado, Miguel %A Marbà, Martina %A Santoyo, Javier %A Joaquín Dopazo %X

To explore the feasibility of pyrosequencing for quantitative differential gene expression analysis, we have performed a comparative study of the results of the sequencing experiments and those obtained by a conventional DNA microarray platform. A conclusion from our analysis is that, over a threshold of 35 normalized reads per gene, the measurements of gene expression display a good correlation with the references. The observed concordance between pyrosequencing and DNA microarray platforms beyond the threshold was 0.8, measured as a Pearson’s correlation coefficient. In differential gene expression the initial aim is the quantification of the differences among transcripts when comparing experimental conditions. Thus, even in a scenario of low coverage the concordance in the measurements is quite acceptable. On the other hand, the comparatively longer read size obtained by pyrosequencing allows detecting unconventional splicing forms.

%B Omics : a journal of integrative biology %8 2011 Sep 15 %G eng %U http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3545353/ %R 10.1089/omi.2011.0065 %0 Journal Article %J PloS one %D 2011 %T Analysis of normal-tumour tissue interaction in tumours: prediction of prostate cancer features from the molecular profile of adjacent normal cells. %A Trevino, Victor %A Tadesse, Mahlet G %A Vannucci, Marina %A Fatima Al-Shahrour %A Antczak, Philipp %A Durant, Sarah %A Bikfalvi, Andreas %A Dopazo, Joaquin %A Campbell, Moray J %A Falciani, Francesco %X

Statistical modelling, in combination with genome-wide expression profiling techniques, has demonstrated that the molecular state of the tumour is sufficient to infer its pathological state. These studies have been extremely important in diagnostics and have contributed to improving our understanding of tumour biology. However, their importance in in-depth understanding of cancer patho-physiology may be limited since they do not explicitly take into consideration the fundamental role of the tissue microenvironment in specifying tumour physiology. Because of the importance of normal cells in shaping the tissue microenvironment, we formulate the hypothesis that molecular components of the profile of normal epithelial cells adjacent to the tumour are predictive of tumour physiology. We addressed this hypothesis by developing statistical models that link gene expression profiles representing the molecular state of adjacent normal epithelial cells to tumour features in prostate cancer. Furthermore, network analysis showed that predictive genes are linked to the activity of important secreted factors, which have the potential to influence tumour biology, such as IL1, IGF1, PDGF BB, AGT, and TGFβ.

%B PloS one %V 6 %P e16492 %8 2011 %G eng %0 Journal Article %J Biostatistics (Oxford, England) %D 2011 %T ARSyN: a method for the identification and removal of systematic noise in multifactorial time course microarray experiments. %A Nueda, Maria J %A Alberto Ferrer %A Ana Conesa %X Transcriptomic profiling experiments that aim at the identification of responsive genes in specific biological conditions are commonly set up under defined experimental designs that try to assess the effects of factors and their interactions on gene expression. Data from these controlled experiments, however, may also contain sources of unwanted noise that can distort the signal under study, affect the residuals of applied statistical models, and hamper data analysis. Commonly, normalization methods are applied to transcriptomics data to remove technical artifacts, but these are normally based on general assumptions of transcript distribution and largely ignore both the characteristics of the experiment under consideration and the coordinative nature of gene expression. In this paper, we propose a novel methodology, ARSyN, for the preprocessing of microarray data that takes into account these last 2 aspects. By combining analysis of variance (ANOVA) modeling of gene expression values and multivariate analysis of estimated effects, the method identifies the nonstructured part of the signal associated with the experimental factors (the noise within the signal) and the structured variation of the ANOVA errors (the signal of the noise). By removing these noise fractions from the original data, we create a filtered data set that is rich in the information of interest and includes only the random noise required for inferential analysis. In this work, we focus on multifactorial time course microarray (MTCM) experiments with 2 factors: one quantitative, such as time or dosage, and the other qualitative, such as tissue, strain, or treatment. 
However, the method can be used in other situations such as experiments with only one factor or more complex designs with more than 2 factors. The filtered data obtained after applying ARSyN can be further analyzed with the appropriate statistical technique to obtain the biological information required. To evaluate the performance of the filtering strategy, we have applied different statistical approaches for MTCM analysis to several real and simulated data sets, studying also the efficiency of these techniques. By comparing the results obtained with the original and ARSyN filtered data and also with other filtering techniques, we can conclude that the proposed method increases the statistical power to detect biological signals, especially in cases where there are high levels of structural noise. Software for ARSyN is freely available at http://www.ua.es/personal/mj.nueda. %B Biostatistics (Oxford, England) %8 2011 Nov 14 %G eng %0 Journal Article %J PloS one %D 2011 %T Assessing the biological significance of gene expression signatures and co-expression modules by studying their network properties. %A Minguez, Pablo %A Dopazo, Joaquin %X

Microarray experiments have been extensively used to define signatures, which are sets of genes that can be considered markers of experimental conditions (typically diseases). Paradoxically, in spite of the apparent functional role that might be attributed to such gene sets, signatures do not seem to be reproducible across experiments. Given the close relationship between function and protein interaction, network properties can be used to study to what extent signatures are composed of genes whose resulting proteins show a considerable level of interaction (and consequently a putative common functional role). We have analysed 618 signatures and 507 modules of co-expression in cancer looking for significant values of four main protein-protein interaction (PPI) network parameters: connection degree, cluster coefficient, betweenness and number of components. A total of 3904 gene ontology (GO) modules, 146 KEGG pathways, and 263 Biocarta pathways have been used as functional modules of reference. Co-expression modules found in microarray experiments display a high level of connectivity, similar to the one shown by conventional modules based on functional definitions (GO, KEGG and Biocarta). A general observation for all the classes studied is that the networks formed by the modules improve their topological parameters when an external protein is allowed to be introduced within the paths (up to 70% of GO modules show network parameters beyond the random expectation). This fact suggests that functional definitions are incomplete and some genes might still be missing. Conversely, signatures are clearly not capturing the altered functions in the corresponding studies. This is probably because the way in which the genes have been selected in the signatures is too conservative. 
These results suggest that gene selection methods which take into account relationships among genes should be superior to methods that assume independence among genes outside their functional contexts.

%B PloS one %V 6 %P e17474 %8 2011 %G eng %U http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0017474 %R 10.1371/journal.pone.0017474 %0 Journal Article %J Protein engineering, design & selection : PEDS %D 2009 %T Alignment of multiple protein structures based on sequence and structure features. %A Madhusudhan, M. S. %A Webb, Benjamin M %A Marti-Renom, Marc A %A Eswar, Narayanan %A Sali, Andrej %X

Comparing the structures of proteins is crucial to gaining insight into protein evolution and function. Here, we align the sequences of multiple protein structures by a dynamic programming optimization of a scoring function that is a sum of an affine gap penalty and terms dependent on various sequence and structure features (SALIGN). The features include amino acid residue type, residue position, residue accessible surface area, residue secondary structure state and the conformation of a short segment centered on the residue. The multiple alignment is built by following the ’guide’ tree constructed from the matrix of all pairwise protein alignment scores. Importantly, the method does not depend on the exact values of various parameters, such as feature weights and gap penalties, because the optimal alignment across a range of parameter values is found. Using multiple structure alignments in the HOMSTRAD database, SALIGN was benchmarked against MUSTANG for multiple alignments as well as against TM-align and CE for pairwise alignments. On average, SALIGN produces a 15% improvement in structural overlap over HOMSTRAD and 14% over MUSTANG, and yields more equivalent structural positions than TM-align and CE in 90% and 95% of cases, respectively. The utility of accurate multiple structure alignment is illustrated by its application to comparative protein structure modeling.

%B Protein engineering, design & selection : PEDS %V 22 %P 569-74 %8 2009 Sep %G eng %0 Journal Article %J Leuk Lymphoma %D 2009 %T Analysis of chronic lymphotic leukemia transcriptomic profile: differences between molecular subgroups %A Jantus Lewintre, E. %A Reinoso Martin, C. %A Montaner, D. %A Marin, M. %A Jose Terol, M. %A Farras, R. %A Benet, I. %A Calvete, J. J. %A Dopazo, J. %A Garcia-Conde, J. %K cancer %K microarray data analysis %X

B cell chronic lymphocytic leukemia (CLL) is a lymphoproliferative disorder with a variable clinical course. Patients with unmutated immunoglobulin heavy chain variable region (IgV(H)) genes show shorter progression-free and overall survival than patients with mutated IgV(H) genes. In addition, BCL6 mutations identify a subgroup of patients with high risk of progression. Gene expression was analysed in 36 early-stage patients using high-density microarrays. Around 150 differentially expressed genes were found according to IgV(H) mutations, whereas no difference was found according to BCL6 mutations. Functional profiling methods allowed us to distinguish KEGG and gene ontology terms showing coordinated gene expression changes across subgroups of CLL. We validated a set of differentially expressed genes according to IgV(H) status, scoring them as putative prognostic markers in CLL. Among them, CRY1, LPL, CD82 and DUSP22 are the ones with performance at least equal or superior to that of ZAP70, which is currently the most used surrogate marker of IgV(H) status.

%B Leuk Lymphoma %V 50 %P 68-79 %G eng %U http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=19127482 %0 Book Section %B Computational Structural Biology %D 2008 %T Assessment of protein structure predictions %A E. Capriotti %A M. A. Marti-Renom %B Computational Structural Biology %I World Scientific Publishing Company %C New Jersey, USA %G eng %U http://www.amazon.com/dp/9812778772/ %0 Journal Article %J BMC Genomics %D 2007 %T Analysis of 13000 unique Citrus clusters associated with fruit quality, production and salinity tolerance %A Terol, J. %A A. Conesa %A Colmenero, J. M. %A Cercos, M. %A Tadeo, F. %A Agusti, J. %A Alos, E. %A Andres, F. %A Soler, G. %A Brumos, J. %A Iglesias, D. J. %A Gotz, S. %A Legaz, F. %A Argout, X. %A Courtois, B. %A Ollitrault, P. %A Dossat, C. %A Wincker, P. %A Morillon, R. %A Talon, M. %K Acclimatization/*genetics %K Amino Acid Motifs %K Citrus/*genetics %K Cluster Analysis %K Expressed Sequence Tags %K Fruit/genetics %K Gene Duplication %K *Gene Expression Regulation, Plant %K Gene Library %K Genes, Plant %K Genomics %K Molecular Sequence Data %K Multigene Family %K Phylogeny %K *Salts/adverse effects %X BACKGROUND: Improvement of Citrus, the most economically important fruit crop in the world, is extremely slow and inherently costly because of the long-term nature of tree breeding and an unusual combination of reproductive characteristics. Aside from disease resistance, major commercial traits in Citrus are improved fruit quality, higher yield and tolerance to environmental stresses, especially salinity. RESULTS: A normalized full-length and 9 standard cDNA libraries were generated, representing particular treatments and tissues from selected varieties (Citrus clementina and C. sinensis) and rootstocks (C. reshni and C. sinensis × Poncirus trifoliata) differing in fruit quality, resistance to abscission, and tolerance to salinity. 
The goal of this work was to provide a large expressed sequence tag (EST) collection enriched with transcripts related to these well appreciated agronomical traits. Towards this end, more than 54000 ESTs derived from these libraries were analyzed and annotated. Assembly of 52626 useful sequences generated 15664 putative transcription units distributed in 7120 contigs, and 8544 singletons. BLAST annotation produced significant hits for more than 80% of the hypothetical transcription units and suggested that 647 of these might be Citrus specific unigenes. The unigene set, composed of 13000 putative different transcripts, including more than 5000 novel Citrus genes, was assigned putative functions based on similarity, GO annotations and protein domains. CONCLUSION: Comparative genomics with Arabidopsis revealed the presence of putative conserved orthologs and single copy genes in Citrus and also the occurrence of both gene duplication events and increased number of genes for specific pathways. In addition, phylogenetic analysis performed on the ammonium transporter family and glycosyl transferase family 20 suggested the existence of Citrus paralogs. Analysis of the Citrus gene space showed that the most important metabolic pathways known to affect fruit quality were represented in the unigene set. Overall, the similarity analyses indicated that the sequences of the genes belonging to these varieties and rootstocks were essentially identical, suggesting that the differential behaviour of these species cannot be attributed to major sequence divergences. This Citrus EST assembly contributes both crucial information to discover genes of agronomical interest and tools for genetic and genomic analyses, such as the development of new markers and microarrays. 
%B BMC Genomics %V 8 %P 31 %G eng %U http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=17254327 %0 Journal Article %J BMC Bioinformatics %D 2007 %T The AnnoLite and AnnoLyze programs for comparative annotation of protein structures %A M. A. Marti-Renom %A Rossi, A. %A Fatima Al-Shahrour %A Davis, F. P. %A Pieper, U. %A Dopazo, J. %A Sali, A. %K *Algorithms Amino Acid Sequence Confidence Intervals Data Interpretation %K Amino Acid *Software Structure-Activity Relationship %K Protein Information Storage and Retrieval/methods Molecular Sequence Data Proteins/*chemistry/classification/*metabolism Sensitivity and Specificity Sequence Alignment/*methods Sequence Analysis %K Protein/*methods Sequence Homology %K Statistical *Databases %X BACKGROUND: Advances in structural biology, including structural genomics, have resulted in a rapid increase in the number of experimentally determined protein structures. However, about half of the structures deposited by the structural genomics consortia have little or no information about their biological function. Therefore, there is a need for tools for automatically and comprehensively annotating the function of protein structures. We aim to provide such tools by applying comparative protein structure annotation that relies on detectable relationships between protein structures to transfer functional annotations. Here we introduce two programs, AnnoLite and AnnoLyze, which use the structural alignments deposited in the DBAli database. DESCRIPTION: AnnoLite predicts the SCOP, CATH, EC, InterPro, PfamA, and GO terms with an average sensitivity of 90% and average precision of 80%. AnnoLyze predicts ligand binding site and domain interaction patches with an average sensitivity of 70% and average precision of 30%, correctly localizing binding sites for small molecules in 95% of its predictions. 
CONCLUSION: The AnnoLite and AnnoLyze programs for comparative annotation of protein structures can reliably and automatically annotate new protein structures. The programs are fully accessible via the Internet as part of the DBAli suite of tools at http://salilab.org/DBAli/. %B BMC Bioinformatics %V 8 Suppl 4 %P S4 %G eng %U http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=17570147 %0 Journal Article %J Cancer Res %D 2007 %T Association study of 69 genes in the ret pathway identifies low-penetrance loci in sporadic medullary thyroid carcinoma %A Ruiz-Llorente, S. %A Montero-Conde, C. %A Milne, R. L. %A Moya, C. M. %A Cebrian, A. %A Leton, R. %A Cascon, A. %A Mercadillo, F. %A Landa, I. %A Borrego, S. %A Perez de Nanclares, G. %A Alvarez-Escola, C. %A Diaz-Perez, J. A. %A Carracedo, A. %A Urioste, M. %A Gonzalez-Neira, A. %A Benitez, J. %A Santisteban, P. %A Dopazo, J. %A Ponder, B. A. %A M. Robledo %K 80 and over Carcinoma %K Adolescent Adult Aged Aged %K Genetic %K Genetic Proto-Oncogene Proteins c-ret/*genetics/metabolism Signal Transduction Thyroid Neoplasms/*genetics/metabolism Transcription %K Medullary/*genetics/metabolism Case-Control Studies Cyclin-Dependent Kinase Inhibitor p15/biosynthesis/genetics Female Genetic Predisposition to Disease Germ-Line Mutation Haplotypes Humans Male Middle Aged Penetrance Polymorphism %K Single Nucleotide Promoter Regions %X To date, few association studies have been done to better understand the genetic basis for the development of sporadic medullary thyroid carcinoma (sMTC). To identify additional low-penetrance genes, we have done a two-stage case-control study in two European populations using high-throughput genotyping. We selected 417 single nucleotide polymorphisms (SNP) belonging to 69 genes either related to RET signaling pathway/functions or involved in key processes for cancer development. TagSNPs and functional variants were included where possible. 
These SNPs were initially studied in the largest known series of sMTC cases (n = 266) and controls (n = 422), all of Spanish origin. In stage II, an independent British series of 155 sMTC patients and 531 controls was included to validate the previous results. Associations were assessed by an exhaustive analysis of individual SNPs but also considering gene- and linkage disequilibrium-based haplotypes. This strategy allowed us to identify seven low-penetrance genes, six of them (STAT1, AURKA, BCL2, CDKN2B, CDK6, and COMT) consistently associated with sMTC risk in the two case-control series and a seventh (HRAS) with individual SNPs and haplotypes associated with sMTC in the Spanish data set. The potential role of CDKN2B was confirmed by a functional assay showing a role of a SNP (rs7044859) in the promoter region in altering the binding of the transcription factor HNF1. These results highlight the utility of association studies using homogeneous series of cases for better understanding complex diseases. %B Cancer Res %V 67 %P 9561-7 %G eng %U http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=17909067 %0 Journal Article %J Proteins %D 2006 %T Accuracy of sequence alignment and fold assessment using reduced amino acid alphabets %A Melo, F. %A M. A. Marti-Renom %K Amino Acid Sequence Amino Acids/*chemistry/classification/*metabolism Consensus Sequence Molecular Sequence Data Oxidation-Reduction *Protein Folding Proteins/*chemistry/*metabolism Sequence Alignment/*methods Structural Homology %K Protein %X Reduced or simplified amino acid alphabets group the 20 naturally occurring amino acids into a smaller number of representative protein residues. To date, several reduced amino acid alphabets have been proposed, which have been derived and optimized by a variety of methods. 
The resulting reduced amino acid alphabets have been applied to pattern recognition, generation of consensus sequences from multiple alignments, protein folding, and protein structure prediction. In this work, amino acid substitution matrices and statistical potentials were derived based on several reduced amino acid alphabets and their performance assessed in a large benchmark for the tasks of sequence alignment and fold assessment of protein structure models, using as a reference frame the standard alphabet of 20 amino acids. The results showed that a large reduction in the total number of residue types does not necessarily translate into a significant loss of discriminative power for sequence alignment and fold assessment. Therefore, some definitions of a few residue types are able to encode most of the relevant sequence/structure information that is present in the 20 standard amino acids. Based on these results, we suggest that the use of reduced amino acid alphabets may allow increasing the accuracy of current substitution matrices and statistical potentials for the prediction of protein structure of remote homologs. %B Proteins %V 63 %P 986-95 %G eng %U http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=16506243 %0 Journal Article %J Nature %D 2005 %T An anaerobic mitochondrion that produces hydrogen %A Boxma, B. %A de Graaf, R. M. %A van der Staay, G. W. %A van Alen, T. A. %A Ricard, G. %A Gabaldón, T. %A van Hoek, A. H. %A Moon-van der Staay, S. Y. %A Koopman, W. J. %A van Hellemond, J. J. %A Tielens, A. G. %A Friedrich, T. %A Veenhuis, M. %A M. A. Huynen %A Hackstein, J. H. 
%K *Anaerobiosis Animals Ciliophora/*cytology/genetics/*metabolism/ultrastructure Cockroaches/parasitology DNA %K Mitochondrial/genetics Electron Transport Electron Transport Complex I/antagonists & inhibitors/metabolism Genome Glucose/metabolism Hydrogen/*metabolism Mitochondria/enzymology/genetics/*metabolism/ultrastructure Molecular Sequence Data Open Reading Fra %X Hydrogenosomes are organelles that produce ATP and hydrogen, and are found in various unrelated eukaryotes, such as anaerobic flagellates, chytridiomycete fungi and ciliates. Although all of these organelles generate hydrogen, the hydrogenosomes from these organisms are structurally and metabolically quite different, just like mitochondria where large differences also exist. These differences have led to a continuing debate about the evolutionary origin of hydrogenosomes. Here we show that the hydrogenosomes of the anaerobic ciliate Nyctotherus ovalis, which thrives in the hindgut of cockroaches, have retained a rudimentary genome encoding components of a mitochondrial electron transport chain. Phylogenetic analyses reveal that those proteins cluster with their homologues from aerobic ciliates. In addition, several nucleus-encoded components of the mitochondrial proteome, such as pyruvate dehydrogenase and complex II, were identified. The N. ovalis hydrogenosome is sensitive to inhibitors of mitochondrial complex I and produces succinate as a major metabolic end product–biochemical traits typical of anaerobic mitochondria. The production of hydrogen, together with the presence of a genome encoding respiratory chain components, and biochemical features characteristic of anaerobic mitochondria, identify the N. ovalis organelle as a missing link between mitochondria and hydrogenosomes. 
%B Nature %V 434 %P 74-9 %G eng %U http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=15744302 %0 Journal Article %J Protein Sci %D 2004 %T Alignment of protein sequences by their profiles %A M. A. Marti-Renom %A Madhusudhan, M. S. %A Sali, A. %K *Algorithms Amino Acid Sequence Computational Biology Databases %K Protein Markov Chains Molecular Sequence Data *Protein Folding Protein Structure %K Tertiary Proteins/*chemistry *Sequence Alignment Sequence Homology *Software %X The accuracy of an alignment between two protein sequences can be improved by including other detectably related sequences in the comparison. We optimize and benchmark such an approach that relies on aligning two multiple sequence alignments, each one including one of the two protein sequences. Thirteen different protocols for creating and comparing profiles corresponding to the multiple sequence alignments are implemented in the SALIGN command of MODELLER. A test set of 200 pairwise, structure-based alignments with sequence identities below 40% is used to benchmark the 13 protocols as well as a number of previously described sequence alignment methods, including heuristic pairwise sequence alignment by BLAST, pairwise sequence alignment by global dynamic programming with an affine gap penalty function by the ALIGN command of MODELLER, sequence-profile alignment by PSI-BLAST, Hidden Markov Model methods implemented in SAM and LOBSTER, pairwise sequence alignment relying on predicted local structure by SEA, and multiple sequence alignment by CLUSTALW and COMPASS. The alignment accuracies of the best new protocols were significantly better than those of the other tested methods. For example, the fraction of the correctly aligned residues relative to the structure-based alignment by the best protocol is 56%, which can be compared with the accuracies of 26%, 42%, 43%, 48%, 50%, 49%, 43%, and 43% for the other methods, respectively. 
The new method is currently applied to large-scale comparative protein structure modeling of all known sequences. %B Protein Sci %V 13 %P 1071-87 %G eng %U http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=15044736 %0 Journal Article %J Comp Funct Genomics %D 2003 %T An approach to inferring transcriptional regulation among genes from large-scale expression data %A Herrero, J. %A Diaz-Uriarte, R. %A Dopazo, J. %X The use of DNA microarrays opens up the possibility of measuring the expression levels of thousands of genes simultaneously under different conditions. Time-course experiments allow researchers to study the dynamics of gene interactions. The inference of genetic networks from such measures can give important insights for the understanding of a variety of biological problems. Most of the existing methods for genetic network reconstruction require many experimental data points, or can only be applied to the reconstruction of small subnetworks. Here we present a method that reduces the dimensionality of the dataset and then extracts the significant dynamic correlations among genes. The method requires a number of points achievable in common time-course experiments. %B Comp Funct Genomics %V 4 %P 148-54 %G eng %U http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=18629097 %0 Journal Article %J Microb Drug Resist %D 2001 %T Annotated draft genomic sequence from a Streptococcus pneumoniae type 19F clinical isolate %A Dopazo, J. %A Mendoza, A. %A Herrero, J. %A Caldara, F. %A Humbert, Y. %A Friedli, L. %A Guerrier, M. %A Grand-Schenk, E. %A Gandin, C. %A de Francesco, M. %A Polissi, A. %A Buell, G. %A Feger, G. %A Garcia, E. %A Peitsch, M. %A Garcia-Bustos, J. F. 
%K Bacterial Molecular Sequence Data Pneumococcal Infections/*microbiology Prokaryotic Cells RNA %K Bacterial/chemistry/genetics Genes %K Bacterial/genetics *Genome %K DNA %K Transfer/metabolism Streptococcus pneumoniae/*genetics %X The public availability of numerous microbial genomes is enabling the analysis of bacterial biology in great detail and with an unprecedented, organism-wide and taxon-wide, broad scope. Streptococcus pneumoniae is one of the most important bacterial pathogens throughout the world. We present here sequences and functional annotations for 2.1-Mbp of pneumococcal DNA, covering more than 90% of the total estimated size of the genome. The sequenced strain is a clinical isolate resistant to macrolides and tetracycline. It carries a type 19F capsular locus, but multilocus sequence typing for several conserved genetic loci suggests that the strain sequenced belongs to a pneumococcal lineage that most often expresses a serotype 15 capsular polysaccharide. A total of 2,046 putative open reading frames (ORFs) longer than 100 amino acids were identified (average of 1,009 bp per ORF), including all described two-component systems and aminoacyl tRNA synthetases. Comparisons to other complete, or nearly complete, bacterial genomes were made and are presented in a graphical form for all the predicted proteins. %B Microb Drug Resist %V 7 %P 99-125 %G eng %U http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=11442348