%0 Journal Article %J Hum Mutat %D 2020 %T SMN1 copy-number and sequence variant analysis from next-generation sequencing data. %A López-López, Daniel %A Loucera, Carlos %A Carmona, Rosario %A Aquino, Virginia %A Salgado, Josefa %A Pasalodos, Sara %A Miranda, María %A Alonso, Ángel %A Dopazo, Joaquin %K Base Sequence %K DNA Copy Number Variations %K High-Throughput Nucleotide Sequencing %K Humans %K Reproducibility of Results %K Software %K Survival of Motor Neuron 1 Protein %X

Spinal muscular atrophy (SMA) is a severe neuromuscular autosomal recessive disorder affecting 1/10,000 live births. Most SMA patients present homozygous deletion of SMN1, while the vast majority of SMA carriers present only a single SMN1 copy. The sequence similarity between SMN1 and SMN2 and the complexity of the SMN locus make the estimation of the SMN1 copy-number by next-generation sequencing (NGS) very difficult. Here, we present SMAca, the first Python tool to detect SMA carriers and estimate the absolute SMN1 copy-number using NGS data. Moreover, SMAca takes advantage of the knowledge of certain variants specific to SMN1 duplication to also identify silent carriers. This tool has been validated with a cohort of 326 samples from the Navarra 1000 Genomes Project (NAGEN1000). SMAca was developed with a focus on execution speed and easy installation. This combination makes it especially suitable to be integrated into production NGS pipelines. Source code and documentation are available at https://www.github.com/babelomics/SMAca.

%B Hum Mutat %V 41 %P 2073-2077 %8 2020 12 %G eng %N 12 %1 https://www.ncbi.nlm.nih.gov/pubmed/33058415?dopt=Abstract %R 10.1002/humu.24120

%0 Journal Article %J Nature %D 2020 %T Transparency and reproducibility in artificial intelligence. %A Haibe-Kains, Benjamin %A Adam, George Alexandru %A Hosny, Ahmed %A Khodakarami, Farnoosh %A Waldron, Levi %A Wang, Bo %A McIntosh, Chris %A Goldenberg, Anna %A Kundaje, Anshul %A Greene, Casey S %A Broderick, Tamara %A Hoffman, Michael M %A Leek, Jeffrey T %A Korthauer, Keegan %A Huber, Wolfgang %A Brazma, Alvis %A Pineau, Joelle %A Tibshirani, Robert %A Hastie, Trevor %A Ioannidis, John P A %A Quackenbush, John %A Aerts, Hugo J W L %K Algorithms %K Artificial Intelligence %K Reproducibility of Results %B Nature %V 586 %P E14-E16 %8 2020 10 %G eng %N 7829 %1 https://www.ncbi.nlm.nih.gov/pubmed/33057217?dopt=Abstract %R 10.1038/s41586-020-2766-y

%0 Journal Article %J Bioinformatics %D 2017 %T Reference genome assessment from a population scale perspective: an accurate profile of variability and noise. %A Carbonell-Caballero, José %A Amadoz, Alicia %A Alonso, Roberto %A Hidalgo, Marta R %A Cubuk, Cankut %A Conesa, David %A López-Quílez, Antonio %A Dopazo, Joaquin %K Animals %K Genetic Variation %K Genome %K Genomics %K Genotype %K Humans %K Models, Statistical %K Quality Control %K Reproducibility of Results %K Software %X

Motivation: Current plant and animal genomic studies are often based on newly assembled genomes that have not been properly consolidated. In this scenario, misassembled regions can easily lead to false-positive findings. Although quality control scores are included within genotyping protocols, they are usually employed to evaluate individual sample quality rather than reference sequence reliability. We propose a statistical model that combines quality control scores across samples in order to detect incongruent patterns at every genomic region. Our model is inherently robust since common artifact signals are expected to be shared between independent samples over misassembled regions of the genome.

Results: The reliability of our protocol has been extensively tested through different experiments and organisms with accurate results, improving on state-of-the-art methods. Our analysis demonstrates synergistic relations between quality control scores and allelic variability estimators that improve the detection of misassembled regions, and it is able to find strong artifact signals even within the human reference assembly. Furthermore, we demonstrate how our model can be trained to properly rank the confidence of a set of candidate variants obtained from new independent samples.

Availability and implementation: This tool is freely available at http://gitlab.com/carbonell/ces.

Contact: jcarbonell.cipf@gmail.com or joaquin.dopazo@juntadeandalucia.es.

Supplementary information: Supplementary data are available at Bioinformatics online.

%B Bioinformatics %V 33 %P 3511-3517 %8 2017 Nov 15 %G eng %U https://academic.oup.com/bioinformatics/article-lookup/doi/10.1093/bioinformatics/btx482 %N 22 %1 https://www.ncbi.nlm.nih.gov/pubmed/28961772?dopt=Abstract %R 10.1093/bioinformatics/btx482

%0 Journal Article %J Bioinformatics %D 2016 %T Integrated gene set analysis for microRNA studies. %A Garcia-Garcia, Francisco %A Panadero, Joaquin %A Dopazo, Joaquin %A Montaner, David %K Computational Biology %K Gene Expression Profiling %K Gene ontology %K Gene Regulatory Networks %K High-Throughput Nucleotide Sequencing %K Humans %K MicroRNAs %K Neoplasms %K Reproducibility of Results %X

MOTIVATION: Functional interpretation of miRNA expression data is currently done in a three-step procedure: select differentially expressed miRNAs, find their target genes, and carry out gene set overrepresentation analysis. Nevertheless, major limitations of this approach have already been described at the gene level, while some new ones arise in the miRNA scenario. Here, we propose an enhanced methodology that builds on the well-established gene set analysis paradigm. Evidence for differential expression at the miRNA level is transferred to a gene differential inhibition score which is easily interpretable in terms of gene sets or pathways. Such transferred indexes account for the additive effect of several miRNAs targeting the same gene, and also incorporate cancellation effects between cases and controls. Together, these two desirable characteristics allow for more accurate modeling of regulatory processes.

RESULTS: We analyze high-throughput sequencing data from 20 different cancer types and provide exhaustive reports of gene and Gene Ontology-term deregulation by miRNA action.

AVAILABILITY AND IMPLEMENTATION: The proposed methodology was implemented in the Bioconductor library mdgsa (http://bioconductor.org/packages/mdgsa). For the purpose of reproducibility, all of the scripts are available at https://github.com/dmontaner-papers/gsa4mirna.

CONTACT: david.montaner@gmail.com

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

%B Bioinformatics %V 32 %P 2809-16 %8 2016 09 15 %G eng %N 18 %1 https://www.ncbi.nlm.nih.gov/pubmed/27324197?dopt=Abstract %R 10.1093/bioinformatics/btw334

%0 Journal Article %J PLoS One %D 2016 %T The Mutational Landscape of Acute Promyelocytic Leukemia Reveals an Interacting Network of Co-Occurrences and Recurrent Mutations. %A Ibáñez, Mariam %A Carbonell-Caballero, José %A García-Alonso, Luz %A Such, Esperanza %A Jiménez-Almazán, Jorge %A Vidal, Enrique %A Barragán, Eva %A López-Pavía, María %A LLop, Marta %A Martín, Iván %A Gómez-Seguí, Inés %A Montesinos, Pau %A Sanz, Miguel A %A Dopazo, Joaquin %A Cervera, José %K Exome %K Gene Regulatory Networks %K Genome, Human %K Humans %K INDEL Mutation %K Leukemia, Promyelocytic, Acute %K mutation %K Mutation Rate %K Polymorphism, Single Nucleotide %K Reproducibility of Results %X

Preliminary Acute Promyelocytic Leukemia (APL) whole exome sequencing (WES) studies have identified a huge number of somatic mutations affecting more than a hundred different genes mainly in a non-recurrent manner, suggesting that APL is a heterogeneous disease with secondary relevant changes not yet defined. To extend our knowledge of subtle genetic alterations involved in APL that might cooperate with PML/RARA in the leukemogenic process, we performed a comprehensive analysis of somatic mutations in APL combining WES with sequencing of a custom panel of targeted genes by next-generation sequencing. To select a reduced subset of high confidence candidate driver genes, further in silico analyses were carried out. After prioritization and network analysis, we found recurrent deleterious mutations in 8 individual genes (STAG2, U2AF1, SMC1A, USP9X, IKZF1, LYN, MYCBP2 and PTPN11) with a strong potential of being involved in APL pathogenesis. Our network analysis of multiple mutations provides a reliable approach to prioritize genes for additional analysis, improving our knowledge of the leukemogenesis interactome. Additionally, we have defined a functional module in the interactome of APL. The hypothesis is that the number, or the specific combinations, of mutations harbored in each patient might not be as important as the disturbance caused in key biological functions, triggered by several not necessarily recurrent mutations.

%B PLoS One %V 11 %P e0148346 %8 2016 %G eng %N 2 %1 https://www.ncbi.nlm.nih.gov/pubmed/26886259?dopt=Abstract %R 10.1371/journal.pone.0148346

%0 Journal Article %J Plant Physiol %D 2011 %T Early transcriptional defense responses in Arabidopsis cell suspension culture under high-light conditions. %A González-Pérez, Sergio %A Gutiérrez, Jorge %A Garcia-Garcia, Francisco %A Osuna, Daniel %A Dopazo, Joaquin %A Lorenzo, Oscar %A Revuelta, José L %A Arellano, Juan B %K Arabidopsis %K Blotting, Western %K Cell Culture Techniques %K Cells, Cultured %K Chloroplasts %K Cluster Analysis %K Gene Expression Profiling %K Gene Expression Regulation, Plant %K Hydrogen Peroxide %K Light %K mutation %K Oligonucleotide Array Sequence Analysis %K Photosystem II Protein Complex %K Plant Growth Regulators %K Reproducibility of Results %K Reverse Transcriptase Polymerase Chain Reaction %K RNA, Messenger %K Signal Transduction %K Stress, Physiological %K Transcription, Genetic %X

The early transcriptional defense responses and reactive oxygen species (ROS) production in Arabidopsis (Arabidopsis thaliana) cell suspension culture (ACSC), containing functional chloroplasts, were examined at high light (HL). The transcriptional analysis revealed that most of the ROS markers identified among the 449 transcripts with significant differential expression were transcripts specifically up-regulated by singlet oxygen (¹O₂). On the contrary, minimal correlation was established with transcripts specifically up-regulated by superoxide radical or hydrogen peroxide. The transcriptional analysis was supported by fluorescence microscopy experiments. The incubation of ACSC with the ¹O₂ sensor green reagent and 2',7'-dichlorofluorescein diacetate showed that the 30-min-HL-treated cultures emitted fluorescence that corresponded with the production of ¹O₂ but not of hydrogen peroxide. Furthermore, the in vivo photodamage of the D1 protein of photosystem II indicated that the photogeneration of ¹O₂ took place within the photosystem II reaction center. Functional enrichment analyses identified transcripts that are key components of the ROS signaling transduction pathway in plants as well as others encoding transcription factors that regulate both ROS scavenging and water deficit stress. A meta-analysis examining the transcriptional profiles of mutants and hormone treatments in Arabidopsis showed a high correlation between ACSC at HL and the fluorescent mutant family of Arabidopsis, a producer of ¹O₂ in plastids. Intriguingly, a high correlation was also observed with ABA deficient1 and more axillary growth4, two mutants with defects in the biosynthesis pathways of two key (apo)carotenoid-derived plant hormones (i.e. abscisic acid and strigolactones, respectively). ACSC has proven to be a valuable system for studying early transcriptional responses to HL stress.

%B Plant Physiol %V 156 %P 1439-56 %8 2011 Jul %G eng %N 3 %1 https://www.ncbi.nlm.nih.gov/pubmed/21531897?dopt=Abstract %R 10.1104/pp.111.177766

%0 Journal Article %J PLoS One %D 2011 %T Mutation screening of multiple genes in Spanish patients with autosomal recessive retinitis pigmentosa by targeted resequencing. %A González-del Pozo, María %A Borrego, Salud %A Barragán, Isabel %A Pieras, Juan I %A Santoyo, Javier %A Matamala, Nerea %A Naranjo, Belén %A Dopazo, Joaquin %A Antiñolo, Guillermo %K Alleles %K DNA Mutational Analysis %K Exons %K Genetic Variation %K Genome %K Hispanic or Latino %K Humans %K Introns %K Language %K mutation %K Mutation, Missense %K Oligonucleotide Array Sequence Analysis %K Polymerase Chain Reaction %K Reproducibility of Results %K Retinitis pigmentosa %K United States %X

Retinitis Pigmentosa (RP) is a heterogeneous group of inherited retinal dystrophies characterised ultimately by the loss of photoreceptor cells. RP is the leading cause of visual loss in individuals younger than 60 years, with a prevalence of about 1 in 4000. The molecular genetic diagnosis of autosomal recessive RP (arRP) is challenging due to the large genetic and clinical heterogeneity. Traditional methods for sequencing arRP genes are often laborious and not easily available, and a screening technique that enables the rapid detection of the genetic cause would be very helpful in clinical practice. The goal of this study was to develop and apply microarray-based resequencing technology capable of detecting both known and novel mutations on a single high-throughput platform. Hence, the coding regions and exon/intron boundaries of 16 arRP genes were resequenced using microarrays in 102 Spanish patients with clinical diagnosis of arRP. All the detected variations were confirmed by direct sequencing, and potential pathogenicity was assessed by functional predictions and frequency in controls. For validation purposes, 4 positive controls for variants consisting of previously identified changes were hybridized on the array. As a result of the screening, we detected 44 variants, of which 15, detected in 14 arRP families (14%), are very likely pathogenic. Finally, the design of this array can easily be transformed into an equivalent diagnostic system based on targeted enrichment followed by next-generation sequencing.

%B PLoS One %V 6 %P e27894 %8 2011 %G eng %N 12 %1 https://www.ncbi.nlm.nih.gov/pubmed/22164218?dopt=Abstract %R 10.1371/journal.pone.0027894

%0 Journal Article %J BMC Genomics %D 2009 %T Gene set internal coherence in the context of functional profiling. %A Montaner, David %A Minguez, Pablo %A Al-Shahrour, Fátima %A Dopazo, Joaquin %K Algorithms %K Breast Neoplasms %K Carcinoma, Intraductal, Noninfiltrating %K Computational Biology %K Databases, Nucleic Acid %K Female %K Gene Expression Profiling %K Genomics %K Humans %K Oligonucleotide Array Sequence Analysis %K Papillomavirus Infections %K Reproducibility of Results %X

BACKGROUND: Functional profiling methods have been extensively used in the context of high-throughput experiments and, in particular, in microarray data analysis. Such methods use available biological information to define different types of functional gene modules (e.g. gene ontology -GO-, KEGG pathways, etc.) whose representation in a pre-defined list of genes is further studied. In the most popular type of microarray experimental designs (e.g. up- or down-regulated genes, clusters of co-expressing genes, etc.) or in other genomic experiments (e.g. ChIP-on-chip, epigenomics, etc.) these lists are composed of genes with a high degree of co-expression. Therefore, an implicit assumption in the application of functional profiling methods within this context is that the genes corresponding to the modules tested are effectively defining sets of co-expressing genes. Nevertheless, not all the functional modules are biologically coherent entities in terms of co-expression, which will eventually hinder their detection with conventional methods of functional enrichment.

RESULTS: Using a large collection of microarray data, we have carried out a detailed survey of internal correlation in GO terms and KEGG pathways, providing a coherence index to be used for measuring functional module co-regulation. An unexpectedly low level of internal correlation was found among the modules studied. Only around 30% of the modules defined by GO terms and 57% of the modules defined by KEGG pathways display an internal correlation higher than expected by chance. This information on the internal correlation of the genes within the functional modules can be used in the context of a logistic regression model in a simple way to improve their detection in gene expression experiments.

CONCLUSION: For the first time, an exhaustive study on the internal co-expression of the most popular functional categories has been carried out. Interestingly, the real level of co-expression within many of them is lower than expected (or even nonexistent), which will preclude their detection by means of most conventional functional profiling methods. If the gene-to-function correlation information is used in functional profiling methods, the results improve on those obtained by conventional enrichment methods.

%B BMC Genomics %V 10 %P 197 %8 2009 Apr 27 %G eng %1 https://www.ncbi.nlm.nih.gov/pubmed/19397819?dopt=Abstract %R 10.1186/1471-2164-10-197