TY - JOUR
T1 - Using AnABlast for intergenic sORF prediction in the Caenorhabditis elegans genome.
JF - Bioinformatics
Y1 - 2020
A1 - Casimiro-Soriguer, C S
A1 - Rigual, M M
A1 - Brokate-Llanos, A M
A1 - Muñoz, M J
A1 - Garzón, A
A1 - Pérez-Pulido, A J
A1 - Jimenez, J
KW - Animals
KW - Caenorhabditis elegans
KW - Computational Biology
KW - Genome
KW - Open Reading Frames
KW - Software
AB -

MOTIVATION: Short bioactive peptides encoded by small open reading frames (sORFs) play important roles in eukaryotes. Bioinformatic prediction of ORFs is an early step in genome sequence analysis, but sORFs encoding short peptides, often using non-AUG initiation codons, are not easily discriminated from false ORFs occurring by chance.

RESULTS: AnABlast is a computational tool designed to highlight putative protein-coding regions in genomic DNA sequences. This protein-coding finder is independent of ORF length and reading frame shifts, making AnABlast a potentially useful tool for predicting sORFs. Using this algorithm, we report the identification of 82 putative new intergenic sORFs in the Caenorhabditis elegans genome. Sequence similarity, motif presence, expression data and RNA interference experiments support the view that the highlighted sORFs likely encode functional peptides, encouraging the use of AnABlast as a new approach for the accurate prediction of intergenic sORFs in annotated eukaryotic genomes.

AVAILABILITY AND IMPLEMENTATION: AnABlast is freely available at http://www.bioinfocabd.upo.es/ab/. The C. elegans genome browser with AnABlast results, annotated genes and all data used in this study is available at http://www.bioinfocabd.upo.es/celegans.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

VL - 36
IS - 19
U1 - https://www.ncbi.nlm.nih.gov/pubmed/32614398?dopt=Abstract
ER -

TY - JOUR
T1 - Reference genome assessment from a population scale perspective: an accurate profile of variability and noise.
JF - Bioinformatics
Y1 - 2017
A1 - Carbonell-Caballero, José
A1 - Amadoz, Alicia
A1 - Alonso, Roberto
A1 - Hidalgo, Marta R
A1 - Cubuk, Cankut
A1 - Conesa, David
A1 - López-Quílez, Antonio
A1 - Dopazo, Joaquin
KW - Animals
KW - Genetic Variation
KW - Genome
KW - Genomics
KW - Genotype
KW - Humans
KW - Models, Statistical
KW - Quality Control
KW - Reproducibility of Results
KW - Software
AB -

Motivation: Current plant and animal genomic studies are often based on newly assembled genomes that have not been properly consolidated. In this scenario, misassembled regions can easily lead to false-positive findings. Although quality control scores are included within genotyping protocols, they are usually employed to evaluate individual sample quality rather than reference sequence reliability. We propose a statistical model that combines quality control scores across samples to detect incongruent patterns at every genomic region. Our model is inherently robust, since common artifact signals are expected to be shared between independent samples over misassembled regions of the genome.

Results: The reliability of our protocol has been extensively tested across different experiments and organisms, with accurate results that improve on state-of-the-art methods. Our analysis demonstrates synergistic relations between quality control scores and allelic variability estimators that improve the detection of misassembled regions, and it finds strong artifact signals even within the human reference assembly. Furthermore, we demonstrate how our model can be trained to properly rank the confidence of a set of candidate variants obtained from new independent samples.

Availability and implementation: This tool is freely available at http://gitlab.com/carbonell/ces.

Contact: jcarbonell.cipf@gmail.com or joaquin.dopazo@juntadeandalucia.es.

Supplementary information: Supplementary data are available at Bioinformatics online.

VL - 33
UR - https://academic.oup.com/bioinformatics/article-lookup/doi/10.1093/bioinformatics/btx482
IS - 22
U1 - https://www.ncbi.nlm.nih.gov/pubmed/28961772?dopt=Abstract
ER -

TY - JOUR
T1 - Evidence for short-time divergence and long-time conservation of tissue-specific expression after gene duplication.
JF - Brief Bioinform
Y1 - 2011
A1 - Huerta-Cepas, Jaime
A1 - Dopazo, Joaquin
A1 - Huynen, Martijn A
A1 - Gabaldón, Toni
KW - Animals
KW - Conserved Sequence
KW - Evolution, Molecular
KW - Gene Duplication
KW - gene expression
KW - Genome
KW - Humans
KW - Mice
KW - Organ Specificity
AB -

Gene duplication is one of the main mechanisms by which genomes can acquire novel functions. It has been proposed that the retention of gene duplicates can be associated with processes of tissue expression divergence. These models predict that divergent expression patterns should be acquired shortly after the duplication, and that larger divergence in tissue expression would be expected for paralogs than for orthologs of a similar age. Many studies have shown that gene duplicates tend to have divergent expression patterns and that gene family expansions are associated with high levels of tissue specificity. However, the timeframe in which these processes occur has rarely been investigated in detail, particularly in vertebrates, and most analyses do not include direct comparisons of orthologs as a baseline for the expected levels of tissue specificity in the absence of duplications. To assess the specific contribution of duplications to expression divergence, we combine phylogenetic analyses and expression data from human and mouse. In particular, we study differences in spatial expression among human-mouse paralogs, specifically those duplicated after the radiation of mammals, and compare them to pairs of orthologs in the same species. Our results show that gene duplication leads to increased levels of tissue specificity and that this tends to occur promptly after the duplication event.

VL - 12
IS - 5
U1 - https://www.ncbi.nlm.nih.gov/pubmed/21515902?dopt=Abstract
ER -

TY - JOUR
T1 - Mutation screening of multiple genes in Spanish patients with autosomal recessive retinitis pigmentosa by targeted resequencing.
JF - PLoS One
Y1 - 2011
A1 - González-del Pozo, María
A1 - Borrego, Salud
A1 - Barragán, Isabel
A1 - Pieras, Juan I
A1 - Santoyo, Javier
A1 - Matamala, Nerea
A1 - Naranjo, Belén
A1 - Dopazo, Joaquin
A1 - Antiñolo, Guillermo
KW - Alleles
KW - DNA Mutational Analysis
KW - Exons
KW - Genetic Variation
KW - Genome
KW - Hispanic or Latino
KW - Humans
KW - Introns
KW - Language
KW - mutation
KW - Mutation, Missense
KW - Oligonucleotide Array Sequence Analysis
KW - Polymerase Chain Reaction
KW - Reproducibility of Results
KW - Retinitis pigmentosa
KW - United States
AB -

Retinitis Pigmentosa (RP) is a heterogeneous group of inherited retinal dystrophies characterised ultimately by the loss of photoreceptor cells. RP is the leading cause of visual loss in individuals younger than 60 years, with a prevalence of about 1 in 4000. The molecular genetic diagnosis of autosomal recessive RP (arRP) is challenging due to the large genetic and clinical heterogeneity. Traditional methods for sequencing arRP genes are often laborious and not easily available, and a screening technique that enables the rapid detection of the genetic cause would be very helpful in clinical practice. The goal of this study was to develop and apply microarray-based resequencing technology capable of detecting both known and novel mutations on a single high-throughput platform. Hence, the coding regions and exon/intron boundaries of 16 arRP genes were resequenced using microarrays in 102 Spanish patients with a clinical diagnosis of arRP. All the detected variations were confirmed by direct sequencing, and potential pathogenicity was assessed by functional predictions and frequency in controls. For validation purposes, 4 positive controls for variants consisting of previously identified changes were hybridized on the array. As a result of the screening, we detected 44 variants, of which 15 are very likely pathogenic and were detected in 14 arRP families (14%). Finally, the design of this array can easily be transformed into an equivalent diagnostic system based on targeted enrichment followed by next generation sequencing.

VL - 6
IS - 12
U1 - https://www.ncbi.nlm.nih.gov/pubmed/22164218?dopt=Abstract
ER -

TY - JOUR
T1 - SNP and haplotype mapping for genetic analysis in the rat.
JF - Nat Genet
Y1 - 2008
A1 - Saar, Kathrin
A1 - Beck, Alfred
A1 - Bihoreau, Marie-Thérèse
A1 - Birney, Ewan
A1 - Brocklebank, Denise
A1 - Chen, Yuan
A1 - Cuppen, Edwin
A1 - Demonchy, Stephanie
A1 - Dopazo, Joaquin
A1 - Flicek, Paul
A1 - Foglio, Mario
A1 - Fujiyama, Asao
A1 - Gut, Ivo G
A1 - Gauguier, Dominique
A1 - Guigó, Roderic
A1 - Guryev, Victor
A1 - Heinig, Matthias
A1 - Hummel, Oliver
A1 - Jahn, Niels
A1 - Klages, Sven
A1 - Kren, Vladimir
A1 - Kube, Michael
A1 - Kuhl, Heiner
A1 - Kuramoto, Takashi
A1 - Kuroki, Yoko
A1 - Lechner, Doris
A1 - Lee, Young-Ae
A1 - Lopez-Bigas, Nuria
A1 - Lathrop, G Mark
A1 - Mashimo, Tomoji
A1 - Medina, Ignacio
A1 - Mott, Richard
A1 - Patone, Giannino
A1 - Perrier-Cornet, Jeanne-Antide
A1 - Platzer, Matthias
A1 - Pravenec, Michal
A1 - Reinhardt, Richard
A1 - Sakaki, Yoshiyuki
A1 - Schilhabel, Markus
A1 - Schulz, Herbert
A1 - Serikawa, Tadao
A1 - Shikhagaie, Medya
A1 - Tatsumoto, Shouji
A1 - Taudien, Stefan
A1 - Toyoda, Atsushi
A1 - Voigt, Birger
A1 - Zelenika, Diana
A1 - Zimdahl, Heike
A1 - Hubner, Norbert
KW - Animals
KW - Chromosome Mapping
KW - Databases, Genetic
KW - Genome
KW - Haplotypes
KW - Linkage Disequilibrium
KW - Phylogeny
KW - Polymorphism, Single Nucleotide
KW - Quantitative Trait Loci
KW - Rats
KW - Rats, Inbred Strains
KW - Recombination, Genetic
AB -

The laboratory rat is one of the most extensively studied model organisms. Inbred laboratory rat strains originated from limited Rattus norvegicus founder populations, and the inherited genetic variation provides an excellent resource for the correlation of genotype to phenotype. Here, we report a survey of genetic variation based on almost 3 million newly identified SNPs. We obtained accurate and complete genotypes for a subset of 20,238 SNPs across 167 distinct inbred rat strains, two rat recombinant inbred panels and an F2 intercross. Using 81% of these SNPs, we constructed high-density genetic maps, creating a large dataset of fully characterized SNPs for disease gene mapping. Our data characterize the population structure and illustrate the degree of linkage disequilibrium. We provide a detailed SNP map and demonstrate its utility for mapping of quantitative trait loci. This community resource is openly available and augments the genetic tools for this workhorse of physiological studies.

VL - 40
IS - 5
U1 - https://www.ncbi.nlm.nih.gov/pubmed/18443594?dopt=Abstract
ER -