- Registration and Welcome
- Opening Remarks
- Introduction to GATK
- Introduction to sequencing data
- Introduction to pre-processing
- Coffee break
- Introduction to pipelining
- Getting Started with Pipelining
Massive sequencing is becoming mainstream in many fields of biomedicine, including clinical practice.
However, managing raw sequencing data, transforming it into valuable biological information on variants, and interpreting those variants is still a complex task that requires intensive computing and trained personnel.
This course covers all the steps from the raw sequencing data produced by the sequencers to the generation of variant lists using the popular GATK software. The course is followed by a tutorial on tertiary analysis (diagnostic and secondary findings) using the tools developed by the Platform of Computational Medicine. GATK workshops are designed to provide a comprehensive onboarding experience for new users, as well as a more advanced understanding for users who are already familiar with the toolkit.
The workshop is aimed at a mixed audience: people who are new to variant discovery or to GATK and want an introduction to the tools, and existing GATK users who want to improve their understanding of and proficiency with them.
Participants should already be familiar with the basic terms and concepts of genetics and genomics. Basic familiarity with the command line environment is required. Participants will be expected to bring their own laptops with software preinstalled (detailed instructions will be posted two weeks before the course) unless the workshop host provides a computer lab.
Attendance is typically limited to 30 or 40 participants depending on the number of trainers and the size of the room that is available.
Please note that this workshop is focused on human data analysis.
Most of the material presented applies equally to non-human data, and we will address some questions about the adaptations needed to analyze non-human data, but we will not go into much detail on those points.
The hands-on exercises are designed to teach participants concrete skills and enable them to use the tools in their own research.
In the hands-on sessions focused on analysis, we walk participants through exercises that teach them how to manipulate the standard data formats involved in variant discovery and how to apply GATK tools appropriately to common use cases and data types. In the course of these exercises, we demonstrate useful tips and tricks for interacting with GATK and Picard tools, dealing with problems, and using third-party tools such as IGV and RStudio.
In the optional hands-on sessions on pipelining, we walk participants through exercises that teach them to write workflow scripts in WDL, the Broad's new Workflow Description Language, and to execute these workflows both locally with Cromwell and through Terra, our publicly available, secure, cloud-based analysis service.