WGS and WES validation – challenges
Martin Kašný (Institute of Applied Biotechnologies a.s.)
Kvapilová Kateřina (IAB a.s., PřF UK), Brož Petr (IAB a.s., 2. LF UK), Daniel Lukáš (IAB a.s.), Novotný Adam (IAB a.s.), Kvapil Petr (IAB a.s.)
Validating WGS and WES protocols so that they comply with good laboratory practice under the requirements of ČSN EN ISO 15189 ed.2:2013 is a challenging task. Within the ENIGMA project, we developed experimental protocols for WGS and WES analysis and validated them during the project's pilot phase.
ENIGMA (Etalon of National Interpreted Genome Map) is a collaborative project of the Institute of Applied Biotechnologies Inc. and the Institute of Molecular and Translational Medicine of Palacky University. ENIGMA is the first population genomics project in the Czech Republic. It aims to create a digital genome standard of the Czech population and thereby substantially increase the potential for implementing novel molecular diagnostic procedures and protocols.
The pilot phase comprised a comprehensive whole-genome and whole-exome analysis of 100 samples. Validation of sample collection, handling, anonymization, storage and DNA extraction against the normative requirements was performed by the Institute of Molecular and Translational Medicine, following its good laboratory practice protocols already in use at University Hospital Olomouc. Informed consent forms were prepared by the ENIGMA team and approved by the ethics committees of both Palacky University and University Hospital Olomouc. The WGS and WES protocols, including the wet-lab workflow, QC steps and evaluation of QC parameters, were optimized end to end: from sample identification and handling, through preparation of DNA libraries (WGS: TruSeq DNA PCR-Free High Throughput Library kit, Illumina; WES: Twist Human Core Exome EF Multiplex Complete Kit, Twist Bioscience) and standardization of sequencing (iSeq, NovaSeq 6000), to validation of data quality and integrity, application of bioinformatic analysis pipelines (DRAGEN, Finalist Platform, Illumina) and data storage (tape drive), each with its own critical QC evaluation steps. The resulting variant statistics of WGS and WES were then compared. As a result of the pilot study, the optimized procedures were converted into standardized protocols meeting the requirements of ČSN EN ISO 15189 ed.2:2013.
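To make the comparison of WGS and WES variant statistics more concrete, the sketch below counts SNVs and indels and computes the transition/transversion (Ti/Tv) ratio from plain-text VCF lines. The abstract does not state which statistics were compared; the choice of Ti/Tv, the function name `variant_stats` and the `pass_only` filter are hypothetical, for illustration only, and a production pipeline would normally report such metrics through its own tooling. Ti/Tv is a common call-set sanity check because it depends on the region type: values of roughly 2.0 to 2.1 are commonly cited for whole-genome call sets and around 3 for exome call sets.

```python
from collections import Counter

# Transitions stay within purines (A<->G) or pyrimidines (C<->T);
# every other single-base substitution is a transversion.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def variant_stats(vcf_lines, pass_only=True):
    """Hypothetical helper: SNV/indel counts and Ti/Tv from uncompressed VCF text lines."""
    counts = Counter()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, ## meta lines and the #CHROM header
        fields = line.rstrip("\n").split("\t")
        ref, alts, filt = fields[3].upper(), fields[4].upper(), fields[6]
        if pass_only and filt not in ("PASS", "."):
            continue  # "." means no filters were applied to this record
        for alt in alts.split(","):  # multi-allelic records list ALTs comma-separated
            if alt in (".", "*") or alt.startswith("<") or "[" in alt or "]" in alt:
                continue  # missing ALT, spanning deletion, symbolic allele or breakend
            if len(ref) == 1 and len(alt) == 1:
                counts["snv"] += 1
                counts["ti" if (ref, alt) in TRANSITIONS else "tv"] += 1
            elif len(ref) != len(alt):
                counts["indel"] += 1
            else:
                counts["other"] += 1  # equal-length multi-base substitutions (MNVs)
    ti_tv = counts["ti"] / counts["tv"] if counts["tv"] else float("nan")
    return {"snv": counts["snv"], "indel": counts["indel"],
            "other": counts["other"], "ti_tv": ti_tv}
```

Running `variant_stats` separately on the WGS and the WES call set of the same proband gives directly comparable numbers; restricting the WGS calls to the exome target regions first would make the comparison like-for-like.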
Comparison of bioinformatic approaches for the detection and analysis of CNVs in WGS and WES datasets
Petr Brož (IAB a.s., 2. LF UK)
Kvapilová Kateřina (IAB a.s., PřF UK), Daniel Lukáš (IAB a.s.), Kašný Martin (IAB a.s.), Novotný Adam (IAB a.s.), Kvapil Petr (IAB a.s.)
Copy number variants (CNVs), a class of genomic structural variants, not only play an important role in human evolution but also contribute to phenotypic diversity among individuals. At the same time, they are responsible for a wide range of diseases, including autism, schizophrenia and obesity. Clinically established laboratory techniques for CNV detection include FISH, array CGH, SNP arrays and MLPA, which can detect CNVs ranging from one kilobase (kb) to several megabases (Mb). Since NGS was first introduced, a number of algorithms have been developed that detect CNVs at resolutions from 50 base pairs (bp) to several megabases. Currently, there are four basic bioinformatic approaches for detecting CNVs from paired-end NGS data: (1) read depth (RD), (2) paired-end mapping (PEM), (3) split reads (SR) and (4) assembly (AS). In our study, a pilot part of the ENIGMA project (a national digital map of the Czech genome), we selected a cohort of 100 probands whose DNA was sequenced on the NovaSeq 6000 platform in two ways: (1) whole-genome sequencing (WGS) and (2) whole-exome sequencing (WES). The aim of the study was to compare algorithmic bioinformatic methods for CNV detection and to compare the frequency and occurrence of individual structural variants.
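Of the four approaches listed above, read depth (RD) is the simplest to illustrate: reads are counted in genomic bins, normalized, and compared with a reference, and runs of bins with elevated or reduced depth are reported as candidate gains or losses. The following Python sketch shows only that idea. The function name, the per-bin reference (assumed here to be, for example, mean depth across other samples in the cohort), the log2 thresholds and the minimum run length are hypothetical choices, not the tools or parameters used in the study. For a diploid genome, a heterozygous deletion gives an expected log2 ratio of log2(1/2) = -1 and a single-copy gain log2(3/2) ≈ 0.58, so the default thresholds sit somewhat inside those values to tolerate noise.

```python
import math


def call_cnv_segments(sample_counts, reference_counts,
                      gain=0.4, loss=-0.6, min_bins=3):
    """Hypothetical read-depth (RD) CNV sketch: flag runs of bins whose
    normalised depth deviates from a reference.

    sample_counts, reference_counts: read counts per genomic bin, same bin order
    (e.g. fixed windows along one chromosome). Thresholds are log2 ratios.
    Returns a list of (first_bin, last_bin, "gain" | "loss") tuples.
    """
    total_s, total_r = sum(sample_counts), sum(reference_counts)
    if total_s == 0 or total_r == 0:
        return []  # nothing to normalise against
    states = []
    for s, r in zip(sample_counts, reference_counts):
        if r == 0:
            states.append(None)  # no reference depth: bin cannot be assessed
            continue
        # Normalise each count by library size; the 0.5 pseudocount avoids log2(0).
        log2_ratio = math.log2(((s + 0.5) / total_s) / ((r + 0.5) / total_r))
        if log2_ratio >= gain:
            states.append("gain")
        elif log2_ratio <= loss:
            states.append("loss")
        else:
            states.append(None)
    # Merge consecutive bins sharing a state; keep runs of at least min_bins bins.
    segments, start = [], 0
    for i in range(1, len(states) + 1):
        if i == len(states) or states[i] != states[start]:
            if states[start] is not None and i - start >= min_bins:
                segments.append((start, i - 1, states[start]))
            start = i
    return segments
```

Production RD callers add GC-content and mappability correction and use statistical segmentation (for example circular binary segmentation or hidden Markov models) instead of the simple run-merging shown here. For WES, bins usually follow the capture targets rather than fixed windows, which is one reason RD-based calls differ between WGS and WES data. PEM, SR and AS methods instead rely on discordant read pairs, split alignments and local assembly, which this sketch does not model.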