Raw RAD data processing

process_radtags (from http://catchenlab.life.illinois.edu/stacks/manual/#clean)

The first step in analyzing RAD-seq data, or any other next-generation sequencing data, is to remove low-quality sequences and to separate the reads from different samples using the barcode assigned to each individual (also called ‘de-multiplexing’).

For both sequence quality filtering and de-multiplexing, we are going to use the process_radtags program from Stacks. You can find more information and examples on the process_radtags manual page on the Stacks website.
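
To make the idea of de-multiplexing concrete, here is a minimal Python sketch of what assigning reads to samples by barcode looks like. It is not how process_radtags is implemented: it only does exact matching of an inline barcode at the start of each read, and the function name, barcodes, and sample names are all hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, barcodes):
    """Assign each read to a sample by the barcode at the start of the read.

    reads    : iterable of (sequence, quality_string) tuples
    barcodes : dict mapping barcode sequence -> sample name (made-up names)
    Returns (dict of sample -> list of barcode-trimmed reads, unassigned reads).
    """
    by_sample = defaultdict(list)
    unassigned = []
    for seq, qual in reads:
        for barcode, sample in barcodes.items():
            if seq.startswith(barcode):
                n = len(barcode)
                # Trim the barcode from both the sequence and its quality string.
                by_sample[sample].append((seq[n:], qual[n:]))
                break
        else:
            # No barcode matched exactly; a real tool may try to rescue these.
            unassigned.append((seq, qual))
    return dict(by_sample), unassigned


# Hypothetical barcodes and reads, for illustration only.
barcodes = {"AAGGT": "sample_01", "CCTTA": "sample_02"}
reads = [
    ("AAGGTTGCAGGACCT", "IIIIIIIIIIIIIII"),
    ("CCTTATGCAGGTTTA", "IIIIIIIIIIIIIII"),
    ("GGGGGTGCAGGACCT", "IIIIIIIIIIIIIII"),
]
by_sample, unassigned = demultiplex(reads, barcodes)
for sample, sample_reads in sorted(by_sample.items()):
    print(sample, len(sample_reads))
print("unassigned", len(unassigned))
```

The third read starts with a sequence that is not in the barcode table, so it ends up in the unassigned list instead of being attributed to a sample.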

Sequence quality filtering:

Not all the reads in your data are of high quality. Generally, you need to remove the lowest-quality reads from your data before any other analysis. De novo assembly requires higher stringency than analyses that align reads to a reference genome. Keep in mind that low-quality sequences left in the data will affect downstream analyses, for example by causing false-positive SNP calls.
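
As a rough illustration of quality filtering, the Python sketch below decodes a FASTQ quality string into Phred scores and rejects a read if the mean score in any sliding window falls below a threshold. It assumes Phred+33 encoding, and the window size (15% of the read length) and score limit (10) are illustrative values chosen for this example; check the process_radtags manual for the settings the program actually uses. All function names are hypothetical.

```python
def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


def error_probability(q):
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def passes_sliding_window(quality, window_frac=0.15, min_mean=10, offset=33):
    """Return False if any window's mean Phred score drops below min_mean.

    window_frac and min_mean are illustrative values, not verified defaults.
    """
    scores = phred_scores(quality, offset)
    window = max(1, int(len(scores) * window_frac))
    for start in range(len(scores) - window + 1):
        chunk = scores[start:start + window]
        if sum(chunk) / window < min_mean:
            return False
    return True


# Made-up quality strings: 'I' is Phred 40, '#' is Phred 2 under Phred+33.
good = "I" * 20
bad = "I" * 14 + "#" * 6

print(passes_sliding_window(good))
print(passes_sliding_window(bad))
```

A Phred score of 10 corresponds to a 10% chance that the base call is wrong, and a score of 20 to a 1% chance, which is why a run of very low scores at the end of a read is a common reason to discard it.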

Here are some useful links about sequence quality scores: