RAD-tag de novo assembly

After processing the raw RAD data (removing low-quality reads and demultiplexing), the next step is to assemble the RAD tags de novo across your samples. The Stacks denovo_map.pl pipeline runs ustacks, cstacks, and sstacks in sequence. These programs take many parameters, such as m, M, and n, which are documented on the Stacks homepage. Different values of those parameters can substantially change your results (e.g. the number of loci and SNPs), so you need to optimize the parameter settings for your dataset.

Please note that every dataset is different: the biology of your organism, the questions you want to ask, how the RAD library was constructed, and so on. So do not simply rely on other people's settings for your dataset. Here, I am going to walk you through parameter optimization for de novo assembly in Stacks.

Four parameters are especially important: lowercase m, uppercase M, lowercase n, and max_locus_stacks.

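  • Detailed information about the first three parameters can be found on the Stacks homepage. In brief: m is the minimum depth of coverage required to create a stack, M is the maximum distance (in nucleotides) allowed between stacks within an individual, and n is the number of mismatches allowed between sample loci when building the catalog.
  • max_locus_stacks is the maximum number of stacks allowed at a single locus.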
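To make the four parameters concrete, here is a minimal Python sketch that assembles a denovo_map.pl command line without running it. It assumes Stacks 1.x-style options (-m, -M, -n, -o, -s, and -X to forward --max_locus_stacks to ustacks). The function name, sample files, and output directory are hypothetical, and option names differ between Stacks versions (other options may also be required), so check denovo_map.pl --help for your installation first.

```python
import shlex

def build_denovo_cmd(m, M, n, max_locus_stacks, samples, out_dir):
    """Return a denovo_map.pl argument list.

    Assumption: Stacks 1.x-style option names; verify them against
    `denovo_map.pl --help` for the version you have installed.
    """
    cmd = [
        "denovo_map.pl",
        "-m", str(m),   # minimum depth of coverage to create a stack
        "-M", str(M),   # maximum distance between stacks within an individual
        "-n", str(n),   # mismatches allowed between loci when building the catalog
        "-o", out_dir,  # output directory (hypothetical path)
        # -X forwards an extra option to one pipeline component (here ustacks)
        "-X", f"ustacks:--max_locus_stacks {max_locus_stacks}",
    ]
    for sample in samples:
        cmd += ["-s", sample]  # one -s per sample file (hypothetical file names)
    return cmd

cmd = build_denovo_cmd(3, 2, 1, 3, ["sample_01.fq", "sample_02.fq"], "./stacks_out")
print(shlex.join(cmd))  # shell-quoted command string, for inspection only
```

Building the command as a list keeps each argument separate, so it can later be passed to subprocess.run without shell-quoting problems.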

Generally, for a diploid organism, you should not have more than two stacks per locus (max_locus_stacks=2). In practice, however, Illumina sequencing errors produce reads with convergent errors that form small extra stacks. Those errors are easily handled by the SNP model, but if you restrict max_locus_stacks to 2, the affected loci will be blacklisted. So I recommend running your data with the parameter set to 2, 3, and 4, and then comparing the results.
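As a toy illustration of this trade-off, the following Python sketch counts how many loci survive under each limit. The per-locus stack counts are made up for illustration (they are not Stacks output) and the function name is hypothetical. The sketch only assumes that a locus built from more stacks than max_locus_stacks is blacklisted.

```python
# Hypothetical number of stacks merged into each locus of one diploid sample.
# A heterozygous locus has 2 real allele stacks; any stacks beyond that stand
# for small extra stacks created by sequencing errors.
stacks_per_locus = [1, 2, 2, 3, 1, 3, 4, 2, 5, 3]

def count_kept(stack_counts, max_locus_stacks):
    """Count loci whose number of stacks does not exceed the limit."""
    return sum(1 for count in stack_counts if count <= max_locus_stacks)

for limit in (2, 3, 4):
    kept = count_kept(stacks_per_locus, limit)
    blacklisted = len(stacks_per_locus) - kept
    print(f"max_locus_stacks={limit}: kept {kept}, blacklisted {blacklisted}")
```

In this made-up sample, raising the limit from 2 to 3 recovers the loci that carry a single extra error stack, which is the effect you are checking for when you compare the three settings on real data.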


  1. Run Stacks with different combinations of parameter settings: m (1-4), M (1-4), n (0-4), max_locus_stacks (2-4).
  2. Record the output for each combination: number of reads, number of assembled loci, number of polymorphic loci, and number of SNPs.
  3. Choose the parameter combination that generated the highest number of SNPs.
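The three steps above can be sketched in Python as follows. This is only an outline under stated assumptions: the grid uses the ranges listed in step 1, the records in step 2 are placeholder values (not real results) that you would replace with the numbers from your own runs, and the names grid, records, and best_by_snps are hypothetical.

```python
import itertools

# Step 1: parameter ranges from the list above (bounds inclusive).
grid = {
    "m": range(1, 5),                 # 1-4
    "M": range(1, 5),                 # 1-4
    "n": range(0, 5),                 # 0-4
    "max_locus_stacks": range(2, 5),  # 2-4
}

# Every combination of the four parameters: 4 * 4 * 5 * 3 = 240 runs.
combos = [dict(zip(grid, values)) for values in itertools.product(*grid.values())]
print(len(combos), "parameter combinations")

# Step 2: one record per finished run.
# PLACEHOLDER numbers for illustration only; fill in your own results.
records = [
    {"m": 3, "M": 2, "n": 1, "max_locus_stacks": 2,
     "reads": 1_000_000, "loci": 48_000, "polymorphic_loci": 11_000, "snps": 18_500},
    {"m": 3, "M": 2, "n": 1, "max_locus_stacks": 3,
     "reads": 1_000_000, "loci": 50_000, "polymorphic_loci": 12_000, "snps": 20_000},
    {"m": 3, "M": 2, "n": 1, "max_locus_stacks": 4,
     "reads": 1_000_000, "loci": 50_500, "polymorphic_loci": 12_100, "snps": 19_800},
]

# Step 3: pick the combination with the most SNPs
# (on a tie, max() returns the first such record).
def best_by_snps(run_records):
    return max(run_records, key=lambda record: record["snps"])

best = best_by_snps(records)
print("Best combination:", {key: best[key] for key in grid})
```

The full grid is 240 runs, so on a large dataset you may want to start with a subset of samples or a coarser grid before committing to all combinations.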

P.S. At the de novo assembly stage we want relatively relaxed filtering, so that we retain as much data as possible. We can always apply stricter filtering in later analyses.