Sunday, 11 December 2011
New era of Population Genetics data analysis
As we already know, the R environment and other open-source tools are changing the way we perform and think about data analysis, and this revolution is reaching population genetics as well. I recently discovered (rather late) two very interesting R packages that perform basic and multivariate statistical analyses on gene frequencies estimated from population samples: “pegas” and “adegenet”. I am using the former to test Hardy-Weinberg (HW) equilibrium at each locus, gametic disequilibrium, and F-statistics. I am using the latter to assess population structure (or sub-structure) in my data set through multivariate methods. For more information I recommend:
‘pegas’ page:
‘adegenet’ page:
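To make the HW test mentioned above concrete, here is a minimal sketch in plain Python of the classical chi-square goodness-of-fit test for one biallelic locus. It is only an illustration of the idea, not the pegas implementation; the function name `hw_chi_square` and the genotype counts in the usage line are hypothetical.

```python
import math

def hw_chi_square(n_AA, n_Aa, n_aa):
    """Chi-square test of Hardy-Weinberg proportions at one biallelic locus.

    Hypothetical helper for illustration only (not the pegas API).
    Returns (frequency of allele A, chi-square statistic, p-value with 1 df).
    """
    n = n_AA + n_Aa + n_aa
    if n == 0:
        raise ValueError("at least one genotyped individual is required")
    # Allele frequency of A by gene counting
    p = (2 * n_AA + n_Aa) / (2 * n)
    q = 1 - p
    # Genotype counts expected under Hardy-Weinberg proportions
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_AA, n_Aa, n_aa)
    # Skip classes with zero expectation (monomorphic locus)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square with 1 df: erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return p, chi2, p_value

# Usage with made-up counts of AA, Aa and aa individuals
freq_A, chi2, p_value = hw_chi_square(30, 40, 30)
print(freq_A, chi2, p_value)
```

For loci with many alleles or small samples, an exact or permutation-based test is usually preferable to this chi-square approximation.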
Monday, 5 December 2011
GENEPOP or FREENA for null allele estimation
Just now, while running some preliminary analyses (on data from my MSc that I have been slow to publish), I came across something interesting. As we already know, GENEPOP and FREENA are sibling programs developed for the analysis of microsatellites and other co-dominant markers (such as allozymes and RFLPs). Also, as we already know, a first step in the analysis is to estimate the null allele frequency at each locus in each population, to check for the presence of null alleles. The most powerful way to estimate the null allele frequency is the maximum likelihood estimator (MLE) obtained with the EM algorithm. Both programs offer this approach, and in both you can code null alleles (if you know them beforehand) as 999 (or “99”) and genotyping failures as 000 (or “00”). If you really do not know why a genotype is missing, the safe choice is to code it as 000. Of course, we expect the two programs to give the same (or nearly the same) null allele frequency estimates. For most loci I did get the same estimates (the MLE with EM is sufficiently robust), but for some loci GENEPOP gave a strongly inflated null allele estimate. I found that for those loci the largest allele (the highest size in bp) had been treated as the null, so the allele reported as null in the output file is the largest one. I have been using both programs for some analyses, such as null allele estimation and, of course, FST with the ENA correction. But I prefer GENEPOP for other analyses, especially the HW test.
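To illustrate the idea behind the EM estimate discussed above, here is a minimal sketch in Python of a gene-counting EM for one locus with a null allele. It is not the code used by GENEPOP or FREENA. The function name `em_null_allele`, the input format, and the allele sizes in the usage example are all hypothetical. The sketch assumes Hardy-Weinberg proportions, that every apparent homozygote is either a true homozygote or a carrier of one null copy, and that `None` marks an individual treated as a null homozygote (the 999-style coding). Genotyping failures (the 000-style coding) should be removed before calling it.

```python
from collections import Counter

def em_null_allele(genotypes, n_iter=500, tol=1e-10):
    """Gene-counting EM for the null allele frequency at one locus.

    Hypothetical helper for illustration only.
    genotypes: list of (allele, allele) tuples with visible alleles,
               or None for an individual assumed to be a null homozygote.
    Returns (null allele frequency, {visible allele: frequency}).
    """
    n = len(genotypes)
    if n == 0:
        raise ValueError("at least one individual is required")
    hom = Counter()         # apparent homozygotes per allele
    het_copies = Counter()  # allele copies seen in heterozygotes
    n_blank = 0             # assumed null homozygotes
    for g in genotypes:
        if g is None:
            n_blank += 1
        elif g[0] == g[1]:
            hom[g[0]] += 1
        else:
            het_copies[g[0]] += 1
            het_copies[g[1]] += 1
    alleles = sorted(set(hom) | set(het_copies))
    r = 0.1  # arbitrary starting value for the null frequency
    freqs = {a: (1 - r) / len(alleles) for a in alleles}
    for _ in range(n_iter):
        # E-step: split each apparent homozygote between a/a and a/null
        null_copies = 2.0 * n_blank
        copies = {}
        for a in alleles:
            p = freqs[a]
            w_null = 2 * r / (p + 2 * r)  # P(a/null | apparent a/a)
            copies[a] = het_copies[a] + hom[a] * (2 - w_null)
            null_copies += hom[a] * w_null
        # M-step: new frequencies are expected copies over 2n gene copies
        new_r = null_copies / (2 * n)
        converged = abs(new_r - r) < tol
        r = new_r
        freqs = {a: c / (2 * n) for a, c in copies.items()}
        if converged:
            break
    return r, freqs

# Usage with made-up allele sizes (bp) and an excess of homozygotes
sample = [(152, 152)] * 40 + [(156, 156)] * 40 + [(152, 156)] * 20 + [None] * 5
null_freq, allele_freqs = em_null_allele(sample)
print(null_freq, allele_freqs)
```

Running a check like this on a locus where the two programs disagree can help show whether the inflated estimate comes from the data or from how the null allele was coded in the input file.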