

Population Assignment using Genetic, Non-Genetic or Integrated Data in a Machine-learning Framework


assignPOP is an R package for performing population assignment and inferring population structure in a machine-learning framework. It uses supervised machine-learning methods to evaluate the discriminatory power of data collected from source populations, and it can analyze large genetic, non-genetic, or integrated (genetic plus non-genetic) data sets. The framework is designed to address the upward bias issue discussed in previous studies.

Install assignPOP

You can install the released version from CRAN or the up-to-date version from this GitHub repository.

Note: When you install the package from GitHub, you may need to install additional packages before assignPOP can be installed successfully. Follow the hints that R provides and then re-run install_github("alexkychen/assignPOP").
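The two installation routes described above might look like this in an R session. This is a sketch rather than the official instructions: it assumes devtools is the package that provides install_github, since the source shows only the install_github call itself.

```r
# Released version from CRAN
install.packages("assignPOP")

# Up-to-date version from GitHub.
# Assumption: devtools is used as the installer; install it first if needed.
install.packages("devtools")
devtools::install_github("alexkychen/assignPOP")

# Load the package to confirm that the installation succeeded
library(assignPOP)
```

If the GitHub installation stops because a dependency is missing, install the package that R names in the message and run the install_github line again.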

Package tutorial

Please visit our tutorial website for more information.

What’s new

Changes in ver. 1.2.4 (2021.10.27)
- Update membership.plot: add arguments ‘plot.k’ and ‘plot.loci’ to skip the related question prompts.


Changes in ver. 1.2.3 (2021.8.17)
- Update assign.X:
  (1) Add argument ‘common’ to specify whether to stop the analysis when inconsistent features are found between data sets.
  (2) Add argument ‘skipQ’ to skip data type checking on non-genetic data.
  (3) Modify argument ‘mplot’ to handle membership probability plot output.

Changes in ver. 1.2.2 (2020.11.6)
- Update read.Genepop and read.Structure: loci with only one allele across samples are now kept. Use reduce.allele to remove single-allele or low-variance loci.
- In ver. 1.2.1, errors could occur when running assign.MC (and other assignment test functions) because of single-allele loci (fixed in ver. 1.2.2).

Changes in ver. 1.2.1 (2020.8.24)
- Update read.Genepop to increase file reading speed (~40 times faster)
- Update read.Structure to increase file reading speed (~90 times faster)
- read.Structure can now also handle triploid and tetraploid organisms (see arg. ploidy)
- Fix bug in reduce.allele to handle small p threshold across all loci

Changes in ver. 1.2.0 (2020.7.24)
- Add code to check model name in assign.MC, assign.kfold, assign.X
- Add text to SVM description
- Fix cbind/stringsAsFactors issues in several places for R 4.0
- Allow passing additional arguments to the models (e.g., gamma in SVM)

Changes in ver. 1.1.9 (2020.3.16)
- Fix input non-genetic data (x1) error in assign.X

Changes in ver. 1.1.8 (2020.2.28)
- Update the following functions to work with R 4.0.0: accuracy.MC, accuracy.kfold, assign.matrix, membership.plot
- Add stringsAsFactors=T to read.table and read.csv
- Temporarily turn off testthat because it currently fails to pass tests on Debian systems

Changes in ver. 1.1.7 (2019.8.26)
- Add broken-stick method for principal component selection in assign.MC, assign.kfold, and assign.X functions
- Update accuracy.MC, accuracy.kfold, assign.matrix to handle missing levels of predicted population in test results
- Update assign.* and accuracy.* functions to handle numeric population names

Changes in ver. 1.1.6 (2019.6.8)
- Fix multiprocess issue in assign.kfold function

Changes in ver. 1.1.5 (2018.3.23)
- Update assign.MC & assign.kfold to detect pop size and train.inds/k.fold setting
- Update accuracy.MC & assign.matrix to handle test individuals not from every pop
- Slightly modify levels method in accuracy.kfold
- Fix bugs in accuracy.plot for K-fold results
- Fix membership.plot title positioning and set text size to default

Changes in ver. 1.1.4 (2018.3.8)
- Fix missing assign.matrix function

Changes in ver. 1.1.3 (2017.6.15)
- Add unit tests (using package testthat)

Changes in ver. 1.1.2 (2017.5.13)
- Change function name read.genpop to read.Genepop; add function read.Structure.
- Update read.genpop function, which can now read haploid data

Cite this package

Chen, K. Y., Marschall, E. A., Sovic, M. G., Fries, A. C., Gibbs, H. L., & Ludsin, S. A. (2018). assignPOP: An R package for population assignment using genetic, non-genetic, or integrated data in a machine-learning framework. Methods in Ecology and Evolution, 9(2), 439-446.

Papers citing our package

Previous version

Previous versions of the package can be found and downloaded at the releases page.

Version compatibility (2020.7.24)

assignPOP versions 1.1.9 and earlier are not fully compatible with R 4.0.0. If you’re using R 4.0.0 (or newer), please update assignPOP to 1.2.0 or newer.
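To check whether the compatibility note applies to your setup, a small helper along these lines may be useful. It is a sketch only: needs_update is a hypothetical function written for this example and is not part of assignPOP. The version thresholds are taken from the note above.

```r
# Hypothetical helper (not part of assignPOP): returns TRUE when the given
# assignPOP version is too old for the given R version.
needs_update <- function(pkg_version, r_version = getRversion()) {
  pkg_version <- package_version(as.character(pkg_version))
  r_version <- package_version(as.character(r_version))
  # R 4.0.0 or newer needs assignPOP 1.2.0 or newer
  r_version >= "4.0.0" && pkg_version < "1.2.0"
}

# Check the local installation, but only if assignPOP is installed
if (requireNamespace("assignPOP", quietly = TRUE)) {
  if (needs_update(utils::packageVersion("assignPOP"))) {
    message("Please update assignPOP to version 1.2.0 or newer.")
  }
}
```

Versions are compared with package_version rather than as plain strings, so that, for example, 1.1.10 is correctly treated as older than 1.2.0.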