anavar_utils

A collection of Python classes that make the anavar package simpler to use from within Python.

Introduction

anavar_utils is a Python package designed to simplify the creation of control files for anavar (Barton and Zeng, 2018) and to provide classes for interpreting anavar’s results files.

Anavar is available here. Its manual describes the available models and the control file parameters in much more detail than these docs.

To use the package to create a simple control file:

from __future__ import print_function
import anavar_utils as an

# data needed for anavar: SFS, callable sites and sample size
sample_size = 10
site_frequencies = [20, 19, 18, 17, 16, 15, 2, 1]
n_sites = 10000

# anavar_utils takes the sfs data in a dictionary
sfs_dict = {'SNP': (site_frequencies, n_sites)}

# initiate control file instance
control_file = an.Snp1ControlFile()

# set the data
control_file.set_data(sfs_dict, sample_size)

# construct control file string
control_contents = control_file.construct()

# output to a file or to stdout
print(control_contents)

This gives:

[algorithm_commands]
search_algorithm: NLOPT_LD_LBFGS
maxeval: 100000
maxtime: 600
num_searches: 500
nnoimp: 1
maximp: 3
optional: false

[model_commands]
model: SNP_1
n: 10
m: 10000
folded: false
sfs: 20, 19, 18, 17, 16, 15, 2, 1
dfe: discrete
c: 1
theta_range: 1e-06, 0.1
gamma_range: -250, 10
e_range: 0.0, 0.5
constraint: none

API

Control Files

Site frequency data format

In its simplest form, a control file can be created for a given model from only the site frequency spectrum, the number of callable sites and the sample size. The number of callable sites can be set to 0 in some cases, if per-site parameter estimates are not required. The control file classes take this data through the .set_data() method as a dictionary with keys specific to each model, as shown in the table below.

model | dictionary format
SNP_1 | {'SNP': (sfs, n_sites)}
INDEL_1 | {'INS': (sfs, n_sites), 'DEL': (sfs, n_sites)}
gBGC_GLEMIN_EXTENDED_M1* | {'neutral_SNPs': (sfs, n_sites), 'ws_SNPs': (sfs, n_sites), 'sw_SNPs': (sfs, n_sites)}
neutralINDEL_vs_selectedINDEL | {'neutral_INS': (sfs, n_sites), 'neutral_DEL': (sfs, n_sites), 'selected_INS': (sfs, n_sites), 'selected_DEL': (sfs, n_sites)}
neutralSNP_vs_selectedSNP | {'neutral_SNP': (sfs, n_sites), 'selected_SNP': (sfs, n_sites)}
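
As a rough illustration of the dictionary formats in the table above, the sketch below checks that a dictionary carries the keys listed for a model before it is handed to .set_data(). The names MODEL_KEYS and check_sfs_dict are hypothetical helpers written for this example and are not part of anavar_utils; the counts in indel_data are made-up values.

```python
# Expected dictionary keys per model, copied from the table above.
MODEL_KEYS = {
    'SNP_1': {'SNP'},
    'INDEL_1': {'INS', 'DEL'},
    'gBGC_GLEMIN_EXTENDED_M1*': {'neutral_SNPs', 'ws_SNPs', 'sw_SNPs'},
    'neutralINDEL_vs_selectedINDEL': {'neutral_INS', 'neutral_DEL',
                                      'selected_INS', 'selected_DEL'},
    'neutralSNP_vs_selectedSNP': {'neutral_SNP', 'selected_SNP'},
}


def check_sfs_dict(model, sfs_dict):
    """Raise ValueError if sfs_dict does not match the keys listed for model."""
    expected = MODEL_KEYS[model]
    if set(sfs_dict) != expected:
        raise ValueError('{} expects keys {}, got {}'.format(
            model, sorted(expected), sorted(sfs_dict)))
    for key, value in sfs_dict.items():
        sfs, n_sites = value  # each entry must be an (sfs, n_sites) pair
        if any(count < 0 for count in sfs):
            raise ValueError('negative count in SFS for {}'.format(key))
    return True


# Illustrative INDEL_1 input: one (sfs, n_sites) pair per variant type.
indel_data = {'INS': ([12, 5, 3, 1], 50000), 'DEL': ([20, 9, 4, 2], 50000)}
check_sfs_dict('INDEL_1', indel_data)
```

A check like this only catches mismatched keys and malformed entries; it says nothing about whether the spectrum length agrees with the sample size.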

The site frequency data needs to be provided as frequency counts ordered from low frequency to high frequency, with frequency classes that contain no variants entered as 0. For example, this site frequency spectrum (SFS):

[Figure: sfs]

Would be expressed as:

sfs = [100, 50, 33, 25, 20, 17, 14, 12, 11, 10, 9, 8, 8, 7, 7, 6, 6, 6, 5]
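
One way to arrive at such a list is to tally the derived allele count of each variant. The sketch below is a minimal illustration, assuming an unfolded spectrum with sample_size - 1 frequency classes (consistent with the 19-entry list above for a sample of 20); counts_to_sfs is a hypothetical helper, not a function provided by anavar_utils.

```python
from collections import Counter


def counts_to_sfs(derived_counts, sample_size):
    """Tally per-variant derived allele counts into an unfolded SFS.

    Returns a list of length sample_size - 1, ordered from singletons up to
    variants carried by sample_size - 1 chromosomes, with 0 for empty classes.
    """
    tally = Counter(derived_counts)
    return [tally.get(k, 0) for k in range(1, sample_size)]


# Six variants in a sample of six chromosomes (made-up counts).
sfs = counts_to_sfs([1, 1, 1, 2, 4, 4], 6)
print(sfs)  # [3, 1, 0, 2, 0]
```

Classes with no variants (here the tripletons and the class with five copies) appear as explicit zeros, which is the form described above.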

class Snp1ControlFile

Snp1ControlFile()

Initiates instance of class to be used for creating a control file for the SNP_1 model. Also the parent class for all other control file classes.

construct(self)

Creates the final control file string with all specified settings

set_alg_opts(self, alg='NLOPT_LD_LBFGS', maxeval=100000, maxtime=600, search=500, nnoimp=1, maximp=3, optional=False, size=10000, key=3, epsabs=1e-50, epsrel=1e-10, rftol=1e-10, init=())

Sets algorithm options in control file

Parameters:

set_constraint(self, constraint)

Sets model constraints in control file

Parameters:

set_data(self, sfs_m, n, snp_fold=False, dfe='discrete', c=1, theta_r=(1e-06, 0.1), gamma_r=(-250, 10), error_r=(0.0, 0.5), shape_r=(0.001, 200), scale_r=(0.1, 1000.0), r_r=(0.05, 5.0))

Sets model and dfe commands in control file

Parameters:

set_dfe_optional_opts(self, optional=False, fraction=0.005, degree=50, delta=1e-05)

Sets optional commands for the DFE

Parameters:
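
The methods above can be chained before calling construct(). The sketch below is one possible arrangement, using only keyword arguments that appear in the signatures listed for Snp1ControlFile; build_snp1_control is a hypothetical wrapper written for this example, and the chosen values (c=2, maxtime=300, search=100) are arbitrary. The control file class is passed in as an argument so the function does not itself depend on the package being importable.

```python
def build_snp1_control(control_cls, sfs, n_sites, sample_size):
    """Return a SNP_1 control file string with non-default settings.

    control_cls is expected to be a control file class such as
    anavar_utils.Snp1ControlFile (an assumption of this sketch).
    """
    ctl = control_cls()
    # model and dfe commands: two site classes instead of the default one
    ctl.set_data({'SNP': (sfs, n_sites)}, sample_size, c=2)
    # algorithm commands: shorter time limit and fewer searches than default
    ctl.set_alg_opts(maxtime=300, search=100)
    return ctl.construct()
```

With the package installed, a call along the lines of build_snp1_control(an.Snp1ControlFile, site_frequencies, n_sites, sample_size) would be expected to return a control file string like the one shown in the introduction, with the changed settings.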

class GbgcControlFile

GbgcControlFile(Snp1ControlFile)

Initiates instance of class to be used for creating a control file for the gBGC_GLEMIN_EXTENDED_M1* model.

Inherits methods from Snp1ControlFile.

set_constraint(self, constraint)

Sets model constraints in control file

Parameters:

class Indel1ControlFile

Indel1ControlFile(Snp1ControlFile)

Initiates instance of class to be used for creating a control file for the INDEL_1 model.

Inherits methods from Snp1ControlFile.

set_constraint(self, constraint)

Sets model constraints in control file

Parameters:

class IndelNeuSelControlFile

IndelNeuSelControlFile(Snp1ControlFile)

Initiates instance of class to be used for creating a control file for the neutralINDEL_vs_selectedINDEL model.

Inherits methods from Snp1ControlFile.

set_constraint(self, constraint)

Sets model constraints in control file

Parameters:

class SNPNeuSelControlFile

SNPNeuSelControlFile(Snp1ControlFile)

Initiates instance of class to be used for creating a control file for the neutralSNP_vs_selectedSNP model.

Inherits methods from Snp1ControlFile.

set_constraint(self, constraint)

Sets model constraints in control file

Parameters:

Results Files

class ResultsFile

ResultsFile(self, anavar_results_file)

Creates an anavar results file object from a file object of a valid anavar results file

Parameters:

bounds_hit(self, theta_r=(1e-06, 0.1), gamma_r=(-250, 10), error_r=(0.0, 0.5), shape_r=(0.001, 200), scale_r=(0.1, 1000.0), r_r=(0.05, 5.0))

Determines if any of the parameter estimates in the results file have hit their upper or lower limits

Parameters:

Returns:
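
The idea behind a bounds check can be sketched without the package. The example below is not the implementation used by bounds_hit(); it assumes an estimate counts as having hit a bound when it is equal to (or beyond) either limit of its range. hit_bounds is a hypothetical helper, the ranges are the defaults from the signature above, and the estimates are made-up values.

```python
def hit_bounds(estimates, ranges):
    """Return the names of parameters whose estimate sits on a range limit."""
    hits = []
    for name, value in estimates.items():
        lower, upper = ranges[name]
        if value <= lower or value >= upper:
            hits.append(name)
    return hits


# Default ranges as listed in the bounds_hit() signature.
ranges = {'theta': (1e-06, 0.1), 'gamma': (-250, 10), 'e': (0.0, 0.5)}
# Illustrative estimates: gamma sits on its lower limit.
estimates = {'theta': 0.0012, 'gamma': -250.0, 'e': 0.02}
print(hit_bounds(estimates, ranges))  # ['gamma']
```

An estimate sitting on a limit usually suggests the search range was too narrow, so the ranges passed here should match those used in the control file.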

control_file(self)

Returns:

converged(self)

Determines whether the algorithm has converged. The algorithm is said to have converged if the ln likelihoods differ by less than 0.1.

Returns:
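
The convergence rule can be illustrated in a few lines. The sketch below assumes the comparison is made between the two best ln likelihoods across runs, which is an interpretation of the description above rather than something the docs state; has_converged is a hypothetical helper and the ln likelihood values are made up.

```python
def has_converged(ln_likelihoods, tolerance=0.1):
    """True if the two best ln likelihoods differ by less than tolerance.

    Expects at least two values, one per search run.
    """
    best, second = sorted(ln_likelihoods, reverse=True)[:2]
    return abs(best - second) < tolerance


print(has_converged([-1523.42, -1523.45, -1530.10]))  # True
print(has_converged([-1523.42, -1524.90, -1530.10]))  # False
```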

data_type(self)

Returns:

dfe(self)

Returns:

estimates(self, as_string=False)

Parameters:

Yields:

free_parameters(self)

Returns:

get_alpha(self, dn, ds, var_type)

Calculates alpha (proportion of substitutions fixed by positive selection) using equation 19 in Barton and Zeng (2018)

Parameters:

Returns:

header(self)

Returns:

ml_estimate(self, as_string=False)

Gets the maximum likelihood estimate from the results file (assumes a sorted results file)

Parameters:

Returns:

num_class(self)

Returns:

num_runs(self)

Returns:

seed(self)

Returns:

class MultiResultsFile

MultiResultsFile(self, file_list)

Merges anavar results files. A ResultsFile instance is created for each file and the merged results are written to a temporary results file. A ResultsFile instance is then created from the merged file, after which the temporary file is deleted.

Parameters: