Created: 13 Jul 2013 | Modified: 30 Jun 2016
The research problem is to understand how the dynamics of cultural transmission processes are altered if we observe transmitted information through a classificatory filter rather than tabulating the actual copying events themselves.
I did a first implementation of this in TransmissionFramework, but even with a single classification and a couple of dimensions of underlying variation, performance really sucked, mainly because I was acting like a good Java programmer: writing to interfaces and using generic classes so that my framework was configurable and generic and... you get the idea. I tried some experiments with simuPOP earlier this year, and the performance is excellent. And I've figured out how to do much of what I need for dissertation simulations, which is all I need.
For the coarse-graining project in particular, simuPOP will work well. Hence, I've started coding CTPy, a library of Python functions and classes written to the simuPOP API for performing cultural transmission simulations.
The design goal is to overlay arbitrary paradigmatic classifications on top of the variation being evolved in a simuPOP simulation, and then produce class counts and richness values as we do for the raw variation itself. General requirements:
The main issue is mapping the allele space to modes. In TF and my dissertation proposal (Madsen 2012), I modeled trait dimensions as the unit interval \([0,1]\), with traits taking real-valued locations on the interval. This offered an infinite-alleles model of mutation (it’s always possible to distinguish two values, at least for a very large number of values given 64 bits), but a constrained method of doing allelic partitions for modes. An example of chopping a dimension into three modes might be:
\([0, 0.4),\ [0.4, 0.7),\ [0.7, 1.0]\)
Obviously, a trait with value \(0.3576\) would be assigned to Mode 1 in this dimension. Simple. Generating random partitions was also easy, as was building a hierarchy of classification levels.
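As a rough illustration of the unit-interval scheme, here is a minimal Python sketch. The function names `mode_index` and `random_partition` are hypothetical (they are not taken from TransmissionFramework), and representing a partition as a sorted list of interior cut points is an assumption made for the example.

```python
import bisect
import random


def mode_index(value, cuts):
    """Return the 0-based mode of `value` on the unit interval.

    `cuts` holds the sorted interior cut points, e.g. [0.4, 0.7] for
    the three modes [0, 0.4), [0.4, 0.7), [0.7, 1.0].
    """
    if not 0.0 <= value <= 1.0:
        raise ValueError("trait value must lie in [0, 1]")
    # bisect_right counts the cut points <= value, which is the mode index
    # because each mode is closed on the left.
    return bisect.bisect_right(cuts, value)


def random_partition(num_modes, rng=random):
    """Draw num_modes - 1 sorted cut points uniformly from [0, 1)."""
    return sorted(rng.random() for _ in range(num_modes - 1))


cuts = [0.4, 0.7]
print(mode_index(0.3576, cuts) + 1)  # prints 1, i.e. Mode 1
```

A hierarchy of classification levels could be approximated under the same assumptions by adding cut points to an existing list, so that every coarse mode is a union of finer ones.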
In simuPOP, the KAlleleMutator I'm using operates with LONG integer values, so it's not truly infinite alleles, just many more than you'll need unless you have a giant population and a very long simulation run. So we set the maximum allelic value (MAXVAL) when we construct the mutator, and then it seems to choose new alleles uniformly from \([0, MAXVAL]\).
What we don’t have is a good allelic distribution of initial variants – variants appear to be sequential and taken from zero. Either we ought to have initial alleles chosen randomly from the permissible space, or we simply have to wait until all the initial alleles are gone before we start recording data.
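Both options can be sketched in plain Python, independent of any simuPOP call. The helper names and the `MAXVAL` value below are hypothetical, and how the sampled alleles would actually be fed into a simuPOP initializer is left open here.

```python
import random

MAXVAL = 10**7  # hypothetical maximum allelic value


def random_initial_alleles(num_variants, maxval, rng=random):
    """Sample distinct starting alleles uniformly from 0..maxval inclusive."""
    return rng.sample(range(maxval + 1), num_variants)


def initial_alleles_extinct(current_alleles, initial_alleles):
    """Return True once no allele from the starting set remains."""
    return set(current_alleles).isdisjoint(initial_alleles)


rng = random.Random(42)  # seeded only to make the example repeatable
starting = random_initial_alleles(10, MAXVAL, rng)

# Option 2: keep stepping the simulation and start recording data only
# after initial_alleles_extinct(...) first returns True.
```

The first function addresses the clustering of initial variants near zero; the second is a burn-in test that could be evaluated every few generations against the alleles currently present in the population.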
Then, we can specify random partitions as fractions of MAXVAL. Classifications could also be specified manually in the same way I did it in TF, and simply interpreted as fractions of MAXVAL. MAXVAL itself could be calculated somehow, based on popsize and length of simulation run and mutation rate. Or it could be set to a very large number. Either way, it simply needs to be recorded as part of the simulation run info.
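To make the idea concrete, the sketch below rescales fractional cut points to integer allele values and then tabulates class counts and richness for a paradigmatic classification, with one dimension per locus. All names here are hypothetical, the tiny `MAXVAL` is chosen only for readability, and the population is a hand-written list of genotypes rather than anything produced by simuPOP.

```python
import bisect
from collections import Counter


def scale_cuts(fractions, maxval):
    """Convert fractional cut points on [0, 1] to integer allele cut points."""
    # round() avoids truncation surprises from floating-point products.
    return [int(round(f * maxval)) for f in fractions]


def classify(genotype, cuts_per_dimension):
    """Map one genotype (one allele per dimension) to a tuple of mode indices."""
    return tuple(
        bisect.bisect_right(cuts, allele)
        for allele, cuts in zip(genotype, cuts_per_dimension)
    )


def class_counts(population, cuts_per_dimension):
    """Count how many individuals fall into each paradigmatic class."""
    return Counter(classify(g, cuts_per_dimension) for g in population)


MAXVAL = 1000  # illustrative only; a real run would use a far larger value
classification = [
    scale_cuts([0.4, 0.7], MAXVAL),  # dimension 1: three modes
    scale_cuts([0.5], MAXVAL),       # dimension 2: two modes
]
population = [(357, 10), (399, 499), (400, 500), (999, 1000)]

counts = class_counts(population, classification)
richness = len(counts)  # number of distinct classes observed
```

Because the classification is stored as fractions and scaled only at run time, the same specification could be reused across runs with different MAXVAL settings, provided MAXVAL is recorded with each run.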
Madsen, Mark. 2012. “Dissertation Proposal: Empirical Sufficiency and Neutral Theory: Building Seriation and Classification into Archaeological Models of Cultural Transmission.” http://dx.doi.org/10.6084/m9.figshare.745321.