## SeriationCT Sample Size Series experiments

### Effect of Varying Sample Size

After SAA’s, I used some existing simulations (seriationct-27 from the experiment-seriationct-2 repository) to prototype examining the effects of sample size on:

- The gross structure of the seriation solutions: how much branching when small? do we get isolates?

- Lineage structure recovery

Working with the existing simulation results and post-processed samples, I wrote a quick modification of the assemblage sampler called `seriationct-sample-assemblages-for-samplesize-sequence-seriation.py`

. The script takes the output from the `seriationct-export-data.py`

script and performs subsampling as follows:

- Uses a
`samplefraction`

parameter to select an initial sample of assemblages, just like the normal `seriationct-sample-assemblages-for-seriation.py`

.

- Selects these samples given the
`sampletype`

parameter, as a pure random sample of assemblages, a spatially stratified sample given NxN quadrats, temporally stratified given N even periods of time, and spatiotemporal sampling which stratifies by quadrats and periods.
- Subsamples the
`samplefraction`

assemblages in a sequence decreasing by 2, so for example, if `samplefraction`

is 30, the largest set of assemblages will be 30 randomly sampled, and then 28 sampled from the 30, 26 sampled from the original 30, and so on.

The net result is a nested series of samples (rather than independent random samples of different sizes).

### Statistics Brainstorm

Some of the things I want to know are:

- How consistent is the lineage structure of each seriation solution, at each sample size? (i.e., along a branch, how mixed are lineage identifiers?)
- How consistent is the temporal sequence of the solution?

- How “branchy” is the solution?

- Furthermore, what is the average and variance of these measures, across the sequence of sample sizes? Or perhaps the quantiles and order statistics?
- At what sample size does lineage structure and chronology become visible and semi-stable?

I’m just starting to build measures for these items, since they involve traversal and parsing of the annotated minmax graphs.

### References Cited