<h1>Lab Notebook for Mark E. Madsen</h1>
<h2>The Real Core of the Scientific Method and Why We Should Trust It (2017-06-23)</h2>
<p>Apropos of much in our world today, I have been rereading some of the better philosophy of science lately, and some of the critiques of the idea that there is a universally valid “scientific method.” I have nothing to really say to those whose resistance to the methods and conclusions of science comes from a deep-seated place where faith – of which ultimately identity and solidarity figure prominently – now requires them to reject it. My purpose lies elsewhere today.</p>
<p>But to those who are inundated daily with skepticism about complex issues such as climate change, vaccination, energy production, or food safety, and honestly wonder what the “truth” is…please know this: there is no such thing as a single “scientific method.” The fact that investigating climate change, for example, does not look like the precise methods of particle physics or the amazing marriage of biochemistry and algorithms in modern genomics does not make it a lesser science. It merely reflects the difficulty of data collection on the most massive scales, the impossibility of precise modeling on scales ranging over several orders of magnitude, and the inherent randomness of the phenomena.</p>
<p>This does not mean that science does not work, however. Science, and every honest inquiry we humans undertake, involves a simple principle:</p>
<blockquote>
<p>Do not continue to bullshit yourself and others that you are correct when there is a reason to suspect that you might be wrong. Let your ego lie in the knowledge that you work hard to be right, not the conviction that you <em>are</em> right. If most people – not even everyone! – do this, we move our knowledge forward and make better decisions for ourselves and our descendants.</p>
</blockquote>
<p>From this, we have gone from migrating out of Africa, to measuring the circumference of the earth by measuring shadows, to understanding the true place of our world in the universe, to understanding the evolution of life on our world, to preventing disease, to building technologies that might allow us to mitigate the effects of our present problems.</p>
<p>But it works only if we change what makes us feel good – our “reward structure.” We have to value our personal contributions in terms of making good decisions rather than “our team winning.” Tribalism and individualism lead nowhere in the long term. The economics of self-interest are tragically incomplete and thus wrong and self-limiting.</p>
<p>Nature is not merely “red in tooth and claw.” You would not exist if competition were the only important value. The rich panoply of evolutionary history is abundant testament to this. Cooperation and altruism are not foolish; they are <em>fundamental</em> to evolution and success.</p>
<h2>The Key Role of Cooperation in Evolution and Political Economy (2017-02-12)</h2>
<p>Darwin’s birthday is a good opportunity to reflect on the larger significance of evolutionary thinking in our common life. This is especially important as we head into a period of history in which competition and the “war of nature” appear poised to replace communal action and empathy for the plight of those our politics leaves behind. Our national discussions over immigration, race, and the rising distrust of the “other” by much of white America highlight this shift, but no less significant is a decades-long trend to replace the New Deal consensus on economic fairness, common infrastructure, and political equality with the individualist, anti-cooperative rhetoric of libertarian and conservative economists and politicians.</p>
<p>Darwin Day is an especially important time to contemplate this shift because so much of economic theory is rooted in claims about what is “natural” in social behavior, and thus in our economic relations. Darwin himself seems to have painted a vision of organic evolution which was competitive and individualistic, with little or no explanation for the cooperation that is rife in the biological world. In the famous closing of the <strong>Origin</strong>, for example, the evolutionary process is described in poetic but essentially hostile terms:</p>
<blockquote>
<p>It is interesting to contemplate an entangled bank, clothed with many plants of many kinds, with birds singing on the bushes, with various insects flitting about, and with worms crawling through the damp earth, and to reflect that these elaborately constructed forms, so different from each other, and dependent on each other in so complex a manner, have all been produced by laws acting around us. These laws, taken in the largest sense, being Growth with Reproduction; Inheritance which is almost implied by reproduction; Variability from the indirect and direct action of the external conditions of life, and from use and disuse; a Ratio of Increase so high as to lead to a Struggle for Life, and as a consequence to Natural Selection, entailing Divergence of Character and the Extinction of less-improved forms. Thus, from the war of nature, from famine and death, the most exalted object which we are capable of conceiving, namely, the production of the higher animals, directly follows. There is grandeur in this view of life, with its several powers, having been originally breathed into a few forms or into one; and that, whilst this planet has gone cycling on according to the fixed law of gravity, from so simple a beginning endless forms most beautiful and most wonderful have been, and are being, evolved.</p>
</blockquote>
<p>There is certainly grandeur in this view of life, and I have always found this a deeply moving passage, as have many since it was written in 1859. But there are several possible lessons from this passage, some of which highlight the cooperative element in evolution, and some which lead to thinking of evolution as essentially anti-cooperative and individualistic.</p>
<p>In his image of the “entangled bank,” Darwin highlights a multitude of species, so different and yet dependent upon each other for subsistence. The main message of the first part of this famous passage is the one that has always drawn me: the idea that the complexity and diversity of life is produced by the action of a few simple principles interacting in the fullness of nature’s circumstances. The link to cooperation as an essential part of evolution is weak, coming only through the mention of “dependency.” The passage finishes more explicitly, claiming that it is from the “struggle for life” that the beauty and complexity of life arise.</p>
<p>It is fundamentally this “competitive” aspect of Darwinian evolution that gave rise, almost immediately, to “social Darwinism” and theories of eugenics that were used by late Victorians and the elites of the Gilded Age and early 20th century to justify their grip on economic and political power, their immiseration of the poor in the course of achieving their own wealth, and the lack of social welfare provisions or protections in our politics until much later.</p>
<p>The image of evolution as inherently competitive, and not cooperative, persists in our popular and even some learned cultures, despite the fact that the last half-century has seen an explosion in our understanding of the centrality of cooperation to evolutionary theory, and a deepening of our understanding of how natural selection can create biological and social mechanisms that foster cooperation.</p>
<p>The main difficulty in explaining how “selfish” selection favors “altruistic” cooperation, in fact, is not anything in nature itself, which is rife with cooperative phenomena, but our pre-existing biases, and the tendency of those biases to cause us to oversimplify complex phenomena. We in the social and biological sciences learn about the “prisoner’s dilemma” and elementary game theory, for example, and are easily convinced that cooperation is hard to evolve, and thus that self-interested behavior is “rational” and “cooperation” requires one to be irrational.</p>
<p>But of course I would not be writing this, and you would not be reading it, if cooperative behavior did not pay off. The earliest phases in the evolution of life were unicellular, and remained so for most of the history of life on Earth. Every animal, plant, or fungus visible to the naked eye is the product of a major evolutionary shift, where multiple independent cells banded together, first into loose colonies (as bacteria do today in biofilms) and then, in certain lineages, into simple multicellular organisms (such as <strong>Volvox</strong>).</p>
<p>The exact mechanisms by which multicellularity evolved are difficult to demonstrate given the low preservation of such forms in the fossil record, but they must involve a suppression of competition between the cells which form constituent parts of the larger organism. This suppression is accomplished by many mechanisms. One of the most important is the zygotic bottleneck that most animals and plants go through, where a new organism arises from a single egg or cell, and thus the many cells of the new organism’s body are (largely) genetically identical (see the work of biologists Leo Buss and Richard Michod for more detailed descriptions of the importance of germ-line/somatic sequestration). A second mechanism is essentially punitive: the cells of most animals, including ourselves, are programmed for automatic cellular death if they attempt to leave their cooperative role and drive for their own reproduction. This anti-social escaping of the cooperative bonds is what we call “cancer.”</p>
<p>Moreover, the evolution of multicellularity is not a “frozen accident” of evolution, having occurred by chance and then locked in. If we look at cellular aggregation as the requirement for multicellularity, a <strong>conservative</strong> estimate is that it evolved independently at least 25 times. Stricter definitions of multicellularity, involving cellular communication and connection mechanisms, still show at least ten separate origins within eukaryotic organisms (once in animals, three in fungi, and six in plants). Far from being an accident of evolutionary history, not to be repeated if we could “replay the tape,” multicellularity <strong>and</strong> the mechanisms of self-control that go with it appear to be common solutions to problems of life in certain environments.</p>
<p>Everywhere we look in the natural world, we see cooperative behavior, and the success it creates within species and social groups. The lesson, for economics and political economy, should be clear: self-interest is not the only principle that underpins behavior in our social world. In fact, in the last several decades research on the detailed mechanisms by which pro-social, cooperative behavior evolves and is stabilized within human societies has exploded. It is no longer possible to know, and assemble into a list, all of the papers and studies on the subject (even if I thought you, dear reader, wanted a bibliography from me).</p>
<p>The mechanisms by which cooperation can evolve within social groups, by biological as well as cultural evolution, are many, and they go by technical names such as “indirect reciprocity,” but they boil down to some simple principles. Far from being anonymous, one-shot interactions of the kind that most simple economic theories assume (usually in the name of mathematical tractability), social life is a multi-player, repeated interaction where we gain and lose reputation based on our ability to observe how others behave, “keep score,” and when needed, mete out social sanctions against those who fail to act well. These mechanisms operate in families, friendship circles, work groups, cities and towns, and even among sets of nations.</p>
<p>We have been told, in recent days, that our society’s success depends upon putting ourselves “first” and not cooperating with a variety of other groups: that the US should not cooperate with its long-time allies worldwide, that conservatives should not compromise with liberals on a spectrum of issues from health care to immigration to education, that there is a “cold war” between urban and rural America. These conflicts are undoubtedly real, and we should not lightly dismiss their depth or severity.</p>
<p>But we should also remember that each issue has many possible means of solution, and many means by which we can achieve failure or bad outcomes. Many of the best solutions, I believe, will necessitate cooperation at least in part, and we can only achieve that by ensuring that the full set of social sanctions and mechanisms are brought to bear to ensure that cooperation can win out and give us the best solutions possible.</p>
<p>It seems appropriate to close by quoting a slightly later Darwin, in his 1871 book “The Descent of Man,” where he wrote:</p>
<blockquote>
<p>There can be no doubt that the tribe including many members who are always ready to give aid to each other, and to sacrifice themselves for the common good, would be victorious over other tribes. And this would be natural selection.</p>
</blockquote>
<p>Happy Darwin Day!</p>
<h2>Human Behavior and Evolution Society 2016 Talks (2016-06-30)</h2>
<p>I participated in two papers today at the 2016 meetings of the Human Behavior and Evolution Society, in Vancouver, B.C. The first was solo work, titled “Computational Methods for Identifying Metapopulation Interaction Patterns From Seriation Solutions.” The <a href="https://figshare.com/articles/madsen2016-hbes-computational-interaction-patterns-slides_pdf/3468650">presentation slides are on Figshare now.</a></p>
<p>The second paper was jointly done with Carl P. Lipo, and comprises work on new variations on seriation methods. The title is: “Continuity-based approaches to seriation and the study of patterns of cultural inheritance”. The <a href="https://figshare.com/articles/lipomadsen2016-hbes-continuity-inheritance_pdf/3468653">presentation slides are on Figshare now.</a></p>
<h2>Research Priorities for 2016 (2016-04-10)</h2>
<p>I am somewhat remiss in discussing research goals for the year, because of some family issues which have taken much of my time. But I’m on a flight back from Orlando from the Society for American Archaeology annual meetings, and I’ve accumulated notes and ideas over the last three days about where my research stands and what my next steps are. I also want to evaluate how I did in addressing the <a href="/essays/2015/01/01/research-priorities-2015.html">priorities I set for 2015</a>.</p>
<p>Overall, I got a lot more research done than expected given other responsibilities, but a lot less writing, which I suppose is to be expected. I was able to tuck bits of work in between caregiving duties, while it’s much harder to find blocks of time where I can write, with all of the necessary materials at hand. That remains a challenge this year, and one I need to fix since my time will be impacted on an ongoing basis.</p>
<p>I started the year understanding the nature of the final conceptual bits of my project to infer metapopulation structure (in terms of cultural transmission patterns) from diachronic seriation solutions, and at this point (early April 2016), the concepts and connections are firming rapidly into an analytic method and a set of applications. That feels really good. I’ve given an exploratory talk about this in <a href="/essays/2016/03/22/evos-seminar-series-binghamton.html">Binghamton recently in my EVoS seminar</a>, and a larger sample of models and more sophisticated analysis will be the subject of a talk at the <a href="http://www.hbes.com/hbes2016">Human Behavior and Evolution Society annual meeting</a> in Vancouver in late June.</p>
<h3 id="contents">Contents</h3>
<ol type="1">
<li>Seriation and temporal network models</li>
<li>Parallel seriationct processing completed</li>
<li>Network models for seriationct finalization</li>
<li>Likelihood of empirical seriation WRT models</li>
<li>Seriation of additional data sets</li>
<li>Relation between seriation and cladistics</li>
<li>Continuity HBES paper</li>
<li>SeriationCT HBES paper</li>
<li>Future questions</li>
</ol>
<h3 id="seriation-and-temporal-network-models">Seriation and Temporal Network Models</h3>
<p>The basic question I’m addressing is <strong>whether we can use diachronic seriation solutions, which map trait similarity across both space and time, to infer something of the topology of the temporal network formed (conceptually) by the changing interaction strength between past communities, where “interaction strength” refers to the intensity with which people migrated between communities or engaged in social learning with individuals outside their local subpopulation.</strong></p>
<p>In prototype, the answer appears to be <strong>YES</strong>. Metapopulation interaction structures that have significantly different topologies for weighted edges in an <em>interval temporal network</em> display different seriation structures, measured as the Laplacian eigenvalues of the seriation expressed as a graph (as we do in our IDSS software). I am testing differentiation of interaction network classes using a standard, high quality machine learning classification algorithm (e.g., gradient boosted trees or random forests).</p>
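<p>Concretely, the feature-extraction and classification step might look like the following sketch. This is not the IDSS implementation: the graphs, class labels, and the helper <code>sorted_laplacian_spectrum</code> are illustrative stand-ins for real seriation solutions, and it assumes networkx and scikit-learn.</p>

```python
# Sketch: classify interaction-network models from seriation graph spectra.
# Toy graphs stand in for simulated seriation solutions.
import networkx as nx
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def sorted_laplacian_spectrum(graph, k=10):
    """Return the k smallest Laplacian eigenvalues of a seriation graph,
    sorted ascending and zero-padded so feature vectors align."""
    eigenvalues = np.sort(nx.laplacian_spectrum(graph))
    padded = np.zeros(k)
    n = min(k, len(eigenvalues))
    padded[:n] = eigenvalues[:n]
    return padded

# Toy stand-ins for seriations generated under two network classes.
graphs = [nx.path_graph(8), nx.path_graph(9),
          nx.complete_graph(8), nx.complete_graph(9)]
labels = ["nearest-neighbor", "nearest-neighbor", "panmictic", "panmictic"]

X = np.array([sorted_laplacian_spectrum(g) for g in graphs])
clf = GradientBoostingClassifier(n_estimators=50).fit(X, labels)

# Predict the generating model class for a new seriation graph.
print(clf.predict([sorted_laplacian_spectrum(nx.path_graph(10))]))
```

<p>The same feature vectors feed equally well into a random forest; the point is only that the sorted spectrum gives a fixed-length numeric summary of a seriation graph.</p>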
<p>There is much unfinished business turning my early results into a solid body of scientific results and a general method. The next two priorities cover the “big” tasks, but I would be remiss if I didn’t mention the important element of ensuring that I am sampling and seriating over randomized instances of interaction network models, not just Monte Carlo samples of time averaged cultural transmission. Some networks have very little scope for randomization, such as the “panmictic” case where interaction is represented as a uniformly weighted complete temporal network (i.e., where the network at any point in time is <span class="math inline">\(K_n\)</span>). Some randomization related to community/assemblage duration could be modeled by randomizing the choices of number of slices and total simulation length, and that probably needs to be done.</p>
<p>I should also mention that randomization of the network model involved fully rewriting the post-simulation processing chain, such that we pass and use the correct network model to every step of the chain, and can associate parameters and info from each stage of the processing chain to downstream elements for analysis. That work is nearly complete (15 April target).</p>
<p>The two weightier methodological issues are discussed in their own sections. Once complete, the main computational task is to develop a reference library of simulated seriations resulting from the chosen suite of network models, across priors for both network parameters and CT simulation parameters.</p>
<h3 id="network-models-for-seriationct">Network Models for SeriationCT</h3>
<p>Right now I have the following interaction network models:</p>
<ol type="1">
<li>Lineage splitting or coalescence</li>
<li>Complete network/panmixis</li>
<li>Approximate nearest neighbor interaction</li>
</ol>
<p>This set has not been chosen because it represents the “right” set of models for any specific empirical case, but because I was developing ways of representing various topological characteristics (e.g., distance-respecting interaction, distance-insensitive interaction, large-scale splits in interaction or coalescence). It’s apparent to me that there are really two levels of topological features we might be able to examine:</p>
<ol type="1">
<li>Mesoscale connection variability: sparseness, evenness of interaction, and the decay of interaction with distance all speak to mesoscale connectivity</li>
<li>Macroscale history: the history of lineage splitting and coalescence events which give us the structure we see at very large historical scales</li>
</ol>
<p>In a mature inference method, we need an ABC reference library of seriations that includes a good spectrum of mesoscale options, expressed in whatever set of macroscale options seem most likely given our gross-scale culture-historical knowledge or previous research.</p>
<p>Thus, I am going to work on finalizing graph builders that incorporate:</p>
<ol type="1">
<li>Panmixis</li>
<li>Nearest neighbor interaction with tunable small world links</li>
<li>Hierarchical (2- and 3-tier) nearest neighbor interaction with tunable small world links</li>
</ol>
<p>Each of these graph builders should then have the ability, ultimately, to also implement a lineage split or coalescence “on top” of that mesoscale connectivity pattern.</p>
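<p>As a sketch of the mesoscale piece, the Watts-Strogatz construction in networkx gives a nearest-neighbor ring with a tunable probability of long-range “small world” links. The builder name and uniform edge weights below are hypothetical placeholders; the real seriationct builders would draw weights from priors and layer hierarchy and lineage events on top.</p>

```python
# Sketch of a nearest-neighbor graph builder with tunable small-world links.
import networkx as nx

def nearest_neighbor_builder(num_communities, neighbors=4, rewire_prob=0.0, seed=42):
    """Ring of communities, each tied to its `neighbors` nearest neighbors;
    rewire_prob > 0 rewires some edges into long-distance small-world links."""
    g = nx.watts_strogatz_graph(num_communities, neighbors, rewire_prob, seed=seed)
    # Uniform interaction weights as a placeholder; an empirical model would
    # set these from migration / social-learning intensity priors.
    nx.set_edge_attributes(g, 1.0, "weight")
    return g

pure_nn = nearest_neighbor_builder(20, neighbors=4, rewire_prob=0.0)
small_world = nearest_neighbor_builder(20, neighbors=4, rewire_prob=0.1)
```

<p>Setting <code>rewire_prob=0</code> recovers pure nearest-neighbor interaction, and pushing it toward 1 approaches distance-insensitive mixing, so one parameter spans much of the mesoscale spectrum described above.</p>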
<h3 id="likelihood-of-empirical-seriations">Likelihood of Empirical Seriations</h3>
<p>Given a reference table of seriations (seriation graph Laplacian eigenvalues, more precisely), a classifier model will always give me an answer as to which interaction model a seriation belongs to. So it’s really the full ABC inference loop that will help us figure out which of the models might be “least wrong,” with the possibility that none are very close always open. I have explored the Euclidean distance/L2 loss between the empirical seriation and the eigenvalue spectra of reference table data points, and that will be the first criterion used, although I want to fully explore Pudlo et al.’s <span class="citation" data-cites="pudlo2014abc">(2014)</span> suggestion that a two-step random forest analysis could perform better than simpler rejection or threshold methods in this situation.</p>
<h3 id="relation-between-seriation-and-cladistics">Relation Between Seriation and Cladistics</h3>
<p>This is really just a downpayment on a note, but I talked with Carl Lipo a great deal this weekend about an idea I’ve been developing, that cladistics and seriation are really separated by a “level” (sensu Dunnell <span class="citation" data-cites="Dunnell1971">(1971)</span>) distinction. Frequency seriation, whatever the ordering algorithm, takes advantage of <strong>trait polymorphism</strong> in the population to make ordering decisions, whereas standard phylogenetic methods tend towards presence/absence of binary or multivalued variables. Thus, phylogenetics operates at a coarser level of analysis (but not scale!) and makes coarser distinctions.</p>
<p>Of course, there is a small literature on polymorphic characters in seriation, but it seems to die out and there are no packages I know of that use character frequencies in tree construction. If there were, those methods would be comparable to frequency seriation.</p>
<p>So really, various types of occurrence or character-state seriation are comparable to cladistics, as “macroevolutionary” methods, and frequency seriation is “mesoevolutionary” at a finer level of analysis.</p>
<p>TODO: discuss synapomorphies in cultural traits</p>
<h3 id="continuity-hbes-paper">Continuity HBES Paper</h3>
<p>We have a start on the continuity paper already, from the SAAs. The point of the empirical example in it right now is simply to show that we get the same answers when we examine frequency data with unimodality as the ordering criterion, compared to exact distance minimization. I think perhaps the point needs to shift to demonstrating how we can do much larger data sets with continuity seriation, which is crucial for truly understanding macroevolutionary patterning, while retaining the information about polymorphism that seriation employs (and cladistics generally does not, see previous section). This would be a good place to try to look at the LMV as a whole, adding Mainfort, PFG/Lipo, and anything else we have with enough sample size (it would be good, for example, to incorporate data from Greg Fox’s dissertation, from slightly further north in SE Missouri), and even see if there are any data that would connect us up towards Cahokia.</p>
<h3 id="seriationct-hbes-paper">SeriationCT HBES Paper</h3>
<p>The goal is to actually give several empirical demonstrations, with brief descriptions of what interaction pattern (or patterns) are believed to hold in each case, and walk through the analysis to describe how it’s done, and show answers for 1-3 data sets. This might be two chunks of Mississippian that have different local interaction patterns, and the Woodland data. I need to think more carefully about the difference between interaction expectations for Woodland vs. Late Prehistoric, since the dispersed vs nucleated issue will bear on what suite of models we examine for each. I may not be at a point where I can develop the reference library of seriation data for Woodland dispersed communities yet, and perhaps I need to focus on 1-3 different late expressions.</p>
<h3 id="future-questions">Future Questions</h3>
<p>These are questions which arise when we describe the regional structure of social learning and cultural transmission using interval temporal networks as the representation for mesoscale relationships, and employ seriation graphs and their statistics as the data to infer the class of ITN.</p>
<ol type="1">
<li>What effect does the mean duration of assemblages (compared to the total span of time) have on our ability to accurately classify seriations as to network model?</li>
<li>How does classification accuracy scale with the number of assemblages available, and the scheme by which they were sampled? (Some of this may be anecdotally necessary to look at PFG, but a systematic computational analysis of the scaling can wait.)</li>
<li>How does assemblage sample size in concert with innovation rates affect classification accuracy?</li>
</ol>
<h3 id="references-cited" class="unnumbered">References Cited</h3>
<div id="refs" class="references">
<div id="ref-Dunnell1971">
<p>Dunnell, Robert C. 1971. <em>Systematics in Prehistory</em>. New York: Free Press.</p>
</div>
<div id="ref-pudlo2014abc">
<p>Pudlo, Pierre, Jean-Michel Marin, Arnaud Estoup, Jean-Marie Cornuet, Mathieu Gautier, and Christian P. Robert. 2014. “ABC Model Choice via Random Forests.” <em>arXiv preprint arXiv:1406.6288</em>.</p>
</div>
</div>
<h2>Presentation and Paper for SAA 2016 - Measuring Cultural Relatedness Using Multiple Seriation Ordering Algorithms (2016-04-07)</h2>
<p>The Society for American Archaeology meetings are coming up in Orlando, and I’ll be participating in a session called:</p>
<blockquote>
<p>Evolutionary Archaeologies: New Approaches, Methods, and Empirical Sufficiency</p>
</blockquote>
<p>along with a number of colleagues. We opted for the “electronic symposium” option this year, which is a slightly confusing description. Instead of presenting or reading a full paper, we submit papers in advance, which are posted online (the “electronic” part). At the conference, we each get a few minutes to re-summarize our work for the audience to ensure that everyone is up to speed, and then we have Q&A and discussion for most of the time.</p>
<p>The slides I’ll use on Saturday for summarizing the work are located <a href="https://github.com/mmadsen/saa2016-multiple-seriation-algorithms/blob/master/presentation/madsenlipo2016-saa-continuity-seriation.pdf">in the presentation directory from the Github repository</a>.</p>
<p>You can read the <a href="https://github.com/mmadsen/saa2016-multiple-seriation-algorithms/blob/master/pdf-drafts/saa2016-seriation-multiple-approaches.pdf">conference draft of our paper from the Github repository</a>. Comments welcome; this will be submitted for publication after expansion and revisions in the next few months, so any suggested improvements are greatly appreciated. We will likely incorporate several other data examples in the final version after getting permission from the folks who collected the data sets.</p>
<h2>Next Steps for Classifying Seriations to Temporal Network Models (2016-03-22)</h2>
<h3 id="where-things-stand">Where Things Stand</h3>
<p>Initial experiments are promising when using the sorted Laplacian spectrum as the features for building a classifier model. Even with small samples, the results seem to show the following:</p>
<ul>
<li>We can tell lineage splitting from a complete network from a probabilistic nearest neighbor model</li>
<li>We can’t tell PNN models from each other given different shapes (aspect ratio) to the region</li>
</ul>
<p>This comes from building a multi-class GB tree model from <code>sc-1</code>, <code>sc-3</code>, <code>sc-4-nn</code>, and <code>sc-4-nn</code>, and predicting the data generating model from a 10% holdout set.</p>
<p>The classifier results hold pretty steady in a qualitative sense regardless of the random train/test split.</p>
<p>What doesn’t hold steady is the prediction and class probabilities for the PFG continuity graph. I get different answers depending upon the train/test split, which is probably a function of:</p>
<ul>
<li>Insufficient diversity in the network models used in simulation – I need many examples of each network model</li>
<li>Sample size overall</li>
</ul>
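<p>One way to quantify this instability is to refit the classifier across many random splits and count how often the prediction for a single target spectrum changes. The sketch below uses toy data; the feature arrays and the <code>sc-*</code> labels are illustrative stand-ins for the real reference table and the PFG spectrum.</p>

```python
# Sketch: measure how stable a single prediction is across random
# train/test splits of the reference data.
import numpy as np
from collections import Counter
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
X = rng.random((120, 10))                       # toy stand-in for reference spectra
y = rng.choice(["sc-1", "sc-3", "sc-4-nn"], 120)
target = rng.random((1, 10))                    # toy stand-in for the PFG spectrum

predictions = []
for seed in range(20):
    X_train, _, y_train, _ = train_test_split(X, y, test_size=0.1, random_state=seed)
    clf = GradientBoostingClassifier(n_estimators=25, random_state=0).fit(X_train, y_train)
    predictions.append(clf.predict(target)[0])

# If the modal class covers nearly all splits, the prediction is stable;
# a fragmented count is the instability described above.
stability = Counter(predictions)
print(stability.most_common())
```

<p>Class probabilities from <code>predict_proba</code> could be averaged over splits the same way, which would smooth out the boundary-shifting effect rather than just diagnose it.</p>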
<p>Given that there isn’t much overlap in the overall classification itself, my guess is that if we could look at this in the 10 dimensional space of the eigenvalues used, we would see that:</p>
<ul>
<li>The PFG seriation is actually not deeply embedded in the convex hull of points for any of the classes, but is near the edge of several or even all of them</li>
<li>The decision boundaries shift substantially in the region where the PFG seriation is located</li>
</ul>
<p>Given this, a different train/test split could shift a decision boundary very slightly, without having a major impact on the overall confusion matrix among models, and thus change the predicted assignment for the PFG sample.</p>
<p>We might be able to visualize something like the above by using a dimensionality reduction technique, mapping the models against, say, the first 3 principal components, and then putting PFG on the map. Worth a try.</p>
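<p>The mapping idea can be sketched in pure numpy; the data and names below are hypothetical stand-ins, and the PCA-by-SVD approach is one assumed way to do the projection.</p>

```python
import numpy as np

# Sketch: project the eigenvalue feature vectors for all simulated
# models onto the first three principal components, then map the PFG
# seriation's spectrum into the same space to see where it falls.
def pca_project(X, extra, k=3):
    """Return the k-component PCA scores of rows of X and of extra
    points, using principal axes fit on X alone."""
    mu = X.mean(axis=0)
    # Right singular vectors of the centered data are the principal axes
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    W = Vt[:k].T
    return (X - mu) @ W, (extra - mu) @ W

rng = np.random.default_rng(1)
spectra = rng.normal(size=(40, 10))   # stand-in for the model spectra
pfg = rng.normal(size=(1, 10))        # stand-in for the PFG spectrum
scores, pfg_scores = pca_project(spectra, pfg)
```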
<h3 id="computational-next-steps">Computational Next Steps</h3>
<p>But removing this issue and getting stable predictions for PFG is going to be a function of:</p>
<ul>
<li>More classes of network models, since PFG might not really be well described by any of the existing ones</li>
<li>More network models per class</li>
<li>More samples of simulations per network model (or model class)</li>
</ul>
<p>While I develop more network models, I will probably start doing the second and third for the existing four models, but with PNN models collapsed down to a single model. I don’t have the formal infrastructure yet for doing multiple realizations of a single network model, so that’s the first step.</p>Mark E. MadsenWhere Things StandEVoS Seminar Series Talk at Binghamton University2016-03-22T00:00:00-07:002016-03-22T00:00:00-07:00http://notebook.madsenlab.org/essays/2016/03/22/evos-seminar-series-binghamton<p>Yesterday I was out in Binghamton, giving a talk in the <a href="http://binghamton.edu/evos/">Evolutionary Studies Program</a> seminar series. Here is a link to the <a href="http://binghamton.edu/evos/seminar-series/2016spring/madsen.html">talk flyer</a>.</p>
<p>I’ve posted the slides (with speaker notes) on Figshare: <a href="https://dx.doi.org/10.6084/m9.figshare.3121420" class="uri">https://dx.doi.org/10.6084/m9.figshare.3121420</a>.</p>
<p>Here’s the abstract for the talk:</p>
<blockquote>
<p>Evolutionary modeling of cultural transmission and cultural change has grown over the past 25 years from a handful of biologists and social scientists to a major interdisciplinary program involving researchers across the social sciences, cognitive and computer science, biology, and even physics. Formal models of cultural transmission and evolution have proliferated, but major challenges exist in testing transmission models against real-world data. Difficulties exist even when we have individual-level observations. The challenge is even more profound when the only data we have on a cultural or economic phenomenon come in aggregate form: where our observations refer to groups of people, blocks of time, or both. After examining how aggregated data foil our efforts at inferring the parameters of evolutionary models or accurately choosing between models, I advocate for matching aggregate data with higher-level models and research questions. I demonstrate how aggregate data from cultural transmission simulations can accurately discriminate between macroevolutionary transmission models, giving us the ability to understand large-scale transmission phenomena even while micro-scale causation remains obscure.</p>
</blockquote>
<h3 id="video">VIDEO</h3>
<p>Update: <a href="https://bustream.binghamton.edu:8443/ess/echo/presentation/ffb57417-88f3-439f-8095-7b0983d954d3?ec=true">The video and slides are now available online!</a></p>Mark E. MadsenYesterday I was out in Binghamton, giving a talk in the Evolutionary Studies Program seminar series. Here is a link to the talk flyer.Intentionality and Cultural Evolution - Towards a Generalized Learning Theory Account2016-03-03T00:00:00-08:002016-03-03T00:00:00-08:00http://notebook.madsenlab.org/essays/2016/03/03/cultural-evolution-intentionality-pac-learning<p>(<strong>the following is a continuation and completion of a <a href="http://notebook.madsenlab.org/cultural%20transmission/theory/essays/2013/06/13/gabora2013-response-notes.html">post begun in 2013</a>, stimulated by a recent article by D.S. Wilson</strong>)</p>
<p>Among those who object to framing cultural evolution as a Darwinian theory, one of the most important reasons for objection is the evident importance of intentionality in human behavior (and among many animal species). Liane Gabora has perhaps been one of the most persistent advocates that while culture evolves, it does so by mechanisms other than natural selection, since natural selection requires variation to be “random” <span class="citation" data-cites="Gabora2013a">(Gabora 2013)</span>.</p>
<h3 id="generation-of-variation-in-darwinian-processes">Generation of Variation in Darwinian Processes</h3>
<p>Or does it? Lipo and I argued in response that most modern commentators are overinterpreting the term “random” in the core Darwinian paradigm: that the generation of variation is merely <strong>causally unprivileged</strong> with respect to the differential persistence of that variation <span class="citation" data-cites="Madsen2013a">(Madsen and Lipo 2013)</span>. David Sloan Wilson, in a recent and very clear article on intentional cultural change <span class="citation" data-cites="Wilson:2016bc">(Wilson 2016)</span>, makes the same point very crisply:</p>
<blockquote>
<p>In the standard portrayal of genetic evolution, mutations occur that are arbitrary with respect to their consequences for survival and reproduction (fitness). Those that enhance fitness increase in frequency until they become species-typical. The word ‘arbitrary’ rather than ‘random’ in the previous sentence is deliberate. If a mutation is random, then it results in a new phenotype that deviates from the previous phenotype in any direction with equal probability. The standard portrayal of genetic evolution does not assume that mutations are random in this sense. Instead, the assumption is that mutations do not anticipate the phenotypes that are favored by natural selection. This is the meaning of the word ‘arbitrary’.</p>
</blockquote>
<p>In fact, we can look at variation-generating mechanisms along several dimensions, as shown in this diagram. Variation may be “undirected” or “directed.” Directed variations are innovations or errors that occur when a variation-generation mechanism targets specific components of the genome or cultural repertoire. In contrast, undirected variations are errors or innovations which occur in some random component of the genome or cultural repertoire.</p>
<figure>
<img src="/images/variation-classes.png" alt="Classes of Variation" /><figcaption>Classes of Variation</figcaption>
</figure>
<p>The classic example of undirected variation is the cosmic ray streaking through a cell nucleus, causing damage to DNA which results in “flipping” one or more nucleotides during the repair process. This is the archetype for what nearly everyone means when they talk about “blind variation” in Darwinian evolution. In cultural evolution, a person copying an artifact will make motor and perceptual errors in copying, which are often random with respect to which physical dimension they occur upon <span class="citation" data-cites="Eerkens2005">(Eerkens and Lipo 2005)</span>.</p>
<p>But we now also understand that there are many mechanisms whereby variation can be generated in a “directed” fashion. Environmental stress seems to have a variety of effects on the rate and location of mutations in the genome, and there are a variety of epigenetic mechanisms whereby temporary changes in gene expression can be inherited by the next generation, and thus permanently affect a lineage.</p>
<p>The important thing to understand about such “directed” mechanisms for generating variation is that while they are <strong>targeted</strong>, they are <strong>unprivileged</strong> with respect to knowing how selection will ultimately filter the results of their generation. Causal arrows run one direction, and variation is always generated <strong>before</strong> selection affects the frequency of variants. Genomic mechanisms which increase the mutation rate in selected regions of the genome are certainly the products of <strong>past</strong> selection, but there is no causal arrow which gives them information about how the results of their action will fare in <strong>future</strong> survival and reproduction.</p>
<p>And this, really, points to the way out of the issue. There is simply no requirement that variation be “random” with respect to…anything. The generation of variation is simply causally uncoupled, or unprivileged, from the “judging” of its fitness or utility down the road. The “two step process” of Darwinian evolution is <strong>defined</strong> by that uncoupling.</p>
<h3 id="intentionality">Intentionality</h3>
<p>Which leads us to the thorny issue of “intentional” behavior in a mechanistic, Darwinian theory. My mentor, Robert Dunnell, was vehemently opposed to the inclusion of intentionality in any scientific theory of cultural change, for a variety of reasons that were correct. Mostly, the objection to intentionality concerns the causal role it plays in most social sciences, short-circuiting the “two step” process and asserting that change is often a “one step” process whereby people perceive a problem, choose the best solution to deal with it, and implement that solution. Theories built on this kind of logic include variations of rational actor models in economics, overly simplistic versions of adaptationist evolutionary ecology, public choice theory, and of course a long list of unilinear, progressivist, vitalist, and Lamarckian models of cultural change in anthropology. All of these have been instrumental at various times in preventing us from constructing testable, scientific accounts of cultural change.</p>
<p>Which presents us with a seeming conundrum. Clearly, humans and many animals exhibit behavior which is “intentional,” and we and other species evolved this capability over a broad span of time (given its taxonomic breadth) by natural selection acting upon variation generated by various directed and undirected mechanisms. But equally clearly, intentional behavior cannot be a “shortcut” around the two step process of variation and selection, since like all other variation, our intentions and strategic planning are causally prior to the outcome of their expression, and often long prior to their downstream effects on our reproductive success, survival, and our ability to spread our ideas and cultural norms.</p>
<p>In his recent article, David Sloan Wilson <span class="citation" data-cites="Wilson:2016bc">(Wilson 2016)</span> discussed a number of mechanisms by which intentional behavior has probably evolved in humans and other lineages. The adaptive nature of the vertebrate immune system, for example, is a paradigm example of an adaptive, open-ended process; it is, in fact, a selection process within a selection process. Wilson also describes operant conditioning and explicit decision-making.</p>
<p>It is worth trying to give a general account of such mechanisms, however, because it may be possible to unify many kinds of “directed” variation mechanisms, including those involved in intentional behavior, and understand what they share in common. The natural framework for unifying such examples is statistical learning theory, which attempts to describe a generalized framework whereby accurate models can be inferred by exposure (in some fashion) to data <span class="citation" data-cites="kearns1994introduction">(M. J. Kearns and Vazirani 1994)</span>.</p>
<p>It is becoming somewhat fashionable to make connections between learning theory and directed mechanisms in evolution (for example, see <span class="citation" data-cites="Power:2015cc">(Power et al. 2015)</span> ) after the recent book by Leslie Valiant, one of the founders of formalized statistical learning theory <span class="citation" data-cites="valiant2013probably">(Valiant 2013)</span>. Valiant’s model, PAC learning, provides a broad guarantee that we can build a statistical model capable of discriminating instances of a target distribution (or “concept” in machine learning). The nature of that guarantee is that, with enough exposure to training samples, we can select a hypothesis with low generalization error (the “approximately correct” part), with high probability (the “probably” part). PAC learning formally underlies many, but not all, “supervised” learning methods in statistics and machine learning, including much regression modeling and various classification and pattern recognition methods.</p>
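<p>For concreteness, one textbook form of the guarantee (assuming a finite hypothesis class <span class="math inline">\(\mathcal{H}\)</span> and a learner that outputs a hypothesis consistent with the training sample; Valiant’s treatment is more general) states that</p>
<p><span class="math display">\[m \geq \frac{1}{\epsilon}\left(\ln |\mathcal{H}| + \ln \frac{1}{\delta}\right)\]</span></p>
<p>labeled examples suffice for the learner to select, with probability at least <span class="math inline">\(1-\delta\)</span> (the “probably” part), a hypothesis whose generalization error is at most <span class="math inline">\(\epsilon\)</span> (the “approximately correct” part).</p>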
<h3 id="intentional-behavior-and-learning-theories">Intentional Behavior and Learning Theories</h3>
<p>But PAC learning is not a “universal” learning theory, as Valiant notes. The basic PAC learning model applies to situations where:</p>
<ul>
<li>The learner receives access to data, in the form of measurements of some number of features or covariates, and a “label” indicating to which model or target class that set of covariates belongs</li>
<li>The label given can be taken as accurate, and not associated with noise or error</li>
<li>The learner does not direct the generation of the data, but accepts labeled examples as given</li>
</ul>
<p>As Valiant describes in his book, this kind of process doesn’t directly underlie most examples in genetic evolution. PAC learning, self-evidently, does underlie some types of cultural learning, since it and the statistical algorithms that implement it are cultural constructions, built by humans to aid in understanding complex aspects of their environment.</p>
<p>But the framework is too restrictive to cover all learning in cultural contexts, and thus most of the ways in which humans formulate intentions for action on the basis of information gathered. In fact, each of the restrictions above can be relaxed, and in doing so, result in different learning models.</p>
<p>In discussing the relation between learning theory and biological evolution, Valiant correctly focuses upon relaxing the first requirement: that the learner see the detailed data. One way to frame biological fitness within learning theory is to treat classes of genomes as “queries” that populations make into the environment, which responds with <strong>summary</strong> data: average length of survival, average number of offspring for individuals with that class of genome. The population evolves by aggregating this feedback in the form of differential persistence of seemingly successful phenotypes.</p>
<p>The learning framework just described is a modification of PAC learning by Kearns called “statistical query learning” <span class="citation" data-cites="Kearns:1998:ENL:293347.293351">(M. Kearns 1998)</span>, and while the interpretation of fitness as statistical queries against the environment might sound like a bit of a stretch, there are many behavioral contexts which fit such a model nicely. Individuals learning a skill, for example, might make attempts and observe the results, and modify their next attempt accordingly. Rarely will individuals have detailed information about the various contributing factors leading to the outcomes, especially in a complex activity such as hunting or making stone tools. Moreover, the same actions and tools may lead to differing outcomes on different trials, leading to only summary information about the outcome of an action or tool on average.</p>
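<p>A toy sketch of the statistical query setting may help fix ideas; the example and names are mine, not Kearns’ formalism. The essential point is that the learner never sees individual labeled examples, only estimates of population averages, each accurate to within a stated tolerance.</p>

```python
import random

# Toy sketch of a statistical-query oracle (illustrative only): answer
# a query with the mean of a 0/1 statistic over the population,
# perturbed by noise of magnitude at most `tolerance`. The learner sees
# only these aggregates, never the individual examples.
def sq_oracle(population, statistic, tolerance, rng=random):
    true_mean = sum(statistic(x) for x in population) / len(population)
    return true_mean + rng.uniform(-tolerance, tolerance)

# e.g., "querying" the environment about a class of hunting strategy:
# each trial records only whether the attempt succeeded
hunts = [{"strategy": "ambush", "success": i % 3 == 0} for i in range(90)]
estimate = sq_oracle(hunts, lambda h: h["success"], tolerance=0.05)
```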
<p>But humans do more than simply learn by aggregating data; we can guide the process of data collection and learning to improve performance. In situations where we can make trials, and then consult an “expert” for feedback, we can tune our next trial based upon the feedback, and repeat the loop. Much of human learning and the entire “apprenticeship” model for learning complex skills is based on this kind of model. In formal terms, this is “active learning,” which is the subject of quite active study within machine learning and statistics <span class="citation" data-cites="Jamieson:2015vp">(Jamieson, Jain, and Fernandez 2015)</span>. Of particular interest is a recent NIPS conference paper which studied active learning when learners have access to both “strong” and “weak” labelers. A strong labeler is very accurate at providing feedback but expensive to consult; weak labelers are cheap to consult but occasionally inaccurate. Examples of each in a human context might be asking the master craftsman for feedback rather than students slightly more advanced than oneself, or hiring a skilled attorney rather than Googling for an answer. Zhang and colleagues find, of course, that there are situations where consulting a mix of strong and weak experts can result in highly accurate learning <span class="citation" data-cites="Zhang:2015wz">(Zhang and Chaudhuri 2015)</span>. This should be unsurprising, because the essential features of multiple-oracle active learning are in play in most human learning environments (e.g., schools, medical residency programs, craft apprenticeships, legal associate programs, and so on).</p>
<p>Furthermore, learning models must account for structured information. The learning task is rarely simply to recognize or predict a single type of distribution or concept, but instead is multi-stage, with stages building upon one another. Real knowledge has prerequisites, and structure, and we can easily fail to learn a skill if we do not yet have a solid grounding in the information and skills that come before. This will have marked effects on our cultural transmission models, and require close collaboration with learning theorists, but should yield much richer analyses of technological change in particular <span class="citation" data-cites="Madsen2015">(Madsen and Lipo 2015)</span>.</p>
<p>Finally, only occasionally do we learn in a focused way where we are regularly getting labeled feedback, from whatever source. Much of our learning about the world comes as a mixture of data points, some of which arrive with feedback and many of which do not. Such situations fall within so-called “semi-supervised” learning. Or we get batched feedback, where a single evaluation covers a number of different trials which may vary in subtle ways <span class="citation" data-cites="Settles:2008wh">(Settles, Craven, and Ray 2008)</span>. And we often learn not by exploring an entire space of possibilities, but by actively looking for the most “informative” or representative portions of that space of examples <span class="citation" data-cites="Huang:2010uj">(Huang, Jin, and Zhou 2010)</span>.</p>
<h3 id="discussion">Discussion</h3>
<p>The basic point is that many variations on statistical learning models will be applicable in understanding how humans learn, both from the environment and by cultural transmission and teaching. Note, however, that in none of the models does the learner understand the ultimate consequences of incremental steps. Even in active learning models where the learner can take past knowledge to query for specific and informative data samples to guide future action, the progress in accuracy is local and stochastic; overfitting is still a concern and it is still impossible to know ahead of time what the ultimate “generalization error” of one’s strategies will be.</p>
<p>Cultural evolution is, undoubtedly, a mixture of intentional and unintentional processes, of unthinkingly copying someone else but also deeply studying carefully chosen mentors and experts. There is room in our theories of cultural evolution for pure diffusion processes that look very much like epidemiological or simple population genetic models, and processes that draw deeply from cognitive science, childhood development, and rich ethnography for their details. There is even room for the subtle combinatorial cognitive processes that lead to “creativity” and true invention.</p>
<p>All of these, and more, are understandable without exiting the Darwinian paradigm, and various theories of statistical learning promise to play a large role in extending Darwinian evolution to those intentional processes. But intentionality, like creativity, is not a sign that natural selection cannot act on culture, or that culture is not Darwinian. The latter is a continuing misconception that largely stems from overinterpreting what “random variation” means in the evolutionary context.</p>
<h3 id="references-cited" class="unnumbered">References Cited</h3>
<div id="refs" class="references">
<div id="ref-Eerkens2005">
<p>Eerkens, J.W., and C.P. Lipo. 2005. “Cultural Transmission, Copying Errors, and the Generation of Variation in Material Culture and the Archaeological Record.” <em>Journal of Anthropological Archaeology</em> 24 (4). Elsevier: 316–34.</p>
</div>
<div id="ref-Gabora2013a">
<p>Gabora, Liane. 2013. “An Evolutionary Framework for Cultural Change: Selectionism Versus Communal Exchange.” <em>Physics of Life Reviews</em> 10 (2): 117–45. doi:<a href="https://doi.org/10.1016/j.plrev.2013.03.006">10.1016/j.plrev.2013.03.006</a>.</p>
</div>
<div id="ref-Huang:2010uj">
<p>Huang, S J, R Jin, and Z H Zhou. 2010. “Active learning by querying informative and representative examples.” <em>Advances in Neural Information …</em>. <a href="http://papers.nips.cc/paper/4176-active-learning-by-querying-informative-and-representative-examples" class="uri">http://papers.nips.cc/paper/4176-active-learning-by-querying-informative-and-representative-examples</a>.</p>
</div>
<div id="ref-Jamieson:2015vp">
<p>Jamieson, K G, L Jain, and C Fernandez. 2015. “NEXT: A System for Real-World Development, Evaluation, and Application of Active Learning.” <em>Advances in Neural …</em>. <a href="http://papers.nips.cc/paper/5868-next-a-system-for-real-world-development-evaluation-and-application-of-active-learning" class="uri">http://papers.nips.cc/paper/5868-next-a-system-for-real-world-development-evaluation-and-application-of-active-learning</a>.</p>
</div>
<div id="ref-Kearns:1998:ENL:293347.293351">
<p>Kearns, Michael. 1998. “Efficient Noise-Tolerant Learning from Statistical Queries.” <em>J. ACM</em> 45 (6). New York, NY, USA: ACM: 983–1006. doi:<a href="https://doi.org/10.1145/293347.293351">10.1145/293347.293351</a>.</p>
</div>
<div id="ref-kearns1994introduction">
<p>Kearns, Michael J, and Umesh Virkumar Vazirani. 1994. <em>An Introduction to Computational Learning Theory</em>. MIT press.</p>
</div>
<div id="ref-Madsen2013a">
<p>Madsen, Mark E., and Carl P. Lipo. 2013. “Saving Culture from Selection: Comment on an Evolutionary Framework for Cultural Change: Selectionism Versus Communal Exchange, by L. Gabora.” <em>Physics of Life Reviews</em> 10 (2): 149–50. doi:<a href="https://doi.org/10.1016/j.plrev.2013.03.008">10.1016/j.plrev.2013.03.008</a>.</p>
</div>
<div id="ref-Madsen2015">
<p>———. 2015. “Behavioral Modernity and the Cultural Transmission of Structured Information: The Semantic Axelrod Model.” In <em>Learning Strategies and Cultural Evolution During the Palaeolithic</em>, edited by Alex Mesoudi and Kenichi Aoki, 67–83. Replacement of Neanderthals by Modern Humans Series. Springer Japan. doi:<a href="https://doi.org/10.1007/978-4-431-55363-2_6">10.1007/978-4-431-55363-2_6</a>.</p>
</div>
<div id="ref-Power:2015cc">
<p>Power, Daniel A, Richard A Watson, Eörs Szathmáry, Rob Mills, Simon T Powers, C Patrick Doncaster, and Blazej Czapp. 2015. “What can ecosystems learn? Expanding evolutionary ecology with learning theory.” <em>Biology Direct</em>, December, 1–24. doi:<a href="https://doi.org/10.1186/s13062-015-0094-1">10.1186/s13062-015-0094-1</a>.</p>
</div>
<div id="ref-Settles:2008wh">
<p>Settles, B, M Craven, and S Ray. 2008. “Multiple-instance active learning.” <em>Advances in Neural Information …</em>. <a href="http://papers.nips.cc/paper/3252-multiple-instance-active-learning" class="uri">http://papers.nips.cc/paper/3252-multiple-instance-active-learning</a>.</p>
</div>
<div id="ref-valiant2013probably">
<p>Valiant, Leslie. 2013. <em>Probably Approximately Correct: Nature’s Algorithms for Learning and Prospering in a Complex World</em>. Basic Books.</p>
</div>
<div id="ref-Wilson:2016bc">
<p>Wilson, David Sloan. 2016. “Intentional cultural change.” <em>Current Opinion in Psychology</em> 8 (April): 190–93. doi:<a href="https://doi.org/10.1016/j.copsyc.2015.12.012">10.1016/j.copsyc.2015.12.012</a>.</p>
</div>
<div id="ref-Zhang:2015wz">
<p>Zhang, C, and K Chaudhuri. 2015. “Active learning from weak and strong labelers.” <em>Advances in Neural Information Processing …</em>. <a href="http://papers.nips.cc/paper/5988-active-learning-from-weak-and-strong-labelers" class="uri">http://papers.nips.cc/paper/5988-active-learning-from-weak-and-strong-labelers</a>.</p>
</div>
</div>Mark E. Madsen(the following is a continuation and completion of a post begun in 2013, stimulated by a recent article by D.S. Wilson)Limits of model resolution for seriation classification2016-02-22T00:00:00-08:002016-02-22T00:00:00-08:00http://notebook.madsenlab.org/project-coarse%20grained%20model/model-seriationct/experiment-experiment-seriation-classification/2016/02/22/equifinality-model-variants-seriation-classification<h3 id="model-resolution-and-equifinality">Model Resolution and Equifinality</h3>
<p>Experiment <code>sc-2</code> was designed to examine the opposite question from <code>sc-1</code>; that is, when do we lose the ability to distinguish between regional interaction models by examining the structure of seriations from cultural traits transmitted through those interaction networks? This is a question of equifinality of models: do different models have empirical consequences which are indistinguishable given a particular observation technique?</p>
<p>To test this, I set up four models which I believe to be very “close” to each other:</p>
<ol type="1">
<li>Lineage splitting where 1 lineage turns into 2 lineages, the split occurring 30% of the way through the time sequence (“early split”)</li>
<li>Lineage splitting where 1 <span class="math inline">\(\rightarrow\)</span> 2 lineages, split occurring 70% of the way through the time sequence (“late split”)</li>
<li>Lineage coalescence where 2 lineages turn into a single lineage, the event occurring 30% of the way through the sequence (“early coalescence”)</li>
<li>Lineage coalescence where 2 <span class="math inline">\(\rightarrow\)</span> 1 lineages, the event occurring 70% of the way through the time sequence (“late coalescence”)</li>
</ol>
<p>In all other respects, simulation of cultural transmission across these regional networks was identical, using the same prior distributions for innovation and migration rates, population sizes, and so on.</p>
<p>My expectation going in was that the lineage splitting and coalescence models should generate seriations which are almost indistinguishable from one another, except for their temporal orientation, and with the paired early/late comparisons, it may be difficult to tell any of these models from one another without additional feature information. In particular, I expected roughly chance performance on classification unless we could provide temporal orientation, and even then, we may only be able to tell coalescence from splitting models.</p>
<h3 id="initial-sc-2-analysis">Initial SC-2 Analysis</h3>
<p>The analysis of <code>sc-2</code> followed the method used in the <a href="http://notebook.madsenlab.org/project-coarse%20grained%20model/model-seriationct/experiment-experiment-seriation-classification/2016/02/16/feature-engineering-seriation-classification.html">second trial of <code>sc-1</code></a>, calculating the Laplacian eigenvalue spectrum of the final seriation solution graphs (specifically, the <code>minmax-by-weight</code> solutions for continuity seriation), and using the sorted eigenvalues as features for a gradient boosted classifier.</p>
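<p>The spectral feature extraction can be sketched as follows; this is my reconstruction in pure numpy, not the project’s actual code, and the zero-padding convention for graphs of different sizes is an assumption.</p>

```python
import numpy as np

# Sketch of the feature extraction: sorted eigenvalues of the graph
# Laplacian L = D - A, zero-padded to a fixed length k so seriation
# solutions with different numbers of assemblages yield comparable
# feature vectors (the padding scheme is assumed).
def sorted_laplacian_spectrum(adj, k=10):
    adj = np.asarray(adj, dtype=float)
    L = np.diag(adj.sum(axis=1)) - adj          # Laplacian of the graph
    eig = np.sort(np.linalg.eigvalsh(L))[::-1]  # eigenvalues, descending
    features = np.zeros(k)
    features[:min(k, eig.size)] = eig[:k]
    return features

# Path graph on three vertices: its Laplacian spectrum is 3, 1, 0
path3 = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
spectrum = sorted_laplacian_spectrum(path3)
```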
<p>The initial results seem indicative of real trouble telling these models apart. For guidance, class labels are as follows:</p>
<ol start="0" type="1">
<li>Early splitting</li>
<li>Early coalescence</li>
<li>Late split</li>
<li>Late coalescence</li>
</ol>
<p>Given a hold-out test set, we see the following performance:</p>
<pre><code>             predicted 0  predicted 1  predicted 2  predicted 3
actual 0               3            2            0            0
actual 1               1            1            0            0
actual 2               0            0            2            2
actual 3               0            0            4            5

             precision    recall  f1-score   support

          0       0.75      0.60      0.67         5
          1       0.33      0.50      0.40         2
          2       0.33      0.50      0.40         4
          3       0.71      0.56      0.63         9

avg / total       0.61      0.55      0.57        20

Accuracy on test: 0.550</code></pre>
<p>The overall accuracy is low, but the pattern of misclassifications is key here. We never see “early” models misclassified as “late” models, but we do see splitting misclassified as coalescence (possibly because we have no orienting information). So while overall accuracy is low, we actually have perfect discrimination along one dimension of the models: when events that alter lineage structure occur, in relative terms. Not bad, considering how “close” in structure these regional interaction models are.</p>
<h3 id="optimizing-classification-performance">Optimizing Classification Performance</h3>
<p>The above was conducted with “reasonable” hyperparameters for the gradient boosted classifier, but I want to understand our best performance in separating these models. This is accomplished by setting the hyperparameters through cross-validation. In this case, I used a grid search over the following values for the learning rate and the number of boosting rounds (number of estimators):</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python"> <span class="co">'clf__learning_rate'</span>: [<span class="fl">5.0</span>,<span class="fl">2.0</span>,<span class="fl">1.0</span>, <span class="fl">0.75</span>, <span class="fl">0.5</span>, <span class="fl">0.25</span>, <span class="fl">0.1</span>, <span class="fl">0.05</span>, <span class="fl">0.01</span>, <span class="fl">0.005</span>],
<span class="co">'clf__n_estimators'</span>: [<span class="dv">10</span>,<span class="dv">25</span>,<span class="dv">50</span>,<span class="dv">100</span>,<span class="dv">250</span>,<span class="dv">500</span>,<span class="dv">1000</span>]</code></pre></div>
<p>Using 5-fold cross validation, this produced 350 different fits of the classifier, with the following results:</p>
<pre><code>Best score: 0.593
Best parameters:
param: clf__learning_rate: 1.0
param: clf__n_estimators: 50</code></pre>
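<p>As a sanity check on the count: the 350 fits follow directly from the size of the parameter grid times the number of folds, since grid-search cross-validation fits the model once per parameter combination per fold.</p>

```python
from itertools import product

# The grid from the text: 10 learning rates x 7 estimator counts
learning_rates = [5.0, 2.0, 1.0, 0.75, 0.5, 0.25, 0.1, 0.05, 0.01, 0.005]
n_estimators = [10, 25, 50, 100, 250, 500, 1000]
k_folds = 5

# Every (learning rate, estimator count) pair is fit once per CV fold
param_grid = list(product(learning_rates, n_estimators))
total_fits = len(param_grid) * k_folds  # 70 combinations x 5 folds = 350
```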
<p>Some improvement is seen on overall training set accuracy, but the real surprise is test performance on the hold-out data, using the optimal hyperparameters:</p>
<pre><code>             predicted 0  predicted 1  predicted 2  predicted 3
actual 0               3            2            0            0
actual 1               1            1            0            0
actual 2               0            0            3            1
actual 3               0            0            3            6

             precision    recall  f1-score   support

          0       0.75      0.60      0.67         5
          1       0.33      0.50      0.40         2
          2       0.50      0.75      0.60         4
          3       0.86      0.67      0.75         9

avg / total       0.71      0.65      0.66        20

Accuracy on test: 0.650</code></pre>
<p>Overall accuracy is greatly improved, which is unusual (normally I would expect test accuracy to be less than training accuracy, but the test set is small). But we can see that we improved mainly because of our ability to predict classes 2 and 3, although the overall pattern is still the same: we have misclassification within early and within late, but perfect discrimination between the two.</p>
<h3 id="summary">Summary</h3>
<p>Given how close these models were, I expected to have great trouble in identifying them from seriations. What I found is that I can identify one dimension of the model class (early versus late) with great accuracy, and the other dimension (coalescence versus splitting) with much less accuracy. This leads me to suspect that I could predict both dimensions with very high accuracy if I were to find a way to encode temporal orientation as a feature.</p>
<p>I will pursue this, since we often have at least <strong>some</strong> information on temporal orientation, perhaps by knowing that one assemblage in a set is much earlier or later than the rest. The challenge is finding a way to provide this kind of hint for the synthetic seriation graphs. More soon on this.</p>
<h3 id="resources">Resources</h3>
<p><a href="http://nbviewer.jupyter.org/gist/anonymous/f6a18712ee1077d1a329">Full analysis notebook</a> on NBViewer, from the Github repository.</p>
<p>Github Repository: <a href="https://github.com/mmadsen/experiment-seriation-classification">experiment-seriation-classification</a></p>
<hr />
<h2>Feature Engineering for Seriation Classification</h2>
<p>Mark E. Madsen, 2016-02-16</p>
<h3 id="feature-engineering">Feature Engineering</h3>
<p>In my <a href="http://notebook.madsenlab.org/project-coarse%20grained%20model/model-seriationct/experiment-experiment-seriation-classification/2016/02/14/seriation-classification-experiment.html">previous note</a>, I used the graph spectral distance (i.e., the Euclidean distance between Laplacian eigenvalue spectra from two seriation solutions) in a kNN classifier to predict which regional network model generated a seriation graph. This achieved accuracy around 80% with 3 nearest neighbors.</p>
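<p>As a minimal sketch of that spectral distance, assuming NetworkX graphs (the zero-padding for graphs of unequal size is my assumption; the seriation solutions in the original experiment share assemblages):</p>

```python
# Sketch of the graph spectral distance from the previous note: the Euclidean
# distance between reverse-sorted Laplacian eigenvalue spectra of two graphs.
import networkx as nx
import numpy as np

def spectral_distance(g1, g2):
    s1 = sorted(nx.laplacian_spectrum(g1), reverse=True)
    s2 = sorted(nx.laplacian_spectrum(g2), reverse=True)
    # pad the shorter spectrum with zeros so the vectors are comparable
    k = max(len(s1), len(s2))
    v1 = np.zeros(k)
    v1[: len(s1)] = s1
    v2 = np.zeros(k)
    v2[: len(s2)] = s2
    return np.linalg.norm(v1 - v2)

print(spectral_distance(nx.path_graph(5), nx.cycle_graph(5)))
```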
<p>Doing better meant changing approaches, and giving the classifier a larger space within which to draw decision boundaries. My first thought was not to reduce the Laplacian spectra to distances, but instead to use the spectra themselves as numeric features. This requires that, say, column 1 represent the largest eigenvalue in each graph’s spectrum, column 2 the second largest, and so on, which is easily accomplished.</p>
<p>The resulting feature matrix is then suitable for any classifier algorithm. I chose gradient boosted trees because of their high accuracy (essentially equivalent to random forests or better in most applications); without any hyperparameter tuning at all, they achieve anywhere from 85% to 100% accuracy depending upon the train/test split (it’s a small sample size). Optimizing hyperparameters improves this, and I can reach 100% fairly often with different train/test splits.</p>
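<p>The classification step itself is then a routine scikit-learn fit. The data here is a synthetic stand-in for the eigenvalue feature matrix (the real features and labels come from the seriation experiment), so this is just a shape-level sketch:</p>

```python
# Fitting gradient boosted trees on an eigenvalue feature matrix.
# X and y below are synthetic placeholders for the real spectra and model labels.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)
X = rng.rand(60, 8)            # 60 "graphs", 8 eigenvalue columns each
y = rng.randint(0, 2, 60)      # two regional network model classes

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
clf = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)
print("test accuracy: %0.2f" % clf.score(X_test, y_test))
```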
<p>So this might be the standard method for seriation classification for the moment. The good thing is that it lends itself to direct interpretation as an ABC (approximate Bayesian computation) estimator, as described in <span class="citation" data-cites="pudlo2014abc">(Pudlo et al. 2014)</span>, especially if I actually use random forests (although I’m not sure the random forest bit is terribly important).</p>
<h3 id="implementation-details">Implementation Details</h3>
<p>The following code snippet takes a list of NetworkX graph objects, and returns a Numpy matrix with a chosen number of eigenvalues (it isn’t clear how many are relevant):</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python"><span class="im">import</span> networkx <span class="im">as</span> nx
<span class="im">import</span> numpy <span class="im">as</span> np

<span class="kw">def</span> graphs_to_eigenvalue_matrix(graph_list, num_eigenvalues<span class="op">=</span><span class="va">None</span>):
    <span class="co">"""</span>
<span class="co">    Given a list of NetworkX graphs, returns a numeric matrix where rows represent graphs,</span>
<span class="co">    and columns represent the reverse sorted eigenvalues of the Laplacian matrix for each graph,</span>
<span class="co">    possibly trimmed to only use the num_eigenvalues largest values. If num_eigenvalues is</span>
<span class="co">    unspecified, all eigenvalues are used.</span>
<span class="co">    """</span>
    <span class="co"># peek at the first graph and see how many eigenvalues there are</span>
    tg <span class="op">=</span> graph_list[<span class="dv">0</span>]
    n <span class="op">=</span> <span class="bu">len</span>(nx.spectrum.laplacian_spectrum(tg, weight<span class="op">=</span><span class="va">None</span>))
    <span class="co"># use all of the eigenvalues, or the smaller of the requested</span>
    <span class="co"># number and the actual number available</span>
    <span class="cf">if</span> num_eigenvalues <span class="op">is</span> <span class="va">None</span>:
        ev_used <span class="op">=</span> n
    <span class="cf">else</span>:
        ev_used <span class="op">=</span> <span class="bu">min</span>(n, num_eigenvalues)
    <span class="bu">print</span>(<span class="st">"(debug) eigenvalues - test graph: </span><span class="sc">%s</span><span class="st"> num_eigenvalues: </span><span class="sc">%s</span><span class="st"> ev_used: </span><span class="sc">%s</span><span class="st">"</span> <span class="op">%</span> (n, num_eigenvalues, ev_used))
    data_mat <span class="op">=</span> np.zeros((<span class="bu">len</span>(graph_list), ev_used))
    <span class="cf">for</span> ix <span class="op">in</span> <span class="bu">range</span>(<span class="bu">len</span>(graph_list)):
        spectrum <span class="op">=</span> <span class="bu">sorted</span>(nx.spectrum.laplacian_spectrum(graph_list[ix], weight<span class="op">=</span><span class="va">None</span>), reverse<span class="op">=</span><span class="va">True</span>)
        data_mat[ix, :] <span class="op">=</span> spectrum[<span class="dv">0</span>:ev_used]
    <span class="cf">return</span> data_mat</code></pre></div>
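<p>To make the idea concrete, here is a compact, self-contained restatement of that function (using the top-level <code>nx.laplacian_spectrum</code> alias) exercised on a few toy graphs of equal size:</p>

```python
# Compact restatement of graphs_to_eigenvalue_matrix for a runnable demo:
# each row holds one graph's reverse-sorted Laplacian eigenvalues, trimmed
# to num_eigenvalues columns.
import networkx as nx
import numpy as np

def graphs_to_eigenvalue_matrix(graph_list, num_eigenvalues=None):
    n = len(nx.laplacian_spectrum(graph_list[0]))
    ev_used = n if num_eigenvalues is None else min(n, num_eigenvalues)
    data_mat = np.zeros((len(graph_list), ev_used))
    for ix, g in enumerate(graph_list):
        spectrum = sorted(nx.laplacian_spectrum(g), reverse=True)
        data_mat[ix, :] = spectrum[0:ev_used]
    return data_mat

# three 6-node graphs, keeping the 4 largest eigenvalues of each
graphs = [nx.path_graph(6), nx.cycle_graph(6), nx.star_graph(5)]
mat = graphs_to_eigenvalue_matrix(graphs, num_eigenvalues=4)
print(mat.shape)  # (3, 4)
```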
<h3 id="resources">Resources</h3>
<p><a href="http://nbviewer.jupyter.org/github/mmadsen/experiment-seriation-classification/blob/master/analysis/sc-1-3/sc-1-seriation-feature-engineering.ipynb">Full analysis notebook</a> on NBViewer, from the Github repository.</p>
<p>Github Repository: <a href="https://github.com/mmadsen/experiment-seriation-classification">experiment-seriation-classification</a></p>
<h3 id="references-cited" class="unnumbered">References Cited</h3>
<div id="refs" class="references">
<div id="ref-pudlo2014abc">
<p>Pudlo, Pierre, Jean-Michel Marin, Arnaud Estoup, Jean-Marie Cornuet, Mathieu Gautier, and Christian P Robert. 2014. “ABC Model Choice via Random Forests.” <em>ArXiv Preprint ArXiv:1406.6288</em>.</p>
</div>
</div>