Feature Engineering for Seriation Classification

 Created: 16 Feb 2016  Modified: 23 Jul 2020   BibTeX Entry   RIS Citation  Print

Feature Engineering

In my previous note, I used the graph spectral distance (i.e., the euclidean distance between Laplacian eigenvalue spectra from two seriation solutions) in a kNN classifer to predict which regional network model generated a seriation graph. This achieved accuracy around 80% with 3 nearest neighbors.

Doing better meant changing approaches, and giving the classifier a larger space within which to draw decision boundaries. My first thought was to not reduce the Laplacian spectrum to distances, but instead of use the spectra themselves as numeric features. This would require that, say, column 1 represented the largest eigenvalue in each graph’s spectrum, column 2 the second largest, etc, which is easily accomplished.

The resulting feature matrix is then suitable for any classifier algorithm. I chose gradient boosted trees because of their high accuracy (essentially equivalent to random forests or better in most applications), and without any hyperparameter tuning at all, achieve anywhere from 85% to 100% accuracy depending upon the train/test split (it’s a small sample size). Optimizing hyperparameters improves this and I can get 100% pretty often with different train test splits.

So this might be the standard method for seriation classification for the moment. The good thing is that it lends itself to direct interpretation as an ABC (approximate Bayesian computation) estimator, as described in (Pudlo et al. 2014), especially if I actually use random forests (although I’m not sure the random forest bit is terribly important).

Implementation Details

The following code snippet takes a list of NetworkX graph objects, and returns a Numpy matrix with a chosen number of eigenvalues (it isn’t clear how many are relevant):

Resources

Full analysis notebook on NBViewer, from the Github repository.

Github Repository: experiment-seriation-classification

References Cited

Pudlo, Pierre, Jean-Michel Marin, Arnaud Estoup, Jean-Marie Cornuet, Mathieu Gautier, and Christian P Robert. 2014. “ABC Model Choice via Random Forests.” arXiv Preprint arXiv:1406.6288.