This is a bucket list of ideas for future development and direction, and a log of more down-to-earth tasks such as code tidy-ups, external library upgrades, etc.
Last reviewed / updated 2016-02-21
Investigate the relative merits of single versus double precision floats.
There is an open question regarding how much precision is required in neuro-evolution methods. For AI methods that learn weights by gradient following, there is an argument for allowing very small gradients to be followed, otherwise learning may stop. In NEAT a very similar question arises: what is the smallest useful weight change/delta when mutating a connection weight, and can that smallest level of precision be represented by single precision floats?
A single precision float is 32 bits (4 bytes); a double precision float is 64 bits (8 bytes). Therefore there is a potential speed improvement to be gained by using less precision: more weights fit into the CPU caches, memory bandwidth is used more efficiently, and SIMD instructions can be applied to twice as many weights in one operation (at the time of writing 256-bit SIMD instructions are common and 512-bit instructions will be available soon; 512/32 = 16 single precision floats per operation, versus 512/64 = 8 doubles).
A broader question is whether the precision could be reduced further still, given the existence of half precision (16-bit) floats.
Fwiw, I seem to recall productively reducing at least some of the weights in Inkwell (Apple's handwriting recognition engine) to 8-bit logarithmic values (expanded on the fly via a small lookup table). We'd periodically discretize them this way during training so that over time the weights could compensate for each other's discretization errors (otherwise, if you just lop blindly at the end, the odds are too high of systematic shifts that add up over large numbers of weights). In general I think the more abstract the representation, the less granularity you need. That is, whether a pixel is brighter than its neighbor or not can require quite a precise measurement, and this can matter, as can precise averages over a large number of noisy pixels; but usually whether something is a dog or a tree, and the relative implications thereof, is relatively high contrast--it's not that borderline cases never exist, just that they become increasingly rare as you ascend the hierarchy of abstraction. You also generally have far fewer examples of more abstract concepts, so there's often little basis for a high precision tally thereof.
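By way of illustration only (the Inkwell scheme itself is not public), here is one way such 8-bit logarithmic quantisation with lookup-table expansion might look; the magnitude range, level count and sign handling are all assumptions, not a description of the original implementation:

    using System;

    // Sketch: quantise a weight to an 8-bit code on a logarithmic scale,
    // and expand codes back to floats via a precomputed 256-entry table.
    public static class LogQuantizer
    {
        const float MinMagnitude = 1e-4f; // assumed smallest representable magnitude
        const float MaxMagnitude = 8f;    // assumed largest representable magnitude
        static readonly float[] _table = BuildTable();

        static float[] BuildTable()
        {
            // 128 log-spaced magnitude levels; the top bit of the code holds the sign.
            // Note: exact zero is not representable; near-zero weights map to MinMagnitude.
            var table = new float[256];
            double logMin = Math.Log(MinMagnitude);
            double logMax = Math.Log(MaxMagnitude);
            for (int i = 0; i < 128; i++)
            {
                float mag = (float)Math.Exp(logMin + (logMax - logMin) * i / 127.0);
                table[i] = mag;        // positive codes
                table[i + 128] = -mag; // negative codes
            }
            return table;
        }

        public static byte Quantize(float w)
        {
            // Nearest code by linear scan; fine for a sketch, a real
            // implementation would compute the index directly from log|w|.
            int best = 0;
            float bestErr = Math.Abs(w - _table[0]);
            for (int i = 1; i < 256; i++)
            {
                float err = Math.Abs(w - _table[i]);
                if (err < bestErr) { bestErr = err; best = i; }
            }
            return (byte)best;
        }

        // Expansion is just a table lookup - this is the cheap, hot-path operation.
        public static float Expand(byte code) => _table[code];
    }

Periodically applying Quantize/Expand during training, as described above, lets subsequent weight updates absorb each round of discretization error rather than accumulating it all in a single final pass.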
The ReLU activation function is currently in widespread use in AI neural net models; it is considerably simpler to compute compared to the logistic sigmoid, which has been the default in SharpNEAT since its inception.
The simplicity of the ReLU computation will likely yield a significant performance improvement over the logistic sigmoid. Some investigation is required to evaluate the scale of the improvement (with some thought given to SIMD based code also), and to determine the implications for the wider efficacy of neuro-evolution when using an activation function that differs so significantly from the one that has formed the basis of most neuro-evolution research to date.
The leaky ReLU is defined as y = x for x > 0 and y = 0.01x otherwise; it is similar, but has a non-zero gradient for negative inputs, so that gradient based learning can still proceed in that region.
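For reference, minimal implementations of the three functions discussed above; the logistic steepness constant of 4.9 is the value commonly used in NEAT implementations and is an assumption here, as SharpNEAT's own activation function classes may use different constants:

    using System;

    public static class ActivationFunctions
    {
        // Logistic sigmoid: requires a transcendental function call (Exp).
        public static double LogisticSigmoid(double x)
            => 1.0 / (1.0 + Math.Exp(-4.9 * x));

        // ReLU: a single comparison, no transcendental function call.
        public static double ReLU(double x)
            => x > 0.0 ? x : 0.0;

        // Leaky ReLU: small non-zero slope (0.01) for negative inputs.
        public static double LeakyReLU(double x)
            => x > 0.0 ? x : 0.01 * x;
    }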
SharpNEAT does not currently utilise any of the vector instructions available in modern CPUs. A key issue has been the lack of support for SIMD in the .NET framework. As of .NET 4.6 the new RyuJIT compiler will generate some SIMD instructions when in 64-bit mode and when the .NET System.Numerics.Vector classes are in use. Only a limited set of SIMD functionality is exposed via .NET 4.6, but it is worthy of investigation, not least because it ought to be a relatively simple exercise to use the available Vector classes where possible, e.g. in the neural network classes.
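As a rough sketch of what that might look like (the method name and the flat weight/input arrays are assumptions about how a network class might expose its data, not SharpNEAT's current structure):

    using System.Numerics;

    public static class VectorOps
    {
        // Dot product of two equal-length arrays using Vector<float>;
        // RyuJIT (64-bit) maps this to SIMD instructions where available.
        public static float DotProduct(float[] weights, float[] inputs)
        {
            int simdLength = Vector<float>.Count; // e.g. 8 floats with 256-bit AVX
            float sum = 0f;
            int i = 0;
            for (; i <= weights.Length - simdLength; i += simdLength)
            {
                var w = new Vector<float>(weights, i);
                var x = new Vector<float>(inputs, i);
                sum += Vector.Dot(w, x);
            }
            // Handle any remaining elements that don't fill a whole vector.
            for (; i < weights.Length; i++)
            {
                sum += weights[i] * inputs[i];
            }
            return sum;
        }
    }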
To more fully utilise SIMD in modern CPUs (including the coming 512-bit wide vector instructions due from Intel CPUs in 2016), it's necessary to run code outside of the .NET managed runtime. The most promising option here appears to be the C++/CLI language/environment, which allows C++ targeting the native CPU to be written while still permitting interaction with the managed runtime.
Also worth noting here is that in recent years server CPUs (e.g. Intel Xeons) have tended to use the increasing transistor count budget to increase core count, with up to 20 cores possible at the time of writing, whereas desktop CPUs have followed a different path of integrating a GPU and other systems that previously lived on dedicated chips/circuitry off-CPU. At this time (2016) it's worth investigating the very significantly higher CPU resources available on server CPUs. This is perhaps not a point that is widely discussed, because in the past CPU evolution was mostly on a single path, but it now appears to have forked into at least two very different paths (server and consumer CPUs).
This idea relates to how we compare the relative merits of changes to the NEAT algorithm. Ultimately we want results in the form of neural networks that implement some desired functionality, and we want to find these networks as fast as possible.
The idea then is to run NEAT on some well-defined problem domain, e.g. 6-Multiplexer, a large number of times, record the clock time taken to find a solution, and plot the distribution of clock times. If we then make some change to NEAT, such as changing mutation rates or how speciation works, we can compare the distributions before and after the change, and thereby have hard empirical evidence regarding the relative efficacy of a high level modification to the NEAT algorithm.
Note that it's possible to plot distributions of other variables, such as the number of evaluations or generations, or even how many Joules of energy were consumed in finding a solution (e.g. for different CPUs). The approach is broad enough to support many different types of comparison; however, my focus is on the NEAT method and the SharpNEAT code base in particular. The technique applies to changes to the NEAT method itself, but also to low-level performance optimisations that don't change the NEAT method.
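A minimal sketch of such a measurement harness follows; the runToSolution delegate is a hypothetical stand-in for running a NEAT experiment until its fitness threshold is reached:

    using System;
    using System.Diagnostics;
    using System.Linq;

    public static class BenchmarkHarness
    {
        // Run the experiment many times and report distribution statistics
        // of the wall-clock time to solution.
        public static void MeasureTimeToSolution(Action runToSolution, int runCount)
        {
            var millis = new double[runCount];
            for (int i = 0; i < runCount; i++)
            {
                var sw = Stopwatch.StartNew();
                runToSolution();
                sw.Stop();
                millis[i] = sw.Elapsed.TotalMilliseconds;
            }

            // Sort once, then read off order statistics of the distribution.
            Array.Sort(millis);
            Console.WriteLine($"min    = {millis[0]:F0} ms");
            Console.WriteLine($"median = {millis[runCount / 2]:F0} ms");
            Console.WriteLine($"90th   = {millis[(int)(runCount * 0.9)]:F0} ms");
            Console.WriteLine($"max    = {millis[runCount - 1]:F0} ms");
            Console.WriteLine($"mean   = {millis.Average():F0} ms");
        }
    }

Comparing full distributions (or at least several percentiles) rather than single means matters here, because time-to-solution in evolutionary runs is typically heavily skewed.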
This idea relates to the SharpNEAT interfaces such as IExperiment and INetwork. Instances of these interfaces are 'wired-up' to form a working experiment. Currently the object instantiation and wiring-up is performed by custom code within SharpNEAT. This custom code could be replaced with an IoC framework that hopefully simplifies the code, improves readability, and allows people working on the code to use and learn a re-usable skill and code library.
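By way of illustration, a sketch using Autofac (one possible IoC framework; the choice of framework and the concrete experiment class name are assumptions for the example):

    using Autofac;

    public static class ExperimentBootstrapper
    {
        public static IExperiment BuildExperiment()
        {
            // Register concrete implementations against the SharpNEAT
            // interfaces; SixMultiplexerExperiment is a hypothetical
            // concrete class used purely for illustration.
            var builder = new ContainerBuilder();
            builder.RegisterType<SixMultiplexerExperiment>().As<IExperiment>();

            // The container now performs the object instantiation and
            // wiring-up that is currently done by custom code.
            var container = builder.Build();
            return container.Resolve<IExperiment>();
        }
    }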
Some of the experiments shipped with SharpNEAT use the box2dx physics engine.
There is a detailed overview of the situation regarding the use of this 2D physics engine, and the rendering library being used with it, at github.com/colgreen/box2dx. Ultimately it may be wise to switch to another 2D engine, such as the Farseer Physics Engine, although there is also some debate about the status of that project.
The changes required to make Unity integration work appear to be relatively modest, and therefore it's worth looking into whether the SharpNEAT core library classes could be re-organised slightly and Unity integration provided as part of the main SharpNEAT project.
Most code bases have room for improvement, and SharpNEAT is no exception. One area I would like to visit is the size of some of the classes in terms of lines of code (LOC) and cyclomatic complexity, and also the size of some methods/functions.
Over the last few years I've adopted a rough heuristic of keeping classes to less than 400 LOC (300 ideally), and methods substantially less than that. This isn't a hard rule, but generally larger classes are a good proxy for poor design choices and defects. So a revisit of the code with these ideas in mind is planned.
It's been a few years since I last performance profiled and tuned SharpNEAT. As such it's probably worth doing again now that a number of underlying factors have changed, such as the .NET CLR, JITter (RyuJIT), garbage collector, operating systems (e.g. Windows 10), CPUs and memory bandwidth, etc.
I have this vague idea that speciation by comparing genomes is possibly flawed in some significant ways; this fits the narrative around the novelty search research, and how following an objective function may not lead you to the desired objective.
For now the ideas under this heading are best covered by a number of fairly rambling blog posts, which I hope to condense into something more concrete at some future time...