The results on my amazingly simple test function () are not great so far, though. Some runs produce a good result, while others get stuck in bad local optima:
best program so far with fitness 4.7958315233127 (integer.dup integer.+)
...
best program so far with fitness 71.917763204878 (integer.dup 231 298 integer./ integer./ integer.+)
...
best program so far with fitness 772.44948974278 (true)
Some of this may be due to parsimony pressure, which can favour shorter programs even when they fit the target worse. I'll have to read the tome on GP that's sitting upstairs.
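To make the parsimony effect concrete, here is a minimal sketch in Python. It assumes fitness is an error to be minimised and that parsimony pressure is applied as a simple linear penalty on program length. The function name, the `weight` parameter, and the candidate numbers are all made up for illustration, not taken from the actual implementation.

```python
def penalised_fitness(error, program_length, weight):
    """Raw error plus a length penalty (lower is better).

    Hypothetical form of parsimony pressure: a linear penalty of
    `weight` per instruction in the program.
    """
    return error + weight * program_length


# Two hypothetical candidates: a short program with a large error and
# a longer program with a small error.
short_bad = {"error": 5.0, "length": 1}
long_good = {"error": 0.5, "length": 7}

for weight in (0.1, 1.0):
    s = penalised_fitness(short_bad["error"], short_bad["length"], weight)
    g = penalised_fitness(long_good["error"], long_good["length"], weight)
    winner = "short" if s < g else "long"
    print(f"weight={weight}: short={s:.2f} long={g:.2f} -> {winner} program wins")
```

With a small weight the longer, more accurate program still ranks first; with a large weight the penalty outweighs the error difference and the shorter program wins, which is one way a run could settle on something like a one-instruction program.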
Also, I've only implemented very simplistic mutation so far; maybe crossover will work better?
Even for mutation, picking instructions from the Push 3.0 instruction set with uniform probability might be a bad idea. There are a lot of weird EXEC and CODE instructions, which are probably less likely to be useful than simple arithmetic and stack instructions like FLOAT.* and INTEGER.DUP.
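One way to try non-uniform selection is to weight each instruction by its type prefix. The sketch below is in Python and only illustrates the idea: the five-instruction list is a tiny stand-in for the real instruction set, and the 0.2 weight for EXEC and CODE instructions is an arbitrary assumption that would need tuning.

```python
import random

# Tiny stand-in for the full instruction set (illustration only).
INSTRUCTIONS = ["INTEGER.DUP", "INTEGER.+", "FLOAT.*", "EXEC.Y", "CODE.QUOTE"]


def instruction_weight(name, rare_weight=0.2):
    """Give EXEC.* and CODE.* instructions a lower selection weight.

    `rare_weight` is a hypothetical tuning knob, not a known good value.
    """
    prefix = name.split(".", 1)[0]
    return rare_weight if prefix in ("EXEC", "CODE") else 1.0


def random_instruction(rng, instructions=INSTRUCTIONS):
    """Pick one instruction with probability proportional to its weight."""
    weights = [instruction_weight(name) for name in instructions]
    return rng.choices(instructions, weights=weights, k=1)[0]


rng = random.Random(0)  # seeded so runs are repeatable
sample = [random_instruction(rng) for _ in range(1000)]
rare = sum(1 for name in sample if name.startswith(("EXEC.", "CODE.")))
print(f"EXEC/CODE instructions in sample: {rare} of {len(sample)}")
```

With these weights the EXEC and CODE instructions should make up roughly 12% of picks (0.4 out of a total weight of 3.4) instead of the 40% they would get under uniform selection.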