Wednesday, April 25, 2007

boltzmann in action selection

I've been porting some of the functionality of my TD(lambda) bridge player project to Java so I can test it in Tyrrell's world. Last night I fixed a fairly obvious bug that had eluded me for a week or so, and now it works properly (I hope), at least for the TD(0) case.

The performance in Tyrrell's world wasn't all that great, though, so I started decreasing the softmax temperature parameter to make the agent a bit greedier... here's an interesting result:

with temperature ~= 0.8:
Steps: 255 Sexed: 0
Steps: 11 Sexed: 0
Steps: 263 Sexed: 0
Steps: 274 Sexed: 0
Steps: 211 Sexed: 0
Steps: 257 Sexed: 0
Steps: 253 Sexed: 0
Steps: 251 Sexed: 0
Steps: 254 Sexed: 0
Steps: 307 Sexed: 0
Steps: 428 Sexed: 0

with temperature = 0.14:
Steps: 391 Sexed: 0
Steps: 392 Sexed: 0
Steps: 393 Sexed: 0
Steps: 393 Sexed: 0
Steps: 392 Sexed: 0
Steps: 391 Sexed: 0
Steps: 394 Sexed: 0
Steps: 391 Sexed: 0
Steps: 394 Sexed: 0
Steps: 391 Sexed: 0
Steps: 392 Sexed: 0
Steps: 393 Sexed: 0

I'd have thought survival time would be random if the agent did practically nothing, but apparently not: the step counts are nearly identical from run to run, so the policy has collapsed into something almost deterministic. Anyway, that temperature is obviously a bit too low.
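
For reference, Boltzmann (softmax) selection weights each action by e^(Q/T), so as T falls toward zero it collapses into pure greediness, which is presumably what those near-constant step counts are showing. A minimal sketch of the selection step, assuming an array of action value estimates (the names are mine, not the project's):

import java.util.Random;

// Boltzmann (softmax) action selection: P(a) proportional to exp(Q[a] / T).
// High T -> near-uniform exploration; T -> 0 -> effectively greedy.
class BoltzmannSelection {
    private final Random rng = new Random();

    int select(double[] q, double temperature) {
        // Shift by the max so exp() can't overflow at low temperatures.
        double max = q[0];
        for (double v : q) max = Math.max(max, v);

        double[] w = new double[q.length];
        double total = 0.0;
        for (int a = 0; a < q.length; a++) {
            w[a] = Math.exp((q[a] - max) / temperature);
            total += w[a];
        }
        // Roulette-wheel draw over the weights.
        double roll = rng.nextDouble() * total;
        for (int a = 0; a < q.length; a++) {
            roll -= w[a];
            if (roll <= 0.0) return a;
        }
        return q.length - 1; // floating-point fallthrough
    }
}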

Monday, April 23, 2007

MiniJavaBackend

Well, that was one hoor of an assignment. We were effectively given a month's extension: when it was originally due I had it practically finished except for the extra CALL tile handling, i.e. the function prologue (saving the callee-save registers and copying the formals) and epilogue (restoring the callee-saves and setting up the return value), and I left it on hold until a few days ago.

Actually getting CALL working even half-properly was a total headwrecker. The documentation was shit, and there are still issues I don't understand: for one, incoming parameters get copied into temporaries that are never referenced again; instead, what look like older temporaries are used. Sob.
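
For my own sanity, here's the shape of what the prologue and epilogue have to do, as a sketch (Temp, Move and the rest are invented stand-ins, not the assignment's actual classes):

import java.util.ArrayList;
import java.util.List;

// A sketch of the prologue/epilogue duties. Assumes one argument register
// per formal, for simplicity.
class FrameSketch {
    static class Temp { }                          // a virtual register
    static class Move {                            // dst <- src
        final Temp dst, src;
        Move(Temp dst, Temp src) { this.dst = dst; this.src = src; }
    }

    List<Temp> calleeSaves = new ArrayList<Temp>(); // machine callee-save regs
    List<Temp> argRegs     = new ArrayList<Temp>(); // incoming argument regs
    List<Temp> formals     = new ArrayList<Temp>(); // temps the body refers to
    Temp returnReg         = new Temp();            // machine return-value reg
    private List<Move> saves = new ArrayList<Move>();

    // Prologue: stash callee-saves in fresh temps (the register allocator can
    // spill or coalesce them), then copy each formal out of its argument reg.
    List<Move> prologue() {
        List<Move> code = new ArrayList<Move>();
        for (Temp reg : calleeSaves) {
            Move m = new Move(new Temp(), reg);
            saves.add(m);
            code.add(m);
        }
        for (int i = 0; i < formals.size(); i++)
            code.add(new Move(formals.get(i), argRegs.get(i)));
        return code;
    }

    // Epilogue: put the result in the return register, then undo the saves.
    List<Move> epilogue(Temp result) {
        List<Move> code = new ArrayList<Move>();
        code.add(new Move(returnReg, result));
        for (Move m : saves)
            code.add(new Move(m.src, m.dst)); // reverse of the prologue move
        return code;
    }
}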

Monday, April 16, 2007

czt -> community Z tools

Here's a tip for anyone who needs to put a Z specification together quickly.

Never, ever, ever, ever, EVER use CZT. It is the most frustrating, dog-slow piece of shit software package I have used in my life, without exaggeration. With jEdit it's just unstable and crashy enough to warn the user away immediately.

As a plugin for Eclipse, it starts out nice; then, after about one page of simple Z schemas, it parses the buffer repeatedly on every single keystroke, even if you tell it not to in the configuration page. It throws syntax errors on things that look right according to every example of Z I have, such as a pair (i, j) of type natural × natural referenced in an axiomatic definition.
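
For the record, the kind of construct it rejects looks roughly like this in fuzz/CZT-style LaTeX markup (names simplified from memory):

% An axiomatic definition using a pair of naturals - the sort of thing
% CZT was choking on.
\begin{axdef}
  pos : \nat \cross \nat
\where
  \exists i, j : \nat @ pos = (i, j) \land i \leq j
\end{axdef}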

It started out in promising fashion, and the output through LaTeX looks nice, but now I'm in a position where I don't understand the errors it gives me (after the unbelievable three-minute delay every time I make a change, i.e. hit some keys). So I can't even convert what I have to TeX and work on that directly; I have to delete all of the code that causes errors as quickly as possible so it will parse the buffer successfully.

Never, ever touch this stinking pile of pain. It's just not worth it.

Saturday, April 14, 2007

reinforcement learning with function approximation

This is much harder than I thought. I've got a series of tests (not very modular at all: they're split across only two source files, mixing test-first unit tests with empirical performance measures) that examine the components of the system. The neural network seems to be functioning reasonably, although it's less accurate than I'd expected when producing pure numeric output, unless I format the output as a binary string again... annoying.

I'm not sure now whether to have the TD net represent its estimated values as a binary string like that, or to keep it as an expanded sigmoid for now.
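
Roughly, the choice is between these two decodings (a sketch; the [0, 100] range and names are illustrative, not my actual code):

// Two ways to read a value estimate out of the net, assuming values
// live in [0, 100].
class ValueDecoding {
    // (a) "expanded sigmoid": one output unit in (0, 1), scaled to the range.
    static double fromSigmoid(double out) {
        return out * 100.0;
    }

    // (b) binary string: one output unit per bit, thresholded at 0.5.
    static double fromBits(double[] outs) {
        int value = 0;
        for (double o : outs)
            value = (value << 1) | (o >= 0.5 ? 1 : 0);
        return value;
    }
}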

Here's its performance just now on a 30-room random-walk test (a sketch of the setup follows the numbers). There are two actions: 0 is left and 1 is right. All rewards are 0 except on entering the terminal state (the furthest room to the right), which gives a reward of +100. I expected much better performance ages ago, but I guess these systems are harder to get right than I thought. Although it completes the task in 15 moves (the minimum) a few times, it often goes wrong.
learningRate: 0.5
steps taken on run 0: 697556
action 0 chosen: 91.2195%, action 1 chosen 8.78051%
steps taken on run 1: 67
action 0 chosen: 38.806%, action 1 chosen 61.194%
steps taken on run 2: 99
action 0 chosen: 42.4242%, action 1 chosen 57.5758%
steps taken on run 3: 121
action 0 chosen: 43.8017%, action 1 chosen 56.1983%
steps taken on run 4: 115
action 0 chosen: 43.4783%, action 1 chosen 56.5217%
steps taken on run 5: 27
action 0 chosen: 22.2222%, action 1 chosen 77.7778%
steps taken on run 6: 15
action 0 chosen: 0%, action 1 chosen 100%
steps taken on run 7: 35
action 0 chosen: 28.5714%, action 1 chosen 71.4286%
steps taken on run 8: 19
action 0 chosen: 10.5263%, action 1 chosen 89.4737%
steps taken on run 9: 17
action 0 chosen: 5.88235%, action 1 chosen 94.1176%
steps taken on run 10: 17
action 0 chosen: 5.88235%, action 1 chosen 94.1176%
steps taken on run 11: 15
action 0 chosen: 0%, action 1 chosen 100%
steps taken on run 12: 15
action 0 chosen: 0%, action 1 chosen 100%
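
As promised, the shape of the test environment (a sketch; the class name and the start position are my reconstruction, with the start room picked so that the best case is the 15 steps seen above):

// 30-room random walk: action 0 = left, 1 = right; reward 0 everywhere
// except +100 on entering the rightmost (terminal) room.
class RandomWalk {
    static final int ROOMS = 30;
    int room = 14;                      // 15 steps from the goal at room 29

    boolean atGoal() { return room == ROOMS - 1; }

    double step(int action) {
        room += (action == 1) ? 1 : -1; // 1 = right, 0 = left
        if (room < 0) room = 0;         // can't leave through the left wall
        return atGoal() ? 100.0 : 0.0;
    }
}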

Sometimes it gets stuck in some weird state (a local optimum?) and never really learns anything. At this point the action selection policy is simply the action with the best estimated value, with a 50/50 toss-up if the two actions appear equally valuable, except when the epsilon-greedy function chooses a random action with probability 0.08.
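
Spelled out, that policy is just this (a sketch; only the 0.08 comes from the actual code):

import java.util.Random;

// Epsilon-greedy over two actions: random with probability epsilon,
// otherwise greedy on the value estimates, with a 50/50 toss-up on ties.
class EpsilonGreedy {
    private final Random rng = new Random();
    private static final double EPSILON = 0.08;

    int select(double q0, double q1) {
        if (rng.nextDouble() < EPSILON) return rng.nextInt(2); // explore
        if (q0 == q1) return rng.nextInt(2);                   // tie: toss-up
        return (q1 > q0) ? 1 : 0;                              // exploit
    }
}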

I'm thinking the next step is to implement a softmax action selection method and maybe another test... or, I dunno, actually carry on with the functional requirements of the thing. Time runneth short.

Tuesday, April 10, 2007

Bin Ankh

This was on the pavement on Gardiner Street, near my apartment in stabsville, Dublin 1:


I wonder how it came to be...
And while I'm at it, here's another good one from DCU. Nice!

Japan, America and pacifism

Tangentially: tacked on to this post about consumer robotics in Japan is a very interesting discussion on the degree of pacifism in Japan since WWII, the shifts (or lack thereof) in American politics over the last 50 years, and wherever else the rambling trail leads. Read it!

Sunday, April 08, 2007

software for truly lame people: WhiteSmoke

I got an ad from FreshDevices in my email today, and instead of deleting it like 99.9% of the other ones, I had a look and saw mention of a program called WhiteSmoke.

The ad text read:
Write like an English pro using this software. With full-text analysis solution, convert simple English sentences to become more sophisticated automatically.

This premise raised my ire pre-emptively: if the "simple" English sentences already communicate the correct meaning, then making them artificially "more sophisticated" is either distorting that meaning or, at best, fluffing out concise text with superfluous padding. In a sense, de-optimising the text.

So I had a look at the site and found pretty much what I expected.

Before:
He was very interested in our product line, however, he has some issues with our priceing.

After:
He was very interested in our innovative product line, however, he has some significant issues with our pricing.

How does it know whether the product line is innovative, and is it even relevant? Obviously the assumption is that you would always refer to your products as "innovative" whether they are or not, but that just makes the word optional and eliminatable (okay, that's probably not a word, but it should be...).

Ignoring the fact that it corrects the obvious typo in "pricing" (which should be done by another single-purpose tool anyway), it also adds the modifier "significant". This is all clearly in the name of making simple text sound more formal and "clever", but I have two issues with this:

a) People should be able to do it themselves without changing the meaning inappropriately, and
b) this clearly changes the meaning inappropriately.

If someone doesn't speak enough English to write in an academic tone, they can get by with what they have and use a dictionary/thesaurus to get the meaning across. And a native English speaker has absolutely no excuse to even consider using this thing. Using an automated tool to make your writing look "clever" is the very opposite of clever. It's akin to having a voice filter on your phone that transforms your speech into a different accent (Cockney Skank to marbles-in-the-mouth Rupert). The whole concept is utterly lame and anyone caught buying such a tool should be whipped. Whipped!

Monday, April 02, 2007

TD(lambda) and bridge

I've been pulling another all-nighter working on my final year project: a neural-network-based TD(λ) reinforcement learning bridge player... I say that, but really I've been hung up on implementing the bridge game model. It's as complex as I worried it would be - that is to say, fairly difficult. And I can't just go to sleep (unless it's for 2 hours... hmm....) as I'm due to demo something at 11:30am, 7 and a half hours from now. Eeehhhh, right. This is really not good... all I have so far is some confusing code and a limited-scope set of test cases. Sleep is sounding better and better!
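
For the record, the update the net is supposed to end up implementing is the standard TD(λ) rule with function approximation (α the learning rate, γ the discount, λ the trace decay, e the eligibility trace):

\begin{align*}
  \delta_t &= r_{t+1} + \gamma\, V_\theta(s_{t+1}) - V_\theta(s_t) \\
  e_t      &= \gamma \lambda\, e_{t-1} + \nabla_\theta V_\theta(s_t) \\
  \theta   &\leftarrow \theta + \alpha\, \delta_t\, e_t
\end{align*}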