Wednesday, April 25, 2007

Boltzmann in action selection

I've been porting some of the functionality of my TD(lambda) bridge player project to Java so I can test it in Tyrrell's world. Last night I fixed a fairly obvious bug that had slipped past me for a week or so, and now it hopefully works properly, at least for the TD(0) case.
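TD(0) is just TD(lambda) with lambda = 0: each update moves the value of the current state toward the one-step target r + gamma * V(s'), with no eligibility traces. The sketch below is a minimal tabular version under stated assumptions. The class name TdZero, the integer state indices, and the alpha/gamma values are hypothetical and not taken from the bridge player, which may well use a function approximator rather than a table.

```java
import java.util.Arrays;

/** Hypothetical tabular TD(0) sketch; state indexing, alpha and gamma are assumptions. */
public class TdZero {
    private final double[] values; // V(s) for each discrete state, initialised to 0
    private final double alpha;    // learning rate
    private final double gamma;    // discount factor

    public TdZero(int numStates, double alpha, double gamma) {
        this.values = new double[numStates];
        this.alpha = alpha;
        this.gamma = gamma;
    }

    /** Applies V(s) += alpha * (r + gamma * V(s') - V(s)) and returns the TD error. */
    public double update(int state, double reward, int nextState, boolean terminal) {
        // A terminal transition has no successor value to bootstrap from.
        double target = reward + (terminal ? 0.0 : gamma * values[nextState]);
        double tdError = target - values[state];
        values[state] += alpha * tdError;
        return tdError;
    }

    public double value(int state) {
        return values[state];
    }

    public static void main(String[] args) {
        TdZero td = new TdZero(3, 0.5, 0.9);
        td.update(0, 1.0, 1, false); // V(0) = 0 + 0.5 * (1 + 0.9 * 0 - 0) = 0.5
        System.out.println(Arrays.toString(td.values));
    }
}
```

With lambda > 0 the same TD error would instead be spread back over recently visited states via eligibility traces.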

The performance in Tyrrell's world wasn't all that great, though, so I started decreasing the softmax temperature parameter to make action selection a bit greedier. Here's an interesting result:
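Boltzmann (softmax) selection picks action a with probability exp(Q(a)/T) / sum_b exp(Q(b)/T), so lowering the temperature T pushes probability toward the highest-valued action and the agent becomes greedier. A minimal sketch, assuming discrete actions that each have a Q value; the class and method names are hypothetical and not the project's actual code:

```java
import java.util.Random;

/** Hypothetical Boltzmann (softmax) action selection sketch over discrete Q values. */
public class Boltzmann {

    /** Returns P(a) = exp(Q(a)/T) / sum_b exp(Q(b)/T), shifted by max Q to avoid overflow. */
    public static double[] probabilities(double[] q, double temperature) {
        if (temperature <= 0) {
            throw new IllegalArgumentException("temperature must be positive");
        }
        double max = Double.NEGATIVE_INFINITY;
        for (double v : q) {
            max = Math.max(max, v);
        }
        double[] p = new double[q.length];
        double sum = 0.0;
        for (int i = 0; i < q.length; i++) {
            p[i] = Math.exp((q[i] - max) / temperature);
            sum += p[i];
        }
        for (int i = 0; i < p.length; i++) {
            p[i] /= sum; // normalise so the probabilities sum to 1
        }
        return p;
    }

    /** Samples an action index from the softmax distribution. */
    public static int select(double[] q, double temperature, Random rng) {
        double[] p = probabilities(q, temperature);
        double u = rng.nextDouble();
        double cumulative = 0.0;
        for (int i = 0; i < p.length; i++) {
            cumulative += p[i];
            if (u < cumulative) {
                return i;
            }
        }
        return p.length - 1; // guard against floating-point round-off
    }

    public static void main(String[] args) {
        double[] q = {1.0, 0.8, 0.5}; // illustrative action values, not from the experiment
        for (double t : new double[] {0.8, 0.14}) {
            double[] p = probabilities(q, t);
            System.out.printf("T=%.2f -> %.3f %.3f %.3f%n", t, p[0], p[1], p[2]);
        }
    }
}
```

For these made-up values, T = 0.8 spreads the choice fairly evenly (roughly 0.43, 0.34 and 0.23), while T = 0.14 gives the best action about 0.79 of the probability mass, which is the kind of shift the two runs below reflect.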

with temperature ~=0.8:
Steps: 255 Sexed: 0
Steps: 11 Sexed: 0
Steps: 263 Sexed: 0
Steps: 274 Sexed: 0
Steps: 211 Sexed: 0
Steps: 257 Sexed: 0
Steps: 253 Sexed: 0
Steps: 251 Sexed: 0
Steps: 254 Sexed: 0
Steps: 307 Sexed: 0
Steps: 428 Sexed: 0

with temperature=0.14:
Steps: 391 Sexed: 0
Steps: 392 Sexed: 0
Steps: 393 Sexed: 0
Steps: 393 Sexed: 0
Steps: 392 Sexed: 0
Steps: 391 Sexed: 0
Steps: 394 Sexed: 0
Steps: 391 Sexed: 0
Steps: 394 Sexed: 0
Steps: 391 Sexed: 0
Steps: 392 Sexed: 0
Steps: 393 Sexed: 0

I'd have thought survival time would be random if the agent did practically nothing, but apparently not: at the low temperature every run lasted between 391 and 394 steps. Anyway, that temperature is obviously a bit too low.
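To put numbers on how different the two runs are, here is a small summary of the step counts logged above. Only the figures from this post are used; the class name RunSummary is hypothetical, and the standard deviation is the population version.

```java
import java.util.Arrays;

/** Hypothetical summary of the step counts logged in this post. */
public class RunSummary {
    static final int[] WARM = {255, 11, 263, 274, 211, 257, 253, 251, 254, 307, 428};      // T ~= 0.8
    static final int[] COLD = {391, 392, 393, 393, 392, 391, 394, 391, 394, 391, 392, 393}; // T = 0.14

    public static double mean(int[] steps) {
        return Arrays.stream(steps).average().orElse(Double.NaN);
    }

    /** Population standard deviation of the step counts. */
    public static double stdDev(int[] steps) {
        double m = mean(steps);
        double sumSq = Arrays.stream(steps).mapToDouble(s -> (s - m) * (s - m)).sum();
        return Math.sqrt(sumSq / steps.length);
    }

    static void report(String label, int[] steps) {
        System.out.printf("%s: mean %.1f, sd %.1f, range %d-%d%n", label, mean(steps), stdDev(steps),
                Arrays.stream(steps).min().getAsInt(), Arrays.stream(steps).max().getAsInt());
    }

    public static void main(String[] args) {
        report("T~0.8", WARM);
        report("T=0.14", COLD);
    }
}
```

The warmer runs average about 251 steps with a spread of roughly 93, while the colder runs average about 392 steps with a spread of about 1, and neither temperature produced any mating.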
