I've been porting some of the functionality of my TD(lambda) bridge player project to Java so I can test it in Tyrrell's world. Last night I fixed a fairly obvious bug that had eluded me for a week or so, and it now works properly (hopefully), at least for the TD(0) case.
Performance in Tyrrell's world wasn't all that great, though, so I started decreasing the softmax temperature parameter to make action selection a bit greedier. Here's an interesting result:
With temperature ≈ 0.8:
Steps: 255 Sexed: 0
Steps: 11 Sexed: 0
Steps: 263 Sexed: 0
Steps: 274 Sexed: 0
Steps: 211 Sexed: 0
Steps: 257 Sexed: 0
Steps: 253 Sexed: 0
Steps: 251 Sexed: 0
Steps: 254 Sexed: 0
Steps: 307 Sexed: 0
Steps: 428 Sexed: 0
With temperature = 0.14:
Steps: 391 Sexed: 0
Steps: 392 Sexed: 0
Steps: 393 Sexed: 0
Steps: 393 Sexed: 0
Steps: 392 Sexed: 0
Steps: 391 Sexed: 0
Steps: 394 Sexed: 0
Steps: 391 Sexed: 0
Steps: 394 Sexed: 0
Steps: 391 Sexed: 0
Steps: 392 Sexed: 0
Steps: 393 Sexed: 0
I'd have thought survival time would be random if the agent did practically nothing, but I guess not? Every run at the lower temperature lasts between 391 and 394 steps. Anyway, a temperature of 0.14 is obviously a bit too low.
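For anyone wondering what the temperature actually does: here's a minimal sketch of softmax (Boltzmann) action selection in Java. This is not the project's actual code. The class name `SoftmaxSelector`, its methods, and the action values in `main` are all made up for illustration. Each action gets probability proportional to exp(value / temperature), so a lower temperature piles more probability onto the highest-valued action.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical helper, for illustration only (not taken from the project).
public class SoftmaxSelector {

    /** Boltzmann probabilities: p[i] is proportional to exp(values[i] / temperature). */
    public static double[] probabilities(double[] values, double temperature) {
        if (temperature <= 0.0) {
            throw new IllegalArgumentException("temperature must be positive");
        }
        // Subtract the maximum before exponentiating to avoid overflow;
        // this does not change the resulting probabilities.
        double max = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            max = Math.max(max, v);
        }
        double[] p = new double[values.length];
        double sum = 0.0;
        for (int i = 0; i < values.length; i++) {
            p[i] = Math.exp((values[i] - max) / temperature);
            sum += p[i];
        }
        for (int i = 0; i < p.length; i++) {
            p[i] /= sum;
        }
        return p;
    }

    /** Draws an action index according to the given probabilities. */
    public static int sample(double[] probs, Random rng) {
        double r = rng.nextDouble();
        double cumulative = 0.0;
        for (int i = 0; i < probs.length; i++) {
            cumulative += probs[i];
            if (r < cumulative) {
                return i;
            }
        }
        return probs.length - 1; // guard against floating-point rounding
    }

    public static void main(String[] args) {
        double[] q = {1.0, 1.2, 0.9}; // made-up action values
        for (double t : new double[] {0.8, 0.14}) {
            System.out.println("temperature=" + t + " -> "
                    + Arrays.toString(probabilities(q, t)));
        }
    }
}
```

With values this close together, the action probabilities at the higher temperature stay fairly even, while at 0.14 the best action dominates. That would fit the much more uniform step counts in the second batch of runs, though I haven't verified that this is the cause.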