Tuesday, November 15, 2011

Google gets cabbage fronds both ways.

To follow up on an oldish post on the patchy quality of Google Translate, I tried the examples and found that they now translate very well.


Even doing a lengthy trip between very different languages produces quite good results, which is a good sign that the meaning behind text is being properly treated:

English "I like to eat cabbage fronds all day long." -> Japanese "私は長い一日キャベツの葉を食べるのが好き。"
Japanese  -> French "J'aime manger les feuilles de chou toute la journée."
French  -> Kannada "ನಾನು ಎಲೆಕೋಸು ಎಲ್ಲಾ ದಿನ ಎಲೆಗಳನ್ನು ತಿನ್ನಲು ಪ್ರೀತಿ."
Kannada  -> Hindi "मैं सारा दिन गोभी के पत्ते खाने से प्यार है."
Hindi  -> Irish "Is breá liom a ithe na duilleoga cabáiste an lá ar fad."
Irish  -> Korean "하루 종일 배추의 잎을 먹는 사랑 해요."
Korean  -> Polish "Uwielbiam jeść liście kapusty cały dzień."
Polish  -> Welsh "Rwyf wrth fy modd i fwyta'r dail bresych drwy'r dydd."
Welsh  -> English "I love to eat the leaves of cabbage all day."

So our 9-step purple monkey dishwasher train converted "I like to eat cabbage fronds all day long" into "I love to eat the leaves of cabbage all day".

There were a couple of languages (Arabic, Gujarati, Chinese) which mangled or lost some of the information (e.g. "cabbage fronds" into "cabbage" or "all day" into "every day"), but this is pretty impressive, and a tangible improvement in a relatively short time. Good work, lads!

Tuesday, October 18, 2011

Nonsmooth justice: the discontinuous mapping of crimes to sentences

From BBC news:
Jordan Blackshaw, 21, of Northwich, Cheshire, jailed for four years after admitting encouraging a riot on Facebook, which never happened
For comparison, if you were driving at 3 times the speed limit on the wrong side of the road, and you knocked someone down and left them in a braindead coma forever, the maximum sentence you could receive in the UK is 5 years. Until very recently it was 2 years.

This is somehow on a par with "encouraging a riot on Facebook, which never happened" which gained at least two people a 4 year sentence each.
Are you having a laugh? Because I'm not.

The situation is similar to that of a schoolteacher in the US who was sentenced to 200 years (!!) in prison without the possibility of parole, for possession of 20 images of child porn. Sure, collecting child porn is not a particularly nice thing to do, but again, there are murderers who will be sentenced and get out of jail before his sentence is finished. There are people who physically abuse, torture or actually kill children who could get out before him. Doesn't make much sense, does it?

Saturday, September 10, 2011

Google Translate's voice input: phones only?

So apparently Google Translate takes voice input now - although it seems you need Chrome if you're not doing it on an Android or iPhone. The privacy issue of Chrome being able to record from your microphone without your consent is... interesting.

If you do it on a PC*, it only allows you to give voice input in English. But if you do it via the iPhone/Android apps - even though they seem to use HTML5 anyway - you can give input in 15 different languages.

Google, hello? People use laptops and desktops as well - why are they restricted to English-only voice input? I couldn't find an answer to this, or a timeline for when the other languages will be available in the "vanilla" Google Translate.


* Is there a good and concise term for "non-smartphone computer"?

Thursday, August 25, 2011

Gradient-free data bars in Excel 2007?

Data bars are a quick and handy tool for making numbers more visually obvious in Excel. Mostly I just use them for progress bars on long ongoing writing tasks like my the... thes... I can't say the word for some reason, nevermind.

Anyway, the problem* is that Excel 2007 forces the data bars to have a gradient that fades to white towards the end of the bar. This makes them look a bit stupid and indistinct, with the apparent rationalisation that it makes it easier to read the numbers in that cell - which is pretty weak TBH, just choose the right colours, bold your text and it won't be a problem.

A ridiculous workaround to get fake-solid bars is to set the bar colour to white and change the background colour of the cell to something with nice contrast that doesn't melt your eyes. Like neon green or piss yellow. Blue works for me...

Anyone know a better way of doing this (other than not using Excel, or upgrading to Office 2010, since then I end up with documents that don't work properly in the ancient versions of Powerpoint installed on DCU machines)?



*Well, one of the problems. The others are that the default range values make almost no sense, that values at the minimum of the range (or lower) still produce a bar, and that values over the top of the range don't entirely fill the bar. What...

Tuesday, August 16, 2011

Browser tab overload

After a couple of years of using the excellent Opera browser, I switched to Firefox after some crashes and extremely high memory usage. It turns out that Opera was doing pretty well given that I had some 200 tabs open, though... (yes, I find interesting articles quicker than I read them, and if I mail myself the links or put them in a "to read" folder, well, they seem to not get read either).

Anyway, with addons and things, I've grown accustomed to Firefox now, and a recent-ish update which limits JS timers in background tabs to firing at most once per second really helped with CPU usage.
However, it still struggles a bit with some 100 tabs open on this laptop, and actually crashes with an out-of-memory exception on my 32-bit Vista box from time to time. Not that it explains this or anything, it just vanishes and pops up a crash reporter which doesn't seem to explain the crash; I had to run Firefox under WinDbg to get a proper, source-line annotated stack trace.

One workaround for this is to just not be so lazy and to read articles immediately and follow links in a depth-first way, rather than the exponentially disastrous strategy of reading a Wikipedia article and clicking open 12 interesting links into background tabs, then doing the same for each of those tabs if I ever get that far.

Until I have the discipline to do that, though (i.e. never), I've found a couple of JavaScript bookmarklets helpful.
So I stole a few bookmarklets to zap plugins, events and timers from here and taped them together into one "Zap all" snippet. Going through the roughly five million open and unread Lifehacker tabs and zapping off the JavaScript machinery and other cruft managed to reduce the CPU usage to a respectable ~1% mostly, and pared the RAM usage from 1.5 gigs down to just under 1 gig on that Vista box.

Screw solving the underlying problem when you have bookmarklets!

Obsolescence update: A recent feature in Firefox (at least the Aurora 8.0x releases) allows you to set tabs to load lazily (i.e. a tab won't automatically load after opening Firefox, until you activate that tab)... this is perfect for messy users like myself who end up with more tabs than they can eat - it seems to work really well.

Friday, June 17, 2011

Closed windows from a crashed Firefox: Dig [out of] your own grave and save!

Well, I don't know what happened. The Windows box went down for a reboot due to the usual bundle of patches for Windows bugs. Somewhere along the line, Firefox managed to close the window with all my (way too many) tabs, and when I rebooted, it opened up on the Mozilla homepage. Burn.

No problem, going to about:sessionrestore displays a list of... empty. Oh.

Ok then. A quick look in the profile dir (%APPDATA%\Mozilla\Firefox\Profiles\*.default) shows that the sessionstore.js and backup are over 4 megs in size - that looks promising. So I copy them and quit Firefox.
Inside sessionstore.js is a gigantic JSON string. Inside that, I find a closed tab (the session restore tab which I had given up on), with apparently another full JSON string in the #sessionData field. After decoding that, I find an empty list of windows, and a list of _closedWindows including my main window with many many tabs. Yay, I can just swap them then.

Here's a little program in Lua which might help resurrect your tabs/windows if something similar happens to you. It uses the very fast dkjson library (the whole read/decode/futz/encode/write process for a 4.5 MB file takes about 1.25 seconds on my run-of-the-mill desktop using LuaJIT).

In this case, there was an open window with one closed tab containing an about:sessionrestore form...
  local json = require('dkjson')

  print('Reading file...')
  local session = assert(io.open('sessionstore.js', 'r'))
  local session_data = session:read('*all')
  session:close()
  print('Parsing ('..#session_data..' bytes)...')
  local obj, pos, err = json.decode(session_data, 1, json.null)
  if err then error(err) end
  -- #obj would always be 0 here (obj only has string keys), so count windows
  print('Success! open windows: '..#obj.windows)

  -- I closed the session restore tab since the list was empty.
  -- Turns out it still had the closed window data, so
  -- we can extract that. You may have to change this.
  local data = obj.windows[1]._closedTabs[1].state.
      entries[1].formdata['#sessionData']

  -- parse it, swap open/closed windows
  local data_p, _, err_p = json.decode(data, 1, json.null)
  if err_p then error(err_p) end
  print('Decoded, swapping closed and open windows')
  local old_windows = data_p.windows
  data_p.windows = data_p._closedWindows
  data_p._closedWindows = old_windows

  print('Encoding')
  local outf = assert(io.open('sessionstore-rec-fixed.js', 'w'))
  outf:write(json.encode(data_p))
  outf:close()

In this case, the open windows had suddenly become closed windows... (this just happened now, the hell Aurora?)
  local json = require('dkjson')
  print('Reading file...')
  local session = assert(io.open('sessionstore.js', 'r'))
  local session_data = session:read('*all')
  session:close()
  print('Parsing ('..#session_data..' bytes)...')
  local obj, pos, err = json.decode(session_data, 1, json.null)
  if err then error(err) end

  -- #obj would always be 0 here (obj only has string keys), so count windows
  print('Success! closed windows: '..#obj._closedWindows)
  print('Decoded, swapping closed and open windows')
  local old_windows = obj.windows
  obj.windows = obj._closedWindows
  obj._closedWindows = old_windows

  print('Encoding')
  local outf = assert(io.open('sessionstore-rec-fixed.js', 'w'))
  outf:write(json.encode(obj))
  outf:close()

Monday, March 28, 2011

Rachota... oh no you di'n't!

Started using Rachota a couple of days ago to keep track of how much time I'm spending on my research, which currently consists of reading a big stats book.

Clicked into the "Analytics" tab just now and was informed of the following:
* You don't categorize your tasks enough. Try to assign some keyword to as many tasks as you can. This helps to track how much time your projects consume.
* You seem to spend too much time on private tasks or off the computer. Either minimize working on private stuff or don't leave your computer often without measuring such activity.
* It looks like your tasks are either very short or too long. Try to consolidate the short ones or divide the complex tasks. This helps to identify where your time really goes.
* You don't prioritize your tasks correctly. Use different priorities to distinguish important tasks from the low priority ones. This helps to keep focus on your real objectives.
* You don't use regular tasks properly. This might indicate you often work on a task that is not set as regular or there is a regular task that in fact occurs very rarely.
* It seems you leave your tasks open forever. Instead, close it once you are done with each task in reality to make your daily ToDo list shorter. Or don't you really finish anything?


...

Tuesday, March 01, 2011

Pointing float

> =(1/0.1)
10
> =(1/(1-0.9))
10

> =math.ceil(1/0.1)
10
> =math.ceil(1/(1-0.9))
11

....FFFFFFFUUUUUUUUU
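
For anyone wondering what's going on: as far as I can tell, 0.9 isn't exactly representable as a double, so 1-0.9 comes out a hair under 0.1 and the division lands a hair over 10. Lua 5.1's prompt prints numbers with "%.14g", which hides the difference, but math.ceil sees it. A small sketch follows; ceil_eps and its default tolerance are my own made-up names and values, not anything from the standard library.

```lua
-- Show the digits that the default "%.14g" number formatting hides.
local x = 1 / (1 - 0.9)

print(string.format('%.14g', x))  -- 10
print(string.format('%.17g', x))  -- 10.000000000000002
print(x == 10)                    -- false
print(math.ceil(x))               -- 11

-- Hypothetical helper: ceil with a small tolerance, so values that are
-- only a rounding error above an integer do not get bumped up.
-- The tolerance of 1e-9 is an arbitrary assumption; pick one for your data.
local function ceil_eps(v, eps)
  eps = eps or 1e-9
  return math.ceil(v - eps)
end

print(ceil_eps(x))     -- 10
print(ceil_eps(10.5))  -- 11
```

The tolerance trades one surprise for another (anything within 1e-9 above an integer gets rounded down), so it only makes sense when you know your values are "supposed" to be integers.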

Monday, February 14, 2011

Gluap: an attempt at PushGP in Lua

Eventually got around to implementing a basic genetic programming system in Lua, using the subset of the Push 3.0 language as explained in the previous post.

The results on my amazingly simple test function () are not great so far, though. Some runs produce a good result, while others get stuck in bad local optima:

best program so far with fitness 4.7958315233127 (integer.dup integer.+)
...
best program so far with fitness 71.917763204878 (integer.dup 231 298 integer./ integer./ integer.+)
...
best program so far with fitness 772.44948974278 (true)

Some of this may be due to using parsimony pressure, which can prioritise suboptimal solutions that are shorter in length. I'll have to read the tome on GP that's sitting upstairs.
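
To make the parsimony problem concrete, here's a toy sketch (not the actual Gluap code). It assumes lower fitness is better, as in the runs above, and that the penalty is simply a weight times the program length; penalised_fitness and all the numbers are invented for illustration.

```lua
-- Hypothetical parsimony pressure: add a length penalty to the raw error.
-- Lower scores are better.
local function penalised_fitness(raw_error, program_length, weight)
  return raw_error + weight * program_length
end

-- A short program with a large error versus a longer, more accurate one.
local weight = 5.0
local short_bad = penalised_fitness(10.0, 1, weight)  -- 10 + 5*1 = 15
local long_good = penalised_fitness(4.0, 3, weight)   -- 4 + 5*3 = 19

print(short_bad, long_good)
-- With this weight the worse-but-shorter program wins the comparison.
print(short_bad < long_good)

-- With no length penalty the more accurate program wins again.
print(penalised_fitness(4.0, 3, 0) < penalised_fitness(10.0, 1, 0))
```

So a weight that's too heavy can keep the population parked on tiny programs like (true), which is at least consistent with the third run above.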

Also, I've only implemented very simplistic mutation so far; maybe crossover will work better?
Even for mutation, selecting from the Push 3.0 instruction set with a uniform probability might be bad, since there are so many weird EXEC and CODE instructions, which are probably less likely to be useful than the simple arithmetic and stack instructions like FLOAT.* and INTEGER.DUP.
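
One possible alternative to uniform selection is a weighted pick, sketched below. This is not what Gluap does at the moment; make_weighted_picker and the weights are hypothetical, and the weights are guesses rather than tuned values.

```lua
-- Hypothetical weighted instruction picker for mutation.
-- weights: table mapping instruction name -> non-negative weight.
-- rand: optional function returning a number in [0, 1) (defaults to math.random).
local function make_weighted_picker(weights, rand)
  rand = rand or math.random
  local names, cumulative, total = {}, {}, 0
  for name, w in pairs(weights) do
    if w > 0 then  -- skip instructions that should never be chosen
      total = total + w
      names[#names + 1] = name
      cumulative[#cumulative + 1] = total
    end
  end
  assert(#names > 0, 'at least one weight must be positive')
  return function()
    local r = rand() * total
    for i, upper in ipairs(cumulative) do
      if r < upper then return names[i] end
    end
    return names[#names]  -- guard against rounding at the top end
  end
end

-- Favour simple arithmetic/stack instructions over the weirder EXEC ones.
local pick = make_weighted_picker({
  ['INTEGER.DUP'] = 9,
  ['FLOAT.*']     = 9,
  ['EXEC.Y']      = 1,
  ['CODE.QUOTE']  = 0,  -- weight 0: never picked
})

local counts = {}
for _ = 1, 10000 do
  local name = pick()
  counts[name] = (counts[name] or 0) + 1
end
for name, n in pairs(counts) do print(name, n) end
```

The linear scan is fine for a few hundred instructions; a binary search over the cumulative table would be the obvious upgrade if it ever shows up in a profile.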

Tuesday, February 08, 2011

Mini-side-procrastination-project: Gluap (PushGP in Lua)

Rather than procrastinate uselessly this week, I decided to implement a barebones interpreter and library for Push 3.0, a stack-based language intended for use in evolutionary computation. For me, that means genetic programming. Never having tried Forth, this will be the first stack-based language I've ever used... the idea is interesting, especially with the gimmicky exec and code stacks, bringing up all sorts of weird possibilities for code which evolves its own control flow.

The language du jour is Lua, which feels a bit like a mix between Ruby and Python and maybe Tcl, and has a ravishingly fast JITted interpiler called LuaJIT on most x86/x64 systems.

It's going to be a mess, but hopefully a fun mess. After a few hours' work, I've got it to the point where it can do... almost nothing! It can interpret programs that consist of single number literals, though...

Edit, 2 groggy hours later

Finally, it can parse and evaluate very simple Push programs like this:
local result =
gluap.eval_program '( 5 1.23 + (4) - 5.67 FLOAT.*)'
assert_equal(1, result.pop('integer'))
assert_equal(6.9741, result.pop('float'))
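
For the curious, the typed-stack idea can be sketched in a few lines. This is not the Gluap source: eval_program, binary_op and the tiny instruction table are made up for illustration, only fully-qualified instruction names are handled, and parentheses are simply flattened, which is only valid for straight-line programs that don't use the EXEC or CODE stacks.

```lua
-- Minimal typed-stack evaluator sketch (hypothetical, not the Gluap code).

-- Build an instruction that pops two values from one typed stack and
-- pushes the result. With fewer than two values it does nothing, which
-- is how Push treats instructions that lack arguments.
local function binary_op(stack_name, fn)
  return function(state)
    local stack = state[stack_name]
    if #stack < 2 then return end
    local b = table.remove(stack)
    local a = table.remove(stack)
    stack[#stack + 1] = fn(a, b)
  end
end

local instructions = {
  ['INTEGER.+'] = binary_op('integer', function(a, b) return a + b end),
  ['INTEGER.-'] = binary_op('integer', function(a, b) return a - b end),
  ['FLOAT.+']   = binary_op('float',   function(a, b) return a + b end),
  ['FLOAT.*']   = binary_op('float',   function(a, b) return a * b end),
}

local function eval_program(src)
  local state = { integer = {}, float = {} }
  -- Treat parentheses as whitespace, then walk the tokens left to right.
  for token in src:gsub('[()]', ' '):gmatch('%S+') do
    if token:match('^-?%d+$') then
      table.insert(state.integer, tonumber(token))   -- integer literal
    elseif token:match('^-?%d+%.%d+$') then
      table.insert(state.float, tonumber(token))     -- float literal
    else
      local instruction = instructions[token:upper()]
      if instruction then instruction(state) end     -- unknown names are skipped
    end
  end
  return state
end

local state = eval_program('( 5 1.23 (4) INTEGER.- 5.67 FLOAT.* )')
print(state.integer[#state.integer])  -- top of the integer stack
print(state.float[#state.float])      -- top of the float stack
```

Each literal goes to the stack for its own type, so the integers and floats in the program never interfere with each other even though they're interleaved in the source.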

If I can find time on Wednesday, I'll add the rest of the basic logic (I haven't got beyond basic arithmetic instructions yet) and maybe the beginnings of actual GP... or sleep, whichever.

Edit, 1 day and not enough sleep later

Instead of sleeping, it was more fun to add to the interpreter so that it can now execute each of the Push 3.0 simple examples correctly, including such madness as "( ARG FLOAT.DEFINE EXEC.Y ( ARG FLOAT.* 1 INTEGER.- INTEGER.DUP 0 INTEGER.> EXEC.IF ( ) EXEC.POP ) )", in some 346 lines of clumsy Lua. However, there's an intimidatingly long "type dictionary" which contains a great number of instructions of varying confusingness. Meh, maybe I can leave most of them for now and get on with the GP part.

Tuesday, January 18, 2011

Building LuaSocket for LuaJIT on Windows with MinGW. Oh my!

Lua is a lovely little language, somewhat minimalistic like Scheme, with a powerful and concise syntax somewhat like Ruby and Python, but it was never "batteries included", meaning that significant effort must be expended to accomplish some seemingly-trivial tasks.
Those defensive of Lua will counter that this is precisely the point of Lua - to be small and carry as few dependencies as possible. This may be true, but it's nice to offer some easily-installed extras, especially for basic tasks. It would be quite painful if everyone had to rewrite and link to C code just to get the time in milliseconds or open a socket.

Lua is very popular with the embedded and scripting crowd; especially game scripting, with WoW and a bunch of other games allowing users to write little Lua programs with some API to access game functions. Even Nmap allows Lua scripting now.

I'm interested in the use of Lua for more general programming tasks and projects, where re-using existing libraries is important to cut down on extra work. This is especially the case for Lua newbies like myself, and given the minimalism present in Lua's design and organisation, even quite trivial tasks require the use of external libraries.
One example of this is getting the current time in milliseconds - this is useful for writing simple benchmarking scripts, for example. Since Lua tries to stick almost exclusively (apart from some dynamic linking aspects, according to the canonical reference tome, PiL) to the ANSI C specification, you can't get the wall-clock time at finer than one-second resolution (os.clock() is finer-grained, but it measures CPU time rather than wall-clock time).

There is a networking library named LuaSocket which happens to provide such a function by calling into the operating system's own time API:
> require 'socket'
> =socket
table: 0x0026b680
> =socket.gettime
function: 0x0026ba18
> =socket.gettime()
1295358928.145
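
With that in hand, a simple benchmarking helper might look like the sketch below. The names now and time_it are my own, and falling back to os.clock when LuaSocket isn't installed is an assumption for convenience (os.clock measures CPU time, so the two clocks aren't interchangeable for anything that sleeps or waits on I/O).

```lua
-- Use socket.gettime (wall-clock seconds with a fractional part) if
-- LuaSocket can be loaded, otherwise fall back to os.clock (CPU seconds).
local ok, socket = pcall(require, 'socket')
local now = (ok and socket.gettime) or os.clock

-- Hypothetical helper: run fn(...) once and return the elapsed milliseconds.
local function time_it(fn, ...)
  local start = now()
  fn(...)
  return (now() - start) * 1000
end

local elapsed_ms = time_it(function()
  local sum = 0
  for i = 1, 1e6 do sum = sum + i end
  return sum
end)

print(string.format('loop took %.3f ms', elapsed_ms))
```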

Rather than compile the library from source, which can be problematic - especially on Windows and double-especially for those not familiar with C - there are two promising software repository efforts for Lua: LuaRocks and LuaDist.

Unfortunately, neither of these managed to successfully install LuaSocket:
  • LuaDist failed to find anything at all on its repository - hopefully a temporary problem.

  • LuaRocks first failed outright due to wget not being present on my Windows system. After installing a Gnuwin port and copying it into the LuaRocks dir, some faffing about was required to have it use our network proxy, and it finally downloaded the appropriate .lua and prebuilt .dll files. The DLLs crash LuaJIT, the JITted interpiler I'm using.

Apparently the dynamic libraries provided by such extensions must be compiled against your interpreter's lua51.dll.

To get LuaRocks to use a network proxy, you must add an (undocumented) entry to the config.lua file:
proxy = "http://proxy:port"


Since the prebuilt DLLs didn't work, I tried to build luasocket from source, linking against LuaJIT's lua51.dll. For this, LuaRocks tried to use what I assume are Visual Studio commands:
...
Extracting luasocket-2.0.2\test\testclnt.lua
Extracting luasocket-2.0.2\test\testsrvr.lua
Extracting luasocket-2.0.2\test\testsupport.lua

Everything is Ok

Folders: 6
Files: 89
Size: 477216
Compressed: 552960
'msbuild' is not recognized as an internal or external command,
operable program or batch file.
cp: src/mime.dll: No such file or directory
cp: src/socket.dll: No such file or directory

Error: Build error: Failed building.


Oh well, it made a pretty decent effort. There was an option to install LuaRocks using the MinGW compiler toolchain, and to use LuaJIT as the interpreter. This crashed. So I tried to build LuaSocket from source myself, but was stymied because the provided Makefiles/solution files are for Linux or assume Visual Studio is available on Windows.

After a lengthy period of head-banging, the following Makefile (in luasocket's src dir) produced DLLs that work in LuaJIT for me (after calling mingw32-make, the resulting mime.dll and socket.dll can be moved to luajit-dir\mime\core.dll and luajit-dir\socket\core.dll).

#------
# LuaSocket makefile configuration
#

#------
# Output file names
#
EXT=dll
SOCKET_V=2.0.2
MIME_V=1.0.2
SOCKET_SO=socket.$(EXT)
MIME_SO=mime.$(EXT)

CC="C:\MinGW\bin\mingw32-gcc.exe"
CINC=-I"C:\MinGW\include"

LD=$(CC)
LDFLAGS=-L "C:\MinGW\lib" -lmingw32 -lkernel32 -lcrtdll -lwsock32 -shared
CFLAGS= $(LUAINC) $(CINC) $(DEF) -pedantic -Wall -O2
#------
# Lua includes and libraries

LUAINC=-I"C:\Program Files\Lua\5.1\include"
LUADLL="C:\code\luajit2\lua51.dll"

SOCKET_OBJS:= \
luasocket.o \
timeout.o \
buffer.o \
io.o \
auxiliar.o \
options.o \
inet.o \
tcp.o \
udp.o \
except.o \
select.o \
wsocket.o

#------
# Modules belonging mime-core
#
#$(COMPAT)/compat-5.1.o \

MIME_OBJS:=\
mime.o

all: $(SOCKET_SO) $(MIME_SO)

$(SOCKET_SO): $(SOCKET_OBJS)
	$(LD) -o $@ $(SOCKET_OBJS) $(LUADLL) $(LDFLAGS)

$(MIME_SO): $(MIME_OBJS)
	$(LD) -o $@ $(MIME_OBJS) $(LUADLL) $(LDFLAGS)

#------
# List of dependencies
#
auxiliar.o: auxiliar.c auxiliar.h
buffer.o: buffer.c buffer.h io.h timeout.h
except.o: except.c except.h
inet.o: inet.c inet.h socket.h io.h timeout.h wsocket.h
io.o: io.c io.h timeout.h
luasocket.o: luasocket.c luasocket.h auxiliar.h except.h timeout.h \
buffer.h io.h inet.h socket.h wsocket.h tcp.h udp.h select.h
mime.o: mime.c mime.h
options.o: options.c auxiliar.h options.h socket.h io.h timeout.h \
wsocket.h inet.h
select.o: select.c socket.h io.h timeout.h wsocket.h select.h
tcp.o: tcp.c auxiliar.h socket.h io.h timeout.h wsocket.h inet.h \
options.h tcp.h buffer.h
timeout.o: timeout.c auxiliar.h timeout.h
udp.o: udp.c auxiliar.h socket.h io.h timeout.h wsocket.h inet.h \
options.h udp.h
wsocket.o: wsocket.c socket.h io.h timeout.h wsocket.h

clean:
	rm -f $(SOCKET_SO) $(SOCKET_OBJS)
	rm -f $(MIME_SO) $(UNIX_SO) $(MIME_OBJS) $(UNIX_OBJS)
#------
# End of makefile configuration
#

I'm hopeful that LuaDist (if it's not defunct - the webpage is not encouraging) and LuaRocks will improve and become more useful on Windows systems soon. Hopefully the same process will be easier when I try it at home on my old MacBook... but argh, all of this just to get the time in milliseconds!

Monday, January 10, 2011

Minigotcha: escaping path strings for cmd.exe

There's a very handy plugin for (g)Vim called netrw, which provides the capability to edit files over ssh/scp/etc. To set it up with Putty on Windows, the suggested lines to add to $MYVIMRC were:
let g:netrw_cygwin = 0
let g:netrw_scp_cmd = "\"C:\\Program Files\\PuTTY\\pscp.exe\" -pw mypasswd "


I'm not sure why this worked for them and not me, but perhaps I'm using an older (or newer) version of netrw, or cmd.exe behaves differently in Vista. In any case, it didn't work and gave the infuriating output of:
'C:\Program' is not recognized as an internal or external command,
operable program or batch file.

This is annoying since the path for pscp is obviously escaped. However, it turns out that cmd.exe follows a slightly strange, arbitrary protocol for escaping spaces in path strings - it uses another " character (ack, ugly and confusing... why? The argument is already in quotes...):
C:\Windows\system32\cmd.exe /c "C:\Program" Files\PuTTY\pscp.exe"

PuTTY Secure Copy client
Release 0.60

Okay then! So this works:
let g:netrw_scp_cmd ="\"C:\\Program\" Files\\PuTTY\\pscp.exe\""