Sunday, December 21, 2008

Dr. Markov

Last week, in CS106X, I programmed Boggle.

Boggle is played with a set of cubes that have letters on each face. You shake them up, put them in a grid, and try to find words made of adjacent dice faster than the other players can.

Computer Boggle is different. It gives you time, letting you find as many words as you can, and then beats your score by an order of magnitude or so by finding all the remaining words on the board that are in the (huge) Scrabble player's dictionary. You can also play 5x5 "BigBoggle" instead of the standard 4x4, in which case the computer generally beats you even more thoroughly.

One of the cool things about CS106 is that they don't just want solutions, though. It's good style to extend them in some way, and solve some additional problems that weren't in the assignment. For Boggle, I wanted to write code that would find boards with unusually few (or many) words, using only the standard cubes.

So I tried Markov chain Monte Carlo, which I'd Wikied a long, long time ago when I was really into raytracing.

In a nutshell: the algorithm tries to find a global minimum of a function that can be sampled and which is roughly continuous, but which may be in many dimensions and may not have any other nice properties.

You start with a guess.
At each step, you find a distribution centered at the current guess and randomly sample it. If you know what the global minimum should be (e.g. zero) the variance of the distribution will vary with the cost (the value of the function at the current guess), so that if you're far away from the minimum, you're sampling over a wide area, while if you're getting close, you make more incremental guesses. Each 'probe' sampled this way is evaluated, and then you randomly decide whether to accept or reject it, depending on its cost. If the probe cost is significantly lower than the current cost, then you almost certainly accept it, and if the cost is significantly higher, then you almost always reject; in between, it's a toss-up: if you accept, then that probe becomes your current guess, and if you reject, then you keep probing.

Doing all this with Boggle boards took a bit of approximation and rule-of-thumb coding because I don't know a way to actually define a probability distribution over the set of possible Boggle boards. Instead, I defined 'swapping' as taking two cubes at random, switching their places, and randomly rotating each one. I started with a standard, randomly-generated board and, after each evaluation, did a number of swaps proportional to my cost function, with a maximum of about ten. When the guesses started getting so good that I wasn't switching any cubes, I picked single cubes and rotated them randomly to create new probes. The idea is similar: the closer you get to a global minimum, the closer your probes are to your current guess.

I made two cost functions, one preferring boards with lots of words and and the other preferring boards with few words. I already had a function that could quickly search a Boggle board and find all the words on it.

The result: it worked. I was surprised.

BigBoggle boards tend to have around 150 words on them. Here's one with no words at all:

On regular boggle boards, the computer will score ~90. Here, it scored 917:

The code is private because of academic policy, but if you want to play with the program, it's here.

Markov chain Monte Carlo rules.

Thursday, December 4, 2008


The internet makes lots of tasks infinitely easier than they used to be. Want an executive summary of the War of 1812? Check Wikipedia. Want to know more? Keep reading. If you're really hardcore, the article lists eighty-two scholarly references. Need an obscure book printed by some shop in Khazakstan? Amazon can probably ship it to you for free in ten days or so. Want to convert a kilowatt hour into BTUs? Google "a kilowatt hour in btus". I have no experience here, but I bet that before the web, lots of these tasks would have taken minutes or hours rather than seconds.

Isn't it strange, though, that such a powerful tool for getting stuff done gets used so extensively for the purpose of not getting stuff done? I can't think of another way to explain the fifty-or-so bits of chatlist spam that land in my inbox every day, the SuperPokes, the pirates, the ninjas, the lolcats, and everything else along those lines. Even respectable, otherwise interesting sites like Digg and Youtube are filled with mildly funny, mildly shocking, and mildly sexual banalities.
A while ago, I posted about Twitter, the flagship time-waster-in-chief of all web 2.0 apps. Never before has humanity been this effective at wasting time.

I don't think that lost time is the real problem, though. When the novelty wears off and everything else they could have done with all that time sinks in, people will stop. I'm worried about money.

Parallel to this Cambrian explosion of "web 2.0" apps, replete with tweets, pokes, nudges, posts, threads, and lists, there have been all sorts of new revenue models, further and further removed from the sale of actual goods and services to consumers. Take Slide, Inc, which recently got a $500m valuation from some venture capitalists. It makes slideshow widgets (and little gems like "SuperPoke Pets!") which it plans to monetize by integrating targeted ads into users' slideshows. Many of the ads I see (and ignore) all the time are already advertise other ad-supported sites that don't actually sell anything. It's only a matter of time before I see some ads for Slide.

I think that the system by which dotcoms are funded today looks suspiciously like the way they were funded in the nineties. People are spending way too much money building castles in the air.

Someday, people will tire of spending so much time connected to each other, wasting each others time. The torrent of tweets, pokes, FWD:fwd:Fwd:Fwd:funny pics, and all their unholy brethren will start slowing down as people take back the time that used to be theirs. Maybe someday, sending pointless emails to a chatlist will have the same social stigma that shouting across a crowded room might have now. Maybe the parents of 2015 will tell their kids to finish their homework before they comment on each other's blogs. Maybe they'll warn their kids that too much Twitter rots your brain, or that Facebook bullies are probably just looking for attention and are best left ignored. Maybe someday, I'll log into Facebook and nobody will care what my hooker name would be. I can't wait for that day to come!

Let me skip back for a moment, to the first stock I ever owned ~ two shares of Microsoft, back in 1998 when I was in second grade. My best friend's dad thought that a market in which grade-schoolers own stock couldn't possibly be sustainable. He was right. Today, we've got a market in which investors bet billions on web startups with Rube Goldberg business plans, with almost no connection to reality. They're betting on people barely older than myself trying to capitalize on the time-wasting habits of people, often a lot younger than me, by showing them ads. As people become savvier and the web becomes fully integrated into our culture, my guess is that many dotcoms will be winnowed out in a second crash.