Thursday, December 25, 2008

The end of the world

This is just beautiful. It's Pink Floyd's "The Great Gig in the Sky" set to a Discovery Channel simulation of a 500km asteroid impact.

(Watch it in HD. "But how? It's Youtube!" Trust me. This is cool.)

Could it really happen? Probably not anytime soon. 99942 Apophis has the record for highest Torino scale risk estimate so far... at two out of ten. This was back in 2004 when NASA estimated a 1-in-300 chance of collision in 2029. It's known now that that won't happen, but if it goes through the exact right ~600-meter "gravitational keyhole" in 2029 will be on a collision course for 2036. NASA gives that 1 in 45000 odds. If it does happen, though, it will release as much energy as 880 megatons of exploding TNT. The Mt. Krakatoa explosion -- the biggest known volcanic explosion -- was equivalent to about 200 MT TNT and caused unusually cold, long winters for three years in 1883. The biggest man-man explosion so far was "Tsar Bomba", which was a 50 MT hydrogen bomb. For a few nanoseconds in 1961, it lit up an island north of Siberia with about 1% the power output of the sun.

Nevada atomic bomb test in 1951

In other news, it's Christmas Day. I've just one-upped the Grinch.

Sunday, December 21, 2008

Dr. Markov in da house

Last week, in CS106X, I programmed Boggle.

Boggle is played with a set of cubes that have letters on each face. You shake them up, put them in a grid, and try to find words made of adjacent dice faster than the other players can.

Computer Boggle is different. It gives you time, letting you find as many words as you can, and then beats your score by an order of magnitude or so by finding all the remaining words on the board that are in the (huge) Scrabble player's dictionary. You can also play 5x5 "BigBoggle" instead of the standard 4x4, in which case the computer generally beats you even more thoroughly.

One of the cool things about CS106 is that they don't just want solutions, though. It's good style to extend them in some way, and solve some additional problems that weren't in the assignment. For Boggle, I wanted to write code that would find boards with unusually few (or many) words, using only the standard cubes.

So I tried Markov chain Monte Carlo, which I'd Wikied a long, long time ago when I was really into raytracing.

In a nutshell: the algorithm tries to find a global minimum of a function that can be sampled and which is roughly continuous, but which may be in many dimensions and may not have any other nice properties.

You start with a guess.
At each step, you find a distribution centered at the current guess and randomly sample it. If you know what the global minimum should be (e.g. zero) the variance of the distribution will vary with the cost (the value of the function at the current guess), so that if you're far away from the minimum, you're sampling over a wide area, while if you're getting close, you make more incremental guesses. Each 'probe' sampled this way is evaluated, and then you randomly decide whether to accept or reject it, depending on its cost. If the probe cost is significantly lower than the current cost, then you almost certainly accept it, and if the cost is significantly higher, then you almost always reject; in between, it's a toss-up: if you accept, then that probe becomes your current guess, and if you reject, then you keep probing.

Doing all this with Boggle boards took a bit of approximation and rule-of-thumb coding because I don't know a way to actually define a probability distribution over the set of possible Boggle boards. Instead, I defined 'swapping' as taking two cubes at random, switching their places, and randomly rotating each one. I started with a standard, randomly-generated board and, after each evaluation, did a number of swaps proportional to my cost function, with a maximum of about ten. When the guesses started getting so good that I wasn't switching any cubes, I picked single cubes and rotated them randomly to create new probes. The idea is similar: the closer you get to a global minimum, the closer your probes are to your current guess.

I made two cost functions, one preferring boards with lots of words and and the other preferring boards with few words. I already had a function that could quickly search a Boggle board and find all the words on it.

The result: it worked. I was surprised.

BigBoggle boards tend to have around 150 words on them. Here's one with no words at all:

On regular boggle boards, the computer will score ~90. Here, it scored 917:

The code is private because of academic policy, but if you want to play with the program, it's here.

Markov chain Monte Carlo rules.

Friday, December 19, 2008

Penetration-testing national security

The TSA is ridiculous. Bruce Schneier just posted an article about bypassing airport security. Apparently, contractors, repairmen and others enter airports -- sometimes with utility vans -- and get tarmac access, often without going through security.

Meanwhile, I wait in line to have my coke taken away because it's more than 3.5 ounces of liquid, and send my my Croc sandals through an astoundingly expensive scanner. The last time I went through security at Salt Lake International, I counted twenty-two TSA workers. It was early in the morning and there was no line. Four of them were working and the other eighteen were chatting in two big groups. All part of what Schneier calls security theatre.

All this security (which is dubious to begin with) is more than canceled out by negligent (or simply stupid) attitudes towards the simpler and more mundane risks that airports face. I'm sure that these security-nullifying attitudes vary widely from airport to airport, and even from employee to employee. I'm referring to bad habits that become part of the work culture -- like not asking for unknown people's ID (@Swede) and letting vans onto the tarmac with nothing more than a contractor logo (@WarLord).

Here's what I wish the government would do -- not just the TSA, but the NSA and others as well: pen-test national security. Send security contractors posing as contractors to airports and see if they can get in. Can they get a piece of unchecked luggage onto a plane? Can they leave without getting caught?

Clearly, there would need to be a robust vetting process to avoid costly lockdowns when the intrusions do get caught. Maybe, the first thing airports would do with a suspected breach would be to call directly to a national authority (TSA, for example) and check if it's a test. This extra complexity would add some cost and some potential security flaws.

Still, I think it would be a vastly better use of money than most of what the TSA is doing now. It would decrease overall risk significantly because airports would have a real, month-by-month financial and legal incentive to ensure real security. Most importantly, it would bring some accountability to airports and to the security programs that operate in them. There would be actual data on how frequently and severely trained, funded attackers can compromise various systems at airports. Security performance could be compared between airports, between security programs, and over time.

Nothing screams "wasteful theatre" louder than a $4.18 billion (FY2008) TSA budget and no real, quantitative evaluation of its effectiveness. Let's pen test airports and find out how well the security systems really work ... or if they work at all.

Thursday, December 4, 2008


The internet makes lots of tasks infinitely easier than they used to be. Want an executive summary of the War of 1812? Check Wikipedia. Want to know more? Keep reading. If you're really hardcore, the article lists eighty-two scholarly references. Need an obscure book printed by some shop in Khazakstan? Amazon can probably ship it to you for free in ten days or so. Want to convert a kilowatt hour into BTUs? Google "a kilowatt hour in btus". I have no experience here, but I bet that before the web, lots of these tasks would have taken minutes or hours rather than seconds.

Isn't it strange, though, that such a powerful tool for getting stuff done gets used so extensively for the purpose of not getting stuff done? I can't think of another way to explain the fifty-or-so bits of chatlist spam that land in my inbox every day, the SuperPokes, the pirates, the ninjas, the lolcats, and everything else along those lines. Even respectable, otherwise interesting sites like Digg and Youtube are filled with mildly funny, mildly shocking, and mildly sexual banalities.
A while ago, I posted about Twitter, the flagship time-waster-in-chief of all web 2.0 apps. Never before has humanity been this effective at wasting time.

I don't think that lost time is the real problem, though. When the novelty wears off and everything else they could have done with all that time sinks in, people will stop. I'm worried about money.

Parallel to this Cambrian explosion of "web 2.0" apps, replete with tweets, pokes, nudges, posts, threads, and lists, there have been all sorts of new revenue models, further and further removed from the sale of actual goods and services to consumers. Take Slide, Inc, which recently got a $500m valuation from some venture capitalists. It makes slideshow widgets (and little gems like "SuperPoke Pets!") which it plans to monetize by integrating targeted ads into users' slideshows. Many of the ads I see (and ignore) all the time are already advertise other ad-supported sites that don't actually sell anything. It's only a matter of time before I see some ads for Slide.

I think that the system by which dotcoms are funded today looks suspiciously like the way they were funded in the nineties. People are spending way too much money building castles in the air.

Someday, people will tire of spending so much time connected to each other, wasting each others time. The torrent of tweets, pokes, FWD:fwd:Fwd:Fwd:funny pics, and all their unholy brethren will start slowing down as people take back the time that used to be theirs. Maybe someday, sending pointless emails to a chatlist will have the same social stigma that shouting across a crowded room might have now. Maybe the parents of 2015 will tell their kids to finish their homework before they comment on each other's blogs. Maybe they'll warn their kids that too much Twitter rots your brain, or that Facebook bullies are probably just looking for attention and are best left ignored. Maybe someday, I'll log into Facebook and nobody will care what my hooker name would be. I can't wait for that day to come!

Let me skip back for a moment, to the first stock I ever owned ~ two shares of Microsoft, back in 1998 when I was in second grade. My best friend's dad thought that a market in which grade-schoolers own stock couldn't possibly be sustainable. He was right. Today, we've got a market in which investors bet billions on web startups with Rube Goldberg business plans, with almost no connection to reality. They're betting on people barely older than myself trying to capitalize on the time-wasting habits of people, often a lot younger than me, by showing them ads. As people become savvier and the web becomes fully integrated into our culture, my guess is that many dotcoms will be winnowed out in a second crash.