AlphaGo – 6/6/2016

Back from a Beijing wedding. Observations of Beijing could probably fill another WYL. Beijing is like a polar bear that fell asleep on a raft and woke up on the coast of England. It is proud, feisty, and obsessed with its skin color, but mostly confused and trying to understand its surroundings.


Life Savoring: AlphaGo

One of my predictions in the last WYL actually turned out to be relevant!! Indeed, within 5 years of my prediction (more like 0, 4, or even -1 months), an AI has beaten (or had already beaten) one of the world’s top Go professionals (the ambiguity mostly depends on whether you count Fan Hui as a “top professional”)! I went to a different showing for each game (and even commentated at a couple), and it was a ton of fun.

  • similar to my other experiences with different “tribes” in the world, the best predictor of which side a person P expected to win the match was whether P had more contact with Go or more contact with computers. Despite the various fancy rationalizations people may offer, exposure to a field seems to be the main indicator of a person’s belief in the power of that field. A “computer person” predicts that AlphaGo will win instead of Lee Sedol because he/she knows firsthand the power of computers (just as Go players know firsthand the power of Go professionals).
  • for the record, I’m not excluding myself from this heuristic; it fits me well: I bet on AlphaGo, probably because my exposure to programming/AI (on the order of 10,000 hours) is greater than my exposure to Go (on the order of 1,000 hours).
  • the above is in stark contrast to which side a person would cheer for, which surprisingly had more variance (unlike sports, where there is very little variance). Nerds are less likely than typical sports fans to signal tribalism, because being contrarian is “cooler” among nerds due to its self-reflective nature. (There are also meta-contrarians and meta-meta-contrarians, but each level is about an order of magnitude smaller.)
  • for an example of good design, here’s a super cool visualization of strong Go players over time: https://www.youtube.com/watch?v=oRvlyEpOQ-8 (spoiler: related to the topic). I think visualizations like this would be really cool for education. Spinoff ideas could include empires fighting each other for territory, or economic empires fighting each other for money.
  • the top 3 things to know that many outlets don’t talk about:
    • It is fairly widely agreed that Lee Sedol’s “Godly move” in Game 4 was a mistake. I have a somewhat lengthy explanation of why the “mistake” was effective against AlphaGo in particular, but I don’t feel like putting it in the margins of this post.
    • If I’m reading the paper correctly, all of AlphaGo’s learning was done on its own self-play games and on high-level amateur games. In particular, no pro games were used, so arguments like “it is unfair that AlphaGo saw Lee Sedol’s games but not vice versa” are unfounded.
    • In the latter half of Game 4, AlphaGo played a lot of “spazzy” moves that “made no sense” to humans, and many suspected a bug. Much of this comes from the fact that if you are in a losing position and you model your opponent as probabilistic, then the moves with the highest probability of winning are the ones for which your opponent has the fewest correct responses. Interestingly, “desperation” is at least a half-correct description of this behavior, since I think for humans it means basically the same thing, except that humans also use heuristics other than the number of responses, such as the “difficulty” or “trickiness” of the response (which AlphaGo does not have features for).
  • mysteries about the paper (I won’t define all the terms, sorry):
    • I find it strange that the “dumb” policy network was more useful for rollouts than the “smart” policy network. Some of this is usually explained away with the argument that it is more important to learn to play against opponents that are not yourself, but wouldn’t the smart network still give you better responses? (i.e. you don’t want to miss the best possible moves against you, because really strong humans/opponents can still find those)
    • One question that came up at a paper-reading session was how you might “prepare” for a particular player, such as Lee Sedol. A dumb idea is to train a “Lee Sedol policy network” on his games only, and then average it in with some weight when you do rollouts (it seems silly to use only that network for rollouts, since many of his moves will still look “normal”, so there is no need to throw away the rest of the data). I wonder if this kind of thing actually gives better responses?
    • if I ever see in a paper something like “we found \lambda=0.5 to be the right weight to allocate between two systems” to make a hybrid expert out of two experts, I’m assuming that the only things they’ve tried are the two systems by themselves and a 0.5 mixture (and maaaaybe 0.25 and 0.75 weights). I wonder how right I am about this. =D
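A toy model can make the Game 4 “desperation” point concrete. This is a minimal sketch, not how AlphaGo actually evaluates moves: it assumes a hypothetical opponent that independently spots each correct reply with the same probability `p_spot`, and only goes wrong if it misses all of them. Under that assumption, a move with `k` correct replies turns a lost game around with probability `(1 - p_spot) ** k`, so the best move is the one with the fewest correct replies. The names `Candidate`, `win_probability`, and `desperate_choice`, and all the numbers, are made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    name: str
    correct_replies: int  # opponent replies that keep the opponent winning (hypothetical)


def win_probability(c: Candidate, p_spot: float) -> float:
    """Toy opponent model (an assumption): each correct reply is spotted
    independently with probability p_spot; we win only if all are missed."""
    return (1.0 - p_spot) ** c.correct_replies


def desperate_choice(candidates: List[Candidate], p_spot: float = 0.9) -> Candidate:
    """Pick the move with the highest win probability under the toy model,
    which is the move leaving the fewest correct replies."""
    return max(candidates, key=lambda c: win_probability(c, p_spot))


moves = [
    Candidate("solid endgame move", correct_replies=12),
    Candidate("strange invasion", correct_replies=3),
    Candidate("one-line trick", correct_replies=1),
]
for m in moves:
    print(f"{m.name}: {win_probability(m, 0.9):.3g}")
print("chosen:", desperate_choice(moves).name)
```

Note that `p_spot` is the same for every reply here; a human would instead make it depend on how “difficult” or “tricky” each reply is, which is exactly the extra heuristic mentioned above.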
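The Lee Sedol idea and the λ complaint fit together in one sketch. Assuming (my assumption, not anything from the paper) a simple linear mixture `(1 - lam) * general + lam * player` of a general policy and a hypothetical player-specific policy, one reasonable way to pick `lam` is to score held-out games by the average log-probability the mixture assigns to the moves actually played, over a grid finer than {0, 0.5, 1}. The three positions below are invented toy data; in the third, the player makes a “normal” move that the player-specific policy never saw, which is why leaning entirely on that policy gets punished.

```python
import math
from typing import Dict, List, Tuple

Policy = Dict[str, float]  # move -> probability


def mix(general: Policy, player: Policy, lam: float) -> Policy:
    """Linear mixture of two move distributions: (1 - lam) * general + lam * player."""
    moves = set(general) | set(player)
    return {m: (1.0 - lam) * general.get(m, 0.0) + lam * player.get(m, 0.0) for m in moves}


def avg_log_likelihood(lam: float, positions: List[Tuple[Policy, Policy, str]]) -> float:
    """Average log-probability the mixture gives to the moves actually played."""
    total = 0.0
    for general, player, played in positions:
        p = mix(general, player, lam).get(played, 0.0)
        total += math.log(max(p, 1e-12))  # floor avoids log(0) for unseen moves
    return total / len(positions)


# Hypothetical held-out positions: (general policy, player-specific policy, move played).
positions: List[Tuple[Policy, Policy, str]] = [
    ({"a": 0.7, "b": 0.3}, {"a": 0.2, "b": 0.8}, "b"),
    ({"c": 0.5, "d": 0.5}, {"c": 0.9, "d": 0.1}, "c"),
    ({"e": 0.6, "f": 0.4}, {"e": 1.0}, "f"),
]

grid = [i / 20 for i in range(21)]  # 0.0, 0.05, ..., 1.0
scores = {lam: avg_log_likelihood(lam, positions) for lam in grid}
best = max(scores, key=scores.get)
print(f"best lambda on this toy data: {best}")
```

On this toy data the winner lands strictly between the two extremes; with real data, the point is only that a sweep like this (not a single 0.5) is what would justify whatever λ a paper reports.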

Misc:

  • (Call for Heroes, especially physical therapists): In the past, I’ve frequently gotten lower-back pain after sprinting downhill (which I no longer do) or playing too much basketball (which I do want to keep doing). Hypothesis on a common cause: landing after a jump? It is usually alleviated by pulling on the glutes and sucking the butt in (and un-arching the back as a result). Usually this pain lasts about two days, but the current bout has lasted about three weeks, which worries me. Is there an exercise I can do, or postures/motions I should avoid, to prevent this in the future?
  • It may help some of you (doing advertising, self-promotion, what-have-you) to know that the last WYL got about 3 times as many responses as usual! Some of that must be people missing me, and some of it is due to the WYL being short. I think I have a pretty solid idea which one it really is. ;D Thanks for the good wishes regardless.
  • Many of you had really good things to say about renting vs. buying. Two of the points I really liked were:
    • (a) Y.’s rule of thumb of using how long you expect to live in a place as a factor (this is similar to the Ski Rental Problem);
    • (b) S. reminding me that home maintenance is a hidden cost of buying instead of renting. Now, I can definitely see myself rationalizing it by counting the skills learned through home maintenance as a benefit (go hoarding!), but it is good not to weigh that too heavily.
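Y.’s rule of thumb lines up with the classic deterministic strategy for the ski rental problem: keep renting until the rent already paid plus the next period’s rent would reach the purchase price, then buy. Whatever the (unknown) length of the stay, this always costs less than twice what you would have paid had you known the stay length in advance. This is a toy sketch under loud assumptions: costs are in made-up units, and it ignores interest, appreciation, resale value, and S.’s hidden maintenance costs. The function names are hypothetical.

```python
def break_even_cost(periods: int, rent: float, buy_price: float) -> float:
    """Total cost of the break-even ski-rental strategy over a stay of `periods`.

    At the start of period t (1-indexed), buy if t * rent >= buy_price;
    otherwise rent for that period. After buying there is no further cost
    in this toy model (no maintenance, interest, or resale).
    """
    paid = 0.0
    for t in range(1, periods + 1):
        if t * rent >= buy_price:
            return paid + buy_price
        paid += rent
    return paid


def offline_optimum(periods: int, rent: float, buy_price: float) -> float:
    """Best cost if the stay length were known up front: rent throughout or buy at once."""
    return min(periods * rent, buy_price)


rent, buy_price = 1.0, 10.0  # hypothetical units: the price equals 10 periods of rent
for stay in (3, 10, 30):
    alg = break_even_cost(stay, rent, buy_price)
    opt = offline_optimum(stay, rent, buy_price)
    print(f"stay={stay}: strategy={alg}, hindsight optimum={opt}, ratio={alg / opt:.2f}")
```

The factor of 2 comes from the worst case: you buy right after paying almost the full price in rent, while hindsight says you should have bought on day one.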
