Separate comment on other things from the video. DeepMind is developing the ability to apply machine learning to many problems. One of their goals is to develop what we used to call expert systems for medical diagnosis. If they have the same success there, it will be tremendous for improving outcomes for patients. I played chess through school and was paying attention when Deep Blue won. Chess is still alive and well: there are as many players as there have ever been, books are still published, and classes are still held. I look forward to seeing how this will help the development of go.
Nick, I also believe AlphaGo has complete confidence in its fighting, so it knows what happens if the player attacks its 2 or 3 stones in the corner (talking about around 29:00). AlphaGo is counting on humans not taking the time to read every move ahead, or not even having the time to. It has a built-in time limit: it didn't answer in 7 or 8 seconds because that's how long the calculation took, but because that's how long they set its timer for. My theory is that if you remove the time limit of the game, you'd see humans win.
Hello Nick, what happens when AlphaGo shoulder-hits and you ignore the shoulder hit and attack its corner that has one stone in the top right? How does AlphaGo respond?
Ke Jie knew that it was AlphaGo because they told him, but asked him not to tell anyone. Considering this, it seems Google would have said it was AlphaGo even if it had lost a game. It is hard to keep a secret like that.
Teng Weixing reviewed his game on Weiqi TV, and he specifically mentioned he had known it was AlphaGo before playing. Someone posted a translation on /r/baduk; it's well worth the watch. It's around 2 minutes, but the whole video is excellent. Maybe Nick could ask pro players for confirmation and clear it up in his next video. My impression is that most, and perhaps all, players knew it was AlphaGo from the start, but maybe it was different for the first few games.
It's important to remember how convolutional neural networks work when talking about AlphaGo, especially since they're such a simple concept. All they are is a series of convolutions that reduce the current board state to 361 outputs, each giving a weighted signal based on the internal structure of the network. AlphaGo just picks the highest-value move. One thing to remember about this kind of system is that the whole thing is highly deterministic and very parallel. Each convolution and each neuron takes the same amount of work to process any board state, so you should expect results to take a very consistent amount of time. A picture of the entire board propagates through the system each time it needs to make a move.

The "learning" mechanisms are also extremely interesting... but they're not like human learning. You can think of convolutional neural networks as pattern-recognition schemes for computers (which have traditionally been very bad at pattern recognition). "Training" such a mechanism basically involves giving it example after example and, for each of those examples, finding some way to give each of the neurons (both the 361 output neurons AND the hidden neurons) some value. It may be that this kind of scheme cannot solve go in the sense of "finding God's move". Ultimately, it can only learn from us and from itself.

Another thing to keep in mind is that if it really is "just" a convolutional neural network, it's probably not sitting there trying to formulate strategies. It's just looking at the board and picking the best move. The current state of the board is fed into the system, and once all the output neurons turn on, it makes the move that corresponds to the highest value among them. There may not even be any temporal sense at all! That kind of thing needs to be added deliberately by the programmers, and really, why would they?
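To make the "board in, 361 scores out, pick the max" idea above concrete, here is a toy sketch in Python. Everything in it is made up for illustration: a single fully connected layer with random weights stands in for the real deep convolutional stack, and the helper names (`forward`, `pick_move`) are invented. The real AlphaGo also combines its networks with tree search, which this ignores entirely.

```python
import random

BOARD = 19
N = BOARD * BOARD  # 361 intersections

def forward(board, weights):
    # Toy "policy network": one fully connected layer mapping the
    # flattened board (+1 our stone, -1 opponent, 0 empty) to 361
    # scores. The real network is a deep stack of convolutions, but
    # the shape of the computation - whole board in, one score per
    # intersection out - is the same, which is why the work per move
    # is so consistent.
    return [sum(w * x for w, x in zip(row, board)) for row in weights]

def pick_move(board, weights):
    # Deterministically pick the highest-scoring empty point.
    scores = forward(board, weights)
    legal = [i for i in range(N) if board[i] == 0]
    return max(legal, key=lambda i: scores[i])

random.seed(0)
weights = [[random.uniform(-1, 1) for _ in range(N)] for _ in range(N)]
board = [0] * N
board[3 * BOARD + 3] = 1  # our stone on a 4-4 point
move = pick_move(board, weights)
print(divmod(move, BOARD))  # (row, col) of the chosen point
```

Note that the output here is garbage (the weights are random); "training" is exactly the process of nudging those weights, example after example, until the highest-scoring output tends to be a good move.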
If any of you want to learn more about this, the Computerphile channel did some excellent (layman-accessible) videos on the topic of kernel convolutions. The first video is about kernel filters, but you need to know about those in order to understand the input end of the neural network (at least... ONE way that it might work). I've been flagged as spam before, so I'm just going to show the part after youtube dot com: /playlist?list=PLl6o3lt9y_cCCUBIvbpYanLkdN9BdzFKH
It is quite interesting; to me it looks like AlphaGo actually prefers older playing styles. Instead of keeping the opponent's corner invadable for later, AlphaGo liked to immediately shoulder-hit or surround the opponent's corner from the outside, except when it calculated a corner invasion would be needed based on the rest of the board. AlphaGo also seems to leave small stone formations and move on to the next place, comparable to older playstyles where you make a lot of smaller bases. It also likes the large knight enclosures, which were also in favour in the past, while the small knight enclosure is more modern. For example, if you go back to your previous video about the "oldest game", neither player seemed eager to invade a corner that had a large knight enclosure; they surrounded the outside instead. Imagine how eager those old-style players would be to surround a corner with a small knight enclosure, making the corner one space smaller.
Because they used a database of games to train AlphaGo, and probably most of those games were older. It is a good strategy: it was working in the past, it is still working, but it will make the human player think more.
Chongee AlphaGo played itself millions of times, much more than its database of human games. Why would Google feed it older games? They want it to be good.
AlphaGo's pragmatism is embodied by the go proverb that rich men should not pick fights... The one possible weakness of AlphaGo might be its predictability: if it plays the same move in the same position, a human will be able to prepare a refutation. I suspect, therefore, that from time to time with longer time limits the top pros might be able to defeat AlphaGo, though this will be increasingly rare.
No, it's just that humans are that much worse. You can make a lot of really bad moves and still beat a toddler. AlphaGo could learn some new things and end up liking pincers again.
We can consider that AlphaGo was trained on a "human" style, so it developed a strategy against "human" games. Maybe, if everyone starts learning the AlphaGo way, it will adapt again and play differently. What I mean is that maybe there isn't one perfect style to reach.
You are overestimating the "training" and comparing it to human learning. Machines are not intelligent yet. The training is only for computing the probability of different moves before you compute their actual value; the main computation for evaluating is the Monte Carlo evaluation, just like in every other go program of decent strength. The "learning" part is finite and the least important thing, and has probably already been optimized to the best possible local evaluation, and can't improve further.
There's a very simple algorithmic reason behind AlphaGo's tendency not to finish joseki and to play tenuki. Consider a situation with two computers (computer 1 and computer 2, computer 1 to play), with move A being joseki and move B being tenuki. Both computers are building a tree for the situation (I suppose you know the basics of Monte Carlo Tree Search).

Let's talk about computer 2 (which is waiting for computer 1's move). Since move A is joseki, it is the most likely to be explored by the algorithm. Let's say 90% of the tree is about move A, and only 1% is evaluating move B (because it's tenuki). If computer 1 plays A, then computer 2 has already thought a lot about the move, since 90% of the tree can be reused (the other 10% is thrown away): 90% of the thinking time was actually useful. If, on the other hand, computer 1 chooses to play B, then 99% of the tree turns out to be total garbage for computer 2, and only 1% of the thinking time was useful. Basically, if computer 1 tenukis, computer 2 has to think about the position from scratch.

Now let's talk about computer 1 (which has to play). If computer 1 thinks the same way as computer 2, there's little to say, because it will build a similar tree. But if computer 1 has a tendency to tenuki in that position, then instead of thinking about move A 90% of the time and move B 1%, maybe it'll be more like 20% for A and 20% for B. That's not so good if you want to play A (because computer 2 has been thinking a lot about that move), but if you choose to play B, you've had far more time than your opponent to think about what comes next (20% against 1%), resulting in a higher chance of winning the game.

Hence, if AlphaGo plays tenuki, it doesn't mean the move is actually the best, but simply that the move is good enough considering that it now has a huge lead in terms of thinking time. Of course, this is only relevant against computers, since human players won't instantly forget everything they thought about earlier just because they are surprised.
As a consequence, here's a tip if you want to beat a computer: think for an hour about a sequence, then play a forcing/timesuji move before starting the sequence; you will be an hour of thinking time ahead.
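The tree-reuse arithmetic above can be sketched in a few lines of Python. This is only an illustration of the comment's point, using its made-up proportions (90% joseki / 1% tenuki), not AlphaGo's actual numbers; the `Node` structure is a bare-bones stand-in for an MCTS tree.

```python
# How much of a search tree survives the opponent's actual move:
# the subtree under that move is kept, everything else is discarded.

class Node:
    def __init__(self, visits):
        self.visits = visits   # search effort spent under this node
        self.children = {}     # move -> Node

root = Node(visits=1000)
root.children["A_joseki"] = Node(visits=900)  # 90% of search effort
root.children["B_tenuki"] = Node(visits=10)   # 1% of search effort
# (remaining 9% spread over other moves, omitted here)

def reusable_fraction(root, opponent_move):
    # Fraction of total search effort still useful after the opponent
    # plays `opponent_move`.
    child = root.children.get(opponent_move)
    return (child.visits / root.visits) if child else 0.0

print(reusable_fraction(root, "A_joseki"))  # 0.9  - most work kept
print(reusable_fraction(root, "B_tenuki"))  # 0.01 - nearly from scratch
```

So a surprising tenuki costs the opponent engine almost all of its accumulated reading, exactly the asymmetry the comment describes.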
Vincent Richard This makes sense. But do we know that AlphaGo doesn't remember its previous reading in areas of the board that haven't changed significantly?
Very interesting. I think the slack-looking moves are also probably just helping AlphaGo because they reduce uncertainty. Once it kicks a corner approach stone, that cuts out a lot of invasion possibilities, and this lighter load on its forward reading might more than make up for a small loss in future potential. As with your point, perhaps AlphaGo got into this habit just because it led to slightly better results in its self-play games - not from raw strength but from paring down the future game tree faster than its opponent.
According to DeepMind's Nature publication a year ago, it was not the case at the time, and there are few reasons to think it has changed since then. If you want to recycle part of the tree (previous reading), you have to find an algorithm that can tell quickly which parts can be saved and which can't. This is a very complex problem, since stones far away may have a huge impact on the orientation of play. The simplest example is friendly stones: when it comes to running or attacking, the presence of a single stone may change the outcome of the game (e.g. a ladder breaker).

Interestingly enough, recycling trees is a field where humans are way more skillful than computers (end of the joke). I'm convinced the deeper reason behind this is the strategic approach humans take to the game. When you play, you first choose what you want to do, a strategy (build a wall, take territory, remove the enemy's base, ...), then try to find a way to do it, a sequence. Of course, you may reconsider the strategy if no sequence is satisfying enough. This strategic approach helps you cut the board into pieces you can think about separately before sticking them back together. Indeed, once you've chosen a strategy for one of the pieces (e.g. build a wall), the actual outcome (e.g. the exact shape of the wall) is not that relevant for a part of the board far away (another piece). Strategy is thus the glue you can use to stick each piece back to the rest of the board.

A computer doesn't bother with strategy (first because strategy has its own flaws, second and mainly because it's too hard) and thinks directly about sequences. Hence, moves you may consider similar, because they do the same thing (same strategy), are considered totally different from the computer's point of view. Without this magical glue, it's unlikely that a computer can recycle parts of the tree in a fair amount of time.
Vincent Richard I suggest you do some research on YouTube. Watch and listen carefully to experts on neural nets. Once you get the gist of neural nets, you'll be awed, shocked, and alarmed about the future effect of AI on human society.
The part at 19:45 doesn't really make sense to me. On the one hand, AlphaGo never plays the one-space pincer; on the other hand, it plays the "connect under" variation of it? That's a choice the pincering player gets to make?!
Is it possible that some of AlphaGo's early moves are wrong, but it makes up for it in the mid-game because of its superior brute-force calculation abilities? AlphaGo doesn't have high-level strategies like human players.
chris r Or those openings might be the most suitable for it in some way. Also, just because it doesn't strategize doesn't mean it can't have tendencies that higher-level strategies could be extrapolated from.
It's what we see in AI chess: if it's a tactical opening, it's no problem for the computer, but if it's a long closed position, it needs an opening book to play the opening correctly.
It really seems that way, Chris, doesn't it? It's almost like it can afford to be haphazard because it will work out ways to just make garbage work later. However, let's look at what a strategy really is, fundamentally. A strategy is simply how advantages in resources are to be exploited against vulnerabilities in barriers to reach objectives. That's it. So AlphaGo is strategizing constantly, and, it appears, the moves it makes prove to be of value many moves later. If, on every move, it's evaluating vulnerabilities and how to apply resources to reach objectives, it's constantly strategizing. What's your evidence that AlphaGo doesn't have high-level strategies?
One way to look at it is that for a bot, especially a strong neural-net bot, strategy emerges from near-perfect play. In a sense, strategy is implicit in the play of a bot that plays near perfection.
I think AlphaGo doesn't just make exchange moves to make shape and take opportunities; it is also trying to make time work against the human opponent. Making stupid-looking moves, like tenuki in the middle of a joseki, serves this purpose too. It makes the opponent confused and uncomfortable, whereas AlphaGo is comfortable in those situations. By making quick moves and burning the opponent's clock through confusion, even if just for a move or two, there will be a mistake, and in such situations AlphaGo will know it was a mistake, because it's comfortable there. Plus it makes long jumps because either they work, and it wins points, or some stones get sacrificed and it wins points elsewhere, and all that thinking makes the opponent tired. The machine doesn't care; as the game goes on things only get better for the computer, since there are fewer moves to choose from.

I think the AI was programmed to do this: it was designed to play against human weaknesses and make use of its own strengths. This takes away the beauty of Go, and I find it unfair too, especially because AlphaGo could not win against any of these pros if there were no pro-game database to feed into these neural networks. It is a very good job done by Google, and I'm not saying AlphaGo is not strong, but it's also unfair of Google to do deep learning on pro players' games and then not let them think deeply, especially if they didn't know who they were playing (unfair even if they did know).

One thing we can definitely learn from this is to pay attention to what kind of data we are producing and letting companies like Google or Facebook have, because there are these powerful tools like machine learning, and all they need is data. Plus they seem to play unfair, so I'm careful; but the problem is that even if they don't get my data, they still have enough from others.
Chongee It learned by playing games against itself millions of times. It's not playing mind games; it assumes a human opponent will play the same way it does. It's not managing the game clock like a football coach lol.
Jesus i kept having to skip ahead in the video whenever that annoying slow talking guy would start droning on with stupid ideas. Some of the best commentary about alphago tho. (from nick)
It would be interesting to see AlphaGo playing against AlphaGo... Makes me wonder how many games out of a hundred would end with white winning... Just a playful thought.
I'd really love to see AlphaGo's responses to being dropped into various points in old famous games. Or at least let it give its win favor % move-by-move to those games.
I'd love to see what it thought about the board state before and after the Ear-Reddening Move, for example, and what it would have done at that point...
i think i heard it didn't like it lol
As a (western) chess player, who loves go - it's fascinating to watch the parallels between this and the way that computers are so entrenched now in all aspects of chess for a couple of decades. Chess also has many heuristics developed by humans over centuries that computers can surprisingly often disregard - using intricate tactics to magically make their moves work...
If this this happens in an opening or ending, that a human can internalise and make use of - then humans can make very effective use of the unusual moves. Whereas copying the computer idea at other times is a really bad idea for a human - as the resulting tight-rope can be far-too difficult.
Will be very interesting to see which ideas are revealed that become useful to human players..
"Mom, I gotta move to Seattle."
"Why?"
"Nick Sibicky told me to."
"Then go, my son."
"Exactly."
if someone tells their Go teacher "but alpha go played it" in my opinion the response should be "Okay. if you can tell me why this is a good move I will let you play it"
But with those opening moves it is almost always the case that they are played because masters are playing and winning with them. There is no better reason.
Playing the move is a great way to learn more about it. Especially with a whole community trying it and trying to understand it.
How will you learn if you don't play it? What I mean is that I played avalanche and taisha without understanding all variations but it was fine because it was a joseki. Now this is also a joseki. If someone plays something to punish me, eventually I will learn how to defend from that. If I don't play a joseki (alpha go or not), I will never understand why it is good.
I feel that Alpha Go has an amazing artistic view of how it wins by being so pragmatic. Imagine this for a human; from the time you enter school to the time graduate college you pass every exam with precisely 90%, just enough to get an A. It's like saying, O I need to get an A, what does that really mean. Imagine the emotional and intellectual discipline and the command of the field it would take for human to do that. It's zen like.
I loved the cut from the epic intro music to Nick just coughing.
Loved the definitely-not-StarWars music :-)
I loved this lecture...on so many levels! And exciting to hear that you and Andrew will be playing for our entertainment/edification in a few weeks!! :)
I got confused when you showed the joseki at 18:50. You say "Alphago is white now", which seems logical because you said earlier that Alphago does not play the pincer. But then at 19:20 you emphasize that Alphago does not connect at C15, but this is a move for black and not white. Can anyone explain?
derpepe0 thought I was the only one who noticed it :))
i think he's showing what alphago thinks comes out of the pincer with optimal moves for both sides
Good work as always, thanks for the review! :)
You're describing how AlphaGo connects below instead of directly if other people pincer AG. That doesnt really make sense because in this Situation black is still the one pincering. 19:20
I don't understand what you are trying to say. AlphaGo is white here. We just agreed that "other people" aka black pincers AlphaGo aka white. What you write only seem to reconfirm what was said. How does that not make sense?
Nick starts by saying Alpha go is white, and this is how he answers when players pincer it, but then the 'twist' is blacks continuation of the joseki, ie. the pincering player.. u know?
Hope you're all right, Nick! get better soon
AlphaGo is 10-dan. The first (and last) human ever to win a game against it was Lee Sedol, in game 4 of his 5-game match: the ONLY time a human has ever won against it. And it is getting better, and it will only get even better. In the next "real" match against a pro, I think it won't matter how many hours of time the human gets. AlphaGo will win.
I'm convinced now that it is indeed a full 10-Dan
James H I partially agree but you are not taking consideration that the closer you get to perfection the less you improve. Maybe AG is already very close to perfect and a billion more self-play games will only improve it by 1 stone. This would mean the best humans are only a few stones off perfection when they play at their best.
Great video, thanks.
Just a note, there were no ASICs mentioned in the public paper in Nature. Just a couple thousand GPUs and CPUs.
I think it's interesting to try the AlphaGo moves, but without understanding why AlphaGo plays the moves it does, it will be very difficult to play those moves in the right situation. This is why I think it's a bit silly to justify your moves with by saying, "AlphaGo played it" unless the entire board position is identical.
We do the same thing with pros as well you know
Even then, you would have to know the correct follow ups to make that move work in different lines.
I bet you the amount of 4-4 approach kick-extend-tenuki will go up on Tygem now. Just when I was getting out of the ranks where people play that :(
On the other hand the gote end result that's supposed to be the refutation of it always felt fishy to me...
Pros explain their lines of thought
Brett Castellanos I think trying out the moves at least provides an avenue of experimentation towards understanding the principles behind these plays. Why does it work in that situation but not this one? Repeating this one could gradually refine emulation into an approximation of the "thought process" behind AlphaGo.
Obviously I still wouldn't call that correct, but like you said it's interesting
"Not that endgames aren't interesting" kyu endgames are the most interesting
The reason AlphaGo doesn't make these massive killer moves and try to win by many points is that AlphaGo plays against itself on every move: it assumes the human's next move would be its own next move if it were in that position. So it assumes it's playing a master and doesn't have any superior knowledge.
"Maybe this is the spontaneous emergence of modesty." :D
That might be the only comment from Dan I've ever appreciated.
yeah I waited for this vid!!!
An important part of AlphaGo's programming is that not only does it not care about winning by the most points, it also doesn't really care about making strictly the best move. It actually will find several moves with around the same probability and then randomly pick from one of those, encouraging creative/new moves. So in the endgame, if a move won't make it lose, it might play it even though it literally gives the other player a point. This is what causes the .5 point wins even though they were actually far from close games.
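The selection behavior described above can be sketched in Python. Note the caveat: the exact criterion and tolerance inside AlphaGo are not public, so the threshold, the function name, and the example win rates here are all invented for illustration; only the general idea (optimize win probability, not margin, and accept any near-best move) comes from the comment.

```python
import random

def choose_move(move_winrates, tolerance=0.001, rng=random):
    # Pick randomly among all moves whose estimated win probability is
    # within `tolerance` of the best. The margin of victory never
    # enters the decision, so a move that "gives away" a point is fine
    # as long as it still wins.
    best = max(move_winrates.values())
    candidates = [m for m, p in move_winrates.items() if p >= best - tolerance]
    return rng.choice(candidates)

# Endgame example: the "big" move wins by 8.5 points and the "small"
# move by 0.5, but both win essentially always, so either may be
# chosen; the risky overplay is excluded despite its larger payoff.
winrates = {"big_point": 0.9992, "small_point": 0.9990, "overplay": 0.80}
random.seed(1)
move = choose_move(winrates)
print(move)  # either "big_point" or "small_point", never "overplay"
```

This is why a 0.5-point final margin says little about how close the game actually was: the engine was happy to shed points anywhere that didn't move the win probability.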
Thanks for the update on the intriguing AlphaGo! Your comments made me think that "How to Punish Unfinished Joseki" video would be particularly useful to DDK players (me!)... If anyone else agrees, please like this post!
Yes please, it's so frustrating knowing a joseki only to have your opponent break it or tenuki and not know how to capitalise.
alphago was trained with 200,000 human games and probably 100,000,000 games against itself. that is the reason it feels alien. It has almost no humanity left in it.
Great video, Nick! I liked the idea that 9d might need 3 stones against God. Maybe AlphaGo just needs 2.
The Go community needs to get together in order to defeat these damn Go Bots!
they kind of did
I barely know how to play go, (I focused on chess as a board game), but as other people have opined this was a great lecture.
AlphaGo's "urgency" evaluation takes into consideration a lot more future positions. It can tenuki, knowing there is some probability it will be able to come back later and finish the joseki, fix shape, or get compensation elsewhere. It plays very fast and loose, optimizing the probability that it will come out ahead over the most "reasonable" outcomes that arise out of a position (super-intuition). This enables AlphaGo to have a more "global" sense of time.
awesome commentary as usual: thanks a lot :)
At 15:11: where can I find these games?
To the issue that comes up at 24:30, I think sooner or later we will get more direct access to the system behind these moves, with the bot producing commentaries on its game.
A few comments about why AlphaGo does what it does.
One reason humans play some moves and avoid others is the reading necessary to know what to do next. Reading is one thing AlphaGo is really good at. It can play a different response and tenuki because it knows a good response to every move its opponent can make in that area.
As to the tendency to win by very little, AlphaGo is playing to minimize the lines that do not lead to a win. Similar to what my football coach said: "If they don't score, we don't lose." AlphaGo will gladly give up a large play to remove the opportunity of a small gain by the opponent. It is focused almost exclusively on minimizing risk.
A key feature is flexibility. AlphaGo can find a good response, or at least a profitable exchange. It is all about a path to victory and it only needs 0.5 points to win.
At the end of this video the students talk about how much hardware it takes for AlphaGo to play at the level it does. In that regard, I think humans are still ahead: our performance per watt at playing go is still significantly better.
Good point. 5 lbs of gray matter doing so much more than just playing go, too.
More on alpha go PLZ ^^
Separate comment on other things from the video.
DeepMind is developing the ability to apply Machine Learning to many problems. One of their goals is to be able to develop what we used to call Expert Systems to be used for medical diagnosis. If they have the same success, it will be tremendous for improving outcomes for patients.
I played chess through school. I was paying attention when Deep Blue won. Chess is still alive and well. There are as many players as there has ever been. There are still books published and classes held. I look forward to seeing how this will help the development of go.
Nick, I also believe AlphaGo has complete confidence in its fighting, so it knows what happens if the player attacks its 2 or 3 stones in the corner (talking about around 29:00). AlphaGo is counting on humans not taking the time to read every move ahead, or not even having the time to. It has a built-in time limit. It didn't play in 7 or 8 seconds because that's how long it was calculating; it played in 7 or 8 seconds because that's how long they set the timer for it. My theory is that if you remove the time limit from the game, you'd see humans win.
Alphago just clicking buttons randomly 39:49
Hello Nick, what happens when AlphaGo shoulder hits and you ignore the shoulder hit and attack its corner that has one stone in the top right? How does AlphaGo respond?
I felt sad for Lee Sedol last year; still can't believe this ever happened.
if pros start using the alphago style then they will get even better!
Ke Jie knew it was AlphaGo because they told him but asked him not to tell anyone. Considering this, it seems Google would have said it was AlphaGo even if it had lost a game. It is hard to keep a secret like that.
Teng Weixing reviewed his game on Weiqi TV and he specifically mentioned he had known it was AlphaGo before playing. Someone posted a translation on /r/baduk; it's well worth the watch. That part is around 2 minutes, but the whole video is excellent.
Maybe Nick could ask for confirmation through pro players and clear it up in his next video. My impression is that most, and perhaps all, players knew it was AlphaGo from the start, but maybe it was different for the first few games.
nice intro!
What do you say when they say AlphaGo played it? Can you read the follow-up?
It's important to remember how convolutional neural networks work when talking about AlphaGo, especially since they're such a simple concept. All they are is a series of convolutions that reduces the current board state to 361 outputs, each giving a weighted signal based on the internal structure of the network. AlphaGo just picks the highest-value move.
One thing to remember about this kind of system is that the whole thing is highly deterministic and very parallel. Each convolution and each neuron takes the same amount of work to process any board state. This means you should expect results to take a very consistent amount of time. A picture of the entire board propagates through the system each time it needs to make a move.
The "learning" mechanisms are also extremely interesting... but they're not like human learning. You can think of convolutional neural networks as pattern recognition schemes for computers (which have traditionally been very bad at pattern recognition). "Training" such a mechanism basically involves giving it example after example, and for each of those examples, finding some way to give each of the neurons (including both the 361 output neurons AND the hidden neurons) some value. It may be that this kind of scheme cannot solve go in the sense of "finding God's move". Ultimately, it can only learn from us and from itself.
Another thing to keep in mind is that if it really is "just" a convolutional neural network, it's probably not sitting there trying to formulate strategies. It's just looking at the board and picking the best move. The current state of the board is fed into the system, and once all the output neurons turn on, it makes the move that corresponds to the highest value among them. There may not even be any temporal sense at all! That kind of thing needs to be added deliberately by the programmers, and really, why would they?
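For anyone curious what "picking the highest-value move" looks like mechanically, here is a toy sketch in Python. This is not AlphaGo's architecture (the real policy network is a deep convolutional stack, and the full system also runs a tree search); it only illustrates the output end described above: 361 scores, a softmax, and an argmax. The single linear layer and random weights are stand-in assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a trained policy network: a single linear layer
# mapping a flattened 19x19 board to 361 move scores. AlphaGo's real
# network is far deeper, but the output end works the same way:
# one score per intersection.
W = rng.normal(size=(361, 361))

def pick_move(board):
    """board: 19x19 array with +1/-1/0 for black/white/empty."""
    x = board.flatten().astype(float)
    scores = W @ x                       # 361 raw scores
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()                 # softmax over all intersections
    idx = int(np.argmax(probs))          # greedy: highest-probability move
    return divmod(idx, 19)               # (row, col) of the chosen move

board = np.zeros((19, 19), dtype=int)
board[3, 3] = 1                          # a single black stone
print(pick_move(board))
```

With random weights the chosen point is meaningless, of course; the point is only that the whole "decision" is one forward pass followed by an argmax.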
If any of you want to learn more about this, the Computerphile channel did some excellent (layman-accessible) videos on the topic of kernel convolutions. The first video is about kernel filters, but you need to know about those in order to understand the input end of the neural network (at least... ONE way that it might work).
I've been flagged as spam before, so I'm just going to show the part after youtube dot com:
/playlist?list=PLl6o3lt9y_cCCUBIvbpYanLkdN9BdzFKH
A lot of what you said is correct; however, AG does do some tree searching.
Why do the go board coordinates not have I?
If I write I10, you can't tell if it's a capital 'i' or a lowercase 'L'. Removing one of the two avoids any ambiguity; by convention, 'i' is the one dropped.
me too !
It is quite interesting; to me it looks like AlphaGo actually prefers older playing styles.
Instead of keeping the opponent's corner invadable for later, AlphaGo liked to immediately shoulder-hit or surround the opponent's corner from the outside,
except when it calculated that a corner invasion would be needed based on the rest of the board.
AlphaGo also seems to leave small stone formations and move on to the next place, comparable to older playstyles where you make a lot of smaller bases.
It also likes the large knight enclosures, which were also in favour in the past, while the small knight enclosure is more modern.
For example, if you go back to your previous video about the "oldest game", neither player seemed eager to invade a corner that had a large knight enclosure, and instead they surrounded the outside.
Imagine how eager those old-style players would be to surround a corner with a small knight enclosure, making the corner one space smaller.
Because they used a database of games to train AlphaGo, and probably most of those games were older. It is a good strategy: it was working in the past, it is still working, but it will make the human player think more.
Chongee AlphaGo played itself millions of times, far more than its database of human games. Why would Google feed it older games? They want it to be good.
Does AlphaGo have a style?
Nick skillfully parrying goofy questions should tip you off to his Go level.
AlphaGo's pragmatism is embodied by the go proverb that rich men should not pick fights... The one possible weakness of AlphaGo might be its predictability: if it plays the same move in the same position, a human will be able to prepare a refutation. I suspect therefore that from time to time, with longer time limits, the top pros might be able to defeat AlphaGo, though this will be very rare, and increasingly so.
Someone in your class watches too much Black Mirror my man, hahaha.
I hope you can get back your double-colored stones ;)
they decided that the single color ones look better on camera
They're louder hitting the board, but it's easier to see which player is coming next based on what's in his hand.
starts at 4:02
me tooooooo!
Fun fact: there was a mistake, caused by human error, of inputting a position different from AlphaGo's suggestion. But yeah, AlphaGo still won.
They predicted a lil corona lmao.
What website do you play online go on?
Popular servers are online-go.com and www.gokgs.com
If you mean Nick specifically I think he plays mostly on www.tygemgo.com
AlphaGo is a nuclear missile
If it's been misled and it's winning every game, is it really misled?
its misleading you to think that you misled it
No, it’s just that humans are that much worse. You can make a lot of really bad moves and still beat a toddler. AlphaGo could learn some new things and end up liking pincers again.
Nick you're great. But what is going on with this beard my man? lol
Baby.
Nick's evil twin
It doesn't look bad..
As he said: Baby. It saves precious time especially because the chin area requires more care and thus time.
We can consider that alphago was trained with a "human" style and it develops a strategy against all "human" games. Maybe, if everyone starts learning the alphago way, it will adapt again and play differently.
What I mean by that is maybe there isn't a perfect style to reach.
There is a perfect style to reach since the game has perfect information and doesn't have randomness, so there is a "perfect game" to be played.
Fair enough. It's like computing every combination, which leads to a solved game (like tic-tac-toe).
Yeah any game with deterministic moves and perfect information has an optimal strategy.
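To make "optimal strategy exists" concrete, here is a minimal sketch that actually solves a deterministic perfect-information game by exhaustive search, using single-pile Nim rather than go (go's game tree is astronomically too large for this). The rules assumed (take 1 to 3 stones, whoever takes the last stone wins) are just the standard toy example, nothing to do with AlphaGo's actual methods.

```python
from functools import lru_cache

# Minimax for single-pile Nim: players alternately take 1-3 stones;
# whoever takes the last stone wins. Because the game is deterministic
# with perfect information, every position has a definite value:
# won or lost for the player to move.
@lru_cache(maxsize=None)
def wins(n):
    """True if the player to move wins with perfect play from n stones."""
    # A position is winning iff some move leads to a losing position
    # for the opponent.
    return any(not wins(n - take) for take in (1, 2, 3) if take <= n)

# Multiples of 4 are exactly the lost positions for the player to move.
print([n for n in range(1, 13) if not wins(n)])   # [4, 8, 12]
```

The same argument applies in principle to go (with a superko rule ensuring finiteness); the only obstacle to computing its "perfect game" is the size of the tree.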
You are overestimating the "training" and comparing it to human learning. Machines are not intelligent yet. The training is only for computing the probability of different moves before you compute their actual value; the main computation for evaluation is the Monte Carlo evaluation, just like in every other go program of decent strength. The "learning" part is finite and the least important thing, and has probably already been optimized to the best possible local evaluation and can't improve further.
Who is constantly playing with the stones so loudly? 😕
There's a very simple algorithmic reason behind AlphaGo's tendency not to finish joseki / to play tenuki:
Consider a situation with 2 computers (computer 1 and computer 2, computer 1 to play) and move A being joseki, move B being tenuki. Both computers are building a tree for the situation (I suppose you know the basics of Monte-Carlo Tree Search)
Let's talk about computer 2 (which is waiting for computer 1 move):
Since move A is joseki, it is the most likely to be explored by the algorithm. Let's say 90% of the tree is about move A, and only 1% is evaluating move B (because it's tenuki). If computer 1 plays A, then computer 2 has already thought a lot about the move, since 90% of the tree can be reused (the other 10% is thrown away) => 90% of the thinking time was actually useful. If on the other hand computer 1 chooses to play B, then 99% of the tree turns out to be total garbage for computer 2 => only 1% of the thinking time was useful. Basically, if computer 1 tenukis, then computer 2 has to think about the position from scratch.
Now let's talk about computer 1 (which has to play):
If computer 1 thinks the same as computer 2, there's little to say, because it will build a similar tree. But if computer 1 has a tendency to tenuki in that position, then instead of thinking about move A 90% of the time and move B 1%, maybe it'll be more like 20% for A and 20% for B. That's not so good if you want to play A (because computer 2 has been thinking a lot about that move), but if you choose to play B, you had way more time than your opponent to think about what's next (20% against 1%), resulting in a higher chance of winning the game.
Hence, if AlphaGo plays tenuki, it doesn't mean the move is actually the best, but simply that the move is good enough considering the huge lead in thinking time it gains. Of course, this is only relevant against computers, since human players won't instantly forget everything they thought earlier just because they are surprised.
As a consequence, here's a tip if you want to beat a computer: think for 1 hour about a sequence, then play a forcing/timesuji move before starting the sequence; you will be 1 hour of thinking time ahead.
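The percentages in this argument can be turned into a toy calculation. The numbers below are the illustrative assumptions from the comment above (90%/1% and 20%/20%), not anything measured from AlphaGo:

```python
# Share of each engine's search tree spent on move A (joseki) and
# move B (tenuki) before computer 1 moves. Illustrative numbers only.
computer1 = {"A": 0.20, "B": 0.20}   # computer 1 deliberately explores tenuki
computer2 = {"A": 0.90, "B": 0.01}   # computer 2 expects joseki

def advantage(move):
    """Ratio of useful prior thinking time (computer 1 vs computer 2)
    once `move` is actually played: only the subtree under the played
    move remains valid, the rest of each tree is discarded."""
    return computer1[move] / computer2[move]

print(f"play A: {advantage('A'):.2f}x useful thinking")  # computer 2 is better prepared
print(f"play B: {advantage('B'):.2f}x useful thinking")  # tenuki pays off in thinking time
```

So under these assumptions, tenuki trades a slightly weaker move for a 20x head start in reusable search effort.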
Vincent Richard This makes sense. But do we know that AlphaGo doesn't remember its previous reading in areas of the board that haven't changed significantly?
Very interesting. I think the slack-looking moves are also probably just helping AlphaGo because they reduce uncertainty. Once it kicks a corner approach stone, that cuts out a lot of invasion possibilities, and this lighter load on its forward reading might more than make up for a small loss in future potential.
As with your point, perhaps AlphaGo got into this habit just because it led to slightly better results in its self-play games - not from raw strength but from paring down the future game tree faster than its opponent.
According to DeepMind's Nature publication a year ago, it was not the case at the time, and there are few reasons to think it has changed since then.
If you want to recycle part of the tree (previous reading), you have to find an algorithm that can quickly tell which parts can be saved and which can't. This is a very complex problem, since stones far away may have a huge impact on the orientation of play. The simplest example is friendly stones: when it comes to running or attacking, the presence of a single stone may change the outcome of the game (e.g. a ladder breaker).
Interestingly enough, recycling trees is a field where humans are way more skillful than computers (end of the joke). I'm convinced the deeper reason behind this is the strategic approach humans take to the game. When you play, you first choose what you want to do, a strategy (build a wall, take territory, remove an enemy base,...), then try to find a way to do it, a sequence. Of course, you may reconsider the strategy if no sequence is satisfying enough. This strategic approach helps you cut the board into pieces you can think about separately before sticking them back together. Indeed, once you've chosen a strategy for one of the pieces (e.g. build a wall), the actual outcome (e.g. the exact shape of the wall) is not that relevant for a part of the board far away (another piece). Thus strategy is the glue you can use to stick the piece back onto the rest of the board. A computer doesn't bother with strategy (first because strategy has its own flaws, second and mainly because it's too hard) and thinks directly about sequences. Hence, moves you may consider similar, because they do the same thing (same strategy), are considered totally different from the computer's point of view. Without this magical glue, it's unlikely a computer can recycle parts of the tree in a reasonable amount of time.
Vincent Richard I suggest you research this on YT. Watch and listen carefully to experts on neural nets. Once you get the gist of neural nets, you'll be awed, shocked, and alarmed about the future effect of AI on human society.
Has AlphaGo ever gotten shoulder hit early in the game? I wonder how it would react
Probably many thousands of times in its self-play learning.
May sound a bit small-minded, but we're eight minutes in and there's not a stone on the board.
The part at 19:45 doesn't really make sense to me. On the one hand, alpha go never plays the one space pincer, on the other hand it plays the "connect under" variation of it? That's a choice the pincering player gets to make ?!?
Is it possible that some of AlphaGo's early moves are wrong but it makes up for it in the midgame because of its superior brute-force calculation abilities? AlphaGo doesn't have high-level strategies like human players.
chris r Or those openings might be the most suitable to that particular style of play in some way.
Also, just because it doesn't strategize doesn't mean it can't have tendencies that higher level strategies could be extrapolated from.
It's what we see in AI chess: if it's a tactical opening then it's no problem for the computer, but if it's a long closed position then it needs an opening book to play correct openings.
It really seems that way, Chris, doesn't it? It's almost like it can afford to be haphazard because it will work out ways to just make garbage work later. However, let's look at what a strategy really is, fundamentally. All a strategy is is simply how advantages in resources are to be exploited against vulnerabilities in barriers to reach objectives. That's it. So AlphaGo is strategizing constantly. And, it appears, the moves it makes prove to be of value many moves later. If every move, it's evaluating vulnerabilities and how to apply resources to reach objectives, it's constantly strategizing. What's your evidence that AlphaGo doesn't have high level strategies?
One way to look at it is that for a bot especially a strong neural net bot strategy emerges from near perfect play. In a sense strategy is implicit in the play of a bot that plays near perfection.
I think AlphaGo doesn't just make exchange moves to make shape and take opportunities; it is also trying to make time work against the human opponent. Making "stupid" moves, like tenuki in the middle of a joseki, serves this purpose too. It makes the opponent confused and uncomfortable, whereas AlphaGo is comfortable in those situations. By moving quickly, using up the opponent's time, and keeping the opponent confused, even for just a move or two, a mistake will appear, and in such situations AlphaGo will know it was a mistake, because it's comfortable there. Plus it makes long jumps because either they work and it wins points, or some stones get sacrificed and it wins points elsewhere; all that thinking makes the opponent tired, but the machine doesn't care, and as the game goes on things only get better for the computer, since there are fewer moves to choose from. I think the AI was programmed to do this: it was designed to play against human weaknesses and make use of its own strengths. This takes away the beauty of go, and I find it unfair too, especially because AlphaGo could not win against any of these pros if there were no pro game database to feed into these neural networks. It is a very good job by Google, and I'm not saying AlphaGo is not strong, but it's also unfair of Google to do deep learning on pro players' games and then not let them think deeply in return, especially when they didn't know who they were playing (unfair even if they did know).
One thing we can definitely learn from this is to pay attention to what kind of data we are producing and letting companies like Google or Facebook have, because powerful tools like machine learning only need data, and these companies seem to play unfair. So I'm careful, but the problem is that even if they don't get my data, they still have enough from others.
Chongee It learned by playing games against itself millions of times. It's not playing mind games. It thinks a human opponent will play the same as it does. It’s not managing the game clock like a football coach lol.
Jesus, I kept having to skip ahead in the video whenever that annoying slow-talking guy would start droning on with stupid ideas. Some of the best commentary about AlphaGo though (from Nick).
The audience talks wayyyyyyy too much.
It's a class. They're students. Discussing. Learning.
then the video should be called Go discussion.
It would be interesting to see AlphaGo playing against AlphaGo... Makes me wonder how many games out of a hundred would end with white winning. Just a playful thought.