Neural net architecture is not a playground for those who demand instant gratification. The endless trials with slight variations of one parameter or another (made ever so much worse when you don’t take rigorous notes, head slap) provide feedback only when they are good and ready. It is much like watching water boil. So while we’re waiting, let’s break out Python’s cProfile and see what is going on under the hood, just to pass the time, of course.
That’s a lot of time to be spending in the binomial method. Now something suddenly makes sense. Several weeks ago, in early testing on AWS, there seemed to be little difference between training on a CPU-optimized instance and a GPU-optimized one. That in itself was not too surprising, as there isn’t much in a neural net that lends itself to parallelization: the backpropagation at any given point depends entirely on every calculation that comes before it.
However, in a recent fit of watching the water boil, I splurged and gave the GPU another go, this time with an architecture identical to one running on the CPU-optimized instance. And magically … a 10-fold increase! What changed? Aha! During my first look at GPUs I had neuron dropout (the regularization I use to prevent overfitting) turned off. But since then I had gone back to dropout experiments, and hence had that regularization turned on.
So what does binomial have to do with all this? It is the function that creates the random mask over the neurons, turning some of them off for a particular pass. This, however, is something that can be parallelized, so of course the GPU will be making hay with it. Now I just need to decide whether dropout is worth that much time. Heh, I guess that’s what bigger computers are for.
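To make the mask idea concrete, here is a minimal sketch in numpy. The function name and keep probability are illustrative, not from the original code, and modern numpy's `Generator.binomial` stands in for whatever binomial call the profiler flagged.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_mask(shape, keep_prob=0.8, rng=rng):
    # One Bernoulli draw per neuron: binomial with n=1 keeps each
    # neuron with probability keep_prob. This per-neuron sampling is
    # the work that shows up under "binomial" in the profile.
    return rng.binomial(n=1, p=keep_prob, size=shape)

activations = np.ones((4, 5))
mask = dropout_mask(activations.shape, keep_prob=0.8)
dropped = activations * mask  # masked neurons contribute zero this pass
```

Because every neuron's draw is independent, the whole mask can be generated in parallel, which is exactly the kind of work a GPU eats up.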
Thanks Hobson for the insight on this one.