Tidepool

Plumbing Deep Learning in the Shallows

Smaller and Smaller and Troublesome

• Machine Learning, Neural Net, Python, MNIST, and Scikit-Image

A brand new rabbit hole of something to learn. And I’ll wisely choose to side-step this one for the moment, but it is interesting. How do you hold on to as much information as possible while downsizing an image to a size you want to work with?

Handwritten Digits

As part of a larger experiment with a custom neural net, I put together this implementation of Finnegan, to take a sample of a user’s handwriting (specifically a digit, 0-9) and store it, so I could run it against a pre-trained neural network, with the longer term goal of analyzing what is going on in the internal structures of the neural net. What exactly is it learning? But that is for a different post.

To get the sample, I used Sketch.js and picked a size that seemed reasonable on the screen, something you could easily draw in with the mouse or your finger (on mobile). I started at 194px x 194px, which was relatively easy to work with on the screen but otherwise completely arbitrary. The problem is, the MNIST dataset I used to train the network is made up of images which are greyscale values in a 28 x 28 matrix. Now, 28 x 28 is way too small to draw in, so that wasn't practical. So I turned my attention to resizing.

Scikit-Image is a wonderful library of tools for dealing with images in Python, and the resize method from its transform module fit the bill neatly.

import numpy as np
from skimage.transform import resize

def downsize(img_data, start_size, target_size):
    # Reshape the flat canvas data into a square greyscale matrix, then resize
    img_data = np.array(img_data).reshape(start_size, start_size)
    return resize(img_data, (target_size, target_size))

visualization(downsize(vector, 194, 28))
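
(The visualization call above is just a helper for eyeballing the result, not part of Scikit-Image. A minimal stand-in using matplotlib might look something like this.)

import matplotlib.pyplot as plt

def visualization(img_data):
    # Display a single greyscale digit; assumes img_data is already a 2-D array
    plt.imshow(img_data, cmap='gray')
    plt.axis('off')
    plt.show()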

But the problem quickly asserted itself: information was being lost in translation. A hand-drawn ONE:

Drawn One

And here is what the resized version looked like (left) vs. a sample one from MNIST:

Images of Ones

You can still make out the relation, but all those gaps! That is not doing the neural net any favors. So, steps to fix it: Low-Hanging Fruit = make the pen stroke bigger. Let's go from 4 pixels to 18 pixels wide. That helps. But that alone doesn't get us into the realm of MNIST. The next step: RABBIT HOLE.

So, I started to research interpolation, image sampling, anti-aliasing. All things which were clearly affecting the outcome of the resize. And with other things more pressing, I resorted to some simple trial and error and settled unceremoniously on:

def downsize(img_data, start_size, target_size):
    img_data = np.array(img_data).reshape(start_size, start_size)
    # order=3 -> bicubic interpolation; preserve_range keeps the 0-255 scale
    return resize(img_data, (target_size, target_size), order=3, preserve_range=True)

visualization(downsize(vector, 174, 28))

The ‘order=3’ sets the interpolation to bi-cubic, which builds each new pixel from a weighted average of a 4 x 4 neighborhood of pixels in the larger image (or at least that is my understanding of what is happening). The preserve_range=True did not seem strictly necessary, but it shouldn't hurt: the original canvas element records values from 0 - 255 (actually the alpha channel of the rgba from the canvas), which is the same greyscale range found in the MNIST data, and preserve_range keeps resize from normalizing those values down to 0 - 1.
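
If you want to see what preserve_range is actually doing, a quick check with a made-up 8-bit array (a stand-in for the real canvas data) shows the difference:

import numpy as np
from skimage.transform import resize

img = (np.random.rand(174, 174) * 255).astype(np.uint8)  # fake 8-bit canvas data

scaled = resize(img, (28, 28), order=3)
kept = resize(img, (28, 28), order=3, preserve_range=True)

print(scaled.min(), scaled.max())  # values squeezed into 0.0 - 1.0
print(kept.min(), kept.max())      # values still on the 0 - 255 scale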

As a last note, I also resized the drawing field to 174 x 174, as I found people were generally more comfortable drawing a slightly smaller digit than the original canvas was providing. And since the MNIST digits are mostly regularized to fill a good deal of the 28 x 28 frame, having the input more in line with that was helpful, pending a more robust neural net anyway. An affine-transformed training set is the next step and would solve the user input problem, but I'm not entirely sure this network can learn those kinds of transformations. Need to go to a convolutional net? More work to do, clearly.
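
For the curious, here's a rough sketch of what that kind of affine augmentation might look like, again leaning on Scikit-Image; the rotation, shear, and shift values are placeholders, nothing I've actually tuned:

from skimage.transform import AffineTransform, warp

def augment(img, rotation=0.15, shear=0.1, shift=(1, -1)):
    # Apply a small affine transform (rotation/shear in radians, shift in pixels) to a 28 x 28 digit
    tform = AffineTransform(rotation=rotation, shear=shear, translation=shift)
    return warp(img, tform.inverse, preserve_range=True)

# augmented = [augment(img) for img in training_images]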

Until then, happy learning.
