Training not on MNIST

Yaroslav Bulatov made a dataset of glyphs available that he called notMNIST. I briefly ran my implementation of deep networks on the dataset and got an error rate of about 3.8% on the test set. On the inverted data (binary pixel inversion) I got a similar error of about 4%. (Inverted MNIST is more difficult than the original version, at least when one does not take precautions; see, for example, a nice recent report on MNIST and binary-binary RBMs.)
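
For reference, a minimal sketch of what I mean by binary pixel inversion, assuming the images are stored as arrays with values in [0, 1]; the function name is an illustrative choice of mine, not part of the dataset tools:

    import numpy as np

    def invert(images):
        # Swap foreground and background: a pixel value p becomes 1 - p.
        return 1.0 - np.asarray(images, dtype=float)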

The network is a 784-1024-1024-1024-1024-10 neural network. All layers except the last are pretrained as binary-binary RBMs: 50 epochs of minibatch gradient descent with a learning rate of 1e-3, with the RBM gradients estimated via CD-1. Afterwards, the deep network is finetuned for 500 epochs with a learning rate of 2.5e-3, again with minibatch gradient descent. The input is not preprocessed.
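
To make the pretraining step concrete, here is a minimal numpy sketch of a CD-1 update for a single binary-binary RBM layer. All names (cd1_update, sample_bernoulli), the weight initialization, and the batch handling are illustrative assumptions, not taken from my actual implementation.

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def sample_bernoulli(p):
        # Draw binary samples, each 1 with probability p.
        return (rng.random(p.shape) < p).astype(p.dtype)

    def cd1_update(W, b_vis, b_hid, v0, lr=1e-3):
        # One CD-1 minibatch update for a binary-binary RBM.
        # v0: (batch, n_vis) array of binary visible data.

        # Positive phase: hidden probabilities (and samples) given the data.
        h0_prob = sigmoid(v0 @ W + b_hid)
        h0 = sample_bernoulli(h0_prob)

        # Negative phase: one Gibbs step, i.e. reconstruct and re-infer.
        v1_prob = sigmoid(h0 @ W.T + b_vis)
        v1 = sample_bernoulli(v1_prob)
        h1_prob = sigmoid(v1 @ W + b_hid)

        # Approximate gradient: data statistics minus reconstruction statistics.
        batch = v0.shape[0]
        dW = (v0.T @ h0_prob - v1.T @ h1_prob) / batch
        db_vis = (v0 - v1).mean(axis=0)
        db_hid = (h0_prob - h1_prob).mean(axis=0)

        return W + lr * dW, b_vis + lr * db_vis, b_hid + lr * db_hid

    # Illustrative usage for the first layer (784 visible, 1024 hidden units):
    W = 0.01 * rng.standard_normal((784, 1024))
    b_vis = np.zeros(784)
    b_hid = np.zeros(1024)
    # For 50 epochs, loop over minibatches v0 of binarized images:
    #     W, b_vis, b_hid = cd1_update(W, b_vis, b_hid, v0, lr=1e-3)

The layers are trained this way one after the other, the hidden activations of one RBM serving as the data for the next, before the full stack plus the 10-way output layer is finetuned with backpropagation.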

Important note: I didn't use a validation set to tune the learning rates or the other hyperparameters. I chose the learning rate for the finetuning part so that progress on the training set is smooth. Stopping after 500 epochs is completely arbitrary; I simply didn't want to wait longer. If time permits, I will do a more thorough set of experiments using a validation set to determine the hyperparameters.