Image classification is a genuinely hard task. The best-performing methods tend to be Convolutional Neural Networks (CNNs). CNNs pass filters over subsets of the image to extract edges, translate those edges into features, and then, if designed correctly, find patterns within those edge features and link those patterns to the classes you are trying to predict.

Designing an effective CNN is much easier said than done. You have to pick the kernel size of the filters, the number of filters, the types and sizes of the intermediate hidden layers, whether and where to add dropout or max pooling layers, and on and on. You don't need to be a deep learning expert to see that the design space is effectively endless, but even deep learning experts have very little intuition for how changes to a CNN's architecture will affect its performance.

We designed a very basic CNN with one convolutional layer, two dense layers, one max pooling layer, and one dropout layer. We had a sense of the role each of these layers would play in achieving the CNN's goal, but beyond that, the order and the other hyperparameters were more or less arbitrarily chosen.

[Table: CNN layers and parameters]
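For concreteness, here is a minimal Keras sketch of a CNN with that layer inventory. The kernel size, filter count, dense-layer widths, dropout rate, and input shape below are all assumptions on our part; the text above only fixes the types and number of layers.

```python
from tensorflow.keras import layers, models

# Baseline CNN: one convolutional layer, one max pooling layer,
# two dense layers, and one dropout layer. All sizes here are
# illustrative assumptions, not the exact values we used.
model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(224, 224, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(10, activation="softmax"),  # 9 parasite classes + negative
])
```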

The results were poor: accuracy came in a bit below 75%. Worse, training this model for five epochs took nearly four hours, and we had no intuition for which changes would improve performance.

This is where transfer learning comes in. At a high level, transfer learning is simply taking the knowledge gained solving one problem and applying it to a related problem.

ImageNet is a database of over 1 million human-labeled images. Every year, the ImageNet challenge invites teams to train models that correctly classify those images into 1,000 categories, from cats and dogs to cars and planes. Teams around the world spend weeks designing deep neural networks with many layers, experimenting with different combinations of layers and hyperparameters, and many have nearly unlimited computing budgets, which we did not. In recent years, the winning accuracies have approached 97%, which may be better than human performance.

[Figure: ImageNet examples]

Many of the winning architectures are publicly available, and several can be downloaded through TensorFlow and Keras along with their pretrained weights. We took the pretrained weights and architecture of one of the stronger models, VGG16, and added two dense layers to the back end that are trained on our own data. The final layer is shaped to return 10 classes (9 different parasites plus the negative class).
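A minimal Keras sketch of this setup follows. Loading VGG16 with pretrained ImageNet weights and the 10-class output layer come from the description above; freezing the convolutional base, the flattening step, the 256-unit hidden width, and the input shape are assumptions for illustration.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

# Load VGG16 with its ImageNet weights, dropping the original classifier head.
base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # keep the pretrained convolutional weights fixed (assumption)

# Add two trainable dense layers on the back end; the final layer
# returns 10 classes (9 parasites plus the negative class).
model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),  # hidden width is an assumption
    layers.Dense(10, activation="softmax"),
])
```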

You might ask, why would a model trained on cats and dogs work on parasites? The answer is that the convolutional layers and weights of VGG16 were trained to efficiently detect edges and find patterns within those edges. That process is not unique to the cats and dogs of ImageNet.

Indeed, the results were extremely good. Accuracy with the VGG16 model exceeds 97%, even under cross-validation. Without transfer learning, we would likely have been stuck indefinitely, trying endless combinations of hyperparameters on our original ineffective CNN. Transfer learning gave us the structure and baseline we needed to focus our efforts on more productive tasks. There are still hyperparameters to tweak (optimizers, learning rate, additional back-end layers, etc.), as sketched below, but the scale of those problems is far more manageable than our original task of designing a CNN from scratch. Transfer learning helped get ParasiteID off the ground.
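As one hedged example of what that remaining tuning looks like in Keras, reusing `model` from the sketch above: the optimizer choice and learning rate here are placeholders, not the values we settled on.

```python
import tensorflow as tf

# Swapping optimizers or adjusting the learning rate is a one-line change;
# the specific choices below are illustrative only.
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
```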

Nathaniel Schub