Cassava Leaf Disease Classification — Final Models

Huaqi Nie
6 min read · Mar 21, 2021

Work conducted by: Enmin Zhou, Yangyin Ke, Huaqi Nie

Introduction

When people turn to AI for answers, no single algorithm is guaranteed to outperform all others. In our previous blog, we discussed our initial implementations on the Cassava Leaf Disease Classification dataset using three different methods at different levels of abstraction.

AutoML is Google AI Platform's automated machine learning service, which searches for an optimal model architecture; EfficientNet combines a uniform compound-scaling method with an AutoML-found baseline to improve both training speed and accuracy; ResNet reformulates layers to learn residual functions, which makes the underlying mappings behind the data easier to learn because pushing a residual toward zero is easier than fitting the mapping directly.

This blog presents the final versions of our three models and then evaluates their performance with different metrics.

AutoML

As discussed in our last blog, we trained an AutoML model on 1,000 images and reached an accuracy of around 0.74. This time we train on the whole dataset of 21,397 images. This is a little tricky because the Chrome browser interface struggles to upload 21,397 files at once. Instead, we use "gsutil", a Google Cloud SDK tool, to upload our dataset from Kaggle to AutoML with the correct authentication, project ID, and storage bucket ID. We also generate a single-label metadata CSV file for the dataset. AutoML automatically splits the data into train, validation, and test sets with a ratio of 0.8 / 0.1 / 0.1, and performs basic data augmentation for the image classification task. The only hyper-parameter we need to set is the trade-off between precision and recall.
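
A minimal sketch of this upload and metadata step is shown below; the bucket name and local paths are placeholders, not our actual project values, and the CSV layout follows the single-label "gcs_uri,label" form AutoML Vision accepts.

# Shell step (Google Cloud SDK): copy the Kaggle images to a storage bucket
#   gsutil -m cp -r train_images gs://your-bucket/cassava/
import pandas as pd

# Kaggle's train.csv lists an image_id and a label for each image
train = pd.read_csv("train.csv")
metadata = pd.DataFrame({
    "gcs_uri": "gs://your-bucket/cassava/" + train["image_id"],
    "label": train["label"],
})
# One "<gcs_path>,<label>" row per image, no header
metadata.to_csv("automl_metadata.csv", header=False, index=False)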

Model Architecture:

Like the previous model we used, AutoML chooses residual blocks of three CNN layers (Conv2D → DepthwiseConv2D → Conv2D). The structure is triangular: the number of filters increases as the model goes deeper. The model ends with a global average pooling layer and a dense layer of 5 units activated by a softmax function.
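
A rough Keras sketch of one such block, just to illustrate the Conv2D → DepthwiseConv2D → Conv2D pattern with a skip connection; the filter counts and kernel sizes are illustrative, not the exact values AutoML chose.

from tensorflow.keras import layers

def residual_block(x, filters):
    # Conv2D -> DepthwiseConv2D -> Conv2D, with a skip connection around the block
    shortcut = x
    y = layers.Conv2D(filters, 1, padding="same", activation="relu")(x)
    y = layers.DepthwiseConv2D(3, padding="same", activation="relu")(y)
    y = layers.Conv2D(filters, 1, padding="same")(y)
    if shortcut.shape[-1] != filters:
        # Match channel counts before adding the shortcut back in
        shortcut = layers.Conv2D(filters, 1, padding="same")(shortcut)
    return layers.Add()([shortcut, y])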

You can see the visualization of the model structure here: https://drive.google.com/file/d/1v720jpRx0kX0BoRkLJeeV_WBGhcvibMY/view?usp=sharing

Evaluation:

Overall, the model reaches 87.43% precision and 80.75% recall. Its accuracy is 0.9015, with an accuracy of 0.527 on the smallest class and 0.98 on the largest class.

AutoML Evaluation

The model is exported in TensorFlow Lite format, which strips out loss and optimizer information since they are not needed for inference. The input is 224×224×3 in tf.uint8, which helps shrink the weight parameters and keeps the model small. The input also needs to be quantized, re-scaling the original RGB values to tf.uint8. The output is a 5-number array, and we take the argmax of that array as the predicted label.
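
A sketch of how inference with the exported TFLite model can look; the file name is a placeholder, and in practice the quantization scale and zero point are read from the interpreter's input details rather than hard-coded.

import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model.tflite")  # placeholder path
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()[0]
output_details = interpreter.get_output_details()[0]

# Replace with a real 224x224 RGB image (values in [0, 255])
image = np.zeros((224, 224, 3), dtype=np.float32)

# Quantize the RGB values to tf.uint8 using the model's own parameters
scale, zero_point = input_details["quantization"]
quantized = (image / scale + zero_point).astype(np.uint8)

interpreter.set_tensor(input_details["index"], quantized[np.newaxis, ...])
interpreter.invoke()
scores = interpreter.get_tensor(output_details["index"])[0]  # 5-number array
prediction = int(np.argmax(scores))                          # predicted label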

EfficientNet

We tried both EfficientNetB3 and EfficientNetB7 during training. The difference between them is the depth of the network: EfficientNetB7 has more parameters and a more complex architecture, while EfficientNetB3, with fewer parameters, has the advantage of shorter training time.

Model Architecture:

The final version of EfficientNet uses EfficientNetB7 with ImageNet-pretrained weights as the base model. After removing the top layers, we flatten the output, add a dropout layer, and then add several fully connected layers interleaved with dropout layers to prevent overfitting and produce a more generalized model. A sketch of this setup follows the architecture figure below.

The Model Architecture
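
A hedged Keras sketch of this setup; the input size, layer widths, and dropout rates are illustrative rather than our exact configuration.

import tensorflow as tf
from tensorflow.keras import layers, models

# EfficientNetB7 base with ImageNet weights and the top layers removed
base = tf.keras.applications.EfficientNetB7(
    include_top=False, weights="imagenet", input_shape=(300, 300, 3))

model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dropout(0.4),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(5, activation="softmax"),   # one unit per disease class
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="sparse_categorical_crossentropy",
              metrics=["sparse_categorical_accuracy"])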

We set the batch size to 16 times the number of replicas, so that the effective batch size scales with the available accelerators. The initial learning rate is set to 0.001 so the model learns more stably.

We also add several callbacks to control the training process. Early stopping halts training when the validation loss has not improved for 2 epochs (patience); a checkpoint callback saves the current best model; and reduce_lr lowers the learning rate according to the configured schedule.
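
These callbacks correspond roughly to the following Keras setup; the checkpoint file name, reduction factor, and reduce_lr patience are illustrative.

from tensorflow.keras import callbacks

early_stopping = callbacks.EarlyStopping(monitor="val_loss", patience=2,
                                         restore_best_weights=True)
checkpoint = callbacks.ModelCheckpoint("best_model.h5", monitor="val_loss",
                                       save_best_only=True)
reduce_lr = callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=1)

# history = model.fit(train_ds, validation_data=val_ds, epochs=20,
#                     callbacks=[early_stopping, checkpoint, reduce_lr])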

Evaluation:

The final results are shown in the plots. As more epochs are run, the test-set accuracy and loss follow the trend of the training set, and both the accuracy and the loss converge after about 5 iterations. This indicates that the model finds a good optimum and converges well during training.

The loss plot and the accuracy plot.

ResNet

Hyperparameter selection: The data loading process and model architecture here follow Jonny Lee's notebook, which provides a clear tutorial on decoding TFRecord files to extract images and feeding them into ResNet transfer learning on a TPU. A sketch of that decoding step is given below.
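
The sketch assumes the competition TFRecords store a JPEG byte string under "image" and an integer label under "target"; treat the feature keys, file glob, and image size as assumptions.

import tensorflow as tf

def decode_example(serialized):
    features = {
        "image": tf.io.FixedLenFeature([], tf.string),   # JPEG bytes
        "target": tf.io.FixedLenFeature([], tf.int64),   # class label 0-4
    }
    example = tf.io.parse_single_example(serialized, features)
    image = tf.io.decode_jpeg(example["image"], channels=3)
    image = tf.image.resize(image, [512, 512]) / 255.0
    return image, example["target"]

dataset = (tf.data.TFRecordDataset(tf.io.gfile.glob("train_tfrecords/*.tfrec"))
           .map(decode_example, num_parallel_calls=tf.data.AUTOTUNE)
           .batch(16)
           .prefetch(tf.data.AUTOTUNE))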

Model Architecture:

This model is built with transfer learning, using ResNet50 as the base model. The base layers are not frozen during training, so the model can better fit the dataset used in this project. The base model is followed by a flatten layer, whose outputs are fed into two dense layers with ReLU activation. Both fully connected layers come with dropout and batch normalization to prevent overfitting. Finally, the output dense layer has 5 units, one per class, activated by a softmax function. Adam is used as the optimizer, with an exponentially decaying learning rate to approach the minimum efficiently. The loss function is sparse_categorical_crossentropy because the target variable is not one-hot encoded, so the tracked metric is sparse_categorical_accuracy.
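
A condensed sketch of this architecture and optimizer setup; the unit counts, dropout rates, input size, and decay schedule values are illustrative.

import tensorflow as tf
from tensorflow.keras import layers, models

base = tf.keras.applications.ResNet50(include_top=False, weights="imagenet",
                                       input_shape=(512, 512, 3))
base.trainable = True  # base layers are not frozen

model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(512, activation="relu"),
    layers.BatchNormalization(),
    layers.Dropout(0.3),
    layers.Dense(128, activation="relu"),
    layers.BatchNormalization(),
    layers.Dropout(0.3),
    layers.Dense(5, activation="softmax"),   # one unit per class
])

lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-3, decay_steps=1000, decay_rate=0.9)
model.compile(optimizer=tf.keras.optimizers.Adam(lr_schedule),
              loss="sparse_categorical_crossentropy",
              metrics=["sparse_categorical_accuracy"])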

Evaluation:

From the loss plot, we can see that both training and validation loss keep decreasing in later epochs, converging toward the minimum of the loss function. The accuracy plot below shows that as more epochs are run, both training and validation accuracy increase. After 25 epochs the validation accuracy is close to 80%, which suggests the model architecture and training process fit this problem well.

The Loss Plot and Accuracy Plot

Kaggle Entry:

The result of EfficientNet on Kaggle
The result of ResNet on Kaggle

Possible next steps

AutoML reaches an accuracy of 0.9 while ResNet and EfficientNet do not, because it trains a residual neural network from scratch on our dataset, so the model parameters fit this leaf-disease dataset very well. In the future, we could try training parts of ResNet and EfficientNet from scratch to see whether that improves performance. We could also add anchor boxes to our model for disease or leaf localization: a detection box would help the model focus on the essential part of the image, which should in theory provide more accurate predictions.
