Cassava Leaf Disease Classification — TFRecord and Updated Models

Huaqi Nie
Mar 16, 2021

Work conducted by: Enmin Zhou, Yangyin Ke, Huaqi Nie

Introduction

In the previous blog, we went through the EDA and set up a baseline model for the Cassava Leaf Disease Classification dataset. This blog explores the use of TFRecord files together with three updated models: AutoML, EfficientNet, and ResNet.

Load data from TFRecord files

TFRecord is a format designed for storing data for TensorFlow; it allows us to read and write quickly through the tf.data API. We use this format to improve our data loading efficiency.

For this Cassava Leaf Disease dataset, each TFRecord file packs multiple images and their corresponding labels into one file. Instead of reading more than 20,000 individual images, we only need to read 16 TFRecord files for the train set and one file for the test set.

16 files in the train set were split into train and validation sets.

To load data from TFRecord files, we need to specify the features for the image data and the corresponding labels. Both are stored as serialized byte strings, so the images must be decoded. The original image size in the TFRecord files is 512×512; we resize each image to 128×128 for model training. A minimal loading sketch follows the figures below.

Image shape after decoding and resizing.
Visualizing the resized images from a TFRecord file, each shown with its corresponding label.
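Below is a minimal sketch of how such a loading pipeline can be built with the tf.data API. The feature keys ("image" and "target"), the glob pattern, and the batch size are assumptions for illustration and may differ from the actual competition files.

```python
import tensorflow as tf

# Assumed feature schema of the TFRecord files: a JPEG byte string and an integer label.
feature_description = {
    "image": tf.io.FixedLenFeature([], tf.string),
    "target": tf.io.FixedLenFeature([], tf.int64),
}

def parse_example(serialized_example):
    # Parse one record, decode the JPEG, and resize to 128x128.
    example = tf.io.parse_single_example(serialized_example, feature_description)
    image = tf.io.decode_jpeg(example["image"], channels=3)
    image = tf.image.resize(image, [128, 128]) / 255.0  # rescale to [0, 1]
    label = tf.cast(example["target"], tf.int32)
    return image, label

# Hypothetical file layout; point this at the competition's train TFRecords.
train_files = tf.io.gfile.glob("train_tfrecords/*.tfrec")

train_ds = (
    tf.data.TFRecordDataset(train_files, num_parallel_reads=tf.data.AUTOTUNE)
    .map(parse_example, num_parallel_calls=tf.data.AUTOTUNE)
    .shuffle(2048)
    .batch(32)
    .prefetch(tf.data.AUTOTUNE)
)
```

The resulting dataset can be passed directly to model.fit, and the same parsing function can be reused for the validation files.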

AutoML

AutoML is an automated machine learning service on the Google AI Platform. It builds on Google Cloud Storage, Google Cloud services, and Google's machine learning technologies. Its capability has been demonstrated in Kaggle competitions, where human teams needed around three days to reach the performance AutoML achieved in only a few hours.

For the Kaggle Cassava Leaf Disease competition, we use AutoML Vision, since this is an image classification task.

Set Up: To use AutoML, we have to set up a Google AI Platform account, which initially provides $300 in free credits for cloud services. IAM setup is required for better management of your projects.

The project ID of our task is “DPU-2040”

Initialize: Then we need to create a Cloud Storage bucket and upload our dataset labels and images. Since AutoML Vision requires the bucket to be in the us-central1 region with the Standard storage class, we do not choose other regions for cloud storage.

The three folders we make for dataset labels, images and AutoML exported models

AutoML dataset labels must follow specified formats depending on the task, including single-label, multi-label, unlabeled, and label-set formats. Our task is single-label image classification, so the format we should follow is:

gs://my-storage-bucket-vcm/flowers/images/img100.jpg,daisy

The URL before the comma is the location of an image in Cloud Storage (you can find it by clicking the image in Cloud Storage!), and the content after the comma is the label. Once we have a well-formatted CSV file of dataset labels, we upload it to Cloud Storage for training.
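For illustration, here is a small sketch of how such a label CSV could be generated in Python. The bucket path, file names, and labels below are hypothetical placeholders; substitute your own bucket and label mapping.

```python
import csv

# Hypothetical bucket holding the uploaded images.
BUCKET = "gs://my-cassava-bucket/images"

# Hypothetical (filename, label) pairs; in practice these come from the
# competition's train.csv mapping.
image_labels = [
    ("img001.jpg", "CBB"),
    ("img002.jpg", "Healthy"),
]

# Write one "gs://path/to/image.jpg,label" row per image, as AutoML Vision expects.
with open("automl_labels.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for filename, label in image_labels:
        writer.writerow([f"{BUCKET}/{filename}", label])
```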

AutoML Vision: Now, in AutoML Vision, we only need to import the dataset label CSV file; since it specifies the Cloud Storage address of each image, AutoML imports the images automatically.

On this page, we can see every image and its label, as well as statistics of the dataset. You can also label images manually if you want.

Training Model: In the training step, there are different trade-offs to choose from. Here, we optimize for the best balance between precision and recall. Then you can set your expected training hours and start training! Google usually suggests a recommended training budget.

Our model's best trade-off between precision and recall is at a precision of 0.74 and a recall of 0.65.

Evaluation: The platform also shows the evaluation of the model. We can reach a precision of 0.91 if recall is not taken into account. The confusion matrix below summarizes the predictions.

The platform shows the precision-recall curve and the confusion matrix.

Export Model: After AutoML finishes training, we can deploy the model online for predictions via the provided API, or export it to a local machine. AutoML provides 5 export options; the Container option is designed for a Python environment.

5 options to export trained model.

We export both the TF Lite file and the container file to obtain the model architecture and the TensorFlow graph. The following graphs show only part of the architecture, with hyperparameters selected automatically by AutoML. You can find the full graphs in our Google Drive. A sketch of running the exported TF Lite model locally follows the graphs below.

Part of the Model Architecture.
Part of the Model Computational Graph.
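As a reference, here is a minimal sketch of loading an exported TF Lite model for local inference. The file name "model.tflite" and the dummy input are placeholders; the actual input shape and dtype should be read from the interpreter as shown.

```python
import numpy as np
import tensorflow as tf

# Load the exported TF Lite model (hypothetical file name).
interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Build a dummy input matching the model's expected shape and dtype.
input_shape = input_details[0]["shape"]          # e.g. [1, H, W, 3]
dummy_image = np.zeros(input_shape, dtype=input_details[0]["dtype"])

interpreter.set_tensor(input_details[0]["index"], dummy_image)
interpreter.invoke()

# Class probabilities for the 5 disease classes.
probabilities = interpreter.get_tensor(output_details[0]["index"])
print("Predicted class:", int(np.argmax(probabilities)))
```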

EfficientNet

EfficientNet is a method based on a uniform compound scaling approach combined with AutoML, which greatly improves training speed and accuracy.

“Unlike conventional approaches that arbitrarily scale network dimensions, such as width, depth and resolution, our method uniformly scales each dimension with a fixed set of scaling coefficients.”

The image is from the Google AI blog

Hyperparameter selection: Our EfficientNet is built on the open-source EfficientNetB3 in TensorFlow, with ImageNet pretrained weights and average pooling. The layers use ReLU/LeakyReLU activation functions to avoid vanishing gradients. After removing the top of EfficientNetB3, we add several fully connected layers on top of its pooled output. Dropout rates are all set to 0.25, and L2 regularization and batch normalization are added to prevent overfitting. The final layer contains 5 nodes with softmax activation, corresponding to the 5 classes. The model is compiled with Adam as the optimizer and sparse_categorical_accuracy as the metric. A sketch of this setup is shown below.
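Below is a minimal Keras sketch of this setup, assuming EfficientNetB3 from tf.keras.applications. The sizes of the fully connected head and the regularization strength are assumptions, not the exact values we used.

```python
import tensorflow as tf
from tensorflow.keras import layers, models, regularizers

# EfficientNetB3 backbone with ImageNet weights, no classification head,
# and global average pooling on the final feature map.
base = tf.keras.applications.EfficientNetB3(
    include_top=False,
    weights="imagenet",
    input_shape=(128, 128, 3),
    pooling="avg",
)

model = models.Sequential([
    base,
    # Custom fully connected head (layer sizes are assumptions).
    layers.Dense(256, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4)),
    layers.BatchNormalization(),
    layers.Dropout(0.25),
    layers.Dense(64, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4)),
    layers.BatchNormalization(),
    layers.Dropout(0.25),
    layers.Dense(5, activation="softmax"),  # 5 disease classes
])

model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["sparse_categorical_accuracy"],
)
```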

Result: The best accuracy score is 0.7504.

Evaluation: We only trained for 5 epochs due to limited computational power and RAM, but the model performance already reaches the level of AutoML. The losses for both the training and validation sets keep decreasing and the accuracy scores keep increasing, so we expect that, with more epochs, the model could provide better results in future experiments.

Loss plot and accuracy plot from TensorBoard

ResNet

Sometimes a deeper neural network does not make the model perform better. The aim of ResNet is to solve this network degradation problem, letting the model learn the underlying function more easily and update its parameters effectively.

Hyperparameter selection: This model is built with transfer learning, using ResNet50 as the base model. The output of the base model is fed into a global average pooling layer to make the feature maps smaller and more manageable; pooling also makes the model more robust to small shifts in the position of local features. Finally, the pooled output is passed to an output layer with 5 units and softmax activation. Adam is used as the optimizer, with an exponentially decaying learning rate to approach the minimum efficiently. The loss function is sparse_categorical_crossentropy, since the labels are integer-encoded rather than one-hot (dummy) encoded, and the tracked metric is sparse_categorical_accuracy. A sketch of this setup is shown below.
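A minimal sketch of this setup, assuming ResNet50 from tf.keras.applications; the learning-rate schedule values are assumptions rather than our exact settings.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# ResNet50 backbone with ImageNet weights and no classification head.
base = tf.keras.applications.ResNet50(
    include_top=False,
    weights="imagenet",
    input_shape=(128, 128, 3),
)

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),        # shrink feature maps, tolerate small shifts
    layers.Dense(5, activation="softmax"),  # one unit per class
])

# Exponentially decaying learning rate (values are assumptions).
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-3,
    decay_steps=1000,
    decay_rate=0.9,
)

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=lr_schedule),
    loss="sparse_categorical_crossentropy",
    metrics=["sparse_categorical_accuracy"],
)
```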

Result: The best accuracy score is 0.8717.

Evaluation: From the plots you can see that, as more epochs are involved in training, the test set accuracy and loss follow the trend of the training set, and both the accuracy and the loss tend to converge after 10 epochs. This indicates that the model finds the optimum and converges well during training.

Kaggle Entry

We submitted our best model result to Kaggle.

The result of ResNet on Kaggle.

Next Steps

We will keep trying to improve the performance of our three models.

Implement AutoML: Now that we have the best result and the exported architecture from AutoML, we will further modify the model structure based on this architecture to improve our model.

Improve Model: For EfficientNet, we need to increase the number of training epochs to find the best model. Also, redesigning the custom top layers to find a better architecture is needed for both EfficientNet and ResNet.
