ObjectNet

ObjectNet Challenge Documentation


Challenge Portal

Visit the ObjectNet Challenge Portal to register for the challenge.

ObjectNet Support

If you are experiencing a problem or just have a general question, see ObjectNet Support.

ObjectNet Challenge:

Creating your Docker image from the TensorFlow template

These instructions describe how to build a docker image using the TensorFlow deep learning framework for the ObjectNet Challenge. They assume you already have a pre-trained TensorFlow model which you intend to submit for evaluation to the ObjectNet Challenge.

If your model is built using a different framework, go to the relevant documentation:

These instructions are split into two sections:

Section 1: ObjectNet competition example model and code

The following section provides example code and a baseline model for the ObjectNet Challenge. The code is structured such that most existing TensorFlow models can be plugged into the example with minimal code changes necessary.

Note: The example code uses batching and parallel data loading to improve inference efficiency. If you are building your own customized docker image with your own code it is highly recommended to use similar optimized inferencing techniques to ensure your submission will complete within the time limit set by the challenge organisers.
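To make the batching idea concrete, here is a minimal plain-Python sketch of splitting a list of image paths into fixed-size batches. It is not the template's data loader; the function name `batched` and the file names are hypothetical and used only for illustration.

```python
from typing import Iterator, List, Sequence


def batched(paths: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `paths`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(paths), batch_size):
        yield list(paths[start:start + batch_size])


# Hypothetical file names, used only for illustration.
image_paths = [f"img_{i:03d}.png" for i in range(10)]
batches = list(batched(image_paths, batch_size=4))
print([len(b) for b in batches])  # [4, 4, 2]
```

Each batch would then be passed to the model in a single call, so the per-call overhead is paid once per batch rather than once per image.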

1.1 Requirements

The following libraries are required to run this example and must be installed on the local test machine. The same libraries will be automatically installed into the Docker image when the image is built.

For example, you could set up a conda environment with the necessary requirements with a few simple lines. This environment would be named objectnet_env.

  conda create -n objectnet_env python=3.7 cudatoolkit=11.0
  conda activate objectnet_env
  pip install --upgrade pip
  pip install tensorflow pillow

Alternatively, you can follow the instructions here to start running TensorFlow in a docker image.
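Whichever route you take, a quick way to confirm that the required libraries are visible to the active Python environment is sketched below. The helper name `check_modules` is hypothetical; the only assumption is that the pip packages `tensorflow` and `pillow` are imported as `tensorflow` and `PIL` respectively.

```python
import importlib.util
from typing import Dict, Iterable


def check_modules(names: Iterable[str]) -> Dict[str, bool]:
    """Map each top-level module name to whether it can be located."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# The pip package "pillow" is imported under the module name "PIL".
status = check_modules(["tensorflow", "PIL"])
for name, found in status.items():
    print(f"{name}: {'found' if found else 'MISSING'}")
```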

1.2 Install NVIDIA drivers

If your local machine has NVIDIA-capable GPUs and you want to test your docker image locally using these GPUs then you will need to ensure the NVIDIA drivers have been installed on your test machine.

Instructions on how to install CUDA toolkit and NVIDIA drivers can be found here, as well as instructions for cuDNN here. Be sure to match the versions of CUDA/NVIDIA installed with the version of TensorFlow and CUDA used to build your docker image - see Building the docker image.
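As a rough check that the drivers are in place, the sketch below looks for the `nvidia-smi` utility (which ships with the NVIDIA driver) and prints its report. The helper name `nvidia_smi_report` is hypothetical, and a missing `nvidia-smi` is only a hint, not proof, that the drivers are absent.

```python
import shutil
import subprocess
from typing import Optional


def nvidia_smi_report() -> Optional[str]:
    """Return nvidia-smi's output, or None if the tool is missing or fails."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return None
    result = subprocess.run([exe], capture_output=True, text=True)
    return result.stdout if result.returncode == 0 else None


report = nvidia_smi_report()
if report is None:
    print("nvidia-smi not available; the NVIDIA drivers may not be installed")
else:
    print(report)
```

The driver and CUDA versions shown in the report header are the ones to compare against the TensorFlow version you plan to build with.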

1.3 Clone git repository containing example

Clone the following git repo to a machine which has docker installed:

$ git clone https://github.com/abarbu/objectnet-template-tensorflow.git

This repo contains python scripts to perform batch inference using a sample model and to validate and score the inferences. It also contains a set of test images (input/images) and a file containing ground truth data for those images (input/answers/answers-test.json). You will need to download the sample model (ResNet50) used in this example (see 1.6 Testing the example).

1.4 Running objectnet_eval.py

objectnet_eval.py is the main entry point for running this example; it essentially performs batch inference against all images in a supplied input directory (images-dir). Full help is available using objectnet_eval.py --help:

usage: objectnet_eval.py [-h] [--gpus N] [--workers N] [--batch_size N]
  [--softmax T/F] [--convert_outputs_mode N]
  images-dir output-file model-class-name
  model-checkpoint

Evaluate a TensorFlow model on ObjectNet images and output predictions to a
CSV file.

positional arguments:
images-dir            path to dataset
output-file           path to predictions output file
model-class-name      model class name in model_description.py
model-checkpoint      path to model checkpoint

optional arguments:
-h, --help            show this help message and exit
--gpus N              number of GPUs to use
--workers N           number of data loading workers (default: total num
 CPUs)
--batch_size N        mini-batch size (default: 64), this is the batch size
 of each GPU on the current node when using Data
 Parallel or Distributed Data Parallel
--softmax T/F         apply a softmax function to network outputs to convert
 output magnitudes to confidence values (default:True)
--convert_outputs_mode N
 0: no conversion of prediction IDs, 1: convert from ImageNet prediction IDs to ObjectNet
 prediction IDs (default:1)

Note: The default values for `workers` and `batch_size` are tuned for this example. Please do not modify these properties when making an ObjectNet submission using the sample code.
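For readers adapting the script, the command-line interface shown in the help text could be declared with `argparse` roughly as follows. This is a sketch reconstructed from the help output, not the code in objectnet_eval.py: the helper names `str2bool` and `build_parser`, the accepted spellings of T/F, the `None` default for `--gpus`, and the restriction of `--convert_outputs_mode` to 0 or 1 are assumptions.

```python
import argparse
import os


def str2bool(value: str) -> bool:
    """Interpret T/F-style strings as booleans (accepted spellings are assumed)."""
    lowered = value.lower()
    if lowered in ("t", "true", "1", "yes"):
        return True
    if lowered in ("f", "false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected T or F, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Evaluate a TensorFlow model on ObjectNet images and "
                    "output predictions to a CSV file.")
    # Positional arguments, in the order given by the usage line.
    parser.add_argument("images", metavar="images-dir", help="path to dataset")
    parser.add_argument("output_file", metavar="output-file",
                        help="path to predictions output file")
    parser.add_argument("model_class_name", metavar="model-class-name",
                        help="model class name in model_description.py")
    parser.add_argument("model_checkpoint", metavar="model-checkpoint",
                        help="path to model checkpoint")
    # Optional arguments; the --gpus default is not documented, so None is assumed.
    parser.add_argument("--gpus", type=int, metavar="N", default=None,
                        help="number of GPUs to use")
    parser.add_argument("--workers", type=int, metavar="N",
                        default=os.cpu_count(),
                        help="number of data loading workers")
    parser.add_argument("--batch_size", type=int, metavar="N", default=64,
                        help="mini-batch size")
    parser.add_argument("--softmax", type=str2bool, metavar="T/F", default=True,
                        help="apply a softmax function to network outputs")
    parser.add_argument("--convert_outputs_mode", type=int, metavar="N",
                        choices=(0, 1), default=1,
                        help="0: no conversion, 1: ImageNet to ObjectNet IDs")
    return parser


args = build_parser().parse_args([
    "input/images", "output/predictions.csv", "DemoResNet50",
    "model/resnet50_weights_tf_dim_ordering_tf_kernels.h5", "--softmax", "F",
])
print(args.model_class_name, args.batch_size, args.softmax)  # DemoResNet50 64 False
```

The attribute names (`images`, `output_file`, `model_class_name`, `model_checkpoint`) are chosen to match the keys printed in the `**** params ****` banner of the example runs.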

1.5 Code structure

There follows a description of the code structure used in this repo.

./objectnet_eval.py:

./objectnet_iterator.py:

Inside the model directory (this is the only code that you will have to modify):

./model/model_description.py:

./model/data_transform_description.py:

./input/images:

./input/answers/answers-test.json:

1.6 Testing the example

Before executing the example for the first time you must download the sample model as shown below:

# Download the model:
$ cd objectnet-template-tensorflow
$ mkdir downloads
$ cd downloads
$ wget https://storage.googleapis.com/tensorflow/keras-applications/resnet/resnet50_weights_tf_dim_ordering_tf_kernels.h5
$ cp resnet50_weights_tf_dim_ordering_tf_kernels.h5 ../model
$ cd ..

Note: The downloads/ directory is used to store downloaded models so they only need to be downloaded once. If you want to use a model checkpoint which is in downloads/, make sure to copy it to model/ as shown in the second-to-last line above. This way, model/ holds only the one active model, and downloads/ serves as storage for all models.

Use DemoResNet50 as the model-class-name argument and model/resnet50_weights_tf_dim_ordering_tf_kernels.h5 as the model-checkpoint argument to the objectnet_eval.py script to test the example model:

# Perform batch inference:
$ python3 objectnet_eval.py input/images output/predictions.csv DemoResNet50 model/resnet50_weights_tf_dim_ordering_tf_kernels.h5

**** params ****
images input/images
output_file output/predictions.csv
model_class_name DemoResNet50
model_checkpoint model/resnet50_weights_tf_dim_ordering_tf_kernels.h5
gpus 2
workers 1
batch_size 64
softmax True
convert_outputs_mode 1
****************

Number of devices: 2
Model: "resnet50"

Done. Number of predictions:  10

Results will be written to the predictions.csv file in the output/ directory. Check that the output conforms to the format expected by the ObjectNet Challenge.

1.7 Modifying the code to use your own TensorFlow model

You can plug your own existing TensorFlow model into the template. There are a few considerations to keep in mind, which are listed below.

1.7.1 Requirements

If you want to use a python package that is not included in the default TensorFlow Docker container, then it needs to be listed in the requirements.txt file so that it is 'pip installed' when the docker image is built. Include it as follows:

# This file specifies python dependencies which are to be installed into the Docker image.
# List one library per line (not as a comment)
# e.g.
#numpy
scipy
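To illustrate why a package must not be left commented out, the sketch below lists the lines of a requirements file that would actually be treated as packages. It is a simplified reading of the file (blank lines and lines starting with `#` are skipped; inline comments and pip options are not handled), and the helper name `active_requirements` is hypothetical.

```python
from typing import List


def active_requirements(text: str) -> List[str]:
    """Return the non-blank, non-comment lines of a requirements file."""
    packages = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            packages.append(stripped)
    return packages


example = """\
# This file specifies python dependencies which are to be installed into the Docker image.
# List one library per line (not as a comment)
# e.g.
#numpy
scipy
"""
print(active_requirements(example))  # ['scipy']
```

Here `numpy` is ignored because its line begins with `#`, while `scipy` would be installed.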

1.7.2 Template changes

The only code changes necessary when incorporating your TensorFlow model should be in the model/ directory.

  1. Before downloading your model checkpoint file, remove the existing checkpoint file from the model/ directory. For example:
    $ rm -rf model/resnet50_weights_tf_dim_ordering_tf_kernels.h5
  2. Download your model checkpoint file and copy it into model/. For example:
    $ cp my_model.h5 model/
    
    Note: When your docker image is submitted to the challenge for evaluation, the image will not have internet access and so will not be able to download model checkpoints. For this reason it is essential that your model is included in the built docker image.

  3. Add your model description as a class to model/model_description.py. The class name will be used as the model-class-name argument to objectnet_eval.py.
  4. Amend the parameters in model/data_transform_description.py to match those that your model was trained with.
  5. Test your model's inference using the test images and ground-truth data provided in the objectnet-template-tensorflow repo:
    $ python3 objectnet_eval.py input/images output/predictions.csv MyModel model/my_model.h5
    
      **** params ****
      images input/images
      output_file output/predictions.csv
      model_class_name MyModel
      model_checkpoint model/my_model.h5
      workers 16
      gpus 2
      batch_size 96
      softmax True
      convert_outputs_mode 1
      ****************
    
      Number of devices: 2
      Model: "MyModel"
    
      Done. Number of predictions:  10
    
    Note: If you want to run inference again or with another model, you will first have to delete the predictions output file.
    $ rm output/predictions.csv
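If you drive repeated runs from a script, the same clean-up can be done in Python before each run. This is a small sketch; the helper name `remove_if_present` is hypothetical.

```python
from pathlib import Path


def remove_if_present(path: str) -> bool:
    """Delete `path` if it is an existing file; return True when it was removed."""
    target = Path(path)
    if target.is_file():
        target.unlink()
        return True
    return False


# Safe to call whether or not a previous run left a predictions file behind.
print(remove_if_present("output/predictions.csv"))
```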
    

1.8 Validating the predictions of your model

To ensure that the predictions.csv file is structured according to the ObjectNet Challenge specifications, validate the output using the validate_and_score.py script provided in the objectnet-template-tensorflow repo. Once your model has executed successfully, run the following command to validate your output:

$ python3 validate_and_score.py -a input/answers/answers-test.json -f output/predictions.csv

Note the usage of the -a and -f flags as specified in validate_and_score.py --help below.

usage: validate_and_score.py [-h] --answers ANSWERS --filename FILENAME
                              [--no-range-check]
  optional arguments:
    -h, --help            show this help message and exit
    --answers ANSWERS, -a ANSWERS
                          ground truth/answer file
    --filename FILENAME, -f FILENAME
                          users result file
    --no-range-check, -n  allow entries that have out-of-range label indices

Proceed to Section 2 if you receive an output of "prediction_file_status": "VALIDATED".

If you receive an error when running this command, ensure that you have entered the correct file locations for both the answer file and the result file. For clarification on the result file structure, refer to the evaluation criteria on the challenge page.


Section 2: Building the docker image

2.1 Install the Docker engine

To build and test a docker image locally you will first need to install the docker engine. Follow the instructions on installing docker; a quick start guide is also available.

2.2 Install NVIDIA drivers

Prior to uploading the docker image to the competition portal for evaluation you should test your docker image locally. If your local machine has NVIDIA-capable GPUs and you wish to test inference using GPUs then you will first need to install the NVIDIA drivers on your machine. See section 1.2 Install NVIDIA drivers above.

2.3 Add your model & supporting code

Ensure you have been able to successfully test your model on the local host using the objectnet_eval.py example code - see section 1.8 Validating the predictions of your model for more details.

2.4 Build the docker image

Docker images are built from a series of statements contained in a Dockerfile. A template Dockerfile is provided for models built using the TensorFlow deep learning framework.

The TensorFlow docker image template for the ObjectNet Challenge uses one of the official TensorFlow docker images as its base image. These TensorFlow images come with built-in GPU support and with python 3 pre-loaded.

Note: Docker images submitted to the ObjectNet Challenge must be based on a GPU enabled base image and use GPUs for inferencing.

To improve performance, the example code batches up inferencing of the ObjectNet images and executes a number of streams (or workers) in parallel.

You can further customise the build of your docker container by specifying the following arguments at docker build time:

A bash script, build-docker-submission.sh, has been created to build the Docker image for you. The script has the following inputs:

This command runs builds your model into a Docker Image
  Docker Image will be set to IMAGE:TAG
  
  Default
  TAG="latest"
  TENSORFLOW_VERSION="2.3.0"
  
  options:
  -h, --help				show brief help
  -v, --tensorflow-version=TF_VERSION	specify a tensorflow version to use
  -n, --model-class-name=NAME		specify a model class name to use
  -c, --model-checkpoint=CHECKPOINT	specify the path to a model checkpoint to use
  -i, --image=IMAGE			specify your Docker image
  -t, --tag=TAG			        specify your Docker image tag
  -nc, --no-cache			bypass cache for docker build

Create your image by running:

./build-docker-submission.sh -i IMAGE -t TAG -n NAME -c CHECKPOINT -v TF_VERSION
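If you prefer to launch the build from a Python script, the invocation above can be assembled as an argument list. This is only a sketch: the helper name `build_command` is hypothetical, the values `my_model`, `version1`, `MyModel` and `model/my_model.h5` are placeholders reused from the examples in this document, and the exact form the script expects for the checkpoint path is an assumption.

```python
from typing import List, Optional


def build_command(image: str, model_class_name: str, model_checkpoint: str,
                  tag: Optional[str] = None,
                  tensorflow_version: Optional[str] = None,
                  no_cache: bool = False) -> List[str]:
    """Assemble the argument list for build-docker-submission.sh."""
    cmd = ["./build-docker-submission.sh", "-i", image,
           "-n", model_class_name, "-c", model_checkpoint]
    # Optional flags are omitted so the script falls back to its own defaults.
    if tag is not None:
        cmd += ["-t", tag]
    if tensorflow_version is not None:
        cmd += ["-v", tensorflow_version]
    if no_cache:
        cmd.append("-nc")
    return cmd


cmd = build_command("my_model", "MyModel", "model/my_model.h5", tag="version1")
print(" ".join(cmd))
# To execute from the repo root: subprocess.run(cmd, check=True)
```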

For example, to build a docker image (called 'my_model' with a tag of 'version1') containing the model parameters specified above:

Note: To save space in the built docker image:

Once the build is complete your newly built docker image can be listed using the command:

$ docker images

If the docker image was built without a tag, it is given the default tag of latest.

2.5 Testing the docker image locally

Test the docker image locally before submitting it to the challenge. For example, a docker image called my_model:version1 is run by:

# First remove the output file
$ rm output/predictions.csv

# Now run the docker image
$ docker run -ti --rm --gpus=all -v $PWD/input/images:/input/ -v $PWD/output:/output my_model:version1

**** params ****
images /input
output_file /output/predictions.csv
model_class_name MyModel
model_checkpoint /workspace/model/my_model.h5
workers 16
gpus 2
batch_size 96
softmax True
convert_outputs_mode 1
****************

initializing model ...
loading pretrained weights from disk ...
Done. Number of predictions:  10

The -v $PWD/input/images:/input mounts a directory of test images from the local path into /input within the docker container. Similarly, -v $PWD/output:/output mounts an output directory from the local path into /output of the container. Add the --gpus=all parameter to the docker run command in order to utilise your GPUs.
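The same command can be built in Python when the test run is scripted, which makes the two volume mounts explicit. This is a sketch using only the flags shown above; the helper name `docker_run_command` is hypothetical.

```python
from pathlib import Path
from typing import List


def docker_run_command(image: str, repo_root: Path,
                       use_gpus: bool = True) -> List[str]:
    """Assemble a `docker run` argument list with the input and output mounts."""
    cmd = ["docker", "run", "-ti", "--rm"]
    if use_gpus:
        cmd.append("--gpus=all")
    cmd += [
        # Test images from the local repo appear under /input in the container.
        "-v", f"{repo_root / 'input' / 'images'}:/input/",
        # Predictions written to /output land in the local output directory.
        "-v", f"{repo_root / 'output'}:/output",
        image,
    ]
    return cmd


print(" ".join(docker_run_command("my_model:version1", Path.cwd())))
# To execute: subprocess.run(docker_run_command(...), check=True)
```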

A successful run will result in a predictions.csv file written to the $PWD/output path.

2.6 Debugging your docker image locally

If there are errors during the previous step then you will need to debug your docker container. If you make changes to your code there is no need to rebuild the docker container. To quickly test your new code, simply mount the root path of this repo as a volume when you run the container. For example:

$ docker run -ti --rm --gpus=all -v $PWD:/workspace -v $PWD/input/images:/input/ -v $PWD/output:/output --entrypoint /bin/bash my_model:version1

When the docker container is run, the local $PWD is mounted over the /workspace directory within the docker image, so any code or model changes made since the last docker build command are visible inside the running container.

2.7 Validating the predictions

To ensure that the predictions.csv file is structured according to the ObjectNet Challenge specifications, validate it using the validate_and_score.py script. Run the following command:

$ python3 validate_and_score.py -a input/answers/answers-test.json -f output/predictions.csv

{
  "accuracy": 20.0,
  "images_scored": 10,
  "prediction_file_errors": [],
  "prediction_file_status": "VALIDATED",
  "top5_accuracy": 40.0,
  "total_images": 10
}

A correctly formatted predictions.csv file will result in an output of "prediction_file_status": "VALIDATED". Otherwise, refer back to 1.8 Validating the predictions of your model to handle any errors.
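When validation is part of an automated workflow, the status can be checked programmatically. The sketch below parses a report like the one shown above; it assumes the script's output is a single JSON object, and the helper name `is_validated` is hypothetical.

```python
import json


def is_validated(report_text: str) -> bool:
    """Return True when the report's prediction_file_status is VALIDATED."""
    report = json.loads(report_text)
    return report.get("prediction_file_status") == "VALIDATED"


# The sample report from the run above.
sample = """{
  "accuracy": 20.0,
  "images_scored": 10,
  "prediction_file_errors": [],
  "prediction_file_status": "VALIDATED",
  "top5_accuracy": 40.0,
  "total_images": 10
}"""
print(is_validated(sample))  # True
```

In practice `report_text` would be the captured standard output of validate_and_score.py.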

Once the output of your model has been validated, you are ready to submit the docker image to the ObjectNet Challenge.