2.4. Training for Keypoint Detection¶
This tutorial covers training a neural network model on a GPU server to perform keypoint detection. Specifically, we will train a single-person pose estimation model.
2.4.1. Preparation¶
2.4.1.1. Dataset¶
The COCO dataset is available from the official website. Although the COCO keypoint challenge targets multi-person pose estimation, we customize this dataset to train a single-person pose estimation task.
Blueoil supports only one format for keypoint detection:
MSCOCO format
For details, please refer to the MSCOCO_2017 keypoint detection section in Prepare training dataset.
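Before training, it can help to confirm what the MSCOCO keypoint annotations contain. The following is a minimal Python sketch using pycocotools; the annotation file name and path are assumptions and may differ in your MSCOCO_2017 layout.

from pycocotools.coco import COCO

# Assumed path; adjust to where your MSCOCO_2017 annotations actually live.
ann_file = "/storage/dataset/MSCOCO_2017/annotations/person_keypoints_train2017.json"
coco = COCO(ann_file)

person_cat_id = coco.getCatIds(catNms=["person"])[0]
img_ids = coco.getImgIds(catIds=[person_cat_id])
print("images containing people:", len(img_ids))

# Each person annotation stores 17 keypoints as flat [x, y, visibility] triples.
ann_ids = coco.getAnnIds(imgIds=img_ids[0], catIds=[person_cat_id], iscrowd=False)
for ann in coco.loadAnns(ann_ids):
    print("labeled keypoints:", ann["num_keypoints"], "first triple:", ann["keypoints"][:3])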
2.4.2. Generate a configuration file¶
Note: The instructions below assume your current path is the blueoil root path and that the MSCOCO_2017 dataset has been prepared at /storage/dataset/MSCOCO_2017.
Generate your model configuration file interactively by running the blueoil init
command.
$ docker run --rm -it \
-v /storage/dataset/MSCOCO_2017:/home/blueoil/MSCOCO_2017 \
-v $(pwd)/config:/home/blueoil/config \
blueoil_$(id -un):{TAG} \
blueoil init -o config/keypoint_detection_demo.py
The {TAG} value must be set to a value like v0.20.0-11-gf1e07c8, which can be obtained with the docker images command. This value depends on your environment.
Below is an example of initialization.
#### Generate config ####
your model name (): keypoint_detection_demo
choose task type: keypoint_detection
choose network: LmSinglePoseV1Quantize
choose dataset format: Mscoco for Single-Person Pose Estimation
training dataset path: /home/blueoil/MSCOCO_2017/
set validation dataset? (if answer no, the dataset will be separated for training and validation by 9:1 ratio.): yes
validation dataset path: /home/blueoil/MSCOCO_2017/
batch size (integer): 4
image size (integer x integer): 160x160
how many epochs do you run training (integer): 100
select optimizer: Adam
initial learning rate: 0.001
choose learning rate schedule ({epochs} is the number of training epochs you entered before): '3-step-decay-with-warmup' -> warmup learning rate 1/1000 in first epoch, then train the same way as '3-step-decay'
enable data augmentation? (Y/n): Yes
Please choose augmentors: done (6 selections)
-> select Blur, Brightness, Color, Contrast, FlipLeftRight, Hue
apply quantization at the first layer? (Y/n): no
When the configuration is finished, the configuration file keypoint_detection_demo.py is generated under the config directory.
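Note the "set validation dataset?" question above: if you answer no, the training data is split 9:1 between training and validation. The snippet below is only an illustration of such a split on a hypothetical list of sample IDs, not Blueoil's internal implementation.

import random

def split_train_val(sample_ids, val_ratio=0.1, seed=0):
    # Illustrative 9:1 split; Blueoil performs its own split internally.
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    n_val = int(len(ids) * val_ratio)
    return ids[n_val:], ids[:n_val]

train_ids, val_ids = split_train_val(range(1000))
print(len(train_ids), len(val_ids))  # 900 100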
2.4.3. Train a network model¶
Train your model by running blueoil train
with a model configuration.
$ docker run --rm \
--gpus '"device=0"' \
-e DATA_DIR=/home/blueoil \
-e OUTPUT_DIR=/home/blueoil/saved \
-v /storage/dataset/MSCOCO_2017:/home/blueoil/MSCOCO_2017 \
-v $(pwd)/config:/home/blueoil/config \
-v $(pwd)/saved:/home/blueoil/saved \
blueoil_$(id -un):{TAG} \
blueoil train -c config/keypoint_detection_demo.py
Just like init, set the value of {TAG} to the value obtained by docker images.
Change the value of --gpus '"device=<gpu indices>"' according to your environment.
For details, check the official documentation.
When training has started, the training log and checkpoints are generated under ./saved/{MODEL_NAME}.
The value of {MODEL_NAME} will be {Configuration file}_{TIMESTAMP}.
Training runs on the TensorFlow backend, so you can use TensorBoard to visualize the training process.
$ docker run --rm \
-p 6006:6006 \
-v $(pwd)/saved:/home/blueoil/saved \
blueoil_$(id -un):{TAG} \
tensorboard --logdir=saved/{MODEL_NAME}
TensorBoard shows the following views:
loss / metrics
Note: the loss consists of global_loss and refine_loss; global_loss covers all keypoints, while refine_loss covers some difficult keypoints.
input / labels_heatmap / output_heatmap
visualize_labels / visualize_output
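As noted above, the reported loss is the sum of global_loss and refine_loss. The snippet below is only a toy sketch of that idea using mean squared error between heatmaps; the actual loss functions and weighting used by LmSinglePoseV1Quantize may differ.

import numpy as np

def heatmap_mse(pred, target):
    # Mean squared error between predicted and target keypoint heatmaps.
    return float(np.mean((pred - target) ** 2))

def total_loss(pred_global, gt_global, pred_refine, gt_refine):
    # global_loss covers all keypoints; refine_loss covers difficult keypoints.
    return heatmap_mse(pred_global, gt_global) + heatmap_mse(pred_refine, gt_refine)

rng = np.random.default_rng(0)
pred, gt = rng.random((17, 40, 40)), rng.random((17, 40, 40))  # toy 17-keypoint heatmaps
print(total_loss(pred, gt, pred, gt))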

2.4.4. Convert training result to FPGA-ready format¶
Convert the trained model to executable binary files for x86, ARM, and FPGA. Currently, conversion for FPGA supports only the Intel Cyclone® V SoC FPGA.
$ docker run --rm \
-e CUDA_VISIBLE_DEVICES=-1 \
-e OUTPUT_DIR=/home/blueoil/saved \
-v $(pwd)/saved:/home/blueoil/saved \
blueoil_$(id -un):{TAG} \
blueoil convert -e {MODEL_NAME}
blueoil convert automatically executes the following conversion steps:
- Converts the TensorFlow checkpoint to a protocol buffer graph.
- Optimizes the graph.
- Generates source code for the executable binary.
- Compiles for x86, ARM, and FPGA.
Setting CUDA_VISIBLE_DEVICES=-1 makes the conversion run on the CPU.
If conversion is successful, output files are generated under ./saved/{MODEL_NAME}/export/save.ckpt-{Checkpoint No.}/{Image size}/output.
output
├── fpga
│ ├── preloader-mkpimage.bin
│ ├── soc_system.dtb
│ └── soc_system.rbf
├── models
│ ├── lib
│ │ ├── libdlk_aarch64.so
│ │ ├── libdlk_arm.so
│ │ ├── libdlk_aarch64.a
│ │ ├── libdlk_arm.a
│ │ ├── libdlk_fpga.a
│ │ ├── libdlk_x86.a
│ │ ├── libdlk_fpga.so
│ │ ├── libdlk_x86.so
│ │ ├── lm_aarch64.elf
│ │ ├── lm_arm.elf
│ │ ├── lm_fpga.elf
│ │ └── lm_x86.elf
│ ├── meta.yaml
│ └── minimal_graph_with_shape.pb
├── python
│ ├── lmnet
│ │ ├── common.py
│ │ ├── data_augmentor.py
│ │ ├── data_processor.py
│ │ ├── __init__.py
│ │ ├── nnlib.py
│ │ ├── post_processor.py
│ │ ├── pre_processor.py
│ │ ├── tensorflow_graph_runner.py
│ │ ├── utils
│ │ │ ├── box.py
│ │ │ ├── config.py
│ │ │ ├── demo.py
│ │ │ ├── image.py
│ │ │ ├── __init__.py
│ │ │ └── output.py
│ │ └── visualize.py
│ ├── motion_jpeg_server_from_camera.py
│ ├── README.md
│ ├── requirements.txt
│ ├── run.py
│ └── usb_camera_demo.py
└── README.md
2.4.5. Run inference script on x86 Linux (Ubuntu 16.04)¶
Prepare images for inference (not included in the training dataset).
You can find test images on Creative Commons. Sample:
$ wget https://live.staticflickr.com/4117/4881737773_0e3ab67b6f_b.jpg
Run the inference script.
Go into the output/python directory and run run.py; the inference result is saved in ./output/output.json.
Note: If you run the script for the first time, you have to set up a Python environment (2.7 or 3.5+) and install the required Python packages.
$ cd {output/python directory}
$ sudo pip install -r requirements.txt  # for the first time only
$ python run.py \
    -i {inference image path} \
    -m ../models/lib/libdlk_x86.so \
    -c ../models/meta.yaml
Tips: The default confidence_threshold for keypoint detection is 0.1. If only part of a human body is in the inference image but some redundant keypoints are visualized, please consider increasing this value in output/models/meta.yaml.
POST_PROCESSOR:
  - GaussianHeatmapToJoints:
      confidence_threshold: 0.1
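If you prefer to change the threshold from a script rather than editing the file by hand, a minimal PyYAML sketch could look like the following; it assumes the POST_PROCESSOR layout shown above, and the value 0.3 is chosen purely as an example.

import yaml

meta_path = "../models/meta.yaml"  # relative to output/python, as in the run.py example

with open(meta_path) as f:
    meta = yaml.safe_load(f)

# Raise the threshold to suppress redundant keypoints (layout assumed from the excerpt above).
for step in meta.get("POST_PROCESSOR", []):
    if "GaussianHeatmapToJoints" in step:
        step["GaussianHeatmapToJoints"]["confidence_threshold"] = 0.3

with open(meta_path, "w") as f:
    yaml.safe_dump(meta, f)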
Check inference results. An example output file is shown below.
{
"benchmark": {
"inference": 0.12950640000000008,
"post": 0.002501737000000004,
"pre": 0.001524048,
"total": 0.13353218500000008
},
"classes": [
{
"id": 0,
"name": "person"
}
],
"date": "2019-12-17T08:54:58.835100+00:00",
"results": [
{
"file_path": "4881737773_0e3ab67b6f_b.jpg",
"prediction": {
"joints": [
308,
376,
376,
274,
239,
376,
274,
479,
...
]
}
}
],
"task": "IMAGE.KEYPOINT_DETECTION",
"version": 0.2
}
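A short script can make the result easier to inspect. The sketch below loads output.json and prints the benchmark timings and detected joints; the pairing of the flat joints list into (x, y) coordinates is an assumption to verify against the Blueoil documentation.

import json

with open("output/output.json") as f:  # path relative to where run.py was executed
    result = json.load(f)

print("inference time [s]:", result["benchmark"]["inference"])
print("total time [s]:", result["benchmark"]["total"])

for r in result["results"]:
    joints = r["prediction"]["joints"]
    # Assumption: the flat list holds (x, y) pairs per keypoint; verify before relying on it.
    points = list(zip(joints[0::2], joints[1::2]))
    print(r["file_path"], "->", len(points), "points; first two:", points[:2])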