YOLOv4 on Edge TPU

Prepare int8 tflite

Supported operations: https://coral.ai/docs/edgetpu/models-intro/#supported-operations

Use YOLOv4-Tiny, because the full YOLOv4 model is too large.
Use relu as the activation function. YOLOv4-Tiny uses leaky-relu by default, which is not a supported operation on the Edge TPU.
Set the desired input_size before converting the model to tflite.
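
The points above can be checked before training or converting. The following is a minimal sketch, not part of the yolov4 package: `check_darknet_cfg` is a hypothetical helper that scans a Darknet-style cfg string for leaky activations and reads the input size from the [net] section. The multiple-of-32 check is an assumption based on the usual Darknet downsampling constraint, and the sample cfg is made up for illustration.

```python
def check_darknet_cfg(cfg_text, unsupported=("leaky",)):
    """Return ((width, height), problems) for a Darknet-style cfg string.

    Hypothetical helper: it only looks at activation=, width= and height=.
    """
    section = None
    width = height = None
    problems = []
    for lineno, raw in enumerate(cfg_text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]  # e.g. "net", "convolutional"
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if section == "net" and key == "width":
            width = int(value)
        elif section == "net" and key == "height":
            height = int(value)
        elif key == "activation" and value in unsupported:
            problems.append(f"line {lineno}: [{section}] activation={value}")
    # Assumption: the input size should be a multiple of 32.
    for name, size in (("width", width), ("height", height)):
        if size is not None and size % 32 != 0:
            problems.append(f"{name}={size} is not a multiple of 32")
    return (width, height), problems


# Made-up cfg fragment: one leaky layer that would have to become relu.
sample_cfg = """
[net]
width=416
height=416

[convolutional]
filters=32
activation=leaky

[convolutional]
filters=64
activation=relu
"""

size, problems = check_darknet_cfg(sample_cfg)
print(size)
print(problems)
```

An empty `problems` list only means that no leaky activation was found and the size passes the assumed check; it does not guarantee that every operation maps to the Edge TPU.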

Training

Ref: https://wiki.loliot.net/docs/lang/python/libraries/yolov4/python-yolov4-training#full-script

from tensorflow.keras import callbacks, optimizers
from yolov4.tf import SaveWeightsCallback, YOLOv4
import time

yolo = YOLOv4(tiny=True)
yolo.classes = "/content/drive/My Drive/Hard_Soft/NN/coco/coco.names"
yolo.input_size = 608
yolo.batch_size = 32

yolo.make_model(activation1="relu")

...

Convert to tflite

2021-02-21
OS: Ubuntu 20.04
TF: v2.4.1
yolov4: v3.1.0

Use yolov4-tiny-relu-tpu.cfg

Danger

Don't confuse yolov4-tiny-relu.cfg with yolov4-tiny-relu-tpu.cfg.

from yolov4.tf import YOLOv4, YOLODataset, save_as_tflite

yolo = YOLOv4()

# Class names and network definition (the TPU variant of the relu config).
yolo.config.parse_names("test/coco.names")
yolo.config.parse_cfg("config/yolov4-tiny-relu-tpu.cfg")

yolo.make_model()
# Load the trained relu weights.
yolo.load_weights(
    "/home/hhk7734/NN/yolov4-tiny-relu.weights", weights_type="yolo"
)

# Images used to calibrate the int8 quantization.
dataset = YOLODataset(
    config=yolo.config,
    dataset_list="/home/hhk7734/NN/val2017.txt",
    image_path_prefix="/home/hhk7734/NN/val2017",
    training=False,
)

# Write the fully int8-quantized model that edgetpu_compiler takes as input.
save_as_tflite(
    model=yolo.model,
    tflite_path="yolov4-tiny-relu-int8.tflite",
    quantization="full_int8",
    dataset=dataset,
)
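
For background on what "full_int8" implies, the sketch below shows the affine int8 mapping that TensorFlow Lite uses for quantized tensors: real_value = scale * (int8_value - zero_point). It is an illustration of the arithmetic only and is not taken from save_as_tflite. The helper names and the 0.0 to 6.0 range are made up; in a real conversion the ranges are estimated from the calibration dataset.

```python
def quant_params(rmin, rmax, qmin=-128, qmax=127):
    """Pick scale and zero point so that [rmin, rmax] maps onto [qmin, qmax]."""
    rmin, rmax = min(rmin, 0.0), max(rmax, 0.0)  # the range must contain 0.0
    scale = (rmax - rmin) / (qmax - qmin)
    if scale == 0.0:
        scale = 1.0  # degenerate all-zero range
    zero_point = int(round(qmin - rmin / scale))
    return scale, zero_point


def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Map a float to int8, clamping values outside the calibrated range."""
    q = int(round(x / scale)) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """Map an int8 value back to an approximate float."""
    return scale * (q - zero_point)


# Made-up activation range, e.g. something observed during calibration.
scale, zero_point = quant_params(0.0, 6.0)
for x in (0.0, 1.5, 6.0, 10.0):
    q = quantize(x, scale, zero_point)
    print(x, q, dequantize(q, scale, zero_point))
```

Values outside the calibrated range (10.0 here) are clamped, which is one reason the calibration images should resemble the real input.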

$ edgetpu_compiler -sa yolov4-tiny-relu-int8.tflite
Edge TPU Compiler version 15.0.340273435

Model compiled successfully in 758 ms.

Input model: yolov4-tiny-relu-int8.tflite
Input size: 5.91MiB
Output model: yolov4-tiny-relu-int8_edgetpu.tflite
Output size: 5.98MiB
On-chip memory used for caching model parameters: 5.83MiB
On-chip memory remaining for caching model parameters: 1.78MiB
Off-chip memory used for streaming uncached model parameters: 0.00B
Number of Edge TPU subgraphs: 2
Total number of operations: 51
Operation log: yolov4-tiny-relu-int8_edgetpu.log

Model successfully compiled but not all operations are supported by the Edge TPU. A percentage of the model will instead run on the CPU, which is slower. If possible, consider updating your model to use only operations supported by the Edge TPU. For details, visit g.co/coral/model-reqs.
Number of operations that will run on Edge TPU: 46
Number of operations that will run on CPU: 5

Operator                       Count      Status

RESIZE_BILINEAR                1          Operation version not supported
DEQUANTIZE                     4          Operation is working on an unsupported data type
CONV_2D                        21         Mapped to Edge TPU
QUANTIZE                       8          Mapped to Edge TPU
PAD                            2          Mapped to Edge TPU
CONCATENATION                  7          Mapped to Edge TPU
LOGISTIC                       2          Mapped to Edge TPU
SPLIT                          3          Mapped to Edge TPU
MAX_POOL_2D                    3          Mapped to Edge TPU
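
The operator table above can be summarized with a few lines of Python when comparing several builds. This is a minimal sketch: `summarize_ops` is a hypothetical helper, and it assumes the three-column layout shown in this log (operator, count, status).

```python
import re


def summarize_ops(table_text):
    """Sum operator counts by whether the compiler mapped them to the Edge TPU."""
    on_tpu = on_cpu = 0
    cpu_ops = {}
    for line in table_text.splitlines():
        match = re.match(r"^(\S+)\s+(\d+)\s+(.+)$", line.strip())
        if not match:
            continue  # header or blank line
        name, count, status = match.group(1), int(match.group(2)), match.group(3)
        if status == "Mapped to Edge TPU":
            on_tpu += count
        else:
            on_cpu += count
            cpu_ops[name] = cpu_ops.get(name, 0) + count
    return on_tpu, on_cpu, cpu_ops


# Table copied from the compiler output above.
table = """
Operator                       Count      Status

RESIZE_BILINEAR                1          Operation version not supported
DEQUANTIZE                     4          Operation is working on an unsupported data type
CONV_2D                        21         Mapped to Edge TPU
QUANTIZE                       8          Mapped to Edge TPU
PAD                            2          Mapped to Edge TPU
CONCATENATION                  7          Mapped to Edge TPU
LOGISTIC                       2          Mapped to Edge TPU
SPLIT                          3          Mapped to Edge TPU
MAX_POOL_2D                    3          Mapped to Edge TPU
"""

on_tpu, on_cpu, cpu_ops = summarize_ops(table)
print(f"{on_tpu} of {on_tpu + on_cpu} operations on the Edge TPU "
      f"({on_tpu / (on_tpu + on_cpu):.1%})")
print("CPU fallback:", cpu_ops)
```

The totals agree with the compiler's own summary: 46 operations on the Edge TPU and 5 on the CPU.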

rsync yolov4-tiny-relu-int8_edgetpu.tflite coco.names yolov4-tiny-relu-tpu.cfg \
    mendel@<tpu ip>:~

from yolov4.tf import YOLOv4

yolo = YOLOv4(tiny=True, tpu=True)

yolo.classes = "coco.names"
yolo.input_size = (512, 384) # width, height
yolo.make_model(activation1="relu")

yolo.load_weights("yolov4-tiny-relu.weights", weights_type="yolo")

dataset = yolo.load_dataset(
    "train2017.txt",
    training=False,
    image_path_prefix="/home/hhk7734/NN/train2017"
)

yolo.save_as_tflite(
    "yolov4-tiny-relu-int8.tflite",
    quantization="full_int8",
    data_set=dataset,
    num_calibration_steps=400
)

$ edgetpu_compiler -sa yolov4-tiny-relu-int8.tflite
Edge TPU Compiler version 15.0.340273435

Model compiled successfully in 1211 ms.

Input model: yolov4-tiny-relu-int8.tflite
Input size: 5.94MiB
Output model: yolov4-tiny-relu-int8_edgetpu.tflite
Output size: 6.33MiB
On-chip memory used for caching model parameters: 6.06MiB
On-chip memory remaining for caching model parameters: 1.66MiB
Off-chip memory used for streaming uncached model parameters: 3.38KiB
Number of Edge TPU subgraphs: 3
Total number of operations: 132
Operation log: yolov4-tiny-relu-int8_edgetpu.log

Model successfully compiled but not all operations are supported by the Edge TPU. A percentage of the model will instead run on the CPU, which is slower. If possible, consider updating your model to use only operations supported by the Edge TPU. For details, visit g.co/coral/model-reqs.
Number of operations that will run on Edge TPU: 101
Number of operations that will run on CPU: 31

Operator                       Count      Status

PAD                            2          Mapped to Edge TPU
SPLIT                          7          Mapped to Edge TPU
ADD                            6          Mapped to Edge TPU
SUB                            6          Mapped to Edge TPU
QUANTIZE                       27         Mapped to Edge TPU
QUANTIZE                       6          Operation is otherwise supported, but not mapped due to some unspecified limitation
DEQUANTIZE                     6          Operation is working on an unsupported data type
EXP                            6          Operation is working on an unsupported data type
CONCATENATION                  9          Mapped to Edge TPU
MAX_POOL_2D                    3          Mapped to Edge TPU
CONV_2D                        21         Mapped to Edge TPU
LOGISTIC                       2          Mapped to Edge TPU
MUL                            18         Mapped to Edge TPU
RESIZE_BILINEAR                1          Operation version not supported
SPLIT_V                        12         Operation not supported

rsync yolov4-tiny-relu-int8_edgetpu.tflite coco.names mendel@<tpu ip>:~

Run on Edge TPU

Install the Edge TPU runtime

Ref: https://coral.ai/docs/accelerator/get-started/#1-install-the-edge-tpu-runtime

Install just the TensorFlow Lite interpreter

Ref: https://www.tensorflow.org/lite/guide/python#install_just_the_tensorflow_lite_interpreter
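
To show how the two installs fit together, here is a minimal sketch of loading a compiled model with plain tflite_runtime and the Edge TPU delegate. It is not taken from the yolov4 package, which wraps this step in its own loader; `edgetpu_lib_name` and `make_interpreter` are hypothetical helper names. The shared-library names are the ones listed in the Coral documentation.

```python
import platform

# Edge TPU runtime library per operating system (from the Coral docs).
EDGETPU_LIBS = {
    "Linux": "libedgetpu.so.1",
    "Darwin": "libedgetpu.1.dylib",
    "Windows": "edgetpu.dll",
}


def edgetpu_lib_name(system=None):
    """Return the Edge TPU runtime library name for the given or current OS."""
    return EDGETPU_LIBS[system or platform.system()]


def make_interpreter(model_path):
    """Create a TFLite interpreter that delegates supported ops to the Edge TPU."""
    # Imported lazily so edgetpu_lib_name() works without tflite_runtime.
    import tflite_runtime.interpreter as tflite

    delegate = tflite.load_delegate(edgetpu_lib_name())
    interpreter = tflite.Interpreter(
        model_path=model_path, experimental_delegates=[delegate]
    )
    interpreter.allocate_tensors()
    return interpreter


print(edgetpu_lib_name("Linux"))
```

On the device, `make_interpreter("yolov4-tiny-relu-int8_edgetpu.tflite")` would fail at `load_delegate` if the Edge TPU runtime is missing, which makes it a quick check that both installs succeeded.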

Run example script

edge_yolov4_tiny_video_test.py
import cv2

from yolov4.tflite import YOLOv4

yolo = YOLOv4()

yolo.config.parse_names("coco.names")
yolo.config.parse_cfg("yolov4-tiny-relu-tpu.cfg")

yolo.summary()

yolo.load_tflite("yolov4-tiny-relu-int8_edgetpu.tflite")

yolo.inference(
    "/dev/video1",
    is_image=False,
    cv_apiPreference=cv2.CAP_V4L2,
    cv_frame_size=(640, 480),
    cv_fourcc="YUYV",
)

edge_yolov4_tiny_video_test.py
from yolov4.tflite import YOLOv4
import cv2

yolo = YOLOv4(tiny=True, tpu=True)

yolo.classes = "coco.names"

yolo.load_tflite("yolov4-tiny-relu-int8_edgetpu.tflite")

yolo.inference(
    "/dev/video1",
    is_image=False,
    cv_apiPreference=cv2.CAP_V4L2,
    cv_frame_size=(640, 480),
    cv_fourcc="YUYV",
)

python3 edge_yolov4_tiny_video_test.py
