mlpack

YOLOv3 and YOLOv3Tiny

The YOLOv3 and YOLOv3Tiny classes implement the models from the paper "YOLOv3: An Incremental Improvement". YOLOv3 is a simple object detection algorithm that takes in an image and predicts multiple bounding boxes in a single forward pass of the neural network.

NOTE: At the current time, only prediction is supported by the YOLOv3 and YOLOv3Tiny classes. Support for training and fine-tuning is in progress.

Simple usage example:

NOTE: You must define the MLPACK_ENABLE_ANN_SERIALIZATION macro to load a YOLOv3 model from disk.

// Download: https://models.mlpack.org/yolo/yolov3-320-coco-f64.bin
mlpack::YOLOv3 model;
mlpack::Load("yolov3-320-coco-f64.bin", model);

// Download: https://models.mlpack.org/yolo/dog.jpg
arma::mat inputImage, outputImage;
mlpack::ImageOptions opts;
mlpack::Load("dog.jpg", inputImage, opts);

// Preprocess the `inputImage`, predict bounding boxes using `YOLOv3`
// and draw them onto `outputImage`.
model.Predict(inputImage, opts, outputImage, true);

// Save to "output.jpg".
mlpack::Save("output.jpg", outputImage, opts);

More examples...

[Image: dog, bicycle and truck]

See also:

Constructors

Construct a YOLOv3 object using one of the constructors below. Defaults and types are detailed in the

Predicting Bounding Boxes

Once the weights are loaded, you can compute likely object bounding boxes with Predict().

| name | type | description | default |
|------|------|-------------|---------|
| image | arma::mat | Input image. | n/a |
| opts | ImageOptions | Image metadata. | n/a |
| output | arma::mat | Output: either an output image with bounding boxes or raw outputs of the model, depending on drawBoxes. | n/a |
| drawBoxes | bool | If true, copy image to output and draw bounding boxes; otherwise, simply return raw bounding boxes and class probabilities in output. | false |
| ignoreThreshold | double | Minimum confidence to have the corresponding bounding box drawn onto output, if drawBoxes is true. | 0.7 |
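
To illustrate the effect of ignoreThreshold, the sketch below filters a list of candidate detections by confidence. The `Detection` struct and `FilterByConfidence()` function are hypothetical names used only for illustration; they are not part of mlpack, and the classes may store and filter boxes differently internally. The sketch also assumes that a box whose confidence equals the threshold is kept.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical detection record; mlpack does not expose this type.
struct Detection
{
  std::string label;
  double confidence;
};

// Keep only the detections whose confidence is at least `ignoreThreshold`.
std::vector<Detection> FilterByConfidence(
    const std::vector<Detection>& detections,
    const double ignoreThreshold = 0.7)
{
  // A confidence threshold only makes sense inside [0, 1].
  assert(ignoreThreshold >= 0.0 && ignoreThreshold <= 1.0);

  std::vector<Detection> kept;
  for (const Detection& d : detections)
  {
    if (d.confidence >= ignoreThreshold)
      kept.push_back(d);
  }
  return kept;
}
```

With made-up candidates of confidence 0.99, 0.85, 0.72 and 0.31, the default threshold of 0.7 keeps the first three and discards the last.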

| name | type | description |
|------|------|-------------|
| input | MatType | Input image. See example for details on preprocessing the input image. |
| output | MatType | Raw outputs of the model. |
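
This form of Predict() expects an image that has already been preprocessed. One of the preprocessing steps in the manual example is changing the channel layout so that the channels are grouped. The sketch below shows that idea on a plain `std::vector`. It assumes the loaded image is interleaved (all channel values of the first pixel, then all of the second pixel, and so on) and that "grouped" means all values of the first channel, then all of the second. `GroupChannelsSketch()` is a hypothetical helper, not mlpack's GroupChannels().

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Convert an interleaved image ([c0, c1, c2, c0, c1, c2, ...], one group per
// pixel) into a grouped layout ([all c0 values, all c1 values, ...]).
std::vector<double> GroupChannelsSketch(const std::vector<double>& interleaved,
                                        const std::size_t numPixels,
                                        const std::size_t numChannels)
{
  // The flattened image must hold exactly one value per pixel per channel.
  assert(interleaved.size() == numPixels * numChannels);

  std::vector<double> grouped(interleaved.size());
  for (std::size_t p = 0; p < numPixels; ++p)
  {
    for (std::size_t c = 0; c < numChannels; ++c)
      grouped[c * numPixels + p] = interleaved[p * numChannels + c];
  }
  return grouped;
}
```

For a two-pixel, three-channel input `{1, 2, 3, 4, 5, 6}`, the grouped result is `{1, 4, 2, 5, 3, 6}`.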

Other Functionality

YOLOv3Tiny

YOLOv3Tiny is a smaller object detection model based on YOLOv3 and intended for low-resource machines. YOLOv3Tiny has an identical API to YOLOv3, so it can be used as a drop-in replacement. YOLOv3 has ~60 million parameters, while YOLOv3Tiny has ~8 million.

Pretrained weights are also included for YOLOv3Tiny.

Simple example usage of YOLOv3Tiny:

// Step 1: load the pretrained `YOLOv3Tiny` weights.
// Download: https://models.mlpack.org/yolo/yolov3-tiny-416-coco-f64.bin
mlpack::YOLOv3Tiny model;
mlpack::Load("yolov3-tiny-416-coco-f64.bin", model);

// Step 2: load the image.
// Download: https://models.mlpack.org/yolo/dog.jpg
arma::mat image, outputImage;
mlpack::ImageOptions opts;
mlpack::Load("dog.jpg", image, opts);

// Step 3: Preprocess the input image, detect bounding boxes and
// draw them onto `outputImage`.
model.Predict(image, opts, outputImage, true);

// Step 4: Save to "output.jpg".
mlpack::Save("output.jpg", outputImage, opts, true);

Pretrained weights

Because training a YOLOv3 model from scratch is time-consuming, a number of pretrained models are available for download.

The pretrained weights were trained on the COCO dataset.

The format for the name of each YOLOv3 pretrained model is <model name>-<image size>-<finetuned dataset name>-<matrix type>.bin.
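
As an illustration of this naming scheme, the sketch below splits a file name into its four fields. `PretrainedModelName` and `ParseModelFileName()` are hypothetical names, not part of mlpack. The fields are split from the right because the model name itself may contain a hyphen (as in `yolov3-tiny`); the sketch assumes the dataset name and matrix type do not.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper type; not part of mlpack.
struct PretrainedModelName
{
  std::string model;       // e.g. "yolov3" or "yolov3-tiny"
  std::size_t imageSize;   // e.g. 320
  std::string dataset;     // e.g. "coco"
  std::string matrixType;  // e.g. "f64"
};

// Split "<model name>-<image size>-<dataset name>-<matrix type>.bin".
PretrainedModelName ParseModelFileName(const std::string& fileName)
{
  const std::string extension = ".bin";
  assert(fileName.size() > extension.size());
  assert(fileName.compare(fileName.size() - extension.size(),
      extension.size(), extension) == 0);

  // Drop the extension, then locate the last three hyphens.
  const std::string stem =
      fileName.substr(0, fileName.size() - extension.size());
  const std::size_t typePos = stem.rfind('-');
  assert(typePos != std::string::npos && typePos > 0);
  const std::size_t datasetPos = stem.rfind('-', typePos - 1);
  assert(datasetPos != std::string::npos && datasetPos > 0);
  const std::size_t sizePos = stem.rfind('-', datasetPos - 1);
  assert(sizePos != std::string::npos);

  PretrainedModelName result;
  result.model = stem.substr(0, sizePos);
  result.imageSize =
      std::stoul(stem.substr(sizePos + 1, datasetPos - sizePos - 1));
  result.dataset = stem.substr(datasetPos + 1, typePos - datasetPos - 1);
  result.matrixType = stem.substr(typePos + 1);
  return result;
}
```

For example, `yolov3-tiny-416-coco-f64.bin` splits into model `yolov3-tiny`, image size 416, dataset `coco` and matrix type `f64`.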

arma::fmat weights (i.e. 32-bit precision) are also available. To use them, specify arma::fmat as the MatType template parameter; see Advanced Functionality: Template Parameters.

A larger image size lets the model detect smaller objects better, at the cost of speed. Similarly, smaller matrix types (such as arma::fmat) allow for faster model loading and faster inference.
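
As a rough illustration of why the matrix type matters, the sketch below estimates how many bytes the weights occupy for a given element type, using the approximate parameter counts quoted earlier (~60 million for YOLOv3, ~8 million for YOLOv3Tiny). `WeightBytes()` is a hypothetical helper, and the estimate ignores serialization overhead and any state other than the weights.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Approximate number of bytes needed to hold `numParams` weights stored as
// ElemType (e.g. double for arma::mat, float for arma::fmat).
template<typename ElemType>
std::size_t WeightBytes(const std::size_t numParams)
{
  // Guard against overflow in the multiplication below.
  assert(numParams <=
      std::numeric_limits<std::size_t>::max() / sizeof(ElemType));
  return numParams * sizeof(ElemType);
}

// Approximate parameter counts, as quoted in the YOLOv3Tiny section.
const std::size_t yolov3Params = 60000000;
const std::size_t yolov3TinyParams = 8000000;
```

On platforms where `double` is 8 bytes and `float` is 4 bytes, `WeightBytes<double>(yolov3Params)` is 480,000,000 bytes and `WeightBytes<float>(yolov3Params)` is 240,000,000 bytes, so the 32-bit weights hold roughly half as much data.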

When using YOLOv3, different image sizes will affect how many possible boxes the model can output. For example, yolov3-320-coco-f64.bin will output 6300 possible boxes. Below is a table with all the pretrained models and how many possible boxes they output.

| Model | Image Size | Number of Boxes |
|-------|------------|-----------------|
| yolov3 | 320 | 6300 |
| yolov3 | 416 | 10647 |
| yolov3 | 608 | 22743 |
| yolov3-tiny | 416 | 2535 |
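
The counts in the table are consistent with the standard YOLOv3 design, in which each detection scale divides the image into a grid and every grid cell predicts a fixed number of boxes. The sketch below reproduces the arithmetic. It assumes strides of 32, 16 and 8 for YOLOv3, strides of 32 and 16 for YOLOv3Tiny, and 3 boxes per cell; these values come from the original architecture rather than from this page, and `NumBoxes()` is a hypothetical helper, not an mlpack function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Number of candidate boxes for a square input of side `imageSize`.
// A detection scale with stride `s` yields an (imageSize / s) x
// (imageSize / s) grid, and each cell predicts `boxesPerCell` boxes.
std::size_t NumBoxes(const std::size_t imageSize,
                     const std::vector<std::size_t>& strides,
                     const std::size_t boxesPerCell = 3)
{
  std::size_t total = 0;
  for (const std::size_t stride : strides)
  {
    // The image side must be a multiple of every stride.
    assert(stride > 0 && imageSize % stride == 0);
    const std::size_t cells = imageSize / stride;
    total += cells * cells * boxesPerCell;
  }
  return total;
}

// Assumed strides of the detection scales (see the paragraph above).
const std::vector<std::size_t> yolov3Strides = { 32, 16, 8 };
const std::vector<std::size_t> yolov3TinyStrides = { 32, 16 };
```

For instance, `NumBoxes(320, yolov3Strides)` evaluates to (10 * 10 + 20 * 20 + 40 * 40) * 3 = 6300, and `NumBoxes(416, yolov3TinyStrides)` to (13 * 13 + 26 * 26) * 3 = 2535, matching the table.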

All of the available pretrained models were fine-tuned on the COCO dataset. A link to all the COCO class names is also available.

Simple Examples

See also the simple usage example above for basic usage of the YOLOv3 class.

NOTE: You must define the MLPACK_ENABLE_ANN_SERIALIZATION macro to serialize and deserialize models that use arma::mat as the data type.

Simple example loading the image, passing it to Predict() and saving the output.

// Step 1: load the pretrained weights.
// Download: https://models.mlpack.org/yolo/yolov3-320-coco-f64.bin
mlpack::YOLOv3 model;
mlpack::Load("yolov3-320-coco-f64.bin", model);

// Step 2: load the image.
// Download: https://models.mlpack.org/yolo/dog.jpg
arma::mat inputImage, rawOutput;
mlpack::ImageOptions opts;
mlpack::Load("dog.jpg", inputImage, opts);

// Step 3: Preprocess the `inputImage`, predict bounding boxes using `YOLOv3`.
// Set `drawBoxes` to false in order to store raw outputs in `rawOutput`.
model.Predict(inputImage, opts, rawOutput, false);

// Step 4: Inspect the first possible bounding box.
std::cout << "First bounding box: [" << (size_t) rawOutput(0, 0) << ", "
    << (size_t) rawOutput(1, 0) << ", " << (size_t) rawOutput(2, 0) << ", "
    << (size_t) rawOutput(3, 0) << "]." << std::endl;

Example of doing manual preprocessing on the input image, and getting raw output of the model.

// Step 1: load the pretrained model.
// Download: https://models.mlpack.org/yolo/yolov3-320-coco-f64.bin
mlpack::YOLOv3 model;
mlpack::Load("yolov3-320-coco-f64.bin", model);

// Step 2: load the image.
// Download: https://models.mlpack.org/yolo/dog.jpg
arma::mat inputImage, preprocessedImage, rawOutput;
mlpack::ImageOptions opts;
mlpack::Load("dog.jpg", inputImage, opts);

// Step 3: preprocess the image.
// Normalize pixel values to be between 0-1.
preprocessedImage = inputImage / 255.0;

// Change the dimensions of the image to the model's input dimensions while
// keeping the aspect ratio of the original image using `LetterboxImages`.
mlpack::ImageOptions preprocessedOpts = opts;
const size_t imgSize = model.ImageSize();
const double greyValue = 0.5;
LetterboxImages(preprocessedImage, preprocessedOpts, imgSize, imgSize, greyValue);

// Change the layout of the channels such that they're grouped.
preprocessedImage = GroupChannels(preprocessedImage, preprocessedOpts);

// Step 4: detect objects in the image.
// Get raw output from model and store in `rawOutput`.
model.Predict(preprocessedImage, rawOutput);

// Step 5: Inspect the first possible bounding box.
std::cout << "First bounding box: [" << (size_t) rawOutput(0, 0) << ", "
    << (size_t) rawOutput(1, 0) << ", " << (size_t) rawOutput(2, 0) << ", "
    << (size_t) rawOutput(3, 0) << "]." << std::endl;
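
To make the letterboxing step above more concrete, the sketch below computes the geometry of an aspect-preserving resize: the image is scaled by the largest factor that still fits the target, and the remaining space is split evenly as padding. `LetterboxGeometry` and `ComputeLetterbox()` are hypothetical names, and the rounding and centering choices are assumptions; this is not mlpack's LetterboxImages() implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical result type; not part of mlpack.
struct LetterboxGeometry
{
  std::size_t scaledWidth;   // Width of the resized image.
  std::size_t scaledHeight;  // Height of the resized image.
  std::size_t padLeft;       // Filler columns to the left of the image.
  std::size_t padTop;        // Filler rows above the image.
};

// Fit a (width x height) image into a (targetWidth x targetHeight) canvas
// without changing its aspect ratio, centering it on the canvas.
LetterboxGeometry ComputeLetterbox(const std::size_t width,
                                   const std::size_t height,
                                   const std::size_t targetWidth,
                                   const std::size_t targetHeight)
{
  assert(width > 0 && height > 0 && targetWidth > 0 && targetHeight > 0);

  // Use the smaller of the two scale factors so that both sides fit.
  const double scale = std::min((double) targetWidth / width,
                                (double) targetHeight / height);

  LetterboxGeometry g;
  g.scaledWidth = std::min(targetWidth,
      (std::size_t) std::lround(width * scale));
  g.scaledHeight = std::min(targetHeight,
      (std::size_t) std::lround(height * scale));
  g.padLeft = (targetWidth - g.scaledWidth) / 2;
  g.padTop = (targetHeight - g.scaledHeight) / 2;
  return g;
}
```

For a hypothetical 768x576 image and a 320x320 target, this gives a 320x240 resized image with 40 rows of padding above it. The padded area would then be filled with the grey value (0.5 in the example above).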

Example of predicting and drawing with multiple images simultaneously. NOTE: in this example, each image must have the same dimensions.

// Step 1: load the pretrained model.
// Download: https://models.mlpack.org/yolo/yolov3-320-coco-f64.bin
mlpack::YOLOv3 model;
mlpack::Load("yolov3-320-coco-f64.bin", model);

// Step 2: load the images.
// Download: https://models.mlpack.org/yolo/dog.jpg
// Download: https://models.mlpack.org/yolo/cat.jpg
// Download: https://models.mlpack.org/yolo/fish.jpg
arma::mat inputImages, outputImages;
mlpack::ImageOptions opts;

std::vector<std::string> inputFiles = {"dog.jpg", "cat.jpg", "fish.jpg"};
mlpack::Load(inputFiles, inputImages, opts);

// Step 3: Preprocess each image in `inputImages`, detect bounding boxes and
// draw them onto the corresponding image in `outputImages`.
// Each column is a separate image.
model.Predict(inputImages, opts, outputImages, true);

// Step 4: Save each image.
std::vector<std::string> outputFiles = {"1.jpg", "2.jpg", "3.jpg"};
mlpack::Save(outputFiles, outputImages, opts);

[Images: dog, bicycle and truck; cat; fish]

Advanced Functionality: Template Parameters

The YOLOv3 and YOLOv3Tiny classes also support using different element types to represent weights and predictions. The full signature of YOLOv3 is:

YOLOv3<MatType>

The example below uses YOLOv3 with the arma::fmat weights.

NOTE: You must define the MLPACK_ENABLE_ANN_SERIALIZATION_FMAT macro to serialize and deserialize models that use arma::fmat as the data type.

// Step 1: load the pretrained arma::fmat weights.
// Download: https://models.mlpack.org/yolo/yolov3-320-coco-f32.bin
mlpack::YOLOv3<arma::fmat> model;
mlpack::Load("yolov3-320-coco-f32.bin", model);

// Step 2: load the image into an arma::fmat.
// Download: https://models.mlpack.org/yolo/dog.jpg
// Note: the image type must also be `arma::fmat`
arma::fmat image, outputImage;
mlpack::ImageOptions opts;
mlpack::Load("dog.jpg", image, opts);

// Step 3: Preprocess the input image, detect bounding boxes and
// draw them onto `outputImage`.
model.Predict(image, opts, outputImage, true);

// Step 4: Save to "output.jpg".
mlpack::Save("output.jpg", outputImage, opts, true);

Since YOLOv3Tiny has an identical API, we can use the arma::fmat weights for faster inference times.

// Download: https://models.mlpack.org/yolo/yolov3-tiny-416-coco-f32.bin
mlpack::YOLOv3Tiny<arma::fmat> model;
mlpack::Load("yolov3-tiny-416-coco-f32.bin", model);