SpeciesNet

In this example we will process trail cam videos, detect animals in them using the SpeciesNet model and store the frames with detections for further use.

Requirements

The required libraries are most easily installed via image-dataset-converter-all>=0.0.8:

pip install "image-dataset-converter-all>=0.0.8"

Since Redis is used as the backend for exchanging images and predictions, you need to either have it installed on your system or run it in a Docker container as follows:

docker run --net=host --name redis-server -d redis

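One of the SpeciesNet Docker images, either waikatodatamining/speciesnet:4.0.1_cuda12.1 (CUDA) or waikatodatamining/speciesnet:4.0.1_cpu (CPU).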

NB: If you haven't used Docker before, then have a look at this tutorial.

The following directory structure below your current directory:

  .
  |
  +- cache
  |
  +- config
  |
  +- input
  |
  +- output 

The input directory contains one or more video files in AVI format, and the output directory is used for storing the predictions in OPEX format.

NB: MP4 files can be processed as well.
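
The directory layout above can be created in one step. This is a minimal sketch, assuming a POSIX shell and that you run it from the directory you will later mount into the container:

```shell
# Create the four directories side by side in the current directory.
# -p makes the command safe to re-run if some of them already exist.
mkdir -p cache config input output

# List them to confirm the layout.
ls -d cache config input output
```

Afterwards, copy your video files into input before running any of the commands below.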

SpeciesNet model

Start the SpeciesNet model in Redis mode:

  • CUDA
docker run --rm --gpus=all --shm-size 8G --net=host \
  -u $(id -u):$(id -g) -e USER=$USER \
  -v `pwd`:/workspace \
  -v `pwd`/cache:/.cache \
  -v `pwd`/config:/.config \
  -v `pwd`/cache:/.torch \
  -it waikatodatamining/speciesnet:4.0.1_cuda12.1 \
  speciesnet_predict_redis \
    --redis_in images \
    --redis_out predictions \
    --verbose
  • CPU
docker run --rm --shm-size 8G --net=host \
  -u $(id -u):$(id -g) -e USER=$USER \
  -v `pwd`:/workspace \
  -v `pwd`/cache:/.cache \
  -v `pwd`/config:/.config \
  -v `pwd`/cache:/.torch \
  -it waikatodatamining/speciesnet:4.0.1_cpu \
  speciesnet_predict_redis \
    --redis_in images \
    --redis_out predictions \
    --verbose

Processing single video file

Determine changes between frames

If you want to skip similar frames, e.g., leaves moving in the wind, it pays to look at the frame changes in a video using the calc-frame-changes filter. However, if you are looking for small animals like rodents, you may need to process all frames to avoid missing these.

The following command processes the file VID0003.AVI and calculates the changes between frames as a ratio (0-1) with a minimum change of 0.000001 required:

idc-convert -l INFO \
   from-video-file \
     -l INFO \
     -i "./input/VID0003.AVI" \
     -t od \
   calc-frame-changes \
     -t 0.000001

This will output a histogram similar to this:

+1.09e-06 - +3.22e-06  [57]  ████████████████████████████████████████
+3.22e-06 - +5.35e-06  [29]  ████████████████████▍
+5.35e-06 - +7.49e-06  [13]  █████████▏
+7.49e-06 - +9.62e-06  [14]  █████████▉
+9.62e-06 - +1.18e-05  [15]  ██████████▌
+1.18e-05 - +1.39e-05  [10]  ███████
+1.39e-05 - +1.60e-05  [11]  ███████▊
+1.60e-05 - +1.82e-05  [ 7]  ████▉
+1.82e-05 - +2.03e-05  [ 3]  ██▏
+2.03e-05 - +2.24e-05  [ 2]  █▍
+2.24e-05 - +2.46e-05  [ 2]  █▍
+2.46e-05 - +2.67e-05  [ 2]  █▍
+2.67e-05 - +2.88e-05  [ 1]  ▊
+2.88e-05 - +3.10e-05  [ 1]  ▊
+3.10e-05 - +3.31e-05  [ 5]  ███▌
+3.31e-05 - +3.52e-05  [ 0]
+3.52e-05 - +3.74e-05  [ 0]
+3.74e-05 - +3.95e-05  [ 0]
+3.95e-05 - +4.16e-05  [ 2]  █▍
+4.16e-05 - +4.38e-05  [ 1]  ▊

Using the threshold 5.35e-06, you will skip the frames that fall into the first two (lowest-change) bins of the above histogram, i.e., 86 of the 175 frame changes shown.
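
To compare candidate thresholds, it can help to tally how much of the histogram each one would skip. The sketch below is only an illustration: the bins variable is transcribed by hand from the histogram above (upper bin edge and count per row), and it assumes that a frame is skipped when its change is at or below the threshold.

```shell
# Upper bin edge and count for each row of the histogram above (hand-transcribed).
bins='3.22e-06 57
5.35e-06 29
7.49e-06 13
9.62e-06 14
1.18e-05 15
1.39e-05 10
1.60e-05 11
1.82e-05 7
2.03e-05 3
2.24e-05 2
2.46e-05 2
2.67e-05 2
2.88e-05 1
3.10e-05 1
3.31e-05 5
3.52e-05 0
3.74e-05 0
3.95e-05 0
4.16e-05 2
4.38e-05 1'

# Candidate threshold to evaluate.
threshold=5.35e-06

# Sum all counts, and separately the counts of bins lying at or below the threshold.
summary=$(printf '%s\n' "$bins" | awk -v t="$threshold" '
  { total += $2; if ($1 + 0 <= t + 0) skipped += $2 }
  END { printf "%d of %d (%.0f%%)", skipped, total, 100 * skipped / total }')

echo "$summary"   # prints: 86 of 175 (49%)
```

Changing threshold to another bin edge shows how quickly the number of frames sent to the model shrinks.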

Extracting frames

With the threshold for our video determined, we can now extract relevant frames and push them through the SpeciesNet model.

The following command reads VID0003.AVI from the input directory (from-video-file), passes on only frames that differ enough from each other (skip-similar-frames), sends them to the model (redis-predict-od), keeps only predictions with a score greater than 0.8 (metadata-od), discards any images left without predictions (discard-negatives) and stores the remaining images with their predictions in the output directory in OPEX format (to-opex-od):

idc-convert -l INFO \
  from-video-file \
    -l INFO \
    -i "./input/VID0003.AVI" \
    -t od \
  skip-similar-frames \
    -l INFO \
    -t 5.35e-06 \
  redis-predict-od \
    -l INFO \
    -o images \
    -i predictions \
    -t 10 \
  metadata-od \
    -l INFO \
    -f score \
    -c gt \
    -v 0.8 \
    -a keep \
  discard-negatives \
    -l INFO \
  to-opex-od \
    -l INFO \
    -o ./output
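
Once the pipeline has finished, a quick look at the output directory tells you how many frames produced detections. The helpers below are hypothetical (count_detections and tally_labels are not part of idc-convert) and assume that one .json file is written per stored frame and that each detection carries a "label" key, as in the OPEX format:

```shell
# Count the OPEX files (assumed: one .json per stored frame) in a directory.
# Defaults to ./output when no directory is given.
count_detections() {
  find "${1:-./output}" -maxdepth 1 -name '*.json' | wc -l | tr -d ' '
}

# Tally detection labels across those files, most frequent first.
# Assumption: every detection is stored with a "label" key.
tally_labels() {
  grep -ho '"label": *"[^"]*"' "${1:-./output}"/*.json \
    | sed 's/^"label": *"//; s/"$//' \
    | sort | uniq -c | sort -rn
}

# Only report when the output directory exists.
if [ -d ./output ]; then
  echo "frames with detections: $(count_detections ./output)"
fi
```

Calling tally_labels ./output afterwards prints one line per label with its number of occurrences.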

Processing multiple video files

The from-video-file reader is not limited to processing a single file: you can supply multiple file names to the -i option or use a glob like *.AVI. However, since different videos have different lighting and therefore different changes between frames, a single threshold is unlikely to suit all of them. It makes more sense to return only every nth frame to reduce the processing time. You can either use -n/--nth_frame, which ignores the frame rate of the video, or -f/--fps_factor, which calculates the actual nth frame to return from the frame rate of the video multiplied by this factor.

The command below processes all .AVI files from the input directory and retrieves one frame for every second of video footage (-f 1):

idc-convert -l INFO \
  from-video-file \
    -l INFO \
    -i "./input/*.AVI" \
    -f 1 \
    -t od \
  redis-predict-od \
    -l INFO \
    -o images \
    -i predictions \
    -t 10 \
  metadata-od \
    -l INFO \
    -f score \
    -c gt \
    -v 0.8 \
    -a keep \
  discard-negatives \
    -l INFO \
  to-opex-od \
    -l INFO \
    -o ./output
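
To make the relationship between -f/--fps_factor and the frame rate concrete, the following sketch computes which nth frame a given factor corresponds to. The nth_frame helper is hypothetical and not part of idc-convert; rounding to the nearest whole frame (with a minimum of 1) is an assumption, as the actual rounding used by the reader is not specified here.

```shell
# Hypothetical helper: nth frame = frame rate * factor.
# Rounding to the nearest whole frame is an assumption.
nth_frame() {
  awk -v fps="$1" -v factor="$2" 'BEGIN {
    n = int(fps * factor + 0.5)   # round to nearest integer
    if (n < 1) n = 1              # never return less than every frame
    print n
  }'
}

nth_frame 30 1     # 30 fps, factor 1   -> 30 (one frame per second)
nth_frame 30 0.5   # 30 fps, factor 0.5 -> 15 (two frames per second)
nth_frame 25 2     # 25 fps, factor 2   -> 50 (one frame every two seconds)
```

A factor tied to the frame rate keeps the sampling interval in seconds the same across videos recorded at different frame rates, which a fixed -n value does not.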

Visualizing the predictions

If you want to quickly generate some composite images, you can use the following command. It will create a new directory overlays in which it will save the generated images.

idc-convert -l INFO \
  from-opex-od \
    -l INFO \
    -i "./output/*.json" \
  add-annotation-overlay-od \
    -l INFO \
  to-data \
    -l INFO \
    -o ./overlays