Frigate object detection on the Dragon Q6A's DSP

Tech Arch1tect

I had a play with object detection on the Radxa Dragon Q6A. This was a quick-and-dirty 'project' (just an evening with a new SBC and an AI chatbot!).

The goal was to run Frigate with its object detection on the board's DSP instead of the CPU. In my (limited) testing the DSP is around 10x faster than the CPU for this: a single inference dropped from ~90ms (CPU) to ~8ms (DSP).

What the Q6A needs first

  • Armbian (I'm using edge 7.0.10-edge-qcs6490 with Ubuntu 26.04 userspace)
  • FastRPC support (built into Armbian)

Both DSPs should be running:

$ for r in /sys/class/remoteproc/remoteproc*; do echo "$(cat $r/name)=$(cat $r/state)"; done
adsp=running
cdsp=running
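
If you'd rather script that check (say, to gate a deploy on it), here is a minimal Python sketch. It assumes the same remoteproc sysfs layout the shell loop reads; the helper names `remoteproc_states` and `dsps_running` are my own and not part of any library.

```python
from pathlib import Path


def remoteproc_states(root="/sys/class/remoteproc"):
    """Return {name: state} for each remoteproc entry under root (hypothetical helper)."""
    states = {}
    for proc in sorted(Path(root).glob("remoteproc*")):
        name_file, state_file = proc / "name", proc / "state"
        # Skip entries that do not expose both files.
        if name_file.is_file() and state_file.is_file():
            states[name_file.read_text().strip()] = state_file.read_text().strip()
    return states


def dsps_running(states, required=("adsp", "cdsp")):
    """True only if every required DSP reports the state 'running'."""
    return all(states.get(name) == "running" for name in required)
```

Calling `dsps_running(remoteproc_states())` on the board should return True when both lines above read `running`.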

The custom Frigate image is built to match the host's cDSP firmware version:

$ strings /lib/firmware/qcom/qcm6490/cdsp.mbn | grep QC_IMAGE_VERSION_STRING
QC_IMAGE_VERSION_STRING=CDSP.HT.2.5.c3-00134-KODIAK-1
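
The same lookup can be done without `strings`, which is useful if you want to feed the version into a build script. This is a rough sketch: it assumes the version is stored as plain ASCII directly after `QC_IMAGE_VERSION_STRING=` (which is what the `strings` output suggests), and the function names are made up for illustration.

```python
import re

# Match the key, then capture the run of printable, non-space ASCII that follows it.
_VERSION_RE = re.compile(rb"QC_IMAGE_VERSION_STRING=([\x21-\x7e]+)")


def cdsp_version(blob):
    """Return the QC_IMAGE_VERSION_STRING value found in a firmware blob, or None."""
    match = _VERSION_RE.search(blob)
    return match.group(1).decode("ascii") if match else None


def read_cdsp_version(path="/lib/firmware/qcom/qcm6490/cdsp.mbn"):
    """Read the firmware file from disk and extract its version string."""
    with open(path, "rb") as firmware:
        return cdsp_version(firmware.read())
```

On my board `read_cdsp_version()` would be expected to return the same `CDSP.HT.2.5.c3-00134-KODIAK-1` value shown above.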

The model

I'm using YOLOv8 in ONNX format, quantised to w8a8 (8-bit weights and 8-bit activations). I used Qualcomm's AI Hub for this (maybe I'll cover this in another post).

The layout

Here's a tree view of the files involved:

.
├── config
│   ├── coco-80.txt            # generated from the model's labels.txt
│   └── config.yml
├── detectors
│   └── qnn.py
├── docker-compose.yml
├── Dockerfile
└── model
    └── yolov8_det-onnx-w8a8   # from AI Hub
        ├── labels.txt
        ├── metadata.json
        ├── yolov8_det.data
        └── yolov8_det.onnx

The detector plugin

detectors/qnn.py is the custom Frigate detector that actually runs the model on the DSP, through ONNX Runtime's QNN execution provider. This detector was almost entirely generated by AI; my input was limited to high-level guidance (providing docs, detector examples, etc.) and minor corrections.

import json
import logging
import os

import numpy as np
from typing import Literal

from frigate.detectors.detection_api import DetectionApi
from frigate.detectors.detector_config import BaseDetectorConfig

import onnxruntime as ort
import onnxruntime_qnn as qnn

logger = logging.getLogger(__name__)

DETECTOR_KEY = "qnn"


class QnnDetectorConfig(BaseDetectorConfig):
    type: Literal[DETECTOR_KEY]


class QnnDetector(DetectionApi):
    type_key = DETECTOR_KEY

    def __init__(self, detector_config: QnnDetectorConfig):
        super().__init__(detector_config)
        model_path = detector_config.model.path

        ort.register_execution_provider_library(
            "QNNExecutionProvider", qnn.get_library_path()
        )
        qnn_devices = [
            d for d in ort.get_ep_devices() if d.ep_name == "QNNExecutionProvider"
        ]
        if not qnn_devices:
            raise RuntimeError("QNN EP not discovered after registration")

        so = ort.SessionOptions()
        so.log_severity_level = 3
        so.add_provider_for_devices(
            qnn_devices,
            {
                "backend_path": qnn.get_qnn_htp_path(),
                "htp_performance_mode": "burst",
                "htp_arch": "68",
            },
        )
        self.session = ort.InferenceSession(model_path, sess_options=so)
        self.input_name = self.session.get_inputs()[0].name
        self.output_names = [o.name for o in self.session.get_outputs()]
        self.qparams = self._load_qparams(model_path)
        self.iou = 0.45
        logger.info(
            "QNN detector ready on %s (input=%s, outputs=%s)",
            self.session.get_providers(),
            self.input_name,
            self.output_names,
        )

    @staticmethod
    def _load_qparams(model_path):
        md = os.path.join(os.path.dirname(model_path), "metadata.json")
        params = {}
        if os.path.exists(md):
            with open(md) as f:
                data = json.load(f)
            for v in data.get("model_files", {}).values():
                for name, spec in v.get("outputs", {}).items():
                    params[name] = spec.get("quantization_parameters")
        return params

    def _dequant(self, name, arr):
        qp = self.qparams.get(name)
        arr = np.asarray(arr)
        if qp and arr.dtype.kind in "ui":
            return (arr.astype(np.float32) - qp.get("zero_point", 0)) * qp.get(
                "scale", 1.0
            )
        return arr.astype(np.float32)

    def _nms(self, boxes, scores, iou_thr):
        if len(boxes) == 0:
            return []
        x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
        areas = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        order = scores.argsort()[::-1]
        keep = []
        while order.size > 0:
            i = order[0]
            keep.append(i)
            xx1 = np.maximum(x1[i], x1[order[1:]])
            yy1 = np.maximum(y1[i], y1[order[1:]])
            xx2 = np.minimum(x2[i], x2[order[1:]])
            yy2 = np.minimum(y2[i], y2[order[1:]])
            w = np.clip(xx2 - xx1, 0, None)
            h = np.clip(yy2 - yy1, 0, None)
            inter = w * h
            iou = inter / (areas[i] + areas[order[1:]] - inter + 1e-9)
            order = order[1:][iou <= iou_thr]
        return keep

    def detect_raw(self, tensor_input):
        outs = self.session.run(None, {self.input_name: tensor_input})
        out = dict(zip(self.output_names, outs))

        boxes = self._dequant("boxes", out["boxes"]).reshape(-1, 4)
        scores = self._dequant("scores", out["scores"]).reshape(-1)
        classes = np.asarray(out["class_idx"]).reshape(-1).astype(np.int32)

        detections = np.zeros((20, 6), np.float32)

        keep = scores >= self.thresh
        boxes, scores, classes = boxes[keep], scores[keep], classes[keep]
        if scores.size == 0:
            return detections

        kept = []
        for c in np.unique(classes):
            m = np.where(classes == c)[0]
            for j in self._nms(boxes[m], scores[m], self.iou):
                kept.append(m[j])
        kept.sort(key=lambda i: -scores[i])

        h, w = float(self.height), float(self.width)
        for row, i in enumerate(kept[:20]):
            x1, y1, x2, y2 = boxes[i]
            detections[row] = [
                float(classes[i]),
                float(scores[i]),
                np.clip(y1 / h, 0.0, 1.0),
                np.clip(x1 / w, 0.0, 1.0),
                np.clip(y2 / h, 0.0, 1.0),
                np.clip(x2 / w, 0.0, 1.0),
            ]
        return detections
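
If quantised outputs are new to you, the `_dequant` step above is just an affine mapping: real = (q - zero_point) * scale. Here is a tiny worked sketch in plain Python. The scale and zero point are made-up illustrative values, not the ones in the real metadata.json.

```python
def dequantize(q_values, scale, zero_point=0):
    """Map quantised integers back to floats: (q - zero_point) * scale."""
    return [(q - zero_point) * scale for q in q_values]


# Illustrative parameters only: a uint8 score tensor where 0 maps to 0.0
# and 255 maps to roughly 1.0.
scores = dequantize([0, 128, 255], scale=1 / 255, zero_point=0)

# A non-zero zero_point simply shifts the integer range before scaling.
shifted = dequantize([10], scale=0.5, zero_point=2)  # (10 - 2) * 0.5
```

The detector does the same thing with NumPy arrays, and only for outputs that arrive as integer types; float outputs are passed through unchanged.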

The image

The Dockerfile builds a custom Frigate image (based on the official image). It compiles the FastRPC libs from source and pulls the version-matched DSP shell from the upstream sources at build time, then bakes the detector in. The CDSP_VER build arg is the firmware version from QC_IMAGE_VERSION_STRING earlier. This must match the host's version.

ARG CDSP_VER=CDSP.HT.2.5.c3-00134-KODIAK-1

FROM debian:bookworm AS fastrpc
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y --no-install-recommends \
        git ca-certificates build-essential autoconf automake autoconf-archive \
        libtool pkg-config libyaml-dev libbsd-dev libmd-dev \
    && rm -rf /var/lib/apt/lists/*
RUN git clone --depth 1 https://github.com/qualcomm/fastrpc.git /src/fastrpc
WORKDIR /src/fastrpc
RUN ./gitcompile --prefix=/usr && make install DESTDIR=/out
RUN mkdir -p /libs && \
    find /out \( -name 'lib?dsprpc.so*' -o -name 'lib?dsp_default_listener.so*' \) \
        -exec cp -av {} /libs/ \; && \
    ls -l /libs

FROM debian:bookworm AS dspshell
ARG CDSP_VER
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y --no-install-recommends git ca-certificates \
    && rm -rf /var/lib/apt/lists/*
WORKDIR /hdb
RUN git clone --filter=blob:none --no-checkout \
        https://github.com/linux-msm/hexagon-dsp-binaries.git . && \
    D="qcm6490/Thundercomm/RB3gen2/${CDSP_VER}" && \
    git sparse-checkout set --no-cone "$D" && \
    git checkout && \
    test -f "$D/fastrpc_shell_unsigned_3" || { \
        echo "ERROR: no fastrpc_shell_unsigned_3 for ${CDSP_VER} in hexagon-dsp-binaries."; \
        echo "Check your host firmware version and set --build-arg CDSP_VER accordingly."; \
        exit 1; } && \
    mkdir -p /cdsp && cp -av "$D/." /cdsp/

FROM ghcr.io/blakeblackshear/frigate:stable

RUN apt-get update && apt-get install -y --no-install-recommends \
        libbsd0 libyaml-0-2 libmd0 libatomic1 \
    && rm -rf /var/lib/apt/lists/*

RUN pip install --break-system-packages --no-cache-dir \
        onnxruntime==1.24.4 onnxruntime-qnn==2.1.1

COPY --from=fastrpc  /libs/ /usr/lib/
COPY --from=dspshell /cdsp/ /usr/lib/dsp/cdsp/

RUN ln -sf libcdsprpc.so /usr/lib/libxdsprpc.so && \
    PKG="$(python3 -c 'import onnxruntime_qnn,os;print(os.path.dirname(os.path.abspath(onnxruntime_qnn.__file__)))')" && \
    cp -f "$PKG/libQnnHtpV68Skel.so" /usr/lib/dsp/cdsp/ && \
    printf '%s\n/usr/lib\n' "$PKG" > /etc/ld.so.conf.d/onnxruntime-qnn.conf && \
    ldconfig

ENV ADSP_LIBRARY_PATH=/usr/lib/dsp/cdsp

COPY detectors/qnn.py /opt/frigate/frigate/detectors/plugins/qnn.py

Wiring up Frigate

The model ships labels.txt with one class name per line, but Frigate wants an indexed labelmap, where each line is <index> <label> and the index is zero-based. Quick awk to create the indexed labelmap:

awk '{printf "%d  %s\n", NR-1, $0}' model/yolov8_det-onnx-w8a8/labels.txt > config/coco-80.txt
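
If awk isn't your thing, the same transformation is a few lines of Python. This is a sketch that mirrors the awk one-liner (index, two spaces, label); `indexed_labelmap` and `write_labelmap` are names I made up for it.

```python
def indexed_labelmap(labels_text):
    """Turn one-label-per-line text into '<index>  <label>' lines, zero-based."""
    return "".join(
        f"{index}  {label}\n" for index, label in enumerate(labels_text.splitlines())
    )


def write_labelmap(src, dst):
    """Read labels from src and write the indexed labelmap to dst."""
    with open(src) as labels_file:
        text = indexed_labelmap(labels_file.read())
    with open(dst, "w") as out_file:
        out_file.write(text)
```

Usage would be `write_labelmap("model/yolov8_det-onnx-w8a8/labels.txt", "config/coco-80.txt")`, run from the project root.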

config/config.yml

mqtt:
  enabled: false

detectors:
  qnn:
    type: qnn

model:
  path: /config/model/yolov8_det-onnx-w8a8/yolov8_det.onnx
  labelmap_path: /config/coco-80.txt
  width: 640
  height: 640
  input_tensor: nchw
  input_dtype: int
  input_pixel_format: rgb

cameras:
  camera_1:
    ffmpeg:
      inputs:
        - path: rtsp://USER:PASS@CAMERA_IP/stream1
          roles: [record]
        - path: rtsp://USER:PASS@CAMERA_IP/stream2
          roles: [detect]
    detect:
      width: 640
      height: 360

detect:
  enabled: true
  fps: 5

objects:
  track: [person]

record:
  enabled: true

snapshots:
  enabled: true

version: 0.17-0

Running it

docker-compose.yml builds the image (passing the firmware version as a build arg), passes the FastRPC and dma_heap device nodes through to the container, and mounts the config and model in:

services:
  frigate:
    container_name: frigate
    build:
      context: .
      args:
        CDSP_VER: CDSP.HT.2.5.c3-00134-KODIAK-1
    image: frigate-qnn:local
    restart: unless-stopped
    stop_grace_period: 30s
    shm_size: "256mb"
    devices:
      - /dev/fastrpc-cdsp:/dev/fastrpc-cdsp
      - /dev/fastrpc-cdsp-secure:/dev/fastrpc-cdsp-secure
      - /dev/fastrpc-adsp:/dev/fastrpc-adsp
      - /dev/dma_heap/system:/dev/dma_heap/system
      - /dev/dma_heap/reserved:/dev/dma_heap/reserved
      - /dev/dma_heap/default_cma_region:/dev/dma_heap/default_cma_region
    volumes:
      - ./config:/config
      - ./model:/config/model:ro
      - ./frigate-storage:/media/frigate
      - /etc/localtime:/etc/localtime:ro
    ports:
      - "8971:8971"
      - "5000:5000"
      - "8554:8554"

Build the image and bring it up:

docker compose up --build -d
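
If the container refuses to start, one likely cause is a device node from the compose file that doesn't exist on the host; as far as I know, Docker errors out rather than skipping a missing device. A small pre-flight sketch in Python follows. The node list is copied from the compose file above, and `missing_nodes` is a hypothetical helper.

```python
import os

# Device nodes passed through in docker-compose.yml above.
REQUIRED_NODES = [
    "/dev/fastrpc-cdsp",
    "/dev/fastrpc-cdsp-secure",
    "/dev/fastrpc-adsp",
    "/dev/dma_heap/system",
    "/dev/dma_heap/reserved",
    "/dev/dma_heap/default_cma_region",
]


def missing_nodes(paths=REQUIRED_NODES, exists=os.path.exists):
    """Return the subset of paths that are not present on this host."""
    return [path for path in paths if not exists(path)]
```

Run `missing_nodes()` on the host; an empty list means every node in the compose file is present. The `exists` parameter is only there so the check can be exercised off-device.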

Then confirm detection is actually running on the DSP and didn't quietly fall back to the CPU. The best I can do for this right now is twofold:

  1. The plugin logs when it's ready on the QNN provider. The log line should list both QNNExecutionProvider (DSP) and CPUExecutionProvider (CPU):
$ docker compose logs | grep "QNN detector ready"
frigate  | 2026-05-31 11:54:39.151524200  [2026-05-31 11:54:39] frigate.detectors.plugins.qnn  INFO    : QNN detector ready on ['QNNExecutionProvider', 'CPUExecutionProvider'] (input=image, outputs=['boxes', 'scores', 'class_idx'])
  2. Reported inference speed. Frigate's /api/stats reports it per detector: ~6-8ms means it's on the DSP; ~50+ms means it fell back to the CPU.
curl -s localhost:5000/api/stats | jq '.detectors.qnn.inference_speed'
6.46
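
The inference-speed check is easy to automate. Here is a hedged Python sketch using only the standard library: it reads the same `.detectors.qnn.inference_speed` field as the jq command, and the 20ms cutoff is my own arbitrary midpoint between the two ranges I've seen, not anything Frigate defines.

```python
import json
from urllib.request import urlopen


def fetch_stats(url="http://localhost:5000/api/stats"):
    """Fetch Frigate's stats JSON (needs a running Frigate instance)."""
    with urlopen(url, timeout=5) as response:
        return json.load(response)


def inference_speed(stats, detector="qnn"):
    """Pull the reported inference speed (ms) for one detector out of the stats."""
    return stats["detectors"][detector]["inference_speed"]


def looks_like_dsp(speed_ms, cutoff_ms=20.0):
    """Heuristic only: below the assumed cutoff we guess the DSP is in use."""
    return speed_ms < cutoff_ms
```

Something like `looks_like_dsp(inference_speed(fetch_stats()))` could then be run from cron or a health check.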

That is about it. I've only been running this for a few days, but it's been stable so far. Object detection is working really well.