I had a play with object detection on the Radxa Dragon Q6A. This was a quick-and-dirty style 'project' (just an evening with a new SBC and an AI chatbot!).
The goal was to run Frigate with its object detection running on the board's DSP instead of the CPU. In my (limited) testing the DSP is around 10x faster for this than the CPU. I observed a single inference reducing from ~90ms (CPU) to ~8ms (DSP).
What the Q6A needs first
- Armbian (I'm using edge 7.0.10-edge-qcs6490 with Ubuntu 26.04 userspace)
- FastRPC support (built into Armbian)
Both DSPs should be running:
$ for r in /sys/class/remoteproc/remoteproc*; do echo "$(cat $r/name)=$(cat $r/state)"; done
adsp=running
cdsp=running
The custom Frigate image is built to match the host's cDSP firmware version:
$ strings /lib/firmware/qcom/qcm6490/cdsp.mbn | grep QC_IMAGE_VERSION_STRING
QC_IMAGE_VERSION_STRING=CDSP.HT.2.5.c3-00134-KODIAK-1
The model
I'm using YOLOv8 in ONNX format, quantised to w8a8 (8-bit weights and 8-bit activations). I used Qualcomm's AI Hub for this (maybe I'll cover that in another post).
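For anyone unfamiliar with what w8a8 means in practice: the model's outputs can come back as 8-bit integers, which have to be mapped back to floats using a scale and a zero point. Here is a minimal sketch of that mapping, assuming the usual affine scheme real = (q - zero_point) * scale. The function name and the example values are made up for illustration; the real scale and zero point for this model live in its metadata.json.

```python
import numpy as np


def dequantize(q, scale, zero_point):
    """Map quantised integers back to floats: real = (q - zero_point) * scale."""
    q = np.asarray(q)
    return (q.astype(np.float32) - zero_point) * scale


# Hypothetical parameters, chosen only to make the arithmetic easy to follow.
raw_scores = np.array([0, 64, 128, 255], dtype=np.uint8)
scores = dequantize(raw_scores, scale=1 / 255, zero_point=0)

# A non-zero zero point shifts the integer range before scaling:
# (10 - 8) * 0.5 = 1.0
shifted = dequantize(np.array([10], dtype=np.uint8), scale=0.5, zero_point=8)
```

With these example values a raw score of 255 maps to 1.0 and 0 maps to 0.0, so the integers behave like confidence values again.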
The layout
Here's a tree view of the files involved:
.
├── config
│   ├── coco-80.txt          # generated from the model's labels.txt
│   └── config.yml
├── detectors
│   └── qnn.py
├── docker-compose.yml
├── Dockerfile
└── model
    └── yolov8_det-onnx-w8a8 # from AI Hub
        ├── labels.txt
        ├── metadata.json
        ├── yolov8_det.data
        └── yolov8_det.onnx
The detector plugin
detectors/qnn.py is the custom Frigate detector that actually runs the model on the DSP, through ONNX Runtime's QNN execution provider. This detector was almost entirely generated by AI, with my input being purely high-level guidance (providing docs, detector examples, etc.) and minor corrections.
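Most of the plugin is post-processing: dequantise the outputs, drop low scores, then run non-maximum suppression (NMS) per class so that overlapping boxes for the same object collapse to one. As a rough standalone illustration of that NMS step, here is a sketch in plain Python rather than the vectorised NumPy the plugin uses. The function names and the boxes are hypothetical; only the 0.45 IoU threshold is taken from the plugin.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)


def greedy_nms(boxes, scores, iou_thr=0.45):
    """Keep the highest-scoring boxes, skipping any that overlap a kept box."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_thr for k in keep):
            keep.append(i)
    return keep


# Made-up boxes: the first two overlap heavily, the third is far away.
boxes = [(0, 0, 100, 100), (10, 10, 110, 110), (200, 200, 300, 300)]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores))  # → [0, 2]
```

The first two boxes have an IoU of about 0.68, which is above the threshold, so the lower-scoring one is dropped.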
import json
import logging
import os
import numpy as np
from typing import Literal
from frigate.detectors.detection_api import DetectionApi
from frigate.detectors.detector_config import BaseDetectorConfig
import onnxruntime as ort
import onnxruntime_qnn as qnn

logger = logging.getLogger(__name__)
DETECTOR_KEY = "qnn"


class QnnDetectorConfig(BaseDetectorConfig):
    type: Literal[DETECTOR_KEY]


class QnnDetector(DetectionApi):
    type_key = DETECTOR_KEY

    def __init__(self, detector_config: QnnDetectorConfig):
        super().__init__(detector_config)
        model_path = detector_config.model.path
        ort.register_execution_provider_library(
            "QNNExecutionProvider", qnn.get_library_path()
        )
        qnn_devices = [
            d for d in ort.get_ep_devices() if d.ep_name == "QNNExecutionProvider"
        ]
        if not qnn_devices:
            raise RuntimeError("QNN EP not discovered after registration")
        so = ort.SessionOptions()
        so.log_severity_level = 3
        so.add_provider_for_devices(
            qnn_devices,
            {
                "backend_path": qnn.get_qnn_htp_path(),
                "htp_performance_mode": "burst",
                "htp_arch": "68",
            },
        )
        self.session = ort.InferenceSession(model_path, sess_options=so)
        self.input_name = self.session.get_inputs()[0].name
        self.output_names = [o.name for o in self.session.get_outputs()]
        self.qparams = self._load_qparams(model_path)
        self.iou = 0.45
        logger.info(
            "QNN detector ready on %s (input=%s, outputs=%s)",
            self.session.get_providers(),
            self.input_name,
            self.output_names,
        )

    @staticmethod
    def _load_qparams(model_path):
        md = os.path.join(os.path.dirname(model_path), "metadata.json")
        params = {}
        if os.path.exists(md):
            with open(md) as f:
                data = json.load(f)
            for v in data.get("model_files", {}).values():
                for name, spec in v.get("outputs", {}).items():
                    params[name] = spec.get("quantization_parameters")
        return params

    def _dequant(self, name, arr):
        qp = self.qparams.get(name)
        arr = np.asarray(arr)
        if qp and arr.dtype.kind in "ui":
            return (arr.astype(np.float32) - qp.get("zero_point", 0)) * qp.get(
                "scale", 1.0
            )
        return arr.astype(np.float32)

    def _nms(self, boxes, scores, iou_thr):
        if len(boxes) == 0:
            return []
        x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
        areas = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        order = scores.argsort()[::-1]
        keep = []
        while order.size > 0:
            i = order[0]
            keep.append(i)
            xx1 = np.maximum(x1[i], x1[order[1:]])
            yy1 = np.maximum(y1[i], y1[order[1:]])
            xx2 = np.minimum(x2[i], x2[order[1:]])
            yy2 = np.minimum(y2[i], y2[order[1:]])
            w = np.clip(xx2 - xx1, 0, None)
            h = np.clip(yy2 - yy1, 0, None)
            inter = w * h
            iou = inter / (areas[i] + areas[order[1:]] - inter + 1e-9)
            order = order[1:][iou <= iou_thr]
        return keep

    def detect_raw(self, tensor_input):
        outs = self.session.run(None, {self.input_name: tensor_input})
        out = dict(zip(self.output_names, outs))
        boxes = self._dequant("boxes", out["boxes"]).reshape(-1, 4)
        scores = self._dequant("scores", out["scores"]).reshape(-1)
        classes = np.asarray(out["class_idx"]).reshape(-1).astype(np.int32)
        detections = np.zeros((20, 6), np.float32)
        keep = scores >= self.thresh
        boxes, scores, classes = boxes[keep], scores[keep], classes[keep]
        if scores.size == 0:
            return detections
        kept = []
        for c in np.unique(classes):
            m = np.where(classes == c)[0]
            for j in self._nms(boxes[m], scores[m], self.iou):
                kept.append(m[j])
        kept.sort(key=lambda i: -scores[i])
        h, w = float(self.height), float(self.width)
        for row, i in enumerate(kept[:20]):
            x1, y1, x2, y2 = boxes[i]
            detections[row] = [
                float(classes[i]),
                float(scores[i]),
                np.clip(y1 / h, 0.0, 1.0),
                np.clip(x1 / w, 0.0, 1.0),
                np.clip(y2 / h, 0.0, 1.0),
                np.clip(x2 / w, 0.0, 1.0),
            ]
        return detections
The image
The Dockerfile builds a custom Frigate image (based on the official image). It compiles the FastRPC libs from source and pulls the version-matched DSP shell from the upstream sources at build time, then bakes the detector in. The CDSP_VER build arg is the firmware version from QC_IMAGE_VERSION_STRING earlier. This must match the host's version.
ARG CDSP_VER=CDSP.HT.2.5.c3-00134-KODIAK-1
FROM debian:bookworm AS fastrpc
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y --no-install-recommends \
git ca-certificates build-essential autoconf automake autoconf-archive \
libtool pkg-config libyaml-dev libbsd-dev libmd-dev \
&& rm -rf /var/lib/apt/lists/*
RUN git clone --depth 1 https://github.com/qualcomm/fastrpc.git /src/fastrpc
WORKDIR /src/fastrpc
RUN ./gitcompile --prefix=/usr && make install DESTDIR=/out
RUN mkdir -p /libs && \
find /out \( -name 'lib?dsprpc.so*' -o -name 'lib?dsp_default_listener.so*' \) \
-exec cp -av {} /libs/ \; && \
ls -l /libs
FROM debian:bookworm AS dspshell
ARG CDSP_VER
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y --no-install-recommends git ca-certificates \
&& rm -rf /var/lib/apt/lists/*
WORKDIR /hdb
RUN git clone --filter=blob:none --no-checkout \
https://github.com/linux-msm/hexagon-dsp-binaries.git . && \
D="qcm6490/Thundercomm/RB3gen2/${CDSP_VER}" && \
git sparse-checkout set --no-cone "$D" && \
git checkout && \
test -f "$D/fastrpc_shell_unsigned_3" || { \
echo "ERROR: no fastrpc_shell_unsigned_3 for ${CDSP_VER} in hexagon-dsp-binaries."; \
echo "Check your host firmware version and set --build-arg CDSP_VER accordingly."; \
exit 1; } && \
mkdir -p /cdsp && cp -av "$D/." /cdsp/
FROM ghcr.io/blakeblackshear/frigate:stable
RUN apt-get update && apt-get install -y --no-install-recommends \
libbsd0 libyaml-0-2 libmd0 libatomic1 \
&& rm -rf /var/lib/apt/lists/*
RUN pip install --break-system-packages --no-cache-dir \
onnxruntime==1.24.4 onnxruntime-qnn==2.1.1
COPY --from=fastrpc /libs/ /usr/lib/
COPY --from=dspshell /cdsp/ /usr/lib/dsp/cdsp/
RUN ln -sf libcdsprpc.so /usr/lib/libxdsprpc.so && \
PKG="$(python3 -c 'import onnxruntime_qnn,os;print(os.path.dirname(os.path.abspath(onnxruntime_qnn.__file__)))')" && \
cp -f "$PKG/libQnnHtpV68Skel.so" /usr/lib/dsp/cdsp/ && \
printf '%s\n/usr/lib\n' "$PKG" > /etc/ld.so.conf.d/onnxruntime-qnn.conf && \
ldconfig
ENV ADSP_LIBRARY_PATH=/usr/lib/dsp/cdsp
COPY detectors/qnn.py /opt/frigate/frigate/detectors/plugins/qnn.py
Wiring up Frigate
The model ships labels.txt with one class name per line, but Frigate wants an indexed labelmap, where each line is <index> <label> with a zero-based index. A quick awk one-liner creates the indexed labelmap:
awk '{printf "%d %s\n", NR-1, $0}' model/yolov8_det-onnx-w8a8/labels.txt > config/coco-80.txt
config/config.yml
mqtt:
  enabled: false
detectors:
  qnn:
    type: qnn
model:
  path: /config/model/yolov8_det-onnx-w8a8/yolov8_det.onnx
  labelmap_path: /config/coco-80.txt
  width: 640
  height: 640
  input_tensor: nchw
  input_dtype: int
  input_pixel_format: rgb
cameras:
  camera_1:
    ffmpeg:
      inputs:
        - path: rtsp://USER:PASS@CAMERA_IP/stream1
          roles: [record]
        - path: rtsp://USER:PASS@CAMERA_IP/stream2
          roles: [detect]
    detect:
      width: 640
      height: 360
detect:
  enabled: true
  fps: 5
objects:
  track: [person]
record:
  enabled: true
snapshots:
  enabled: true
version: 0.17-0
Running it
docker-compose.yml builds the image (passing the firmware version as a build arg), passes the FastRPC and dma_heap device nodes through to the container, and mounts the config and model in:
services:
  frigate:
    container_name: frigate
    build:
      context: .
      args:
        CDSP_VER: CDSP.HT.2.5.c3-00134-KODIAK-1
    image: frigate-qnn:local
    restart: unless-stopped
    stop_grace_period: 30s
    shm_size: "256mb"
    devices:
      - /dev/fastrpc-cdsp:/dev/fastrpc-cdsp
      - /dev/fastrpc-cdsp-secure:/dev/fastrpc-cdsp-secure
      - /dev/fastrpc-adsp:/dev/fastrpc-adsp
      - /dev/dma_heap/system:/dev/dma_heap/system
      - /dev/dma_heap/reserved:/dev/dma_heap/reserved
      - /dev/dma_heap/default_cma_region:/dev/dma_heap/default_cma_region
    volumes:
      - ./config:/config
      - ./model:/config/model:ro
      - ./frigate-storage:/media/frigate
      - /etc/localtime:/etc/localtime:ro
    ports:
      - "8971:8971"
      - "5000:5000"
      - "8554:8554"
Build the image and bring it up:
docker compose up --build -d
Then confirm detection is actually running on the DSP and didn't quietly fall back to the CPU. The best I can do for this right now is twofold:
- The plugin logs when it's ready on the QNN provider (specifically, it mentions it is ready on QNNExecutionProvider (DSP) and CPUExecutionProvider (CPU)):
$ docker compose logs | grep "QNN detector ready"
frigate | 2026-05-31 11:54:39.151524200 [2026-05-31 11:54:39] frigate.detectors.plugins.qnn INFO : QNN detector ready on ['QNNExecutionProvider', 'CPUExecutionProvider'] (input=image, outputs=['boxes', 'scores', 'class_idx'])
- Reported inference speed from Frigate's /api/stats: ~6-8ms means it's on the DSP; ~50+ms means it fell back to the CPU.
curl -s localhost:5000/api/stats | jq '.detectors.qnn.inference_speed'
6.46
That is about it. I've only been running this for a few days but it's been stable so far. Object detection is working really well.
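If you'd rather script the inference-speed check than eyeball it, here is a minimal sketch in Python. It reads the same detectors.qnn.inference_speed field as the jq command above. The function names and the exact cut-offs (10ms and 50ms) are assumptions based on the rough numbers in this post, not anything Frigate defines.

```python
import json
import urllib.request


def classify_inference_speed(ms, dsp_max_ms=10.0, cpu_min_ms=50.0):
    """Guess where inference ran from its reported speed (assumed cut-offs)."""
    if ms <= dsp_max_ms:
        return "dsp"
    if ms >= cpu_min_ms:
        return "cpu"
    return "unclear"


def fetch_inference_speed(url="http://localhost:5000/api/stats", detector="qnn"):
    """Read the detector's inference_speed (in ms) from Frigate's stats API."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        stats = json.load(resp)
    return stats["detectors"][detector]["inference_speed"]


# Usage, with Frigate running locally:
#   print(classify_inference_speed(fetch_inference_speed()))
```

Anything between the two cut-offs is reported as "unclear" rather than guessed, since I haven't seen values in that range to know what they'd mean.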