
ViewR - AWS Rekognition and RTSP

Built the ViewR vision pipeline on AWS Rekognition Collections for face matching and RTSP/ONVIF for camera control, plus the de-duplication and confidence-tuning logic on top.

The ViewR pipeline had four layers - get frames off the camera, filter for frames that contain a face, match candidates against our Rekognition collection, and turn matches into alerts. Sounds simple. Each layer had its own edge cases.

Layer 1 - frames off the camera. RTSP is the standard, but every IP camera vendor implements it slightly differently. I ran FFmpeg as a child process from Electron to handle the codec zoo (H.264, H.265, Motion JPEG, and the occasional MJPEG-over-HTTP stream). FFmpeg handed me raw frames sampled at 2 FPS rather than the full frame rate, because Rekognition costs add up and we did not need 30 FPS for entry detection.
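A minimal sketch of the sampling step, written in Python for brevity rather than as the Electron code itself. It assumes FFmpeg is on the PATH and, to keep the framing simple, asks FFmpeg for JPEG-encoded frames on stdout instead of raw pixel buffers. The function names and the marker-based splitter are illustrative assumptions; the splitter is naive and relies on each JPEG starting with FF D8 and ending with FF D9.

```python
import subprocess
from typing import Iterator, List

SOI = b"\xff\xd8"  # JPEG start-of-image marker
EOI = b"\xff\xd9"  # JPEG end-of-image marker


def build_ffmpeg_args(rtsp_url: str, fps: float = 2.0) -> List[str]:
    """Build an FFmpeg command that samples an RTSP stream at a low frame rate."""
    return [
        "ffmpeg",
        "-loglevel", "error",
        "-rtsp_transport", "tcp",   # TCP tends to behave better than UDP on lossy networks
        "-i", rtsp_url,
        "-an",                      # drop audio
        "-vf", f"fps={fps:g}",      # sample down to the target frame rate
        "-c:v", "mjpeg",            # re-encode every sampled frame as a JPEG
        "-f", "image2pipe",
        "pipe:1",                   # write the JPEG stream to stdout
    ]


def split_jpeg_frames(buffer: bytearray) -> List[bytes]:
    """Pop every complete JPEG out of buffer, leaving any partial frame in place."""
    frames: List[bytes] = []
    while True:
        start = buffer.find(SOI)
        if start < 0:
            # No frame start yet; keep the last byte in case a marker is split.
            del buffer[:max(0, len(buffer) - 1)]
            break
        end = buffer.find(EOI, start + 2)
        if end < 0:
            del buffer[:start]      # discard junk before the partial frame
            break
        frames.append(bytes(buffer[start:end + 2]))
        del buffer[:end + 2]
    return frames


def iter_frames(rtsp_url: str, fps: float = 2.0, chunk_size: int = 65536) -> Iterator[bytes]:
    """Yield JPEG frames from the camera until FFmpeg exits."""
    proc = subprocess.Popen(build_ffmpeg_args(rtsp_url, fps), stdout=subprocess.PIPE)
    buffer = bytearray()
    try:
        while True:
            chunk = proc.stdout.read1(chunk_size)
            if not chunk:
                break
            buffer.extend(chunk)
            yield from split_jpeg_frames(buffer)
    finally:
        proc.kill()
        proc.wait()
```

Keeping the command builder separate from the process loop makes it easy to add per-vendor flags without touching the frame handling.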

Layer 2 - face detection on the client. Before sending a frame to Rekognition I ran a cheap local face detector to skip frames with no face. This cut Rekognition calls by 70-80 percent for typical office cameras (most frames have no one in them). I used MediaPipe in a worker for this. Cheap, fast, runs anywhere.
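The gating step can be sketched as follows, in Python and independent of any particular detector. It assumes the local detector reports a score and a relative bounding box (values between 0 and 1) per face, which is the shape MediaPipe's face detection uses. The `Detection` type, the thresholds, and the padding factor are hypothetical values for illustration, not ViewR's actual settings.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Detection:
    """One face from the local detector; box values are relative (0..1)."""
    score: float
    xmin: float
    ymin: float
    width: float
    height: float


def faces_worth_sending(
    detections: Iterable[Detection],
    min_score: float = 0.6,        # hypothetical confidence cut-off
    min_rel_height: float = 0.08,  # hypothetical: skip faces that are too small
) -> List[Detection]:
    """Keep only detections that are confident and large enough to be matched."""
    return [
        d for d in detections
        if d.score >= min_score and d.height >= min_rel_height
    ]


def crop_box(
    det: Detection, frame_w: int, frame_h: int, pad: float = 0.25
) -> Tuple[int, int, int, int]:
    """Convert a relative box to a padded pixel box clamped to the frame."""
    pad_x = det.width * pad
    pad_y = det.height * pad
    x0 = max(0, round((det.xmin - pad_x) * frame_w))
    y0 = max(0, round((det.ymin - pad_y) * frame_h))
    x1 = min(frame_w, round((det.xmin + det.width + pad_x) * frame_w))
    y1 = min(frame_h, round((det.ymin + det.height + pad_y) * frame_h))
    return x0, y0, x1, y1
```

A frame whose filtered list is empty never leaves the machine, which is where the saving in Rekognition calls comes from. The padding gives the cloud matcher some context around the face instead of a tight crop.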

Layer 3 - Rekognition Collections. I created one collection per customer organization. Each enrolled face carried an external image ID (ExternalImageId, set to our internal user ID), and IndexFaces stored the embedding. At match time, SearchFacesByImage took the candidate face crop and returned the top match with a similarity score.
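A hedged sketch of the enroll and match calls, in Python with a boto3-style client passed in (for example `boto3.client("rekognition")`). The operation names and response fields follow the Rekognition API; the collection naming scheme and the helper names are assumptions made for illustration.

```python
from typing import Any, List, Optional, Tuple


def collection_id_for(org_id: str) -> str:
    """One collection per customer organization (naming scheme is hypothetical)."""
    return f"viewr-org-{org_id}"


def enroll_face(client: Any, org_id: str, user_id: str, image_bytes: bytes) -> List[str]:
    """Index one enrollment photo and tag it with our internal user ID."""
    resp = client.index_faces(
        CollectionId=collection_id_for(org_id),
        Image={"Bytes": image_bytes},
        ExternalImageId=user_id,
        MaxFaces=1,            # enrollment photos should contain a single face
        QualityFilter="AUTO",  # let Rekognition reject blurry or tiny faces
    )
    return [record["Face"]["FaceId"] for record in resp.get("FaceRecords", [])]


def match_face(
    client: Any, org_id: str, crop_bytes: bytes, threshold: float = 90.0
) -> Optional[Tuple[str, float]]:
    """Return (user_id, similarity) for the best match, or None if nothing clears the threshold."""
    resp = client.search_faces_by_image(
        CollectionId=collection_id_for(org_id),
        Image={"Bytes": crop_bytes},
        FaceMatchThreshold=threshold,
        MaxFaces=1,
    )
    matches = resp.get("FaceMatches", [])
    if not matches:
        return None
    best = matches[0]
    return best["Face"]["ExternalImageId"], best["Similarity"]
```

SearchFacesByImage raises an InvalidParameterException when it finds no face in the image, so a real caller would catch that and treat it as "no match" rather than as a failure.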

Layer 4 - business logic on top. De-duplication: the same face matched 30 times in 30 seconds is one entry event, not 30. Confidence threshold tuning per customer: we started at 90 percent similarity, some sites needed 95 percent because they had a lot of similar-looking faces, and some sites worked at 85 percent. Watchlist matching ran as a separate code path with stricter thresholds and alert routing.
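One way to express the de-duplication and per-site threshold rules, sketched in Python. The sliding 30-second window (each new match extends the quiet period) is an assumption about how the rule was applied, and the class, function, and site names are hypothetical.

```python
from typing import Dict, Mapping


class EntryDeduplicator:
    """Collapse repeated matches of the same person into one entry event."""

    def __init__(self, window_s: float = 30.0) -> None:
        self.window_s = window_s
        self._last_seen: Dict[str, float] = {}

    def is_new_entry(self, user_id: str, ts: float) -> bool:
        """Return True only if this user has not been matched within the window.

        The last-seen time is updated on every match, so someone who lingers
        in front of the camera keeps extending the same event.
        """
        last = self._last_seen.get(user_id)
        self._last_seen[user_id] = ts
        return last is None or (ts - last) > self.window_s


def threshold_for(
    site_id: str, overrides: Mapping[str, float], default: float = 90.0
) -> float:
    """Similarity threshold for a site: a per-site override or the 90 percent default."""
    return overrides.get(site_id, default)
```

For example, with `overrides = {"site-a": 95.0, "site-b": 85.0}`, a site with many similar-looking faces runs at 95 while every unlisted site stays at 90. A long-running process would also want to evict old entries from the last-seen map.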

Figure: a successful face-recognition entry, end to end.

PTZ control was its own subproject. ONVIF's Profile S covers pan-tilt-zoom for compliant cameras. The protocol is SOAP over HTTP, which felt like time travel. I wrote a thin client (no full ONVIF SDK, the dependencies were painful) that supported the 3 vendors we had in the field.
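To give a sense of what a thin client involves, here is a sketch in Python that builds the SOAP 1.2 envelope for an ONVIF PTZ ContinuousMove request. The namespaces and element names come from the ONVIF PTZ service; the function name is hypothetical, and authentication (typically a WS-Security UsernameToken header) and the HTTP POST itself are left out.

```python
from xml.sax.saxutils import escape

SOAP_NS = "http://www.w3.org/2003/05/soap-envelope"
PTZ_NS = "http://www.onvif.org/ver20/ptz/wsdl"
SCHEMA_NS = "http://www.onvif.org/ver10/schema"


def _clamp(value: float) -> float:
    """Velocities in the generic ONVIF velocity space are normalized to [-1, 1]."""
    return max(-1.0, min(1.0, value))


def continuous_move_envelope(
    profile_token: str, pan: float, tilt: float, zoom: float = 0.0
) -> str:
    """Build the SOAP body for a ContinuousMove call (no auth header included)."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        f'<s:Envelope xmlns:s="{SOAP_NS}" xmlns:tptz="{PTZ_NS}" xmlns:tt="{SCHEMA_NS}">'
        "<s:Body>"
        "<tptz:ContinuousMove>"
        f"<tptz:ProfileToken>{escape(profile_token)}</tptz:ProfileToken>"
        "<tptz:Velocity>"
        f'<tt:PanTilt x="{_clamp(pan):.2f}" y="{_clamp(tilt):.2f}"/>'
        f'<tt:Zoom x="{_clamp(zoom):.2f}"/>'
        "</tptz:Velocity>"
        "</tptz:ContinuousMove>"
        "</s:Body>"
        "</s:Envelope>"
    )
```

The envelope would be POSTed to the camera's PTZ service address with a SOAP content type. A ContinuousMove keeps the camera moving until a Stop request or a timeout, so a real client needs the matching Stop call as well.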

The accuracy number that mattered to the customer was not Rekognition's similarity score. It was "of the people who walked through the door, how many were correctly identified". For Masters Union that was 94 percent. The remaining 6 percent was mostly people walking through at angles the enrolled photos did not cover. The fix was to re-enroll those people with more angle variety, not to tune the model.
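The door-level metric is simple to state in code. This Python sketch assumes a hand-labelled log of entries, each pairing the person who actually walked in with whoever the system reported (or None when it reported no one). The function name and the log format are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def identification_rate(entries: Iterable[Tuple[str, Optional[str]]]) -> float:
    """Fraction of real entries where the reported identity matched the actual person.

    Both misses (predicted is None) and wrong identities count against the rate.
    """
    total = 0
    correct = 0
    for actual, predicted in entries:
        total += 1
        if predicted == actual:
            correct += 1
    if total == 0:
        raise ValueError("need at least one labelled entry")
    return correct / total
```

Because the denominator is every person who walked through the door, this number drops when enrollment photos are poor even if every returned match has a high similarity score.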

Learn more

AWS Rekognition Collections docs (AWS)
ONVIF Profile S spec (ONVIF)
FFmpeg docs (FFmpeg)
RFC 7826 - RTSP 2.0 (IETF)