Vision-Guided Impurity Removal
Operators remove impurities from molten steel by eye, with real costs whether they take too much or too little.
Computer Vision • Industrial Automation • Process Control • Edge Computing • Systems Integration • Near-Real-Time Analytics
Background
This project was funded by an Applied Research and Development (ARD) grant from the Natural Sciences and Engineering Research Council of Canada (NSERC), awarded in partnership between ArcelorMittal Dofasco and Mohawk College.
During slag raking in ladle metallurgy, operators remove surface impurities from molten steel before it moves downstream. They work at ladle level and can barely see the surface from where they stand. Removal is done entirely by eye.
SOPs exist but are vague because nothing is measurable. Every operator does it differently, and when experienced people retire or go on leave, knowledge degrades. Remove too little and quality suffers downstream. Remove too much and yield is lost directly.
The Solution
I designed a vision system that mounts a thermal camera above the ladle, giving operators a clear view of the surface for the first time. The system identifies material types and displays real-time coverage percentages, making existing SOPs measurable and followable instead of judgment-based. All process data is captured for trend analysis, operator comparison, and correlation against downstream chemistry results.
Deep Dive: Vision Pipeline
Architecture
The system is a single-host, multi-threaded vision pipeline running on an industrial PC. All acquisition, processing, inference, and recording happen in one C++ process.
A FLIR thermal camera streams frames over a dedicated network link into the application. Inside, a shared-memory ring buffer connects the camera thread to its consumers. Rake head detection, slag segmentation, and recording each read the latest frame independently. No thread waits for another.
Neural network inference runs on dedicated threads and CUDA streams. Results publish as events over MQTT to a Python middleware service. The middleware connects to the plant PLC through an OPC UA FactoryTalk gateway. Coverage results and health telemetry (including camera temperature for air cooling monitoring) push to the PLC for SCADA ingestion. Process state signals (start/stop, heat IDs, recipe identifiers) flow back, tying vision data to metallurgical context.
Video recording pipes frames through FFmpeg for 16-bit compression and writes to local storage, segmented by application state. Only footage from active processing windows is retained.
C++ Pipeline Internals
Seven threads share the ring buffer and communicate through shared buses. The camera thread (T1) writes frames into the ring. YOLO detection (T2) tracks rake position and pull cycle boundaries. SegFormer (T3) classifies material coverage, triggered by pull cycle events or running continuously at ~5 fps. A snapshot thread (T4) saves per-pull records. Recorder (T5) pipes frames to FFmpeg for raw footage capture during raking. GUI (T6) composites overlays from all other threads. MQTT (T7) receives commands from the middleware and publishes session results.
An application state machine governs which threads are active. In IDLE, the camera streams but inference and recording pause. When the middleware sends a RAKING command over MQTT, all processing threads activate. Pull events trigger snapshots and analysis accumulation. When raking ends, results publish and the system returns to IDLE.
Challenges
Layered tracking for pull detection. YOLO runs on every frame to detect the rake. A Kalman tracker smooths position estimates across frames. Hysteresis logic on top of tracking determines pull cycle boundaries, which trigger SegFormer for coverage measurement. Most of the iteration went into getting the state transitions right: the system moves through idle, raking, individual pulls, and measurement, and each layer filters noise from the one below it.
Sustained throughput with thread isolation. Four consumer threads read from a 32-slot ring buffer while the camera writes at 30 fps. Each slot carries a version counter that consumers check before and after copying a frame; if the version changed mid-copy, the read is discarded and retried. A lock-based design would have broken the latency budget, while this lock-free separation lets each thread run at its own pace without coupling fast consumers to slow ones.
Building for operator trust. The system needed to earn operator trust before it could change behavior. I conducted interviews and showed recorded footage to operators, supervisors, and engineering across multiple roles. Their feedback shaped the stop-point recommendation, which suggests when coverage meets SOP thresholds and removal should stop. This threshold tunes as production data builds. Designing the system as decision support rather than autonomous control was key to adoption.