Blast Furnace Vision & Analytics Platform
Furnace operators relied on aging cameras with no analytics, tracking instability events in spreadsheets after the fact.
Computer Vision • Distributed Systems • Industrial Analytics • Edge Computing • Event-Driven Architecture • High-Throughput Streaming
Background
The blast furnace is a critical system in steel manufacturing. Early warning signs such as blockages and flow changes occur under extreme conditions that are difficult to observe directly. A missed warning can escalate into a breakout or an extended unplanned shutdown.
The existing cameras were aging and vendor-locked, failing regularly with no path to repair. They provided raw video with no analytics. Operators reviewed footage after the fact and manually tracked events in spreadsheets during daily meetings. There was no structured data, no trend analysis, and no way to respond before conditions became dangerous.
The Solution
I designed a distributed vision platform that replaced the entire camera infrastructure and covers all 24 cameras simultaneously. The system detects early indicators of furnace instability and surfaces them through dashboards, automated daily reporting, and historical trend analysis. I owned the architecture, all computer vision and ML development, and the detection pipeline.
Deep Dive: System Architecture & Data Pipeline
Architecture
Each camera runs its own C++ process handling acquisition, analysis, and video output. The C++ process connects to a FLIR GigE machine vision camera, runs detection and analysis, and pipes frames to FFmpeg. Two FFmpeg outputs run per camera: one RTSP re-stream feeding the client's existing recording infrastructure, and one HLS stream for the web interface. Detection events publish over MQTT to a Python post-processing service that applies thresholds and writes to MySQL.
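The post-processing stage can be sketched as a small Python service. This is an illustrative stub, not the production code: the event schema, field names, and threshold values are assumptions, and the MQTT subscription and MySQL insert are stubbed out so only the thresholding logic is shown.

```python
import json

# Hypothetical per-camera thresholds; real values are tuned per installation.
THRESHOLDS = {"cam07": {"min_severity": 0.6, "min_duration_s": 2.0}}
DEFAULT = {"min_severity": 0.5, "min_duration_s": 1.0}

def filter_event(payload):
    """Apply post-processing thresholds to one raw detection event.

    In the real pipeline the JSON payload arrives over MQTT and the
    returned tuple becomes the parameter list for a MySQL INSERT;
    both ends are stubbed here for illustration."""
    e = json.loads(payload)
    t = THRESHOLDS.get(e["camera"], DEFAULT)
    if e["severity"] < t["min_severity"] or e["duration_s"] < t["min_duration_s"]:
        return None  # below threshold: discard, do not persist
    # e.g. INSERT INTO events (camera, kind, severity, duration_s) VALUES (%s, %s, %s, %s)
    return (e["camera"], e["kind"], e["severity"], e["duration_s"])

row = filter_event('{"camera": "cam07", "kind": "flow_change", "severity": 0.8, "duration_s": 3.5}')
```

Keeping thresholds in the post-processing service, rather than in the C++ detectors, means they can change without redeploying the camera pods.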
Kubernetes isolates failures and makes per-camera scaling predictable. Each C++ process runs as its own pod. A failure in one stream does not propagate, and adding cameras does not require re-architecting. The full stack runs on an air-gapped plant network.
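A per-camera deployment might look like the following sketch. All names, labels, and the image tag are illustrative, not the production manifests:

```yaml
# Hypothetical per-camera Deployment: one pod per camera, so a crash in
# one stream restarts only that pod and never touches the other 23.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: furnace-cam07
spec:
  replicas: 1
  selector:
    matchLabels: {app: furnace-cam, camera: cam07}
  template:
    metadata:
      labels: {app: furnace-cam, camera: cam07}
    spec:
      containers:
        - name: vision
          image: registry.local/furnace-vision:stable  # local air-gapped registry
          env:
            - {name: CAMERA_ID, value: cam07}
```

Adding a camera is then one more manifest, not a change to shared code.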
Each camera sits in a custom enclosure designed to fit the furnace mounting points, with Peltier element cooling to handle ambient heat near the furnace shell.
Detection Approach
Detection combines client-tunable image analysis with a custom neural network. The image analysis layer extracts features from each frame: intensity regions, spatial patterns, and temporal changes. These features feed as structured inputs into a neural network alongside the raw frame. The network was trained on labeled process data collected by the system itself.
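The structured-feature idea can be illustrated with a minimal sketch. The feature set below (intensity statistics, a bright-region fraction, a temporal-change magnitude) is a simplified stand-in for the production algorithms:

```python
import numpy as np

def frame_features(frame, prev_frame, bright_thresh=200):
    """Extract structured per-frame features (illustrative set):
    intensity statistics, a spatial bright-region fraction, and a
    temporal-change magnitude against the previous frame."""
    bright_frac = float((frame > bright_thresh).mean())  # spatial pattern
    temporal = float(np.abs(frame.astype(np.int32)
                            - prev_frame.astype(np.int32)).mean())  # temporal change
    return np.array([frame.mean(), frame.std(), bright_frac, temporal],
                    dtype=np.float32)

# These features are fed to the network alongside the raw frame;
# the network itself is omitted here.
prev = np.zeros((8, 8), dtype=np.uint8)
cur = np.full((8, 8), 255, dtype=np.uint8)
feats = frame_features(cur, prev)
```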
The image analysis algorithms are tunable per camera without retraining the network. Each camera views the furnace from a different angle and distance, so thresholds and filters vary. Python scripts automate testing and parameter tuning across all 24 camera configurations. When thresholds shift or new conditions appear, operators and engineers adjust without waiting for a model retrain.
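The tuning scripts amount to a parameter sweep per camera. A minimal sketch, with the parameter names and the scoring function invented for illustration (the real scripts replay labeled footage and score against operator labels):

```python
from itertools import product

def score(camera, bright_thresh, min_area):
    # Stand-in metric: the real tuning replays recorded, labeled clips
    # for this camera and returns e.g. F1 against operator labels.
    return -abs(bright_thresh - 190) - abs(min_area - 40)

def tune(camera, bright_range, area_range):
    """Grid-search one camera's image-analysis parameters."""
    return max(product(bright_range, area_range),
               key=lambda p: score(camera, *p))

best = tune("cam03", range(150, 230, 10), range(20, 60, 10))
```

Because this touches only the classical image-analysis layer, a sweep never requires retraining the network.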
What Didn't Work Initially
Temporal and frame-averaging approaches. Early attempts used temporal differencing for fast-moving events and frame averaging for gradual color changes. Both produced too many false positives. Camera vibration, lighting shifts, and cooling spray all triggered the same patterns as real events.
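The failure mode is easy to reproduce synthetically: a one-pixel camera vibration shifts the whole scene, which plain temporal differencing scores higher than a genuine local event. Values and sizes below are illustrative:

```python
import numpy as np

base = np.zeros((32, 32))
base[:, 10] = 100.0                  # a bright vertical edge in the scene

vibrated = np.roll(base, 1, axis=1)  # whole frame shifts by one pixel
event = base.copy()
event[14:18, 20:24] = 100.0          # a small, genuine local event

diff_vibration = np.abs(vibrated - base).sum()  # change from vibration alone
diff_event = np.abs(event - base).sum()         # change from the real event
```

In this toy case the vibration produces a larger total difference than the event, which is exactly why raw differencing drowned real signals in false positives.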
Neural network and LSTM without structured features. Training a network on raw frames produced noisy confidence scores. Adding an LSTM for temporal memory did not help. Without structured per-frame features, the model was trying to learn spatial and temporal patterns simultaneously from raw pixels. Events were too sparse across long sequences for the LSTM to accumulate meaningful signal.
What worked: separating feature extraction from learning. Once classical image analysis handled the structured inputs, the neural network only needed to learn decision boundaries rather than both spatial patterns and temporal dynamics from raw pixels. The separation delivered consistent results across all 24 camera angles and lighting conditions.
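With feature extraction in front, the learned component reduces to a decision boundary over a low-dimensional feature vector. A minimal stand-in using a single logistic unit and synthetic data (the production model is a small neural network, not this):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: 4 structured features per frame; label 1 = instability event.
# The labels are linearly separable in feature space, mimicking what good
# feature extraction buys the learner.
X = rng.normal(size=(200, 4))
y = (X[:, 2] + 0.5 * X[:, 3] > 0.8).astype(float)

w, b = np.zeros(4), 0.0
for _ in range(500):                       # plain gradient descent
    p = sigmoid(X @ w + b)
    w -= 0.1 * X.T @ (p - y) / len(y)
    b -= 0.1 * (p - y).mean()

acc = float(((sigmoid(X @ w + b) > 0.5) == (y == 1)).mean())
```

The same learner applied to raw pixels would face a vastly harder problem, which matches the LSTM failure described above.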
Challenges
Air-gapped deployment. The plant network has no internet access. All container images, dependencies, and updates are built on a connected machine, collected by custom scripts, and transferred offline. This constraint shaped the entire toolchain: local registries, offline-capable Jenkins pipelines, and scripted rollbacks for every component.
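The collection step can be sketched as a script that builds the `docker save` commands for the transfer media. Image names and the output path are illustrative; the real scripts also gather Python wheels, OS packages, and Jenkins job definitions:

```python
# Hypothetical image list for one release.
IMAGES = [
    "registry.local/furnace-vision:1.4.2",
    "registry.local/postproc:1.4.2",
]

def save_commands(images, out_dir="/mnt/transfer"):
    """Build `docker save` command lines, one tarball per image,
    for offline transfer into the air-gapped plant registry."""
    return [
        ["docker", "save", "-o",
         f"{out_dir}/{img.split('/')[-1].replace(':', '_')}.tar", img]
        for img in images
    ]

cmds = save_commands(IMAGES)
```

On the plant side, the mirror of this script runs `docker load` and pushes each image into the local registry.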
Scaling to 24 concurrent streams. The prototype validated detection on a few cameras. Scaling to 24 introduced network congestion and timing issues absent at smaller scale. GigE packet loss under load required NIC buffer tuning and interrupt coalescing before acquisition was reliable across all streams simultaneously.
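The NIC tuning boils down to a couple of `ethtool` settings per capture interface: larger RX ring buffers and coarser interrupt coalescing. Interface names and values below are illustrative, not the production settings:

```python
def nic_tuning_commands(iface, rx_ring=4096, coalesce_usecs=100):
    """Build the ethtool commands applied to one GigE capture interface."""
    return [
        ["ethtool", "-G", iface, "rx", str(rx_ring)],            # grow RX ring
        ["ethtool", "-C", iface, "rx-usecs", str(coalesce_usecs)],  # coalesce IRQs
    ]

cmds = [c for iface in ("eth2", "eth3") for c in nic_tuning_commands(iface)]
```

Larger rings absorb bursts from simultaneous streams; coalescing trades a little latency for far fewer interrupts under sustained load.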
Building for daily use. Dashboards, event graphs, and reporting formats went through multiple iterations driven by operator and supervisor feedback. Operators explained process behavior that clarified what detected events actually meant, which fed directly back into detection tuning. Displays of event frequency, severity, and duration had to match how decisions were actually made, in daily meetings and on shift, before anyone relied on the system.