An edge gateway on a factory floor isn't a REST API handling one request at a time. It's a real-time system juggling multiple competing demands simultaneously: polling a PLC for tag values every second, buffering data locally when the cloud connection drops, transmitting batched telemetry over MQTT, processing incoming configuration commands from the cloud, and monitoring its own health — all at once, on hardware with the computing power of a ten-year-old smartphone.
Get the concurrency wrong, and you don't get a 500 error in your logs. You get silent data loss, corrupted telemetry batches, or — worst case — a watchdog reboot loop that takes your monitoring offline during a critical production run.
This guide covers the architecture patterns that make industrial edge gateways reliable under real-world conditions: concurrent PLC polling, thread-safe buffering, MQTT delivery guarantees, and the store-and-forward patterns that keep data flowing when the network doesn't.

The Concurrency Challenge in Industrial Edge Gateways
A typical edge gateway has at least three threads running concurrently:
- The polling thread — reads tags from PLCs at configured intervals (1-second to 60-second cycles)
- The MQTT network thread — manages the broker connection, handles publish/subscribe, reconnection
- The main control thread — processes incoming commands, monitors watchdog timers, manages configuration
These threads all share one critical resource: the outgoing data buffer. The polling thread writes telemetry into the buffer. The MQTT thread reads from the buffer and transmits data. When the connection drops, the buffer must hold data without the polling thread stalling. When the connection recovers, the buffer must drain in order without losing or duplicating messages.
This is a classic producer-consumer problem, but with industrial constraints that make textbook solutions insufficient.
Why Standard Queues Fall Short
Your first instinct might be to use a thread-safe queue — a ConcurrentLinkedQueue in Java, a queue.Queue in Python, or a lock-free ring buffer. These work fine for web applications, but industrial edge gateways have constraints that break standard queue implementations:
1. Memory Is Fixed and Finite
Edge gateways run on embedded hardware with 64 MB to 512 MB of RAM — no swap space, no dynamic allocation after startup. An unbounded queue will eventually exhaust memory during a long network outage. A fixed-size queue forces you to choose: block the producer (stalling PLC polling) or drop the oldest data.
2. Network Outages Last Hours, Not Seconds
In a factory, network outages aren't transient blips. A fiber cut, a misconfigured switch, or a power surge on the network infrastructure can take connectivity down for hours. Your buffer needs to hold potentially thousands of telemetry batches — not just a few dozen.
3. Delivery Confirmation Is Asynchronous
MQTT QoS 1 guarantees at-least-once delivery, but the PUBACK confirmation comes back asynchronously — possibly hundreds of milliseconds after the PUBLISH. During that window, you can't release the buffer space (the message might need retransmission), and you can't stall the producer (PLC data keeps flowing).
4. Data Must Survive Process Restarts
If the edge gateway daemon restarts (due to a configuration update, a watchdog trigger, or a power cycle), buffered-but-undelivered data must be recoverable. Purely in-memory queues lose everything.
The Paged Ring Buffer Pattern
The pattern that works in production is a paged ring buffer — a fixed-size memory region divided into pages, with explicit state tracking for each page. Here's how it works:
Memory Layout
At startup, the gateway allocates a single contiguous memory block and divides it into equal-sized pages:
┌─────────┬─────────┬─────────┬─────────┬─────────┐
│ Page 0  │ Page 1  │ Page 2  │ Page 3  │ Page 4  │
│  FREE   │  FREE   │  FREE   │  FREE   │  FREE   │
└─────────┴─────────┴─────────┴─────────┴─────────┘
Each page has its own header tracking:
- A page number (for logging and debugging)
- A start_p pointer (beginning of writable space)
- A write_p pointer (current write position)
- A read_p pointer (current read position for transmission)
- A next pointer (linking to the next page in whatever list it's in)
Three Page Lists
Pages move between three linked lists:
- Free pages — available for the producer to write into
- Used pages — full of data, queued for transmission
- Work page — the single page currently being written to
Producer (Polling Thread)                 Consumer (MQTT Thread)
        │
        ▼
  ┌──────────┐                            ┌──────────┐
  │Work Page │───────── When full ───────►│Used Pages│──► MQTT Publish
  │(writing) │                            │(queued)  │
  └──────────┘                            └──────────┘
        ▲                                       │
        │             When delivered            │
  ┌──────────┐◄─────────────────────────────────┘
  │Free Pages│
  │(empty)   │
  └──────────┘
The Producer Path
When the polling thread has a new batch of tag values to store:
- Check the work page — if there's no current work page, grab one from the free list
- Calculate space — check if the new data fits in the remaining space on the work page
- If it fits — write the data (with a size header) and advance write_p
- If it doesn't fit — move the work page to the used list, grab a new page (from free, or steal the oldest from used if free is empty), and write there
- After writing — check if there's data ready to transmit and kick the consumer
The critical detail: if the free list is empty, the producer steals the oldest used page. This means during extended outages, the buffer wraps around and overwrites the oldest data — exactly the behavior you want. Recent data is more valuable than stale data in industrial monitoring.
The Consumer Path
When the MQTT connection is active and there's data to send:
- Check the used page list — if empty, check if the work page has unsent data and promote it
- Read the next message from the first used page's read_p position
- Publish via MQTT with QoS 1
- Set a "packet sent" flag — this prevents sending the next message until the current one is acknowledged
- Wait for PUBACK — when the broker confirms receipt, advance read_p
- If read_p reaches write_p — the page is fully delivered; move it back to the free list
- Repeat — grab the next message from the next used page
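The producer and consumer paths can be sketched in Python. This is a minimal illustration, not the gateway's actual implementation (which would be native code): pages here hold whole messages as a list of bytes objects rather than raw byte ranges, and all class and method names are invented for the sketch.

```python
import threading
from collections import deque

class PagedRingBuffer:
    # Sketch of the paged ring buffer: a fixed page pool, a single work
    # page for the producer, a used list awaiting transmission, and
    # steal-oldest behavior when the free list runs dry.
    def __init__(self, num_pages=4, page_size=64):
        self.lock = threading.Lock()
        self.page_size = page_size
        self.free = deque({"msgs": [], "used": 0} for _ in range(num_pages))
        self.used = deque()        # full pages queued for transmission
        self.work = None           # the single page currently being written
        self.read_i = 0            # read position inside the head used page
        self.packet_sent = False   # one message in flight at a time

    def _grab_page(self):
        if self.free:
            return self.free.popleft()
        # Free list empty: steal the oldest used page, dropping its data.
        # (The sketch ignores the case of that page's message being in flight.)
        page = self.used.popleft()
        page["msgs"].clear()
        page["used"] = 0
        self.read_i = 0
        return page

    def write(self, msg):
        # Producer path: never blocks on the network, only on the mutex.
        with self.lock:
            if self.work is None:
                self.work = self._grab_page()
            if self.work["used"] + len(msg) > self.page_size:
                self.used.append(self.work)   # promote the full work page
                self.work = self._grab_page()
            self.work["msgs"].append(msg)
            self.work["used"] += len(msg)

    def next_message(self):
        # Consumer path: returns the next unsent message, or None.
        with self.lock:
            if self.packet_sent:
                return None   # previous PUBLISH not yet acknowledged
            if not self.used and self.work and self.work["msgs"]:
                self.used.append(self.work)   # promote a partial work page
                self.work = None
            if not self.used:
                return None
            self.packet_sent = True
            return self.used[0]["msgs"][self.read_i]

    def ack(self):
        # PUBACK arrived: advance read position, recycle fully sent pages.
        with self.lock:
            self.packet_sent = False
            self.read_i += 1
            page = self.used[0]
            if self.read_i >= len(page["msgs"]):
                self.used.popleft()
                page["msgs"].clear()
                page["used"] = 0
                self.free.append(page)
                self.read_i = 0
```

With two 10-byte pages, writing five 4-byte messages forces one page steal: the two oldest messages are dropped, and the consumer drains the remaining three in order, one PUBACK at a time.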
The Mutex Strategy
The entire buffer is protected by a single mutex. This might seem like a bottleneck, but in practice:
- Write operations (adding data) take microseconds
- Read operations (preparing to transmit) take microseconds
- The actual MQTT transmission happens outside the mutex — only the buffer state management is locked
The mutex is held for a few microseconds at a time, never during network I/O. This keeps the polling thread from ever blocking on network latency.
Polling Thread:               MQTT Thread:
  lock(mutex)                   lock(mutex)
  write data to page            read data from page
  check if page full            mark as sent
  maybe promote page            unlock(mutex)
  trigger send check            ─── MQTT publish ───
  unlock(mutex)                 (outside mutex!)
                                lock(mutex)
                                process PUBACK
                                maybe free page
                                unlock(mutex)
Message Framing Inside Pages
Each page holds multiple messages packed sequentially. Each message has a simple header:
┌──────────────┬──────────────┬─────────────────────┐
│  Message ID  │ Message Size │    Message Body     │
│  (4 bytes)   │  (4 bytes)   │     (variable)      │
└──────────────┴──────────────┴─────────────────────┘
The Message ID field is initially zero. When the MQTT library publishes the message, it fills in the packet ID assigned by the broker. This is how the consumer tracks which specific message was acknowledged — when the PUBACK callback fires with a packet ID, it can match it to the message at read_p and advance.
This framing makes the buffer self-describing. During recovery after a restart, the gateway can scan page contents by reading size headers sequentially.
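The framing can be sketched with Python's struct module. The 4-byte fields come from the layout above; the big-endian byte order and the treatment of a zero size as "end of written data" are assumptions of this sketch, and the function names are invented:

```python
import struct

FRAME_HEADER = struct.Struct(">II")  # message ID (4 bytes), size (4 bytes)

def frame_message(body: bytes) -> bytes:
    # Message ID starts at zero; the MQTT layer patches in the
    # broker-assigned packet ID at publish time.
    return FRAME_HEADER.pack(0, len(body)) + body

def patch_packet_id(page: bytearray, offset: int, packet_id: int) -> None:
    # Overwrite the ID field of the message starting at `offset`.
    struct.pack_into(">I", page, offset, packet_id)

def scan_page(page: bytes):
    # Recovery: walk the page by reading size headers sequentially.
    offset = 0
    while offset + FRAME_HEADER.size <= len(page):
        msg_id, size = FRAME_HEADER.unpack_from(page, offset)
        body_start = offset + FRAME_HEADER.size
        if size == 0 or body_start + size > len(page):
            break  # unwritten space or a torn write: stop scanning
        yield msg_id, page[body_start:body_start + size]
        offset = body_start + size
```

Because each message carries its own size, a restart only needs this sequential scan to rebuild the read and write positions for every page.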
Handling Disconnections Gracefully
When the MQTT connection drops, the consumer thread must handle it without corrupting the buffer:
Connection Lost:
1. Set connected = 0
2. Clear "packet sent" flag
3. Do NOT touch any page pointers
That's it. The producer keeps writing — it doesn't know or care about the connection state. The buffer absorbs data normally.
When the connection recovers:
Connection Restored:
1. Set connected = 1
2. Trigger send check (under mutex)
3. Consumer picks up where it left off
The key insight: the "packet sent" flag prevents double-sending. If a PUBLISH was in flight when the connection dropped, the PUBACK never arrived. The flag remains set, but the disconnection handler clears it. When the connection recovers, the consumer re-reads the same message from read_p (which was never advanced) and re-publishes it. The broker either receives a duplicate (handled by QoS 1 dedup) or receives it for the first time.
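The two handlers reduce to a few lines of state management. A sketch (the LinkState fields mirror the flags described above; the names are illustrative):

```python
class LinkState:
    def __init__(self):
        self.connected = False
        self.packet_sent = False  # a PUBLISH is awaiting its PUBACK

def on_disconnect(state: LinkState) -> None:
    state.connected = False
    state.packet_sent = False
    # Deliberately nothing else: page pointers (read_p etc.) are untouched,
    # so the in-flight message will be re-read and re-published later.

def on_connect(state: LinkState, trigger_send_check) -> None:
    state.connected = True
    trigger_send_check()  # consumer resumes from read_p, under the mutex
```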
Binary vs. JSON Batch Encoding
The telemetry data written into the buffer can be encoded in two formats, and the choice affects both bandwidth and reliability.
JSON Encoding
Each batch is a JSON object containing groups of timestamped values:
{
  "groups": [
    {
      "ts": 1709424000,
      "device_type": 1017,
      "serial_number": 123456,
      "values": [
        {"id": 80, "values": [725]},
        {"id": 81, "values": [680]},
        {"id": 82, "values": [285]}
      ]
    }
  ]
}
Pros: Human-readable, easy to debug, parseable by any language.
Cons: 5-8× larger than binary, float precision loss (decimal representation), size estimation is rough.
Binary Encoding
A compact binary encoding uses a header byte (0xF7) followed by big-endian packed groups:
F7                    ← Header
00 00 00 01           ← Number of groups (1)
65 E3 BD 80           ← Timestamp (Unix epoch 1709424000)
03 F9                 ← Device type (1017)
00 01 E2 40           ← Serial number (123456)
00 00 00 03           ← Number of values (3)
00 50 00 01 02 02 D5  ← Tag 80: status=0, 1 value, 2 bytes, 725
00 51 00 01 02 02 A8  ← Tag 81: status=0, 1 value, 2 bytes, 680
00 52 00 01 02 01 1D  ← Tag 82: status=0, 1 value, 2 bytes, 285
Pros: 5-8× smaller, perfect float fidelity (raw bytes preserved), exact size calculation.
Cons: Requires matching decoder on the cloud side, harder to debug without tools.
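An encoder for a format like this one is a few lines of struct packing. The field widths here are inferred from the dump above (u32 group count, u32 timestamp, u16 device type, u32 serial, u32 value count; per tag: u16 id, u8 status, u8 value count, u8 byte width, then big-endian value bytes) and the fixed 2-byte width is an assumption of the sketch:

```python
import struct

HEADER_BYTE = 0xF7

def encode_batch(groups) -> bytes:
    # groups: [{"ts": ..., "device_type": ..., "serial_number": ...,
    #           "values": [{"id": ..., "values": [ints]}, ...]}, ...]
    out = bytearray([HEADER_BYTE])
    out += struct.pack(">I", len(groups))
    for g in groups:
        out += struct.pack(">IHI", g["ts"], g["device_type"],
                           g["serial_number"])
        out += struct.pack(">I", len(g["values"]))
        for v in g["values"]:
            width = 2  # assume 16-bit registers for this sketch
            out += struct.pack(">HBBB", v["id"], 0, len(v["values"]), width)
            for x in v["values"]:
                out += x.to_bytes(width, "big")
    return bytes(out)
```

Encoding the three-tag example group produces a 40-byte payload; the equivalent JSON above is a few hundred bytes before compression, which is where the 5-8× figure comes from.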
For gateways communicating over cellular connections — common in remote facilities like water treatment plants, oil wells, or distributed renewable energy sites — binary encoding is essentially mandatory. A gateway polling 100 tags every 10 seconds generates about 260 MB/month in JSON versus 35 MB/month in binary. At typical IoT cellular rates ($0.50-$2.00/MB), that's the difference between $130/month and $17/month per gateway.
The MQTT Watchdog Pattern
MQTT connections can enter a zombie state — technically connected according to the TCP stack, but the broker has stopped responding. This is especially common behind industrial firewalls and NAT devices with aggressive connection timeout policies.
The Problem
The MQTT library reports the connection as alive. The gateway publishes messages. No PUBACK comes back — ever. The buffer fills up because the consumer thinks each message is "in flight" (the packet_sent flag is set). Eventually the buffer wraps and data loss begins.
The Solution: Last-Delivered Timestamp
Track the timestamp of the last successful PUBACK. If more than N seconds have passed since the last acknowledged delivery, and there are messages waiting to be sent, the connection is stale:
monitor_watchdog():
    if connected AND packet_sent:
        elapsed = now - last_delivered_packet_timestamp
        if elapsed > WATCHDOG_THRESHOLD:
            // Force disconnect and reconnect
            force_disconnect()
            // Disconnection handler clears packet_sent
            // Reconnection handler will re-deliver from read_p
A typical threshold is 60 seconds for LAN connections and 120 seconds for cellular. This catches zombie connections that the TCP stack and MQTT keep-alive miss.
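The staleness test itself is a pure function of the connection state, which makes it easy to unit-test. A sketch (the parameter names are illustrative; the 60-second default follows the LAN figure above):

```python
def connection_is_stale(connected: bool, packet_sent: bool,
                        last_delivered_ts: float, now: float,
                        threshold: float = 60.0) -> bool:
    # Stale only when we are nominally connected, a PUBLISH is awaiting
    # its PUBACK, and nothing has been acknowledged within the threshold.
    return connected and packet_sent and (now - last_delivered_ts) > threshold
```

The watchdog loop would call this periodically and invoke the forced disconnect when it returns true; a disconnected gateway or an idle one (no message in flight) is never flagged.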
Reconnection with Backoff
When the watchdog (or a genuine disconnection) triggers a reconnect, use a dedicated thread for the connection attempt. The connect_async call can block for the TCP timeout duration (potentially 30+ seconds), and you don't want that blocking the main loop or the polling thread.
A semaphore controls the reconnection thread:
Main Thread:                      Reconnection Thread:
  Detects need to                   (blocked on semaphore)
  reconnect                                │
  Posts semaphore ────────────────► Wakes up
                                    Calls connect_async()
                                    (may block 30s)
                                    Success or failure
  Waits for "done" ◄─────────────── Posts "done" semaphore
  Checks result
The reconnect delay should be fixed and short (5 seconds is typical) for industrial applications, not exponential backoff. In a factory, the network outage either resolves quickly (a transient) or it's a hard failure that needs human intervention. Exponential backoff just delays reconnection after the network recovers.
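The semaphore handshake can be sketched with Python's threading primitives. Here connect_fn stands in for the blocking connect call (connect_async or equivalent) and returns success or failure; the class name and structure are invented for the sketch:

```python
import threading

class Reconnector:
    # A dedicated thread performs the (potentially long-blocking) connect
    # so the main loop and polling thread never stall on TCP timeouts.
    def __init__(self, connect_fn):
        self.connect_fn = connect_fn
        self.go = threading.Semaphore(0)    # main posts to request a connect
        self.done = threading.Semaphore(0)  # worker posts when finished
        self.result = False
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()

    def _run(self):
        while True:
            self.go.acquire()                # block until a connect is requested
            self.result = self.connect_fn()  # may block for the TCP timeout
            self.done.release()

    def reconnect(self) -> bool:
        # Main thread: request a connect attempt and wait for the outcome.
        self.go.release()
        self.done.acquire()
        return self.result
```

A fixed 5-second retry delay, as recommended above, would simply be a sleep between reconnect() calls in the main loop rather than an exponential schedule.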
Batching Strategy: Size vs. Time
Telemetry batches should be finalized and queued for transmission based on whichever threshold hits first: size or time.
Size-Based Finalization
When the accumulated batch data exceeds a configured maximum (typically 400-500 KB for JSON, 50-100 KB for binary), finalize and queue it. This prevents any single MQTT message from exceeding the broker's maximum packet size.
Time-Based Finalization
When the batch has been collecting data for more than a configured timeout (typically 30-60 seconds), finalize it regardless of size. This ensures that even slowly-changing tags get transmitted within a bounded time window.
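Both rules together make a small, easily tested accumulator. A sketch with an injected clock (thresholds here are illustrative, at the binary-encoding scale):

```python
class Batch:
    # Finalize on whichever threshold hits first: accumulated size or age.
    def __init__(self, max_bytes=65536, max_age_s=30.0):
        self.max_bytes = max_bytes
        self.max_age_s = max_age_s
        self.items = []
        self.size = 0
        self.started_at = None  # time of the first add since last finalize

    def add(self, encoded: bytes, now: float) -> None:
        if self.started_at is None:
            self.started_at = now
        self.items.append(encoded)
        self.size += len(encoded)

    def should_finalize(self, now: float) -> bool:
        if self.started_at is None:
            return False  # empty batch never finalizes
        return (self.size >= self.max_bytes
                or now - self.started_at >= self.max_age_s)

    def finalize(self) -> bytes:
        payload = b"".join(self.items)
        self.items, self.size, self.started_at = [], 0, None
        return payload
```

Passing the clock in as a parameter keeps the thresholds deterministic under test and lets the caller use a monotonic clock in production.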
The Interaction Between Batching and Buffering
Batching and buffering are separate concerns that interact:
PLC Tags ──► Batch (collecting) ──► Buffer Page (queued) ──► MQTT (transmitted)

             Tag reads accumulate    When batch finalizes,    Pages are transmitted
             in the batch structure  the encoded batch goes   one at a time with
                                     into the ring buffer     PUBACK confirmation
A batch contains one or more "groups" — each group is a set of tag values read at the same timestamp. Multiple polling cycles might go into a single batch before it's finalized by size or time. The finalized batch then goes into the ring buffer as a single message.
Dependent Tag Reads and Atomic Groups
In many PLC configurations, certain tags are only meaningful when read together. For example:
- Alarm word tags — a uint16 register where each bit represents a different alarm. You read the alarm word, then extract the individual bits. If the alarm word changes, you need to read and deliver the extracted bits atomically with the parent.
- Machine state transitions — when a "blender running" tag changes from 0 to 1, you might need to immediately read all associated process values (RPM, temperatures, pressures) to capture the startup snapshot.
The architecture handles this through dependent tag chains:
Parent Tag (alarm_word, interval=1s, compare=true)
├── Calculated Tag (alarm_bit_0, shift=0, mask=0x01)
├── Calculated Tag (alarm_bit_1, shift=1, mask=0x01)
├── Dependent Tag (motor_speed, read_on_change=true)
└── Dependent Tag (temperature, read_on_change=true)
When the parent tag changes, the polling thread:
- Finalizes the current batch
- Recursively reads all dependent tags (forced read, ignoring intervals)
- Starts a new batch group with the same timestamp
This ensures that the dependent values are timestamped identically with the trigger event and delivered together.
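The recursive forced read is the core of the mechanism. A sketch with a toy Tag type (the class and the read_fn callback are invented for illustration; a real tag would carry protocol addressing, not a lambda):

```python
class Tag:
    def __init__(self, name, read_fn, dependents=()):
        self.name = name
        self.read_fn = read_fn          # stands in for the PLC read
        self.dependents = list(dependents)

def read_dependents(tag, group):
    # Forced, recursive read of the dependent chain. Every value is
    # appended to the same group, so all of them share the trigger's
    # timestamp; polling intervals are deliberately ignored here.
    for dep in tag.dependents:
        group.append((dep.name, dep.read_fn()))
        read_dependents(dep, group)     # chains can nest
```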
Hourly Full-Read Reset
Change-of-value (COV) filtering dramatically reduces bandwidth, but it introduces a subtle failure mode: if a value changes during a transient read error, the gateway might never know it changed.
Consider a transient read error:
- At 10:00:00, tag value = 72.5 → transmitted
- At 10:00:01, the PLC returns an error for that tag → nothing transmitted
- At 10:00:02, tag value = 73.0 → compared against the last successful read (72.5), change detected, transmitted

This case is self-correcting: once a good read comes back, the comparison against the last known value catches the change.
The real problem is when:
- At 10:00:00, tag value = 72.5 → transmitted
- The PLC program changes the tag to 73.0 and then back to 72.5 between polling cycles
- The gateway never sees 73.0 — it polls at 10:00:00 and 10:00:01 and gets 72.5 both times
For most industrial applications, this sub-second transient is irrelevant. But to guard against drift — where small rounding differences accumulate between the gateway's cached value and the PLC's actual value — a full reset is performed every hour:
Every hour boundary (when the system clock's hour changes):
1. Clear the "read once" flag on every tag
2. Clear all last-known values
3. Force read and transmit every tag regardless of COV
This guarantees that the cloud platform has a complete snapshot of every tag value at least once per hour, even for tags that haven't changed.
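The reset itself is a simple state clear, shown here with tags as dicts for brevity (field names mirror the COV tracking described above; the function name is illustrative):

```python
def hourly_reset(tags, current_hour, last_hour) -> bool:
    # On an hour boundary, wipe COV state so every tag is force-read and
    # re-transmitted on the next polling cycle regardless of change.
    if current_hour == last_hour:
        return False
    for tag in tags:
        tag["read_once"] = False
        tag["last_value"] = None
        tag["last_status"] = None
    return True
```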
Putting It All Together: The Polling Loop
Here's the complete polling loop architecture that ties all these patterns together:
main_polling_loop():
    FOREVER:
        current_time = monotonic_clock()
        FOR each configured device:
            // Hourly reset check
            if hour(current_time) != hour(last_poll_time):
                reset_all_tags(device)
            // Start a new batch group
            start_group(device.batch, unix_timestamp())
            FOR each tag in device.tags:
                // Check if this tag needs reading now
                if not tag.read_once OR elapsed(tag.last_read) >= tag.interval:
                    value, status = read_tag(device, tag)
                    if status == LINK_ERROR:
                        set_link_state(device, DOWN)
                        break  // Stop reading this device
                    set_link_state(device, UP)
                    // COV check
                    if tag.compare AND tag.read_once:
                        if value == tag.last_value AND status == tag.last_status:
                            continue  // No change, skip
                    // Deliver value
                    if tag.do_not_batch:
                        deliver_immediately(device, tag, value)
                    else:
                        add_to_batch(device.batch, tag, value)
                    // Check dependent tags
                    if value_changed AND tag.has_dependents:
                        finalize_batch()
                        read_dependents(device, tag)
                        start_new_group()
                    // Update tracking
                    tag.last_value = value
                    tag.last_status = status
                    tag.read_once = true
                    tag.last_read = current_time
            // Finalize batch group
            stop_group(device.batch, output_buffer)
            // ↑ This checks size/time thresholds and may
            //   queue the batch into the ring buffer
        sleep(polling_interval)
On a typical industrial edge gateway (ARM Cortex-A9, 512 MB RAM, Linux):
| Operation | Time | Notes |
|---|---|---|
| Mutex lock/unlock | ~1 µs | Per buffer operation |
| Modbus TCP read (10 registers) | 5-15 ms | Network dependent |
| Modbus RTU read (10 registers) | 20-50 ms | Baud rate dependent (9600-115200) |
| EtherNet/IP tag read | 2-8 ms | CIP overhead |
| JSON batch encoding | 0.5-2 ms | 100 tags |
| Binary batch encoding | 0.1-0.5 ms | 100 tags |
| MQTT publish (QoS 1) | 1-5 ms | LAN broker |
| Buffer page write | 5-20 µs | memcpy only |
The bottleneck is always the PLC protocol reads, not the buffer or transmission logic. A gateway polling 200 Modbus TCP tags can complete a full cycle in under 200 ms, leaving plenty of headroom for a 1-second polling interval.
For Modbus RTU (serial), the bottleneck shifts to the baud rate. At 9600 baud, a single register read takes ~15 ms including response. Polling 50 registers individually would take 750 ms — too close to a 1-second interval. This is why contiguous register grouping matters: reading 50 consecutive registers in a single request takes about 50 ms, a 15× improvement.
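Grouping contiguous registers is a simple pass over the sorted address list. A sketch (the 125-register cap comes from the Modbus limit on a single Read Holding Registers request; the function name is illustrative):

```python
def group_contiguous(addresses, max_per_request=125):
    # Merge sorted register addresses into (start, count) runs, splitting
    # any run that would exceed the per-request register limit.
    groups = []  # list of [start, count]
    for addr in sorted(set(addresses)):
        if (groups
                and addr == groups[-1][0] + groups[-1][1]
                and groups[-1][1] < max_per_request):
            groups[-1][1] += 1
        else:
            groups.append([addr, 1])
    return [tuple(g) for g in groups]
```

Fifty consecutive registers collapse into a single request, which is exactly the 15× RTU improvement described above; gaps in the address map simply start a new run.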
How machineCDN Implements These Patterns
machineCDN's edge gateway uses exactly these patterns — paged ring buffers with mutex-protected page management, QoS 1 MQTT with PUBACK-based buffer advancement, and both binary and JSON encoding depending on the deployment's bandwidth constraints.
The platform's gateway daemon runs on Linux-based edge hardware (including cellular routers like the Teltonika RUT series) and handles simultaneous Modbus RTU, Modbus TCP, and EtherNet/IP connections to mixed-vendor equipment. The buffer is sized during commissioning based on the expected outage duration — a 64 KB buffer holds roughly 4 hours of data at typical polling rates; a 512 KB buffer extends that to over 24 hours.
The result: plants running machineCDN don't lose telemetry during network outages. When connectivity recovers, the buffered data drains automatically and fills in the gaps in trending charts and analytics — no manual intervention, no missing data points.
Key Takeaways
- Use paged ring buffers, not unbounded queues — fixed memory, graceful overflow (oldest data dropped first)
- Protect buffer operations with a mutex, but never hold it during network I/O — microsecond lock durations keep producers and consumers non-blocking
- Track PUBACK per-message to prevent double-sending and enable reliable buffer advancement
- Implement an MQTT watchdog using last-delivery timestamps to catch zombie connections
- Batch by size OR time (whichever hits first) to balance bandwidth and latency
- Reset all tags hourly to guarantee complete snapshots and prevent drift
- Binary encoding saves 5-8× bandwidth with zero precision loss — essential for cellular-connected gateways
- Group contiguous Modbus registers into single requests — 15× faster than individual reads on RTU
Building a reliable IIoT edge gateway is fundamentally a systems programming challenge. The protocols, the buffering, the concurrency — each one is manageable alone, but getting them all right together, on constrained hardware, with zero tolerance for data loss, is what separates toy prototypes from production infrastructure.
See machineCDN's store-and-forward buffering in action with real factory data. Request a demo to explore the platform.