
Paged Ring Buffers for Industrial MQTT: How to Never Lose a Data Point [2026]

· 10 min read

Here's the scenario every IIoT engineer dreads: your edge gateway is collecting temperature, pressure, and vibration data from 200 tags across 15 PLCs. The cellular modem on the factory roof drops its connection — maybe for 30 seconds during a handover, maybe for 4 hours because a backhoe hit a fiber line. When connectivity returns, what happens to the data?

If your answer is "it's gone," you have a buffer management problem. And fixing it properly requires understanding paged ring buffers — the unsung hero of reliable industrial telemetry.

Why Naive Buffering Fails

The simplest approach — queue MQTT messages in memory and retry on reconnect — has three fatal flaws:

  1. Memory exhaustion: A gateway reading 200 tags at 1-second intervals generates ~12,000 readings per minute. At ~100 bytes per JSON reading, that's 1.2 MB/minute. A 4-hour outage accumulates ~288 MB. Your 256 MB embedded gateway just died.

  2. No delivery confirmation: MQTT QoS 1 guarantees "at least once" delivery, but the Mosquitto client library's in-flight message queue is finite. If you publish 50,000 messages into a disconnected client, most will be silently dropped by the client library's internal buffer long before the broker sees them.

  3. Thundering herd on reconnect: When connectivity returns, dumping 288 MB of queued messages simultaneously will choke the cellular uplink (typically 1–5 Mbps), cause broker-side backpressure, and likely trigger another disconnect.
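The arithmetic behind the first flaw is worth checking directly. A quick sketch, using the constants from the scenario above:

```python
# Back-of-envelope queue growth with the numbers from the scenario above.
TAGS = 200
BYTES_PER_READING = 100          # ~100 bytes of JSON per reading

def queued_bytes(outage_seconds, interval_s=1):
    """Bytes piled up in an unbounded in-memory queue during an outage."""
    return TAGS * (outage_seconds // interval_s) * BYTES_PER_READING

print(queued_bytes(4 * 3600) / 1e6)   # 288.0 MB, more RAM than the gateway has
```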

The Paged Ring Buffer Architecture

The solution is a fixed-size, page-based circular buffer that sits between the data collection layer and the MQTT client. Here's how it works:

Memory Layout

The buffer is allocated as a single contiguous block — typically 2 MB on an embedded gateway. This block is divided into equal-sized pages, where each page can hold one complete MQTT payload.

┌─────────────────────────────────────────────────┐
│               2 MB Buffer Memory                │
├────────┬────────┬────────┬────────┬────────┬────┤
│ Page 0 │ Page 1 │ Page 2 │ Page 3 │ Page 4 │ ...│
│  4 KB  │  4 KB  │  4 KB  │  4 KB  │  4 KB  │    │
└────────┴────────┴────────┴────────┴────────┴────┘

With a 4 KB page size and a 2 MB buffer, you get 512 pages. Each page holds multiple MQTT messages packed sequentially.

Page States

Every page exists in exactly one of three states:

  • Free: Available for new data. Part of a singly-linked free list.
  • Work: Currently being filled with incoming data. Only one work page exists at a time.
  • Used: Full of data, waiting to be transmitted. Part of a singly-linked FIFO queue.
Free Pages → [P5] → [P6] → [P7] → null
Work Page  → [P3] (currently filling)
Used Pages → [P0] → [P1] → [P2] → null
               ↑ sending         waiting →

The Write Path

When a batch of PLC tag values arrives from the data collection layer:

  1. Check the work page: If there's no current work page, pop one from the free list. If the free list is empty, steal the oldest used page (overflow — we're losing old data to make room for new data, which is the correct trade-off for operational monitoring).

  2. Calculate fit: Each message is packed as: [4-byte message ID] [4-byte message size] [message payload]. Check if the current work page has enough remaining space for this overhead plus the payload.

  3. If it fits: Write the message ID (initially zero — will be filled by the MQTT client), the size, and the payload. Advance the write pointer.

  4. If it doesn't fit: Move the current work page to the tail of the used queue. Pop a new page from the free list (or steal from used queue). Write into the new page.

Page Internal Layout:
┌───────────┬───────────┬───────────┬───────────┬───────────┬───────────┐
│ msg_id_1  │ msg_sz_1  │ payload_1 │ msg_id_2  │ msg_sz_2  │ payload_2 │
│ (4 bytes) │ (4 bytes) │ (N bytes) │ (4 bytes) │ (4 bytes) │ (M bytes) │
└───────────┴───────────┴───────────┴───────────┴───────────┴───────────┘
                                                            ↑ write_p (current position)
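Putting the write-path rules together, here is a minimal Python sketch of the paging logic. Class and field names (PagedRingBuffer, work_fill, and so on) are illustrative, not the actual gateway implementation:

```python
import struct
from collections import deque

HEADER = 8  # 4-byte message ID + 4-byte message size

class PagedRingBuffer:
    """Write-path sketch: free list, one work page, used FIFO (names assumed)."""

    def __init__(self, num_pages=512, page_size=4096):
        self.page_size = page_size
        self.free = deque(bytearray(page_size) for _ in range(num_pages))
        self.work = None        # page currently being filled
        self.work_fill = 0      # write offset inside the work page
        self.used = deque()     # (page, fill) pairs waiting to be transmitted

    def _take_page(self):
        if self.free:
            self.work = self.free.popleft()
        else:
            # Overflow: steal the oldest used page (its old data is sacrificed)
            self.work, _ = self.used.popleft()
        self.work_fill = 0

    def add(self, payload: bytes):
        need = HEADER + len(payload)
        if self.work is None:
            self._take_page()
        if self.page_size - self.work_fill < need:
            # Work page can't fit this message: retire it to the used queue
            self.used.append((self.work, self.work_fill))
            self._take_page()
        # Pack [msg_id=0][size][payload]; the MQTT client fills the ID later
        struct.pack_into(">II", self.work, self.work_fill, 0, len(payload))
        self.work[self.work_fill + HEADER:self.work_fill + need] = payload
        self.work_fill += need
```

With two 64-byte pages and 28-byte messages, for example, the fifth add() steals the oldest used page, which is exactly the overflow trade-off described in step 1.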

The Send Path

The MQTT send logic runs after every write operation and follows strict rules:

  1. Check prerequisites: Connection must be up (connected == 1) AND no packet currently in-flight (packet_sent == 0). If either fails, do nothing — the data is safely buffered.

  2. Select the send source: If there are used pages, send from the first one in the FIFO. If no used pages exist but the work page has data, promote the work page to used and send from it.

  3. Read the next message from the current page's read pointer: extract the size, get the data pointer, and call mosquitto_publish() with QoS 1.

  4. Mark packet as in-flight: Set packet_sent = 1. This is critical — only one message can be in-flight at a time. This prevents the thundering herd problem and ensures ordered delivery.

  5. Wait for acknowledgment: The MQTT client library calls the publish callback when the broker confirms receipt (PUBACK for QoS 1). Only then do we advance the read pointer and send the next message.

The Acknowledgment Path

When the Mosquitto library fires the on_publish callback with a packet ID:

  1. Verify the ID matches the in-flight message on the current used page
  2. Advance the read pointer past the delivered message (skip message ID + size + payload bytes)
  3. Check if page is fully delivered: If read_p >= write_p, move the page back to the free list
  4. Clear the in-flight flag: Set packet_sent = 0
  5. Immediately attempt to send the next message — this creates a natural flow control where messages are delivered as fast as the broker can acknowledge them
Delivery Flow:

               publish()
[Used Page] ─────────────────→ [MQTT Broker]
     ↑                              │
     │             PUBACK           │
     └──────────────────────────────┘
       advance read_p, try next
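The send and acknowledgment rules condense into a small state machine. This is an illustrative Python sketch with a stub FIFO standing in for the used-page queue; in the real gateway, publish would wrap mosquitto_publish() at QoS 1 and on_puback would be the library's publish callback:

```python
class FifoStub:
    """Stand-in for the used-page FIFO; real code reads from page memory."""
    def __init__(self, msgs):
        self.msgs, self.i = list(msgs), 0
    def peek_next(self):
        return self.msgs[self.i] if self.i < len(self.msgs) else None
    def advance(self):
        self.i += 1

class Sender:
    """Send/ack rules from the text: one in-flight message, ack-paced."""
    def __init__(self, buffer, publish):
        self.buffer = buffer
        self.publish = publish      # e.g. wraps mosquitto_publish() at QoS 1
        self.connected = False
        self.packet_sent = False    # at most one packet in flight

    def try_send_next(self):
        if not self.connected or self.packet_sent:
            return                  # data stays safely buffered
        msg = self.buffer.peek_next()
        if msg is not None:
            self.publish(msg)
            self.packet_sent = True

    def on_puback(self):            # broker confirmed receipt
        self.buffer.advance()       # advance read pointer past delivered msg
        self.packet_sent = False
        self.try_send_next()        # chain the next send immediately

sent = []
s = Sender(FifoStub([b"batch-1", b"batch-2"]), sent.append)
s.connected = True
s.try_send_next()   # sends batch-1; batch-2 waits for the PUBACK
s.on_puback()       # ack arrives: batch-2 goes out
```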

Thread Safety: The Mutex Dance

In a real gateway, data collection and MQTT delivery run on different threads. The PLC polling loop writes data every second, while the Mosquitto client library fires callbacks from its own network thread. Every buffer operation — add, send, acknowledge, connect, disconnect — must be wrapped in a mutex:

// Data collection thread:
mutex_lock(buffer)
add_data(payload)
try_send_next() // opportunistic send
mutex_unlock(buffer)

// MQTT callback thread:
mutex_lock(buffer)
mark_delivered(packet_id)
try_send_next() // chain next send
mutex_unlock(buffer)

The key insight is that try_send_next() is called from both threads — after every write (in case we're connected and idle) and after every acknowledgment (to chain the next message). This ensures maximum throughput without busy-waiting.

Handling Disconnects Gracefully

When the MQTT connection drops, two things happen:

  1. The disconnect callback fires: Set connected = 0 and packet_sent = 0. The in-flight message is NOT lost — it's still in the page at the current read pointer. When connectivity returns, it will be re-sent.

  2. Data keeps flowing in: The PLC polling loop doesn't stop. New data continues to fill pages. The used queue grows. If it fills all available pages, new pages will steal from the oldest used pages — but this only happens under extreme sustained outages.

When the connection re-establishes:

  1. The connect callback fires: Set connected = 1 and trigger try_send_next()
  2. Buffered data starts flowing: Messages are delivered in FIFO order, one at a time, with acknowledgment pacing

This means the broker receives data in chronological order, with timestamps embedded in each batch. Analytics systems downstream can seamlessly handle the gap — they see a burst of historical data followed by real-time data, all correctly timestamped.

The Cloud Watchdog: Detecting Silent Failures

There's a subtle failure mode: the MQTT connection appears healthy (no disconnect callback), but data isn't actually being delivered. This can happen with certain TLS middlebox issues, stale TCP connections that haven't timed out, or Azure IoT Hub token expirations.

The solution is a delivery watchdog:

  1. Track the timestamp of the last successful packet delivery
  2. On a periodic check (every 120 seconds), compare the current time against the last delivery timestamp
  3. If no data has been delivered in 120 seconds AND the connection claims to be up, force a reconnection:
    • Reset the MQTT configuration timestamp (triggers config reload)
    • Clear the watchdog timer
    • The main loop will detect the stale configuration and restart the MQTT client
if (now - last_delivery_time > 120s) AND (connected) {
    log("No data delivered in 120s — forcing MQTT reconnect")
    force_mqtt_restart()
}

This catches the "zombie connection" problem that plagues many IIoT deployments — the gateway thinks it's sending, but nothing is actually arriving at the cloud.

Binary vs. JSON: The Bandwidth Trade-off

The paged buffer doesn't care about the payload format — it stores raw bytes. But the choice between JSON and binary encoding has massive implications for buffer utilization:

JSON payload for one tag reading:

{"id":42,"values":[23.7],"ts":1709337600}

~45 bytes per reading.

Binary payload for the same reading:

Tag ID:    2 bytes (uint16)
Status:    1 byte
Value Cnt: 1 byte
Value Sz:  1 byte
Value:     4 bytes (float32)
─────────────────────────────
Total:     9 bytes per reading

That's a 5x reduction. With batching (multiple readings per batch header), the per-reading overhead drops further because the timestamp and device identity are shared across a group of values.
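The size difference is easy to verify with Python's struct module (field layout as in the table above; the exact JSON byte count depends on formatting):

```python
import json
import struct

reading = {"id": 42, "values": [23.7], "ts": 1709337600}
as_json = json.dumps(reading, separators=(",", ":")).encode()

# Binary layout from above: tag id (u16), status, value count, value size, float32
as_binary = struct.pack(">HBBBf", 42, 0, 1, 4, 23.7)

print(len(as_json), len(as_binary))  # 41 9
```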

On a cellular connection billing per megabyte, this isn't academic — it's the difference between $15/month and $75/month per gateway. On satellite connections (Iridium, Starlink maritime), it can be $50 vs. $250.

Binary Batch Wire Format

A binary batch on the wire follows this structure:

[0xF7]               — 1 byte, magic/version marker
[num_groups]         — 4 bytes, big-endian uint32
For each group:
  [timestamp]        — 4 bytes, big-endian time_t
  [device_type]      — 2 bytes, big-endian uint16
  [serial_number]    — 4 bytes, big-endian uint32
  [num_values]       — 4 bytes, big-endian uint32
  For each value:
    [tag_id]         — 2 bytes, big-endian uint16
    [status]         — 1 byte (0 = OK, else error code)
    If status == 0:
      [values_count] — 1 byte
      [value_size]   — 1 byte (1, 2, or 4)
      [values...]    — values_count × value_size bytes

A batch of 50 tag readings fits in ~600 bytes binary versus ~3,000 bytes JSON. Over a 4-hour outage with 200 tags at 60-second intervals (48,000 readings), that's the difference between buffering roughly 0.6 MB (binary) and 2.9 MB (JSON): comfortably inside a 2 MB gateway buffer in one case, overflowing it in the other.
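As an illustration, here is a hypothetical Python encoder for that wire format, simplified to one value per tag reading. It is a sketch of the layout above, not machineCDN's actual encoder:

```python
import struct

def encode_batch(groups):
    """Encode groups per the layout above, simplified to one value per reading.
    groups: [(timestamp, device_type, serial, [(tag_id, status, value_bytes)])]"""
    out = bytearray([0xF7])                         # magic/version marker
    out += struct.pack(">I", len(groups))           # num_groups
    for ts, dev_type, serial, values in groups:
        out += struct.pack(">IHII", ts, dev_type, serial, len(values))
        for tag_id, status, raw in values:
            out += struct.pack(">HB", tag_id, status)
            if status == 0:                         # only OK readings carry data
                out += struct.pack(">BB", 1, len(raw))  # values_count, value_size
                out += raw
    return bytes(out)

batch = encode_batch([(1709337600, 1017, 123456,
                       [(42, 0, struct.pack(">f", 23.7))])])
# 28 bytes for one reading; per-reading cost shrinks as readings share a group
```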

Sizing Your Buffer: The Math

For a given deployment, calculate your buffer needs:

Tags: 200
Read interval: 60 seconds
Binary payload per reading: ~9 bytes
Readings per minute: 200
Bytes per minute: 200 × 9 = 1,800 bytes
With batch overhead (~15 bytes per group): ~1,815 bytes/min

Buffer size: 2 MB = 2,097,152 bytes
Retention: 2,097,152 / 1,815 = ~1,155 minutes = ~19.2 hours

So a 2 MB buffer can hold approximately 19 hours of data for 200 tags at 60-second intervals using binary encoding. With JSON, that drops to ~3.8 hours. Size your buffer accordingly.
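The same calculation as a small helper, so you can plug in your own tag counts and intervals (the ~15-byte group overhead per minute is the assumption from the worked example above):

```python
def retention_hours(buffer_bytes, tags, interval_s, bytes_per_reading,
                    group_overhead=15):
    """Hours of telemetry a buffer holds; one batch group per minute assumed."""
    bytes_per_minute = tags * (60 / interval_s) * bytes_per_reading + group_overhead
    return buffer_bytes / bytes_per_minute / 60

binary_h = retention_hours(2 * 1024 * 1024, 200, 60, 9)    # ~19 hours
json_h = retention_hours(2 * 1024 * 1024, 200, 60, 45)     # ~3.9 hours
```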

What machineCDN Does Differently

machineCDN's edge gateway implements this paged ring buffer architecture natively. Every gateway shipped includes:

  • Fixed 2 MB paged buffer with configurable page sizes matching the MQTT broker's maximum packet size
  • Automatic binary encoding for all telemetry — 5x bandwidth reduction over JSON
  • Single-message flow control with QoS 1 acknowledgment tracking — no thundering herd on reconnect
  • 120-second delivery watchdog that detects zombie connections and forces reconnect
  • Graceful overflow handling — when buffer fills, oldest data is recycled (not newest), preserving the most recent operational state

For plant engineers, this means deploying a gateway on a cellular connection and knowing that a connectivity outage — whether 30 seconds or 12 hours — won't result in lost data. The buffer holds, the watchdog monitors, and data flows in order when the link comes back.

Key Takeaways

  1. Never use unbounded queues for industrial telemetry buffering — use fixed-size paged buffers that degrade gracefully under memory pressure
  2. One message in-flight at a time prevents the thundering herd problem and ensures ordered delivery
  3. Always track delivery acknowledgments — don't just publish and forget; verify the broker received each packet before advancing
  4. Implement a delivery watchdog — silent MQTT failures are harder to detect than disconnects
  5. Use binary encoding — 5x bandwidth reduction means 5x longer buffer retention on the same memory
  6. Size for your worst outage — calculate how much buffer you need based on tag count, interval, and the longest connectivity gap you expect
  7. Thread safety is non-negotiable — data collection and MQTT delivery run concurrently; every buffer operation needs mutex protection

The paged ring buffer isn't exotic computer science — it's a practical engineering pattern that's been battle-tested in thousands of industrial deployments. The difference between a prototype IIoT system and a production one often comes down to exactly this kind of infrastructure.

Best Industrial Edge Gateway Software 2026: Connect Your Factory Floor to the Cloud

· 8 min read
MachineCDN Team
Industrial IoT Experts

The edge gateway is where the factory floor meets the cloud. It's the device — and the software running on it — that reads data from your PLCs, sensors, and controllers, processes it locally, and transmits it to cloud platforms for analytics, monitoring, and predictive maintenance.

In 2026, the edge gateway market has matured significantly. You no longer need to build custom solutions with Raspberry Pis and Python scripts. Purpose-built industrial edge gateway software handles protocol translation, data buffering, security, and cloud connectivity out of the box. But the options range from bare-metal connectivity platforms to full IIoT suites that include the edge as just one component.

Here's how to evaluate edge gateway software and which platforms deliver the most value for manufacturing environments.

Modbus Address Mapping Demystified: Register Ranges, Function Codes, and Sorted-Tag Optimization [2026]

· 10 min read

If you've ever stared at a PLC register map and wondered why address 400001 is actually register 0, or why your gateway reads the wrong data when you mix holding registers with input registers in the same request — this article is for you.

Modbus addressing is one of the most misunderstood aspects of industrial communication. The protocol itself is simple. The addressing conventions built on top of it are where engineers lose hours to debugging. And the optimization strategies for reading registers efficiently can cut your polling time in half.

Let's break it all down.

Modbus register address mapping and function codes

The Four Register Ranges

Modbus defines four distinct data spaces, each with its own addressing convention and access characteristics:

Range              Address Convention      Read FC   Write FC   Data Type     Access
Coils              0xxxxx (0–65535)        FC 01     FC 05/15   Single bit    Read/Write
Discrete Inputs    1xxxxx (100000–165535)  FC 02     —          Single bit    Read only
Input Registers    3xxxxx (300000–365535)  FC 04     —          16-bit word   Read only
Holding Registers  4xxxxx (400000–465535)  FC 03     FC 06/16   16-bit word   Read/Write

The Great Addressing Confusion

Here's the source of 90% of Modbus debugging pain: the convention addresses include a prefix digit that doesn't exist in the actual protocol.

When you see "address 400001" in a PLC manual, the actual register address sent over the wire is 0 (zero). The "4" prefix tells you it's a holding register (use FC 03), and the remaining digits are 1-indexed, so you subtract 1.

Convention Address → Wire Address
400001 → Holding Register 0 (FC 03)
400100 → Holding Register 99 (FC 03)
300001 → Input Register 0 (FC 04)
000001 → Coil 0 (FC 01)
100001 → Discrete Input 0 (FC 02)

Some PLC manufacturers use 0-indexed conventions (Modicon style), while others use 1-indexed (common in building automation). Always verify with the actual PLC documentation. Getting this wrong by one register means you're reading the wrong variable — which might look plausible but be subtly incorrect, leading to phantom issues that take days to diagnose.

Automatic Function Code Selection

A well-designed gateway should automatically determine the correct Modbus function code from the register address, eliminating manual configuration errors:

Address Range     → Function Code
0      – 65535    → FC 01 (Read Coils)
100000 – 165535   → FC 02 (Read Discrete Inputs)
300000 – 365535   → FC 04 (Read Input Registers)
400000 – 465535   → FC 03 (Read Holding Registers)

This means the engineer configuring the gateway only needs to specify the convention address from the PLC manual. The gateway strips the prefix, determines the function code, and calculates the wire address automatically.

To extract the wire address from the convention address:

if address >= 400000:
    wire_address = address - 400000
    function_code = 3
elif address >= 300000:
    wire_address = address - 300000
    function_code = 4
elif address >= 100000:
    wire_address = address - 100000
    function_code = 2
else:
    wire_address = address
    function_code = 1

16-Bit vs. 32-Bit Values: The Element Count Problem

Each Modbus register holds exactly 16 bits. But many real-world values are 32-bit (floats, unsigned integers, large counters). This requires reading two consecutive registers and combining them.

Element Count Configuration

When configuring a tag, you specify the element count — how many 16-bit registers to read for this tag:

Data Type          Element Count   Registers Read
bool               1               1 register (only LSB used)
int8 / uint8       1               1 register (masked to 8 bits)
int16 / uint16     1               1 register
int32 / uint32     2               2 consecutive registers
float (IEEE 754)   2               2 consecutive registers

Byte Ordering (The Endianness Trap)

When combining two 16-bit registers into a 32-bit value, the byte order matters — and different PLCs use different conventions:

Big-endian (Modbus standard): Register N = high word, Register N+1 = low word

uint32_value = (register[N] << 16) | register[N+1]

Little-endian (some PLCs): Register N = low word, Register N+1 = high word

uint32_value = (register[N+1] << 16) | register[N]

For IEEE 754 floating-point values, the situation is even trickier. The modbus_get_float() function in libmodbus handles the byte swapping, but you need to know your PLC's byte order. Common byte orderings for 32-bit floats:

Order   Name               Typical PLCs
ABCD    Big-endian         Most common in Modicon/Schneider
DCBA    Little-endian      Some Allen-Bradley
BADC    Mid-big-endian     Siemens S7
CDAB    Mid-little-endian  Some Japanese PLCs

Pro tip: If you're getting nonsensical float values (like 1.4e38 when you expect 72.5), you almost certainly have a byte-order mismatch. Swap the register order and try again.
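A small Python helper makes the four orderings concrete. This is an illustrative decoder; with libmodbus you would typically use the modbus_get_float_abcd/_dcba/_badc/_cdab family instead:

```python
import struct

def regs_to_float(r0, r1, order="ABCD"):
    """Combine two 16-bit registers into an IEEE 754 float for the four
    common word/byte orders (helper name is illustrative)."""
    raw = struct.pack(">HH", r0, r1)   # bytes A B C D as read off the wire
    a, b, c, d = raw
    layout = {"ABCD": (a, b, c, d),    # big-endian
              "DCBA": (d, c, b, a),    # little-endian
              "BADC": (b, a, d, c),    # mid-big-endian
              "CDAB": (c, d, a, b)}[order]
    return struct.unpack(">f", bytes(layout))[0]

regs_to_float(0x41BD, 0x999A, "ABCD")  # ≈ 23.7
```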

Handling 8-Bit Values from 16-Bit Registers

When a PLC stores an 8-bit value (bool, int8, uint8) in a 16-bit register, the value sits in the lower byte:

register_value = 0x00FF  # Only lower 8 bits are the value
int8_value = register_value & 0xFF
bool_value = register_value & 0x01

This is straightforward for single registers, but gets interesting when you're reading coils (FC 01/02). Coil reads return packed bits — 8 coils per byte — which need to be unpacked into individual boolean values.
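Unpacking those packed bits is short but easy to get backwards (the first coil is the least significant bit of the first byte). A sketch:

```python
def unpack_coils(payload, count):
    """Unpack an FC 01/02 response payload: 8 coils per byte, LSB first."""
    return [bool(payload[i // 8] >> (i % 8) & 1) for i in range(count)]

unpack_coils(b"\x05", 4)  # [True, False, True, False]
```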

Sorted-Tag Optimization: Contiguous Register Grouping

This is where theory meets performance. A naive implementation reads each tag individually:

# Naive: 10 tags = 10 separate Modbus requests
read_register(400100) # Temperature
read_register(400101) # Pressure
read_register(400102) # Flow rate
read_register(400103) # Setpoint
...

Each Modbus request has overhead: a request frame (~8 bytes), a response frame (~8 bytes + data), and a round-trip delay (typically 5-50ms on a LAN, 50-400ms on serial). Ten individual reads means 10× the overhead.

The Contiguous Grouping Algorithm

Instead, a smart gateway sorts tags by address and groups contiguous registers into single multi-register reads:

# Optimized: 10 contiguous tags = 1 Modbus request
read_registers(400100, count=10) # All 10 values in one shot

The algorithm works in four steps:

Step 1: Sort tags by address. When the configuration is loaded, tags are inserted into a sorted linked list ordered by their register address. This is critical — without sorting, you can't detect contiguous ranges.

Step 2: Identify contiguous groups. Walk the sorted list and group tags that satisfy ALL of these conditions:

  • Same function code (same register type)
  • Addresses are contiguous (tag N+1 starts where tag N ends)
  • Same polling interval
  • Total register count doesn't exceed the protocol limit
# Grouping logic
head = first_tag
registers = head.element_count
tags_in_group = 1

for each subsequent tag:
    if (tag.function_code == head.function_code
            AND head.address + registers == tag.address
            AND head.interval == tag.interval
            AND registers < MAX_REGISTERS):
        # Attach to current group
        registers += tag.element_count
        tags_in_group += 1
    else:
        # Read current group, start a new one
        read_registers(head.address, registers)
        head = tag
        registers = tag.element_count
        tags_in_group = 1

Step 3: Respect the maximum register limit. The Modbus spec allows up to 125 registers per read request (FC 03/04) or 2000 coils per read (FC 01/02). In practice, many PLCs have lower limits. A safe ceiling is 50 registers per request — this keeps response sizes under 100 bytes, reducing the chance of packet fragmentation and timeouts.

Step 4: Dispatch values. After the multi-register read returns, walk the buffer and dispatch values to individual tags based on their position in the group.
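The four steps reduce to a compact planning function. This Python sketch only plans the read requests (the tag tuple shape and MAX_REGISTERS = 50 are assumptions for illustration); dispatching values back to tags would follow each read:

```python
MAX_REGISTERS = 50   # conservative per-request cap, as discussed in Step 3

def group_tags(tags):
    """Plan contiguous multi-register reads from a sorted tag list.
    Each tag is (function_code, address, element_count, interval)."""
    groups = []
    head_fc = head_addr = head_iv = None
    registers = 0
    for fc, addr, ecount, interval in tags:
        if (head_fc == fc and head_addr + registers == addr
                and head_iv == interval
                and registers + ecount <= MAX_REGISTERS):
            registers += ecount                    # extend the current group
        else:
            if head_fc is not None:                # close out the previous group
                groups.append((head_fc, head_addr, registers))
            head_fc, head_addr, head_iv = fc, addr, interval
            registers = ecount
    if head_fc is not None:
        groups.append((head_fc, head_addr, registers))
    return groups

group_tags([(3, 99, 2, 5), (3, 101, 2, 5), (3, 200, 1, 1)])
# → [(3, 99, 4), (3, 200, 1)]  two reads instead of three
```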

Performance Impact

On a typical Modbus TCP network with 5ms round-trip time:

Tags   Naive Approach   Grouped Approach   Speedup
10     50ms             5ms                10×
50     250ms            25ms (5 groups)    10×
100    500ms            50ms (10 groups)   10×

On Modbus RTU at 9600 baud, the difference is even more dramatic:

Tags   Naive     Grouped   Speedup
10     800ms     120ms     6.7×
50     4000ms    600ms     6.7×

For a gateway polling 50 tags every second, naive reads won't even fit in the time budget on serial. Grouping makes it feasible.

Handling Gaps

What about non-contiguous registers? If tags at addresses 400100 and 400105 need reading, you have two choices:

  1. Read the gap: Request registers 400100–400105 (6 registers), discard the 4 unused ones. Wastes bandwidth but saves a round-trip.
  2. Split into two reads: Two separate requests for 400100 (1 reg) and 400105 (1 reg). Two round-trips but no wasted data.

The breakeven point depends on your network. For gaps of 3 registers or fewer, reading the gap is usually faster. For larger gaps, split. A good heuristic:

if gap_size <= 3:
    read_through_gap()
else:
    split_into_separate_reads()

Inter-Read Delays: The 50ms Rule

After each contiguous group read, insert a small delay (typically 50ms) before the next read. This serves two purposes:

  1. PLC processing time: Some PLCs need time between successive requests to maintain their scan cycle. Hammering them with back-to-back reads can cause watchdog timeouts.
  2. Serial line recovery: On RS-485, the bus needs time to switch direction between request and response. Without this gap, you risk frame collisions on noisy lines.
read_group_1()
sleep(50ms) # Let the PLC breathe
read_group_2()
sleep(50ms)
read_group_3()

This 50ms penalty per group is why minimizing the number of groups (through contiguous addressing) matters so much.

Change Detection: Read vs. Deliver

Reading a tag and delivering its value to the cloud are two separate decisions. Efficient gateways implement change detection to avoid delivering unchanged values:

if tag.compare_enabled:
    if new_value == tag.last_value:
        # Value unchanged — don't deliver
        update_read_timestamp()
        continue
    else:
        # Value changed — deliver and update
        deliver(tag, new_value)
        tag.last_value = new_value

Combined with interval-based polling, this creates a two-tier optimization:

  • Interval: Don't read the tag if less than N seconds have elapsed since the last read
  • Comparison: Don't deliver the value if it hasn't changed since the last delivery

The result: your MQTT bandwidth is dominated by actual state changes, not redundant repetitions of the same value.

Practical Configuration Example

Here's how a well-configured Modbus device might look in JSON:

{
  "protocol": "modbus-tcp",
  "device_type": 1017,
  "plctags": [
    {
      "name": "mold_temp_actual",
      "id": 1,
      "type": "float",
      "addr": 400100,
      "ecount": 2,
      "interval": 5,
      "compare": true
    },
    {
      "name": "mold_temp_setpoint",
      "id": 2,
      "type": "float",
      "addr": 400102,
      "ecount": 2,
      "interval": 60,
      "compare": true
    },
    {
      "name": "pump_running",
      "id": 3,
      "type": "bool",
      "addr": 10,
      "ecount": 1,
      "interval": 1,
      "compare": true,
      "do_not_batch": true
    },
    {
      "name": "alarm_word",
      "id": 4,
      "type": "uint16",
      "addr": 400200,
      "ecount": 1,
      "interval": 1,
      "compare": true,
      "do_not_batch": true
    }
  ]
}

Notice:

  • The two temperature tags (400100, 400102) are contiguous and will be read as one 4-register block
  • They have different intervals (5s vs 60s), so they'll only group when both are due
  • The alarm word uses do_not_batch: true — it's delivered immediately on change, not held for the next batch
  • The pump running tag reads a coil (address < 100000), so it uses FC 01 — it can't group with the holding registers

How machineCDN Optimizes Modbus Polling

machineCDN's edge daemon automatically sorts tags by address at configuration load time, groups contiguous registers with matching intervals and function codes, and caps each read at 50 registers to prevent timeouts on older PLCs. The firmware handles the address-to-function-code mapping transparently — engineers configure tags using the convention addresses from the PLC manual, and the gateway handles the rest.

For devices with mixed protocols (e.g., a machine with EtherNet/IP on the main PLC and Modbus RTU on the temperature controller), machineCDN runs independent polling loops per protocol, each with its own connection management and buffering — so a failure on the serial line doesn't affect the EtherNet/IP connection.

Conclusion

Modbus addressing doesn't have to be painful. The key takeaways:

  1. Understand the four address ranges and how they map to function codes — this eliminates the #1 source of configuration errors
  2. Sort tags by address at configuration time to enable contiguous grouping
  3. Group contiguous registers into single multi-register reads — the performance improvement is 5–10× on typical deployments
  4. Handle 32-bit values carefully — element count and byte ordering are the two most common float-reading bugs
  5. Cap register counts at 50 per read to stay within PLC capabilities
  6. Use change detection to minimize cloud bandwidth — only deliver values that actually changed
  7. Insert 50ms delays between group reads to respect PLC processing requirements

Master these patterns, and you'll spend your time analyzing production data instead of debugging communication failures.

PLC Connection Resilience: Link-State Monitoring and Automatic Recovery for IIoT Gateways [2026]

· 9 min read

In any industrial IIoT deployment, the connection between your edge gateway and the PLC is the most critical — and most fragile — link in the data pipeline. Ethernet cables get unplugged during maintenance. Serial lines pick up noise from VFDs. PLCs go into fault mode and stop responding. Network switches reboot.

If your edge software can't detect these failures, recover gracefully, and continue collecting data once the link comes back, you don't have a monitoring system — you have a monitoring hope.

This guide covers the real-world engineering patterns for building resilient PLC connections, drawn from years of deploying gateways on factory floors where "the network just works" is a fantasy.

PLC connection resilience and link-state monitoring

Why Connection Resilience Isn't Optional

Consider what happens when a Modbus TCP connection silently drops:

  • No timeout configured? Your gateway hangs on a blocking read forever.
  • No reconnection logic? You lose all telemetry until someone manually restarts the service.
  • No link-state tracking? Your cloud dashboard shows stale data as if the machine is still running — potentially masking a safety-critical failure.

In a 2024 survey of manufacturing downtime causes, 17% of IIoT data gaps were attributed to gateway-to-PLC communication failures that weren't detected for hours. The machines were fine. The monitoring was blind.

The foundation of connection resilience is treating the PLC connection as a state machine with explicit transitions:

┌──────────────┐      connect()      ┌───────────┐
│              │ ──────────────────► │           │
│ DISCONNECTED │                     │ CONNECTED │
│  (state=0)   │ ◄────────────────── │ (state=1) │
│              │   error detected    │           │
└──────────────┘                     └───────────┘

Every time the link state changes, the gateway should:

  1. Log the transition with a precise timestamp
  2. Deliver a special link-state tag upstream so the cloud platform knows the device is offline
  3. Suppress stale data delivery — never send old values as if they're fresh
  4. Trigger reconnection logic appropriate to the protocol

One of the most powerful patterns is treating link state as a virtual tag with its own ID — distinct from any physical PLC tag. When the connection drops, the gateway immediately publishes:

{
  "tag_id": "0x8001",
  "type": "bool",
  "value": false,
  "timestamp": 1709395200
}

When it recovers:

{
  "tag_id": "0x8001",
  "type": "bool",
  "value": true,
  "timestamp": 1709395260
}

This gives the cloud platform (and downstream analytics) an unambiguous signal. Dashboards can show a "Link Down" banner. Alert rules can fire. Downtime calculations can account for monitoring gaps vs. actual machine downtime.

The link-state tag should be delivered outside the normal batch — immediately, with QoS 1 — so it arrives even if the regular telemetry buffer is full.

Protocol-Specific Failure Detection

Modbus TCP

Modbus TCP connections fail in predictable ways. The key errors that indicate a lost connection:

Error          Meaning                                Action
ETIMEDOUT      Response never arrived                 Close + reconnect
ECONNRESET     PLC reset the TCP connection           Close + reconnect
ECONNREFUSED   PLC not listening on port 502          Close + retry after delay
EPIPE          Broken pipe (write to closed socket)   Close + reconnect
EBADF          File descriptor invalid                Destroy context + rebuild

When any of these occur, the correct sequence is:

  1. Call flush() to clear any pending data in the socket buffer
  2. Close the Modbus context
  3. Set the link state to disconnected
  4. Deliver the link-state tag
  5. Wait before reconnecting (back-off strategy)
  6. Re-create the TCP context and reconnect

Critical detail: After a connection failure, you should flush the serial/TCP buffer before attempting reads. Stale bytes in the buffer will cause desynchronization — the gateway reads the response to a previous request and interprets it as the current one, producing garbage data.

# Pseudocode — Modbus TCP recovery sequence
on_read_error(errno):
    modbus_flush(context)
    modbus_close(context)
    link_state = DISCONNECTED
    deliver_link_state(0)

    # Don't reconnect immediately — the PLC might be rebooting
    sleep(5 seconds)

    result = modbus_connect(context, ip, port)
    if result == OK:
        link_state = CONNECTED
        deliver_link_state(1)
        force_read_all_tags()  # Re-read everything to establish baseline

Modbus RTU (Serial)

Serial connections have additional failure modes that TCP doesn't:

  • Baud rate mismatch after PLC firmware update
  • Parity errors from electrical noise (especially near VFDs or welding equipment)
  • Silence on the line — device powered off or address conflict

For Modbus RTU, timeout tuning is critical:

  • Byte timeout: How long to wait between characters within a frame (typically 50ms)
  • Response timeout: How long to wait for the complete response after sending a request (typically 400ms for serial, can go lower for TCP)

If the response timeout is too short, you'll get false disconnections on slow PLCs. Too long, and a genuine failure takes forever to detect. For most industrial environments:

Byte timeout: 50ms (adjust for baud rates below 9600)
Response timeout: 400ms for RTU, 2000ms for TCP

After any RTU failure, flush the serial buffer. Serial buffers accumulate noise bytes during disconnections, and these will corrupt the first valid response after reconnection.

EtherNet/IP (CIP)

EtherNet/IP connections through the CIP protocol have a different failure signature. The libplctag library (commonly used for Allen-Bradley Micro800 and CompactLogix PLCs) returns specific error codes:

  • Error -32: Gateway cannot reach the PLC. This is the most common failure — it means the TCP connection to the gateway succeeded, but the CIP path to the PLC is broken.
  • Negative tag handle on create: The tag path is wrong, or the PLC program was downloaded with different tag names.

For EtherNet/IP, a smart approach is to count consecutive -32 errors and break the reading cycle after a threshold (typically 3 attempts):

# Stop hammering a dead connection
if consecutive_error_32_count >= MAX_ATTEMPTS:
    set_link_state(DISCONNECTED)
    break_reading_cycle()
    wait_and_retry()

This prevents the gateway from spending its entire polling cycle sending requests to a PLC that clearly isn't responding, which would delay reads from other devices on the same gateway.

Contiguous Read Failure Handling

When reading multiple Modbus registers in a contiguous block, a single failure takes out the entire block. The gateway should:

  1. Attempt up to 3 retries for the same register block before declaring failure
  2. Report failure status per-tag — each tag in the block gets an error status, not just the block head
  3. Only deliver error status on state change — if a tag was already in error, don't spam the cloud with repeated error messages

# Retry logic for contiguous Modbus reads
read_count = 3
do:
    result = modbus_read_registers(start_addr, count, buffer)
    read_count -= 1
while (result != count) AND (read_count > 0)

if result != count:
    # All retries failed — mark entire block as error
    for each tag in block:
        if tag.last_status != ERROR:
            deliver_error(tag)
            tag.last_status = ERROR

The Hourly Reset Pattern

Here's a pattern that might seem counterintuitive: force-read all tags every hour, regardless of whether values changed.

Why? Because in long-running deployments, subtle drift accumulates:

  • A tag value might change during a brief disconnection and the change is missed
  • The PLC program might be updated with new initial values
  • Clock drift between the gateway and cloud can create gaps in time-series data

The hourly reset works by comparing the current system hour to the hour of the last reading. When the hour changes, all tags have their "read once" flag reset, forcing a complete re-read:

current_hour = localtime(now).hour
previous_hour = localtime(last_reading_time).hour

if current_hour != previous_hour:
    reset_all_tags()  # Clear the "read once" flag on every tag
    log("Force reading all tags — hourly reset")

This creates natural "checkpoints" in your time-series data. If you ever need to verify that the gateway was functioning correctly at a given time, you can look for these hourly full-read batches.

Buffered Delivery: Surviving MQTT Disconnections

The PLC connection is only half the story. The other critical link is between the gateway and the cloud (typically over MQTT). When this link drops — cellular blackout, broker maintenance, DNS failure — you need to buffer data locally.

A well-designed telemetry buffer uses a page-based architecture:

┌───────────┐ ┌───────────┐ ┌───────────┐ ┌───────────┐
│   Free    │ │   Work    │ │   Used    │ │   Used    │
│   Page    │ │   Page    │ │  Page 1   │ │  Page 2   │
│           │ │ (writing) │ │ (queued)  │ │ (sending) │
└───────────┘ └───────────┘ └───────────┘ └───────────┘
  • Work page: Currently being written to by the tag reader
  • Used pages: Full pages queued for MQTT delivery
  • Free pages: Delivered pages recycled for reuse
  • Overflow: When free pages run out, the oldest used page is sacrificed (data loss, but the system keeps running)

Each page tracks the MQTT packet ID assigned to its publish. When the broker confirms delivery (PUBACK for QoS 1), the page is moved to the free list. If the connection drops mid-delivery, the packet_sent flag is cleared, and delivery resumes from the same position when the connection recovers.
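The free/work/used lifecycle can be sketched in a few dozen lines. This is an illustrative Python model only — a real gateway would manage pre-allocated C memory and track per-page MQTT packet IDs; the `PagedRingBuffer` class, its method names, and the index-based page scheme are all hypothetical:

```python
from collections import deque

class PagedRingBuffer:
    """Sketch of the free/work/used page cycle (hypothetical API)."""

    def __init__(self, num_pages: int, page_size: int):
        self.page_size = page_size
        self.pages = [bytearray() for _ in range(num_pages)]
        self.free = deque(range(num_pages - 1))  # page indices ready for writing
        self.work = num_pages - 1                # index of the page being written
        self.used = deque()                      # full pages queued for delivery
        self.dropped_pages = 0

    def write(self, record: bytes) -> None:
        page = self.pages[self.work]
        if len(page) + len(record) > self.page_size:
            self._rotate()
            page = self.pages[self.work]
        page.extend(record)

    def _rotate(self) -> None:
        self.used.append(self.work)
        if self.free:
            self.work = self.free.popleft()
        else:
            # Overflow: sacrifice the oldest queued page so writing never stops
            self.work = self.used.popleft()
            self.dropped_pages += 1
        self.pages[self.work].clear()

    def next_to_send(self):
        """Index of the oldest full page, or None if nothing is queued."""
        return self.used[0] if self.used else None

    def ack(self) -> None:
        """Broker confirmed delivery (PUBACK): recycle the page."""
        idx = self.used.popleft()
        self.pages[idx].clear()
        self.free.append(idx)
```

Note the overflow path: when the free list is empty, the oldest used page is recycled as the new work page — the newest data always wins.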

Buffer sizing rule of thumb: At least 3 pages, each sized to hold 60 seconds of telemetry data. For a typical 50-tag device polling every second with compact binary encoding, that's roughly 4KB per page. A 64KB buffer gives you ~16 pages — enough to survive a 15-minute connectivity gap.
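The rule of thumb generalizes to a small back-of-envelope helper. The per-reading byte size is an assumption you must measure for your own payload format:

```python
import math

def buffer_plan(tag_count, poll_interval_s, bytes_per_reading,
                page_seconds, outage_seconds):
    """Back-of-envelope buffer sizing (bytes_per_reading is an assumption)."""
    rate = tag_count / poll_interval_s * bytes_per_reading  # telemetry bytes/sec
    page_size = math.ceil(rate * page_seconds)              # one page holds page_seconds of data
    pages = math.ceil(outage_seconds / page_seconds) + 2    # + work page + one spare
    return page_size, pages, page_size * pages
```

For example, 50 tags at 1-second polls with ~10-byte binary records, 60-second pages, and a 15-minute outage target yields 30KB pages and 17 pages total.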

Practical Deployment Checklist

Before deploying a gateway to the factory floor:

  • Test cable disconnection: Unplug the Ethernet cable. Does the gateway detect it within 10 seconds? Does it reconnect automatically?
  • Test PLC power cycle: Turn off the PLC. Does the gateway show "Link Down"? Turn it back on. Does data resume without manual intervention?
  • Test MQTT broker outage: Kill the broker. Does local buffering engage? Restart the broker. Does buffered data arrive in order?
  • Test serial noise (for RTU): Introduce a ground loop or VFD near the RS-485 cable. Does the gateway detect errors without crashing?
  • Test hourly reset: Wait for the hour boundary. Do all tags get re-read?
  • Monitor link-state transitions: Over 24 hours, how many disconnections occur? More than 2/hour indicates a cabling or electrical issue.

How machineCDN Handles This

machineCDN's edge gateway software implements all of these patterns natively. The daemon tracks link state as a first-class virtual tag, buffers telemetry through MQTT disconnections using page-based memory management, and automatically recovers connections across Modbus TCP, Modbus RTU, and EtherNet/IP — with protocol-specific retry logic tuned from thousands of deployments in plastics manufacturing, auxiliary equipment, and temperature control systems.

When you connect a machine through machineCDN, the platform knows the difference between "the machine stopped" and "the gateway lost connection" — a distinction that most IIoT platforms can't make.

Conclusion

Connection resilience isn't a feature you add later. It's an architectural decision that determines whether your IIoT deployment survives its first month on the factory floor. The core principles:

  1. Track link state explicitly — as a deliverable tag, not just a log message
  2. Handle each protocol's failure modes — Modbus TCP, RTU, and EtherNet/IP all fail differently
  3. Buffer through MQTT outages — page-based buffers with delivery confirmation
  4. Force-read periodically — hourly resets prevent drift and create verification checkpoints
  5. Retry intelligently — back off after consecutive failures instead of hammering dead connections

Build these patterns into your gateway from day one, and your monitoring system will be as reliable as the machines it's watching.

Protocol Bridging: Translating Between EtherNet/IP, Modbus, and MQTT at the Edge [2026]

· 14 min read

Every manufacturing plant is multilingual. One production line speaks EtherNet/IP to Allen-Bradley PLCs. The next line uses Modbus TCP to communicate with temperature controllers. A legacy packaging machine only understands Modbus RTU over RS-485. And the cloud platform that needs to ingest all of this data speaks MQTT.

The edge gateway that bridges these protocols isn't just a translator — it's an architect of data quality. A poor bridge produces garbled timestamps, mistyped values, and silent data gaps. A well-designed bridge normalizes disparate protocols into a unified, timestamped data stream that cloud analytics can consume without post-processing.

This guide covers the engineering patterns that make protocol bridging work reliably at scale.

Multi-Protocol PLC Auto-Detection: Building Intelligent Edge Gateway Discovery [2026]

· 14 min read

Multi-Protocol Auto-Detection Edge Gateway

You plug a new edge gateway into a plant floor network. It needs to figure out what PLCs are on the wire, what protocol each one speaks, and how to read their data — all without a configuration file.

This is the auto-detection problem, and getting it right is the difference between a 10-minute commissioning process and a 2-day integration project. In this guide, we'll walk through exactly how industrial edge gateways probe, detect, and configure communication with PLCs across EtherNet/IP and Modbus TCP, drawing from real-world patterns used in production IIoT deployments.

Protocol Bridging: Translating Modbus to MQTT at the Industrial Edge [2026]

· 15 min read

Protocol Bridging Architecture

Every plant floor speaks Modbus. Every cloud platform speaks MQTT. The 20 inches of Ethernet cable between them is where industrial IoT projects succeed or fail.

Protocol bridging — the act of reading data from one industrial protocol and publishing it via another — sounds trivial on paper. Poll a register, format a JSON payload, publish to a topic. Three lines of pseudocode. But the engineers who've actually deployed these bridges at scale know the truth: the hard problems aren't in the translation. They're in the timing, the buffering, the failure modes, and the dozens of edge cases that only surface when a PLC reboots at 2 AM while your MQTT broker is mid-failover.

This guide covers the real engineering of Modbus-to-MQTT bridges — from register-level data mapping to store-and-forward architectures that survive weeks of disconnection.

Why Bridging Is Harder Than It Looks

Modbus and MQTT are fundamentally different communication paradigms. Understanding these differences is critical to building a bridge that doesn't collapse under production conditions.

Modbus is synchronous and polled. The master (your gateway) initiates every transaction. It sends a request frame, waits for a response, processes the data, and moves on. There's no concept of subscriptions, push notifications, or asynchronous updates. If you want a value, you ask for it. Every. Single. Time.

MQTT is asynchronous and event-driven. Publishers send messages whenever they have data. Subscribers receive messages whenever they arrive. The broker decouples producers from consumers. There's no concept of polling — data flows when it's ready.

Bridging these two paradigms means your gateway must act as a Modbus master on one side (issuing timed read requests) and an MQTT client on the other (publishing messages asynchronously). The gateway is the only component that speaks both languages, and it bears the full burden of timing, error handling, and data integrity.

The Timing Mismatch

Modbus RTU on RS-485 at 9600 baud takes roughly 20ms per single-register transaction (request frame + inter-frame delay + response frame + turnaround time). Reading 100 registers individually would take 2 seconds — an eternity if you need sub-second update rates.

Modbus TCP eliminates the serial timing constraints but introduces TCP socket management, connection timeouts, and the possibility of the PLC's TCP stack running out of connections (most PLCs support only 4–8 simultaneous TCP connections).

MQTT, meanwhile, can handle thousands of messages per second. The bottleneck is never the MQTT side — it's always the Modbus side. Your bridge architecture must respect the slower protocol's constraints while maximizing throughput.
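The RTU timing estimate can be reproduced from first principles. Modbus RTU uses 11 bits per character on the wire (start + 8 data + parity + stop) and mandates a 3.5-character silent interval between frames; the frame sizes below correspond to a single-register FC03 read, and the device turnaround delay is an assumed value:

```python
def rtu_transaction_ms(baud, request_bytes=8, response_bytes=7,
                       bits_per_char=11, turnaround_ms=2.0):
    """Estimate one Modbus RTU single-register read transaction time.
    turnaround_ms is an assumed slave processing delay, not a spec value."""
    char_ms = bits_per_char / baud * 1000
    silence_ms = 3.5 * char_ms  # mandated inter-frame gap, before and after
    return ((request_bytes + response_bytes) * char_ms
            + 2 * silence_ms + turnaround_ms)
```

At 9600 baud this lands in the ~20–30ms range quoted above, so 100 individual reads cost well over 2 seconds.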

Register Mapping: The Foundation

The first engineering decision is how to map Modbus registers to MQTT topics and payloads. There are three common approaches, each with trade-offs.

Approach 1: One Register, One Message

Topic: plant/line3/plc1/holding/40001
Payload: {"value": 1847, "ts": 1709312400, "type": "uint16"}

Pros: Simple, granular, easy to subscribe to individual data points. Cons: Catastrophic at scale. 200 registers means 200 MQTT publishes per poll cycle. At a 1-second poll rate, that's 200 messages/second — sustainable for the broker, but wasteful in bandwidth and processing overhead on constrained gateways.

Approach 2: Batched JSON Messages

Topic: plant/line3/plc1/batch
Payload: {
  "ts": 1709312400,
  "device_type": 1010,
  "tags": [
    {"id": 1, "value": 1847, "type": "uint16"},
    {"id": 2, "value": 23.45, "type": "float"},
    {"id": 3, "value": true, "type": "bool"}
  ]
}

Pros: Drastically fewer MQTT messages. One publish carries an entire poll cycle's worth of data. Cons: JSON encoding adds CPU overhead on embedded gateways. Payload size can grow large if you have hundreds of tags.

Approach 3: Binary-Encoded Batches

Instead of JSON, encode tag values in a compact binary format: a header with timestamp and device metadata, followed by packed tag records (tag ID + status + type + value). A single 16-bit register value takes 2 bytes in binary vs. ~30 bytes in JSON.

Pros: Minimum bandwidth. Critical for cellular-connected gateways where data costs money per megabyte. Cons: Requires matching decoders on the cloud side. Harder to debug.

The right approach depends on your constraints. For Ethernet-connected gateways with ample bandwidth, batched JSON is the sweet spot. For cellular or satellite links, binary encoding can reduce data costs by 10–15x.
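To make the bandwidth gap concrete, here is a hypothetical comparison built with Python's struct module. The binary layout (8-byte timestamp header plus a 4-byte record of uint16 tag ID and uint16 value) is illustrative, not any real gateway's wire format:

```python
import json
import struct

# 50 hypothetical (tag_id, uint16 value) pairs from one poll cycle
tags = [(i, 1000 + i) for i in range(1, 51)]

# Approach 2: batched JSON
json_payload = json.dumps(
    {"ts": 1709312400,
     "tags": [{"id": i, "value": v, "type": "uint16"} for i, v in tags]}
).encode()

# Approach 3: binary — 8-byte timestamp header + 4 bytes per tag
binary_payload = struct.pack("<Q", 1709312400) + b"".join(
    struct.pack("<HH", i, v) for i, v in tags
)

ratio = len(json_payload) / len(binary_payload)
```

Even this simple layout shrinks the payload several-fold; real formats that pack status and type flags do better still.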

Contiguous Register Coalescing

The single most impactful optimization in any Modbus-to-MQTT bridge is contiguous register coalescing: instead of reading registers one at a time, group adjacent registers into a single Modbus read request.

Consider a tag list where you need registers at addresses 40100, 40101, 40102, 40103, and 40110. A naive implementation makes 5 read requests. A smart bridge recognizes that 40100–40103 are contiguous and reads them in one Read Holding Registers (function code 03) call with a quantity of 4. That's 2 transactions instead of 5.

The coalescing logic must respect several constraints:

  1. Same function code. You can't coalesce a coil read (FC 01) with a holding register read (FC 03). The bridge must group tags by their Modbus register type — coils (0xxxxx), discrete inputs (1xxxxx), input registers (3xxxxx), and holding registers (4xxxxx) — and coalesce within each group.

  2. Maximum register count per transaction. The Modbus specification limits a single read to 125 registers (for 16-bit registers) or 2000 coils. In practice, keeping blocks under 50 registers reduces the risk of timeout errors on slower PLCs.

  3. Addressing gaps. If registers 40100 and 40150 both need reading, coalescing them into a single 51-register read wastes 49 registers worth of response data. Set a maximum gap threshold (e.g., 10 registers) — if the gap exceeds it, split into separate transactions.

  4. Same polling interval. Tags polled every second shouldn't be grouped with tags polled every 60 seconds. Coalescing must respect per-tag timing configuration.

// Pseudocode: Coalescing algorithm
sort tags by address ascending
group_head = first_tag
group_count = 1
group_registers = first_tag.elem_count

for each subsequent tag:
    if tag.function_code == group_head.function_code
       AND tag.address == group_head.address + group_registers
       AND group_registers < MAX_BLOCK_SIZE
       AND tag.interval == group_head.interval:
        // extend current group
        group_registers += tag.elem_count
        group_count += 1
    else:
        // read current group, start new one
        read_modbus_block(group_head, group_count, group_registers)
        group_head = tag
        group_count = 1
        group_registers = tag.elem_count

// Don't forget the final group after the loop
read_modbus_block(group_head, group_count, group_registers)

In production deployments, contiguous coalescing routinely reduces Modbus transaction counts by 5–10x, which directly translates to faster poll cycles and fresher data.
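As a concrete sketch, here is the same idea in Python with the gap threshold from constraint 3 included. The tag dictionary shape and the `coalesce` function are illustrative, not a real gateway API:

```python
def coalesce(tags, max_block=50, max_gap=0):
    """Group tags into (start_address, register_count) Modbus read blocks.
    A tag is a dict with function_code, address, elem_count, interval."""
    blocks = []  # list of (head_tag, total_register_count)
    key = lambda t: (t["function_code"], t["interval"], t["address"])
    for tag in sorted(tags, key=key):
        if blocks:
            head, count = blocks[-1]
            gap = tag["address"] - (head["address"] + count)
            if (tag["function_code"] == head["function_code"]
                    and tag["interval"] == head["interval"]
                    and 0 <= gap <= max_gap
                    and count + gap + tag["elem_count"] <= max_block):
                # Extend the current block (reading across the small gap)
                blocks[-1] = (head, count + gap + tag["elem_count"])
                continue
        blocks.append((tag, tag["elem_count"]))
    return [(head["address"], count) for head, count in blocks]
```

With `max_gap=0` the 40100–40103 + 40110 example from above yields exactly two read blocks; raising the gap threshold trades wasted response bytes for fewer transactions.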

Data Type Handling: Where the Devils Live

Modbus registers are 16-bit words. Everything else — 32-bit integers, IEEE 754 floats, booleans packed into bit fields — is a convention imposed by the PLC programmer. Your bridge must handle all of these correctly.

32-Bit Values Across Two Registers

A 32-bit float or integer spans two consecutive 16-bit Modbus registers. The critical question: which register contains the high word?

There's no standard. Some PLCs use big-endian word order (high word first, often called "ABCD" byte order). Others use little-endian word order (low word first, "CDAB"). Some use mid-endian orders ("BADC" or "DCBA"). You must know your PLC's convention, or your 23.45°C temperature reading becomes 1.7e+38 garbage.

For IEEE 754 floats specifically, the conversion from two 16-bit registers to a float is:

// Big-endian word order (ABCD)
float_value = ieee754_decode(register[n] << 16 | register[n+1])

// Little-endian word order (CDAB)
float_value = ieee754_decode(register[n+1] << 16 | register[n])

Production bridges must support configurable byte/word ordering on a per-tag basis, because it's common to have PLCs from different manufacturers on the same network.
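A per-tag word-order decoder can be sketched with Python's struct module (only the ABCD and CDAB orders from above are shown; mid-endian variants would swap bytes within each word):

```python
import struct

def decode_float(regs, word_order="ABCD"):
    """Decode two 16-bit Modbus registers into an IEEE 754 float.
    regs = [register[n], register[n+1]] as read off the wire."""
    r0, r1 = regs
    if word_order == "ABCD":    # big-endian word order: r0 is the high word
        raw = (r0 << 16) | r1
    elif word_order == "CDAB":  # little-endian word order: r1 is the high word
        raw = (r1 << 16) | r0
    else:
        raise ValueError(f"unsupported word order: {word_order}")
    return struct.unpack(">f", raw.to_bytes(4, "big"))[0]
```

Feeding registers decoded with the wrong word order produces exactly the garbage values described above, which is why the order must be configurable per tag.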

Boolean Extraction From Status Words

PLCs frequently pack multiple boolean states into a single 16-bit register — machine running, alarm active, door open, etc. Extracting individual bits requires configurable shift-and-mask operations:

bit_value = (register_value >> shift_count) & mask

Where shift_count identifies the bit position (0–15) and mask is typically 0x01 for a single bit. The bridge's tag configuration should support this as a first-class feature, not a post-processing hack.

Type Safety Across the Bridge

When values cross from Modbus to MQTT, type information must be preserved. A uint16 register value of 65535 means something very different from a signed int16 value of -1 — even though the raw bits are identical. Your MQTT payload must carry the type alongside the value, whether in JSON field names or binary format headers.
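The 65535-versus--1 point is easy to demonstrate: the same 16 bits reinterpret differently depending on the declared type. A minimal sketch (the `typed_value` helper is hypothetical):

```python
import struct

def typed_value(raw_u16: int, tag_type: str):
    """Reinterpret a raw 16-bit register word per the tag's declared type."""
    if tag_type == "uint16":
        return raw_u16
    if tag_type == "int16":
        # Same bits, read back as signed two's complement
        return struct.unpack("<h", struct.pack("<H", raw_u16))[0]
    raise ValueError(f"unknown tag type: {tag_type}")
```

Drop the type field from your payload and the cloud side has no way to tell these apart.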

Connection Resilience: The Store-and-Forward Pattern

The Modbus side of a protocol bridge is local — wired directly to PLCs over Ethernet or RS-485. It rarely fails. The MQTT side connects to a remote broker over a WAN link that will fail. Cellular drops out. VPN tunnels collapse. Cloud brokers restart for maintenance.

A production bridge must implement store-and-forward: continue reading from Modbus during MQTT outages, buffer the data locally, and drain the buffer when connectivity returns.

Page-Based Ring Buffers

The most robust buffering approach for embedded gateways uses a page-based ring buffer in pre-allocated memory:

  1. Format a fixed memory region into equal-sized pages at startup.
  2. Write incoming Modbus data to the current "work page." When a page fills, move it to the "used" queue.
  3. Send pages from the "used" queue to MQTT, one message at a time. Wait for the MQTT publish acknowledgment (at QoS 1) before advancing the read pointer.
  4. Recycle fully-delivered pages back to the "free" list.

If the MQTT connection drops:

  • Stop sending, but keep writing to new pages.
  • If all pages fill up (true buffer overflow), start overwriting the oldest used page. You lose the oldest data, but never the newest.

This design has several properties that matter for industrial deployments:

  • No dynamic memory allocation. The entire buffer is pre-allocated. No malloc, no fragmentation, no out-of-memory crashes at 3 AM.
  • Bounded memory usage. You know exactly how much RAM the buffer consumes. Critical on gateways with 64–256 MB.
  • Delivery guarantees. Each page tracks its own read pointer. If the gateway crashes mid-delivery, the page is re-sent on restart (at-least-once semantics).

How Long Can You Buffer?

Quick math: A gateway reading 100 tags every 5 seconds generates roughly 2 KB of batched JSON per poll cycle. That's 24 KB/minute, 1.4 MB/hour, 34 MB/day. A 256 MB buffer holds 7+ days of data. In binary format, that extends to 50+ days.

For most industrial applications, 24–48 hours of buffering is sufficient to survive maintenance windows, network outages, and firmware upgrades.
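The quick math above reduces to a one-line ratio you can reuse for any gateway:

```python
def buffer_hold_days(bytes_per_cycle, poll_interval_s, buffer_bytes):
    """How many days of telemetry a buffer can hold before overflowing."""
    bytes_per_day = bytes_per_cycle * (86400 / poll_interval_s)
    return buffer_bytes / bytes_per_day
```

Plugging in the example figures (2KB per cycle, 5-second polls, 256MB buffer) reproduces the "7+ days" estimate.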

MQTT Connection Management

The MQTT side of the bridge deserves careful engineering. Industrial connections aren't like web applications — they run for months without restart, traverse multiple NATs and firewalls, and must recover automatically from every failure mode.

Async Connection With Threaded Reconnect

Never block the Modbus polling loop waiting for an MQTT connection. The correct architecture uses a separate thread for MQTT connection management:

  1. The main thread polls Modbus on a tight timer and writes data to the buffer.
  2. A connection thread handles MQTT connect/reconnect attempts asynchronously.
  3. The buffer drains automatically when the MQTT connection becomes available.

This separation ensures that a 30-second MQTT connection timeout doesn't stall your 1-second Modbus poll cycle. Data keeps flowing into the buffer regardless of MQTT state.

Reconnect Strategy

Use a fixed reconnect delay (5 seconds works well for most deployments) rather than exponential backoff. Industrial MQTT connections are long-lived — the overhead of a 5-second retry is negligible compared to the cost of missing data during a 60-second exponential backoff.

However, protect against connection storms: if the broker is down for an extended period, ensure reconnect attempts don't overwhelm the gateway's CPU or the broker's TCP listener.
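The fixed-delay strategy with connection-storm protection can be sketched as a loop with a small random jitter, so that a fleet of gateways losing the same broker doesn't reconnect in lockstep. The `connect` callback is a hypothetical stand-in for your MQTT client's connect call:

```python
import random
import time

def reconnect_loop(connect, base_delay_s=5.0, jitter_s=1.0, max_attempts=None):
    """Fixed-delay reconnect with jitter. Returns the attempt count that
    succeeded (or max_attempts if the budget ran out)."""
    attempts = 0
    while max_attempts is None or attempts < max_attempts:
        attempts += 1
        if connect():
            return attempts
        # Fixed base delay plus jitter — cheap for long-lived links,
        # and the jitter de-synchronizes a fleet of gateways
        time.sleep(base_delay_s + random.uniform(0, jitter_s))
    return attempts
```

In a real gateway this loop lives on the dedicated connection thread described above, never on the Modbus polling thread.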

TLS Certificate Management

Production MQTT bridges almost always use TLS (port 8883 rather than 1883). The bridge must handle:

  • Certificate expiration. Monitor the TLS certificate file's modification timestamp. If the cert file changes on disk, tear down the current MQTT connection and reinitialize with the new certificate. Don't wait for the existing connection to fail — proactively reconnect.
  • SAS token rotation. When using Azure IoT Hub or similar services with time-limited tokens, parse the token's expiration timestamp and reconnect before it expires.
  • CA certificate bundles. Embedded gateways often ship with minimal CA stores. Ensure your IoT hub's root CA is explicitly included in the gateway's certificate chain.
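The certificate-file watch described above amounts to polling the file's modification timestamp. A minimal sketch (the `CertWatcher` class is illustrative; a production daemon would also handle the file disappearing mid-rotation):

```python
import os

class CertWatcher:
    """Detect on-disk certificate replacement via mtime changes."""

    def __init__(self, cert_path: str):
        self.cert_path = cert_path
        self.last_mtime = os.stat(cert_path).st_mtime

    def changed(self) -> bool:
        """True once per replacement; caller should tear down the MQTT
        connection and reinitialize TLS with the new certificate."""
        mtime = os.stat(self.cert_path).st_mtime
        if mtime != self.last_mtime:
            self.last_mtime = mtime
            return True
        return False
```

Call `changed()` from the connection thread's periodic tick and reconnect proactively on True, rather than waiting for the old certificate to be rejected.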

Change-of-Value vs. Periodic Reporting

Not all tags need the same reporting strategy. A bridge should support both:

Periodic reporting publishes every tag value at a fixed interval, regardless of whether the value changed. Simple, predictable, but wasteful for slowly-changing values like ambient temperature or firmware version.

Change-of-value (COV) reporting compares each newly read value against the previous value and only publishes when a change is detected. This dramatically reduces MQTT traffic for boolean states (machine on/off), setpoints, and alarm registers that change infrequently.

The implementation stores the last-read value for each tag and performs a comparison before deciding whether to publish:

if tag.compare_enabled:
    if new_value != tag.last_value:
        publish(tag, new_value)
        tag.last_value = new_value
else:
    publish(tag, new_value)  # always publish

A hybrid approach works best: use COV for digital signals and alarm words, periodic for analog measurements like temperature and pressure. Some tags (critical alarms, safety interlocks) should always be published immediately — bypassing both the normal comparison logic and the batching system — to minimize latency.

Calculated and Dependent Tags

Real-world PLCs don't always expose data in the format you need. A bridge should support calculated tags — values derived from raw register data through mathematical or bitwise operations.

Common patterns include:

  • Bit extraction from status words. A 16-bit register contains 16 individual boolean states. The bridge extracts each bit as a separate tag using shift-and-mask operations.
  • Scaling and offset. Raw register value 4000 represents 400.0°F when divided by 10. The bridge applies a linear transformation (value × k1 / k2) to produce engineering units.
  • Dependent tag chains. When a parent tag's value changes, the bridge automatically reads and publishes a set of dependent tags. Example: when the "recipe number" register changes, immediately read all recipe parameter registers.

These calculations must happen at the edge, inside the bridge, before data is published to MQTT. Pushing raw register values to the cloud and calculating there wastes bandwidth and adds latency.
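Two of these patterns — linear scaling and dependent-tag chains — are small enough to sketch directly. The function names and the callback-based chain trigger are hypothetical illustrations, not a real configuration API:

```python
def scale(raw: int, k1: int = 1, k2: int = 1) -> float:
    """Linear transform to engineering units: raw * k1 / k2."""
    return raw * k1 / k2

def on_tag_update(tag_id, new_value, last_values, dependents, read_tag):
    """Dependent-tag chain: when a parent tag's value changes, read all of
    its dependent tags. read_tag is a stand-in for the Modbus read layer."""
    if last_values.get(tag_id) != new_value:
        last_values[tag_id] = new_value
        return [read_tag(dep) for dep in dependents.get(tag_id, [])]
    return []  # no change — no dependent reads
```

For example, `scale(4000, 1, 10)` turns the raw register value 4000 into 400.0°F, and mapping the "recipe number" tag to its parameter registers in `dependents` reproduces the chain behavior described above.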

Link-State Telemetry

A bridge should publish its own health status alongside machine data. The most critical metric is link state — whether the gateway can actually communicate with the PLC.

When a Modbus read fails with a connection error (timeout, connection reset, connection refused, or broken pipe), the bridge should:

  1. Set the link state to "down" and publish immediately (not batched).
  2. Close the existing Modbus connection and attempt reconnection.
  3. Continue publishing link-down status at intervals so the cloud system knows the gateway is alive but the PLC is unreachable.
  4. When reconnection succeeds, set link state to "up" and force-read all tags to re-establish baseline values.

This link state telemetry is invaluable for distinguishing between "the machine is off" and "the network cable is unplugged" — two very different problems that look identical without gateway-level diagnostics.

How machineCDN Handles Protocol Bridging

machineCDN's edge gateway was built from the ground up for exactly this problem. The gateway daemon handles Modbus RTU (serial), Modbus TCP, and EtherNet/IP on the device side, and publishes all data over MQTT with TLS to the cloud.

Key architectural decisions in the machineCDN gateway:

  • Pre-allocated page buffer with configurable page sizes for zero-allocation runtime operation.
  • Automatic contiguous register coalescing that respects function code boundaries, tag intervals, and register limits.
  • Per-tag COV comparison with an option to bypass batching for latency-critical values.
  • Calculated tag chains for bit extraction and dependent tag reads.
  • Hourly full refresh — every 60 minutes, the gateway resets all COV baselines and publishes every tag value, ensuring the cloud always has a complete snapshot even if individual change events were missed.
  • Async MQTT reconnection with certificate hot-reloading and SAS token expiration monitoring.

The result is a bridge that reliably moves data from plant-floor PLCs to cloud dashboards with sub-second latency during normal operation and zero data loss during outages lasting hours or days.

Deployment Checklist

Before deploying a Modbus-to-MQTT bridge in production:

  • Map every register — document address, data type, byte order, scaling factor, and engineering units
  • Set appropriate poll intervals — 1s for process-critical, 5–60s for environmental, 300s+ for configuration data
  • Size the buffer — calculate daily data volume and ensure the buffer can hold 24+ hours
  • Test byte ordering — verify float and 32-bit integer decoding against known PLC values before trusting the data
  • Configure COV vs periodic — boolean and alarm tags = COV, analog = periodic
  • Enable TLS — never run MQTT unencrypted on production networks
  • Monitor link state — alert on PLC disconnections, not just missing data
  • Test failover — unplug the WAN cable for 4 hours and verify data drains correctly when it reconnects

Protocol bridging isn't glamorous work. It's plumbing. But it's the plumbing that determines whether your IIoT deployment delivers reliable data or expensive noise. Get the bridge right, and everything downstream — analytics, dashboards, predictive maintenance — just works.

Protocol Bridging in IIoT: Translating Between Modbus, EtherNet/IP, and MQTT at the Edge [2026]

· 14 min read

Every manufacturing plant is a polyglot. Modbus RTU on the serial bus. Modbus TCP on the local network. EtherNet/IP talking to Allen-Bradley PLCs. And now someone wants all of that data in the cloud via MQTT.

Protocol bridging at the edge is the unglamorous but critical work that makes IIoT actually function. Get it right, and you have a seamless data pipeline from a 20-year-old Modbus RTU device to a modern cloud analytics platform. Get it wrong, and you have data gaps, crashed connections, and a plant floor that's lost trust in your "smart factory" initiative.

This guide covers the architecture, pitfalls, and hard-won lessons from building protocol bridges that run in production — not just in proof-of-concepts.