187 posts tagged with "Industrial IoT"

Industrial Internet of Things insights and best practices

PLC Alarm Word Decoding: How to Extract Bit-Level Alarm States for IIoT Monitoring [2026]

March 2, 2026 · 12 min read

Most plant engineers understand alarms at the HMI level — a red indicator lights up, a buzzer sounds, someone walks over to the machine. But when you connect PLCs to an IIoT platform for remote monitoring, you hit a fundamental data representation problem: PLCs don't store alarms as individual boolean values. They pack them into 16-bit registers called alarm words.

A single uint16 register can encode 16 different alarm conditions. A chiller with 10 refrigeration circuits might have 30+ alarm word registers — encoding hundreds of individual alarm states. If your IIoT platform doesn't understand this encoding, you'll either miss critical alarms or drown in meaningless raw register values.

This guide explains how alarm word decoding works at the edge, why it matters for reliable remote monitoring, and how to implement it without flooding your cloud platform with unnecessary data.

PLC Connection Resilience: Link-State Monitoring and Automatic Recovery for IIoT Gateways [2026]

March 2, 2026 · 9 min read

In any IIoT deployment, the connection between your edge gateway and the PLC is both the most critical and the most fragile link in the data pipeline. Ethernet cables get unplugged during maintenance. Serial lines pick up noise from VFDs. PLCs go into fault mode and stop responding. Network switches reboot.

If your edge software can't detect these failures, recover gracefully, and continue collecting data once the link comes back, you don't have a monitoring system — you have a monitoring hope.

This guide covers the real-world engineering patterns for building resilient PLC connections, drawn from years of deploying gateways on factory floors where "the network just works" is a fantasy.

Why Connection Resilience Isn't Optional

Consider what happens when a Modbus TCP connection silently drops:

No timeout configured? Your gateway hangs on a blocking read forever.
No reconnection logic? You lose all telemetry until someone manually restarts the service.
No link-state tracking? Your cloud dashboard shows stale data as if the machine is still running — potentially masking a safety-critical failure.

In a 2024 survey of manufacturing downtime causes, 17% of IIoT data gaps were attributed to gateway-to-PLC communication failures that weren't detected for hours. The machines were fine. The monitoring was blind.

The Link-State Model

The foundation of connection resilience is treating the PLC connection as a state machine with explicit transitions:

┌──────────────┐    connect()     ┌─────────────┐
│              │ ───────────────► │             │
│ DISCONNECTED │                  │  CONNECTED  │
│  (state=0)   │ ◄─────────────── │  (state=1)  │
│              │  error detected  │             │
└──────────────┘                  └─────────────┘

Every time the link state changes, the gateway should:

Log the transition with a precise timestamp
Deliver a special link-state tag upstream so the cloud platform knows the device is offline
Suppress stale data delivery — never send old values as if they're fresh
Trigger reconnection logic appropriate to the protocol
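
As a rough illustration of the state machine above, here is a minimal Python sketch. The `LinkTracker` class and its `on_change` hook are hypothetical names invented for this example, not part of any gateway's API. The point is that logging and upstream delivery happen only on an actual transition, never on repeated reports of the same state.

```python
import time
from enum import IntEnum
from typing import Callable, Optional


class LinkState(IntEnum):
    DISCONNECTED = 0
    CONNECTED = 1


class LinkTracker:
    """Tracks PLC link state and reports only actual transitions."""

    def __init__(self, on_change: Callable[[LinkState, float], None]):
        self.state = LinkState.DISCONNECTED
        self.on_change = on_change  # hypothetical hook: log + publish upstream

    def update(self, new_state: LinkState, now: Optional[float] = None) -> bool:
        """Apply a state; return True if this was a transition."""
        if new_state == self.state:
            return False  # no change, nothing to log or deliver
        self.state = new_state
        self.on_change(new_state, time.time() if now is None else now)
        return True


events = []
tracker = LinkTracker(lambda state, ts: events.append((state.name, ts)))
tracker.update(LinkState.CONNECTED, now=1709395200.0)
tracker.update(LinkState.CONNECTED, now=1709395201.0)  # ignored: same state
tracker.update(LinkState.DISCONNECTED, now=1709395260.0)
print(events)
```

Only two events are recorded for the three calls, because the middle call repeats the current state.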

Link-State as a Virtual Tag

One of the most powerful patterns is treating link state as a virtual tag with its own ID — distinct from any physical PLC tag. When the connection drops, the gateway immediately publishes:

{
  "tag_id": "0x8001",
  "type": "bool",
  "value": false,
  "timestamp": 1709395200
}

When it recovers:

{
  "tag_id": "0x8001",
  "type": "bool",
  "value": true,
  "timestamp": 1709395260
}

This gives the cloud platform (and downstream analytics) an unambiguous signal. Dashboards can show a "Link Down" banner. Alert rules can fire. Downtime calculations can account for monitoring gaps vs. actual machine downtime.

The link-state tag should be delivered outside the normal batch — immediately, with QoS 1 — so it arrives even if the regular telemetry buffer is full.

Protocol-Specific Failure Detection

Modbus TCP

Modbus TCP connections fail in predictable ways. The key errors that indicate a lost connection:

Error	Meaning	Action
`ETIMEDOUT`	Response never arrived	Close + reconnect
`ECONNRESET`	PLC reset the TCP connection	Close + reconnect
`ECONNREFUSED`	PLC not listening on port 502	Close + retry after delay
`EPIPE`	Broken pipe (write to closed socket)	Close + reconnect
`EBADF`	File descriptor invalid	Destroy context + rebuild
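
One way to keep this table out of scattered if/else chains is a lookup keyed on the errno value. The sketch below uses Python's standard `errno` module; the action strings and the `classify_error` helper are made-up labels for illustration, so substitute whatever your recovery code actually dispatches on.

```python
import errno

# Errors from the table above, mapped to the recovery action it lists.
RECOVERY_ACTIONS = {
    errno.ETIMEDOUT: "close_and_reconnect",
    errno.ECONNRESET: "close_and_reconnect",
    errno.ECONNREFUSED: "close_and_retry_after_delay",
    errno.EPIPE: "close_and_reconnect",
    errno.EBADF: "destroy_and_rebuild_context",
}


def classify_error(exc: OSError) -> str:
    """Return the recovery action for a socket error, or 'not_link_failure'."""
    return RECOVERY_ACTIONS.get(exc.errno, "not_link_failure")


print(classify_error(ConnectionResetError(errno.ECONNRESET, "reset by peer")))
print(classify_error(OSError(errno.EBADF, "bad file descriptor")))
```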

When any of these occur, the correct sequence is:

Call flush() to clear any pending data in the socket buffer
Close the Modbus context
Set the link state to disconnected
Deliver the link-state tag
Wait before reconnecting (back-off strategy)
Re-create the TCP context and reconnect

Critical detail: After a connection failure, you should flush the serial/TCP buffer before attempting reads. Stale bytes in the buffer will cause desynchronization — the gateway reads the response to a previous request and interprets it as the current one, producing garbage data.

# Pseudocode — Modbus TCP recovery sequence
on_read_error(errno):
    modbus_flush(context)
    modbus_close(context)
    link_state = DISCONNECTED
    deliver_link_state(0)
    
    # Don't reconnect immediately — the PLC might be rebooting
    sleep(5 seconds)
    
    result = modbus_connect(context, ip, port)
    if result == OK:
        link_state = CONNECTED
        deliver_link_state(1)
        force_read_all_tags()  # Re-read everything to establish baseline
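
The pseudocode waits a fixed 5 seconds, while the sequence above calls for a back-off strategy. A capped exponential back-off is one common way to do that. The following Python sketch is an assumption about how it could look, not the gateway's actual logic: `connect` is a hypothetical callable returning True on success, and the 5-second base, factor of 2, and 60-second cap are illustrative values.

```python
import time
from typing import Callable, Iterator


def backoff_delays(base: float = 5.0, factor: float = 2.0, cap: float = 60.0) -> Iterator[float]:
    """Yield reconnect delays: base, base*factor, ... capped at `cap` seconds."""
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


def reconnect(connect: Callable[[], bool], max_attempts: int = 6,
              sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call `connect` until it succeeds, waiting longer after each failure."""
    delays = backoff_delays()
    for _ in range(max_attempts):
        sleep(next(delays))  # wait first: the PLC may still be rebooting
        if connect():
            return True
    return False


# Simulated PLC that comes back on the third attempt; sleeps are recorded, not slept.
attempts = iter([False, False, True])
waited = []
print(reconnect(lambda: next(attempts), sleep=waited.append), waited)
```

Injecting `sleep` keeps the function testable without real delays. In production you would leave the default `time.sleep`.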

Modbus RTU (Serial)

Serial connections have additional failure modes that TCP doesn't:

Baud rate mismatch after PLC firmware update
Parity errors from electrical noise (especially near VFDs or welding equipment)
Silence on the line — device powered off or address conflict

For Modbus RTU, timeout tuning is critical:

Byte timeout: How long to wait between characters within a frame (typically 50ms)
Response timeout: How long to wait for the complete response after sending a request (typically 400ms for serial links)

If the response timeout is too short, you'll get false disconnections on slow PLCs. Too long, and a genuine failure takes forever to detect. For most industrial environments:

Byte timeout: 50ms (adjust for baud rates below 9600)
Response timeout: 400ms for RTU, 2000ms for TCP

After any RTU failure, flush the serial buffer. Serial buffers accumulate noise bytes during disconnections, and these will corrupt the first valid response after reconnection.
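
To see why these timeouts need adjusting at low baud rates, it helps to compute how long a frame actually occupies the wire. The sketch below assumes the standard Modbus RTU character framing of 11 bits and the 256-byte maximum frame size; the function names are made up for this example.

```python
BITS_PER_CHAR = 11  # Modbus RTU: 1 start + 8 data + 1 parity + 1 stop


def char_time_ms(baud: int) -> float:
    """Time to transmit one RTU character at the given baud rate."""
    return BITS_PER_CHAR / baud * 1000.0


def frame_time_ms(baud: int, frame_bytes: int = 256) -> float:
    """Wire time for a frame; 256 bytes is the maximum RTU frame size."""
    return frame_bytes * char_time_ms(baud)


for baud in (1200, 9600, 19200, 115200):
    print(f"{baud:>6} baud: char {char_time_ms(baud):6.3f} ms, "
          f"max frame {frame_time_ms(baud):7.1f} ms")
```

At 9600 baud a maximum-length frame takes about 293 ms, which fits inside a 400ms response timeout. At 1200 baud the same frame takes over 2.3 seconds, so the timeout has to grow with it.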

EtherNet/IP (CIP)

EtherNet/IP connections through the CIP protocol have a different failure signature. The libplctag library (commonly used for Allen-Bradley Micro800 and CompactLogix PLCs) returns specific error codes:

Error -32 (PLCTAG_ERR_TIMEOUT in libplctag): The request timed out, which in practice usually means the gateway cannot reach the PLC. This is the most common failure: the TCP connection to the gateway succeeded, but the CIP path to the PLC is broken.
Negative tag handle on create: The tag path is wrong, or the PLC program was downloaded with different tag names.

For EtherNet/IP, a smart approach is to count consecutive -32 errors and break the reading cycle after a threshold (typically 3 attempts):

# Stop hammering a dead connection
if consecutive_error_32_count >= MAX_ATTEMPTS:
    set_link_state(DISCONNECTED)
    break_reading_cycle()
    wait_and_retry()

This prevents the gateway from spending its entire polling cycle sending requests to a PLC that clearly isn't responding, which would delay reads from other devices on the same gateway.

Contiguous Read Failure Handling

When reading multiple Modbus registers in a contiguous block, a single failure takes out the entire block. The gateway should:

Attempt the read up to 3 times for the same register block before declaring failure
Report failure status per-tag — each tag in the block gets an error status, not just the block head
Only deliver error status on state change — if a tag was already in error, don't spam the cloud with repeated error messages

# Retry logic for contiguous Modbus reads
read_count = 3
do:
    result = modbus_read_registers(start_addr, count, buffer)
    read_count -= 1
while (result != count) AND (read_count > 0)

if result != count:
    # All retries failed — mark entire block as error
    for each tag in block:
        if tag.last_status != ERROR:
            deliver_error(tag)
            tag.last_status = ERROR
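
A runnable Python version of the same idea is sketched below. `read_registers` stands in for whatever Modbus call your stack provides and is assumed to return None on failure. Unlike the pseudocode, this version also reports the transition back to OK when a block recovers, which is an addition for illustration rather than something the original logic shows.

```python
from typing import Callable, Dict, List, Optional

OK, ERROR = "ok", "error"


def read_block(read_registers: Callable[[int, int], Optional[List[int]]],
               start: int, count: int, attempts: int = 3) -> Optional[List[int]]:
    """Try the same contiguous block up to `attempts` times."""
    for _ in range(attempts):
        values = read_registers(start, count)
        if values is not None and len(values) == count:
            return values
    return None


def update_status(last_status: Dict[int, str], tag_ids: List[int], ok: bool) -> List[int]:
    """Set every tag's status; return only the tags whose status changed."""
    new = OK if ok else ERROR
    changed = [t for t in tag_ids if last_status.get(t) != new]
    for t in tag_ids:
        last_status[t] = new
    return changed


calls = []
def dead_plc(start, count):  # simulated reader that always fails
    calls.append((start, count))
    return None

status: Dict[int, str] = {}
tags = [101, 102, 103]
first = update_status(status, tags, read_block(dead_plc, 40001, 3) is not None)
second = update_status(status, tags, read_block(dead_plc, 40001, 3) is not None)
print(len(calls), first, second)
```

The first failed cycle reports all three tags. The second reports none, because their status did not change.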

The Hourly Reset Pattern

Here's a pattern that might seem counterintuitive: force-read all tags every hour, regardless of whether values changed.

Why? Because in long-running deployments, subtle drift accumulates:

A tag value might change during a brief disconnection and the change is missed
The PLC program might be updated with new initial values
Clock drift between the gateway and cloud can create gaps in time-series data

The hourly reset works by comparing the current system hour to the hour of the last reading. When the hour changes, all tags have their "read once" flag reset, forcing a complete re-read:

current_hour = localtime(now).hour
previous_hour = localtime(last_reading_time).hour

if current_hour != previous_hour:
    reset_all_tags()  # Clear "readed_once" flag
    log("Force reading all tags — hourly reset")
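
One caveat with comparing hour-of-day values: if the gap between readings happens to be an exact multiple of 24 hours, the two hours match and the reset is skipped. A small variation, sketched here in Python as an assumption rather than the original implementation, compares hour buckets counted from the Unix epoch instead. Both function names are made up for this example.

```python
def hour_bucket(ts: float) -> int:
    """Index of the hour since the Unix epoch that contains `ts`."""
    return int(ts // 3600)


def needs_full_read(last_reading_ts: float, now_ts: float) -> bool:
    """True once `now_ts` falls in a different hour than the last reading."""
    return hour_bucket(now_ts) != hour_bucket(last_reading_ts)


t = 1709395200  # falls exactly on an hour boundary
print(needs_full_read(t + 10, t + 3599))        # same hour
print(needs_full_read(t + 3599, t + 3600))      # crossed the boundary
print(needs_full_read(t + 10, t + 10 + 86400))  # exactly one day later
```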

This creates natural "checkpoints" in your time-series data. If you ever need to verify that the gateway was functioning correctly at a given time, you can look for these hourly full-read batches.

Buffered Delivery: Surviving MQTT Disconnections

The PLC connection is only half the story. The other critical link is between the gateway and the cloud (typically over MQTT). When this link drops — cellular blackout, broker maintenance, DNS failure — you need to buffer data locally.

A well-designed telemetry buffer uses a page-based architecture:

┌───────────┐ ┌───────────┐ ┌───────────┐ ┌───────────┐
│ Free      │ │ Work      │ │ Used      │ │ Used      │
│ Page      │ │ Page      │ │ Page 1    │ │ Page 2    │
│           │ │ (writing) │ │ (queued)  │ │ (sending) │
└───────────┘ └───────────┘ └───────────┘ └───────────┘

Work page: Currently being written to by the tag reader
Used pages: Full pages queued for MQTT delivery
Free pages: Delivered pages recycled for reuse
Overflow: When free pages run out, the oldest used page is sacrificed (data loss, but the system keeps running)

Each page tracks the MQTT packet ID that the client library assigns when it publishes a message. When the broker confirms delivery (PUBACK for QoS 1), the page is moved to the free list. If the connection drops mid-delivery, the packet_sent flag is cleared, and delivery resumes from the same position when the connection recovers.

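Buffer sizing rule of thumb: At least 3 pages, each sized to hold 60 seconds of telemetry data. For a typical 50-tag device polling every second, that's roughly 4KB per page, provided change-of-value filtering keeps most unchanged readings out of the buffer (4KB spread over 3,000 readings is under 1.4 bytes each). A 64KB buffer gives you ~16 pages, enough to survive a 15-minute connectivity gap.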
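
The sizing arithmetic is easy to script so you can plug in your own numbers. This is a back-of-the-envelope sketch with made-up function names. It ignores page headers and treats the partly filled work page as a full page.

```python
def outage_coverage_s(buffer_bytes: int, page_bytes: int, seconds_per_page: float) -> float:
    """Seconds of telemetry the buffer can hold before overflow starts."""
    pages = buffer_bytes // page_bytes
    return pages * seconds_per_page


def bytes_per_reading(page_bytes: int, tags: int, polls_per_page: int) -> float:
    """Byte budget per reading if every tag were stored on every poll."""
    return page_bytes / (tags * polls_per_page)


pages = (64 * 1024) // (4 * 1024)
coverage = outage_coverage_s(64 * 1024, 4 * 1024, 60)
print(pages, coverage / 60, round(bytes_per_reading(4096, 50, 60), 2))
```

The last figure is the budget per reading for 50 tags at one poll per second. It is a quick check on whether a given page size is realistic for your change rate.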

Practical Deployment Checklist

Before deploying a gateway to the factory floor:

Test cable disconnection: Unplug the Ethernet cable. Does the gateway detect it within 10 seconds? Does it reconnect automatically?
Test PLC power cycle: Turn off the PLC. Does the gateway show "Link Down"? Turn it back on. Does data resume without manual intervention?
Test MQTT broker outage: Kill the broker. Does local buffering engage? Restart the broker. Does buffered data arrive in order?
Test serial noise (for RTU): Introduce a ground loop or VFD near the RS-485 cable. Does the gateway detect errors without crashing?
Test hourly reset: Wait for the hour boundary. Do all tags get re-read?
Monitor link-state transitions: Over 24 hours, how many disconnections occur? More than 2/hour indicates a cabling or electrical issue.

How machineCDN Handles This

machineCDN's edge gateway software implements all of these patterns natively. The daemon tracks link state as a first-class virtual tag, buffers telemetry through MQTT disconnections using page-based memory management, and automatically recovers connections across Modbus TCP, Modbus RTU, and EtherNet/IP — with protocol-specific retry logic tuned from thousands of deployments in plastics manufacturing, auxiliary equipment, and temperature control systems.

When you connect a machine through machineCDN, the platform knows the difference between "the machine stopped" and "the gateway lost connection" — a distinction that most IIoT platforms can't make.

Conclusion

Connection resilience isn't a feature you add later. It's an architectural decision that determines whether your IIoT deployment survives its first month on the factory floor. The core principles:

Track link state explicitly — as a deliverable tag, not just a log message
Handle each protocol's failure modes — Modbus TCP, RTU, and EtherNet/IP all fail differently
Buffer through MQTT outages — page-based buffers with delivery confirmation
Force-read periodically — hourly resets prevent drift and create verification checkpoints
Retry intelligently — back off after consecutive failures instead of hammering dead connections

Build these patterns into your gateway from day one, and your monitoring system will be as reliable as the machines it's watching.

Protocol Bridging: Translating Between EtherNet/IP, Modbus, and MQTT at the Edge [2026]

March 2, 2026 · 14 min read

Every manufacturing plant is multilingual. One production line speaks EtherNet/IP to Allen-Bradley PLCs. The next line uses Modbus TCP to communicate with temperature controllers. A legacy packaging machine only understands Modbus RTU over RS-485. And the cloud platform that needs to ingest all of this data speaks MQTT.

The edge gateway that bridges these protocols isn't just a translator — it's an architect of data quality. A poor bridge produces garbled timestamps, mistyped values, and silent data gaps. A well-designed bridge normalizes disparate protocols into a unified, timestamped data stream that cloud analytics can consume without post-processing.

This guide covers the engineering patterns that make protocol bridging work reliably at scale.

Shift-Based Production Reporting for Manufacturing: How to Compare Output, Quality, and Efficiency Across Shifts

March 2, 2026 · 7 min read

MachineCDN Team

Industrial IoT Experts

Every manufacturing plant has a shift problem they can feel but can't quantify. First shift runs smoother. Third shift has more scrap. Second shift uses more material. Everyone knows it, but without shift-aligned data, nobody can prove it — let alone fix it. Shift-based production reporting turns anecdotal observations into actionable data. Here's how to implement it and what it reveals.

Sparkplug B Specification Deep Dive: Birth Certificates, Death Certificates, and Why Your IIoT MQTT Deployment Needs It [2026]

March 2, 2026 · 14 min read

MachineCDN Team

Industrial IoT Experts

MQTT is the de facto transport layer for industrial IoT. Every edge gateway, every cloud platform, and every IIoT architecture diagram draws that same line: device → MQTT broker → cloud. But here's the uncomfortable truth that anyone who's deployed MQTT in a real factory knows: raw MQTT tells you nothing about the data inside those payloads.

MQTT is a transport protocol. It delivers bytes. It doesn't define what a "temperature reading" looks like, how to discover which devices are online, or what happens when a device reboots at 3 AM. That's where Sparkplug B comes in — and understanding it deeply is the difference between a demo and a production deployment.

Total Productive Maintenance (TPM) in the IIoT Era: Data-Driven Pillars for Modern Manufacturing

March 2, 2026 · 11 min read

MachineCDN Team

Industrial IoT Experts

Total Productive Maintenance was developed by Seiichi Nakajima at Nippondenso (now Denso) in the 1970s. Fifty years later, the core philosophy remains sound: maximize equipment effectiveness by involving every employee in maintenance. But the implementation? That's where most TPM programs stall.

The traditional TPM toolkit — AM tags, one-point lessons, CILT sheets (Clean, Inspect, Lubricate, Tighten) — was designed for an era when machine data meant a gauge on the side of a press and a clipboard on the operator's desk. In 2026, your PLCs collect thousands of data points per second. Your operators carry smartphones. Your maintenance systems can talk to your production systems.

IIoT doesn't replace TPM. It supercharges it. Here's how each TPM pillar transforms when backed by real-time machine data.

Binary Payload Encoding for Industrial MQTT: Cutting Bandwidth by 10x on Constrained Networks [2026]

March 1, 2026 · 13 min read

JSON is killing your cellular data budget.

When your edge gateway publishes a single temperature reading as {"tag_id": 42, "value": 23.45, "type": "float", "status": 0, "ts": 1709312400}, that's 78 bytes of text to convey 11 bytes of actual information: a 2-byte tag ID, a 4-byte float, a 1-byte status code, and a 4-byte timestamp (which is shared across all tags in the same poll cycle anyway).
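
You can measure this yourself with a few lines of Python. The sketch below serializes the same reading as JSON (with and without whitespace) and packs the four fields with the standard `struct` module. The big-endian field layout is an assumption made for this example.

```python
import json
import struct

reading = {"tag_id": 42, "value": 23.45, "type": "float", "status": 0, "ts": 1709312400}

as_written = json.dumps(reading)                       # default separators: ", " and ": "
compact = json.dumps(reading, separators=(",", ":"))   # no whitespace
# Assumed layout: big-endian uint16 tag ID, float32 value, uint8 status, uint32 timestamp.
raw = struct.pack(">HfBI", reading["tag_id"], reading["value"], reading["status"], reading["ts"])

print(len(as_written), len(compact), len(raw))
```

Stripping whitespace helps a little, but the key names and quoting still dominate the message size.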

At 200 tags polled every 5 seconds, JSON payloads consume roughly 100 KB/minute — over 4 GB/month. On a $15/month cellular plan with a 1 GB cap, you've blown your data budget by day 8.

Binary encoding solves this. By designing a compact wire format purpose-built for industrial telemetry, you can reduce per-tag overhead from ~70 bytes to ~7 bytes — a 10x reduction that makes cellular and satellite IIoT deployments economically viable.

This article covers the engineering of binary payload formats for industrial MQTT, from byte-level encoding decisions to the buffering and delivery systems that ensure data integrity.

Why JSON Falls Short for Industrial Telemetry

JSON became the default payload format for MQTT in the IIoT world because it's human-readable, self-describing, and every platform can parse it. These are real advantages during development and debugging. But they come at a cost that compounds brutally at scale.

The Overhead Tax

Let's dissect a typical JSON telemetry message:

{
  "device_type": 1010,
  "serial": 1106550353,
  "ts": 1709312400,
  "tags": [
    {"id": 1, "status": 0, "type": "uint16", "values": [4200]},
    {"id": 2, "status": 0, "type": "float", "values": [23.45]},
    {"id": 3, "status": 0, "type": "bool", "values": [1]}
  ]
}

This payload is approximately 250 bytes. The actual data content:

Device type: 2 bytes
Serial number: 4 bytes
Timestamp: 4 bytes
3 tag values: 2 + 4 + 1 = 7 bytes
3 tag IDs: 6 bytes
3 status codes: 3 bytes

Total useful data: 26 bytes. The other 224 bytes are structural overhead — curly braces, square brackets, quotation marks, colons, commas, key names, and redundant type strings.

That's a size ratio of 9.6x. For every byte of machine data, you're transmitting nearly 10 bytes in total, about 8.6 of them JSON syntax.

CPU Cost on Embedded Gateways

JSON serialization isn't free on embedded hardware. Constructing JSON objects, converting numbers to strings, escaping special characters, and computing string lengths all consume CPU cycles that could be spent polling more tags or running edge analytics.

On an ARM Cortex-A7 gateway (common in industrial routers), JSON serialization of a 200-tag batch takes 2–5ms. The equivalent binary encoding takes 200–500μs — an order of magnitude faster. When you're polling Modbus every second and need to leave CPU headroom for other tasks, this matters.

Designing a Binary Telemetry Format

A practical binary format for industrial MQTT must balance compactness with extensibility. Here's a proven structure used in production industrial gateways.

Message Structure

┌─────────────────────────────────────────┐
│ Header                                  │
│  ├─ Timestamp (4 bytes, uint32)         │
│  ├─ Device Type (2 bytes, uint16)       │
│  └─ Serial Number (4 bytes, uint32)     │
├─────────────────────────────────────────┤
│ Tag Group                               │
│  ├─ Tag Count (2 bytes, uint16)         │
│  ├─ Tag Record 1                        │
│  │   ├─ Tag ID (2 bytes, uint16)        │
│  │   ├─ Status (1 byte, uint8)          │
│  │   ├─ Type (1 byte, uint8)            │
│  │   ├─ Value Count (1 byte, uint8)     │
│  │   └─ Values (variable)              │
│  ├─ Tag Record 2                        │
│  │   └─ ...                             │
│  └─ Tag Record N                        │
└─────────────────────────────────────────┘

Type Encoding

Use a single byte to encode the value type, which also determines the byte width of each value:

Type Code	Type	Bytes per Value
0x01	bool	1
0x02	int32	4
0x03	uint32	4
0x04	float32	4
0x05	int16	2
0x06	uint16	2
0x07	int8	1
0x08	uint8	1

This type system covers the scalar data types you'll most often encounter in Modbus and EtherNet/IP PLCs (64-bit integers, doubles, and strings would need additional codes). The decoder uses the type code to determine exactly how many bytes to read for each value: no parsing ambiguity, no delimiter scanning.

Size Comparison

For the same 3-tag example above:

Binary encoding:

Header: 10 bytes (timestamp + device type + serial)
Tag count: 2 bytes
Tag 1 (uint16): 2 + 1 + 1 + 1 + 2 = 7 bytes
Tag 2 (float32): 2 + 1 + 1 + 1 + 4 = 9 bytes
Tag 3 (bool): 2 + 1 + 1 + 1 + 1 = 6 bytes

Total: 34 bytes vs. 250 bytes for JSON. That's a 7.4x reduction.
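
Here is a minimal encoder sketch for the layout above, using Python's `struct` module. The article's format does not state a byte order, so big-endian is assumed here, and `encode_message` is a name invented for this example.

```python
import struct

# Type codes from the table above -> struct format character (assumed big-endian).
TYPE_FORMATS = {0x01: "B", 0x02: "i", 0x03: "I", 0x04: "f",
                0x05: "h", 0x06: "H", 0x07: "b", 0x08: "B"}


def encode_message(ts, device_type, serial, tags):
    """tags: list of (tag_id, status, type_code, [values])."""
    out = bytearray(struct.pack(">IHI", ts, device_type, serial))  # 10-byte header
    out += struct.pack(">H", len(tags))                            # tag count
    for tag_id, status, type_code, values in tags:
        out += struct.pack(">HBBB", tag_id, status, type_code, len(values))
        out += struct.pack(f">{len(values)}{TYPE_FORMATS[type_code]}", *values)
    return bytes(out)


payload = encode_message(1709312400, 1010, 1106550353, [
    (1, 0, 0x06, [4200]),    # uint16
    (2, 0, 0x04, [23.45]),   # float32
    (3, 0, 0x01, [1]),       # bool
])
print(len(payload))
```

The printed length matches the 34-byte total worked out by hand.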

The savings compound as tag count increases. At 100 tags (a typical mid-size PLC), a JSON batch runs 6–8 KB; the binary equivalent is 700–900 bytes. At 200 tags, JSON hits 12–16 KB while binary stays under 2 KB.

Data Grouping: Batches and Groups

Individual tag values shouldn't be published as individual MQTT messages. The MQTT protocol itself adds overhead: a PUBLISH packet includes a fixed header (2 bytes minimum), topic string (20–50 bytes for a typical industrial topic), and packet identifier (2 bytes for QoS 1). Publishing 200 individual messages means 200× this overhead.

Timestamp-Grouped Batches

The most effective grouping strategy collects all tag values from a single poll cycle into one batch, sharing a single timestamp:

[Batch Start: timestamp=1709312400]
  Tag 1: id=1, status=0, type=uint16, value=4200
  Tag 2: id=2, status=0, type=float,  value=23.45
  Tag 3: id=3, status=0, type=bool,   value=1
  ...
[Batch End]

The timestamp in the batch header applies to all contained tags. This eliminates per-tag timestamp overhead — a savings of 4 bytes per tag, or 800 bytes across 200 tags.

Batch Size Limits

MQTT brokers and clients have maximum message size limits. Azure IoT Hub limits messages to 256 KB. AWS IoT Core allows 128 KB. Most on-premise Mosquitto deployments default to 256 MB but should be configured lower for production use.

More importantly, your edge gateway's memory and processing constraints impose practical limits. A 4 KB batch size works well for most deployments:

Large enough to hold 200+ tags in binary format
Small enough to fit in constrained gateway memory
Fast enough to serialize without impacting the poll loop

When a batch exceeds the configured size, close it and start a new one. The cloud decoder handles multiple batches with the same timestamp gracefully.
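
The close-and-start-new rule can be sketched as a simple greedy splitter. This Python example assumes each tag record is already encoded to bytes and that every batch carries a 12-byte header and tag count. `split_batches` is a hypothetical helper name.

```python
from typing import Iterable, List

HEADER_BYTES = 12  # 10-byte header + 2-byte tag count, per the layout above


def split_batches(records: Iterable[bytes], max_bytes: int = 4096) -> List[List[bytes]]:
    """Group encoded tag records so header + records never exceeds max_bytes."""
    batches, current, size = [], [], HEADER_BYTES
    for rec in records:
        if current and size + len(rec) > max_bytes:
            batches.append(current)             # close the full batch
            current, size = [], HEADER_BYTES    # start a new one
        current.append(rec)
        size += len(rec)
    if current:
        batches.append(current)
    return batches


records = [bytes(7)] * 1000                      # 1000 dummy 7-byte tag records
batches = split_batches(records)
print([len(b) for b in batches])
```

With 7-byte records, a 4 KB batch holds 583 of them, comfortably above the 200-tag case discussed above.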

Change-of-Value Filtering Before Batching

Apply change-of-value (COV) filtering before adding values to the batch, not after. If a tag's value hasn't changed since the last report and COV is enabled for that tag, skip it entirely. This reduces batch sizes further during steady-state operation — when 80% of tags are unchanged, your binary batch shrinks proportionally.

However, implement a periodic full-refresh: every hour (or configurable interval), reset all COV baselines and include every tag in the next batch. This ensures the cloud always has a complete snapshot, even if individual change events were lost during a brief disconnection.
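
A compact way to combine COV filtering with the periodic full refresh is to clear the baseline table whenever the refresh interval elapses. The `CovFilter` class below is an illustrative sketch and not taken from any particular gateway. It uses exact comparison, whereas a real deployment might add per-tag deadbands.

```python
from typing import Dict, Hashable


class CovFilter:
    """Pass a value only when it changed, plus a periodic full refresh."""

    def __init__(self, refresh_interval_s: float = 3600.0):
        self.refresh_interval_s = refresh_interval_s
        self.baseline: Dict[Hashable, object] = {}
        self.last_refresh = None

    def filter(self, values: dict, now: float) -> dict:
        """Return the subset of `values` that should go into the next batch."""
        if self.last_refresh is None or now - self.last_refresh >= self.refresh_interval_s:
            self.baseline.clear()      # forget baselines: every tag is reported
            self.last_refresh = now
        changed = {k: v for k, v in values.items()
                   if k not in self.baseline or self.baseline[k] != v}
        self.baseline.update(changed)
        return changed


cov = CovFilter()
print(cov.filter({1: 4200, 2: 23.45}, now=0))      # first poll: everything
print(cov.filter({1: 4200, 2: 23.50}, now=5))      # only tag 2 changed
print(cov.filter({1: 4200, 2: 23.50}, now=3600))   # refresh: everything again
```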

The Page Buffer: Store-and-Forward in Fixed Memory

Binary encoding solves the bandwidth problem. But you still need to handle MQTT disconnections without losing data. The page-based ring buffer is a common store-and-forward pattern in embedded systems.

Architecture

Pre-allocate a contiguous memory region at startup and divide it into fixed-size pages:

┌────────────────────────────────────────────────┐
│ Buffer Memory (e.g., 512 KB)                   │
│                                                │
│ ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐ │
│ │Page 0│ │Page 1│ │Page 2│ │Page 3│ │Page 4│ │
│ │      │ │      │ │      │ │      │ │      │ │
│ └──────┘ └──────┘ └──────┘ └──────┘ └──────┘ │
└────────────────────────────────────────────────┘

Pages cycle through three states:

Free — empty, available for writing
Work — currently being written to by the Modbus polling thread
Used — full, waiting for MQTT delivery

Page Layout

Each page contains multiple messages, packed sequentially:

┌─────────────────────────────────────┐
│ Page Header (struct, ~16 bytes)     │
├─────────────────────────────────────┤
│ Message 1:                          │
│  ├─ Message ID (4 bytes)            │
│  ├─ Message Size (4 bytes)          │
│  └─ Message Body (variable)         │
├─────────────────────────────────────┤
│ Message 2:                          │
│  ├─ Message ID (4 bytes)            │
│  ├─ Message Size (4 bytes)          │
│  └─ Message Body (variable)         │
├─────────────────────────────────────┤
│ ... (more messages)                 │
├─────────────────────────────────────┤
│ Free space                          │
└─────────────────────────────────────┘

The 4-byte message ID field is filled by the MQTT library when the message is published (at QoS 1). The gateway uses this ID to match publish acknowledgments to specific messages.

Write Path

Check if the current work page has enough space for the new message (message size + 8 bytes for ID and size fields).
If yes: write the message, advance the write pointer.
If no: move the work page to the "used" queue, grab a free page as the new work page, and write there.
If no free pages exist: grab the oldest used page (overflow condition). Log a warning — you're losing the oldest buffered data, but preserving the newest.

This overflow strategy is deliberately biased toward fresh data. In industrial monitoring, a temperature reading from 5 minutes ago is far more valuable than one from 3 days ago that was buffered during an outage.
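
The write path can be sketched in a few dozen lines of Python. This is a simplified model under stated assumptions: pages are Python lists rather than slices of pre-allocated memory, locking is omitted, every message is assumed to fit in one page, and the `PageBuffer` name is invented for this example.

```python
from collections import deque


class PageBuffer:
    """Write-side sketch: fixed number of fixed-size pages, oldest dropped on overflow."""
    RECORD_OVERHEAD = 8  # 4-byte message ID + 4-byte size, per the page layout

    def __init__(self, page_count: int, page_size: int):
        self.page_size = page_size
        self.free = page_count - 1   # pages available for reuse
        self.used = deque()          # full pages, oldest first
        self.work = []               # messages in the page being written
        self.work_bytes = 0
        self.dropped_pages = 0

    def write(self, message: bytes) -> None:
        needed = len(message) + self.RECORD_OVERHEAD
        if self.work_bytes + needed > self.page_size:
            self.used.append(self.work)          # work page is full: queue it
            if self.free > 0:
                self.free -= 1                   # take a free page
            else:
                self.used.popleft()              # overflow: sacrifice oldest data
                self.dropped_pages += 1
            self.work, self.work_bytes = [], 0
        self.work.append(message)
        self.work_bytes += needed


buf = PageBuffer(page_count=3, page_size=64)
for i in range(10):
    buf.write(bytes([i]) * 24)    # 24 + 8 = 32 bytes, so two messages per page
print(len(buf.used), buf.dropped_pages, buf.used[0][0][0])
```

After ten writes into three pages, the two oldest pages have been dropped and the oldest surviving message is number 4, which is the freshness bias described above.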

Delivery Path

Take the first page from the "used" queue.
Read the next undelivered message (tracked by a per-page read pointer).
Publish via MQTT at QoS 1.
Wait for PUBACK — don't advance the read pointer until the broker confirms receipt.
On PUBACK: advance the read pointer. If the page is fully delivered, move it back to "free."
On disconnect: stop sending, keep writing. The buffer absorbs the outage.

The wait-for-PUBACK step is critical. Without it, you're fire-and-forgetting into a potentially disconnected socket, and data silently disappears.
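
The delivery steps above can be modeled as a small class that keeps at most one message in flight. In this Python sketch, `publish` is a hypothetical callable that sends a payload and returns the packet ID the MQTT client assigned. Pages are assumed to be non-empty lists of payloads, and locking is left out for brevity.

```python
from collections import deque


class Delivery:
    """Read-side sketch: one in-flight message, pointer advances only on PUBACK."""

    def __init__(self, publish):
        self.publish = publish     # hypothetical: publish(payload) -> packet ID
        self.used = deque()        # pages (lists of payloads), oldest first
        self.read_index = 0        # next undelivered message in used[0]
        self.in_flight = None      # packet ID awaiting PUBACK

    def pump(self) -> None:
        """Send the next undelivered message unless one is already in flight."""
        if self.in_flight is None and self.used:
            self.in_flight = self.publish(self.used[0][self.read_index])

    def on_puback(self, packet_id: int) -> None:
        if packet_id != self.in_flight:
            return                              # stale or unknown ack: ignore
        self.in_flight = None
        self.read_index += 1
        if self.read_index == len(self.used[0]):
            self.used.popleft()                 # page fully delivered: recycle it
            self.read_index = 0

    def on_disconnect(self) -> None:
        self.in_flight = None                   # resend same message after reconnect


sent = []
def fake_publish(payload):
    sent.append(payload)
    return len(sent)               # packet IDs 1, 2, 3, ...

d = Delivery(fake_publish)
d.used.append([b"a", b"b"])
d.pump()
d.on_disconnect()                  # "a" sent, link dropped before PUBACK
d.pump()
d.on_puback(2)                     # "a" resent and acknowledged
d.pump()
d.on_puback(3)                     # "b" delivered, page recycled
print(sent, len(d.used))
```

Message "a" goes out twice because its first PUBACK never arrived. That duplicate is the price of QoS 1's at-least-once guarantee, so the cloud side should tolerate repeats.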

Thread Safety

The write path (Modbus polling thread) and delivery path (MQTT thread) operate concurrently on the same buffer. A mutex protects all page state transitions:

Moving pages between free/work/used queues
Checking available space
Advancing read/write pointers
Processing delivery acknowledgments

Keep the critical section as small as possible — lock, update pointers, unlock. Never hold the mutex during a Modbus read or MQTT publish; those operations can block for seconds.

Delivery Tracking and Watchdogs

In production, "the MQTT connection is up" doesn't mean data is flowing. The connection can be technically alive (TCP socket open, keepalives passing) while messages silently fail to publish or acknowledge.

Delivery Timestamp Tracking

Track the timestamp of the last successfully delivered message (confirmed by PUBACK). If this timestamp falls more than N minutes behind the current time, something is wrong:

The broker may be rejecting messages (payload too large, topic permission denied)
The network may be passing keepalives but dropping data packets
The MQTT library may be stuck in an internal error state

When the delivery watchdog fires, tear down the entire MQTT connection and reinitialize. It's a heavy-handed recovery, but it's reliable. In industrial systems, a clean restart beats a subtle degradation every time.
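
A delivery watchdog needs very little state. The sketch below is one possible shape for it, with an assumed 300-second limit standing in for "N minutes". The `pending` flag is an addition for this example, so that an idle gateway with nothing to send is not mistaken for a stalled one.

```python
class DeliveryWatchdog:
    """Flags a stalled pipeline when no PUBACK has arrived for `limit_s` seconds."""

    def __init__(self, limit_s: float, now: float):
        self.limit_s = limit_s
        self.last_ack = now

    def on_puback(self, now: float) -> None:
        self.last_ack = now

    def stalled(self, now: float, pending: bool) -> bool:
        """Only a backlog can stall; an idle gateway with nothing queued is healthy."""
        return pending and (now - self.last_ack) > self.limit_s


wd = DeliveryWatchdog(limit_s=300, now=0)
wd.on_puback(now=100)
print(wd.stalled(now=350, pending=True))    # 250 s since last ack
print(wd.stalled(now=401, pending=True))    # 301 s since last ack
print(wd.stalled(now=401, pending=False))   # nothing waiting to be sent
```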

Status Telemetry

The gateway should periodically publish its own status message containing:

Daemon uptime — how long since last restart
System uptime — how long since last boot
Buffer state — pages free/used/work, current fill level
PLC link state — is the Modbus connection healthy
Firmware version — for remote fleet management
Token expiration — time remaining on the MQTT auth token

This status message can use JSON even if data messages use binary — it's infrequent (every 30–60 seconds) and readability matters more than compactness for diagnostics.

Bandwidth Math: Real-World Numbers

Let's calculate the actual savings for a typical deployment:

Scenario: 150 tags, polled every 5 seconds, 50% change rate with COV enabled, cellular connection.

JSON Format

Average tag JSON: ~60 bytes
Tags per poll (with 50% COV): 75
Batch overhead: ~50 bytes
Total per poll: 75 × 60 + 50 = 4,550 bytes
Per minute (12 polls): 54.6 KB
Per day: 78.6 MB
Per month: 2.36 GB

Binary Format

Average tag binary: ~7 bytes
Header per batch: 12 bytes
Total per poll: 75 × 7 + 12 = 537 bytes
Per minute (12 polls): 6.4 KB
Per day: 9.3 MB
Per month: 279 MB

Savings: 88% reduction — from 2.36 GB to 279 MB. On a $20/month cellular plan with 500 MB included, JSON doesn't fit. Binary does, with headroom.
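
The same calculation as a small Python function, so you can substitute your own tag counts and poll intervals. The function name is made up for this example, and a 30-day month is assumed.

```python
def monthly_bytes(tags_per_poll: int, bytes_per_tag: int, batch_overhead: int,
                  poll_interval_s: int = 5, days: int = 30) -> int:
    """Payload bytes per month, before MQTT and TLS framing."""
    per_poll = tags_per_poll * bytes_per_tag + batch_overhead
    polls = (86400 // poll_interval_s) * days
    return per_poll * polls


json_month = monthly_bytes(75, 60, 50)
binary_month = monthly_bytes(75, 7, 12)
print(json_month / 1e9, binary_month / 1e6, json_month / binary_month)
```

For this scenario it gives about 2.36 GB for JSON and about 278 MB for binary, a ratio of roughly 8.5x.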

Add MQTT overhead (topic strings, packet headers) and TLS overhead (~40 bytes per record). Both formats pay the same fixed cost per message, so the real-world ratio lands somewhat below the roughly 8.5x computed here, though the savings remain substantial.

Decoding on the Cloud Side

Binary encoding shifts complexity from the edge to the cloud. The decoder must:

Parse the header to extract timestamp, device type, and serial number.
Iterate tag records using the type code to determine value byte widths.
Reconstruct typed values — particularly IEEE 754 floats from their 4-byte binary representation.
Handle partial messages — if a batch was truncated due to buffer overflow, the decoder must fail gracefully on the last incomplete record without losing the valid records before it.
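
The four decoder steps can be sketched with Python's `struct` module. As with any sketch of this format, the big-endian byte order is an assumption, and `decode_message` is a hypothetical name. The decoder stops at the first incomplete or unknown record and keeps everything decoded before it.

```python
import struct

# Type code -> (struct format character, bytes per value); big-endian is assumed.
TYPES = {0x01: ("B", 1), 0x02: ("i", 4), 0x03: ("I", 4), 0x04: ("f", 4),
         0x05: ("h", 2), 0x06: ("H", 2), 0x07: ("b", 1), 0x08: ("B", 1)}


def decode_message(buf: bytes) -> dict:
    """Decode header + tag records; keep valid records if the tail is truncated."""
    ts, device_type, serial, tag_count = struct.unpack_from(">IHIH", buf, 0)
    msg = {"ts": ts, "device_type": device_type, "serial": serial,
           "tags": [], "truncated": False}
    pos = 12
    for _ in range(tag_count):
        try:
            tag_id, status, type_code, n = struct.unpack_from(">HBBB", buf, pos)
            fmt, width = TYPES[type_code]
            values = struct.unpack_from(f">{n}{fmt}", buf, pos + 5)
        except (struct.error, KeyError):
            msg["truncated"] = True     # incomplete or unknown record: stop here
            break
        msg["tags"].append({"id": tag_id, "status": status, "values": list(values)})
        pos += 5 + n * width
    return msg


payload = struct.pack(">IHIH", 1709312400, 1010, 1106550353, 2)
payload += struct.pack(">HBBBH", 1, 0, 0x06, 1, 4200)   # uint16 tag
payload += struct.pack(">HBBBf", 2, 0, 0x04, 1, 23.45)  # float32 tag
print(decode_message(payload))
print(decode_message(payload[:-2])["truncated"])         # last record cut short
```

Note that the float comes back as the nearest 32-bit value rather than exactly 23.45, which is expected for float32.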

Most cloud platforms (Azure IoT Hub, AWS IoT Core) support custom message decoders that transform binary payloads to JSON for downstream processing. Write the decoder once, and the rest of your analytics pipeline sees standard JSON.

How machineCDN Implements Binary Telemetry

machineCDN's edge daemon uses binary encoding by default for all data telemetry. The implementation includes:

Compact binary batching with shared timestamps per group, reducing per-tag overhead to 5–9 bytes depending on data type.
Page-based ring buffer with pre-allocated memory, zero runtime allocation, and deliberate overflow behavior that preserves fresh data.
Per-message PUBACK tracking with delivery watchdog and automatic connection recycling.
Parallel JSON status messages for gateway diagnostics, published on a separate topic at lower frequency.
Automatic format negotiation — the cloud ingestion layer detects binary vs. JSON based on the first byte of the payload and routes to the appropriate decoder.

The result: machineCDN gateways routinely operate on 500 MB/month cellular plans, monitoring 200+ tags at 5-second intervals, with full store-and-forward resilience during connectivity outages.

When to Use Binary vs. JSON

Binary encoding isn't always the right choice. Use this decision framework:

Criterion	Use Binary	Use JSON
Network	Cellular, satellite, metered	Ethernet, WiFi, unmetered
Tag count	> 50	< 20
Poll interval	< 10 seconds	> 60 seconds
Gateway CPU	Constrained (< 500 MHz)	Capable (> 1 GHz)
Debug needs	Production, stable	Development, changing
Downstream	Custom decoder available	Generic tooling needed

For most production industrial deployments — where gateways connect hundreds of tags over cellular and reliability trumps developer convenience — binary encoding is the clear winner. Save JSON for your status messages and the debugging serial port.

Getting Started

If you're designing a binary telemetry format for your own gateway:

Start with the type system. Define your type codes and byte widths. Match them to your PLC's native data types.
Design the header. Include version, device identity, and a shared timestamp. Add a format version byte so you can evolve the format without breaking old decoders.
Build the buffer first. Get store-and-forward working before optimizing the encoding. Data integrity matters more than data compactness.
Write the decoder alongside the encoder. Test with known values. Verify float encoding especially — IEEE 754 byte ordering bugs are silent and devastating.
Measure real bandwidth. Deploy both JSON and binary formats on the same gateway for a week and compare actual data consumption. The numbers will sell the approach to stakeholders who question the added complexity.

Binary encoding is a solved problem in industrial telemetry. The patterns are well-established, the savings are dramatic, and the complexity cost is paid once at design time and amortized across every byte your fleet ever transmits.

Binary Telemetry Encoding for IIoT: Why JSON Is Killing Your Bandwidth [2026]

March 1, 2026 · 11 min read

If you're sending PLC tag values as JSON from edge gateways to the cloud, you're wasting 80–90% of your bandwidth. On a cellular-connected factory floor with dozens of machines, that's the difference between a $50/month data plan and a $500/month one — and the difference between sub-second telemetry and multi-second lag.

This guide breaks down binary telemetry encoding: how to pack industrial data efficiently at the edge, preserve type fidelity across the wire, and design batch grouping strategies that survive unreliable networks.

Cellular Gateway Architecture for IIoT: Bridging Modbus to Cloud Over LTE [2026]

March 1, 2026 · 13 min read

The industrial edge gateway is the unsung hero of every IIoT deployment. It sits in a DIN-rail enclosure on the factory floor, silently bridging the gap between a PLC speaking Modbus over a serial wire and a cloud platform expecting JSON over HTTPS. It does this over a cellular connection that degrades during shift changes, when 200 workers pull out their phones at once and congest the local cell, and it does it reliably for years without anyone touching it.

Getting the gateway architecture right determines whether your IIoT deployment delivers real-time visibility or an expensive collection of intermittent data.

Why Cellular, Not Ethernet

The first question every plant engineer asks: "We have Ethernet everywhere — why do we need cellular?"

Three reasons:

IT/OT separation: Connecting industrial devices to the corporate network requires firewall rules, VLAN configuration, security audits, and ongoing IT involvement. A cellular gateway operates on its own network — no interaction with plant IT at all.
Deployment speed: Plugging in a cellular gateway takes 15 minutes. Getting a network drop approved, installed, and configured takes 2–6 weeks in most manufacturing environments.
Retrofit flexibility: Many older plants have machines in locations where running Ethernet cable would require cutting through concrete or routing through hazardous areas. Cellular covers everywhere the factory has cell signal.

The trade-off is bandwidth and latency. Cellular connections typically deliver 10–50 Mbps down, 5–20 Mbps up, with 30–100ms latency. For industrial telemetry — where you're sending a few kilobytes per second of register values — this is more than sufficient.

Gateway Architecture: What's Inside

A modern industrial cellular gateway combines several functions in one device:

┌─────────────────────────────────────────┐
│           Cellular Gateway               │
│                                          │
│  ┌──────────┐  ┌──────────┐             │
│  │ Modbus   │  │ Modbus   │             │
│  │ TCP      │  │ RTU      │             │
│  │ Client   │  │ Master   │             │
│  │ (Eth)    │  │ (RS-485) │             │
│  └────┬─────┘  └────┬─────┘             │
│       │              │                   │
│  ┌────┴──────────────┴────┐             │
│  │   Protocol Engine      │             │
│  │   - Tag mapping        │             │
│  │   - Polling scheduler  │             │
│  │   - Data normalization │             │
│  └────────────┬───────────┘             │
│               │                          │
│  ┌────────────┴───────────┐             │
│  │   Batch Buffer         │             │
│  │   - Store & forward    │             │
│  │   - Compression        │             │
│  │   - Retry queue        │             │
│  └────────────┬───────────┘             │
│               │                          │
│  ┌────────────┴───────────┐             │
│  │   Cellular Modem       │             │
│  │   - LTE Cat 4/6        │             │
│  │   - SIM management     │             │
│  │   - Signal monitoring  │             │
│  └────────────────────────┘             │
└─────────────────────────────────────────┘

The Dual-Protocol Challenge

Most factory floors have two Modbus variants in play simultaneously:

Modbus TCP: for newer PLCs with Ethernet ports. The gateway connects as a TCP client to the PLC's IP address on port 502. Each request/response is wrapped in an MBAP (Modbus Application Protocol) header with a transaction identifier, allowing multiple outstanding requests.

Modbus RTU: for legacy equipment using RS-485 serial connections. The gateway acts as a bus master, addressing individual slave devices by station address. Communication is half-duplex: request, wait, response, next request.

A single gateway typically needs to handle both simultaneously. The configuration for each side looks fundamentally different:

Modbus TCP configuration:

{
  "plc": {
    "ip": "192.168.1.100",
    "modbus_tcp_port": 502,
    "timeout_ms": 1000,
    "max_retries": 3
  }
}

Modbus RTU configuration:

{
  "serial": {
    "port": "/dev/rs485",
    "baud_rate": 9600,
    "parity": "none",
    "data_bits": 8,
    "stop_bits": 1,
    "byte_timeout_ms": 4,
    "response_timeout_ms": 100,
    "base_address": 1
  }
}

The RTU side requires careful attention to timing. The byte timeout (time between consecutive bytes in a frame) and response timeout (time waiting for a slave to respond) must be tuned to the specific serial bus. Too short, and you'll get fragmented frames. Too long, and your polling rate drops.
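
To make the timing concrete, here is a small sketch that derives the inter-character (1.5 character) and inter-frame (3.5 character) gaps used by Modbus RTU from the baud rate. The function name is hypothetical, and the default of 10 bits per character assumes 8N1 framing (1 start, 8 data, 1 stop).

```python
def rtu_timing_ms(baud: int, bits_per_char: int = 10) -> dict:
    """Return character time and the 1.5/3.5-character gaps in milliseconds.

    bits_per_char=10 assumes 8N1; use 11 for framings with a parity bit.
    """
    char_ms = bits_per_char * 1000.0 / baud
    return {
        "char_ms": char_ms,
        "t1_5_ms": 1.5 * char_ms,  # longest gap tolerated inside one frame
        "t3_5_ms": 3.5 * char_ms,  # shortest silence that separates frames
    }

timing = rtu_timing_ms(9600)
```

At 9600 baud with 8N1, the 3.5-character gap comes out at roughly 3.6 ms, which is in line with the 4 ms byte timeout in the configuration above.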

Common mistake: Setting baud rate to 19200 or higher on long RS-485 runs (>100 meters). While the specification supports it, in electrically noisy factory environments with marginal cabling, 9600 baud with 8N1 (8 data bits, no parity, 1 stop bit) is the most reliable default.

Polling Architecture

The polling engine is the heartbeat of the gateway. It determines which PLC registers to read, how often, and in what order.

Tag-Based Polling

Rather than blindly scanning entire register ranges, modern gateways use tag-based polling — reading only the specific registers that map to meaningful process variables:

Tag ID	Register Type	Address	Size	Description	Poll Rate
1001	Holding (FC03)	40001	1	Hopper weight	1s
1002	Holding (FC03)	40002	2	Extruder temp (32-bit float)	5s
1003	Discrete input (FC02)	10001	1	Motor running bit	1s
1004	Holding (FC03)	40010	1	Alarm word (bitmask)	1s
1005	Holding (FC03)	40100	4	Cycle counter (64-bit)	30s

Contiguous Register Optimization

A naive implementation would make one Modbus request per tag. Reading 20 tags means 20 round-trips, each with TCP overhead or RTU bus turnaround time.

The optimization: identify contiguous register ranges and batch them into single multi-register reads:

Tags at registers: 40001, 40002, 40003, 40004, 40010, 40100
→ Request 1: Read 40001–40004 (FC03, 4 registers) — one request for 4 tags
→ Request 2: Read 40010 (FC03, 1 register)
→ Request 3: Read 40100 (FC03, 4 registers for 64-bit value)

This reduces 6 individual requests to 3. On an RS-485 bus at 9600 baud, each request takes roughly 15–30ms round-trip, so this optimization saves 45–90ms per poll cycle.

Threshold for batching: If the gap between two registers is less than ~10 registers, it's usually faster to read the entire range (including unused registers) than to make separate requests. The overhead of an additional Modbus transaction exceeds the cost of reading a few extra registers.
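
The batching logic can be sketched as below. This is a minimal illustration that assumes all tags share one function code and are given as (address, size) pairs; `coalesce`, `max_gap`, and `max_registers` are hypothetical names, and the gap value of 10 simply mirrors the rule of thumb above.

```python
def coalesce(tags, max_gap=0, max_registers=125):
    """Merge (address, size) tags into (start, count) read requests.

    max_gap: unused registers we are willing to read between two tags.
    max_registers: upper bound on registers in a single request.
    """
    requests = []
    for addr, size in sorted(tags):
        end = addr + size  # first register after this tag
        if requests:
            start, count = requests[-1]
            gap = addr - (start + count)
            if gap <= max_gap and end - start <= max_registers:
                # Extend the previous request to cover this tag.
                requests[-1] = (start, max(count, end - start))
                continue
        requests.append((addr, size))
    return requests

tags = [(40001, 1), (40002, 1), (40003, 1), (40004, 1), (40010, 1), (40100, 4)]
strict = coalesce(tags)               # strictly contiguous ranges only
relaxed = coalesce(tags, max_gap=10)  # also bridge small gaps
```

With `max_gap=0` this yields the three requests from the example. With `max_gap=10` the read at 40010 is folded into the first request, leaving two requests at the cost of five unused registers.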

Multi-Rate Polling

Not every tag needs the same poll rate. Process temperatures change slowly — every 5 seconds is fine. Motor run/stop status needs sub-second detection. Alarm words need immediate attention.

A well-designed polling engine runs multiple poll groups:

Fast group (100ms–1s): Alarms, run/stop status, critical process variables
Standard group (1–5s): Temperatures, pressures, weights, flow rates
Slow group (10–60s): Counters, totals, configuration values, serial numbers
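
A minimal sketch of a multi-rate scheduler follows. The class and function names are hypothetical, the intervals are picked from the ranges above, and a real gateway would pass a monotonic clock reading instead of the integer seconds used here.

```python
from dataclasses import dataclass

@dataclass
class PollGroup:
    name: str
    interval_s: float
    next_due: float = 0.0

def due_groups(groups, now):
    """Return names of groups due at `now` and schedule their next poll."""
    due = []
    for group in groups:
        if now >= group.next_due:
            due.append(group.name)
            group.next_due = now + group.interval_s
    return due

groups = [PollGroup("fast", 1.0), PollGroup("standard", 5.0), PollGroup("slow", 30.0)]
schedule = [due_groups(groups, t) for t in range(0, 6)]
```

Every group fires on the first pass, after which the fast group runs each second and the standard group rejoins it at the five-second mark.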

Handling Read Failures

On Modbus TCP, a failed read typically means the TCP connection dropped. Recovery:

Close the socket
Wait 1 second (avoid hammering a PLC that's rebooting)
Re-establish the TCP connection
Resume polling from where you left off

On Modbus RTU, failures are more nuanced:

No response: Slave device might be offline, wrong address, or bus conflict
CRC error: Electrical noise on the serial bus
Exception response: Slave is online but rejecting the request (wrong function code, invalid address)

Each failure type requires different retry behavior. CRC errors warrant immediate retry. No-response might need a longer backoff to let the bus settle. Exception responses should be logged but not retried (the slave is telling you it can't do what you asked).
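
One way to express that policy in code is sketched below. The enum, the function name, the retry limit, and the backoff base are all illustrative assumptions, not values from any particular gateway.

```python
from enum import Enum

class RtuFailure(Enum):
    NO_RESPONSE = "no_response"
    CRC_ERROR = "crc_error"
    EXCEPTION = "exception"

def retry_delay_s(failure, attempt, max_retries=3, backoff_base_s=0.5):
    """Return seconds to wait before retrying, or None to give up."""
    if failure is RtuFailure.EXCEPTION or attempt >= max_retries:
        return None  # log it; repeating the same request will not help
    if failure is RtuFailure.CRC_ERROR:
        return 0.0   # noise is transient, so retry immediately
    return backoff_base_s * (2 ** attempt)  # no response: exponential backoff
```

Here `attempt` counts retries already made, so the third consecutive no-response waits 2 seconds and the fourth gives up.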

Batch Buffering and Telemetry Upload

Raw Modbus polling might produce 50–200 data points per second across all tags. Uploading each point individually over cellular would be wasteful and expensive. Instead, the gateway batches data before transmission.

Batch Parameters

Two parameters control batching:

{
  "batch_size": 4000,
  "batch_timeout": 60
}

batch_size: Maximum number of data points in a single upload. When the buffer hits 4,000 points, upload immediately.
batch_timeout: Maximum time (seconds) before uploading regardless of buffer size. Even if only 100 points have accumulated in 60 seconds, upload them.

The batch triggers on whichever condition is met first. This ensures:

During busy periods (steady data flow), uploads are driven by batch_size: at 200 data points per second, a 4,000-point batch fills in about 20 seconds
During quiet periods (machine idle, few data changes), uploads still happen within batch_timeout seconds
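
The two triggers can be sketched as a small class. This is an illustration only: the class name is hypothetical, and it assumes the timeout is measured from the first point in the current batch.

```python
class Batcher:
    def __init__(self, batch_size=4000, batch_timeout=60):
        self.batch_size = batch_size
        self.batch_timeout = batch_timeout
        self.points = []
        self.started_at = None

    def add(self, point, now):
        """Buffer a point; return a full batch once the size limit is hit."""
        if not self.points:
            self.started_at = now  # timeout clock starts with the first point
        self.points.append(point)
        if len(self.points) >= self.batch_size:
            return self._flush()
        return None

    def tick(self, now):
        """Call periodically; return a partial batch if the timeout expired."""
        if self.points and now - self.started_at >= self.batch_timeout:
            return self._flush()
        return None

    def _flush(self):
        batch, self.points = self.points, []
        return batch
```

The caller uploads whatever `add` or `tick` returns, so whichever condition is met first produces the batch.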

Startup Buffering

When a gateway powers on, it needs time to establish a cellular connection — typically 15–60 seconds for LTE negotiation, IP assignment, and cloud authentication. But PLCs start responding to Modbus queries immediately.

A startup timeout parameter (e.g., 140 seconds) tells the gateway to buffer all polled data during this initial period without attempting upload. This prevents a flood of failed HTTP requests that would fill system logs and waste CPU.

{
  "startup_timeout": 140
}

Store-and-Forward During Outages

When cellular connectivity drops, the gateway continues polling PLCs and storing data locally. A well-sized buffer can hold hours or days of data depending on the poll rate and storage capacity.

When connectivity returns, the gateway replays buffered data in chronological order. The cloud platform must be designed to handle out-of-order and delayed timestamps gracefully — particularly for:

OEE calculations that might already have partial data for the outage period
Alarm histories where the "alarm active" timestamp arrives hours after the "alarm cleared" timestamp
Counter values that might not increase monotonically if the PLC was restarted during the outage
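
On the gateway side, the store-and-replay behavior might look like the following sketch. It assumes an in-memory buffer that drops the oldest entries when full and a `send` callback that returns True on success. Both are illustrative choices, and a production gateway would more likely persist to flash.

```python
from collections import deque

class StoreAndForward:
    def __init__(self, capacity):
        # deque(maxlen=...) silently discards the oldest item when full.
        self.buffer = deque(maxlen=capacity)

    def store(self, timestamp, payload):
        self.buffer.append((timestamp, payload))

    def replay(self, send):
        """Send buffered items oldest-first; stop at the first failure."""
        sent = 0
        while self.buffer:
            timestamp, payload = self.buffer[0]
            if not send(timestamp, payload):
                break  # link dropped again; keep the remaining items
            self.buffer.popleft()
            sent += 1
        return sent
```

An item is removed only after `send` confirms it, so a second outage during replay does not lose data.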

Connectivity Monitoring

The gateway must continuously report its own health alongside process data. Operators need to distinguish between "the machine is fine but the gateway is offline" and "the machine has a fault."

PLC Link Status

The gateway monitors its connection to each PLC independently:

Router status: Is the gateway itself online? (Cellular modem connected, IP assigned)
PLC link status: Can the gateway reach the PLC? (Modbus TCP connection active, or RTU slave responding)

These two status indicators combine into three states the cloud can distinguish. When the router is offline, the PLC link state is unknown, and a cellular outage looks the same as a power outage from the cloud side:

Router	PLC Link	Meaning
✅ Online	✅ Connected	Normal operation
✅ Online	❌ Disconnected	PLC issue: check Ethernet cable or PLC power
❌ Offline	Unknown	Cellular issue or power outage: no data flowing to cloud

The cloud platform should display machine status based on this hierarchy. A machine should show as "Router Not Connected" (gray) before showing as "PLC Not Connected" (red) before showing process-level status like "Running" or "Alarm."
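
That hierarchy reduces to a short function. The name `display_status` and the idea of passing the process status as a string are assumptions made for this sketch.

```python
def display_status(router_online, plc_connected, process_status):
    """Resolve the status shown for a machine, most fundamental fault first."""
    if not router_online:
        return "Router Not Connected"  # PLC and process state are unknown
    if not plc_connected:
        return "PLC Not Connected"
    return process_status  # e.g. "Running" or "Alarm"
```

Checking the router first matters because a stale "PLC connected" flag from before the outage would otherwise mask the real problem.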

Signal Quality Metrics

For cellular deployments, signal strength is a leading indicator of data reliability:

Metric	Good	Marginal	Poor
RSSI	> -70 dBm	-70 to -85 dBm	< -85 dBm
RSRP	> -90 dBm	-90 to -110 dBm	< -110 dBm
RSRQ	> -10 dB	-10 to -15 dB	< -15 dB
SINR	> 10 dB	3 to 10 dB	< 3 dB

When signal quality drops below the marginal threshold, the gateway should increase its batch size (send fewer, larger uploads) and enable more aggressive compression to reduce the number of cellular transactions.
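
The thresholds in the table translate directly into a classifier. The sketch below is illustrative: the function names are hypothetical, and treating the worst metric as the overall result is one reasonable policy among several.

```python
# (good_above, poor_below) pairs taken from the table above.
THRESHOLDS = {
    "RSSI": (-70, -85),   # dBm
    "RSRP": (-90, -110),  # dBm
    "RSRQ": (-10, -15),   # dB
    "SINR": (10, 3),      # dB
}

def classify(metric, value):
    """Map one signal reading to 'good', 'marginal', or 'poor'."""
    good_above, poor_below = THRESHOLDS[metric]
    if value > good_above:
        return "good"
    if value < poor_below:
        return "poor"
    return "marginal"

def overall(readings):
    """Worst classification across all reported metrics."""
    rank = {"good": 0, "marginal": 1, "poor": 2}
    return max((classify(m, v) for m, v in readings.items()), key=rank.get)
```

A gateway could call `overall` on each modem report and switch to larger, less frequent uploads once the result is "poor".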

Remote Configuration and Over-the-Air Updates

Once a gateway is deployed behind a machine in a locked electrical panel, physical access becomes expensive. Remote management is essential.

Configuration Push

The cloud platform should be able to push configuration changes to deployed gateways:

{
  "cmd": "daemon_config",
  "plc": {
    "ip": "192.168.1.101",
    "modbus_tcp_port": 502
  },
  "serial": {
    "port": "/dev/rs485",
    "base_addr": 1,
    "baud": 9600,
    "parity": "none",
    "data_bits": 8,
    "stop_bits": 1
  },
  "batch_size": 4000,
  "batch_timeout": 60,
  "startup_timeout": 140
}

Critical safety rule: Configuration changes must never interrupt an active polling cycle. The gateway should:

Receive the new configuration
Complete the current poll cycle
Gracefully close existing Modbus connections
Apply the new configuration
Re-establish connections with new parameters
Resume polling

A configuration push that crashes the gateway daemon means a truck roll to the plant. This is the most expensive bug in IIoT.
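
A minimal sketch of the "never interrupt a poll cycle" rule is shown below. The `Poller` class and its callbacks are hypothetical. The point is only that a pushed configuration is parked and applied between cycles.

```python
class Poller:
    def __init__(self, config):
        self.config = config
        self.pending = None

    def push_config(self, new_config):
        """Called by the cloud command handler; never applied mid-cycle."""
        self.pending = new_config

    def run_cycle(self, poll, reconnect):
        poll(self.config)  # finish the current cycle with the old settings
        if self.pending is not None:
            self.config, self.pending = self.pending, None
            reconnect(self.config)  # close and reopen links with new params
```

A fuller version would validate the new configuration before swapping it in and keep the old one available for rollback.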

Firmware Updates

Over-the-air (OTA) firmware updates for edge gateways require a dual-bank approach:

Download the new firmware to a secondary partition while continuing to run the current version
Verify the download integrity (checksum, signature)
Reboot into the new partition
Run self-tests (can it connect to cellular? Can it reach the PLC? Can it upload to cloud?)
If self-tests pass, mark the new partition as "good"
If self-tests fail, automatically revert to the previous partition

Never update all gateways simultaneously. Roll out to 5% of the fleet first, monitor for 48 hours, then expand gradually.

Multi-Device Gateway Configurations

Some gateway models support multiple PLC connections — one Ethernet port for Modbus TCP plus one or two RS-485 ports for RTU devices. A common example: a plastics extrusion line where the main extruder PLC communicates via Modbus TCP, but the temperature control units (TCUs) are legacy devices on RS-485.

The gateway must multiplex between devices:

┌──────────┐    Ethernet/Modbus TCP    ┌──────────┐
│ Extruder │◄─────────────────────────►│          │
│ PLC      │     Port 502              │          │
└──────────┘                           │ Gateway  │
                                       │          │
┌──────────┐    RS-485/Modbus RTU      │          │
│ TCU #1   │◄─────────────────────────►│          │
│ Addr: 1  │    9600 8N1               │          │
└──────────┘                           │          │
                                       │          │
┌──────────┐    RS-485/Modbus RTU      │          │
│ TCU #2   │◄─────────────────────────►│          │
│ Addr: 2  │    Same bus               │          │
└──────────┘                           └──────────┘

The TCP and RTU polling can run concurrently (they use different physical interfaces). Multiple RTU devices on the same RS-485 bus must be polled sequentially (half-duplex constraint). The polling engine needs to interleave RTU device addresses fairly to prevent one slow-responding device from starving others.

Alarm Processing at the Gateway

Raw alarm data from PLCs often arrives as packed bitmasks — a single 16-bit register where each bit represents a different alarm condition. The gateway must unpack these into individual, named alarm events.

Bit-Level Alarm Decoding

Consider a PLC that packs 16 alarms into a single holding register (40010):

Register value: 0x0025 = 0000 0000 0010 0101

Bit 0 (offset 0): Motor Overload    → ACTIVE (1)
Bit 1 (offset 1): High Temperature  → CLEARED (0)
Bit 2 (offset 2): Low Pressure      → ACTIVE (1)
Bit 3 (offset 3): Door Open         → CLEARED (0)
Bit 4 (offset 4): Emergency Stop    → CLEARED (0)
Bit 5 (offset 5): Hopper Empty      → ACTIVE (1)
...

The gateway extracts each alarm using bitwise operations:

alarm_active = (register_value >> bit_offset) & mask

Where mask defines how many bits this alarm spans (usually 1, but some PLCs pack multi-bit severity levels).

Some PLCs use a different pattern: multi-register alarm arrays where each element in the array represents a different alarm, and the value indicates severity or status. The gateway configuration must specify which pattern each machine type uses:

Single-bit in word: Offset = bit position, bytes (mask) = 1
Array element: Offset = array index, bytes = 0 (use raw value)
Multi-bit field: Offset = starting bit, bytes = field width mask

The gateway should also detect transitions — alarm activating and alarm clearing — rather than just reporting current state. This enables alarm duration tracking and alarm history in the cloud platform.
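
Transition detection can be sketched with an XOR between consecutive reads. The alarm names come from the example above. The function names and event labels are assumptions for illustration.

```python
ALARM_NAMES = ["Motor Overload", "High Temperature", "Low Pressure",
               "Door Open", "Emergency Stop", "Hopper Empty"]

def active_alarms(register_value, names=ALARM_NAMES):
    """Return the set of alarm names whose bit is set."""
    return {name for bit, name in enumerate(names)
            if (register_value >> bit) & 1}

def transitions(previous, current, names=ALARM_NAMES):
    """Compare two register reads and list (name, event) changes."""
    changed = previous ^ current  # bits that differ between the two reads
    events = []
    for bit, name in enumerate(names):
        if (changed >> bit) & 1:
            state = "ACTIVATED" if (current >> bit) & 1 else "CLEARED"
            events.append((name, state))
    return events
```

Publishing only the events returned by `transitions` keeps alarm traffic proportional to how often alarms change rather than to the poll rate.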

Security Considerations

A cellular gateway is a network device with a public IP address (or at least, carrier-NATed) connected to critical industrial equipment. Security isn't optional.

Minimum Security Checklist

No inbound ports: The gateway initiates all connections outbound. Never expose SSH, HTTP, or Modbus ports on the cellular interface.
TLS for all cloud communication: Certificate pinning where possible. Mutual TLS (mTLS) for high-security deployments.
VPN or private APN: Use a carrier-provided private APN to avoid traversing the public internet entirely. This also provides static IP addressing for firewall rules.
Disable unused interfaces: If only RS-485 is used, disable the Ethernet port. If Wi-Fi is present but unused, disable it.
Secure boot and signed firmware: Prevent unauthorized firmware from being loaded onto the device.
Local Modbus isolation: The gateway's Modbus interface should only be reachable from the local network segment, never from the cellular side.

How machineCDN Deploys Gateways at Scale

machineCDN uses cellular gateways to connect industrial equipment across distributed manufacturing sites without requiring plant IT involvement. Each gateway is pre-configured with the target PLC's protocol parameters — whether Modbus TCP over Ethernet or Modbus RTU over RS-485 — and ships ready to install.

Once powered on, the gateway automatically establishes its cellular connection, begins polling the PLC, and starts streaming telemetry to machineCDN's cloud platform. Device provisioning, tag mapping, and alarm configuration are managed remotely through the platform's device management interface.

The result: a new machine goes from "unmonitored" to "live on dashboard" in under 30 minutes, with no network infrastructure changes and no IT tickets. For multi-site manufacturers, this means rolling out IIoT monitoring to 50 machines across 10 plants in weeks instead of months.

The cellular gateway is where the physical world meets the digital one. Every design decision — polling rates, batch sizes, timeout values, alarm decoding — directly impacts whether operators see reliable, real-time machine data or frustrating gaps and delays. Get the architecture right, and the gateway disappears into the background. Get it wrong, and it becomes the bottleneck that undermines the entire deployment.

Data Normalization for Industrial IoT: Handling Register Formats, Byte Ordering, and Scaling Factors Across PLCs [2026]

March 1, 2026 · 14 min read

Here's a truth every IIoT engineer discovers the hard way: the hardest part of connecting industrial equipment to the cloud isn't the networking, the security, or the cloud architecture. It's getting a raw register value of 0x4248 from a PLC and knowing whether that means 50.0°C, 16,968 PSI, or the hex representation of half a 32-bit float that needs its companion register before it means anything at all.

Data normalization — the process of transforming raw PLC register values into meaningful engineering units — is the unglamorous foundation that every reliable IIoT system is built on. Get it wrong, and your dashboards display nonsense. Get it subtly wrong, and your analytics quietly produce misleading results for months before anyone notices.

This guide covers the real-world data normalization challenges you'll face when integrating PLCs from different manufacturers, and the patterns that actually work in production.

The Fundamental Problem: Registers Don't Know What They Contain

Industrial protocols like Modbus define a simple data model: 16-bit registers. That's it. A Modbus holding register at address 40001 contains a 16-bit unsigned integer (0–65535). The protocol has no concept of:

Whether that value represents temperature, pressure, flow rate, or a status code
What engineering units it's in
Whether it needs to be scaled (divided by 10? by 100?)
Whether it's part of a multi-register value (32-bit integer, IEEE 754 float)
What byte order the multi-register value uses

This information lives in manufacturer documentation: usually a PDF that's three firmware versions behind, was written by someone who assumed you'd use their proprietary software, and references register addresses using a different numbering convention than your gateway.

Even within a single plant, you'll encounter:

Chiller controllers using input registers (function code 4, 30001+ addressing)
Temperature controllers using holding registers (function code 3, 40001+ addressing)
Older devices using coils (function code 1) for status bits
Mixed addressing conventions (some manufacturers start at 0, others at 1)

Modbus Register Types and Function Code Mapping

The first normalization challenge is mapping register addresses to the correct Modbus function code. The traditional Modbus addressing convention uses a 6-digit numbering scheme:

Address Range	Register Type	Function Code	Access
000001–065536	Coils	FC 01 (read) / FC 05 (write)	Read/Write
100001–165536	Discrete Inputs	FC 02	Read Only
300001–365536	Input Registers	FC 04	Read Only
400001–465536	Holding Registers	FC 03 (read) / FC 06/16 (write)	Read/Write

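In practice, the high-digit prefix determines the function code, and the remaining digits (after subtracting the prefix) determine the register address. Whether that remainder is sent as-is in the Modbus PDU or decremented by one first depends on the gateway's convention. The examples below send it as-is: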

Address 300201 → Function Code 4, Register Address 201
Address 400006 → Function Code 3, Register Address 6
Address 5 → Function Code 1, Coil Address 5

Common pitfall: Some device manufacturers use "register address" to mean the PDU address (0-based), while others use the traditional Modbus numbering (1-based). Register 40001 in the documentation might mean PDU address 0 or PDU address 1 depending on the manufacturer. Always verify with a Modbus scanner tool before building your configuration.
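
The prefix mapping can be sketched as follows. The function name and the `one_based` switch are hypothetical. The switch exists only to make the 0-based versus 1-based choice explicit instead of burying it in the register map.

```python
# Prefix digit -> (register type, read function code).
PREFIXES = {0: ("coil", 1), 1: ("discrete_input", 2),
            3: ("input_register", 4), 4: ("holding_register", 3)}

def parse_address(address, one_based=False):
    """Split a 6-digit-style address into (type, function code, register).

    one_based=True subtracts 1 to obtain a 0-based PDU address.
    """
    prefix, register = divmod(address, 100000)
    if prefix not in PREFIXES:
        raise ValueError(f"unknown address prefix in {address}")
    kind, function_code = PREFIXES[prefix]
    if one_based:
        register -= 1
    return kind, function_code, register
```

Whichever setting a device needs, confirm it against a value you can read on the device's local display.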

The Byte Ordering Nightmare

A 16-bit Modbus register stores two bytes. That's unambiguous — the protocol spec defines big-endian (most significant byte first) for individual registers. The problem starts when you need values larger than 16 bits.

32-Bit Integers from Two Registers

A 32-bit value requires two consecutive 16-bit registers. The question is: which register holds the high word?

Consider a 32-bit value of 0x12345678:

Word order Big-Endian (most common):

Register N:   0x1234 (high word)
Register N+1: 0x5678 (low word)
Result: (0x1234 << 16) | 0x5678 = 0x12345678 ✓

Word order Little-Endian:

Register N:   0x5678 (low word)
Register N+1: 0x1234 (high word)
Result: (Register[N+1] << 16) | Register[N] = 0x12345678 ✓

Both are common in practice. When building an edge data collection system, you need to support at least these two variants per device configuration.
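
Supporting both variants takes only a few lines. In this sketch the function names and the `word_order` values are illustrative.

```python
def combine_u32(reg_n, reg_n1, word_order="big"):
    """Combine two 16-bit registers into an unsigned 32-bit integer."""
    if word_order == "big":       # register N holds the high word
        high, low = reg_n, reg_n1
    elif word_order == "little":  # register N holds the low word
        high, low = reg_n1, reg_n
    else:
        raise ValueError("word_order must be 'big' or 'little'")
    return ((high & 0xFFFF) << 16) | (low & 0xFFFF)

def to_int32(value):
    """Reinterpret an unsigned 32-bit value as signed (two's complement)."""
    return value - 0x1_0000_0000 if value & 0x8000_0000 else value
```

Keeping the sign conversion separate lets the same combiner serve both unsigned counters and signed 32-bit values.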

IEEE 754 Floating-Point: Where It Gets Ugly

32-bit IEEE 754 floats span two Modbus registers, and the byte ordering permutations multiply. There are four real-world variants:

1. ABCD (Big-Endian / Network Order)

Register N:   0x4248  (bytes A,B)
Register N+1: 0x0000  (bytes C,D)
IEEE 754: 0x42480000 = 50.0

Used by: Most European manufacturers, Honeywell, ABB, many process instruments

2. DCBA (Little-Endian / Byte- and Word-Swapped)

Register N:   0x0000  (bytes D,C)
Register N+1: 0x4842  (bytes B,A)
IEEE 754: 0x42480000 = 50.0

Used by: Some legacy Allen-Bradley controllers, older Omron devices

3. BADC (Mid-Big-Endian / Byte-Swapped)

Register N:   0x4842  (bytes B,A)
Register N+1: 0x0000  (bytes D,C)
IEEE 754: 0x42480000 = 50.0

Used by: Schneider Electric, Daniel/Emerson flow meters, some Siemens devices

4. CDAB (Mid-Little-Endian / Word-Swapped)

Register N:   0x0000  (bytes C,D)
Register N+1: 0x4248  (bytes A,B)
IEEE 754: 0x42480000 = 50.0

Used by: Various Asian manufacturers, some OEM controllers

Here's the critical lesson: The libmodbus library (used by many edge gateways and IIoT platforms) provides a legacy modbus_get_float() function that treats the second register as the high word, which corresponds to CDAB (word-swapped) order rather than plain big-endian ABCD. If you use it on a device that transmits ABCD, you'll get garbage values that are still valid IEEE 754 floats, meaning they won't trigger obvious error conditions. A reading of 50.0°C (registers 0x4248, 0x0000) decodes to a denormal value around 2.4 × 10⁻⁴¹, and if nobody's watching closely, this goes undetected. Newer libmodbus releases also provide explicit variants such as modbus_get_float_abcd() and modbus_get_float_cdab(); prefer those so the assumed byte order is visible in the code.

Always verify byte ordering with a known test value. Read a temperature sensor that's showing 25°C on its local display, decode the registers with all four byte orderings, and see which one gives you 25.0.
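
The "try all four orderings" check can be sketched with Python's struct module. The function name is hypothetical. The `byte_order` string names which of the float's bytes A, B, C, D arrives in each wire position.

```python
import struct

def decode_float32(reg_n, reg_n1, byte_order="ABCD"):
    """Decode two 16-bit registers into a float for a given byte order."""
    order = byte_order.upper()
    if sorted(order) != list("ABCD"):
        raise ValueError("byte_order must be a permutation of ABCD")
    wire = struct.pack(">HH", reg_n, reg_n1)  # the 4 bytes in wire order
    # Rearrange the wire bytes into canonical big-endian A,B,C,D order.
    abcd = bytes(wire[order.index(letter)] for letter in "ABCD")
    return struct.unpack(">f", abcd)[0]

candidates = {order: decode_float32(0x4842, 0x0000, order)
              for order in ("ABCD", "DCBA", "BADC", "CDAB")}
```

For the register pair 0x4842, 0x0000 only the BADC entry in `candidates` equals 50.0, which is exactly the kind of comparison against a known display value that pins down a device's ordering.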

Generic Float Decoding Pattern

A robust normalization engine should accept a byte-order parameter per tag:

# Device configuration example
tags:
  - name: "Tank Temperature"
    register: 300001
    type: float32
    byte_order: ABCD        # Big-endian (verify with test read!)
    unit: "°C"
    registers_count: 2
    
  - name: "Flow Rate"
    register: 300003
    type: float32
    byte_order: BADC        # Schneider-style mid-big-endian
    unit: "L/min"
    registers_count: 2

Integer Scaling: The Hidden Conversion

Many PLCs transmit fractional values as scaled integers because integer math is faster and simpler to implement on microcontrollers. Common patterns:

Divide-by-10 Temperature

Register value: 234
Actual temperature: 23.4°C
Scale factor: 0.1

Divide-by-100 Pressure

Register value: 14696
Actual pressure: 146.96 PSI
Scale factor: 0.01

Offset + Scale

Some devices use a linear transformation: engineering_value = (raw * k1) + k2

Register value: 4000
k1 (gain): 0.025
k2 (offset): -50.0
Temperature: (4000 × 0.025) + (-50.0) = 50.0°C

This pattern is common in 4–20 mA analog input modules where the 16-bit ADC value (0–65535) maps to an engineering range:

0     = 4.00 mA  = Range minimum (e.g., 0°C)
65535 = 20.00 mA = Range maximum (e.g., 200°C)

Scale: 200.0 / 65535 = 0.00305
Offset: 0.0

For raw value 32768: 32768 × 0.00305 + 0 ≈ 100.0°C

The trap: Some devices use signed 16-bit integers (int16, range -32768 to +32767) to represent negative values (e.g., freezer temperatures). If your normalization engine treats everything as uint16, negative temperatures will appear as large positive numbers (~65,000+). Always verify whether a register is signed or unsigned.
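
The scaling formula and the signed-register fix can be combined in one helper. This is a sketch with hypothetical function names. The parameter names k1 and k2 follow the formula above.

```python
def to_int16(raw):
    """Reinterpret a 0..65535 register value as a signed 16-bit integer."""
    return raw - 0x10000 if raw & 0x8000 else raw

def normalize(raw, k1=1.0, k2=0.0, signed=False):
    """Apply engineering_value = (raw * k1) + k2 after an optional sign fix."""
    value = to_int16(raw) if signed else raw
    return value * k1 + k2
```

For example, a freezer sensor reporting raw 0xFF38 with a 0.1 scale reads about -20.0 when decoded as signed, but about 6533.6 if the register is mistakenly treated as unsigned.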

Bit Extraction from Packed Status Words

Industrial controllers frequently pack multiple boolean status values into a single register. A single 16-bit holding register might contain:

Bit 0: Compressor Running
Bit 1: High Pressure Alarm
Bit 2: Low Pressure Alarm
Bit 3: Pump Running
Bit 4: Defrost Active
Bits 5-7: Operating Mode (3-bit enum)
Bits 8-15: Error Code

To extract individual boolean values from a packed word:

value = (register_value >> shift_count) & mask

For single bits, the mask is 1:

compressor_running = (register >> 0) & 0x01
high_pressure_alarm = (register >> 1) & 0x01

For multi-bit fields:

operating_mode = (register >> 5) & 0x07  // 3-bit mask
error_code = (register >> 8) & 0xFF     // 8-bit mask

Why this matters for IIoT: Each extracted bit often needs to be published as an independent data point for alarming, trending, and analytics. A robust data pipeline defines "calculated tags" that derive from a parent register — when the parent register is read, the derived boolean tags are automatically extracted and published.

This approach is more efficient than reading each coil individually. Reading one holding register and extracting 16 bits is one Modbus transaction. Reading 16 individual coils is 16 transactions (or at best, one FC01 read for 16 coils — but many implementations don't optimize this).
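
Calculated tags can be described as a table of (name, shift, width) entries applied to each parent read. The sketch below uses the status-word layout from the example. The tag names, the function name, and the sample value 0x2A49 are illustrative.

```python
# (tag name, shift, width in bits), following the layout shown earlier.
FIELDS = [
    ("compressor_running", 0, 1),
    ("high_pressure_alarm", 1, 1),
    ("low_pressure_alarm", 2, 1),
    ("pump_running", 3, 1),
    ("defrost_active", 4, 1),
    ("operating_mode", 5, 3),
    ("error_code", 8, 8),
]

def extract_fields(register, fields=FIELDS):
    """Derive calculated tags from one parent register read."""
    return {name: (register >> shift) & ((1 << width) - 1)
            for name, shift, width in fields}

tags = extract_fields(0x2A49)
```

Building the mask from the width keeps single-bit flags and multi-bit fields on one code path.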

Contiguous Register Coalescence

When reading multiple tags from a Modbus device, transaction overhead dominates performance. Each Modbus TCP request carries:

TCP/IP overhead: ~54 bytes (headers)
Modbus MBAP header: 7 bytes
Function code + address: 5 bytes
Response overhead: Similar

For a single register read, you're spending ~120 bytes of framing to retrieve 2 bytes of data. This is wildly inefficient.

The optimization: Coalesce reads of contiguous registers into a single transaction. If you need registers 300001 through 300050, issue one Read Input Registers command for 50 registers instead of 50 individual reads.

The coalescence conditions are:

Same function code (can't mix holding and input registers)
Contiguous addresses (no gaps)
Same polling interval (don't slow down a fast-poll tag to batch it with a slow-poll tag)
Within protocol limits (Modbus allows up to 125 registers per read for FC03/FC04)

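In practice, a full 125-register response carries only 250 data bytes, well under a typical 1,500-byte Ethernet MTU, so IP fragmentation is not the limiting factor. Capping batches at ~50 registers is a conservative choice: some devices accept fewer than 125 registers per request, and a smaller batch limits how much data one failed transaction costs.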

Practical batch sizing:

Maximum safe batch: 50 registers
Typical latency per batch: 2-5 ms (Modbus TCP, local network)
Inter-request delay: ~50 ms (prevent bus saturation on Modbus RTU)

When a gap appears in the register map (e.g., you need registers 1-10 and 20-30), you have two choices:

Two separate reads: 10 registers + 11 registers = 2 transactions
One read with gap: 30 registers = 1 transaction (reading 9 registers you don't need)

For gaps of 10 registers or less, reading the gap is usually more efficient than the overhead of a second transaction. For larger gaps, split the reads.

Change Detection and Report-by-Exception

Not every data point changes every poll cycle. A temperature sensor might hold steady at 23.4°C for hours. Publishing identical values every second wastes bandwidth, storage, and processing.

Report-by-exception (RBE) compares each new reading against the last published value:

if new_value != last_published_value:
    publish(new_value)
    last_published_value = new_value

For integer types, exact comparison works. For floating-point values, use a deadband:

if abs(new_value - last_published_value) > deadband:
    publish(new_value)
    last_published_value = new_value

Important: Even with RBE, periodically force-publish all values (e.g., every hour) to ensure the IIoT platform has fresh data. Some edge cases can cause stale values:

A sensor drifts back to exactly the last published value after changing
Network outage causes missed change events
Cloud-side data expires or is purged

A well-designed data pipeline resets its "last read" state on an hourly boundary, forcing a full publish of all tags regardless of whether they've changed.
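Here is one way to combine the deadband check with a periodic forced publish. It is a sketch under stated assumptions: `ReportByException`, `force_interval_s`, and the injected `publish` and `clock` callables are hypothetical, and the forced publish is tracked per tag from its last publish time rather than on a shared hourly boundary.

```python
import time
from typing import Callable, Dict, Tuple


class ReportByException:
    """Publish a tag only when it changes beyond a deadband or grows stale."""

    def __init__(
        self,
        publish: Callable[[str, float], None],
        deadband: float = 0.0,
        force_interval_s: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._publish = publish
        self._deadband = deadband
        self._force_interval_s = force_interval_s
        self._clock = clock
        self._last: Dict[str, Tuple[float, float]] = {}  # tag -> (value, time)

    def update(self, tag: str, value: float) -> bool:
        """Return True if the value was published."""
        now = self._clock()
        last = self._last.get(tag)
        if last is not None:
            last_value, last_time = last
            changed = abs(value - last_value) > self._deadband
            expired = now - last_time >= self._force_interval_s
            if not changed and not expired:
                return False  # suppress: within deadband and still fresh
        # First reading, real change, or forced refresh.
        self._publish(tag, value)
        self._last[tag] = (value, now)
        return True
```

A deadband of 0 reduces to the exact comparison used for integer types. The first reading of every tag is always published, which also covers the "force-read after reconnect" case if the tracker is recreated.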

Multi-Protocol Device Detection

In brownfield plants, you often encounter devices that support multiple protocols. The same PLC might respond to both EtherNet/IP (Allen-Bradley AB-EIP) and Modbus TCP on port 502. Your edge gateway needs to determine which protocol the device actually speaks.

A practical detection sequence:

Try EtherNet/IP first: Attempt to read a known tag (like a device type identifier) using the CIP protocol. If successful, you know the device speaks EtherNet/IP and can use tag-based addressing.
Fall back to Modbus TCP: If EtherNet/IP fails (connection refused or timeout), try a Modbus TCP connection on port 502. Read a known device-type register to identify the equipment.
Device-specific addressing: Once the device type is identified, load the correct register map, byte ordering, and scaling configuration for that specific model.

This multi-protocol detection pattern is how platforms like machineCDN handle heterogeneous plant environments — where one production line might have Allen-Bradley Micro800 controllers communicating via EtherNet/IP, while an adjacent chiller system uses Modbus TCP, and both need to feed into the same telemetry pipeline.
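The detection sequence is easy to express as an ordered list of probes. The sketch below deliberately leaves the protocol work out: each probe is a hypothetical callable you would back with your own EtherNet/IP or Modbus TCP client, returning a device-type code on success. `detect_protocol` and the `Probe` alias are illustrative names, not part of any library.

```python
from typing import Callable, Optional, Sequence, Tuple

# A probe takes a host and returns a device-type code, or None if the
# device answered but could not be identified. Hypothetical interface.
Probe = Callable[[str], Optional[int]]


def detect_protocol(
    host: str, probes: Sequence[Tuple[str, Probe]]
) -> Optional[Tuple[str, int]]:
    """Try each probe in order; return (protocol_name, device_type) or None."""
    for name, probe in probes:
        try:
            device_type = probe(host)
        except OSError:
            # Covers connection refused, connection reset, and timeouts.
            continue
        if device_type is not None:
            return name, device_type
    return None
```

Pass the probes in priority order, for example `[("ethernet-ip", probe_eip), ("modbus-tcp", probe_modbus)]`, and use the returned device type to pick the register map, byte ordering, and scaling configuration.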

Batch Delivery and Wire Efficiency

Once data is normalized, it needs to be efficiently packaged for upstream delivery (typically via MQTT or HTTPS). Sending one MQTT message per data point is wasteful — the MQTT overhead (fixed header, topic, QoS) can exceed the payload size for simple values.

Batching pattern:

Start a collection window (e.g., 60 seconds or until batch size limit is reached)
Group normalized values by timestamp into "groups"
Each group contains all tag values read at that timestamp
When the batch timeout expires or the size limit is reached, serialize and publish the entire batch

{
  "device": "chiller-01",
  "batch": [
    {
      "timestamp": 1709292000,
      "values": [
        {"id": 1, "type": "int16", "value": 234},
        {"id": 2, "type": "float", "value": 50.125},
        {"id": 6, "type": "bool", "value": true}
      ]
    },
    {
      "timestamp": 1709292060,
      "values": [
        {"id": 1, "type": "int16", "value": 237},
        {"id": 2, "type": "float", "value": 50.250}
      ]
    }
  ]
}
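A batcher that produces payloads in this shape might look like the following. Treat it as a sketch: `Batcher`, `max_values`, and `timeout_s` are hypothetical names, and the timeout is only checked when a value is added, so a real gateway would also call `flush()` from a timer.

```python
import json
import time
from typing import Any, Callable, Dict, List, Optional


class Batcher:
    """Collect values into timestamp groups and serialize them as one batch."""

    def __init__(
        self,
        device: str,
        max_values: int = 500,
        timeout_s: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.device = device
        self.max_values = max_values
        self.timeout_s = timeout_s
        self._clock = clock
        self._groups: Dict[int, List[Dict[str, Any]]] = {}
        self._count = 0
        self._started: Optional[float] = None

    def add(self, timestamp: int, tag_id: int, type_name: str, value: Any) -> Optional[str]:
        """Add one value; return a JSON batch if a limit was reached, else None."""
        if self._started is None:
            self._started = self._clock()  # collection window opens here
        self._groups.setdefault(timestamp, []).append(
            {"id": tag_id, "type": type_name, "value": value}
        )
        self._count += 1
        window_expired = self._clock() - self._started >= self.timeout_s
        if self._count >= self.max_values or window_expired:
            return self.flush()
        return None

    def flush(self) -> Optional[str]:
        """Serialize and clear the pending batch; None if nothing is pending."""
        if not self._groups:
            return None
        payload = {
            "device": self.device,
            "batch": [{"timestamp": ts, "values": vals} for ts, vals in self._groups.items()],
        }
        self._groups, self._count, self._started = {}, 0, None
        return json.dumps(payload, separators=(",", ":"))
```

The returned string is what you would hand to your MQTT or HTTPS client as a single message body.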

For bandwidth-constrained connections (cellular, satellite), consider binary serialization instead of JSON. A binary batch format can reduce payload size by 3–5x compared to JSON, which matters when you're paying per megabyte on a cellular link.
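To make the size difference concrete, here is a sketch of a binary encoding for one timestamp group using Python's `struct` module. The layout and type codes are invented for illustration and are not a published wire format: a little-endian `uint32` timestamp, a `uint8` value count, then one `(uint16 id, uint8 type, value)` entry per tag.

```python
import json
import struct
from typing import Any, List, Tuple

# Hypothetical type codes: name -> (wire code, struct format character).
TYPE_CODES = {"bool": (0, "?"), "int16": (1, "h"), "float": (2, "f")}
CODE_TO_TYPE = {code: (name, fmt) for name, (code, fmt) in TYPE_CODES.items()}

Value = Tuple[int, str, Any]  # (tag id, type name, value)


def encode_group(timestamp: int, values: List[Value]) -> bytes:
    """Pack one timestamp group into the illustrative binary layout."""
    out = bytearray(struct.pack("<IB", timestamp, len(values)))
    for tag_id, type_name, value in values:
        code, fmt = TYPE_CODES[type_name]
        out += struct.pack("<HB" + fmt, tag_id, code, value)
    return bytes(out)


def decode_group(data: bytes) -> Tuple[int, List[Value]]:
    """Inverse of encode_group, as a cloud-side decoder would run it."""
    timestamp, count = struct.unpack_from("<IB", data, 0)
    offset = struct.calcsize("<IB")
    values: List[Value] = []
    for _ in range(count):
        tag_id, code = struct.unpack_from("<HB", data, offset)
        offset += struct.calcsize("<HB")
        name, fmt = CODE_TO_TYPE[code]
        (value,) = struct.unpack_from("<" + fmt, data, offset)
        offset += struct.calcsize("<" + fmt)
        values.append((tag_id, name, value))
    return timestamp, values
```

Under this layout, the first group from the JSON example above (three values) packs into 21 bytes. The actual saving on your link depends on tag mix, batch size, and whether the JSON is compressed, so measure with your own data before committing to a format.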

Error Handling and Resilience

Data normalization isn't just about converting values — it's about handling failures gracefully:

Communication Errors

Timeout (ETIMEDOUT): Device not responding. Could be network issue or device power failure. Set link state to DOWN, trigger reconnection logic.
Connection reset (ECONNRESET): TCP connection dropped. Close and re-establish.
Connection refused (ECONNREFUSED): Device not accepting connections. May be in commissioning mode or at connection limit.
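In Python, these three conditions surface as `OSError` subclasses, so the mapping from error to action can live in one function. The action names returned here are hypothetical labels for whatever your gateway does next, not part of any protocol library.

```python
import errno


def classify_comm_error(exc: OSError) -> str:
    """Map a socket-level error to a recovery action (hypothetical labels)."""
    if isinstance(exc, TimeoutError) or exc.errno == errno.ETIMEDOUT:
        # Device not responding: mark the link DOWN and start reconnecting.
        return "link_down_reconnect"
    if isinstance(exc, ConnectionResetError) or exc.errno == errno.ECONNRESET:
        # TCP session dropped: close the socket and open a new one.
        return "close_and_reopen"
    if isinstance(exc, ConnectionRefusedError) or exc.errno == errno.ECONNREFUSED:
        # Device is up but not accepting connections: back off and retry.
        return "retry_later"
    return "unknown"
```

Anything that falls through to `"unknown"` is worth logging with the full exception, since it usually points at a configuration problem rather than a transient link failure.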

Data Quality

Read succeeds but value is implausible: A temperature sensor reading -300°C (below absolute zero, which is -273.15°C) or 999.9°C (a common signature of a sensor wiring fault). The normalization layer should flag these with data quality indicators, not silently forward them.
Sensor stuck at same value: If a process value hasn't changed in an unusual time period (hours for a temperature, minutes for a vibration sensor), it may indicate a sensor failure rather than a stable process.
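Both checks fit in a small per-tag helper. This is a sketch with assumed names and thresholds: `QualityChecker`, the quality labels, and the range limits are illustrative, and the right `stuck_after_s` depends on the process you are measuring.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QualityChecker:
    """Flag implausible or suspiciously static readings for one tag."""

    low: float            # lowest physically plausible value
    high: float           # highest physically plausible value
    stuck_after_s: float  # how long an unchanged value is tolerated
    _last_value: Optional[float] = None
    _last_change: Optional[float] = None

    def check(self, value: float, now: float) -> str:
        # Range check first; NaN also fails this comparison.
        if not (self.low <= value <= self.high):
            return "bad_out_of_range"
        if self._last_value is None or value != self._last_value:
            self._last_value = value
            self._last_change = now
            return "good"
        if now - self._last_change >= self.stuck_after_s:
            return "uncertain_stuck"
        return "good"
```

The value is still forwarded in every case. The returned label travels with it as quality metadata so downstream consumers can decide whether to trust the reading.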

Reconnection Strategy

When communication with a device is lost:

Close the connection cleanly (flush buffers, release resources)
Wait before reconnecting (backoff to avoid hammering a failed device)
On reconnection, force-read all tags (the device state may have changed while disconnected)
Re-deliver the link state change event so downstream systems know the device was briefly offline
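The four steps can be sketched as a single recovery routine with exponential backoff. All of the callables are hypothetical hooks into your own gateway code, and the delay values are placeholders rather than recommendations.

```python
import time
from typing import Callable


def reconnect_with_backoff(
    connect: Callable[[], None],
    close: Callable[[], None],
    read_all_tags: Callable[[], None],
    publish_link_state: Callable[[bool], None],
    base_delay_s: float = 1.0,
    max_delay_s: float = 60.0,
    max_attempts: int = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Recover a lost device connection; return True once it is back."""
    publish_link_state(False)  # tell downstream systems the link is DOWN
    try:
        close()                # release the old socket and buffers
    except OSError:
        pass                   # the connection may already be gone
    delay = base_delay_s
    for _ in range(max_attempts):
        sleep(delay)           # wait before retrying a failed device
        try:
            connect()
        except OSError:
            delay = min(delay * 2, max_delay_s)  # exponential backoff, capped
            continue
        read_all_tags()        # device state may have changed while offline
        publish_link_state(True)
        return True
    return False
```

Returning `False` after `max_attempts` lets the caller decide whether to keep retrying at the capped interval or escalate. Adding random jitter to the delay is a common refinement when many gateways poll the same device.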

Practical Normalization Checklist

For every new device you integrate:

The Bigger Picture

Data normalization is where the theoretical elegance of IIoT architectures meets the messy reality of installed industrial equipment. Every plant is a museum of different vendors, different decades of technology, and different engineering conventions.

The platforms that succeed in production, machineCDN among them, are the ones that invest heavily in this normalization layer. Once the raw register pair 0x4248, 0x0000 reliably becomes 50.0°C with the correct timestamp, units, and quality metadata, everything downstream (analytics, alarming, machine learning, digital twins) actually works.

It's not glamorous work. But it's the difference between an IIoT proof-of-concept that demos well and a production system that a plant manager trusts.
