Modbus Polling Optimization: Register Grouping, Retry Logic, and Multi-Device Scheduling [2026 Guide]

Modbus is 46 years old and still the most commonly deployed industrial protocol on the planet. It runs in power plants, water treatment facilities, HVAC systems, plastics factories, and pharmaceutical clean rooms. Its simplicity is its superpower — and its trap.
Because Modbus is conceptually simple (read some registers, write some registers), engineers tend to implement polling in the most straightforward way possible: loop through tags, read each one, repeat. This works fine for 10 tags on one device. It falls apart spectacularly at 200 tags across eight devices on a congested RS-485 bus.
This guide covers the polling optimization techniques that separate hobbyist implementations from production-grade edge gateways — the kind that power platforms like machineCDN across thousands of connected machines.
The Four Function Codes That Matter
Before optimizing anything, you need to understand how Modbus maps register addresses to function codes. This mapping is the foundation of every optimization strategy.
| Address Range | Function Code | Read Type | Register Type |
|---|---|---|---|
| 0 – 65,535 | FC 01 | Read Coils | Discrete Output (1-bit) |
| 100,000 – 165,535 | FC 02 | Read Discrete Inputs | Discrete Input (1-bit) |
| 300,000 – 365,535 | FC 04 | Read Input Registers | Analog Input (16-bit) |
| 400,000 – 465,535 | FC 03 | Read Holding Registers | Analog Output (16-bit) |
The critical insight: You cannot mix function codes in a single Modbus request. A read of holding registers (FC 03) and a read of input registers (FC 04) are always two separate transactions, even if the registers are numerically adjacent when you strip the prefix.
This means your first optimization step is grouping tags by function code. A tag list with 50 holding registers and 10 input registers requires at minimum 2 requests, not 1 — no matter how clever your batching.
Address Decoding in Practice
Many Modbus implementations use the address prefix convention to encode both the register type and the function code:
- Address `404000` → Function Code 3, register 4000
- Address `304000` → Function Code 4, register 4000
- Address `4000` → Function Code 1, coil 4000
- Address `104000` → Function Code 2, discrete input 4000
The register address sent on the wire is the prefixed address minus the range base, so 404000 becomes register 4000 in the actual Modbus PDU. Getting this decoding wrong is the #1 cause of "I can read the same register in my Modbus scanner tool but not in my gateway" issues.
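The decoding rule above fits in a few lines. Here is a minimal sketch in Python; the function name `decode_address` and the exact prefix bases are illustrative, not taken from any particular library:

```python
def decode_address(prefixed):
    """Split a prefix-convention address into (function_code, pdu_register).

    Follows the convention above: 4xxxxx reads holding registers (FC 03),
    3xxxxx reads input registers (FC 04), 1xxxxx reads discrete inputs
    (FC 02), and an unprefixed address reads coils (FC 01).
    """
    for base, fc in ((400000, 3), (300000, 4), (100000, 2)):
        if prefixed >= base:
            return fc, prefixed - base  # strip the prefix to get the PDU address
    return 1, prefixed                   # no prefix: coil address as-is
```

With this, `decode_address(404000)` yields `(3, 4000)`: function code 3, wire-level register 4000.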
Contiguous Register Grouping
The single most impactful optimization in Modbus polling is contiguous register grouping — combining multiple sequential register reads into a single bulk read.
Why It Matters: The Overhead Math
Every Modbus transaction has fixed overhead:
| Component | RTU (Serial) | TCP |
|---|---|---|
| Request frame | 8 bytes | 12 bytes (MBAP header + PDU) |
| Response frame header | 5 bytes | 9 bytes |
| Turnaround delay | 3.5 char times (RTU) | ~1ms (TCP) |
| Response data | 2 × N registers | 2 × N registers |
| Inter-frame gap | 3.5 char times (RTU) | N/A |
| Total overhead per request | ~50ms at 9600 baud | ~2-5ms |
For RTU at 9600 baud, each individual register read (request + response + delays) takes roughly 50ms. Reading 50 registers individually = 2.5 seconds. Reading them as one bulk request of 50 contiguous registers = ~120ms. That's a 20x improvement.
Grouping Algorithm
The practical algorithm for contiguous grouping:
- Sort tags by function code, then by register address within each group
- Walk the sorted list and identify contiguous runs (where `addr[n+1] <= addr[n] + elem_count[n]`)
- Enforce a maximum group size — the Modbus spec allows up to 125 registers (250 bytes) per FC 03/04 read, but practical implementations should cap at 50-100 to stay within device buffer limits
- Handle gaps intelligently — if two tags are separated by 3 unused registers, it's cheaper to read the gap (3 extra registers × 2 bytes = 6 bytes) than to issue a separate request (50ms+ overhead)
Gap Tolerance: The Break-Even Point
When should you read through a gap versus splitting into two requests?
For Modbus TCP, the overhead of a separate request is ~2-5ms. Each extra register costs ~0.02ms. Break-even: ~100-250 register gap — almost always worth reading through.
For Modbus RTU at 9600 baud, the overhead is ~50ms. Each register costs ~2ms. Break-even: ~25 registers — read through anything smaller, split anything larger.
For Modbus RTU at 19200 baud, overhead drops to ~25ms, each register ~1ms. Break-even: ~25 registers — similar ratio holds.
Practical recommendation: Set your gap tolerance to 20 registers for RTU and 100 registers for TCP. You'll read a few bytes of irrelevant data but dramatically reduce transaction count.
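Putting the sorting, contiguity check, gap tolerance, and size cap together, a grouping pass can be sketched as follows. The tag and group shapes are hypothetical: each tag is assumed to carry a function code `fc`, a start register `addr`, and a register `count`:

```python
def build_groups(tags, gap_tolerance=20, max_regs=100):
    """Merge tags into bulk-read groups.

    Tags sharing a function code merge into one read when the gap between
    them is at most gap_tolerance registers and the merged span stays
    within max_regs registers (the spec allows up to 125 for FC 03/04).
    """
    groups = []
    for tag in sorted(tags, key=lambda t: (t["fc"], t["addr"])):
        end = tag["addr"] + tag["count"]
        g = groups[-1] if groups else None
        if (g and g["fc"] == tag["fc"]
                and tag["addr"] - g["end"] <= gap_tolerance
                and end - g["start"] <= max_regs):
            g["end"] = max(g["end"], end)   # extend the run, reading through gaps
            g["tags"].append(tag)
        else:
            groups.append({"fc": tag["fc"], "start": tag["addr"],
                           "end": end, "tags": [tag]})
    return groups
```

Each resulting group maps to one Modbus request of `end - start` registers; individual tag values are then sliced out of the bulk response.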
Multi-Register Data Types
Many industrial values span multiple consecutive registers:
| Data Type | Registers | Bytes | Common Use |
|---|---|---|---|
| INT16 / UINT16 | 1 | 2 | Discrete values, status codes |
| INT32 / UINT32 | 2 | 4 | Counters, accumulated values |
| FLOAT32 (IEEE 754) | 2 | 4 | Temperatures, pressures, flows |
| FLOAT64 | 4 | 8 | High-precision measurements |
A 32-bit float at register 4002 occupies registers 4002 and 4003. Your grouping algorithm must account for elem_count — reading only register 4002 gives you half a float, which decodes to a nonsensical value.
Byte Ordering Nightmares
This is where Modbus gets genuinely painful. The Modbus spec defines big-endian register ordering, but says nothing about how multi-register values should be assembled. Different manufacturers use different conventions:
| Byte Order | Register Order | Name | Who Uses It |
|---|---|---|---|
| Big-endian | High word first | AB CD | Most European PLCs, Siemens |
| Big-endian | Low word first | CD AB | Some Asian manufacturers |
| Little-endian | High word first | BA DC | Rare |
| Little-endian | Low word first | DC BA | Some legacy equipment |
A temperature reading of 42.5°C stored as IEEE 754 float `0x422A0000`:
- AB CD: Register 4002 = `0x422A`, Register 4003 = `0x0000` → ✅ 42.5
- CD AB: Register 4002 = `0x0000`, Register 4003 = `0x422A` → ✅ 42.5 (if you swap words)
- BA DC: Register 4002 = `0x2A42`, Register 4003 = `0x0000` → ❌ Garbage without a byte-swap
The only reliable approach: During commissioning, write a known value (e.g., 100.0 = 0x42C80000) to a test register and verify that your gateway decodes it correctly. Document the byte order per device — it will save you hours later.
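A per-device byte-order profile reduces to a small decode function. This sketch (the `decode_float32` name and the "ABCD"-style order labels mirror the table above; neither comes from a specific library) handles all four conventions for a FLOAT32 spread over two registers:

```python
import struct

def decode_float32(regs, order="ABCD"):
    """Assemble a FLOAT32 from two 16-bit registers under a byte-order profile.

    order labels follow the table above: first letter pair = first register.
    """
    hi, lo = regs
    if order in ("CDAB", "DCBA"):              # low word transmitted first
        hi, lo = lo, hi
    raw = struct.pack(">HH", hi, lo)           # registers are big-endian words
    if order in ("BADC", "DCBA"):              # bytes swapped within each word
        raw = bytes([raw[1], raw[0], raw[3], raw[2]])
    return struct.unpack(">f", raw)[0]
```

During commissioning, run the known-value test against all four profiles and record whichever one round-trips correctly for that device.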
Production-grade platforms like machineCDN handle byte ordering at the device configuration level, so each connected machine can have its own byte-order profile without requiring custom parsing logic.
Intelligent Retry Logic
Network errors happen. Serial bus collisions happen. PLCs get busy and respond late. Your retry strategy determines whether a transient error becomes a data gap or a transparent recovery.
The Naive Approach (Don't Do This)
for each tag:
    result = read_register(tag)
    if failed:
        retry 3 times immediately
Problems:
- Hammers a struggling device with back-to-back requests
- Blocks all other reads while retrying
- Doesn't distinguish between transient errors (timeout) and permanent errors (wrong address)
A Better Approach: Error-Classified Retry
Different errors deserve different responses:
| Error | Severity | Action |
|---|---|---|
| Timeout (ETIMEDOUT) | Transient | Retry with backoff, reconnect if persistent |
| Connection reset (ECONNRESET) | Connection | Close connection, reconnect, resume |
| Connection refused (ECONNREFUSED) | Infrastructure | Back off significantly (device may be rebooting) |
| Broken pipe (EPIPE) | Connection | Reconnect immediately |
| Bad file descriptor (EBADF) | Internal | Recreate context from scratch |
| Illegal data address (Modbus exception 02) | Permanent | Don't retry — tag is misconfigured |
| Device busy (Modbus exception 06) | Transient | Retry after delay |
Retry Count and Timing
For batch reads (contiguous groups), a reasonable strategy:
- First attempt: Read the full register group
- On failure: Wait 50ms (RTU) or 10ms (TCP), retry the same group
- Second failure: Wait 100ms, retry once more
- Third failure: Log the error, mark the group as failed for this cycle, move to next group
- After a connection-level error (timeout, reset, refused):
- Close the Modbus context
- Set device state to "disconnected"
- On next poll cycle, attempt reconnection
- If reconnection succeeds, flush any stale data in the serial buffer before resuming reads
Critical detail for RTU: After a timeout or error, always flush the serial buffer before retrying. Stale bytes from a partial response can corrupt the next transaction's framing, causing a cascade of CRC errors.
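The classified-retry strategy above can be sketched as a small wrapper around a group read. The `ModbusException` type carrying the exception code is a hypothetical stand-in for whatever your Modbus library raises; the backoff delays follow the 50ms/100ms schedule described above:

```python
import time

class ModbusException(Exception):
    """Hypothetical exception carrying a Modbus exception code."""
    def __init__(self, code):
        super().__init__(f"Modbus exception {code:02d}")
        self.code = code

PERMANENT_CODES = {2}  # exception 02 (illegal data address): tag is misconfigured

def read_group_with_retry(read_fn, group, delays=(0.05, 0.10)):
    """Attempt a bulk read, retrying transient errors with backoff.

    Returns the response data, or None if the group failed this cycle.
    read_fn(group) returns data or raises OSError / ModbusException.
    """
    for i in range(1 + len(delays)):
        try:
            return read_fn(group)
        except ModbusException as exc:
            if exc.code in PERMANENT_CODES:
                return None              # retrying a misconfigured tag is pointless
            if i < len(delays):
                time.sleep(delays[i])    # transient (e.g. device busy): back off
        except OSError:
            if i < len(delays):
                time.sleep(delays[i])    # timeout/reset: caller handles reconnect
    return None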
Inter-Request Delay
Modbus RTU requires a 3.5 character-time silence between frames. At 9600 baud, this is approximately 4ms. At 19200 baud, it's 2ms.
Many implementations add a fixed 50ms delay between requests as a safety margin. This works but is wasteful — on a 100-tag system, you're spending 5 seconds just on inter-request delays.
Better approach: Use a 5ms delay at 9600 baud and a 2ms delay at 19200 baud. Monitor CRC error rates — if they increase, lengthen the delay. Some older devices need more silence time than the spec requires.
Multi-Device Polling Scheduling
When your edge gateway talks to multiple Modbus devices (common in manufacturing — one PLC per machine line, plus temperature controllers, VFDs, and meters), polling strategy becomes a scheduling problem.
Round-Robin: Simple but Wasteful
The naive approach:
while true:
    for each device:
        for each tag_group in device:
            read(tag_group)
    sleep(poll_interval)
Problem: If you have 8 devices with different priorities, the critical machine's data is delayed by reads to 7 other devices.
Priority-Based with Interval Tiers
A better model uses per-tag read intervals:
| Tier | Interval | Typical Tags | Purpose |
|---|---|---|---|
| Critical | 1-5 sec | Machine running/stopped, alarm bits, emergency states | Immediate operational awareness |
| Process | 30-60 sec | Temperatures, pressures, RPMs, flow rates, power consumption | Trend analysis, anomaly detection |
| Diagnostic | 5-60 min | Firmware version, serial numbers, configuration values, cumulative counters | Asset management |
Implementation: Maintain a last_read_timestamp per tag (or per tag group). On each poll loop, only read groups where now - last_read > interval.
This dramatically reduces bus traffic. In a typical plastics manufacturing scenario with 8 machines:
- Without tiers: 400 register reads every 5 seconds = 80 reads/sec
- With tiers: 80 critical + 40 process + 2 diagnostic = ~14 reads/sec average
That's a 5.7x reduction in bus utilization.
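The per-group due-check is a few lines. A minimal sketch, assuming each group is a dict with an `interval` in seconds and a mutable `last_read` timestamp (field names are illustrative):

```python
def poll_cycle(groups, read_fn, now):
    """One pass of a tiered scheduler.

    Reads only the groups whose interval has elapsed, then stamps them
    so each tier keeps its own cadence on subsequent passes.
    """
    for g in groups:
        if now - g["last_read"] >= g["interval"]:
            read_fn(g)
            g["last_read"] = now
```

Called in a tight loop with `now = time.time()`, critical groups fire every pass while diagnostic groups fire only when their much longer interval expires.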
Change-Based Transmission
For data going to the cloud, there's another optimization layer: compare-on-change. Many industrial values don't change between reads — a setpoint stays at 350°F for hours, a machine status stays "running" for the entire shift.
The strategy:
- Read the register at its configured interval (always — you need to know the current value)
- Compare the new value against the last transmitted value
- Only transmit to the cloud if:
- The value has changed, OR
- A maximum time-without-update has elapsed (heartbeat)
For boolean tags (machine running, alarm active), compare every read and transmit immediately on change — these are the signals that matter most for operational response.
For analog tags (temperature, pressure), you can add a deadband: only transmit if the value has changed by more than X% or Y absolute units. A temperature reading that bounces between 349.8°F and 350.2°F doesn't need to generate 60 cloud messages per hour.
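The change/deadband/heartbeat decision can be sketched as a single predicate. The per-tag state dict and its field names here are hypothetical; note it compares against the last *transmitted* value, as described above, so slow drift still eventually crosses the deadband:

```python
def should_transmit(state, value, now):
    """Decide whether a freshly read value goes to the cloud.

    state: per-tag dict with 'deadband' (absolute units, 0 = exact compare)
    and 'heartbeat_s', plus mutable 'last_value'/'last_sent' fields.
    """
    last = state.get("last_value")
    stale = now - state.get("last_sent", float("-inf")) >= state["heartbeat_s"]
    if last is None:
        send = True                          # first read always transmits
    elif stale:
        send = True                          # heartbeat: prove the tag is alive
    elif state["deadband"]:
        send = abs(value - last) > state["deadband"]
    else:
        send = value != last                 # booleans/ints: exact compare
    if send:
        state["last_value"] = value
        state["last_sent"] = now
    return send
```

A boolean tag gets `deadband = 0` (transmit on any change); an analog tag gets a deadband sized to its noise floor, e.g. 0.5°F for the temperature example above.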
machineCDN's edge agent implements this compare-and-transmit pattern natively, batching changed values into optimized payloads that minimize both bandwidth and cloud ingestion costs.
RTU vs TCP: Polling Strategy Differences
Modbus RTU (Serial RS-485)
- Half-duplex: Only one device can transmit at a time
- Single bus: All devices share the same wire pair
- Addressing: Slave address 1-247 (broadcast at 0)
- Speed: Typically 9600-115200 baud
- Critical constraint: Bus contention — you MUST wait for a complete response (or timeout) before addressing another device
- CRC: 16-bit CRC appended to every frame
RTU polling tips:
- Set timeouts based on maximum expected response size. For 125 registers at 9600 baud: `(125 * 2 bytes * 10 bits/byte) / 9600 = ~260ms`, plus overhead ≈ 300-500ms timeout
- Never set the timeout below the theoretical transmission time — you'll get phantom timeouts
- If one device on the bus goes unresponsive, its timeout blocks ALL other devices. Aggressive timeout + retry is better than patient timeout + no retry.
Modbus TCP
- Full-duplex: Request and response can overlap (on different connections)
- Multi-connection: Each device gets its own TCP socket
- No contention: Parallel reads to different devices
- Speed: 100Mbps+ network bandwidth (practically unlimited for Modbus payloads)
- Transaction ID: The MBAP header includes a transaction ID for matching responses to requests
TCP polling tips:
- Use persistent connections — TCP handshake + Modbus connection setup adds 10-50ms per connection. Reconnect only on error.
- You CAN poll multiple TCP devices simultaneously using non-blocking sockets or threads. This is a massive advantage over RTU.
- Set TCP keepalive on the socket — some industrial firewalls and managed switches close idle connections after 60 seconds.
- The Modbus/TCP unit identifier field is usually ignored (set to 0xFF) for direct device connections, but matters if you're going through a TCP-to-RTU gateway.
Bandwidth Optimization: Binary vs JSON Batching
Once you've read your registers, the data needs to get to the cloud. The payload format matters enormously at scale.
JSON Format (Human-Readable)
{
  "groups": [{
    "ts": 1709330400,
    "device_type": 5000,
    "serial_number": 1106336053,
    "values": [
      {"id": 1, "v": 4.4, "st": 0},
      {"id": 2, "v": 162.5, "st": 0},
      {"id": 3, "v": 158.3, "st": 0}
    ]
  }]
}
For 3 values: ~200 bytes.
Binary Format (Machine-Optimized)
A well-designed binary format encodes the same data in:
- 4 bytes: timestamp (uint32)
- 2 bytes: device type (uint16)
- 4 bytes: serial number (uint32)
- Per value: 2 bytes (tag ID) + 1 byte (status) + 4 bytes (value) = 7 bytes
For 3 values: 31 bytes — an 84% reduction.
Over cellular connections (common for remote industrial sites), this difference is enormous. A machine reporting 50 values every 60 seconds:
- JSON: ~3.3 KB/min → 4.8 MB/day → 144 MB/month
- Binary: ~0.5 KB/min → 0.7 MB/day → 21 MB/month
That's the difference between a $10/month cellular plan and a $50/month plan — multiplied by every connected machine.
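The binary layout above maps directly onto `struct` packing. A sketch (the `encode_batch` name is illustrative; the field layout is exactly the one listed above, all big-endian):

```python
import struct

def encode_batch(ts, device_type, serial, values):
    """Encode one batch: uint32 timestamp, uint16 device type, uint32 serial,
    then 7 bytes per value (uint16 tag id, uint8 status, float32 value)."""
    out = struct.pack(">IHI", ts, device_type, serial)   # 10-byte header
    for tag_id, status, value in values:
        out += struct.pack(">HBf", tag_id, status, value)  # 7 bytes each
    return out
```

Encoding the three-value example from the JSON section yields exactly 10 + 3 × 7 = 31 bytes, matching the 84% reduction claimed above.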
The batching approach also matters. Instead of transmitting each read result immediately, accumulate values into a batch and transmit when either:
- The batch reaches a size threshold (e.g., 4KB)
- A time threshold expires (e.g., 30 seconds since batch started)
This amortizes the MQTT/HTTP overhead across many data points and enables efficient compression.
Store-and-Forward: Surviving Connectivity Gaps
Industrial environments have unreliable connectivity — cellular modems reboot, VPN tunnels flap, WiFi access points go down during shift changes. Your polling shouldn't stop when the cloud connection drops.
A robust edge gateway implements a local buffer:
- Poll and batch as normal, regardless of cloud connectivity
- When connected: Transmit batches immediately
- When disconnected: Store batches in a ring buffer (sized to the available memory)
- When reconnected: Drain the buffer in chronological order before transmitting live data
Buffer Sizing
The buffer should be sized to survive your typical outage duration:
| Data Rate | 1-Hour Buffer | 8-Hour Buffer | 24-Hour Buffer |
|---|---|---|---|
| 1 KB/min | 60 KB | 480 KB | 1.4 MB |
| 10 KB/min | 600 KB | 4.8 MB | 14.4 MB |
| 100 KB/min | 6 MB | 48 MB | 144 MB |
For embedded gateways with 256MB RAM, a 2MB ring buffer comfortably handles 8-24 hours of typical industrial data at modest polling rates. The key design decision is what happens when the buffer fills: either stop accepting new data (gap in the oldest data) or overwrite the oldest data (gap in the newest data). For most IIoT use cases, overwriting oldest is preferred — recent data is more actionable than historical data.
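The overwrite-oldest policy falls out naturally from a bounded deque. A minimal sketch (class and method names are illustrative; a production buffer would persist pages to flash rather than hold them in RAM):

```python
from collections import deque

class StoreAndForward:
    """Ring buffer of encoded batches that overwrites the OLDEST entry
    when full, so the freshest data survives a long outage."""

    def __init__(self, max_batches):
        self.buf = deque(maxlen=max_batches)  # deque drops oldest on overflow

    def push(self, batch):
        self.buf.append(batch)

    def drain(self):
        """Yield buffered batches oldest-first once the uplink returns."""
        while self.buf:
            yield self.buf.popleft()
```

On reconnect, the publisher drains this buffer in chronological order before switching back to live data, preserving time-series ordering in the cloud.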
Putting It All Together: A Production Polling Architecture
Here's what a production-grade Modbus polling pipeline looks like:
┌─────────────────────────────────────────────────────┐
│                   POLL SCHEDULER                    │
│  Per-tag intervals → Priority queue → Due tags      │
└──────────────┬──────────────────────────────────────┘
               │
               ▼
┌─────────────────────────────────────────────────────┐
│                  REGISTER GROUPER                   │
│  Sort by FC → Find contiguous runs → Build groups   │
└──────────────┬──────────────────────────────────────┘
               │
               ▼
┌─────────────────────────────────────────────────────┐
│                    MODBUS READER                    │
│  Execute reads → Retry on error → Reconnect logic   │
└──────────────┬──────────────────────────────────────┘
               │
               ▼
┌─────────────────────────────────────────────────────┐
│                   VALUE PROCESSOR                   │
│  Byte-swap → Type conversion → Scaling → Compare    │
└──────────────┬──────────────────────────────────────┘
               │
               ▼
┌─────────────────────────────────────────────────────┐
│                    BATCH ENCODER                    │
│  Group by timestamp → Encode (JSON/binary) → Size   │
└──────────────┬──────────────────────────────────────┘
               │
               ▼
┌─────────────────────────────────────────────────────┐
│              STORE-AND-FORWARD BUFFER               │
│  Ring buffer → Page management → Drain on connect   │
└──────────────┬──────────────────────────────────────┘
               │
               ▼
┌─────────────────────────────────────────────────────┐
│                   MQTT PUBLISHER                    │
│  QoS 1 → Async delivery → ACK tracking              │
└─────────────────────────────────────────────────────┘
This is the architecture that machineCDN's edge agent implements. Each layer is independently testable, and failures at any layer don't crash the pipeline — they produce graceful degradation (data gaps in the cloud, not system crashes on the factory floor).
Benchmarks: Before and After Optimization
Real numbers from a plastics manufacturing deployment with 8 Modbus RTU devices on a 19200 baud RS-485 bus, 320 total tags:
| Metric | Unoptimized | Optimized | Improvement |
|---|---|---|---|
| Full poll cycle time | 48 sec | 6.2 sec | 7.7x faster |
| Requests per cycle | 320 | 42 | 87% fewer |
| Bus utilization | 94% | 12% | Room for growth |
| Data gaps per day | 15-20 | 0-1 | Near-zero |
| Cloud bandwidth (daily) | 180 MB | 28 MB | 84% reduction |
| Avg tag staleness | 48 sec | 6 sec | 8x fresher |
The unoptimized system couldn't even complete a poll cycle in under a minute. The optimized system polls every 6 seconds with headroom to add more devices.
Final Recommendations
- Always group contiguous registers — this is non-negotiable for production systems
- Use tiered polling intervals — not every tag needs the same update rate
- Implement error-classified retry — don't retry permanent errors, do retry transient ones
- Use binary encoding for cellular — JSON is fine for LAN-connected gateways
- Size your store-and-forward buffer for your realistic outage window
- Flush the serial buffer after errors — this prevents CRC cascades on RTU
- Document byte ordering per device — test with known values during commissioning
- Monitor bus utilization — stay below 30% to leave headroom for retries and growth
Modbus isn't going away. But the difference between a naive implementation and an optimized one is the difference between a system that barely works and one that scales to hundreds of machines without breaking a sweat.
machineCDN's edge agent handles Modbus RTU and TCP with optimized register grouping, binary batching, and store-and-forward buffering out of the box. Connect your PLCs in minutes, not weeks. Get started →