3 posts tagged with "networking"

OPC-UA Pub/Sub Over TSN: Building Deterministic Industrial Networks [2026 Guide]

March 4, 2026 · 12 min read

OPC-UA Pub/Sub over TSN architecture

The traditional OPC-UA client/server model has served manufacturing well for decades of SCADA modernization. But as factories push toward converged IT/OT networks — where machine telemetry, MES transactions, and enterprise ERP traffic share the same Ethernet fabric — the client/server polling model starts to buckle under latency requirements that demand microsecond-level determinism.

OPC-UA Pub/Sub over TSN solves this by decoupling data producers from consumers entirely, while TSN's IEEE 802.1 extensions guarantee bounded latency delivery. This guide breaks down how these technologies work together, the pitfalls of real-world deployment, and the configuration patterns that actually work on production floors.

Why Client/Server Breaks Down at Scale

In a typical OPC-UA client/server deployment, every consumer opens a session to every producer. A plant with 50 machines and 10 data consumers (HMIs, historians, analytics engines, edge gateways) generates 500 active sessions. Each session carries its own subscription, and the server must serialize, authenticate, and deliver data to each client independently.

The math gets brutal quickly:

50 machines × 200 tags each = 10,000 data points
10 consumers polling at 1-second intervals = 100,000 read operations per second
Session overhead: ~2KB per subscription keepalive × 500 sessions = 1MB/s baseline traffic (assuming roughly one keepalive per session per second) before any actual data moves
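
The arithmetic above can be reproduced in a few lines. This is a minimal sketch using the figures from the text; the one-keepalive-per-second rate is an assumption needed to turn "2KB per keepalive" into a per-second number.

```python
# Back-of-the-envelope model of client/server load, using the figures from the text.
# Assumption: each session sends one ~2 KB keepalive per second.
machines = 50
consumers = 10
tags_per_machine = 200
poll_interval_s = 1.0
keepalive_bytes = 2_000      # ~2 KB, decimal
keepalives_per_s = 1.0       # assumed rate, not stated in the text

sessions = machines * consumers
data_points = machines * tags_per_machine
reads_per_s = data_points * consumers / poll_interval_s
keepalive_bytes_per_s = sessions * keepalive_bytes * keepalives_per_s

print(f"sessions:          {sessions}")
print(f"data points:       {data_points}")
print(f"reads per second:  {reads_per_s:,.0f}")
print(f"keepalive traffic: {keepalive_bytes_per_s / 1e6:.1f} MB/s")
```

Changing `consumers` or `tags_per_machine` shows how quickly the session count and read rate grow with plant size.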

In practice, most OPC-UA servers in PLCs hit their connection ceiling around 15-20 simultaneous sessions. Allen-Bradley Micro800 series and Siemens S7-1200 controllers — the workhorses of mid-market automation — will start rejecting connections well before you've connected all your consumers.

Pub/Sub eliminates the N×M session problem by introducing a one-to-many data distribution model where publishers push data to the network without knowing (or caring) who's consuming it.

The Pub/Sub Architecture: How Data Actually Flows

OPC-UA Pub/Sub introduces three key concepts that don't exist in the client/server model:

Publishers and DataSets

A publisher is any device that produces data — typically a PLC, edge gateway, or sensor hub. Instead of waiting for client requests, publishers periodically assemble DataSets — structured collections of tag values with metadata — and push them to the network.

A DataSet maps directly to the OPC-UA information model. If your PLC exposes temperature, pressure, and flow rate variables in an ObjectType node, the corresponding DataSet contains those three fields with their current values, timestamps, and quality codes.

The publisher configuration defines:

Which variables to include in each DataSet
Publishing interval (how often to push updates, typically 10ms-10s)
Transport protocol (UDP multicast for TSN, MQTT for cloud-bound data, AMQP for enterprise messaging)
Encoding format (UADP binary for low-latency, JSON for interoperability)

Subscribers and DataSetReaders

Subscribers declare interest in specific DataSets by configuring DataSetReaders that filter incoming network messages. A subscriber doesn't connect to a publisher — it listens on a multicast group or MQTT topic and selectively processes messages that match its reader configuration.

This is the critical architectural shift: publishers and subscribers are completely decoupled. A publisher doesn't know how many subscribers exist. A subscriber can receive data from multiple publishers without establishing any sessions.

WriterGroups and NetworkMessages

Between individual DataSets and the wire, Pub/Sub introduces WriterGroups — logical containers that batch multiple DataSets into a single NetworkMessage for efficient transport. A single NetworkMessage might contain DataSets from four temperature sensors, two pressure transducers, and a motor current monitor — all packed into one UDP frame.

This batching is crucial for TSN. Each WriterGroup maps to a TSN traffic class, and each traffic class gets its own guaranteed bandwidth reservation. By grouping DataSets with similar latency requirements into the same WriterGroup, you minimize the number of TSN stream reservations needed.

TSN: The Network Layer That Makes It Deterministic

Standard Ethernet is "best effort" — frames compete for bandwidth with no delivery guarantees. TSN (IEEE 802.1) adds four capabilities that transform Ethernet into a deterministic transport:

Time Synchronization (IEEE 802.1AS-2020)

Every device on a TSN network synchronizes to a grandmaster clock with sub-microsecond accuracy. This is non-negotiable — without a shared time reference, scheduled transmission is meaningless.

In practice, configure your TSN switches as boundary clocks and your edge gateways as slave clocks. The synchronization protocol (gPTP) runs automatically, but you need to verify accuracy after deployment:

# Check gPTP synchronization status on a Linux-based edge gateway
pmc -u -b 0 'GET CURRENT_DATA_SET'
# Look for: offsetFromMaster < 1000ns (1μs)

If your offset exceeds 1μs consistently, check for asymmetric link delay (transmit and receive paths with different latency), switch hop count (keep it under 7), and whether any non-TSN switches are breaking the timing chain.
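
For unattended monitoring, the same check can be scripted. The sketch below parses the text that `pmc` prints and compares the offset against the 1μs limit. The sample output is illustrative and the exact layout on a given system may differ; the function names are hypothetical.

```python
import re

# Illustrative pmc output; treat the exact layout as an assumption.
SAMPLE_OUTPUT = """\
sending: GET CURRENT_DATA_SET
\t001122.fffe.334455-0 seq 0 RESPONSE MANAGEMENT CURRENT_DATA_SET
\t\tstepsRemoved     2
\t\toffsetFromMaster -187.0
\t\tmeanPathDelay    412.0
"""

def offset_from_master_ns(pmc_output: str) -> float:
    """Extract offsetFromMaster (nanoseconds) from pmc text output."""
    match = re.search(r"offsetFromMaster\s+(-?\d+(?:\.\d+)?)", pmc_output)
    if match is None:
        raise ValueError("offsetFromMaster not found in pmc output")
    return float(match.group(1))

def is_synchronized(pmc_output: str, limit_ns: float = 1000.0) -> bool:
    """True when the absolute offset is below the limit (1 us by default)."""
    return abs(offset_from_master_ns(pmc_output)) < limit_ns

print(offset_from_master_ns(SAMPLE_OUTPUT), is_synchronized(SAMPLE_OUTPUT))
```

On a gateway, the input would come from running the `pmc` command shown above (for example via `subprocess.run(..., capture_output=True, text=True).stdout`) rather than from a canned string.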

Scheduled Traffic (IEEE 802.1Qbv)

This is the heart of TSN for industrial use. 802.1Qbv implements time-aware shaping — the switch opens and closes transmission "gates" on a strict schedule. During a gate's open window, only frames from that traffic class can transmit. During the closed window, frames are queued.

A typical gate schedule for a manufacturing cell:

Time Slot	Duration	Traffic Class	Content
0-250μs	250μs	TC7 (Scheduled)	Motion control data (servo positions)
250-750μs	500μs	TC6 (Scheduled)	Process data (temperatures, pressures)
750-5000μs	4250μs	TC0-5 (Best Effort)	IT traffic, diagnostics, file transfers

The cycle repeats every 5ms (200Hz), giving motion control data a guaranteed 250μs window every cycle — regardless of how much IT traffic is on the network.
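
A gate schedule like this is easy to model and sanity-check in software before it goes anywhere near a switch. The sketch below encodes the three windows from the table, verifies they fill the 5ms cycle without gaps, and answers "which gate is open at time t?". The class and function names are hypothetical; real switches are configured through vendor tools or a CNC, not through this code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GateWindow:
    start_us: int
    duration_us: int
    traffic_classes: tuple
    label: str

CYCLE_US = 5000
SCHEDULE = [
    GateWindow(0, 250, (7,), "motion control"),
    GateWindow(250, 500, (6,), "process data"),
    GateWindow(750, 4250, (0, 1, 2, 3, 4, 5), "best effort"),
]

def validate(schedule, cycle_us):
    """Check that windows are contiguous and exactly fill the cycle."""
    cursor = 0
    for window in schedule:
        if window.start_us != cursor:
            raise ValueError(f"gap or overlap before {window.label}")
        cursor += window.duration_us
    if cursor != cycle_us:
        raise ValueError("windows do not fill the cycle")

def open_window(schedule, cycle_us, t_us):
    """Return the window whose gate is open at absolute time t_us."""
    offset = t_us % cycle_us
    for window in schedule:
        if window.start_us <= offset < window.start_us + window.duration_us:
            return window
    raise ValueError("no gate open at this offset")

validate(SCHEDULE, CYCLE_US)
print(open_window(SCHEDULE, CYCLE_US, 5100).label)  # offset 100 us into the second cycle
```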

Stream Reservation (IEEE 802.1Qcc)

Before a publisher starts transmitting, it reserves bandwidth end-to-end through every switch in the path. The reservation specifies maximum frame size, transmission interval, and latency requirement. Switches that can't honor the reservation reject it — you find out at configuration time, not at 2 AM when the line goes down.

Frame Preemption (IEEE 802.1Qbu)

When a high-priority frame needs to transmit but a low-priority frame is already in flight, preemption splits the low-priority frame, transmits the high-priority data, then resumes the interrupted frame. This reduces worst-case blocking from one maximum-frame-time (12μs at 1Gbps for a 1500-byte frame) to roughly the time needed to finish the current fragment, on the order of 1μs at 1Gbps.

Mapping OPC-UA Pub/Sub to TSN Traffic Classes

Here's where theory meets configuration. Each WriterGroup needs a TSN traffic class assignment based on its latency and jitter requirements:

Motion Control Data (TC7, under 1ms cycle)

Servo positions, encoder feedback, torque commands
Publishing interval: 1-4ms
UADP encoding (binary, no JSON overhead)
Fixed DataSet layout (no dynamic fields — the subscriber knows the structure at compile time)
Configuration tip: Set MaxNetworkMessageSize to fit within one Ethernet frame (1472 bytes for UDP). Fragmentation kills determinism.

Process Data (TC6, 10-100ms cycle)

Temperatures, pressures, flow rates, OEE counters
Publishing interval: 10-1000ms
UADP encoding for edge-to-edge, JSON for cloud-bound paths
Variable DataSet layout acceptable (metadata included in messages)

Diagnostic and Configuration (TC0-5, best effort)

Alarm states, configuration changes, firmware updates
No strict timing requirement
JSON encoding fine — human-readable diagnostics matter more than microseconds

Practical Configuration Example

For a plastics injection molding cell with 6 machines, each reporting 30 process variables at 100ms intervals:

# OPC-UA Pub/Sub Publisher Configuration (conceptual)
publisher:
  transport: udp-multicast
  multicast_group: 239.0.1.10
  port: 4840

writer_groups:
  - name: "ProcessData_Cell_A"
    publishing_interval_ms: 100
    tsn_traffic_class: 6
    max_message_size: 1472
    encoding: UADP
    datasets:
      - name: "IMM_01_Process"
        variables:
          - barrel_zone1_temp    # int16, °C × 10
          - barrel_zone2_temp    # int16, °C × 10
          - barrel_zone3_temp    # int16, °C × 10
          - mold_clamp_pressure  # float32, bar
          - injection_pressure   # float32, bar
          - cycle_time_ms        # uint32
          - shot_count           # uint32

  - name: "Alarms_Cell_A"
    publishing_interval_ms: 0  # event-driven
    tsn_traffic_class: 5
    encoding: UADP
    key_frame_count: 1  # every message is a key frame
    datasets:
      - name: "IMM_01_Alarms"
        variables:
          - alarm_word_1  # uint16, bitfield
          - alarm_word_2  # uint16, bitfield
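
To get a feel for how small a fixed-layout process DataSet is, the sketch below packs the seven `IMM_01_Process` variables using the types given in the comments above. It is not a UADP encoder: NetworkMessage and DataSetMessage headers are omitted, so real messages are somewhat larger. The function name and sample values are hypothetical.

```python
import struct

# Field order and types follow the comments in the configuration above.
# "<" = little-endian, no padding; h = int16, f = float32, I = uint32.
IMM_01_LAYOUT = struct.Struct("<hhhffII")
MAX_UDP_PAYLOAD = 1500 - 20 - 8  # Ethernet MTU minus IPv4 and UDP headers

def pack_imm_01(zone1_c, zone2_c, zone3_c, clamp_bar, inj_bar, cycle_ms, shots):
    """Pack one sample; temperatures are scaled to int16 as degrees C x 10."""
    return IMM_01_LAYOUT.pack(
        round(zone1_c * 10), round(zone2_c * 10), round(zone3_c * 10),
        clamp_bar, inj_bar, cycle_ms, shots,
    )

payload = pack_imm_01(215.5, 220.0, 225.0, 140.0, 95.5, 18_450, 1_203_884)
print(len(payload), "bytes of field data per DataSet")
print(MAX_UDP_PAYLOAD // len(payload), "such payloads fit in one datagram (headers ignored)")
```

Even with headers added, several machines' worth of process data fits comfortably under the 1472-byte limit, which is why batching DataSets into one WriterGroup is practical.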

The Data Encoding Decision: UADP vs JSON

OPC-UA Pub/Sub supports two wire formats, and choosing wrong will cost you either bandwidth or interoperability.

UADP (UA Datagram Protocol)

Binary encoding, tightly packed
A 30-variable DataSet encodes to ~200 bytes
Supports delta frames — after an initial key frame sends all values, subsequent frames only include changed values
Requires subscribers to know the DataSet layout in advance (discovered via OPC-UA client/server or configured statically)
Use for: Edge-to-edge communication, TSN paths, anything latency-sensitive

JSON Encoding

Human-readable, self-describing
The same 30-variable DataSet expands to ~2KB
Every message carries field names and type information
No prior configuration needed — subscribers can parse dynamically
Use for: Cloud-bound telemetry, debugging, integration with IT systems

The Hybrid Pattern That Works

In practice, most deployments run UADP on the factory-floor TSN network and JSON on the cloud-bound MQTT path. The edge gateway — the device sitting between the OT and IT networks — performs the translation:

Subscribe to UADP multicast on the TSN interface
Decode DataSets using pre-configured metadata
Re-publish as JSON over MQTT to the cloud broker
Add store-and-forward buffering for cloud connectivity gaps

This is exactly the pattern that platforms like machineCDN implement — the edge gateway handles protocol translation transparently so that neither the PLCs nor the cloud backend need to understand each other's wire format.
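
The decode-and-republish step can be sketched as follows. This is an illustration of the idea, not any vendor's implementation: the binary layout is a plain fixed struct rather than real UADP, the field metadata table is hypothetical, and the MQTT publish and store-and-forward stages are left out.

```python
import json
import struct

# Hypothetical pre-configured metadata: field name, struct type code, scale factor.
FIELDS = [
    ("barrel_zone1_temp", "h", 0.1),   # int16, degrees C x 10
    ("mold_clamp_pressure", "f", 1.0), # float32, bar
    ("shot_count", "I", 1),            # uint32
]
LAYOUT = struct.Struct("<" + "".join(code for _, code, _ in FIELDS))

def decode_to_json(raw: bytes, source: str) -> str:
    """Decode a fixed-layout binary payload into self-describing JSON."""
    values = LAYOUT.unpack(raw)
    body = {name: value * scale for (name, _, scale), value in zip(FIELDS, values)}
    return json.dumps({"source": source, "fields": body})

raw = LAYOUT.pack(2155, 140.0, 42)
print(len(raw), "bytes binary")
print(len(decode_to_json(raw, "IMM_01")), "bytes JSON")
```

The size difference between the two printed lengths is the bandwidth-versus-readability trade-off described above: the binary form relies on the subscriber already knowing `FIELDS`, while the JSON form carries the names with every message.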

Security Considerations for Pub/Sub Over TSN

The multicast nature of Pub/Sub changes the security model fundamentally. In client/server OPC-UA, each session is authenticated and encrypted end-to-end with X.509 certificates. In Pub/Sub, there's no session — data flows to anyone on the multicast group.

SecurityMode Options

OPC-UA Pub/Sub defines three security modes per WriterGroup:

None: no encryption, no signing. Acceptable only on physically isolated networks with no IT connectivity.
Sign: messages are signed with a symmetric group key obtained from the Security Key Server. Subscribers holding that key can verify integrity and group membership, but the data is readable by anyone on the network.
SignAndEncrypt: messages are both signed and encrypted. Requires key distribution to all authorized subscribers.

Key Distribution: The Hard Problem

Unlike client/server where keys are exchanged during session establishment, Pub/Sub needs a Security Key Server (SKS) that distributes symmetric keys to publishers and subscribers. The SKS rotates keys periodically (recommended: every 1-24 hours depending on sensitivity).

In practice, deploy the SKS on a hardened server in the DMZ between OT and IT networks. Use OPC-UA client/server (with mutual certificate authentication) for key distribution, and Pub/Sub (with those distributed keys) for data delivery.
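
Because the keys are symmetric, signing in Pub/Sub behaves like a shared-secret message authentication code rather than a per-publisher signature. The sketch below shows that idea with HMAC-SHA256 from the Python standard library. It is a conceptual illustration only: it does not reproduce the UADP security header, and the key here is random bytes standing in for one issued by an SKS.

```python
import hashlib
import hmac
import os

TAG_LEN = 32  # bytes in an HMAC-SHA256 tag

def sign(group_key: bytes, message: bytes) -> bytes:
    """Append an HMAC-SHA256 tag computed with the shared group key."""
    return message + hmac.new(group_key, message, hashlib.sha256).digest()

def verify(group_key: bytes, signed: bytes) -> bytes:
    """Return the message if the tag matches; raise ValueError otherwise."""
    message, tag = signed[:-TAG_LEN], signed[-TAG_LEN:]
    expected = hmac.new(group_key, message, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("signature check failed")
    return message

# Stand-in for a key obtained from the SKS; both sides must hold the same bytes.
group_key = os.urandom(32)
wire = sign(group_key, b"alarm_word_1=0x0004")
print(verify(group_key, wire))
```

One consequence worth noting: any subscriber that holds the group key could also produce valid tags, so key distribution should be limited to devices you would trust as publishers.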

Network Segmentation

Even with encrypted Pub/Sub, follow defense-in-depth:

Isolate TSN traffic on dedicated VLANs
Use managed switches with ACLs to restrict multicast group membership
Deploy a data diode or unidirectional gateway between the TSN network and any internet-facing systems

Common Deployment Pitfalls

Pitfall 1: Multicast Flooding

TSN switches handle multicast natively, but if your path crosses even one switch that lacks multicast filtering, that switch floods multicast frames to all ports. This can saturate uplinks and crash unrelated systems. Verify every switch in the path supports IGMP snooping at minimum.

Pitfall 2: Clock Drift Under Load

gPTP synchronization works well at low CPU load, but when an edge gateway is processing 10,000 tags per second, the system clock can drift because gPTP packets get delayed in software queues. Use hardware timestamping (PTP-capable NICs) — software timestamping adds 10-100μs of jitter, which defeats the purpose of TSN.

Pitfall 3: DataSet Version Mismatch

When you add a variable to a publisher's DataSet, all subscribers with static configurations will misparse subsequent messages. UADP includes a DataSetWriterId and ConfigurationVersion — increment the version on every schema change and implement version checking in subscriber code.
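
A version check in subscriber code can be as simple as refusing to decode when the received version differs from the configured one. The sketch below assumes a two-part major/minor version and a strict equality policy; the class and function names are hypothetical and not taken from any OPC-UA SDK.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ConfigVersion:
    major: int
    minor: int

class VersionMismatch(Exception):
    """Raised when a message was produced with a different DataSet layout."""

def decode_checked(expected, received, payload, decoder):
    """Run decoder(payload) only if the received version equals the configured one."""
    if received != expected:
        raise VersionMismatch(
            f"configured {expected.major}.{expected.minor}, "
            f"received {received.major}.{received.minor}"
        )
    return decoder(payload)

configured = ConfigVersion(major=3, minor=0)
print(decode_checked(configured, ConfigVersion(3, 0), b"\x01\x02", list))
```

Failing loudly on a mismatch is the point: a dropped message with a logged version error is far easier to diagnose than a temperature silently read from the wrong byte offset.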

Pitfall 4: Oversubscribing TSN Bandwidth

Each TSN stream reservation is guaranteed, but the total bandwidth allocated to scheduled traffic classes should not exceed ~75% of link capacity (the remaining 25% absorbs guard bands and keeps best-effort traffic from being starved). On a 1Gbps link, that's 750Mbps for all scheduled streams combined. Do the bandwidth math before deployment, not after.
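
The bandwidth math is a short calculation. The sketch below sums the rate of each scheduled stream and compares it with a 75% budget. The stream names and frame sizes are hypothetical examples, and per-frame Ethernet overhead (preamble, inter-frame gap) is assumed to be included in `frame_bytes`.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Stream:
    name: str
    frame_bytes: int   # bytes on the wire per publish, headers included
    interval_us: int   # publishing interval in microseconds

    @property
    def bits_per_s(self) -> float:
        return self.frame_bytes * 8 * 1_000_000 / self.interval_us

def check_budget(streams, link_bps, max_fraction=0.75):
    """Return (reserved_bps, budget_bps, fits) for the scheduled streams."""
    reserved = sum(s.bits_per_s for s in streams)
    budget = link_bps * max_fraction
    return reserved, budget, reserved <= budget

streams = [
    Stream("motion", frame_bytes=128, interval_us=1_000),
    Stream("process", frame_bytes=1472, interval_us=100_000),
]
reserved, budget, fits = check_budget(streams, link_bps=1_000_000_000)
print(f"reserved {reserved / 1e6:.3f} Mbps of {budget / 1e6:.0f} Mbps budget, fits={fits}")
```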

When to Use Pub/Sub vs Client/Server

Pub/Sub over TSN isn't a universal replacement for client/server. Use this decision matrix:

Scenario	Recommended Model
HMI reading 50 tags from one PLC	Client/Server
Historian collecting from 100+ PLCs	Pub/Sub
Real-time motion control (under 1ms)	Pub/Sub over TSN
Configuration and commissioning	Client/Server
Cloud telemetry pipeline	Pub/Sub over MQTT
10+ consumers need same data	Pub/Sub
Firewall traversal required	Client/Server (reverseConnect)

The Road Ahead: OPC-UA FX

The OPC Foundation's Field eXchange (FX) initiative extends Pub/Sub with controller-to-controller communication profiles — enabling PLCs from different vendors to exchange data over TSN without custom integration. FX defines standardized connection management, diagnostics, and safety communication profiles.

For manufacturers, FX means the edge gateway that today bridges between incompatible PLCs will eventually become optional for direct PLC-to-PLC communication — while remaining essential for the cloud telemetry path where platforms like machineCDN normalize data across heterogeneous equipment.

Key Takeaways

Pub/Sub eliminates the N×M session problem that limits OPC-UA client/server at scale
TSN provides deterministic delivery with bounded latency guaranteed by the network infrastructure
UADP encoding on TSN, JSON over MQTT is the hybrid pattern that works for most manufacturing deployments
Hardware timestamping is non-negotiable for sub-microsecond synchronization accuracy
Security requires a Key Server — Pub/Sub's multicast model doesn't support session-based authentication
Keep scheduled traffic at or below ~75% of link capacity so guard bands and best-effort traffic are not starved

The convergence of OPC-UA Pub/Sub and TSN represents the most significant shift in industrial networking since the migration from fieldbus to Ethernet. Getting the architecture right at deployment time saves years of retrofitting — and the practical patterns in this guide reflect what actually works on production floors, not just in vendor demo labs.

Time-Sensitive Networking (TSN) for Industrial Ethernet: Why Deterministic Communication Is the Future of IIoT [2026]

March 3, 2026 · 11 min read

If you've spent any time on a factory floor, you know the fundamental tension: control traffic needs hard real-time guarantees (microsecond-level determinism), while monitoring and analytics traffic just needs "fast enough." For decades, the industry solved this by running separate networks — a PROFINET or EtherNet/IP fieldbus for control, and standard Ethernet for everything else.

Time-Sensitive Networking (TSN) eliminates that compromise. It brings deterministic, bounded-latency communication to standard IEEE 802.3 Ethernet — meaning your motion control packets and your IIoT telemetry can share the same physical wire without interfering with each other.

This isn't theoretical. TSN-capable switches are shipping from Cisco, Belden, Moxa, and Siemens. OPC-UA Pub/Sub over TSN is in production pilots. And if you're designing an IIoT architecture today, understanding TSN isn't optional — it's the foundation of where industrial networking is going.

The Problem TSN Solves

Standard Ethernet is "best effort." When you plug a switch into a network, frames are forwarded based on MAC address tables, and if two frames need the same port at the same time, one waits. That waiting — buffering, queueing, potential frame drops — is completely acceptable for web traffic. It's catastrophic for servo drives.

Consider a typical plastics manufacturing cell. An injection molding machine has:

Motion control loop running at 1ms cycle time (servo drives, hydraulic valves)
Process monitoring polling barrel temperatures every 2-5 seconds
Quality inspection sending 10MB camera images to an edge server
IIoT telemetry batching 500 tag values to MQTT every 30 seconds
MES integration exchanging production orders and counts

Before TSN, this required at minimum two separate networks — often three. The motion controller ran on a dedicated real-time fieldbus (PROFINET IRT, EtherCAT, or SERCOS III). Process monitoring lived on standard Ethernet. And the camera system had its own GigE network to avoid flooding the process network.

TSN says: one network, one wire, zero compromises.

The TSN Standards Stack

TSN isn't a single protocol — it's a family of IEEE 802.1 standards that work together. Understanding which ones matter for industrial deployments is critical.

IEEE 802.1AS: Time Synchronization

Everything in TSN starts with a shared clock. 802.1AS (generalized Precision Time Protocol, or gPTP) synchronizes all devices on the network to a common time reference with sub-microsecond accuracy.

Key differences from standard PTP (IEEE 1588):

Feature	IEEE 1588 PTP	IEEE 802.1AS gPTP
Scope	Any IP network	Layer 2 only
Best Master Clock	Complex negotiation	Simplified selection
Peer delay measurement	Optional	Mandatory
Transport	UDP (L3) or L2	L2 only
Typical accuracy	1-10 μs	< 1 μs

For plant engineers, the practical implication is this: every TSN bridge (switch) participates in time synchronization. A switch cannot simply forward PTP packets untouched. Every hop actively measures link delay and its own residence time and adjusts the timing information accordingly.

This gives you a synchronized time base across the entire network — which is what makes scheduled traffic possible.

IEEE 802.1Qbv: Time-Aware Shaper (TAS)

This is the core of TSN determinism. 802.1Qbv introduces the concept of time gates on each egress port of a switch. Every port has up to 8 priority queues (matching 802.1Q priority code points), and each queue has a gate that opens and closes on a precise schedule.

The schedule repeats on a fixed cycle — say, every 1ms. During the first 100μs, only the highest-priority queue (motion control) is open. During the next 300μs, process data queues open. The remaining 600μs is available for best-effort traffic (IIoT telemetry, file transfers, web browsing).

Time Cycle (1ms example):
├── 0-100μs:  Gate 7 OPEN (motion control only)
├── 100-400μs: Gate 5-6 OPEN (process monitoring, alarms)  
├── 400-1000μs: Gates 0-4 OPEN (IIoT, MES, IT traffic)
└── Cycle repeats...

The beauty of this approach is mathematical: if a motion control frame fits within its dedicated time slot, it's physically impossible for lower-priority traffic to delay it. No amount of IIoT telemetry bursts, camera image transfers, or IT traffic can interfere.

Practical consideration: TAS schedules must be configured consistently across all switches in the path. A motion control packet traversing 5 switches needs all 5 to have synchronized, compatible gate schedules. This is where centralized network configuration (via 802.1Qcc) becomes essential.

IEEE 802.1Qbu/802.3br: Frame Preemption

Even with scheduled gates, there's a problem: what if a low-priority frame is already being transmitted when the high-priority gate opens? On a 100Mbps link, a maximum-size Ethernet frame (1518 bytes) takes ~120μs to transmit. That's an unacceptable delay for a 1ms control loop.

Frame preemption solves this. It allows a switch to pause ("preempt") a low-priority frame mid-transmission, send the high-priority frame, then resume the preempted frame from where it left off.

The preempted frame is split into fragments, each with its own CRC for integrity checking. The receiving end reassembles them transparently. From the application's perspective, no frames are lost — the low-priority frame just arrives a bit later.

Why this matters in practice: Without preemption, you'd need to reserve guard bands, which are empty time slots before each high-priority window that ensure no large frame is in flight. Guard bands waste bandwidth. On a 100Mbps link with 1ms cycles, a 120μs guard band wastes 12% of available bandwidth. Preemption shrinks the guard band to roughly the time of one minimum-size fragment, recovering almost all of that bandwidth.
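
The numbers in this section follow directly from frame size and link speed. A minimal sketch of the calculation, ignoring preamble and inter-frame gap (which add a little more on a real link):

```python
def tx_time_us(frame_bytes: int, link_bps: float) -> float:
    """Serialization time of one frame in microseconds."""
    return frame_bytes * 8 / link_bps * 1e6

def guard_band_fraction(frame_bytes: int, link_bps: float, cycle_us: float) -> float:
    """Share of each cycle lost to a guard band sized for one full frame."""
    return tx_time_us(frame_bytes, link_bps) / cycle_us

print(f"1518 B at 100 Mbps: {tx_time_us(1518, 100e6):.1f} us")
print(f"1518 B at 1 Gbps:   {tx_time_us(1518, 1e9):.1f} us")
print(f"guard band share of a 1 ms cycle at 100 Mbps: {guard_band_fraction(1518, 100e6, 1000):.1%}")
```

Running the same function with a faster link shows why guard bands hurt most on 100Mbps segments: the frame time drops tenfold at 1Gbps while the cycle time stays the same.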

IEEE 802.1Qcc: Stream Reservation and Configuration

In a real plant, you don't manually configure gate schedules on every switch. 802.1Qcc defines a Centralized Network Configuration (CNC) model where a controller:

Discovers the network topology
Receives stream requirements from talkers (e.g., "I need to send 64 bytes every 1ms with max 50μs latency")
Computes gate schedules across all switches in the path
Programs the schedules into each switch

This is conceptually similar to how SDN (Software Defined Networking) works in data centers, adapted for the specific needs of industrial real-time traffic.

Current reality: CNC tooling is still maturing. As of early 2026, most TSN deployments use vendor-specific configuration tools (Siemens TIA Portal for PROFINET over TSN, Rockwell's Studio 5000 for EtherNet/IP over TSN). Full, vendor-agnostic CNC is coming but isn't plug-and-play yet.

IEEE 802.1CB: Frame Replication and Elimination

For safety-critical applications (emergency stops, protective relay controls), TSN supports seamless redundancy through 802.1CB. A talker sends duplicate frames along two independent paths through the network. Each receiving bridge eliminates the duplicate, passing only one copy to the application.

If one path fails, the other delivers the frame with zero switchover time. There's no spanning tree reconvergence, no RSTP timeout — the redundant frame was already there.

This gives you "zero recovery time" redundancy that's comparable to PRP (Parallel Redundancy Protocol) or HSR (High-availability Seamless Redundancy), but integrated into the TSN framework.
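
The elimination side can be pictured as a filter keyed on a per-stream sequence number. The sketch below is a simplified illustration: it keeps a bounded history of recently seen sequence numbers and passes only the first copy of each. The recovery algorithms in 802.1CB are more involved (sequence-number wraparound, reset timers), and the class name here is hypothetical.

```python
from collections import deque

class DuplicateEliminator:
    """Pass the first copy of each sequence number; drop later copies."""

    def __init__(self, history: int = 64):
        self._order = deque()   # sequence numbers in arrival order
        self._seen = set()      # same numbers, for fast membership tests
        self._history = history

    def accept(self, seq: int) -> bool:
        if seq in self._seen:
            return False
        self._seen.add(seq)
        self._order.append(seq)
        if len(self._order) > self._history:
            self._seen.discard(self._order.popleft())
        return True

elim = DuplicateEliminator()
# Frames from two paths interleaved at the listener; frame 3 was lost on one path.
arrivals = [1, 1, 2, 2, 3, 4, 4]
delivered = [seq for seq in arrivals if elim.accept(seq)]
print(delivered)  # [1, 2, 3, 4]
```

Frame 3 still reaches the application even though only one copy arrived, which is the "zero switchover time" behavior described above.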

TSN vs. Existing Industrial Protocols

PROFINET IRT

PROFINET IRT (Isochronous Real-Time) achieves similar determinism to TSN, but it does so with proprietary hardware. IRT requires special ASICs in every switch and end device. Standard Ethernet switches don't work.

TSN-based PROFINET ("PROFINET over TSN") is the path forward specified by PROFIBUS & PROFINET International (PI) and backed by Siemens. It preserves the PROFINET application layer while moving the real-time mechanism to TSN. The payoff: you can mix PROFINET devices with OPC-UA publishers, MQTT clients, and standard IT equipment on the same network.

EtherCAT

EtherCAT achieves extraordinary performance (sub-microsecond synchronization) by processing Ethernet frames "on the fly" — each slave modifies the frame as it passes through. This requires daisy-chain topology and dedicated EtherCAT hardware.

TSN can't match EtherCAT's raw performance in a daisy chain. But TSN supports standard star topologies with off-the-shelf switches, which is far more practical for plant-wide networks. The trend: EtherCAT for servo-level control within a machine, TSN for the plant-level network connecting machines.

CC-Link IE TSN

Mitsubishi's CC-Link IE TSN was one of the first industrial protocols to adopt TSN natively. It demonstrates the model: keep the application-layer protocol (CC-Link IE Field), replace the real-time Ethernet mechanism with standard TSN. This lets CC-Link IE coexist with other TSN traffic on the same network.

Practical Architecture: TSN in a Manufacturing Plant

Here's how a TSN-based IIoT architecture looks in practice:

┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│ Servo Drives │     │ PLC / Motion│     │ Edge Gateway │
│ (TSN NIC)    │────│ Controller   │────│ (machineCDN)  │
└─────────────┘     └─────────────┘     └──────┬───────┘
                           │                    │
                    ┌──────┴───────┐            │
                    │ TSN Switch   │            │
                    │ (802.1Qbv)   │────────────┘
                    └──────┬───────┘
                           │
              ┌────────────┼────────────┐
              │            │            │
       ┌──────┴──┐   ┌────┴────┐  ┌────┴─────┐
       │ HMI /   │   │ Vision  │  │ IT/Cloud │
       │ SCADA   │   │ System  │  │ Traffic  │
       └─────────┘   └─────────┘  └──────────┘

The TSN switch runs 802.1Qbv with a gate schedule that guarantees:

Priority 7: Motion control frames — guaranteed 100μs slots at 1ms intervals
Priority 5-6: Process monitoring, alarms — 300μs slots
Priority 3-4: MES, HMI, SCADA — allocated bandwidth in best-effort window
Priority 0-2: IIoT telemetry, file transfers — fills remaining bandwidth

The edge gateway collecting IIoT telemetry operates in the best-effort tier. It polls PLC tags over EtherNet/IP or Modbus TCP, batches the data, and publishes to MQTT — all without any risk of interfering with the control loops sharing the same wire.

Platforms like machineCDN that bridge industrial protocols to cloud already handle the data collection side — Modbus register grouping, EtherNet/IP tag reads, change-of-value filtering. TSN just means that data collection traffic coexists safely with control traffic, eliminating the need for separate networks.

Performance Benchmarks

Real-world TSN deployments show consistent results:

Metric	Typical Performance
Time sync accuracy	200-800 ns across 10 hops
Minimum guaranteed cycle	31.25 μs (with preemption)
Maximum jitter (scheduled traffic)	< 1 μs
Maximum hops for < 10μs latency	5-7 (at 1Gbps)
Bandwidth efficiency	85-95% (vs 70-80% without preemption)
Frame preemption overhead	~20 bytes per fragment (minimal)

Compare this to standard Ethernet QoS (802.1p priority queues without TAS): priority queuing gives you statistical priority, not deterministic guarantees. Under heavy load, even high-priority frames can experience hundreds of microseconds of jitter.

Common Pitfalls

1. Not All "TSN-Capable" Switches Are Equal

Some switches support 802.1AS (time sync) but not 802.1Qbv (scheduled traffic). Others support Qbv but not frame preemption. Check the specific IEEE profiles supported, not just the TSN marketing label.

The IEC/IEEE 60802 TSN Profile for Industrial Automation defines the mandatory feature set for industrial use. Look for compliance with this profile.

2. End-Device TSN Support Is Still Emerging

A TSN switch is only half the equation. For guaranteed determinism, the end device (PLC, drive, sensor) needs a TSN-capable Ethernet controller that can transmit frames at precisely scheduled times. Many current PLCs use standard Ethernet NICs — they benefit from TSN's traffic isolation but can't achieve sub-microsecond transmission timing.

3. Configuration Complexity

TSN gate schedules are powerful but complex. A misconfigured schedule can:

Create "dead time" where no queue is open (wasted bandwidth)
Allow large best-effort frames to overflow into scheduled slots
Cause frame drops if the schedule doesn't account for inter-frame gaps

Start simple: define two traffic classes (real-time and best-effort) before attempting multi-level scheduling.

4. Cabling and Distance

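TSN doesn't change Ethernet's physical limitations. Standard Cat 5e/6 runs up to 100m per segment. For plant-wide TSN, you'll need fiber between buildings and proper cable management. Time synchronization accuracy degrades when the two directions of a link have different delays, because gPTP assumes each link is symmetric. Differing cable lengths between links are measured and compensated, so focus on matched transmit/receive paths (for example, equal-length fiber pairs) on links between TSN bridges.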

Getting Started

If you're designing a new IIoT deployment or modernizing an existing plant network:

Audit your traffic classes. Map every communication flow to a priority level. Most plants have 3-4 distinct classes: hard real-time control, soft real-time monitoring, IT/business, and bulk transfers.
Start with TSN-capable spine switches. Even if your end devices aren't TSN-ready, deploying TSN switches at the aggregation layer gives you traffic isolation today and a deterministic upgrade path for tomorrow.
Deploy IIoT data collection at the appropriate priority. Edge gateways that poll PLCs and publish to MQTT typically operate fine at priority 3-4. They don't need deterministic guarantees — they need reliable throughput. TSN ensures that throughput is available even when control traffic is present.
Plan for centralized configuration. As your TSN deployment grows beyond a single machine cell, manual switch configuration becomes untenable. Invest in network management tools that support 802.1Qcc configuration.

The Convergence Thesis

TSN's real impact isn't about making Ethernet faster — it's about eliminating the network boundaries between IT and OT.

Today, most factories have 3-5 separate network segments with firewalls, protocol converters, and data diodes between them. Each segment has its own switches, cables, management tools, and maintenance burden.

TSN collapses these into a single converged network where control traffic and IT traffic coexist with mathematical guarantees. That means:

Lower infrastructure cost (one network instead of three)
Simpler troubleshooting (one set of diagnostic tools)
Direct IIoT access to real-time data (no separate control network to bridge into; application-protocol translation at the edge still applies)
Unified security policy (one network to secure, one set of ACLs)

For plant engineers deploying IIoT platforms, TSN means the data you need is already on the same network — no bridging, no gateways, no proprietary converters. You connect your edge device, configure the right traffic priority, and start collecting data from machines that were previously on isolated control networks.

The deterministic network is coming. The question is whether your infrastructure will be ready for it.

Store-and-Forward Buffer Design for Reliable Industrial MQTT Telemetry [2026]

March 2, 2026 · 12 min read

Your edge gateway just collected 200 data points from six machines. The MQTT connection to the cloud dropped 47 seconds ago. What happens to that data?

In consumer IoT, the answer is usually "it gets dropped." In industrial IoT, that answer gets you fired. A single missed alarm delivery can mean a $50,000 chiller compressor failure. A gap in temperature logging can invalidate an entire production batch for FDA compliance.

The solution is a store-and-forward buffer — a memory structure that sits between your data collection layer and your MQTT transport, holding telemetry data during disconnections and draining it the moment connectivity returns. It sounds simple. The engineering details are anything but.

This article walks through the design of a production-grade store-and-forward buffer for resource-constrained edge gateways running on embedded Linux.

Store-and-forward buffer architecture for MQTT telemetry

Why MQTT QoS Isn't Enough

The first objection is always: "MQTT already has QoS 1 and QoS 2 — doesn't the broker handle retransmission?"

Technically yes, but only for messages that have already been handed to the MQTT client library. The problem is what happens before the publish call:

The TCP connection is down. mosquitto_publish() returns MOSQ_ERR_NO_CONN. Your data is gone unless you stored it somewhere.
The MQTT library's internal buffer is full. Most MQTT client libraries have a finite send queue. When it fills, new publishes get rejected.
The gateway rebooted. Any data in memory is lost. Only data written to persistent storage survives.

QoS handles message delivery within an established session. Store-and-forward handles data persistence across disconnections and reconnections, and across reboots as well if the page pool is backed by persistent storage rather than RAM.

The Page-Based Buffer Architecture

A production buffer uses a paged memory pool — a contiguous block of memory divided into fixed-size pages that cycle through three states:

┌─────────────────────────────────────────────────────┐
│                  Buffer Memory Pool                  │
│                                                      │
│  ┌──────┐  ┌──────┐  ┌──────┐  ┌──────┐  ┌──────┐  │
│  │Page 0│  │Page 1│  │Page 2│  │Page 3│  │Page 4│  │
│  │ FREE │  │ USED │  │ USED │  │ WORK │  │ FREE │  │
│  └──────┘  └──────┘  └──────┘  └──────┘  └──────┘  │
│                                                      │
│  FREE = empty, available for writing                 │
│  WORK = currently being filled with incoming data    │
│  USED = full, queued for delivery to MQTT broker     │
└─────────────────────────────────────────────────────┘

Page States

FREE pages form a linked list of available pages. When the buffer needs a new work page, it pulls from the free list.
WORK page is the single page currently accepting incoming data. New telemetry batches get appended here. There is always at most one work page.
USED pages form an ordered queue of pages waiting to be delivered. The buffer sends data from the head of the used queue, one message at a time.

Page Structure

Each page contains multiple messages, packed sequentially:

┌─────────────────────────────────────────────┐
│                    Page N                     │
│                                               │
│  ┌──────────┬──────────┬──────────────────┐  │
│  │ msg_id   │ msg_size │ message_data     │  │
│  │ (4 bytes)│ (4 bytes)│ (variable)       │  │
│  ├──────────┼──────────┼──────────────────┤  │
│  │ msg_id   │ msg_size │ message_data     │  │
│  │ (4 bytes)│ (4 bytes)│ (variable)       │  │
│  ├──────────┼──────────┼──────────────────┤  │
│  │          ... more messages ...          │  │
│  └──────────────────────────────────────────┘  │
│                                               │
│  write_p ──→ next write position              │
│  read_p  ──→ next read position (delivery)    │
│                                               │
└─────────────────────────────────────────────┘

The msg_id field is the correlation key for delivery. It is filled in at publish time, when the MQTT library reports the packet ID it assigned (mosquitto_publish() writes it to its mid output parameter). When the broker acknowledges delivery (via the PUBACK callback in QoS 1), the buffer matches the acknowledged ID against the message at the head of the delivery queue.
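A minimal C sketch of this layout follows. The type and function names (page_t, buffer_t, buffer_init, buffer_free_pages, buffer_destroy) are illustrative assumptions rather than names from any particular implementation; only write_p, read_p, and the FREE/WORK/USED roles come from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MSG_HEADER_SIZE 8u  /* 4-byte msg_id + 4-byte msg_size */

typedef struct page {
    struct page *next;     /* link in the free list or the used queue */
    uint8_t     *start_p;  /* first byte of this page's data area */
    uint8_t     *write_p;  /* next write position */
    uint8_t     *read_p;   /* next message to deliver */
} page_t;

typedef struct {
    uint8_t *pool;         /* one contiguous allocation holding every page */
    page_t  *pages;        /* one descriptor per page */
    size_t   page_size;
    size_t   page_count;
    page_t  *free_list;    /* FREE pages */
    page_t  *work;         /* the single WORK page, or NULL */
    page_t  *used_head;    /* oldest USED page (next to deliver) */
    page_t  *used_tail;    /* newest USED page */
} buffer_t;

/* Carve one contiguous pool into fixed-size pages; every page starts FREE.
 * Returns 0 on success, -1 on invalid arguments or allocation failure. */
static int buffer_init(buffer_t *b, size_t page_size, size_t page_count)
{
    assert(b != NULL);
    if (page_size <= MSG_HEADER_SIZE || page_count < 3)
        return -1;
    b->pool  = malloc(page_size * page_count);
    b->pages = calloc(page_count, sizeof *b->pages);
    if (b->pool == NULL || b->pages == NULL) {
        free(b->pool);
        free(b->pages);
        return -1;
    }
    b->page_size  = page_size;
    b->page_count = page_count;
    b->free_list  = NULL;
    b->work = b->used_head = b->used_tail = NULL;
    /* Push pages in reverse so page 0 ends up at the head of the free list. */
    for (size_t i = page_count; i-- > 0; ) {
        page_t *p = &b->pages[i];
        p->start_p = p->write_p = p->read_p = b->pool + i * page_size;
        p->next = b->free_list;
        b->free_list = p;
    }
    return 0;
}

/* Count the pages currently on the free list. */
static size_t buffer_free_pages(const buffer_t *b)
{
    size_t n = 0;
    for (const page_t *p = b->free_list; p != NULL; p = p->next)
        n++;
    return n;
}

static void buffer_destroy(buffer_t *b)
{
    free(b->pool);
    free(b->pages);
}
```

Keeping the descriptors separate from the data pool means a page's full page_size bytes are available for messages, and resetting a page is just a matter of setting write_p and read_p back to start_p.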

Memory Sizing

The minimum viable buffer needs at least three pages:

One page being filled (WORK)
One page being transmitted (USED, head of queue)
One page available for the next batch (FREE)

In practice, you want more headroom. The formula:

pages_needed = desired_holdover_time / batch_interval
buffer_size = pages_needed × page_size

Example (assuming each batch fills roughly one page):
- Page size: 32 KB
- Batch interval: 30 seconds
- Desired holdover: 10 minutes
- Pages needed: 600s / 30s = 20 pages
- Buffer size: 20 × 32 KB = 640 KB

On a typical embedded Linux gateway with 256MB–512MB RAM, dedicating 1–4 MB to the telemetry buffer is reasonable.
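The sizing arithmetic can be written as a small helper. This is a sketch under the assumption that each batch fills at most one page; the function names pages_needed and buffer_bytes are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Pages needed to ride out `holdover_s` seconds when one batch (assumed to
 * fill at most one page) is produced every `batch_interval_s` seconds.
 * Rounds up and never returns fewer than the three-page minimum. */
static size_t pages_needed(unsigned holdover_s, unsigned batch_interval_s)
{
    assert(batch_interval_s > 0);
    size_t pages = ((size_t)holdover_s + batch_interval_s - 1) / batch_interval_s;
    return pages < 3 ? 3 : pages;
}

/* Total pool size in bytes for the given page size and holdover target. */
static size_t buffer_bytes(size_t page_size, unsigned holdover_s,
                           unsigned batch_interval_s)
{
    return page_size * pages_needed(holdover_s, batch_interval_s);
}
```

With the numbers from the example, pages_needed(600, 30) gives 20 and buffer_bytes(32 * 1024, 600, 30) gives 655,360 bytes, which is 640 KB.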

The Write Path: Accepting Incoming Data

When the data collection layer finishes a polling cycle and has a batch of tag values ready to deliver, it calls into the buffer:

Step 1: Check the Work Page

If no work page exists, allocate one from the free list. If the free list is empty, steal the oldest used page — this is the overflow strategy (more on this below).

Step 2: Size Check

Before writing, verify that the message (plus its 8-byte header) fits in the remaining space on the work page:

remaining = page_size - (write_p - start_p)
needed = 4 (msg_id) + 4 (msg_size) + payload_size

if needed > remaining:
    move work_page to used_pages queue
    allocate a new work page
    retry

Step 3: Write the Message

Write 4 zero bytes at write_p    (placeholder for msg_id)
Write message size as uint32     (4 bytes)
Write message payload            (N bytes)
Advance write_p by 8 + N

The msg_id is initially zero because we don't know it yet — it gets assigned when the message is actually published to MQTT.
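Steps 2 and 3 might look like the following in C. This is a sketch, not a reference implementation: work_page_t and page_append are hypothetical names, and the size field is stored in host byte order, which is an assumption the article does not state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MSG_HEADER_SIZE 8u  /* 4-byte msg_id + 4-byte msg_size */

typedef struct {
    uint8_t *start_p;    /* first byte of the page */
    uint8_t *write_p;    /* next write position */
    size_t   page_size;  /* total bytes in the page */
} work_page_t;

/* Append one message to the work page.
 * Returns 0 on success, or -1 if header + payload do not fit in the space
 * that remains (the caller then rotates to a new work page and retries). */
static int page_append(work_page_t *pg, const void *payload, uint32_t payload_size)
{
    assert(pg != NULL && pg->write_p >= pg->start_p);
    size_t remaining = pg->page_size - (size_t)(pg->write_p - pg->start_p);
    size_t needed    = MSG_HEADER_SIZE + (size_t)payload_size;
    if (needed > remaining)
        return -1;

    memset(pg->write_p, 0, 4);                  /* msg_id placeholder */
    memcpy(pg->write_p + 4, &payload_size, 4);  /* msg_size, host byte order */
    if (payload_size > 0)
        memcpy(pg->write_p + MSG_HEADER_SIZE, payload, payload_size);
    pg->write_p += needed;
    return 0;
}
```

One detail worth handling in the caller: a payload larger than page_size minus 8 bytes can never fit, even in an empty page, so the rotate-and-retry loop should reject such a message up front rather than retry forever.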

Step 4: Trigger Delivery

After every write, the buffer checks if it can send data. If the connection is up and no message is currently awaiting acknowledgment, it initiates delivery of the next queued message.

The Read Path: Delivering to MQTT

Delivery follows a strict one-message-at-a-time discipline. The buffer maintains a packet_sent flag:

if connected == false:  return
if packet_sent == true: return    (waiting for PUBACK)

message = used_pages[0].read_p
result = mqtt_publish(message.data, message.size, &message.msg_id)

if result == success:
    packet_sent = true
else:
    packet_sent = false           (retry on next opportunity)

Why One at a Time?

Sending multiple messages without waiting for acknowledgment is tempting — it would be faster. But it creates a delivery ordering problem. If messages 1, 2, and 3 are sent simultaneously and message 2's PUBACK arrives first, you don't know whether messages 1 and 3 were delivered. With one-at-a-time, the delivery order is guaranteed to match the insertion order.

For higher throughput, some implementations pipeline 2–3 messages and track a small window of in-flight packet IDs. But for industrial telemetry where data integrity matters more than latency, sequential delivery is the safer choice.

The Delivery Confirmation Callback

When the MQTT library's on_publish callback fires with a packet ID:

Lock the buffer mutex
Check that the packet_id matches used_pages[0].read_p.msg_id; if it does not, ignore the acknowledgment and unlock
Advance read_p past the delivered message
If read_p >= write_p:
     - Page completely delivered
     - Move page from used_pages to free_pages
     - Reset the page's write_p and read_p
Set packet_sent = false
Attempt to send the next message
Unlock mutex

This is where the msg_id field in the page pays off — it's the correlation key between "we published this" and "the broker confirmed this."
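The send and acknowledge steps can be sketched together as a small state machine. Several simplifications are assumptions made for brevity: the queue is modeled as a single page region, locking is omitted, and the transport is reached through a hypothetical publish_fn hook (with libmosquitto it would wrap mosquitto_publish()). stub_publish is a stand-in that only hands out increasing packet IDs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MSG_HEADER_SIZE 8u  /* 4-byte msg_id + 4-byte msg_size */

/* Hypothetical transport hook: publishes `len` bytes, stores the packet ID
 * in *mid, and returns 0 on success. */
typedef int (*publish_fn)(const void *data, uint32_t len, int *mid);

typedef struct {
    uint8_t   *read_p;       /* next message to deliver */
    uint8_t   *write_p;      /* end of the queued data */
    bool       connected;
    bool       packet_sent;  /* true while one message awaits its PUBACK */
    publish_fn publish;
} sender_t;

/* Publish the message at read_p unless disconnected, busy, or empty. */
static void try_send_next(sender_t *s)
{
    assert(s != NULL && s->publish != NULL);
    if (!s->connected || s->packet_sent || s->read_p >= s->write_p)
        return;
    uint32_t size;
    int mid = 0;
    memcpy(&size, s->read_p + 4, 4);
    if (s->publish(s->read_p + MSG_HEADER_SIZE, size, &mid) != 0)
        return;                    /* packet_sent stays false: retry later */
    uint32_t id = (uint32_t)mid;
    memcpy(s->read_p, &id, 4);     /* record the packet ID in the msg_id slot */
    s->packet_sent = true;
}

/* Handle a PUBACK. Returns true if it matched the in-flight message. */
static bool on_puback(sender_t *s, int mid)
{
    uint32_t id, size;
    if (!s->packet_sent || s->read_p >= s->write_p)
        return false;
    memcpy(&id, s->read_p, 4);
    if (id != (uint32_t)mid)
        return false;              /* stale or unexpected acknowledgment */
    memcpy(&size, s->read_p + 4, 4);
    s->read_p += MSG_HEADER_SIZE + size;  /* step past the delivered message */
    s->packet_sent = false;
    try_send_next(s);              /* keep the pipeline draining */
    return true;
}

/* Stand-in transport for experimenting: assigns increasing packet IDs. */
static int g_next_mid = 1;
static int stub_publish(const void *data, uint32_t len, int *mid)
{
    (void)data;
    (void)len;
    *mid = g_next_mid++;
    return 0;
}
```

A full implementation would also return the page to the free list once read_p reaches write_p, as the steps above describe, and would take the buffer mutex around both functions.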

Overflow Handling: When Memory Runs Out

On a constrained device, the buffer will eventually fill up during an extended outage. The question is: what do you sacrifice?

Strategy 1: Drop Newest (Tail Drop)

When the free list is empty, reject new writes. The data collection layer simply loses the current batch. This preserves historical data but creates gaps at the end of the outage.

Strategy 2: Drop Oldest (FIFO Eviction)

When the free list is empty, steal the oldest used page — the one at the head of the delivery queue. This preserves the most recent data but creates gaps at the beginning of the outage.

Which to Choose?

For industrial monitoring, drop-oldest is almost always correct. The reasoning:

During a long outage, the most recent data is more actionable than data from 20 minutes ago.
When connectivity returns, operators want to see current machine state, not historical state from the beginning of the outage.
Historical data from the outage period can often be reconstructed from PLC internal logs after the fact.

A production implementation logs a warning when it evicts a page:

Buffer: Overflow warning! Extracted USED page (#7)

This warning should be forwarded to the platform's monitoring layer so operators know data was lost.
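A drop-oldest page allocator might be sketched as follows. The names pool_t and acquire_page are hypothetical. Clearing packet_sent on eviction is an assumption that the article does not describe: if the evicted page held the in-flight message, its PUBACK could never match the new queue head, and delivery would stall.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

typedef struct page {
    struct page *next;
    int          index;  /* page number, used only for logging */
} page_t;

typedef struct {
    page_t  *free_list;       /* FREE pages */
    page_t  *used_head;       /* oldest USED page */
    page_t  *used_tail;       /* newest USED page */
    unsigned overflow_count;  /* pages evicted so far */
    bool     packet_sent;     /* a message from used_head is in flight */
} pool_t;

/* Return a page for use as the new WORK page. Prefers the free list; if it
 * is empty, evicts the oldest USED page (drop-oldest). Returns NULL only
 * when there are no FREE and no USED pages at all. */
static page_t *acquire_page(pool_t *p)
{
    assert(p != NULL);
    page_t *pg = p->free_list;
    if (pg != NULL) {
        p->free_list = pg->next;
    } else if ((pg = p->used_head) != NULL) {
        p->used_head = pg->next;
        if (p->used_head == NULL)
            p->used_tail = NULL;
        p->overflow_count++;
        p->packet_sent = false;  /* any in-flight message lived on this page */
        fprintf(stderr, "Buffer: Overflow warning! Extracted USED page (#%d)\n",
                pg->index);
    }
    if (pg != NULL)
        pg->next = NULL;
    return pg;
}
```

Counting evictions in overflow_count gives the monitoring layer something to report even if the log line itself is never collected.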

Thread Safety

The buffer is accessed from two threads:

The polling thread — calls buffer_add_data() after each collection cycle
The MQTT callback thread — calls buffer_process_data_delivered() when PUBACKs arrive

A mutex protects all buffer operations:

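// Sketch in C with POSIX threads. buffer_t and the three helper
// functions are assumed to be defined elsewhere.
#include <pthread.h>

void buffer_add_data(buffer_t *buffer, const void *data, size_t size) {
    pthread_mutex_lock(&buffer->mutex);
    write_data_to_work_page(buffer, data, size);
    try_send_next_message(buffer);
    pthread_mutex_unlock(&buffer->mutex);
}

void buffer_process_data_delivered(buffer_t *buffer, int packet_id) {
    pthread_mutex_lock(&buffer->mutex);
    advance_read_pointer(buffer, packet_id);
    try_send_next_message(buffer);
    pthread_mutex_unlock(&buffer->mutex);
}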

The key insight: try_send_next_message() is called from both code paths. After adding data, the buffer checks if it can immediately begin delivery. After confirming delivery, it checks if there's more data to send. This creates a self-draining pipeline that doesn't need a separate timer or polling loop.

Connection State Management

The buffer tracks connectivity through two callbacks:

On Connect

buffer->connected = true
try_send_next_message(buffer)    // Start draining the queue

On Disconnect

buffer->connected = false
buffer->packet_sent = false      // Reset in-flight tracking

The packet_sent = false on disconnect is critical. If a message was in flight when the connection dropped, we have no way of knowing whether the broker received it. Setting packet_sent = false means the message will be re-sent on reconnection. This may result in duplicate delivery — which is fine. Industrial telemetry systems should be idempotent anyway (a repeated temperature reading at timestamp T is the same as the original).

Batch Finalization: When to Flush

Data arrives at the buffer through a batch layer that groups multiple tag values before serialization. The batch finalizes (and writes to the buffer) on two conditions:

1. Size Limit

When the accumulated batch exceeds a configured maximum size (e.g., 32 KB for JSON, or when the binary payload reaches 90% of the maximum), the batch is serialized and written to the buffer immediately:

if current_batch_size > max_batch_size:
    finalize_and_write_to_buffer(batch)
    reset_batch()

2. Time Limit

When the time since the batch started collecting exceeds a configured timeout (e.g., 30 seconds), the batch is finalized regardless of size:

elapsed = now - batch_start_time
if elapsed > max_batch_time:
    finalize_and_write_to_buffer(batch)
    reset_batch()

The time-based trigger is checked at the end of each tag group within a polling cycle, not on a separate timer. This avoids adding another thread and ensures the batch is finalized at a natural boundary in the data stream.
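Both triggers can be folded into one check that the polling loop calls at the end of each tag group. This is a sketch: batch_t and batch_should_finalize are hypothetical names, and skipping an empty batch is an assumption rather than something the article specifies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

typedef struct {
    size_t size;        /* bytes accumulated so far */
    size_t max_size;    /* size trigger, e.g. 32 * 1024 */
    time_t start_time;  /* when the first value entered this batch */
    double max_age_s;   /* time trigger in seconds, e.g. 30.0 */
} batch_t;

/* True when the batch should be serialized and written to the buffer. */
static bool batch_should_finalize(const batch_t *b, time_t now)
{
    assert(b != NULL);
    if (b->size == 0)
        return false;                /* nothing collected yet */
    if (b->size > b->max_size)
        return true;                 /* size limit exceeded */
    return difftime(now, b->start_time) > b->max_age_s;  /* time limit */
}
```

On a gateway whose wall clock can jump (for example after an NTP correction), a monotonic clock source is a safer basis for the elapsed-time comparison.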

Binary vs. JSON Serialization

Production edge systems typically support two serialization formats:

JSON Format

{
  "groups": [
    {
      "ts": 1709341200,
      "device_type": 1018,
      "serial_number": 12345,
      "values": [
        {"id": 1, "values": [452]},
        {"id": 2, "values": [38]},
        {"id": 162, "error": -5}
      ]
    }
  ]
}

JSON is human-readable and easy to debug but verbose. A batch of 25 tag values in JSON might be 800 bytes.

Binary Format

0xF7              Command byte
[4B] num_groups   Number of timestamp groups
  [4B] timestamp  Unix timestamp
  [2B] dev_type   Device type ID
  [4B] serial     Device serial number
  [4B] num_values Number of values in group
    [2B] tag_id   Tag identifier
    [1B] status   0x00=OK, other=error
    [1B] count    Array size
    [1B] elem_sz  Element size (1, 2, or 4 bytes)
    [N×S bytes]   Packed values (MSB first)

The same 25 tag values in binary format might be 180 bytes — a 4.4× reduction. On cellular connections where bandwidth is metered per megabyte, this matters enormously.

The format choice is configured per device. Many deployments use binary for production and JSON for commissioning/debugging.
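An encoder for the binary layout could look roughly like this. It is a sketch with stated assumptions: every multi-byte field is written MSB first (the article says so only for the packed values), each tag carries a single-element array, and the names tag_value_t, put_be, and encode_group are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write the low `nbytes` bytes of v, most significant byte first. */
static uint8_t *put_be(uint8_t *p, uint32_t v, unsigned nbytes)
{
    for (unsigned i = nbytes; i-- > 0; )
        *p++ = (uint8_t)(v >> (8 * i));
    return p;
}

typedef struct {
    uint16_t tag_id;
    uint8_t  status;   /* 0x00 = OK */
    uint8_t  elem_sz;  /* 1, 2, or 4 bytes */
    uint32_t value;    /* single-element array, for brevity */
} tag_value_t;

/* Encode one timestamp group into `out`.
 * Returns the number of bytes written, or 0 if the output buffer is too
 * small or an element size is invalid. */
static size_t encode_group(uint8_t *out, size_t cap, uint32_t ts,
                           uint16_t dev_type, uint32_t serial,
                           const tag_value_t *vals, uint32_t n)
{
    assert(out != NULL);
    size_t need = 1 + 4 + 4 + 2 + 4 + 4;  /* command byte + group header */
    for (uint32_t i = 0; i < n; i++) {
        uint8_t sz = vals[i].elem_sz;
        if (sz != 1 && sz != 2 && sz != 4)
            return 0;
        need += 2 + 1 + 1 + 1 + sz;       /* tag_id, status, count, elem_sz, value */
    }
    if (need > cap)
        return 0;

    uint8_t *p = out;
    *p++ = 0xF7;                          /* command byte */
    p = put_be(p, 1, 4);                  /* num_groups */
    p = put_be(p, ts, 4);                 /* timestamp */
    p = put_be(p, dev_type, 2);           /* dev_type */
    p = put_be(p, serial, 4);             /* serial */
    p = put_be(p, n, 4);                  /* num_values */
    for (uint32_t i = 0; i < n; i++) {
        p = put_be(p, vals[i].tag_id, 2);
        *p++ = vals[i].status;
        *p++ = 1;                         /* count: one element */
        *p++ = vals[i].elem_sz;
        p = put_be(p, vals[i].value, vals[i].elem_sz);
    }
    return (size_t)(p - out);
}
```

Under these assumptions, a group holding the two OK values from the JSON example (452 as a 2-byte value, 38 as a 1-byte value) encodes to 32 bytes, 19 of which are the command byte and group header.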

Monitoring the Buffer

A healthy buffer should have these characteristics:

Pages cycling regularly — pages move from FREE → WORK → USED → FREE in a steady rhythm
No overflow warnings — if you see "extracted USED page" in the logs, the buffer is undersized or the connection is too unreliable
Delivery timestamps advancing — track the timestamp of the last confirmed delivery. If it stops advancing while data is being collected, something is wrong with the MQTT connection

The edge daemon should publish buffer health as part of its periodic status message:

{
  "buffer": {
    "total_pages": 20,
    "free_pages": 14,
    "used_pages": 5,
    "work_pages": 1,
    "last_delivery_ts": 1709341200,
    "overflow_count": 0
  }
}

How machineCDN Implements Store-and-Forward

machineCDN's edge gateway implements the full page-based buffer architecture described in this article. The buffer sits between the batch serialization layer and the MQTT transport, providing:

Automatic page management — the gateway sizes the buffer based on available memory and configured batch parameters
Drop-oldest overflow — during extended outages, the most recent data is always preserved
Dual-format support — JSON for commissioning, binary for production deployments, configurable per device
Connection-aware delivery — the buffer begins draining immediately when the MQTT connection comes back up, with sequential delivery confirmation via QoS 1 PUBACKs

For multi-machine deployments on cellular gateways, the binary format combined with batch-and-forward typically reduces bandwidth consumption by 70–80% compared to per-tag JSON publishing — which translates directly to lower cellular data costs.

Key Takeaways

MQTT QoS doesn't replace store-and-forward. QoS handles delivery within a session. Store-and-forward handles persistence across disconnections.
Use a paged memory pool. Fixed-size pages with three states (FREE/WORK/USED) give you predictable memory usage and simple overflow handling.
One message at a time for delivery integrity. Sequential delivery with PUBACK confirmation guarantees ordering and makes the system easy to reason about.
Drop oldest on overflow. In industrial monitoring, recent data is more valuable than historical data from the beginning of an outage.
Finalize batches on both size and time. Size limits prevent memory bloat; time limits prevent stale data sitting in an incomplete batch.
Thread safety is non-negotiable. The polling thread and MQTT callback thread both touch the buffer. A mutex with minimal critical sections keeps things safe without impacting throughput.

The store-and-forward buffer is the unsung hero of reliable industrial telemetry. It's not glamorous, it doesn't show up in marketing slides, but it's the component that determines whether your IIoT platform loses data at 2 AM on a Saturday when the cell tower goes down — or quietly holds everything until the connection comes back and delivers it all without anyone ever knowing there was a problem.
