
Sparkplug B Specification Deep Dive: Birth Certificates, Death Certificates, and Why Your IIoT MQTT Deployment Needs It [2026]

MachineCDN Team
Industrial IoT Experts

MQTT is the de facto transport layer for industrial IoT. Every edge gateway, every cloud platform, and every IIoT architecture diagram draws that same line: device → MQTT broker → cloud. But here's the uncomfortable truth that anyone who's deployed MQTT in a real factory knows: raw MQTT tells you nothing about the data inside those payloads.

MQTT is a transport protocol. It delivers bytes. It doesn't define what a "temperature reading" looks like, how to discover which devices are online, or what happens when a device reboots at 3 AM. That's where Sparkplug B comes in — and understanding it deeply is the difference between a demo and a production deployment.

The Problem Sparkplug B Actually Solves

Let's be specific about what goes wrong without Sparkplug B.

The Wild West of MQTT Payloads

In a typical "vanilla MQTT" IIoT deployment, every device team invents their own payload format. One device publishes JSON like {"temp": 72.3, "unit": "F"}. Another publishes {"temperature_celsius": 22.4}. A third publishes a binary blob where bytes 4–7 contain an IEEE 754 float.

Now multiply this by 200 machines across 4 plants. Your cloud ingestion pipeline becomes a payload interpretation nightmare — custom parsers for every device type, no standard way to discover what's connected, and no reliable mechanism to know if a device is actually offline or just hasn't published in a while.

The State Awareness Gap

MQTT's Last Will and Testament (LWT) feature is supposed to handle device status. In theory, you set a will message when connecting, and the broker publishes it if the device disconnects ungracefully. In practice, LWT is insufficient for manufacturing because:

  • It only fires on ungraceful disconnects (keep-alive timeout or dropped connection, never on a clean DISCONNECT)
  • There's no standard payload format for the "I'm alive" or "I'm dead" message
  • You can't distinguish between "device rebooted and is coming back" versus "device lost network permanently"
  • There's no mechanism to tell a newly-connected subscriber about the current state of all devices

Sparkplug B addresses every one of these gaps with a formal specification.

The Sparkplug B Topic Namespace

The foundation of Sparkplug B is a rigid, standardized topic namespace. Unlike vanilla MQTT where teams invent arbitrary topic hierarchies, Sparkplug B mandates a specific structure:

spBv1.0/{group_id}/{message_type}/{edge_node_id}/{device_id}

Let's break this down for a real manufacturing scenario:

| Component | Purpose | Example |
|---|---|---|
| spBv1.0 | Namespace identifier; tells any subscriber this is a Sparkplug B payload | Always spBv1.0 |
| group_id | Logical grouping (plant, line, area) | plant-chicago, line-3 |
| message_type | One of: NBIRTH, NDEATH, DBIRTH, DDEATH, NDATA, DDATA, NCMD, DCMD | Determines payload semantics |
| edge_node_id | The edge gateway or smart device | gateway-01, cell-controller-7 |
| device_id | Individual machine (optional, under a node) | chiller-gp-1017, blender-bd-3 |
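The namespace is strict enough to validate mechanically. Here is a minimal Python sketch, assuming topics of exactly the shape above. `SparkplugTopic` and `parse_topic` are hypothetical helpers rather than part of any Sparkplug library. The STATE topic, which lives outside this group structure, is deliberately not handled.

```python
from typing import NamedTuple, Optional

NAMESPACE = "spBv1.0"
NODE_TYPES = {"NBIRTH", "NDEATH", "NDATA", "NCMD"}
DEVICE_TYPES = {"DBIRTH", "DDEATH", "DDATA", "DCMD"}


class SparkplugTopic(NamedTuple):
    group_id: str
    message_type: str
    edge_node_id: str
    device_id: Optional[str] = None  # present only for device-level (D*) messages

    def __str__(self) -> str:
        parts = [NAMESPACE, self.group_id, self.message_type, self.edge_node_id]
        if self.device_id is not None:
            parts.append(self.device_id)
        return "/".join(parts)


def parse_topic(topic: str) -> SparkplugTopic:
    """Split a Sparkplug B node/device topic and check that it is well formed."""
    parts = topic.split("/")
    if parts[0] != NAMESPACE or len(parts) not in (4, 5):
        raise ValueError(f"not a Sparkplug B node/device topic: {topic!r}")
    message_type = parts[2]
    if message_type in NODE_TYPES:
        expected_levels = 4  # node-level messages carry no device_id
    elif message_type in DEVICE_TYPES:
        expected_levels = 5  # device-level messages require a device_id
    else:
        raise ValueError(f"unknown message type: {message_type!r}")
    if len(parts) != expected_levels:
        raise ValueError(f"wrong number of topic levels for {message_type}")
    return SparkplugTopic(*parts[1:])


topic = parse_topic("spBv1.0/plant-chicago/DDATA/gateway-01/chiller-gp-1017")
print(topic.device_id)  # chiller-gp-1017
```

Rebuilding the string with `str(topic)` gives back the original topic, so the same type can be used on the publishing side.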

Message Types Explained

This is where most engineers first encounter Sparkplug B's elegance:

NBIRTH (Node Birth Certificate): Published when an edge node connects to the broker. Contains the complete metric definition for that node — every tag name, data type, and current value. Any application that subscribes to this topic immediately knows the full schema of the device.

NDEATH (Node Death Certificate): Published by the broker (via MQTT's LWT mechanism) when an edge node disconnects. The critical difference from raw LWT: Sparkplug B defines a sequence number in the death certificate that correlates with the birth certificate, allowing consumers to invalidate cached state.

DBIRTH / DDEATH (Device Birth / Death): Same pattern, but for individual devices behind an edge node. An edge gateway polling 12 PLCs via Modbus will issue one NBIRTH for itself, then 12 DBIRTH messages — one per PLC.

NDATA / DDATA (Node Data / Device Data): Telemetry payloads. Here's the key insight: NDATA/DDATA messages typically contain only changed values, not the full metric set (the specification calls this report by exception). The birth certificate establishes the schema; data messages send deltas.

NCMD / DCMD (Node Command / Device Command): Commands from the host application back to devices (write setpoints, trigger actions). The command path enables closed-loop control.

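STATE: A special message published by the primary host application to indicate whether it's online or offline. Unlike the other types, it lives outside the group namespace: its topic is spBv1.0/STATE/{host_id} in Sparkplug 3.0 (STATE/{host_id} in earlier versions). Edge nodes monitor this to know if their data is actually being consumed.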

Why This Topic Structure Matters in Practice

Consider what happens when your SCADA/historian application restarts:

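  1. Application connects to broker
  2. Subscribes to spBv1.0/#
  3. Publishes a retained STATE message announcing that it is online
  4. Edge nodes that track this host as their primary application re-publish their NBIRTH/DBIRTH messages, and the application sends a Rebirth command to any node it still has no state for (birth certificates are not retained, so this handshake is what restores them)
  5. Within seconds it has the full metric schema and current values for every device in the plant
  6. Subsequent NDATA/DDATA messages only contain changed values, so bandwidth stays low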

Without Sparkplug B, your application would connect, subscribe, and then... wait. Hope that devices publish something. Manually query each one. Parse whatever arbitrary JSON they send. It's the difference between a 2-second cold start and a 20-minute "let me figure out what's connected" phase.

Protocol Buffer Payloads: Beyond JSON

Sparkplug B mandates Google Protocol Buffers (protobuf) for payload encoding. This is a deliberate choice over JSON, and it matters enormously for industrial deployments.

The Size Problem

In production manufacturing, an edge gateway might poll 50–100 tags per PLC across 10 devices, publishing every 1–60 seconds. Let's look at what a typical batch looks like in JSON versus protobuf:

JSON payload (12 metrics):

```json
{
  "timestamp": 1709337600000,
  "metrics": [
    {"name": "delivery_temp", "type": "float", "value": 185.3},
    {"name": "mold_temp", "type": "float", "value": 162.7},
    {"name": "return_temp", "type": "float", "value": 148.2},
    {"name": "flow_value", "type": "float", "value": 4.2},
    {"name": "setpoint_1", "type": "float", "value": 185.0},
    {"name": "pump_status", "type": "boolean", "value": true},
    {"name": "heater_status", "type": "boolean", "value": true},
    {"name": "vent_status", "type": "boolean", "value": false},
    {"name": "pid_output_pct", "type": "float", "value": 72.4},
    {"name": "heater_output_pct", "type": "float", "value": 85.1},
    {"name": "cooling_output_pct", "type": "float", "value": 12.3},
    {"name": "proportional_pct", "type": "float", "value": 45.6}
  ]
}
```

This is approximately 620 bytes. Readable, sure. But over a cellular connection at 60-second intervals across 10 devices, that's 6.2 KB per cycle, 372 KB per hour, 267 MB per month — for one gateway.

Sparkplug B protobuf equivalent: ~95 bytes. That's an 85% reduction. Over a cellular data plan, this is the difference between viability and absurdity.
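The arithmetic behind those figures, assuming decimal units (1 KB = 1,000 B), a 30-day month of 720 hours, and the 620-byte and ~95-byte payload estimates above:

```latex
\begin{aligned}
620\,\text{B} \times 10\ \text{devices} &= 6.2\,\text{KB per cycle}\\
6.2\,\text{KB} \times 60\ \text{cycles/h} &= 372\,\text{KB/h}\\
372\,\text{KB/h} \times 720\,\text{h} &\approx 267.8\,\text{MB per month}\\
1 - \tfrac{95}{620} &\approx 0.847 \quad (\approx 85\%\ \text{reduction})\\
95\,\text{B} \times 10 \times 60 \times 720 &\approx 41\,\text{MB per month}
\end{aligned}
```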

How Sparkplug B Protobuf Works

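The Sparkplug B payload is defined by a .proto schema (sparkplug_b.proto, published with the Eclipse Tahu reference implementation). A simplified excerpt of the key structure: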

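```protobuf
syntax = "proto2";

// Simplified excerpt. In the real file, Metric (along with DataSet,
// Template, etc.) is nested inside Payload.
message Payload {
    optional uint64 timestamp = 1;
    repeated Metric metrics   = 2;
    optional uint64 seq       = 3;
    optional string uuid      = 4;
    optional bytes  body      = 5;
}

message Metric {
    optional string name      = 1;
    optional uint64 alias     = 2;
    optional uint64 timestamp = 3;
    optional uint32 datatype  = 4;
    // 5-9: is_historical, is_transient, is_null, metadata, properties (omitted)
    oneof value {
        uint32 int_value     = 10;
        uint64 long_value    = 11;
        float  float_value   = 12;
        double double_value  = 13;
        bool   boolean_value = 14;
        string string_value  = 15;
        bytes  bytes_value   = 16;
        // 17-19: dataset_value, template_value, extension_value (omitted)
    }
}
```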

The alias field is where the real bandwidth savings come from. In NBIRTH and DBIRTH messages, each metric is sent with its full name and assigned a numeric alias. In subsequent NDATA and DDATA messages, only the alias is sent: a varint that, together with its field tag, takes two bytes on the wire for any alias below 128, instead of a 20-character string.
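To see where the two bytes come from, here is a hand-rolled Python encoder for a single Float metric. It is a sketch for illustration only. The field numbers (name = 1, alias = 2, datatype = 4, float_value = 12) are the ones in the Eclipse Tahu sparkplug_b.proto, but the helper names are hypothetical. A real gateway would use code generated by protoc rather than encoding bytes by hand.

```python
import struct
from typing import Optional

# Protobuf wire types used below
VARINT, LENGTH_DELIMITED, FIXED32 = 0, 2, 5
# Field numbers from the Metric message in sparkplug_b.proto
NAME, ALIAS, DATATYPE, FLOAT_VALUE = 1, 2, 4, 12
SPARKPLUG_FLOAT = 9  # Sparkplug DataType id for Float


def varint(n: int) -> bytes:
    """Protobuf base-128 varint: 7 bits per byte, high bit set while more bytes follow."""
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


def field_key(field_number: int, wire_type: int) -> bytes:
    return varint((field_number << 3) | wire_type)


def encode_float_metric(value: float, alias: int, name: Optional[str] = None) -> bytes:
    """Encode one Float metric: birth form if a name is given, data form otherwise."""
    buf = bytearray()
    if name is not None:
        raw = name.encode("utf-8")
        buf += field_key(NAME, LENGTH_DELIMITED) + varint(len(raw)) + raw
    buf += field_key(ALIAS, VARINT) + varint(alias)
    if name is not None:
        buf += field_key(DATATYPE, VARINT) + varint(SPARKPLUG_FLOAT)
    buf += field_key(FLOAT_VALUE, FIXED32) + struct.pack("<f", value)  # little-endian float32
    return bytes(buf)


birth_form = encode_float_metric(185.3, alias=1, name="Delivery Temp")
data_form = encode_float_metric(185.3, alias=1)
print(len(birth_form), len(data_form))  # 24 7
```

Inside a Payload, each metric is wrapped as a length-delimited field 2, adding a tag byte and a length byte. That puts the data form at roughly 9 bytes per Float metric on the wire.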

For a gateway polling a TCU with tags like "Delivery Temp", "Mold Temp", "Return Temp", "Flow Value", and "Setpoint 1", the birth certificate maps each to an alias (1, 2, 3, 4, 5...), and all subsequent data messages reference those aliases. On constrained links, such as an RS-232 serial hop feeding a cellular router, this is not a nice-to-have. It's a requirement.
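On the consuming side, the alias table is just a dictionary rebuilt from every birth certificate. The Python sketch below assumes payloads have already been decoded into plain dicts with `name`, `alias`, and `value` keys, which is a hypothetical shape and not a library API. Keep one `AliasCache` per birth scope: the edge node, or each device behind it.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class AliasCache:
    """Hypothetical host-side view of one birth scope, rebuilt from each birth certificate."""
    alias_to_name: Dict[int, str] = field(default_factory=dict)
    values: Dict[str, Any] = field(default_factory=dict)

    def on_birth(self, metrics: List[dict]) -> None:
        # Birth metrics carry name, alias and value; aliases from any older birth are discarded.
        self.alias_to_name = {m["alias"]: m["name"] for m in metrics}
        self.values = {m["name"]: m["value"] for m in metrics}

    def on_data(self, metrics: List[dict]) -> List[int]:
        """Apply alias-only data metrics; return any aliases that could not be resolved."""
        unknown = []
        for m in metrics:
            name = self.alias_to_name.get(m["alias"])
            if name is None:
                unknown.append(m["alias"])  # the caller should request a rebirth
            else:
                self.values[name] = m["value"]
        return unknown


cache = AliasCache()
cache.on_birth([{"name": "Delivery Temp", "alias": 1, "value": 185.3},
                {"name": "Mold Temp", "alias": 2, "value": 162.7}])
cache.on_data([{"alias": 2, "value": 163.1}])
print(cache.values["Mold Temp"])  # 163.1
```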

Metric Types and Industrial Data

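Sparkplug B defines an enumeration of metric data types, from 8- to 64-bit signed and unsigned integers through floats, strings, and timestamps to structured types such as DataSet and Template. These are the ones you'll encounter most on a real factory floor: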

| Datatype ID | Type | Industrial Use Case |
|---|---|---|
| 1 | Int8 | Digital I/O status, Modbus coil values |
| 2 | Int16 | Analog sensor readings (temperature, pressure) |
| 3 | Int32 | Counters, production totals, timers |
| 5 | UInt8 | Alarm word bytes, error codes |
| 6 | UInt16 | Raw Modbus holding registers |
| 7 | UInt32 | Serial numbers, accumulated energy counters |
| 9 | Float | PID outputs, temperature setpoints, flow rates |
| 10 | Double | High-precision measurements |
| 11 | Boolean | Pump running, heater on, alarm active |
| 12 | String | Device firmware version, status messages |
| 13 | DateTime | Last maintenance timestamp |

This matters because different PLC protocols expose data in wildly different type systems. A Modbus RTU device might give you uint16 register pairs that need to be combined into IEEE 754 floats. An EtherNet/IP controller speaks in Allen-Bradley tag types (SINT, INT, DINT, REAL). Sparkplug B's type system is the normalization layer — the edge gateway converts from native PLC types to Sparkplug metric types, and every consuming application speaks the same language.
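The Modbus case needs care because the Modbus specification defines each register as big-endian but says nothing about the order of the two registers that make up a 32-bit value. Below is a small Python sketch. The `word_swap` flag is hypothetical and would be set per device. The mapping from Allen-Bradley atomic types to Sparkplug DataType ids is illustrative and uses the ids from the table above.

```python
import struct

# Illustrative normalization table: Allen-Bradley atomic type -> Sparkplug DataType id
AB_TO_SPARKPLUG = {"BOOL": 11, "SINT": 1, "INT": 2, "DINT": 3, "REAL": 9}


def registers_to_float(first: int, second: int, word_swap: bool = False) -> float:
    """Combine two 16-bit Modbus registers into an IEEE 754 float32.

    Each register is big-endian on the wire; the order of the two registers
    is vendor-specific, which is what word_swap accounts for.
    """
    high, low = (second, first) if word_swap else (first, second)
    return struct.unpack(">f", struct.pack(">HH", high, low))[0]


print(registers_to_float(0x4339, 0x0000))  # 185.0
```

Once converted, the value is published with DataType 9 (Float), and consumers never see the register layout.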

Sequence Numbers and State Coherence

One of Sparkplug B's most underappreciated features is the sequence number (seq) in every payload. Here's why it exists:

In a distributed system with MQTT, messages can be lost (QoS 0 is fire-and-forget), duplicated (QoS 1 is "at least once"), or reordered and dropped during broker failover. The seq field is a counter (0–255, wrapping back to 0) that increments with every message from an edge node and lets consumers detect:

  1. Missed messages: If you receive seq 5 followed by seq 8, you know 6 and 7 were lost.
  2. Out-of-order delivery: If seq 5 arrives after seq 7, you can reorder or discard.
  3. Stale birth certificates: If a node dies and is reborn with a new birth certificate, the seq resets, and consumers know to invalidate all cached metric aliases.
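Here is a minimal Python sketch of seq checking for one edge node. The class name and status strings are hypothetical. The 128 threshold for classifying a message as late rather than as a long gap is an assumption, because a counter that wraps at 256 cannot tell the two apart on its own. A conservative host discards duplicates and late messages and requests a rebirth on any gap.

```python
from typing import Optional


class SeqTracker:
    """Track the 0-255 Sparkplug seq counter for one edge node (hypothetical helper).

    The counter is shared by the node and its devices: NBIRTH carries seq 0 and
    every later DBIRTH, NDATA, DDATA and DDEATH increments it.
    """

    def __init__(self) -> None:
        self.last: Optional[int] = None

    def on_birth(self, seq: int) -> None:
        self.last = seq  # a new NBIRTH resets the counter

    def on_message(self, seq: int) -> str:
        if self.last is None:
            return "no-birth"            # data before any birth: request a rebirth
        delta = (seq - self.last) % 256  # forward distance on the 0..255 ring
        if delta == 0:
            return "duplicate"           # e.g. a QoS 1 redelivery
        if delta >= 128:
            return "late"                # heuristic: an old message arriving out of order
        self.last = seq
        return "ok" if delta == 1 else f"gap:{delta - 1}"


tracker = SeqTracker()
tracker.on_birth(0)
print(tracker.on_message(1), tracker.on_message(4))  # ok gap:2
```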

In factory environments where edge gateways run on constrained hardware — think embedded Linux on ARM SoCs with 256 MB RAM, connected over cellular with 200ms+ latency — sequence gaps happen regularly. Without seq, your data pipeline silently serves stale or incomplete data. With it, you can build proper state reconciliation.

The Birth/Death Lifecycle in Detail

Here's the exact sequence of events when an edge gateway boots up in a Sparkplug B deployment:

```text
1. Edge node connects to MQTT broker
   → Registers LWT: topic = spBv1.0/plant-1/NDEATH/gw-01
     payload = {timestamp, bdSeq: 0}, QoS 1, retain = false

2. Edge node publishes NBIRTH
   → topic: spBv1.0/plant-1/NBIRTH/gw-01
   → payload: Full metric set with aliases, seq=0, bdSeq=0
   → QoS 0, retain = false (birth certificates are not retained)

3. Edge node discovers connected PLCs (auto-detection)
   → Probes EtherNet/IP first (tag reads for device_type)
   → Falls back to Modbus TCP (input register reads)
   → Identifies device type and serial number

4. For each PLC: publishes DBIRTH
   → topic: spBv1.0/plant-1/DBIRTH/gw-01/chiller-1017-A3F2
   → payload: All tags for this device, types, current values

5. Periodic DDATA messages
   → topic: spBv1.0/plant-1/DDATA/gw-01/chiller-1017-A3F2
   → payload: Only changed values, using aliases, seq incrementing

6. If network drops:
   → Broker publishes NDEATH (from LWT)
   → bdSeq=0 matches the birth certificate
   → All subscribers invalidate cached state for gw-01 and all its devices

7. When connection restores:
   → Steps 1–4 repeat with bdSeq=1
   → Subscribers detect new birth cycle, rebuild state
```

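The bdSeq (birth-death sequence) is separate from the data seq and specifically links a death certificate to its corresponding birth certificate. This handles the case where a node dies and reconnects so fast that the old session's NDEATH reaches a subscriber at nearly the same time as the new NBIRTH, or even after it. A host that compares the death certificate's bdSeq with the bdSeq of the current birth can discard the stale death instead of marking a live node offline.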
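The host-side check is short. Here is a Python sketch, assuming a hypothetical `NodeLifecycle` object per edge node. A death certificate only takes the node offline when its bdSeq matches the one carried by the current birth.

```python
from typing import Optional


class NodeLifecycle:
    """Hypothetical host-side online/offline tracking for one edge node."""

    def __init__(self) -> None:
        self.online = False
        self.bd_seq: Optional[int] = None

    def on_nbirth(self, bd_seq: int) -> None:
        self.online = True
        self.bd_seq = bd_seq  # remember which session this birth belongs to

    def on_ndeath(self, bd_seq: int) -> bool:
        """Return True if the death applies to the current session."""
        if bd_seq != self.bd_seq:
            return False  # stale NDEATH from an earlier connection: ignore it
        self.online = False
        return True


# Fast reconnect: the new NBIRTH (bdSeq=1) beats the old session's NDEATH (bdSeq=0).
node = NodeLifecycle()
node.on_nbirth(0)
node.on_nbirth(1)
print(node.on_ndeath(0), node.online)  # False True
```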

Sparkplug B vs. Custom MQTT Payloads: When to Use Each

Not every IIoT deployment needs Sparkplug B. Here's a practical decision matrix:

Use Sparkplug B when:

  • Multiple consuming applications need the same data (historian + SCADA + analytics)
  • Devices are heterogeneous (different PLCs, different protocols)
  • You need device state awareness (what's online, what's offline)
  • More than 50 devices across your deployment
  • Interoperability with third-party platforms matters

Custom MQTT payloads may suffice when:

  • Single consuming application with a fixed device fleet
  • Extreme bandwidth constraints where even protobuf overhead matters (you're already at less than 100 bytes per message)
  • Legacy integration where the consuming system can't parse Sparkplug payloads
  • Proof-of-concept deployments where you'll refactor later

In our experience building edge gateways that bridge Modbus RTU, Modbus TCP, and EtherNet/IP to cloud, the Sparkplug B payload structure aligns naturally with how production data actually flows. A gateway that auto-detects PLCs, reads their tag configurations from JSON definitions, and publishes telemetry in batches is essentially implementing the Sparkplug B lifecycle — birth certificates when devices are discovered, death certificates when link state drops, data messages with only changed values. Whether you adopt the formal specification or implement the same patterns organically, the architecture is sound.

Implementing Sparkplug B at the Edge

For edge gateways running on resource-constrained hardware, here are the practical considerations:

Memory Management

Sparkplug B birth certificates for a device with 100 tags can be 4–8 KB. If your gateway serves 10 devices, you need 40–80 KB just for birth certificate state. On embedded Linux devices with limited RAM, this means:

  • Pre-allocate metric arrays at startup based on PLC configuration
  • Use fixed-size metric name buffers (32–64 characters is sufficient for most industrial tags)
  • Store aliases as a simple array indexed by tag ID — no hash maps needed when IDs are sequential

Batching Strategy

Sparkplug B allows multiple metrics in a single DDATA message. The optimal batch strategy depends on your network:

  • Wired Ethernet: Small batches, high frequency (1–5 metrics, every 1–5 seconds)
  • Wi-Fi: Medium batches (10–20 metrics, every 5–15 seconds)
  • Cellular (4G/LTE): Large batches (50–100 metrics, every 30–60 seconds)
  • Cellular (2G/3G): Maximum batching (all metrics, every 60–120 seconds, binary mode essential)

The batch size and timeout should be configurable — hard-coding these for "the factory" ignores the reality that connectivity varies by plant, by building, and even by machine location.
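One way to make both limits configurable is a batcher that flushes on whichever comes first, size or age. This Python sketch is illustrative. `MetricBatcher` and the `PRESETS` values (the upper ends of the ranges above) are assumptions. The injectable clock exists only so the timeout can be tested without sleeping.

```python
import time
from typing import Callable, List, Optional

# Hypothetical presets: (max metrics per batch, max batch age in seconds)
PRESETS = {"ethernet": (5, 5.0), "wifi": (20, 15.0), "lte": (100, 60.0)}


class MetricBatcher:
    """Collect metrics; flush when the batch is full or its oldest entry is too old."""

    def __init__(self, max_metrics: int, max_age_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_metrics = max_metrics
        self.max_age_s = max_age_s
        self.clock = clock
        self.pending: List[dict] = []
        self.started = 0.0

    def add(self, metric: dict) -> Optional[List[dict]]:
        if not self.pending:
            self.started = self.clock()  # age is measured from the first queued metric
        self.pending.append(metric)
        return self.flush() if len(self.pending) >= self.max_metrics else None

    def poll(self) -> Optional[List[dict]]:
        """Call periodically; returns a batch once the timeout has expired."""
        if self.pending and self.clock() - self.started >= self.max_age_s:
            return self.flush()
        return None

    def flush(self) -> List[dict]:
        batch, self.pending = self.pending, []
        return batch


now = [0.0]
batcher = MetricBatcher(*PRESETS["wifi"], clock=lambda: now[0])
batcher.add({"alias": 1, "value": 185.3})
now[0] = 16.0
print(len(batcher.poll()))  # 1
```

Each returned batch becomes the metric list of one DDATA payload.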

QoS Level Selection

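Sparkplug B does not leave QoS to taste: the specification (version 3.0) fixes the QoS level and retain flag for each message type. In practice:

  • NBIRTH/DBIRTH/DDEATH: QoS 0 with retain=false. Birth certificates are deliberately not retained; a host that missed one recovers it through the STATE handshake or a Rebirth request, because a missing birth certificate breaks state coherence.
  • NDEATH: Registered as the MQTT Will with QoS 1 and retain=false, so the broker delivers it when the session drops.
  • NDATA/DDATA: QoS 0 with retain=false. A lost message shows up as a seq gap, and the host can request a rebirth to resynchronize. Raising alarm-class data to QoS 1 departs from the specification, so check that your host and edge software tolerate it.
  • NCMD/DCMD: QoS 0 with retain=false. If you're writing a setpoint to a PLC, confirm it by watching for the DDATA that reports the new value rather than relying on QoS.
  • STATE: QoS 1 with retain=true, so a newly connected edge node immediately learns whether the primary host is online.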

Common Pitfalls

1. Not Handling the REBIRTH Command

Host applications can send an NCMD containing the boolean metric "Node Control/Rebirth" set to true, requesting an edge node to re-publish its birth certificates. If your edge gateway doesn't handle this, your host application loses the ability to recover state after a restart without bouncing every gateway.
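Here is a sketch of both sides in Python, assuming payload metrics are handled as plain dicts before protobuf encoding. The metric name "Node Control/Rebirth" comes from the Sparkplug specification. `rebirth_request`, `handle_ncmd`, and the `republish_births` callback are hypothetical. The comment about bdSeq reflects our reading of the spec: a rebirth happens on the same MQTT session, so the registered Will, and therefore bdSeq, does not change.

```python
from typing import Callable, List

REBIRTH_METRIC = "Node Control/Rebirth"  # metric name defined by the Sparkplug spec
BOOLEAN = 11                             # Sparkplug DataType id for Boolean


def rebirth_request() -> List[dict]:
    """Metrics a host would place in the NCMD payload (before protobuf encoding)."""
    return [{"name": REBIRTH_METRIC, "datatype": BOOLEAN, "value": True}]


def handle_ncmd(metrics: List[dict], republish_births: Callable[[], None]) -> bool:
    """Edge side: trigger a rebirth if the NCMD asks for one. Returns True if it did."""
    for m in metrics:
        if m.get("name") == REBIRTH_METRIC and m.get("value") is True:
            # Hypothetical callback: re-publish NBIRTH (seq back to 0, same bdSeq
            # because the MQTT session and its Will are unchanged), then every DBIRTH.
            republish_births()
            return True
    return False


calls: List[str] = []
print(handle_ncmd(rebirth_request(), lambda: calls.append("births")))  # True
```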

2. Ignoring the STATE Topic

The primary host application should publish a retained STATE message indicating whether it's ONLINE or OFFLINE (Sparkplug 3.0 encodes this as a small JSON object with an online flag and a timestamp). Edge nodes should subscribe to this and can choose to buffer data locally when the host is offline. Many implementations skip this, leading to edge gateways pumping data into a broker that nobody is consuming, which wastes bandwidth and broker resources.
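Here is one way an edge node might act on STATE, sketched in Python. `BufferingPublisher` and its parameters are hypothetical. The parser accepts both the plain ONLINE/OFFLINE string used before Sparkplug 3.0 and the JSON object ({"online": true, "timestamp": ...}) used by 3.0. The buffer is a bounded deque, so the oldest readings are dropped first if the host stays offline too long.

```python
import json
from collections import deque
from typing import Callable, Deque


def host_is_online(payload: bytes) -> bool:
    """Interpret a STATE payload in either the pre-3.0 or the 3.0 format."""
    text = payload.decode("utf-8").strip()
    if text in ("ONLINE", "OFFLINE"):                   # pre-3.0: plain string
        return text == "ONLINE"
    return bool(json.loads(text).get("online", False))  # 3.0: JSON object


class BufferingPublisher:
    """Hypothetical edge-side wrapper: publish while the host is up, buffer otherwise."""

    def __init__(self, publish: Callable[[bytes], None], max_buffered: int = 10_000) -> None:
        self.publish = publish
        self.host_online = False
        self.buffer: Deque[bytes] = deque(maxlen=max_buffered)  # drops oldest when full

    def on_state(self, payload: bytes) -> None:
        self.host_online = host_is_online(payload)
        while self.host_online and self.buffer:
            self.publish(self.buffer.popleft())  # replay in original order

    def send(self, message: bytes) -> None:
        if self.host_online:
            self.publish(message)
        else:
            self.buffer.append(message)
```

When replaying, a production edge node would normally set the Metric `is_historical` flag (field 5 in sparkplug_b.proto) on buffered readings so the host does not mistake them for live values.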

3. Alias Conflicts After Reconnection

If your edge node reconnects but re-assigns different aliases to metrics (because it re-discovers devices in a different order), consumers holding cached aliases from the previous birth certificate will misinterpret data. The solution: always assign aliases deterministically (e.g., by tag ID from device configuration, not discovery order).
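Here is a minimal sketch of deterministic alias assignment in Python. It assumes each device has a stable identifier (such as its serial number) and a configured list of tag IDs. `assign_aliases` is a hypothetical helper.

```python
from typing import Dict, List, Tuple


def assign_aliases(devices: Dict[str, List[int]]) -> Dict[Tuple[str, int], int]:
    """Map (device_id, tag_id) -> alias, independent of discovery order.

    Sorting by device_id and then tag_id makes the result depend only on the
    configuration, never on which PLC answered the probe first.
    """
    aliases: Dict[Tuple[str, int], int] = {}
    next_alias = 1
    for device_id in sorted(devices):
        for tag_id in sorted(devices[device_id]):
            aliases[(device_id, tag_id)] = next_alias
            next_alias += 1
    return aliases


first_boot = assign_aliases({"chiller-1017-A3F2": [3, 1, 2], "blender-bd-3": [1]})
reboot = assign_aliases({"blender-bd-3": [1], "chiller-1017-A3F2": [2, 3, 1]})
print(first_boot == reboot)  # True
```

Adding or removing a device still shifts the aliases that sort after it. That is acceptable because such a change always comes with a new birth certificate. If you want aliases that also survive fleet changes, reserve a fixed alias range per configured device slot instead.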

4. Oversized Birth Certificates

Including every possible metric in the birth certificate, even ones that rarely change (firmware version, serial number), inflates a message that is re-sent on every reconnect and every rebirth. These "metadata" metrics still belong in the NBIRTH, since hosts need them, but the change-detection pattern keeps them out of regular DDATA publishes.
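Change detection (Sparkplug's report by exception) can be as simple as remembering the last reported value for each metric. The Python sketch below adds an optional per-metric deadband for noisy analog values. `ChangeDetector` and its deadband parameter are hypothetical. The first call reports everything, which mirrors what goes into the birth certificate.

```python
from typing import Any, Dict, Optional

_MISSING = object()


class ChangeDetector:
    """Report-by-exception filter: pass a metric only when it changed since its last report.

    deadbands optionally maps numeric metric names to an absolute threshold. The
    comparison is against the last *reported* value, so slow drift still gets through.
    """

    def __init__(self, deadbands: Optional[Dict[str, float]] = None) -> None:
        self.deadbands = deadbands or {}
        self.last_reported: Dict[str, Any] = {}

    def report_changes(self, sample: Dict[str, Any]) -> Dict[str, Any]:
        changed = {}
        for name, value in sample.items():
            prev = self.last_reported.get(name, _MISSING)
            if prev is _MISSING:
                is_new = True                                   # first sighting (birth value)
            elif name in self.deadbands:
                is_new = abs(value - prev) > self.deadbands[name]
            else:
                is_new = value != prev                          # exact match for bools, ints, strings
            if is_new:
                changed[name] = value
                self.last_reported[name] = value
        return changed


rbe = ChangeDetector(deadbands={"delivery_temp": 0.5})
rbe.report_changes({"delivery_temp": 185.3, "firmware": "v2.1", "pump_status": True})
print(rbe.report_changes({"delivery_temp": 185.5, "firmware": "v2.1", "pump_status": False}))
# {'pump_status': False}
```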

The machineCDN Approach

Platforms like machineCDN handle Sparkplug B's complexity at the infrastructure layer. When an edge gateway auto-detects a PLC — whether it's speaking EtherNet/IP with Allen-Bradley tag paths or Modbus TCP with register addressing — the platform maps native device tags to a normalized metric schema. Birth certificates are generated from device configuration files that define tag names, data types, polling intervals, and change-detection rules. Death certificates fire when link state monitoring detects a communication failure.

The result is that plant engineers don't write protocol adapters or payload serialization code. They configure which PLCs to connect, what tags to monitor, and how often to poll. The Sparkplug B lifecycle — birth, data, death, rebirth — happens automatically.

Conclusion

Sparkplug B isn't a radical reinvention. It's a formalization of patterns that every production IIoT deployment eventually reinvents: typed metrics, device state awareness, efficient binary encoding, and state recovery after failures. The question isn't whether you need these patterns — you do, once you're past 10 devices — but whether you build them from scratch or adopt the specification.

For manufacturing engineers evaluating IIoT platforms, ask this: "What happens when my SCADA application restarts? How long until it knows which devices are online and has their current values?" If the answer involves waiting for the next polling cycle or manually querying each device, you're still in the vanilla MQTT era. Sparkplug B solves this in seconds.