Binary vs JSON Payloads for Industrial MQTT Telemetry: Bandwidth, Encoding Strategies, and When Each Wins [2026]

· 14 min read

Every IIoT platform faces the same fundamental design decision for machine telemetry: do you encode data as human-readable JSON, or pack it into a compact binary format?

The answer affects bandwidth consumption, edge buffer capacity, parsing performance, debugging experience, and how well your system degrades under constrained connectivity. Despite what vendor marketing suggests, neither format universally wins. The engineering tradeoffs are real, and the right choice depends on your deployment constraints.

This article breaks down both approaches with the depth that plant engineers and IIoT architects need to make an informed decision.

The Core Problem: Representing Machine Data for Transit

An industrial edge gateway reads values from PLCs and sensors — temperatures, pressures, motor statuses, alarm words — and needs to package them for delivery to a cloud platform via MQTT. Each data point consists of:

  • A tag identifier (which sensor/register)
  • A value (the reading itself — could be bool, int16, uint32, float, etc.)
  • A status code (was the read successful?)
  • A timestamp (when was this value captured?)

Multiple data points are typically grouped into batches for efficiency. A single MQTT publish might contain 50-200 tag values from one poll cycle.

The question is how to serialize this structure for the wire.

JSON Encoding: The Readable Path

A JSON-encoded telemetry batch looks something like this:

{
"groups": [
{
"ts": 1709510400,
"device_type": 1018,
"serial_number": 25165824,
"values": [
{"id": 1, "values": [245]},
{"id": 2, "values": [189]},
{"id": 3, "values": [false]},
{"id": 4, "values": [72.5]},
{"id": 5, "error": -32}
]
}
]
}

JSON Advantages

Debuggability is king. You can pipe MQTT traffic through mosquitto_sub, see exactly what's being transmitted, and diagnose issues without a decoder ring. When a customer reports that "tag 14 shows the wrong temperature," you can subscribe to the topic, look at the payload, and immediately see the raw value. This alone saves hours during commissioning and support.

Schema evolution is free. Need to add a new field — say, a quality indicator alongside each value? Just add the key. Existing consumers that don't know about the field ignore it. Binary formats require version negotiation or reserved fields to achieve the same flexibility.

Type information is embedded. JSON distinguishes between integers (245), floats (72.5), booleans (false), and strings. The consumer doesn't need a separate tag dictionary to know how to interpret the data — the types are self-describing.

Standard tooling everywhere. Every programming language, every cloud platform, every database can parse JSON natively. There's no custom decoder to maintain, no serialization library to version-lock.

JSON Costs

Verbosity. The JSON encoding of {"id": 1, "values": [245]} is 26 bytes. The actual information content — a 16-bit tag ID and a 16-bit integer value — is 4 bytes. That's a 6.5x overhead. For a batch of 100 values, this bloat compounds to several kilobytes of redundant structural characters (braces, colons, quotes, key names).

Parsing cost. JSON parsing requires string scanning, Unicode handling, and dynamic memory allocation. On a resource-constrained edge device, deserializing a large JSON batch can take 5-10x longer than unpacking a binary buffer. This matters when the edge gateway is an ARM-based router with 64MB of RAM.

Floating-point precision. JSON represents all numbers as text. The float value 72.53125 stored as IEEE 754 takes exactly 4 bytes in binary. In JSON, it's the string 72.53125 — 8 bytes of ASCII characters that need to be parsed back into a float on the receiving end, potentially with rounding differences between strtof() implementations.
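
The size difference is easy to demonstrate in Python (a quick illustration, not tied to any particular gateway stack):

```python
import json
import struct

value = 72.53125  # exactly representable in IEEE 754 binary32

binary = struct.pack(">f", value)  # big-endian 4-byte float
text = json.dumps(value)           # decimal text the consumer must re-parse

print(len(binary))  # 4
print(len(text))    # 8
# Round-tripping the binary form is bit-exact:
assert struct.unpack(">f", binary)[0] == value
```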

Binary Encoding: The Efficient Path

A binary-encoded telemetry batch uses a compact, predetermined structure:

[Header: 1 byte]
0xF7 = batch start marker

[Group Count: 4 bytes, big-endian uint32]
Number of time-stamped groups in this batch

For each group:
[Timestamp: 4 bytes, big-endian uint32]
Unix epoch seconds

[Device Type: 2 bytes, big-endian uint16]
Equipment identifier

[Serial Number: 4 bytes, big-endian uint32]
Unique device serial

[Value Count: 4 bytes, big-endian uint32]
Number of tag values in this group

For each value:
[Tag ID: 2 bytes, big-endian uint16]
[Status: 1 byte]
0 = success, non-zero = error code

If status == 0:
[Values Count: 1 byte]
[Value Size: 1 byte]
1 = bool/int8/uint8
2 = int16/uint16
4 = int32/uint32/float

For each value:
[Raw bytes: value_size bytes, big-endian]
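
To make the layout concrete, here is a minimal Python encoder for the structure above. It's a sketch: the caller supplies the value size (normally from the tag dictionary), and any Python float is packed as a 4-byte IEEE 754 value.

```python
import struct

BATCH_START = 0xF7  # batch start marker from the layout above

def encode_batch(groups):
    """groups: list of (timestamp, device_type, serial_number, values);
    values: list of (tag_id, status, value_size, raw_values)."""
    out = bytearray(struct.pack(">BI", BATCH_START, len(groups)))
    for ts, device_type, serial, values in groups:
        out += struct.pack(">IHII", ts, device_type, serial, len(values))
        for tag_id, status, size, raw in values:
            out += struct.pack(">HB", tag_id, status & 0xFF)
            if status == 0:  # value bytes follow only on success
                out += struct.pack(">BB", len(raw), size)
                for v in raw:
                    if isinstance(v, float):          # 4-byte IEEE 754
                        out += struct.pack(">f", v)
                    else:                             # big-endian integer
                        out += v.to_bytes(size, "big")
    return bytes(out)

# The same 5-value batch shown in the JSON section (bool false encoded as 0):
batch = encode_batch([(1709510400, 1018, 25165824, [
    (1, 0, 2, [245]), (2, 0, 2, [189]), (3, 0, 1, [0]),
    (4, 0, 4, [72.5]), (5, -32, 0, []),
])])
print(len(batch))  # 51 bytes
```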

Binary Size Analysis

Let's compare the same 5-value batch from above:

JSON version: ~180 bytes (including all structural characters)

Binary version:

  • Batch header: 1 + 4 = 5 bytes
  • Group header: 4 + 2 + 4 + 4 = 14 bytes
  • Tag 1 (int16, status OK): 2 + 1 + 1 + 1 + 2 = 7 bytes
  • Tag 2 (int16, status OK): 7 bytes
  • Tag 3 (bool, status OK): 2 + 1 + 1 + 1 + 1 = 6 bytes
  • Tag 4 (float, status OK): 2 + 1 + 1 + 1 + 4 = 9 bytes
  • Tag 5 (error, no value): 2 + 1 = 3 bytes
  • Total: 51 bytes

That's a 3.5x reduction for this small example. The ratio improves with larger batches because the group header cost is amortized across more values. For a batch of 100 integer values:

  • JSON: ~3,200 bytes
  • Binary: ~720 bytes

A 4.4x bandwidth savings — significant when you're pushing data over a cellular connection at $0.10/MB.

Binary Advantages

Bandwidth efficiency. On cellular-connected edge gateways (4G/LTE), bandwidth costs are real operating expenses. A gateway polling 200 tags every 60 seconds generates:

| Format | Per batch | Per hour | Per day | Per month |
|--------|-----------|----------|---------|-----------|
| JSON   | ~6.4 KB   | 384 KB   | 9.2 MB  | 276 MB    |
| Binary | ~1.5 KB   | 90 KB    | 2.2 MB  | 66 MB     |

At $0.10/MB on an industrial cellular plan, that's $27.60/month in JSON vs $6.60/month in binary — per gateway. Multiply by 50 gateways and the difference is $1,050/month.

Buffer capacity. Edge gateways use local buffers (often called store-and-forward buffers) to hold data during network outages. If your gateway has a 512KB ring buffer and the cloud connection drops:

  • JSON: Buffer holds ~80 batches = ~80 minutes of data at 1-minute poll intervals
  • Binary: Buffer holds ~340 batches = ~5.6 hours of data

Binary encoding gives you 4.25x more ride-through time before the buffer wraps and starts overwriting the oldest data. In environments where connectivity interruptions lasting 2-4 hours are common (remote sites, cellular dead zones), this can be the difference between complete data capture and significant data loss.

Deterministic parsing. Binary unpacking is a simple pointer walk — read 2 bytes as uint16, advance pointer, read 1 byte as status, advance pointer. No string scanning, no hash table lookups, no memory allocation. On embedded devices, binary deserialization is effectively free from a CPU perspective.
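
The "pointer walk" looks like this in Python (illustrative; gateway firmware in C would do the same with raw pointer arithmetic). One caveat baked into the sketch: 4-byte values are returned as raw unsigned integers, because distinguishing float from int32 requires the tag dictionary.

```python
import struct

def decode_batch(buf):
    """Decode the binary batch layout with a single moving offset."""
    pos = 0
    marker, group_count = struct.unpack_from(">BI", buf, pos); pos += 5
    if marker != 0xF7:
        raise ValueError("not a telemetry batch")
    groups = []
    for _ in range(group_count):
        ts, dtype, serial, n = struct.unpack_from(">IHII", buf, pos); pos += 14
        values = []
        for _ in range(n):
            tag_id, status = struct.unpack_from(">HB", buf, pos); pos += 3
            raw = []
            if status == 0:  # value bytes present only on success
                count, size = struct.unpack_from(">BB", buf, pos); pos += 2
                for _ in range(count):
                    raw.append(int.from_bytes(buf[pos:pos + size], "big"))
                    pos += size
            values.append((tag_id, status, raw))
        groups.append((ts, dtype, serial, values))
    return groups
```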

IEEE 754 fidelity. Float values are transmitted as their exact 4-byte IEEE 754 representation. No text-to-float conversion, no precision loss, no representation differences between platforms. The float value you read from the PLC register is bit-for-bit identical to what the cloud platform receives.

Binary Costs

Debugging requires tooling. You can't read F7 00000001 67A8E600 03FA 01800000 0001 ... in a mosquitto log and immediately understand what it means. You need a decoder — a script or tool that knows the packet structure and renders it as human-readable output. During commissioning, this friction is real.

Byte order matters. Big-endian? Little-endian? Mixed-endian (yes, some PLCs do this)? The encoder and decoder must agree exactly, and getting it wrong produces values that look almost right — a temperature reading of 25,600 instead of 100, for instance, when the byte order is swapped.

A common convention is to use big-endian (network byte order) for the telemetry payload, matching the Modbus convention. But this means the edge gateway (often running on a little-endian ARM or x86 processor) must explicitly convert values during encoding.
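
The failure mode is easy to reproduce:

```python
import struct

raw = struct.pack(">H", 100)           # encoder writes big-endian: b'\x00\x64'
mistake = struct.unpack("<H", raw)[0]  # decoder wrongly assumes little-endian
print(mistake)  # 25600 — plausible-looking, completely wrong

# Explicit format strings (">H" vs "<H") make the convention visible at
# every call site, unlike casting a struct pointer and hoping the CPU agrees.
```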

Schema changes require coordination. Adding a new field to a binary format means updating both the encoder (edge gateway firmware) and the decoder (cloud ingest pipeline) simultaneously. If the versions are out of sync, the entire batch becomes unparseable. Version bytes in the header help, but they add complexity that JSON handles implicitly.

Array and variable-length data. Some tags return arrays of values (e.g., a multi-zone temperature controller reporting 8 zone temperatures as a single tag). Binary encoding handles this with a count field, but it makes the structure more complex. JSON arrays are trivially flexible.

Hybrid Approaches: Getting the Best of Both

In practice, the most sophisticated IIoT platforms support both formats and select based on context:

Binary for Telemetry, JSON for Commands

Use binary encoding for the high-volume, upstream telemetry path (device → cloud). Use JSON for the low-volume, downstream command path (cloud → device). Commands — read-now requests, configuration updates, status queries — are infrequent, human-authored, and benefit from readability. Telemetry is machine-generated, high-volume, and benefits from compactness.

Format Negotiation

Let the cloud platform specify the desired format. Gateways deployed on high-bandwidth Ethernet connections can use JSON for easier debugging. Gateways on cellular connections automatically switch to binary. The same gateway firmware supports both; the cloud-side configuration determines which is active.

JSON for Development, Binary for Production

During the commissioning phase, run in JSON mode. Engineers can subscribe to MQTT topics and visually verify that tag mappings are correct, values are in expected ranges, and timestamps are accurate. Once the deployment is validated, switch to binary for production.

MQTT QoS Interactions

The choice of payload format interacts with MQTT Quality of Service levels in subtle ways.

QoS 0 (fire and forget): If a message is lost — typically because the TCP connection dropped with data still in flight — it's gone. Binary's smaller payloads mean less unacknowledged data sitting in socket buffers at the moment of a disconnect. Marginal advantage to binary.

QoS 1 (at least once): The broker must acknowledge every published message. Smaller binary packets mean faster publish-ack cycles, which matters when draining a full store-and-forward buffer after a reconnection. A gateway with 300 buffered batches will reconnect and drain approximately 4x faster with binary payloads.

QoS 2 (exactly once): Rarely used in industrial telemetry due to the 4-step handshake overhead. But if required, the smaller binary payload reduces the time each message occupies the QoS 2 pipeline.

Most industrial deployments use QoS 1 — delivery is guaranteed but duplicates are possible (and handled by idempotent processing on the cloud side). The publish callback from the MQTT library confirms delivery, allowing the store-and-forward buffer to advance its read pointer and free space for new data.
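
Cloud-side idempotency can be as simple as keying on group identity. A sketch — the (serial_number, timestamp) key is an assumption here; production pipelines often carry a dedicated message ID instead:

```python
def make_idempotent_sink(store):
    """Return an ingest function that drops QoS 1 re-deliveries."""
    seen = set()
    def ingest(serial_number, timestamp, values):
        key = (serial_number, timestamp)
        if key in seen:
            return False            # duplicate delivery: already processed
        seen.add(key)
        store.append((serial_number, timestamp, values))
        return True
    return ingest
```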

Store-and-Forward Buffer Architecture

The interaction between payload format and buffer design deserves special attention, because this is where bandwidth savings translate directly into data integrity.

A typical edge buffer operates as a paged ring buffer: a fixed block of memory divided into pages of equal size. Each page holds one or more complete MQTT messages. When the network connection is active, pages are transmitted in FIFO order. When disconnected, new data is written to available pages.

The critical design parameters:

  • Page size: Determines the maximum single-message size. Must accommodate the largest possible batch.
  • Number of pages: Determined by total buffer memory ÷ page size.
  • Overflow behavior: When all pages are consumed, the oldest used page is reclaimed (overwriting the oldest data).

With binary encoding, each batch is smaller, so either:

  1. More batches fit per page (if pages are large), or
  2. More pages are available (if page size is tuned to batch size)

Either way, the buffer holds more data before overflow. For sites with intermittent connectivity, this directly translates to less data loss.

Buffer drain rate is equally important. When connectivity is restored after an outage, the gateway needs to drain buffered data while simultaneously handling new live data. Binary's smaller payloads mean each publish-ack cycle is faster, and the buffer drains sooner. With QoS 1, the gateway waits for the broker's PUBACK before sending the next buffered message, so drain time is governed by the number of round trips as much as by bandwidth — and because binary packs more batches into each message, it needs fewer round trips even on fast connections.
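
A minimal sketch of these semantics, using Python's deque to model overwrite-oldest (a hypothetical structure; firmware would manage fixed memory pages directly):

```python
from collections import deque

class PagedRingBuffer:
    def __init__(self, total_bytes, page_size):
        self.page_size = page_size
        # deque(maxlen=...) silently discards the oldest entry on overflow,
        # modeling the "reclaim the oldest used page" behavior.
        self.pages = deque(maxlen=total_bytes // page_size)

    def write(self, message: bytes):
        if len(message) > self.page_size:
            raise ValueError("message exceeds page size")
        self.pages.append(message)

    def drain(self, publish):
        """FIFO drain on reconnect: advance only after publish() confirms
        delivery (QoS 1 PUBACK), so losing the link mid-drain loses no data."""
        while self.pages:
            if not publish(self.pages[0]):
                break               # connection lost again; data stays put
            self.pages.popleft()
```

Tuning page_size to the typical batch size maximizes the page count, which is why binary's smaller batches translate directly into longer ride-through.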

How machineCDN Handles Payload Encoding

machineCDN's edge platform supports both binary and JSON telemetry formats, with binary as the default for production deployments. The platform automatically handles byte-order normalization, IEEE 754 float encoding, and type-aware serialization across all supported data types — from single-bit booleans to 32-bit floating-point values.

The store-and-forward buffer is engineered as a paged ring buffer with automatic drain-on-reconnect, ensuring that connectivity interruptions don't result in data loss. During normal operation, data is batched by configurable time windows and size thresholds, then published via MQTT with QoS 1 delivery guarantees.

For commissioning and debugging, the platform can be switched to JSON mode, giving field engineers full visibility into the data pipeline without requiring custom decoding tools.

Decision Framework: When to Choose Each Format

Choose Binary When:

  • Cellular connectivity: Bandwidth costs make every byte matter
  • Remote sites: Long outages require maximum buffer capacity
  • High tag counts: 200+ tags per gateway amplify the savings
  • Constrained hardware: Edge devices with limited CPU and RAM
  • Stable schema: Tag configurations don't change frequently

Choose JSON When:

  • Commissioning phase: Debugging and validation need readability
  • Ethernet connectivity: Bandwidth is effectively free
  • Small deployments: fewer than 50 tags where the absolute savings are minimal
  • Rapid prototyping: Schema is still evolving
  • Third-party integration: Consumers expect standard JSON payloads

Choose Hybrid When:

  • Multi-site deployments: Different sites have different connectivity profiles
  • Phased rollouts: Start with JSON, switch to binary after validation
  • Mixed traffic: Telemetry in binary, commands and status in JSON

Encoding Implementation Considerations

If you're building or evaluating an edge platform, here are the engineering details that separate robust implementations from fragile ones:

Byte Order Conventions

Standardize on big-endian (network byte order) for the binary payload. This matches the Modbus register convention and avoids confusion when values transit between different processor architectures. Use explicit byte-packing functions rather than casting struct pointers — the latter breaks on architectures with alignment requirements.

Batch Finalization Triggers

A batch should be finalized (sealed and queued for delivery) when either:

  1. The batch size exceeds a threshold (e.g., 4KB for binary, 8KB for JSON), or
  2. The batch age exceeds a time window (e.g., 60 seconds since first value was added)

The time trigger ensures that even slow-changing data gets delivered promptly. The size trigger prevents individual batches from growing too large for the buffer page size or MQTT maximum packet size.
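
Both triggers fit in a small piece of state (a sketch; the thresholds are the illustrative values above, and the injectable clock exists only to make the logic testable):

```python
import time

class BatchFinalizer:
    def __init__(self, max_bytes=4096, max_age_s=60.0, clock=time.monotonic):
        self.max_bytes = max_bytes
        self.max_age_s = max_age_s
        self.clock = clock
        self.size = 0
        self.first_ts = None        # set when the first value is added

    def add(self, encoded_len):
        if self.first_ts is None:
            self.first_ts = self.clock()
        self.size += encoded_len

    def should_finalize(self):
        if self.first_ts is None:
            return False            # an empty batch never seals
        return (self.size >= self.max_bytes or
                self.clock() - self.first_ts >= self.max_age_s)

    def reset(self):
        self.size, self.first_ts = 0, None
```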

CRC and Integrity

MQTT provides transport-level integrity (TCP checksums + TLS if configured). Adding an application-level CRC to the binary payload is unnecessary overhead in most deployments. However, if you're using QoS 0 over an unreliable transport, a CRC-16 or CRC-32 appended to the binary payload provides an integrity check.
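
If you do need the application-level check, CRC-32 framing is only a few lines. A sketch using zlib's CRC-32 from the standard library (a CRC-16 variant would need its own table or a third-party module):

```python
import struct
import zlib

def append_crc32(payload: bytes) -> bytes:
    """Frame a binary payload with a trailing big-endian CRC-32."""
    return payload + struct.pack(">I", zlib.crc32(payload))

def verify_crc32(framed: bytes) -> bytes:
    """Return the payload, or raise if the CRC does not match."""
    payload, (crc,) = framed[:-4], struct.unpack(">I", framed[-4:])
    if zlib.crc32(payload) != crc:
        raise ValueError("CRC mismatch: discard the batch")
    return payload
```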

Null/Error Handling

Tags that fail to read (communication error, device offline) should still appear in the batch with an error status code and no value field. This is straightforward in both formats:

  • JSON: {"id": 5, "error": -32} — absence of values key signals error
  • Binary: Status byte is non-zero, no value bytes follow

The consumer must handle missing values gracefully — typically by preserving the last-known-good value and flagging the tag as stale.
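
On the consumer side, last-known-good handling is a small state update (an illustrative sketch, not a specific platform's API):

```python
def apply_reading(state, tag_id, status, values):
    """Update per-tag state, preserving the last good value on error."""
    if status == 0:
        state[tag_id] = {"values": values, "stale": False}
    elif tag_id in state:
        state[tag_id]["stale"] = True     # keep the old value, flag it
    else:
        state[tag_id] = {"values": None, "stale": True}
    return state
```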

Conclusion

The binary vs JSON debate in industrial telemetry isn't about which format is "better" — it's about which constraints dominate your deployment. Binary wins on bandwidth, buffer capacity, and parsing efficiency. JSON wins on debuggability, schema flexibility, and tooling ecosystem.

The best IIoT platforms don't force a choice. They support both formats, default to the one that best serves the deployment's primary constraint, and make switching between them a configuration change rather than a firmware update.

For most production industrial deployments — especially those involving cellular connectivity or remote sites — binary encoding pays for itself quickly. For development, commissioning, and small-scale deployments, JSON's transparency is worth the bandwidth cost.

Understand the tradeoffs, choose deliberately, and don't let payload encoding become an afterthought. It's one of the most impactful architectural decisions in your telemetry pipeline.