MQTT Topic Architecture for Multi-Site Manufacturing: Designing Scalable Namespaces That Don't Collapse at 10,000 Devices [2026]
Every MQTT tutorial starts the same way: sensor/temperature. Clean, simple, obvious. Then you ship to production and discover that topic architecture is to MQTT what database schema is to SQL — get it wrong early and you'll spend the next two years paying for it.
Manufacturing environments are particularly brutal to bad topic design. A single plant might have 200 machines, each with 30–100 tags, across 8 production lines, reporting to 4 different consuming systems (historian, SCADA, analytics, alerting). Multiply by 5 plants across 3 countries, and your MQTT broker is routing messages across a topic tree with 50,000+ leaf nodes. The topic hierarchy you chose in month one determines whether this scales gracefully or becomes an operational nightmare.
Why Topic Architecture Matters More Than You Think
MQTT brokers maintain a topic tree — an in-memory trie structure where each topic level is a node. When a subscriber uses wildcards (+ for single-level, # for multi-level), the broker traverses this tree to match subscriptions to incoming publishes. The shape of your topic tree directly impacts:
- Broker memory usage: More unique topics = more nodes in the trie = more RAM
- Subscription matching speed: Deep trees with many branches slow down wildcard matching
- ACL complexity: Your security rules reference topic patterns — messy hierarchies create ungovernable ACLs
- Consumer flexibility: Can a historian subscribe to "all temperature readings across all plants" with one subscription? Or does it need 200 separate subscriptions?
The $0 Decision That Costs $100K
I've seen manufacturing companies rebuild their entire MQTT infrastructure after 18 months because the original topic design couldn't support:
- Adding a new plant without reconfiguring every subscriber
- Isolating one customer's data from another in a multi-tenant deployment
- Routing alarm data to a different pipeline than telemetry data
- Adding device metadata (firmware versions, serial numbers) without polluting the telemetry stream
These aren't exotic requirements. They're inevitable realities of any manufacturing IIoT deployment that survives past pilot stage.
The Anatomy of a Production Topic Hierarchy
After building data acquisition systems that serve dozens of manufacturing sites, here's the topic hierarchy pattern that scales:
{namespace}/{site}/{area}/{line}/{device_type}/{device_id}/{data_class}/{metric}
Let's populate this with a real example:
mcdn/plant-chicago/molding/line-3/chiller/GP-1017-A3F2/telemetry/discharge_pressure
mcdn/plant-chicago/molding/line-3/chiller/GP-1017-A3F2/telemetry/suction_pressure
mcdn/plant-chicago/molding/line-3/chiller/GP-1017-A3F2/alarms/high_temp_fault
mcdn/plant-chicago/molding/line-3/chiller/GP-1017-A3F2/status/link_state
mcdn/plant-chicago/molding/line-3/blender/BD-2019-B7C1/telemetry/motor_speed
mcdn/plant-chicago/molding/line-3/tcu/TT-5000-D4E8/telemetry/delivery_temp
Level-by-Level Design Rationale
Level 1: Namespace (mcdn)
A static prefix that identifies the system. Prevents collisions with other MQTT applications sharing the same broker. In multi-tenant deployments, this might be the customer ID: customer-42/plant-chicago/...
Level 2: Site (plant-chicago)
Physical location. Using human-readable names over coded identifiers (e.g., P01) pays off when you're debugging at 2 AM and need to know which plant is generating anomalous data. Recommendation: use kebab-case and include a geographic identifier.
Level 3: Area (molding)
The functional area within a plant. Manufacturing facilities are typically organized into distinct areas — molding, extrusion, packaging, warehouse. This level enables area-level subscriptions: "give me everything from the molding area."
Level 4: Line (line-3)
Production line within the area. This is the smallest organizational unit that a plant manager typically cares about. Line-level dashboards are the bread and butter of manufacturing analytics.
Level 5: Device Type (chiller)
The class of equipment. This is critical for analytics: when you want to compare all chillers across all plants, you subscribe to mcdn/+/+/+/chiller/#. Without this level, you'd need to maintain a registry mapping device IDs to types — fragile and slow.
Level 6: Device ID (GP-1017-A3F2)
Unique device identifier. Best practice: include the device type code and serial number in the ID. GP-1017 tells you it's a GP Portable Chiller, type code 1017. A3F2 is derived from the serial number. This makes the topic self-documenting — you can identify the exact machine from the topic alone, without any lookup.
Level 7: Data Class (telemetry, alarms, status, config)
This level separates data by stream type, the single most impactful design decision in the hierarchy. More on this below.
Level 8: Metric (discharge_pressure)
The individual data point. Optional — many implementations batch multiple metrics into the payload of the data class topic instead of creating one topic per metric. Trade-off analysis follows.
The Data Class Level: Why It Changes Everything
Separating telemetry, alarms, status, and config at the topic level enables fundamentally different handling for each stream:
Telemetry
High-frequency, high-volume, tolerates loss. Published at QoS 0 or 1. Consumed by historians and analytics engines. Typical volume: 80% of all messages.
.../chiller/GP-1017-A3F2/telemetry → QoS 0, no retain, 1-60s interval
Alarms
Low-frequency, high-criticality, zero-loss tolerance. Published at QoS 1 with retain. Consumed by alerting systems, SMS/email gateways, and SCADA alarm panels.
.../chiller/GP-1017-A3F2/alarms → QoS 1, retained, event-driven
In production PLC data acquisition, the distinction between telemetry and alarms maps directly to how values are polled. Status registers representing alarm conditions (compressor fault, high discharge pressure, pump overload) are polled at 1-second intervals with change-of-value detection — if the value hasn't changed, nothing is published. Temperature readings and analog values are polled at 60-second intervals and always published. This polling strategy should be reflected in the topic structure: alarm-class data gets its own topic because it requires different QoS, different retention, and different downstream handling.
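The polling strategy above can be sketched in a few lines. The tag table, intervals, and callback names here are hypothetical, a minimal illustration of change-of-value filtering for alarm-class tags versus always-publish telemetry, not a production scheduler:

```python
# Hypothetical tag table: name -> (data_class, poll_interval_s)
TAGS = {
    "compressor_fault":   ("alarms", 1),      # COV: publish only on change
    "discharge_pressure": ("telemetry", 60),  # always publish each cycle
}

last_values = {}  # tag name -> last published value

def poll_cycle(read_tag, publish, now):
    """One scheduling pass: read due tags, apply change-of-value
    filtering to alarm-class tags, always publish telemetry."""
    for name, (data_class, interval) in TAGS.items():
        if now % interval != 0:  # crude tick-based scheduler for the sketch
            continue
        value = read_tag(name)
        if data_class == "alarms" and last_values.get(name) == value:
            continue             # unchanged alarm bit: publish nothing
        last_values[name] = value
        publish(name, data_class, value)
```

On the first pass both tags publish; on subsequent one-second ticks, an unchanged alarm register produces no traffic at all.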
Status
Device state information: link state (is the PLC responding?), connection uptime, firmware version. Published at QoS 1 with retain. Low volume.
.../chiller/GP-1017-A3F2/status → QoS 1, retained, on change
When an edge gateway detects that a PLC has stopped responding — perhaps the Modbus TCP connection timed out after 3 retry attempts, or the EtherNet/IP tag read returned an error code — it publishes a link state change to the status topic. Any dashboard or monitoring system subscribed to mcdn/+/+/+/+/+/status/# immediately knows which devices are offline.
Config
Device configuration: polling intervals, tag definitions, setpoints that can be written. Published at QoS 1 with retain. Bidirectional (commands down, current config up).
.../chiller/GP-1017-A3F2/config → QoS 1, retained, bidirectional
Configuration updates are how you change a device's behavior without physically touching the gateway. Change the polling interval for a temperature tag from 60 seconds to 5 seconds? Publish to the config topic. The edge gateway subscribes to its own config topic and applies changes dynamically — no restart required.
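The per-class publish parameters and the dynamic config update can both be expressed as a small sketch. The policy table mirrors the QoS/retain choices described above; the payload shape and function names are assumptions for illustration:

```python
import json

# Publish parameters per data class, as described above
DATA_CLASS_POLICY = {
    "telemetry": {"qos": 0, "retain": False},
    "alarms":    {"qos": 1, "retain": True},
    "status":    {"qos": 1, "retain": True},
    "config":    {"qos": 1, "retain": True},
}

def publish_params(topic):
    """Derive QoS/retain from the data-class level of a topic
    (level 7 of the 8-level hierarchy, index 6 when split)."""
    data_class = topic.split("/")[6]
    return DATA_CLASS_POLICY[data_class]

def apply_config_update(running_config, payload):
    """Merge a config-topic payload (e.g. a new polling interval for
    a tag) into the gateway's running configuration. No restart."""
    update = json.loads(payload)
    for tag, settings in update.get("tags", {}).items():
        running_config.setdefault(tag, {}).update(settings)
    return running_config
```

A gateway that subscribes to its own config topic calls something like apply_config_update in its message handler, which is how a 60-second tag becomes a 5-second tag without anyone touching the hardware.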
Per-Metric Topics vs. Batched Payloads
This is the most debated design decision in MQTT topic architecture. Let's cut through the theory with production numbers.
Option A: One Topic Per Metric
.../chiller/GP-1017-A3F2/telemetry/discharge_pressure → 145.2
.../chiller/GP-1017-A3F2/telemetry/suction_pressure → 62.8
.../chiller/GP-1017-A3F2/telemetry/chiller_out_temp → 45.3
.../chiller/GP-1017-A3F2/telemetry/chiller_in_temp → 52.1
Pros:
- Consumers subscribe only to metrics they care about
- Broker can route individual metrics to specific consumers
- Natural for event-driven data (only publish when value changes)
Cons:
- 100 tags × 10 devices × 5 plants = 5,000 unique telemetry topics
- Each publish has MQTT overhead (fixed header, topic string, packet ID for QoS 1): ~30–60 bytes per message
- 100 individual publishes vs. 1 batch = 100× the MQTT framing overhead
Option B: Batched Payload Per Device
.../chiller/GP-1017-A3F2/telemetry → {ts: 1709337600, values: [{id: 82, v: 145.2}, {id: 83, v: 62.8}, ...]}
Pros:
- Dramatically fewer MQTT messages (1 per device per cycle instead of 100)
- Lower broker CPU (fewer publishes to match against subscriptions)
- More efficient for store-and-forward: one buffered message contains a complete device snapshot
Cons:
- Consumers must parse the payload to extract individual metrics
- All-or-nothing: you can't subscribe to just "discharge_pressure" at the topic level
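The Option B payload shape can be encoded and consumed with a minimal sketch. The JSON structure mirrors the example above; the function names and the numeric tag ids are illustrative assumptions:

```python
import json

def encode_batch(ts, readings):
    """Pack one poll cycle's readings for a device into a single
    payload in the Option B shape. `readings` maps tag id -> value."""
    return json.dumps({
        "ts": ts,
        "values": [{"id": tag_id, "v": v} for tag_id, v in readings.items()],
    })

def decode_metric(payload, tag_id):
    """Consumer side: extract one metric from a batched payload.
    This parsing step is the cost Option B pushes onto consumers."""
    doc = json.loads(payload)
    for entry in doc["values"]:
        if entry["id"] == tag_id:
            return entry["v"]
    return None
```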
The Production Answer
Use batched payloads for telemetry, per-metric topics for alarms.
Telemetry data is consumed in aggregate — dashboards show all metrics for a device, historians store complete snapshots, analytics engines process device-level time series. Batching is natural and efficient.
Alarm data is consumed selectively — the alerting system cares about "any alarm, any device, any plant," not "the complete alarm register for one device." Per-metric (or per-alarm) topics let you build wildcard subscriptions like mcdn/+/+/+/+/+/alarms/+ that fire on any alarm state change across the entire deployment.
In systems where edge gateways poll PLC alarm registers and extract individual alarm bits using shift-and-mask operations, each extracted alarm bit naturally becomes its own topic publication — it's already a distinct data point by the time it leaves the edge gateway.
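The shift-and-mask extraction can be sketched as follows. The bit layout is a hypothetical example for one 16-bit status register, not any particular PLC's register map:

```python
# Hypothetical alarm bit layout for one 16-bit PLC status register
ALARM_BITS = {
    0: "compressor_fault",
    1: "high_discharge_pressure",
    2: "pump_overload",
}

def extract_alarms(register_value):
    """Shift-and-mask each configured bit out of the status register.
    Each (name, active) pair becomes its own alarms/<name> publication."""
    return {name: bool((register_value >> bit) & 1)
            for bit, name in ALARM_BITS.items()}
```

Because each alarm leaves the gateway as a named, boolean data point, mapping it to its own topic is free.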
Wildcard Subscription Patterns
Here's where good topic architecture pays dividends. These are the subscription patterns that different consumers need:
| Consumer | Subscription | What It Gets |
|---|---|---|
| Plant dashboard | mcdn/plant-chicago/# | Everything from one plant |
| Line display | mcdn/plant-chicago/molding/line-3/# | One production line |
| Chiller fleet analytics | mcdn/+/+/+/chiller/+/telemetry | All chiller telemetry, all plants |
| Global alarm system | mcdn/+/+/+/+/+/alarms/# | Every alarm, everywhere |
| Device detail page | mcdn/plant-chicago/molding/line-3/chiller/GP-1017-A3F2/# | One device, all streams |
| OT security monitor | mcdn/+/+/+/+/+/status/link_state | All connectivity changes |
| Config management | mcdn/+/+/+/+/+/config/# | All configuration data |
Every one of these works with a single subscription. No application code to filter. No client-side routing. The broker does the work, and it does it efficiently because the topic tree is organized around how data is actually consumed.
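The matching semantics the broker applies can be sketched in a few lines (real brokers use a trie rather than per-subscription level walks, but the rules are the same: + matches exactly one level, # matches the rest of the topic):

```python
def topic_matches(sub, topic):
    """MQTT topic filter matching: '+' matches exactly one level,
    '#' (always the last filter level) matches the topic remainder."""
    sub_levels = sub.split("/")
    topic_levels = topic.split("/")
    for i, s in enumerate(sub_levels):
        if s == "#":
            return True                 # matches everything at and below
        if i >= len(topic_levels):
            return False                # filter is deeper than the topic
        if s != "+" and s != topic_levels[i]:
            return False                # literal level mismatch
    return len(sub_levels) == len(topic_levels)
```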
Multi-Tenant Isolation
For IIoT platform providers serving multiple manufacturing customers, the topic hierarchy must support isolation:
{tenant_id}/{site}/{area}/{line}/{device_type}/{device_id}/{data_class}/{metric}
MQTT broker ACLs then become straightforward:
# Tenant A can only access its own topics
user tenant-a: allow subscribe tenant-a/#
user tenant-a: deny subscribe tenant-b/#
# Platform analytics can access all tenants (read-only)
user analytics-engine: allow subscribe +/+/+/+/+/+/telemetry
The tenant ID at level 1 is essential. Putting it deeper in the hierarchy (e.g., as a payload field) means every consumer must parse payloads to enforce isolation — a security anti-pattern.
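The broker-side check this enables is trivial, which is exactly the point. A sketch of tenant enforcement on level 1 (not any specific broker's ACL engine):

```python
def tenant_allowed(client_tenant, topic):
    """Enforce tenant isolation at topic level 1: a client may only
    touch topics whose first level is its own tenant id. Payloads are
    never inspected, which is why the tenant id belongs in the topic."""
    return topic.split("/", 1)[0] == client_tenant
```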
Per-Tenant Broker vs. Shared Broker
- Fewer than 10 tenants: Shared broker with ACL isolation. Simpler operations.
- 10–100 tenants: Shared broker with MQTT v5 shared subscriptions and strict ACLs. Consider broker clustering.
- 100+ tenants: Per-tenant broker instances (or virtual brokers). The ACL table becomes too large to manage on a shared broker.
Scaling to 10,000+ Devices
At industrial scale, topic architecture interacts with broker performance in non-obvious ways:
Retained Message Storage
Every retained message is stored by the broker (typically on disk for persistence). With our hierarchy, a deployment of 1,000 devices publishing retained messages for status, alarms, and config (three retained topics per device) stores 3,000 retained messages. At ~500 bytes average, that's 1.5 MB — trivial.
But if you use per-metric topics with retain (one retained message per metric per device), 1,000 devices × 100 metrics = 100,000 retained messages. At broker restart, all 100,000 must be loaded from disk and sent to re-subscribing consumers. This can take 30+ seconds on modest hardware and creates a thundering-herd problem.
Rule: Use retained messages sparingly. Birth certificates (device metadata, current config) — yes. Individual telemetry values — no.
Subscription Matching Performance
MQTT brokers match every incoming PUBLISH against all active subscriptions. With 500 unique subscriptions containing wildcards, the broker performs 500 trie traversals per publish. The depth and branching factor of your topic tree directly impact this.
Our 8-level hierarchy is at the practical limit. Adding a 9th or 10th level rarely adds value and measurably increases matching time. If you need more granularity, encode it in the payload, not the topic.
Client Connection Limits
Edge gateways should use one MQTT connection per gateway, not one per device. A gateway serving 10 PLCs publishes to topics for all 10 devices over a single connection. This keeps the broker's connection count manageable and simplifies authentication (one credential per gateway, not per device).
Common Anti-Patterns
1. Timestamps in Topics
❌ mcdn/plant-chicago/chiller/GP-1017/2026/03/02/14/30/discharge_pressure
This creates an infinitely growing topic tree. The broker's trie never shrinks. After a month, you have millions of topic nodes that will never receive another message. Use timestamps in payloads, never in topics.
2. Flat Topic Spaces
❌ plant-chicago-molding-line3-chiller-GP1017-discharge-pressure
No hierarchy means no wildcard subscriptions. Every consumer needs exact topic names. Adding a new metric requires updating every subscriber's configuration.
3. Action Verbs in Topics
❌ mcdn/plant-chicago/chiller/GP-1017/read/temperature
❌ mcdn/plant-chicago/chiller/GP-1017/write/setpoint
MQTT is pub/sub, not request/response. Use the data class level (telemetry for readings, config/cmd for writes) instead of HTTP-style verbs.
4. Device IP Addresses in Topics
❌ mcdn/192.168.5.5/modbus/register/4000
IP addresses change. Devices get re-addressed. Use logical identifiers (serial numbers, asset IDs) that persist across network changes.
Edge Gateway Topic Management
In production, the edge gateway is responsible for mapping PLC-native addressing to the topic hierarchy. Here's how this works in practice:
- Device configuration defines PLC tags with attributes: name, register address, data type, polling interval, and whether the tag is telemetry or alarm class
- Auto-detection probes the PLC to determine its type (via device type registers) and serial number
- Topic construction combines site configuration + detected device identity + tag metadata:
- Site/area/line come from gateway configuration
- Device type and ID come from auto-detection
- Data class comes from tag configuration (alarm tags → alarms/, everything else → telemetry/)
- Dynamic updates — if the gateway receives a configuration update via MQTT (on its config topic), it adjusts polling intervals, adds/removes tags, and publishes updated birth certificates
This separation of concerns means plant engineers configure PLCs by specifying register addresses and tag names (which they already know from the PLC programming environment), and the topic hierarchy is generated automatically by the platform.
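The topic construction step above can be sketched as a pure function. The config shape and function names are hypothetical; the output matches the 8-level hierarchy defined earlier:

```python
# Hypothetical inputs: static gateway config plus auto-detected identity
GATEWAY_CONFIG = {"namespace": "mcdn", "site": "plant-chicago",
                  "area": "molding", "line": "line-3"}

def build_topic(gw, device_type, device_id, tag):
    """Assemble the 8-level topic from gateway configuration (levels
    1-4), auto-detected device identity (levels 5-6), and per-tag
    metadata (levels 7-8). Alarm-flagged tags route to alarms/,
    everything else to telemetry/."""
    data_class = "alarms" if tag.get("is_alarm") else "telemetry"
    return "/".join([gw["namespace"], gw["site"], gw["area"], gw["line"],
                     device_type, device_id, data_class, tag["name"]])
```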
Practical Recommendations
1. Design for 10× your current scale. If you have 100 devices today, design a topic hierarchy that works for 1,000. You'll get there faster than you think.
2. Document your topic hierarchy as a contract. Publish a topic specification that consuming applications depend on. Breaking the topic structure in production is as disruptive as breaking an API.
3. Use MQTT v5 if your broker supports it. Topic aliases (header-level compression), shared subscriptions (consumer groups), and user properties (metadata without polluting the payload) all reduce the pressure on topic design.
4. Monitor your topic tree. Most MQTT brokers expose metrics on topic count, subscription count, and message routing time. Set alerts on topic count growth — linear growth is expected, exponential growth indicates a timestamp-in-topic or similar anti-pattern.
5. Test with realistic load. A topic hierarchy that works with 10 devices and 3 subscribers behaves very differently at 500 devices and 50 subscribers. Load test before committing to a design.
How machineCDN Handles This
machineCDN's edge infrastructure auto-generates topic hierarchies from device configuration and auto-detection. When a gateway detects a PLC — whether via EtherNet/IP tag path probing or Modbus TCP register reads — it constructs topic paths from the device type, serial number, and configured site/area metadata. Alarm-class tags (those configured with change-of-value detection and immediate delivery) are routed to alarm topics automatically. Telemetry tags are batched and published on telemetry topics at configurable intervals.
The result: plant engineers configure what to monitor, and the topic architecture takes care of itself. No manual topic management, no wildcard subscription debugging, no retained message cleanup scripts.
Conclusion
MQTT topic architecture is infrastructure that outlasts the code that publishes to it. A well-designed hierarchy enables every future consumer — the historian you haven't deployed yet, the analytics engine you'll evaluate next quarter, the alerting system the safety team will request — to subscribe with a single wildcard pattern and get exactly the data they need.
Invest the time upfront. Design around your organizational structure (sites → areas → lines → devices), separate your data classes (telemetry ≠ alarms ≠ status ≠ config), and resist the temptation to encode dynamic data (timestamps, versions, states) in topic paths. Your future self — the one debugging a 10,000-device deployment at 3 AM — will thank you.