Before cost can be modeled, the way that the system will scale needs to be considered and the characteristics of the architecture need to be determined. Essentially, the attributes of the previously mentioned scale-unit need to be defined.
There is a throughput ceiling for each of the components in the architecture, including each of the Service Bus entities. The reason to be cautious when evaluating throughput is that when dealing with distributed devices that send messages periodically, we cannot assume a perfectly even, random distribution of event submissions across any given period. There will be bursts, and we need to allow ample capacity reserve to handle them.
Assuming a scenario of a 10-minute event interval plus one control interaction feedback message per device per hour, each device can be expected to send seven messages per hour. With an average throughput capacity of 100 messages per second per entity, roughly 50,000 devices can be associated with each entity.
Having covered the flow rate, we can conclude that storage throughput is of little concern. However, storage capacity and the manageability of the event store are concerns. The per-device event data at a resolution of one hour for 50,000 devices amounts to some 438 million event records per year. Even if these event records are limited in size to only 50 bytes, the payload data still amounts to some 22 GB per year for each scale-unit. This underlines the need to keep an eye on storage capacity and storage growth when sizing scale-units.
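To make these sizing figures concrete, the following sketch reproduces the arithmetic above; the per-entity throughput ceiling and the 50-byte record size are the assumptions stated in the text, not measured values.

```python
# Back-of-the-envelope sizing for one scale-unit, using the assumptions above.

DEVICES_PER_ENTITY = 50_000          # devices associated with one Service Bus entity
MESSAGES_PER_DEVICE_PER_HOUR = 7     # 10-minute telemetry interval + 1 feedback message/hour
ENTITY_THROUGHPUT_CEILING = 100      # assumed sustainable average, messages/second per entity

RECORDS_PER_DEVICE_PER_YEAR = 24 * 365   # one stored record per device per hour
RECORD_SIZE_BYTES = 50                    # assumed upper bound per record

# Average ingress rate for the entity; bursts will exceed this, hence the headroom.
avg_messages_per_second = DEVICES_PER_ENTITY * MESSAGES_PER_DEVICE_PER_HOUR / 3600
print(f"Average ingress: {avg_messages_per_second:.1f} msg/s "
      f"(ceiling {ENTITY_THROUGHPUT_CEILING} msg/s)")

# Event store growth for the scale-unit.
records_per_year = DEVICES_PER_ENTITY * RECORDS_PER_DEVICE_PER_YEAR
payload_gb_per_year = records_per_year * RECORD_SIZE_BYTES / 1e9
print(f"Event records per year: {records_per_year:,}")          # 438,000,000
print(f"Payload data per year: {payload_gb_per_year:.1f} GB")   # ~21.9 GB, the ~22 GB figure above
```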
These considerations manifest themselves in a capacity model within the deployment model, which informs how many entities must be created in the Service Bus namespace backing a partition for a given device population size (for example, 50,000 devices) and for a given load profile.
The load profile is currently informed by how many (telemetry) messages a device is generally expected to send per hour, how many commands or notifications the device is expected to receive per hour, and what the average size of these messages is. The inputs should be well-informed but generous estimates, because while changing the shape of a scale-unit layout at a later time is possible, doing so may require re-provisioning the devices.
Determining partitions is not only motivated by capacity concerns, however. Because a partition also forms a configuration scope, it provides a suitable mechanism to segregate device populations by region, country, owner, operator, product, or other concerns. As an example, one deployment can have up to 1,024 partitions.
Each partition corresponds to exactly one Service Bus namespace. Because there can be only 50 namespaces per Azure subscription, and other dependent services have similar quotas, a fully built-out architecture will most likely span multiple subscriptions.
In summary, the attributes that we have found to determine the capacity model are:
Number of devices. This is the number of sensors supplying telemetry information to the scale-unit.
Average message interval ingress / egress. This represents the average number of messages that a given device emits per hour (ingress) and that the system emits per hour (egress).
Average message size ingress / egress. This is the average size of the messages that a device emits (ingress) or the system sends (egress), in bytes.
Figure: Scale Units in the reference architecture
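As a rough illustration of how these attributes translate into a scale-unit layout, the sketch below derives entity counts for a partition; the 100 messages per second per-entity ceiling is the assumption used earlier in this section, and the function names are illustrative, not part of the reference implementation.

```python
import math

# Illustrative only: derive entity counts from the capacity-model attributes above.
ENTITY_THROUGHPUT_CEILING = 100  # assumed sustainable average, messages/second per entity

def devices_per_entity(messages_per_device_per_hour: float) -> int:
    """How many devices a single entity can serve at the assumed throughput ceiling."""
    return int(ENTITY_THROUGHPUT_CEILING * 3600 / messages_per_device_per_hour)

def entities_needed(device_count: int, messages_per_device_per_hour: float) -> int:
    """How many entities a partition needs for a given device population and load profile."""
    return max(1, math.ceil(device_count / devices_per_entity(messages_per_device_per_hour)))

print(devices_per_entity(7))          # ~51,428 devices, i.e. "roughly 50,000" per entity
print(entities_needed(1_000_000, 7))  # 20 entities for a 1,000,000-device population
```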
b. Cost estimation
There are many factors to consider when estimating the cost of a solution built on top of this architecture. We will work through the list from the ingress of device data to the sending of commands. Cost is estimated based on the architectural design and the scale necessary for success; the cost estimation formulas therefore take the required scale as variables.
Before we dig into the details, we need to underscore that cost modeling, like capacity modeling, is an input for architectural decision making and business case modeling, where the combination of all inputs should always be considered as a whole. As an example, you might find that using HTTP for communications is somewhat less expensive from a cost modeling perspective; however, choosing HTTP over AMQP will inherently impact performance.
For all pricing-related information in the cost estimation formulas outlined in this section, it is important to note that prices vary over time and the examples are aimed only at explaining the formulas themselves. The latest pricing information can always be found at http://azure.microsoft.com/en-us/pricing/overview/.
Figure: The ingress path of the reference architecture
Events consumed from an Event Hub, as well as management operations and “control calls” such as checkpoints, are not counted as billable ingress events. The formula for estimating the cost of the architecture when using Event Hubs is therefore a combination of the Event Hubs charges (throughput units, ingested events, and brokered connections), the compute cost of the worker roles hosting the protocol adapters and the telemetry pump, and egress bandwidth.
This expands into a more detailed formula in which we can fill in the appropriate variables:
Equation - The cost estimation formula for the ingress path
It should be noted that this formula uses the “Standard” tier offering of Event Hubs91, which offers additional brokered connections, filters, and additional storage capacity. The fixed pricing elements in the formula use pricing from a point in time and are subject to change. The formula also assumes a flat use of brokered connections, while actual billing is based on peak use prorated per hour; the dynamics of your system will likely deviate.
The variables in this equation are:
- The cost of the ingestion of events, per month.
- The total number of hours connections to the system are made, summing all simultaneous connection time.
- The number of throughput units92 needed to support the ingress of data into the system. A throughput unit is the combination of inbound bandwidth, temporary storage, and outbound bandwidth, as described in the reference.
- The number of deployed devices sending data to the system.
- The average number of messages sent into the system, per device, per month.
- The average size of each message sent into the system, in kilobytes.
- The number of worker roles necessary to support the projected scale of the system. Normally, at least two (2) are needed to fall within the Microsoft Azure SLA.
- The cost per worker role for the ingress path (when using custom protocols) and for the telemetry pump, per hour.
- The average amount of egress traffic, in gigabytes, per month.
- The cost of egress traffic93, per gigabyte.
Example calculation
An example calculation where 1,000,000 deployed devices send a message averaging 128 bytes every 60 seconds, with an average of 100,000 simultaneously connected devices throughout the month, would yield the following results:
Simultaneously connected devices: 100,000 (100,000 simultaneous connections for the full month).
Throughput units: 17 (44,640 messages per device per month yields 44,640,000,000 messages per month, equaling 16,666.6 per second; given that a single throughput unit supports up to 1,000 messages per second, rounding up 16,666.6 / 1,000 gives 17).
Deployed devices: 1,000,000.
Messages per device, per month: 44,640 (744 hours * 3,600 equals 2,678,400 seconds per month; one message every 60 seconds equals 44,640 messages per month).
Average message size: 1 KB (128 bytes rounded up to a kilobyte; 128 / 1,024 equals 0.125).
Worker roles: 50 (assuming a rough estimate of 20,000 devices supported per worker role; note again, this is not a capacity modeling exercise and these numbers should come from performance tests on your specific scenario).
Cost per worker role: $0.08 per hour (assuming A1 worker role size).
Egress traffic: 0 GB (assuming all downstream processing happens inside the same regional datacenter).
Cost of egress traffic: Not applicable.
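Because the pricing formula itself is not reproduced in this extract, the sketch below shows one plausible way of combining these variables with the example values; the Event Hubs unit prices used here (throughput unit hours, ingested events, brokered connections) are placeholder assumptions for illustration, not current Azure rates.

```python
# Hedged sketch of the ingress cost calculation using the example values above.
# The Event Hubs unit prices below are placeholders; consult the Azure pricing
# page for current rates.

HOURS_PER_MONTH = 744

# Example inputs from the table above
devices = 1_000_000
messages_per_device_per_month = 44_640
throughput_units = 17
simultaneous_connections = 100_000
worker_roles = 50
worker_role_price_per_hour = 0.08          # A1 instance, example rate
egress_gb = 0
egress_price_per_gb = 0.0                  # not applicable in this example

# Assumed Event Hubs Standard unit prices (placeholders, not authoritative)
throughput_unit_price_per_hour = 0.03
ingress_price_per_million_events = 0.028
brokered_connection_price_per_month = 0.0  # assume connections stay within the included allotment

events_per_month = devices * messages_per_device_per_month
event_hubs_cost = (throughput_units * throughput_unit_price_per_hour * HOURS_PER_MONTH
                   + events_per_month / 1_000_000 * ingress_price_per_million_events
                   + simultaneous_connections * brokered_connection_price_per_month)
compute_cost = worker_roles * worker_role_price_per_hour * HOURS_PER_MONTH
bandwidth_cost = egress_gb * egress_price_per_gb

print(f"Event Hubs:  ${event_hubs_cost:,.2f}")
print(f"Compute:     ${compute_cost:,.2f}")
print(f"Bandwidth:   ${bandwidth_cost:,.2f}")
print(f"Total/month: ${event_hubs_cost + compute_cost + bandwidth_cost:,.2f}")
```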
Egress path cost
As with ingress, the egress path has multiple components that incur cost. Because sizes often differ between ingestion data and command & control messages, the message size here is not the same value as used in the ingress path.
The components involved in egress are:
Command API Host. The process in charge of sending notifications and commands to devices and groups of devices. It encapsulates the notification/command router and routes egress messages to the appropriate topic on Microsoft Azure Service Bus, depending on the type of request. It is hosted inside a worker role.
Subscriptions. There are two different types of messages that the Command API supports: notifications and commands. A command can yield either a single response message or multiple response messages. Notifications and commands can also target groups of devices. All of these messages incur cost. Response messages have not been accounted for in the egress calculation and should be estimated here. Command replies are not routed through the telemetry adapters.
Egress traffic. Each egress message will incur cost.
Figure: The egress path of the reference architecture
Given these components, the egress path cost can be calculated using the following formula:
This also expands into a more detailed formula in which we can fill in the appropriate variables:
Equation - The cost estimation formula for the reference architecture egress path
This calculation combines single-device notifications and commands as well as group broadcast messaging. Determining the magnitude and distribution needed to figure out the averages within the formula is left to the reader as part of the capacity modeling for the system architecture.
The variables in this equation are:
- The number of roles necessary to support the projected scale of the system. Normally, at least two (2) are needed to fall within the Microsoft Azure SLA.
- The cost per worker role for the command API host, per hour.
- The average number of notifications per month.
- The average number of single-response command messages per month.
- The average number of multiple-response command messages per month.
- The average number of response messages to commands, per month.
- The average response size, in kilobytes, averaged over all outbound message types.
- The cost of egress traffic94, per gigabyte.
Example calculation
An example calculation using 100,000 notifications per month of 20 KB each, 130,000 commands of 35 KB each with single replies of 80 KB each, and 20,000 commands of 20 KB each with on average three (3) replies of 70 KB each would yield the following results:
Worker roles: 2.
Cost per worker role: $0.08 per hour (A1).
Notifications per month: 100,000.
Single-response command messages per month: 150,000.
Multiple-response command messages per month: 20,000.
Response messages per month: 190,000 (130,000 + 3 * 20,000 equals 190,000).
Cost of egress traffic: $0.138 per gigabyte.
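As with the ingress path, the formula itself is not reproduced here, so the sketch below shows one plausible composition of these variables using the example figures; whether command replies count toward billable egress depends on where the downstream consumers run, so including them is an assumption made only for illustration.

```python
# Hedged sketch of the egress-path cost using the example figures above.
# Prices are examples; including command replies in egress is an assumption.

HOURS_PER_MONTH = 744

worker_roles = 2
worker_role_price_per_hour = 0.08   # A1 instance, example rate
egress_price_per_gb = 0.138         # example rate from the table above

# Outbound traffic from the example description (message count * size in KB)
notification_kb = 100_000 * 20
single_command_kb = 130_000 * 35
multi_command_kb = 20_000 * 20
response_kb = 130_000 * 80 + 20_000 * 3 * 70   # replies, included here as an assumption

egress_gb = (notification_kb + single_command_kb + multi_command_kb + response_kb) / (1024 * 1024)

compute_cost = worker_roles * worker_role_price_per_hour * HOURS_PER_MONTH
bandwidth_cost = egress_gb * egress_price_per_gb
print(f"Compute:     ${compute_cost:,.2f}")      # 2 roles for the command API host
print(f"Bandwidth:   ${bandwidth_cost:,.2f}")    # ~21 GB of egress in this example
print(f"Total/month: ${compute_cost + bandwidth_cost:,.2f}")
```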
Management cost
Figure: The "master" component within the reference architecture
Besides the messaging related components in the reference architecture, there is also the concept of one or more masters for managing the system, as discussed previously in this paper. The master is tasked with provisioning devices, creating appropriate queues and topics, storing device information, provisioning security, and so on. The master contains the following cost components:
Provisioning Runtime. The component called by tooling to provision a device or a set of devices into the system, creating the necessary Service Bus, compute, and storage artifacts. It is hosted inside a worker role.
Device Repo. The datastore collecting the registered devices per partition.
Partition Repo. The datastore collecting partition registration information.
Given these components, the management cost can be calculated using the following formula:
Equation - The cost estimation formula for management of the reference architecture
The variables in this equation are:
- The cost of the management of the architecture, per month.
- The number of roles necessary to support the projected scale of the system. Normally, at least two (2) are needed to fall within the Microsoft Azure SLA.
- The cost per worker role for the management host, per hour.
- The number of gigabytes used in the partition repository for administrative purposes.
- The number of gigabytes used in the device repository.
- The number of partitions to allow for appropriate scale.
- The cost for Geo Redundant Storage (GRS) table storage ($0.095 / GB at the time of writing).
- The number of changes to device information per month. Any change to the device information stored in the system, and subsequently in a device repository inside a partition, will account for at least two operations on table storage.
- The cost for storage transactions ($0.0036 / 100k transactions at the time of writing).
Example calculation
An example calculation using 10,000 changes to device registrations per month (either new devices, changes in activation, or removed devices), a total partition repository size of 256 MB (assuming a single master instance is used), and a 128 MB device repository per partition, with 10 partitions, would yield the following results:
Worker roles: 2.
Cost per worker role: $0.16 per hour (Medium).
Partition repository size: 0.25 GB.
Device repository size per partition: 0.125 GB.
Partitions: 10.
Cost of GRS table storage: $0.095 / GB.
Device information changes per month: 10,000.
Cost of storage transactions: $0.0036 / 100k.
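To tie these figures together, the sketch below combines them in one plausible way; the assumption that each device change results in two table storage transactions follows the description above, and the prices are the point-in-time figures quoted there.

```python
# Hedged sketch of the monthly management cost using the example values above.

HOURS_PER_MONTH = 744

worker_roles = 2
worker_role_price_per_hour = 0.16      # Medium instance, example rate
partition_repo_gb = 0.25
device_repo_gb_per_partition = 0.125
partitions = 10
grs_table_price_per_gb = 0.095
device_changes_per_month = 10_000
transactions_per_change = 2            # at least two table operations per change, as described
transaction_price_per_100k = 0.0036

compute_cost = worker_roles * worker_role_price_per_hour * HOURS_PER_MONTH
storage_cost = (partition_repo_gb + device_repo_gb_per_partition * partitions) * grs_table_price_per_gb
transaction_cost = device_changes_per_month * transactions_per_change / 100_000 * transaction_price_per_100k

print(f"Compute:      ${compute_cost:,.2f}")     # dominates the management cost
print(f"Storage:      ${storage_cost:,.4f}")
print(f"Transactions: ${transaction_cost:,.6f}")
print(f"Total/month:  ${compute_cost + storage_cost + transaction_cost:,.2f}")
```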
As can be observed from the outcome of the formula, the cost of management for the reference architecture is mostly dependent on the worker roles running to support it.