Building the Internet of Things



Date: 10.06.2017

k. Acquiring data


IoT data acquisition is frequently referred to as data ingestion. Literature about Big Data often cites the three Vs: volume, variety, and velocity [74]. There are other aspects to consider as well. In our initial engagements in IoT, we have seen that device bandwidth, connection speed, reliability, and cost have been major influences on the solution choices made. Each item in this section is important, however, and the relative importance of each will vary depending on a project’s requirements. The following sections discuss many aspects of data ingestion.

Message size and format


Messages from devices are the lifeblood of IoT. In a world with no boundaries, we might collect all telemetry data and analyze it extensively, or simply save it in case we need it later. In the real world, we need to consider the size of the message, which will be affected by its number of attributes, the data types, the message formats, the message overhead, and the security overhead.

Many message formats are in use today. Extensible Markup Language (XML) and JavaScript Object Notation (JSON) are common. Binary JSON (BSON), Protocol Buffers, and Avro are more compact formats that are often used when message size and bandwidth are constrained. XML is widely supported by development tools and easy to understand, but its tags often bloat message size. JSON is quickly becoming as ubiquitous as XML; it is more compact than XML while retaining XML's readability.

In IoT there is often a premium on memory, bandwidth, and connection cost, so compact message formats can be useful. BSON is a binary encoded version of JSON. It allows you to encode binary data in the message, and it enables storing data as raw bytes rather than text. Protocol Buffers define a method of serializing structured data. They were developed at Google, and then given to the open source community. Protocol Buffers are compact, but not self-describing like XML and JSON, so sender and receiver must both understand the message being transmitted. Avro is another option for compact formats. It differs from BSON and Protocol Buffers in that it is not self-describing, but it is always accompanied by a schema, so no code generation or prior knowledge of the schema is required for processing on the message-receiving end. Ultimately, choosing one of these formats comes down to balancing development environment support, device support, the need for compactness, and storage and processing requirements on the message-receiving side.
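The size trade-off described above can be illustrated with a minimal sketch. It uses Python's standard `struct` module as a stand-in for a compact, non-self-describing format; it is not Protocol Buffers, BSON, or Avro, and the reading and its field names are hypothetical.

```python
import json
import struct

# Hypothetical telemetry reading; the field names are illustrative only.
reading = {"device_id": 1042, "timestamp": 1497052800, "temperature_c": 21.5}

# Self-describing text encoding: the field names travel with every message.
json_bytes = json.dumps(reading, separators=(",", ":")).encode("utf-8")

# Fixed binary layout that sender and receiver must agree on in advance:
# little-endian unsigned 32-bit id, unsigned 32-bit timestamp, 32-bit float.
LAYOUT = struct.Struct("<IIf")
binary_bytes = LAYOUT.pack(
    reading["device_id"], reading["timestamp"], reading["temperature_c"]
)


def decode(payload: bytes) -> dict:
    """Rebuild the reading; only possible because the layout is known."""
    device_id, timestamp, temperature_c = LAYOUT.unpack(payload)
    return {
        "device_id": device_id,
        "timestamp": timestamp,
        "temperature_c": temperature_c,
    }


print(len(json_bytes), "bytes as JSON,", len(binary_bytes), "bytes as binary")
```

The binary payload is smaller because it carries no field names, which is also why the receiver cannot interpret it without the shared layout.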

Message types


Your system may require different message types that can differ in schema, data type, or both of these. A real-world example of this is a connected vehicle system that predominantly sends telemetry information for predictive maintenance. This system might also be used to send audio or video clips for emergency management, accident recording, and so on. In these cases, the media files are often enhanced with metadata related to the collection of the media file. Additionally, the media messages may be of lower or higher priority and they may require splitting, compression, resumption on error, and temporary local storage. If different device types are involved, they may provide media files in formats or encoding levels that are optimized or specific to those devices, which could require normalization at the storage point.
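As a rough illustration of splitting, compression, and resumption on error for media messages, the following sketch compresses a payload, splits it into numbered chunks, and lets the receiver report which chunks are still missing. The chunk dictionary layout and function names are assumptions made for this example, not part of any specific product.

```python
import zlib


def split_media(data: bytes, chunk_size: int) -> list:
    """Compress the media bytes and split them into numbered chunks."""
    compressed = zlib.compress(data)
    total = -(-len(compressed) // chunk_size)  # ceiling division
    return [
        {
            "index": i,
            "total": total,
            "body": compressed[i * chunk_size:(i + 1) * chunk_size],
        }
        for i in range(total)
    ]


def missing_chunks(received: dict, total: int) -> list:
    """Return the chunk indexes the sender still needs to (re)send."""
    return [i for i in range(total) if i not in received]


def reassemble(received: dict, total: int) -> bytes:
    """Join the chunks in order and decompress once all have arrived."""
    if missing_chunks(received, total):
        raise ValueError("transfer incomplete")
    return zlib.decompress(b"".join(received[i] for i in range(total)))
```

After an interrupted transfer, the sender would resend only the indexes returned by `missing_chunks` rather than the whole file.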

Message priority


Different message types will often have different priorities in an IoT system. A message can be a standard telemetry message that is intended specifically for cataloging, and used for machine learning algorithms downstream. There can be other message types that are considered events and alarms. An event could be an elevator door opening, a car starting, or the temperature being increased in a home, whereas an alarm might be a broken window, a car crash, or a full engine failure.

Message priority will be handled either by providing a separate endpoint for priority messages, or by detecting attributes in the message itself to assign priority. Using a separate endpoint for priority messages can reduce the chance that delivery of a high priority message is slowed by a flood of standard messages. If the throughput of the initial point of ingestion is considered adequate, then downstream detection is an option, for instance by creating a standard subscription and a high priority subscription on an Azure Service Bus Topic.
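The attribute-detection approach can be sketched as follows. This is an in-process stand-in built on a heap, not the Azure Service Bus API; the `type` attribute and the priority mapping are assumptions made for the example.

```python
import heapq
import itertools

# Assumed mapping from a hypothetical "type" attribute to a priority
# (a lower number is more urgent).
PRIORITY_BY_TYPE = {"alarm": 0, "event": 1, "telemetry": 2}


class PriorityIngest:
    """Orders incoming messages by a priority detected from an attribute."""

    def __init__(self):
        self._heap = []
        # The counter breaks ties so arrival order is kept within a priority.
        self._counter = itertools.count()

    def accept(self, message: dict) -> None:
        # Unknown or missing types fall back to the lowest priority.
        priority = PRIORITY_BY_TYPE.get(message.get("type"), 2)
        heapq.heappush(self._heap, (priority, next(self._counter), message))

    def next_message(self) -> dict:
        return heapq.heappop(self._heap)[2]
```

With this arrangement an alarm that arrives behind a backlog of telemetry is still processed first, provided the ingestion point itself keeps up with the total flow.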

There are also cases where device priority may be required. In a connected vehicle scenario, there may be a premium service that has priority. In a building, sensors may have relative priority; for example, a sensor that detects a broken window on the first floor may have higher priority than one on the fifth floor. In these cases, the priority may be handled similarly to message priority. Another approach is to use a separate service that handles the higher priority devices.

Conditional messaging


In some of our projects, the solution required the message pattern to change based on conditions. In one case, if a service technician received an alert that an elevator needed attention, the technician could send a message to the device asking it to increase the detail and frequency of its messaging. This would continue for a configurable timeframe.

This type of requirement means that the solution must be scaled to handle the conditional events. For instance, if the devices could automatically increase the size and frequency of messages, they could cause a dramatic increase in traffic to the system. Safeguards and throttling should be considered to protect against unplanned data floods in such situations.
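One common throttling safeguard is a token bucket, sketched below under stated assumptions: the rate and burst values are placeholders, and the source does not prescribe this particular technique. The clock is injectable so the behavior can be checked without waiting.

```python
import time


class TokenBucket:
    """Allows short bursts but caps the sustained message rate."""

    def __init__(self, rate_per_sec: float, burst: int, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = burst
        self.tokens = float(burst)
        self.clock = clock
        self.last = clock()

    def allow(self) -> bool:
        """Return True if a message may be accepted right now."""
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond the capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A bucket per device (or per device group) would let a device in a detailed-messaging mode send faster for a while without letting many such devices flood the ingestion point at once.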


Contextual messaging


Similar to conditional messaging, there are use cases that require contextual messaging, which can follow multiple patterns. There may be situations where the device includes contextual information in the messages that it sends. The data may include GPS coordinates, and a vehicle may need to send additional telemetry when it travels above a certain altitude, or if the ambient temperature rises above a trigger level. The context may require more data in messages, the collection of data from other sensors on the device, or it may require more or less frequent message transmission.
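A device-side rule of this kind might look like the sketch below. The trigger levels, intervals, and field names are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical field sets and trigger levels, chosen only for illustration.
BASE_FIELDS = ("gps", "speed")
EXTRA_FIELDS = ("engine_temp", "oil_pressure")
ALTITUDE_TRIGGER_M = 2000.0
AMBIENT_TRIGGER_C = 40.0


@dataclass
class SendPlan:
    interval_s: int  # seconds between transmissions
    fields: tuple    # telemetry fields to include in each message


def plan_for_context(altitude_m: float, ambient_c: float) -> SendPlan:
    """Send richer, more frequent messages when a trigger level is exceeded."""
    if altitude_m > ALTITUDE_TRIGGER_M or ambient_c > AMBIENT_TRIGGER_C:
        return SendPlan(interval_s=10, fields=BASE_FIELDS + EXTRA_FIELDS)
    return SendPlan(interval_s=60, fields=BASE_FIELDS)
```

For example, `plan_for_context(2500.0, 20.0)` selects the shorter interval and the extended field set because the altitude trigger is exceeded.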

Message batching


The natural inclination may be to send messages immediately when data is generated, but there are several reasons why messages may be batched. A device may be power constrained, so the connectivity may only be turned on for a limited amount of time. The connection may be unreliable, so it could make sense to batch the collected messages for a single transmission once connectivity is available. The device may move in and out of connectivity, or connectivity may be congested or less expensive at certain times of the day. If you allow batched messages, the message receiver must be designed to accept them as well as single messages. In this case, a message envelope that can contain multiple messages or a single message can simplify the solution.
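A message envelope of the kind described can be sketched as follows. The envelope always carries a list, so the receiver uses one code path for a single message and for a batch. The JSON layout and key names are assumptions for this example.

```python
import json


def make_envelope(device_id: str, messages: list) -> str:
    """Wrap one or many messages in the same envelope structure."""
    return json.dumps({"device_id": device_id, "messages": list(messages)})


def read_envelope(payload: str) -> list:
    """Unpack an envelope, tagging each message with the sending device."""
    envelope = json.loads(payload)
    return [
        dict(message, device_id=envelope["device_id"])
        for message in envelope["messages"]
    ]


# A single reading and a batch collected while offline use the same format.
single = make_envelope("dev-1", [{"temp_c": 21.5}])
batch = make_envelope("dev-1", [{"temp_c": 21.5}, {"temp_c": 22.0}])
```

Sending shared values such as the device identifier once per envelope, rather than once per message, also reduces the size of a batched transmission.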

Bandwidth and scale


Previous topics in this paper discuss bandwidth from the device. The bandwidth and scale of the collection points must also be considered. The size of the network pipe out of the device environment may be constrained. For example, if the solution is collecting building telemetry, and devices connected to an internal network send data to an external collection point, the effect on the capacity of the building network should be evaluated. The collection points will also have an upper bound. For example, Microsoft Azure Storage and Service Bus have capacity targets. If your solution needs to extend beyond the targets of the enabling technology, then a scale-out approach should be designed for the project. This approach should include plenty of excess capacity for growth and unplanned spikes. In our projects, we typically plan for no more than 50 percent capacity at steady state.
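The 50 percent guideline translates into simple arithmetic when sizing a scale-out design. In the sketch below, the per-unit capacity and load figures are made-up numbers; real capacity targets must come from the documentation of the service being used.

```python
import math


def scale_units_needed(
    steady_msgs_per_sec: float,
    unit_capacity_msgs_per_sec: float,
    target_utilization: float = 0.5,
) -> int:
    """Number of collection units so steady load stays at or under the target."""
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    usable_per_unit = unit_capacity_msgs_per_sec * target_utilization
    return math.ceil(steady_msgs_per_sec / usable_per_unit)


# Hypothetical figures: 4,500 messages per second of steady load against
# units rated at 2,000 messages per second each.
units = scale_units_needed(4500, 2000)
```

With these figures each unit is planned to carry at most 1,000 messages per second, so five units are needed, leaving the remaining capacity for growth and spikes.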

If the connected devices are geographically distributed, consider scaling out the solution to multiple data centers. This can introduce the complexity of directing device traffic to the right collection points. In our projects, we have found success in assigning devices to data centers so that a device's traffic never needs to “find” where its data should go. If the device moves geographically, then it may need to be reassigned. It is important to understand how the data will be used, and whether it needs to be aggregated before use or can be used autonomously in the data center where it was collected.
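Explicit assignment of devices to data centers can be sketched as a small registry. The data center names and the `example.com` endpoints are placeholders, and a production system would persist the assignments rather than hold them in memory.

```python
class DeviceAssignments:
    """Maps each device to one data center so traffic never has to search."""

    def __init__(self, endpoints: dict):
        self._endpoints = endpoints  # data center name -> collection endpoint
        self._assigned = {}          # device id -> data center name

    def assign(self, device_id: str, data_center: str) -> None:
        """Assign (or reassign, after a move) a device to a data center."""
        if data_center not in self._endpoints:
            raise KeyError(f"unknown data center: {data_center}")
        self._assigned[device_id] = data_center

    def endpoint_for(self, device_id: str) -> str:
        """Return the single collection endpoint the device should use."""
        return self._endpoints[self._assigned[device_id]]


# Placeholder endpoints for illustration only.
registry = DeviceAssignments({
    "europe": "https://ingest-eu.example.com",
    "us-east": "https://ingest-us.example.com",
})
registry.assign("truck-17", "europe")
```

Calling `registry.assign("truck-17", "us-east")` later models the reassignment of a device that has moved geographically.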



