The challenge of deploying updates to IoT infrastructure projects
by Bogdan Nitulescu
In the previous two articles in this series about IoT infrastructure development, we examined the challenges of choosing and securing a framework. In this entry, we’ll take a look at some of the considerations around rolling out updates to the established network.
Obstacles to transition
The complexity of an IoT network presents a number of challenges to rolling out updates. Software is distributed over multiple nodes and platforms: typically a large number of devices in the field, spanning a diverse product portfolio with multiple hardware generations; apps on iPhones and Android phones; the IoT platform; and the custom server applications on top of it.
While most IoT platforms are able to handle a rollout of device firmware, this is only part of a greater strategy that also includes web and mobile app updates and, in some cases, a major upgrade or even a wholesale replacement of the IoT platform itself.
The rollout must take into account the possibility that local update policies, user choice or other impediments will leave certain devices (or categories of devices) updated later than the rest of the network.
The analytics system scrutinizing the deployment will need to extract device-specific data, possibly based on IP addresses and hardware IDs, since it may be dealing with different versions of the app across different devices. As mentioned previously in this series, such identifiers are in any case essential to accommodate the security requirements of pushing fully encrypted and signed updates to the network.
Handling legacy devices
Additionally, the update may introduce functionality intended to leave behind certain legacy devices which are not equipped to utilize it. It is important that a strategy for supporting and phasing out legacy hardware is put in place well ahead of time, in order to avoid costly decisions being needed on the spot. For business and industrial deployments, it’s necessary to undertake a cost-benefit analysis around replacing the hardware or updating the software.
For consumer devices, however, the public accepts that at some point no new features will be available, but expects that the devices will retain their existing functionality and keep receiving security updates from the vendor, so that the security of their home is not compromised.
Our experience was that adding new features to devices a few years old was a great way to increase the rating of our customers’ products. Often owners of older devices were the early adopters of IoT technology — and many are social and online influencers. If legacy support proves too costly, a trade-in program will greatly increase customer retention. An IoT platform with the right business intelligence features can be a great support tool for showing how many legacy users are out there, and who those users are.
The user experience during updates
Another consideration is the need to manage live transactions during the rollout. In the case of radical changes to the system, the infrastructure may even need the ability to pause activity on all clients while the changes are implemented, while providing enough feedback for users to understand that the system is being upgraded rather than experiencing an outage. In general, however, every effort should go towards a transparent and non-disruptive experience.
The update must implement a strategy for at least logging, and ideally resuming, interrupted user sessions, since the nature of the project likely means that there will be no convenient moment of ‘zero activity’ during which to commence rollout.
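As a rough illustration of logging and resuming interrupted sessions, the sketch below keeps a journal of in-flight sessions that can be serialized before a server is taken out of rotation and reloaded afterwards. The SessionJournal class, its method names and the shape of the session state are all hypothetical; a real system would persist the journal in durable storage rather than in memory.

```python
import json
from typing import Dict, List


class SessionJournal:
    """Hypothetical in-memory journal of sessions that have not finished yet."""

    def __init__(self) -> None:
        self._open: Dict[str, dict] = {}

    def checkpoint(self, session_id: str, state: dict) -> None:
        # Record the latest known state of an in-flight session.
        self._open[session_id] = dict(state)

    def complete(self, session_id: str) -> None:
        # A finished session no longer needs to be resumed.
        self._open.pop(session_id, None)

    def interrupted(self) -> List[str]:
        # Sessions still open when the update started.
        return sorted(self._open)

    def resume_state(self, session_id: str) -> dict:
        return dict(self._open[session_id])

    def dump(self) -> str:
        # Serialize the journal so it survives the restart of a server.
        return json.dumps(self._open, sort_keys=True)

    @classmethod
    def load(cls, data: str) -> "SessionJournal":
        journal = cls()
        journal._open = json.loads(data)
        return journal
```

Before an update, the old server would call dump(); the new server would call load() and offer each session listed by interrupted() the chance to continue from its last checkpoint.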
Other issues can arise from devices that are offline when an upgrade is launched. These include devices that may be sitting in warehouses on the way to the customer, devices in locations with poor connectivity, and devices which have been powered off by users for a while. Though it is acceptable for the users to go through a software upgrade screen during unboxing, this is a scenario that has to be tested, and any upgrade plan should take it into account. What any manufacturer wants to avoid is having a large number of devices taken off the shelves, unpacked, upgraded and repackaged at a great cost.
Over the last ten years, the culture of updates has trended away from event-driven releases towards continuous deployment. Under this system, the updated software is deployed to a subset of the infrastructure, and incremental sections of the user base are gradually rerouted from the old to the new version, allowing for A/B testing.
The initial migration groups might be geo-based, IP-based, opt-in alpha or beta testers, or based on other criteria, including being based on the hardware platform or server infrastructure used. Given successful results, the old version is eventually decommissioned and the new one becomes the template for the next rollout.
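One common way to pick such migration groups is to hash each device ID into a stable bucket and compare it with the current rollout percentage, with opt-in testers always included. The sketch below assumes this approach; the function names, the salt value and the 100-bucket granularity are illustrative choices, not something prescribed by any particular IoT platform.

```python
import hashlib
from typing import FrozenSet


def rollout_bucket(device_id: str, salt: str = "release-2") -> int:
    """Map a device ID to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{salt}:{device_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def in_rollout(
    device_id: str,
    percent: int,
    opted_in: FrozenSet[str] = frozenset(),
    salt: str = "release-2",
) -> bool:
    """Decide whether a device should be routed to the new version."""
    if device_id in opted_in:
        # Alpha and beta testers receive the new version first.
        return True
    return rollout_bucket(device_id, salt) < percent
```

Because the bucket depends only on the device ID and the salt, raising the percentage from 5 to 20 keeps every device that was already on the new version there, and setting it back to 0 routes everyone except the opted-in testers to the old version again.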
This scheme allows the testing environment to be as similar to a production environment as it can possibly be — as opposed to dealing with a completely separate staging environment (servers, apps, and firmware), for which it can be difficult to simulate all the challenges that can be found in live operation. Rollback is more easily achieved under this model, by re-routing the client base back to the previous version should issues become apparent.
Models which follow this dynamic are similar in concept to Blue-Green Deployment (for updates on the server/platform side) and Canary Deployment (also known as ‘phased rollout’ or ‘incremental rollout’). The price of this transparent and ongoing update stream is the requirement to ensure reliable rollback procedures for all the different types of software involved, including apps, platform and firmware, which in turn requires developers to ensure backward and forward compatibility during the update phase.
Taking advantage of a load balancer across the network allows the update to gracefully overtake the previous software version on a per-server basis. It is important to avoid an abrupt change that would force all devices to reconnect to the new servers immediately; the performance impact of thousands of devices trying to connect to a server at once can be too much for a reasonably dimensioned infrastructure to handle. Controlled and metered migration from old servers will make it much smoother for devices to gradually transition to the new servers.
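A metered migration of this kind can be sketched as a schedule that releases devices from the old servers in fixed-size batches at regular intervals. The function name and parameters below are hypothetical, and the batch size and interval would have to be tuned to what the new servers can actually absorb.

```python
from typing import Iterator, List, Sequence, Tuple


def migration_schedule(
    device_ids: Sequence[str], batch_size: int, interval_s: float
) -> Iterator[Tuple[float, List[str]]]:
    """Yield (start offset in seconds, batch of device IDs) pairs."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(device_ids), batch_size):
        # Each batch starts one interval after the previous one.
        offset = (start // batch_size) * interval_s
        yield offset, list(device_ids[start:start + batch_size])
```

With five devices, a batch size of two and a 30-second interval, the schedule yields batches at 0, 30 and 60 seconds, so no more than two devices reconnect in any one interval.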
Backwards incompatibility and IoT platform migration
Laggard updates to users’ devices and mobile apps can be mitigated by building parallel change into the update flow. This addresses backward-incompatible interface revisions by splitting the change into several stages and providing temporary support for multiple APIs in cases where the overlap between the new and old schema could otherwise conflict. These temporary bulwarks can later be revised to conform to the new schema and API as necessary.
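To make the idea concrete, here is a minimal sketch of the temporary dual support on the server side, assuming a hypothetical temperature reading whose payload changes shape between versions. Both payload formats and the parse_reading name are invented for illustration.

```python
def parse_reading(payload: dict) -> dict:
    """Accept both the legacy and the new payload during the transition."""
    if "temperature" in payload:
        # New schema (assumed): {"temperature": {"value": 21.5, "unit": "C"}}
        reading = payload["temperature"]
        return {"value": float(reading["value"]), "unit": reading["unit"]}
    if "temp" in payload:
        # Legacy schema (assumed): {"temp": 21.5}, always in Celsius.
        return {"value": float(payload["temp"]), "unit": "C"}
    raise ValueError("unrecognised reading payload")
```

Once usage of the legacy format has dropped to a negligible level, the second branch can be removed, which completes the change.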
A larger challenge is moving an existing device fleet from one IoT platform to a completely different one for strategy, cost or other technical or business reasons. Ideally, this would be done through an instant, simultaneous update of the mobile apps, the firmware in the devices and the IoT platform servers to a new and completely incompatible ecosystem.
In practice, however, this is impossible. It takes days for most users to get their phone apps updated, and the devices are updated even later. We expect that some users will remain with old versions of apps and device firmware for a long time. In addition, during migration, there will be moments when user accounts will have a mix of old and new apps and devices.
Historically, we have used a multi-step strategy to overcome this:
- First, create an emulation layer that exposes the API of the new platform but uses the old platform as an implementation.
- Make sure that the requests intended for a device are redirected to either the new platform or to the emulation layer, depending on where the device is registered.
- Launch a new mobile app that will use only the new API. Monitor the progress of the mobile app updates, and also check the usage of the old API vs the new one. Wait until most users have updated and outdated API usage is negligible.
- Migrate the devices to the new platform. This is usually done via a firmware upgrade.
- Decommission the old platform. If there are any devices still using the old firmware, keep only the software upgrade system of the old platform available where possible, to allow them to update later.
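The first two steps can be sketched as follows. Every class, method and return value here is hypothetical: the two platform classes stand in for whatever SDKs the real platforms provide, and the usage counter is only one simple way of watching how much traffic still ends up on the old platform.

```python
from collections import Counter
from typing import Dict


class OldPlatform:
    """Stand-in for the legacy platform and its own, different API."""

    def legacy_send(self, device_id: str, cmd: str) -> str:
        return f"old:{device_id}:{cmd}"


class NewPlatform:
    """Stand-in for the new platform."""

    def send_command(self, device_id: str, command: str) -> str:
        return f"new:{device_id}:{command}"


class EmulationLayer:
    """Exposes the new API but uses the old platform as its implementation."""

    def __init__(self, old: OldPlatform) -> None:
        self._old = old

    def send_command(self, device_id: str, command: str) -> str:
        return self._old.legacy_send(device_id, command)


class Router:
    """Sends each request to wherever the device is currently registered."""

    def __init__(self, new: NewPlatform, emulated: EmulationLayer,
                 registry: Dict[str, str]) -> None:
        self._new = new
        self._emulated = emulated
        self._registry = registry  # device ID -> "old" or "new"
        self.usage: Counter = Counter()

    def send_command(self, device_id: str, command: str) -> str:
        # Devices missing from the registry are assumed not yet migrated.
        where = self._registry.get(device_id, "old")
        self.usage[where] += 1
        backend = self._new if where == "new" else self._emulated
        return backend.send_command(device_id, command)

    def mark_migrated(self, device_id: str) -> None:
        # Called once a device's firmware upgrade has moved it over.
        self._registry[device_id] = "new"
```

The mobile app only ever talks to the router through the new API, so it does not need to know which platform a given device currently lives on. When the "old" count in usage stays at zero, the emulation layer and the old platform are candidates for decommissioning.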
The benefits of a platform migration are not always immediately apparent to users, even if those benefits easily translate into an advantage for the device fleet operator. Failures during the migration process have a high impact because users can lose access to their devices, and such failures are unlikely to be tolerated. It’s important that a strategy such as the one above be implemented, to make the transition completely transparent from the point of view of the end-user.