Coordinating IoT cluster with SignalR

Clustering is an important functionality in the Internet of Things (IoT). It allows multiple devices to act as one and coordinate the work between themselves.

For example, you may have IoT devices that are making real-time audio announcements at the departure gates of an airport. If the gates are in close proximity to each other, you may arrange the devices into a single cluster, so each device knows when any other device is making an announcement, which would allow the devices not to play audio over each other.

Another use could be to create a local system with redundancy and failover. The devices in the same cluster can synchronize the state and become aware if any of them goes offline.

Since the release of .NET Core, the .NET platform has allowed applications to be compiled for Windows, macOS, or Linux, and for any supported CPU architecture. The same code base can be built into an app for Raspberry Pi and an app for a Windows PC. This makes it a suitable platform for building applications for IoT devices.

Another benefit of using .NET for developing distributed IoT applications is a library called SignalR. This library facilitates real-time two-way communication between the client and the server with minimal setup code. So, once the client makes the initial connection request, there is no longer a request-response mechanism in place. Messages can travel both ways while the connection is live.

These characteristics make SignalR highly suitable as a communication mechanism between IoT devices and the controlling server. You no longer need your devices to report their connection status. The server application will know if any of them has disconnected and hasn’t reestablished the connection. Likewise, you will no longer have to constantly poll the server from your device to see if there are any new commands available. The commands can be sent to the relevant devices in real time.

All of these characteristics of SignalR make it relatively easy to build a cluster of IoT devices that acts like a single distributed application. Now, you will see how this could be achieved.

The complete solution can be found in this GitHub repo.

Example SignalR Hub

In our example, SignalR is hosted by our own ASP.NET Core application. Although SignalR can also be hosted as an Azure service, the key points are easier to demonstrate with a self-hosted example.

The hub code is as follows:

using Microsoft.AspNetCore.SignalR;
using IotHubHost.Data;

namespace IotHubHost.Hubs;

public class DevicesHub : Hub
{
    public async Task ReceiveDeviceConnected(string deviceId, string areaName, string locationNumber)
    {
        UserMappings.AddDeviceConnected(deviceId, Context.ConnectionId);           
        LocationMappings.MapDeviceToLocation(locationNumber, Context.ConnectionId);
        await Groups.AddToGroupAsync(Context.ConnectionId, areaName);
    }

    public async Task BroadcastWorkStatus(string areaName, bool working)
    {
        await Clients.Groups(areaName).SendAsync("ReceiveWorkStatus", working);
    }

    public override async Task OnDisconnectedAsync(Exception? exception)
    {
        UserMappings.RemoveDeviceConnected(Context.ConnectionId);
        await base.OnDisconnectedAsync(exception);
    }
}

We have three methods in the hub:

  • ReceiveDeviceConnected(), which accepts the parameters of the device and registers it. The area name is used as a SignalR group name. Devices that belong to the same group act as a single cluster, while the locationNumber parameter identifies an individual device.
  • BroadcastWorkStatus(), which broadcasts to every device in the cluster (the sender included, since the message goes to the whole group) that one of the devices has either started or finished doing work.
  • OnDisconnectedAsync(), which is an override of one of the original SignalR hub methods. It removes the ConnectionId mapping from a cache, so the system will not attempt to send individual requests to a device that is no longer connected.

Caching device connection information

SignalR uses an arbitrarily defined ConnectionId attribute to identify an individual connected client. As well as being arbitrarily created, it changes every time a client establishes a new connection. But we need to be able to use something more constant and more readable, like a device identifier or an identifier of the location where the client is physically placed. Therefore we need to use some kind of cache to map ConnectionId to some other attribute.

To show this principle in the simplest way possible, I have created the following static class that acts as an in-memory mapper between location numbers and SignalR connection identifiers:

namespace IotHubHost.Data;

public static class LocationMappings
{
    private static readonly Dictionary<string, string> locationMappings = new Dictionary<string, string>();

    public static void MapDeviceToLocation(string locationNumber, string connectionId)
    {
        locationMappings[locationNumber] = connectionId;
    }

    public static string GetConnectionId(string locationNumber)
    {
        // A single lookup: returns an empty string when the location has no mapped connection.
        return locationMappings.TryGetValue(locationNumber, out var connectionId)
            ? connectionId
            : string.Empty;
    }
}
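Hub methods can be invoked by several connections at the same time, and a plain Dictionary is not safe for concurrent writes. The following is a minimal sketch of a thread-safe variant, assuming ConcurrentDictionary is an acceptable replacement here. The class name ConcurrentLocationMappings and the RemoveConnection method are hypothetical and are not part of the original solution:

```csharp
using System.Collections.Concurrent;
using System.Linq;

// Hypothetical thread-safe counterpart of LocationMappings (not part of the original solution).
public static class ConcurrentLocationMappings
{
    private static readonly ConcurrentDictionary<string, string> Mappings = new();

    public static void MapDeviceToLocation(string locationNumber, string connectionId)
    {
        Mappings[locationNumber] = connectionId;
    }

    public static string GetConnectionId(string locationNumber)
    {
        return Mappings.TryGetValue(locationNumber, out var connectionId)
            ? connectionId
            : string.Empty;
    }

    // Drops every location mapped to the given connection, for example when a device disconnects.
    public static void RemoveConnection(string connectionId)
    {
        foreach (var pair in Mappings.Where(p => p.Value == connectionId))
        {
            // Removes the entry only if it still points at the same connection.
            Mappings.TryRemove(pair);
        }
    }
}
```

Calling a method like RemoveConnection(Context.ConnectionId) from OnDisconnectedAsync() would be one way to stop the server from targeting a connection that no longer exists.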

In addition to this, I have applied another mapper to assign locations to areas, the latter of which represents individual SignalR clusters. In a real-life solution, this would probably be handled by persistent data storage, but I have used a hard-coded dictionary for the simplicity of demonstration:

namespace IotHubHost.Data;

internal static class LocationsAreaMapper
{
    private static readonly Dictionary<string, string> locationsInAreas = new Dictionary<string, string>
    {
        {"1", "North Wing"},
        {"2", "North Wing"},
        {"3", "South Wing"},
        {"4", "South Wing"}
    };

    public static string GetLocationName(string locationNumber)
    {
        // Despite its name, this returns the area (cluster) that the location belongs to.
        return locationsInAreas.TryGetValue(locationNumber, out var areaName)
            ? areaName
            : string.Empty;
    }
}

So, there are four locations separated into two areas – North Wing and South Wing.

Client application on IoT device

For demonstration purposes, I have created the most basic application type to be placed onto an IoT device: a console application. Its entire code fits into a single file:

using Microsoft.AspNetCore.SignalR.Client;

bool holdOffWork = false;
int timeoutSeconds = 60;

Console.WriteLine("Please provide device identifier.");
string? identifier = Console.ReadLine();

Console.WriteLine("Please provide the area name for the device.");
string? areaName = Console.ReadLine();

Console.WriteLine("Please provide the location identifier for the device to be positioned at.");
string? gateNumber = Console.ReadLine();

HubConnection connection = new HubConnectionBuilder()
    .WithUrl("http://localhost:57100/devicesHub")
    .Build();

connection.On("ReceiveWork", DoWork);
connection.On<bool>("ReceiveWorkStatus", (working) => holdOffWork = working);

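await connection.StartAsync();
await connection.InvokeAsync("ReceiveDeviceConnected", identifier, areaName, gateNumber);

// Keep the process alive so the handlers registered above can receive messages.
Console.WriteLine("Connected. Press Enter to exit.");
Console.ReadLine();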

async Task DoWork()
{
    var receiveTime = DateTimeOffset.Now;

    while (holdOffWork)
    {
        Console.WriteLine("Other device is doing work. Waiting...");
        if (DateTimeOffset.Now.AddSeconds(-timeoutSeconds) > receiveTime)
            holdOffWork = false;

        await Task.Delay(1000);
    }

    await connection.InvokeAsync("BroadcastWorkStatus", areaName, true);
    Console.WriteLine("Work Started");
    await Task.Delay(60000);
    Console.WriteLine("Work Finished");
    await connection.InvokeAsync("BroadcastWorkStatus", areaName, false);
}

When the application starts, we are prompted to provide a unique device identifier, area name, and location number. This is done purely for ease of testing on a PC. In an actual IoT application, all of these attributes would come from configuration.

Next, we establish a connection with the SignalR hub hosted by the server-side app. In our example, the address is hardcoded, but in a real-life application it would come from either a configuration file or an environment variable.
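As a rough illustration of reading these values from configuration, here is a minimal sketch that loads them from environment variables. The DeviceSettings type and the variable names (DEVICES_HUB_URL, DEVICE_ID, DEVICE_AREA, DEVICE_LOCATION) are assumptions made for this example and do not come from the original solution:

```csharp
using System;

// Hypothetical settings type; the environment variable names are illustrative assumptions.
public sealed record DeviceSettings(string HubUrl, string DeviceId, string AreaName, string LocationNumber)
{
    public static DeviceSettings FromEnvironment()
    {
        return new DeviceSettings(
            // Falls back to the address used in the demo when the variable is not set.
            HubUrl: Environment.GetEnvironmentVariable("DEVICES_HUB_URL") ?? "http://localhost:57100/devicesHub",
            DeviceId: Require("DEVICE_ID"),
            AreaName: Require("DEVICE_AREA"),
            LocationNumber: Require("DEVICE_LOCATION"));
    }

    // Reads a mandatory variable and fails fast if it is missing or blank.
    private static string Require(string name)
    {
        var value = Environment.GetEnvironmentVariable(name);
        if (string.IsNullOrWhiteSpace(value))
            throw new InvalidOperationException($"Environment variable '{name}' is not set.");
        return value;
    }
}
```

The values returned by DeviceSettings.FromEnvironment() could then be passed to WithUrl() and to the ReceiveDeviceConnected invocation instead of the console input.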

Once the connection is established, the application waits for an instruction from the server to do some work. When such an instruction is received, the application notifies the SignalR hub, so the message is broadcast to the devices that belong to the same group.

In our case, we simulate work by simply waiting for a minute. But in a real-life application, this could be anything. Playing audio could be one example. Downloading a large file from the server could be another. In the former case, we wouldn’t want other devices to start playing audio until the current device has finished, so the two devices don’t talk over each other. In the latter case, we would want only one device at a time to download the file, so that we don’t use bandwidth inefficiently.

Once the work is finished, the device notifies all other devices in the cluster via the SignalR hub.

Each device holds an in-memory flag, so it always knows if any other device in the cluster is doing a type of work that only one device in the cluster should be permitted to do at any one time. So, if the device receives an instruction to do some work while this flag is set, it holds off until the other device has finished.
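The flag-and-timeout behavior described above could also be isolated into a small helper, which makes it testable without a hub connection. This is only a sketch under that assumption. The WorkGate class and its member names are hypothetical and are not part of the original solution:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper wrapping the hold-off flag (not part of the original solution).
public sealed class WorkGate
{
    private volatile bool _holdOff;

    // Called when another device reports that it has started or finished work.
    public void SetBusy(bool working) => _holdOff = working;

    // Waits until the flag is cleared or the timeout elapses.
    // Returns true if the flag was cleared, false if the wait timed out.
    public async Task<bool> WaitUntilClearAsync(
        TimeSpan timeout,
        TimeSpan pollInterval,
        CancellationToken cancellationToken = default)
    {
        var deadline = DateTimeOffset.UtcNow + timeout;

        while (_holdOff)
        {
            if (DateTimeOffset.UtcNow >= deadline)
                return false;

            await Task.Delay(pollInterval, cancellationToken);
        }

        return true;
    }
}
```

A "ReceiveWorkStatus" handler would call SetBusy(), and the work handler would await WaitUntilClearAsync() before starting. Whether to proceed after a timeout, as the demo does, is a design choice that depends on how costly overlapping work is.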

Scheduling work instructions from the server

Perhaps the simplest way of getting your server-side ASP.NET Core application to proactively send tasks to the connected IoT devices is to run a background task via a hosted service. This is what we have done here:

using Microsoft.AspNetCore.SignalR;
using Microsoft.Extensions.Hosting;
using IotHubHost.Data;
using IotHubHost.Hubs;

namespace IotHubHost;

// BackgroundService runs ExecuteAsync in the background, so the never-ending loop
// does not hold up host startup the way a long-running StartAsync would.
internal class EventScheduler : BackgroundService
{
    private readonly IHubContext<DevicesHub> _hubContext;

    private readonly string receiveWorkMethodName = "ReceiveWork";

    public EventScheduler(
        IHubContext<DevicesHub> hubContext)
    {
        _hubContext = hubContext;
    }

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        var locationsList = new List<string> { "1", "2", "3", "4" };

        while (!stoppingToken.IsCancellationRequested)
        {
            foreach (var location in locationsList)
            {
                var connectionId = LocationMappings.GetConnectionId(location);

                // Target the individual device if it is connected; otherwise fall back to its cluster.
                if (!string.IsNullOrWhiteSpace(connectionId))
                    await _hubContext.Clients.Client(connectionId).SendAsync(receiveWorkMethodName);
                else
                    await _hubContext.Clients.Groups(LocationsAreaMapper.GetLocationName(location)).SendAsync(receiveWorkMethodName);

                // Passing the token lets the delay end promptly when the host shuts down.
                await Task.Delay(30000, stoppingToken);
            }
        }
    }
}

For demonstration purposes, we simply keep iterating through the list of location numbers. If any specific location number is found to have a SignalR connection id associated with it, then we send the request to do work to the individual device. Otherwise, we broadcast it to the entire cluster and the first device that has picked it up will take over.

It’s worth noting that in this example we are using IHubContext, which allows messages to be sent to the hub’s clients from outside the hub.

Taking it from here

Of course, this is a very simplistic example, but it does show the fundamental principles.

In a real-life scenario, you could make some further improvements to the architecture. For example, instead of connecting all devices to the SignalR hub over the internet, you could nominate one device as the leader and have it host its own SignalR hub that other devices will connect to via the internal network. This way, you will save on bandwidth usage.

Any messages can be routed to the relevant device via the device that acts as the leader. At the same time, the server no longer needs to be involved in letting the cluster know that one of the devices is busy. This entire process can be managed internally within the cluster.

If the leader can no longer connect to the server, then any other device should be able to take over as the new leader. This is similar to how resilience is managed in distributed large-scale applications.

Good luck with building your own IoT clusters!


P.S. If you want to learn more about SignalR, you can check out my book SignalR on .NET 6 – the complete guide.