Azure Event Grid, the heart of Azure

TL;DR - Azure Event Grid is a fully-managed event routing service and a foundational service in Azure. Other Azure services are starting to emit events to it as well, but we need more of them to make the Azure ecosystem better.

Update May 29, 2019 - Azure Event Grid team announced a lot of new features and integrations which make it even better (blog post)

On August 16, 2017 Microsoft launched Azure Event Grid, a fully-managed event routing service which was the first of its kind!

Personally, I think it's one of the most powerful services Microsoft has released in the past couple of years. Let's take a step back and look at its journey.

What is Azure Event Grid?

Event Grid allows you to publish events to an Event Grid topic while, in parallel, consumers subscribe to new events, either with or without filtering criteria.

Event Grid Concept
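To make the publish/subscribe idea concrete, here is a minimal in-memory sketch in Python. It is not the Event Grid SDK: `Topic`, `subscribe` and `publish` are hypothetical names that only illustrate how a topic pushes each published event to its subscribers, optionally filtered by event type.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Event = Dict[str, object]
Handler = Callable[[Event], None]


class Topic:
    """Hypothetical in-memory stand-in for an Event Grid topic."""

    def __init__(self) -> None:
        self._subscriptions: List[Tuple[Handler, Optional[Set[str]]]] = []

    def subscribe(self, handler: Handler, event_types: Optional[Set[str]] = None) -> None:
        # event_types=None means "no filtering": the handler receives every event.
        self._subscriptions.append((handler, event_types))

    def publish(self, event: Event) -> int:
        """Push the event to every matching subscriber and return the match count."""
        delivered = 0
        for handler, event_types in self._subscriptions:
            if event_types is None or event["eventType"] in event_types:
                handler(event)  # push: the subscriber does not poll
                delivered += 1
        return delivered
```

A subscriber registered without `event_types` sees everything, while one registered with a set of event types only receives the events it asked for.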

This feels very similar to what Azure Service Bus Topics already offered, yet it is totally different! As I wrote in 2017, Azure Service Bus Topics are not dead!

Event Grid is different in a lot of ways but the main ones are:

  • It's event-driven and not message-driven
  • It is push-based, so you need to be ready to process events and able to handle the load
  • It is not a durable broker, which means that it's not transactional: if you fail to process your events, they will be dropped

Clemens Vasters did a nice job of explaining when to use which service in his 'Events, Data Points, and Messages - Choosing the right Azure messaging service for your data' post on the Azure Blog.

CNCF CloudEvents

On May 7, 2018, support for CNCF CloudEvents was announced, allowing you to choose which event schema you'd like to use: either Event Grid native or CloudEvents. The beauty is that the publisher and the consumer can each use a different schema.

This allows you to either use native Azure or an open standard schema which works on multiple clouds - Awesome!
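As a rough illustration of how the two schemas relate, the sketch below maps an event in the Event Grid schema to an envelope using CloudEvents 1.0 attribute names. The function name `to_cloud_event` is hypothetical and the field mapping is my assumption, so verify it against the official documentation before relying on it.

```python
from typing import Any, Dict


def to_cloud_event(event_grid_event: Dict[str, Any]) -> Dict[str, Any]:
    """Map an Event Grid schema event to a CloudEvents 1.0 style envelope.

    Assumed mapping: topic -> source, eventType -> type, eventTime -> time.
    """
    return {
        "specversion": "1.0",  # required CloudEvents attribute
        "id": event_grid_event["id"],
        "source": event_grid_event["topic"],
        "subject": event_grid_event["subject"],
        "type": event_grid_event["eventType"],
        "time": event_grid_event["eventTime"],
        "data": event_grid_event["data"],
    }
```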

Event Domains

My good friend Bahram Banisadr announced Event Grid Domains on October 31, 2018, which allow you to use Event Grid in a multi-tenant environment.

Event Grid Domains

Producers can still push events as before, but Event Grid will dynamically determine to which tenants each event should be delivered, based on a tenant key! Consumers, however, only see the topic for their tenant and can subscribe just as before!
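The routing idea can be sketched in a few lines of Python. This is an in-memory model, not the real service: `EventDomain` is a hypothetical class, and I assume the event's `topic` field plays the role of the tenant key.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Dict, List

Event = Dict[str, Any]
Handler = Callable[[Event], None]


class EventDomain:
    """Hypothetical in-memory model of a domain with one topic per tenant."""

    def __init__(self) -> None:
        self._topics: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, tenant: str, handler: Handler) -> None:
        # A consumer only ever registers on (and sees) its own tenant topic.
        self._topics[tenant].append(handler)

    def publish(self, event: Event) -> int:
        """Route the event to the tenant named in its 'topic' field."""
        handlers = self._topics.get(event["topic"], [])
        for handler in handlers:
            handler(event)
        return len(handlers)
```

The producer publishes to a single endpoint and never needs to know which subscribers exist for each tenant.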

Event Grid is still going strong and evolving into an even more powerful service!

There is always space for improvements

It goes without saying that nothing is perfect and the service can still be improved!

The product group is always open for feedback and here are some things I'd like to see be added.

Introduction of an event metadata store

An event metadata store would allow me to simply document the events that are emitted: when they will occur, what to expect and what data they will contain.

As of today, you need to subscribe to all events and see what is flowing in, assuming they are already being emitted.

Improved event discovery

The operational perspective is already pretty OK, as you have metrics on the number of events that are published, matched, dropped, etc., but they do not tell you what the event type is.

Having this capability will make it easier to discover new events, correlate failures downstream or notice the lack of certain events.
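Until such a capability exists, you could approximate it on the consumer side. The sketch below counts received events per event type and reports expected types that never showed up; both function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List


def count_by_event_type(events: Iterable[Dict[str, object]]) -> Counter:
    """Count how many events of each eventType were received."""
    return Counter(str(event["eventType"]) for event in events)


def missing_event_types(counts: Counter, expected: Iterable[str]) -> List[str]:
    """Return the expected event types that were never observed."""
    return sorted(event_type for event_type in expected if counts[event_type] == 0)
```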

Introduction of a self-service Event Portal

A self-service Event Portal would allow users to sign in and browse the events we publish, which would be an extension of the event metadata store.

This portal should be fully standalone, similar to the Azure API Management Developer Portal, and users should not need an Azure account.

Doing this would allow us to do some nifty things:

  • Make it easier to collaborate with other teams
  • Use Event Grid as a central webhook engine for our applications running in Microsoft Azure. This is already possible but you need to collaborate with your clients. For Azure Deprecation I would love to open up our internal events over time, but don't want to do the bookkeeping myself and people should have a self-service experience.
  • Allow consumers to register webhooks themselves allowing me to defer the authentication handshake to them

The power of Event Grid lies in the Azure ecosystem

What I haven't mentioned up until now is that Azure services are becoming 1st class event publishers in Event Grid out-of-the-box and for free!

That means that Azure services will emit events about what is going on in your resources, both on the control and data plane, to a central Event Grid topic. This allows you to subscribe and react to what is going on, closing the gap between your Azure infrastructure and your app.

As of today, the following services already provide events:

Azure Event Grid is becoming the heart of the Azure ecosystem, allowing you to stay up to date on what is going on in your infrastructure. Just plug in where you need to and extend.

Event Grid improves cross-service integration

Another great aspect of these built-in Azure events is that they not only allow us as customers to react to events, but other product groups can plug into them as well.

This greatly improves cross-service integration, which allows them not only to run their services more efficiently but also to be more innovative and take over certain tasks we had to do in the past.

For example, Azure Service Bus emits events when a new message is added to a queue where nobody is listening. This event is used by the Azure Logic Apps team to improve their message processing rather than constantly polling, polling, polling, ...

Better performance and cheaper since we're not wasting CPU cycles anymore!

Event Grid is only as powerful as the ecosystem

We've just seen a nice list of 9 Azure services already emitting events to Azure Event Grid, but we need more of them!

As of today, only 9 of the 100+ products in Azure are emitting them, but we need all of them to plug into Event Grid!

As an Azure customer, it feels like the Azure ecosystem is not picking up the adoption of Event Grid. Unfortunately, this is holding Event Grid back from showing its true power in the ecosystem and what it is really good at.

This is somewhat of a chicken & egg problem - Services do not really integrate because, I think, there is no real added value for their service. On the other hand, Event Grid cannot fully shine at what it does because of the other teams.

This feels a bit like a fundamental Azure ecosystem "issue" which requires a bit more guidance. For example, if you ask me, every new service should be emitting events to Azure Event Grid before they are allowed to go GA. (similar to Metrics in Azure Monitor, but that's a different story)

This will not only give customers the extension points they need but also improve the ecosystem and allow other services to plug in as we've discussed.

This opens up new scenarios which can improve the platform or simplify things for its customers:

  • Event-driven autoscaling (think KEDA, in Azure without Functions)
  • Improved Autoscaling awareness
  • Deploy ACI instance when an event occurs
  • Automated processes like certificate management, user revocation, alerts triggered, etc.

These are just simple scenarios but once we have more events in Event Grid, we can truly innovate.

A good example of this is automated certificate management.

My mind is too limited to think about all the scenarios but if we don't get the events, we cannot innovate.

Azure Key Vault shows the true power of Event Grid at the heart of the Azure ecosystem

A very good example of the power of 1st class event publishers is the Azure Key Vault.

During //BUILD/ 2019, the Azure Key Vault team announced upcoming integration with Event Grid for the data plane!

As part of this you will have the capability to subscribe to the following events:

Key Vault Events

Now, why is this such a big deal? Did you ever manage certificates? It's a pain, right?!

Last year NuGet had an outage due to an expired certificate, but hey - We all struggle with that!

NuGet Outage

No need to worry about this anymore!

With Key Vault events you can now fully automate your certificate renewal process by subscribing to the new Certificate Near Expiry / Secret Near Expiry event.

This can call an Azure Function where you generate a new certificate via PowerShell, store the new cert in Key Vault and your dependencies will use the newest version!

But there is more! Thanks to the Certificate New Version Created / Secret New Version Created event(s), other Azure services such as Azure API Management and Azure App Service can now subscribe and automatically pull in the latest version of your certificate so you don't have to worry!

If the service does not support it, we could still call another Azure Function for that event and do it ourselves via the Azure PowerShell cmdlets.

Just by adding some extensibility to Azure Key Vault through events emitted to Azure Event Grid, customers can not only streamline their processes, but other Azure services can provide a better service as well!
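The renewal flow could look roughly like the Python sketch below. Treat it as an illustration only: the event type strings and the `VaultName` / `ObjectName` data fields are my assumptions about the event schema, and `renew` is a hypothetical callback standing in for whatever generates and stores the new certificate.

```python
from typing import Any, Callable, Dict

Event = Dict[str, Any]

# Assumed event type names; check the published Key Vault event schema.
NEAR_EXPIRY_TYPES = {
    "Microsoft.KeyVault.CertificateNearExpiry",
    "Microsoft.KeyVault.SecretNearExpiry",
}


def handle_key_vault_event(event: Event, renew: Callable[[str, str], None]) -> bool:
    """Trigger a renewal for near-expiry events; ignore everything else.

    Returns True when a renewal was requested.
    """
    if event.get("eventType") not in NEAR_EXPIRY_TYPES:
        return False
    data = event["data"]
    # Assumed data fields: the vault name and the certificate/secret name.
    renew(data["VaultName"], data["ObjectName"])
    return True
```

Keeping the renewal logic behind a callback means the same handler can be reused whether the certificate is regenerated by a script, a function or another service.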

Interested in these events? You can sign up for the preview here.

It's not only about the data, but also how it can be consumed

I ask for Event Grid support a lot, and that's an understatement. One of the main responses I've got from product groups is that you can already achieve the scenario by using feature X (*) to get the information and process it!

Wow, that's great! But frankly, that's not what I'm really asking for. It is not only crucial to get our hands on the data but also how we can do that. All of the alternatives are pull-based, meaning I have to schedule a process to check if there is new information, parse it and process it.

This approach is not only cumbersome but also harder to justify, as it increases the investment in development, maintenance, operability and runtime cost of that integration. The beauty of Event Grid is that it lets us know when there are events that are of interest to us, as it provides filtering as well.

A good example of this is Azure Cosmos DB's change feed, which gives you a history of what has changed in your documents. This is a great feature which allows you to react to what is going on with your documents, in an asynchronous manner.

However, you need to fetch all the changes yourself and checkpoint which part of the data stream you have already processed, which brings some complexity. It would be good to have a similar approach with Event Grid and have them side-by-side. Event Grid would be good for reacting to smaller change events that need immediate action, while the change feed is a log of your data which you can fully replay on demand, which is also important to have for things like auditing.

So in most cases, the question is not which data store the team should use, but rather what the use case is and where the data should be surfaced.
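The difference between the two models can be sketched in Python. This is not the Cosmos DB or Event Grid SDK: `ChangeFeedReader` and `push` are hypothetical names, and the feed is just a list, purely to show where the checkpointing burden sits.

```python
from typing import Callable, Dict, Iterable, List

Change = Dict[str, object]


class ChangeFeedReader:
    """Pull model: the consumer remembers how far it has read."""

    def __init__(self, feed: List[Change]) -> None:
        self._feed = feed
        self.checkpoint = 0  # index of the first unread change

    def poll(self) -> List[Change]:
        """Fetch everything after the checkpoint, then advance it."""
        new_changes = self._feed[self.checkpoint:]
        self.checkpoint = len(self._feed)
        return new_changes

    def replay(self) -> List[Change]:
        """The log stays available, so the full history can be re-read."""
        return list(self._feed)


def push(change: Change, handlers: Iterable[Callable[[Change], None]]) -> None:
    """Push model: the change is handed to consumers as soon as it happens."""
    for handler in handlers:
        handler(change)
```

With `poll`, the consumer has to schedule calls and persist `checkpoint` itself; with `push`, it only has to be ready when the change arrives, but it cannot replay history.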

(*) Services can be Audit Logs, Activity Log, Alerts, Change feed, ARM, etc.

Events need to be straight-forward and easy to use

When I talk to product groups they often tell me that you can already get events about their service, because you have the ARM events! Unfortunately, I'd have to say that I don't fully agree with them.

Events should be straight-forward and easy to use. That means that every event should have a unique eventType on which you can subscribe that provides relevant data for that event, optionally with a subject field which tells me about the resource on which it applies.

Having unique event types makes it intuitive to process them as it indicates a given scenario. If I have to write my own parsing logic to know what is going on, then the events are "too complex" for me.

Why events can be too complex

Let's use an example to show why events can be too complex.

Azure Kubernetes Service allows you to scale the cluster in and out depending on your workloads. Since this has a cost impact, it would be good to be notified of this. Unfortunately, AKS does not emit events.

Now, when AKS scales, it actually scales the hidden Azure infrastructure that it manages for you. However, if you subscribe to ARM events you will see all those actions occur.

Here is a sample of a resource write event from the ARM API for a managed cluster:

{
    "subject": "/subscriptions/{subscription-id}/resourcegroups/{resource-group}/providers/Microsoft.ContainerService/managedClusters/{cluster-name}",
    "eventType": "Microsoft.Resources.ResourceWriteSuccess",
    "eventTime": "2019-05-26T17:15:29.8670898Z",
    "id": "668514e8-3ea5-4951-bab0-45f59a8f0caa",
    "data": {
        "authorization": {
            "scope": "/subscriptions/{subscription-id}/resourceGroups/{resource-group}/providers/Microsoft.ContainerService/managedClusters/{cluster-name}",
            "action": "Microsoft.ContainerService/managedClusters/write",
            "evidence": {
                "role": "Subscription Admin"
            }
        },
        "claims": {
            "aud": "https://management.core.windows.net/",
            "iss": "https://sts.windows.net/c8819874-9e56-4e3f-b1a8-1c0325138f27/",
            "iat": "1558890042",
            "nbf": "1558890042",
            "exp": "1558893942",
            "http://schemas.microsoft.com/claims/authnclassreference": "1",
            "aio": "{obfuscated}",
            "altsecid": "1:live.com:000600009B458F6D",
            "http://schemas.microsoft.com/claims/authnmethodsreferences": "pwd",
            "appid": "{obfuscated}",
            "appidacr": "2",
            "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/emailaddress": "tom.kerkhove@hotmail.com",
            "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/surname": "Kerkhove",
            "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/givenname": "Tom",
            "groups": "{obfuscated}",
            "http://schemas.microsoft.com/identity/claims/identityprovider": "live.com",
            "ipaddr": "{obfuscated}",
            "name": "Tom Kerkhove",
            "http://schemas.microsoft.com/identity/claims/objectidentifier": "2f27cd8c-fb42-4af3-8f70-ece2b396f220",
            "puid": "1003BFFD876E720C",
            "http://schemas.microsoft.com/identity/claims/scope": "user_impersonation",
            "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/nameidentifier": "QFO588d0yWBTkNGeY3EcblOT8IrMMakb_sYoaNjoZBs",
            "http://schemas.microsoft.com/identity/claims/tenantid": "c8819874-9e56-4e3f-b1a8-1c0325138f27",
            "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/name": "live.com#tom.kerkhove@hotmail.com",
            "uti": "EqkTzvzDjka9RtUKYE4aAA",
            "ver": "1.0",
            "wids": "62e90394-69f5-4237-9190-012177145e10"
        },
        "correlationId": "4907626b-2952-4ad6-91bb-17134c4b9611",
        "resourceProvider": "Microsoft.ContainerService",
        "resourceUri": "/subscriptions/{subscription-id}/resourcegroups/{resource-group}/providers/Microsoft.ContainerService/managedClusters/{cluster-name}",
        "operationName": "Microsoft.ContainerService/managedClusters/write",
        "status": "Succeeded",
        "subscriptionId": "{subscription-id}",
        "tenantId": "c8819874-9e56-4e3f-b1a8-1c0325138f27"
    },
    "dataVersion": "2",
    "metadataVersion": "1",
    "topic": "/subscriptions/{subscription-id}/resourcegroups/{resource-group}"
}

These events certainly give us insights, but only to a degree as it's not straightforward - We still need to interpret the generic ARM events, check if they are related to our Kubernetes cluster being scaled and, if so, try to determine what the new instance count is.

Ideally, I would receive an event like the following:
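To show what that interpretation step looks like, here is a small Python sketch that inspects a generic ARM event such as the sample above. The function name `is_aks_cluster_write` is hypothetical; the field names and values are taken from the sample.

```python
from typing import Any, Dict

AKS_WRITE_OPERATION = "Microsoft.ContainerService/managedClusters/write"


def is_aks_cluster_write(event: Dict[str, Any]) -> bool:
    """Check whether a generic ARM event is a successful write on an AKS cluster.

    Even when this returns True, the payload does not say whether the write
    was a scale operation or what the new instance count is.
    """
    data = event.get("data", {})
    return (
        event.get("eventType") == "Microsoft.Resources.ResourceWriteSuccess"
        and data.get("operationName") == AKS_WRITE_OPERATION
    )
```

Note that a match only narrows things down to "something was written on the cluster"; finding out what changed still requires a follow-up call or extra bookkeeping.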

[
  {
    "topic": "/subscriptions/{subscription-id}/resourceGroups/demo/providers/Microsoft.ContainerService/managedClusters/toms-cluster",
    "subject": "/clusters/toms-cluster",
    "eventType": "Microsoft.KubernetesService.ClusterScaled",
    "eventTime": "2019-05-02T13:37:00.9584103Z",
    "id": "831e1650-001e-001b-66ab-eeb76e069631",
    "data": {
      "newInstanceCount": "10",
      "oldInstanceCount": "12",
      "initiatedBy": "bill.bracket@sello.com"
    },
    "dataVersion": "1.0",
    "metadataVersion": "1"
  }
]

Nice and simple, straight to the point!
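With a dedicated event type, the consumer code becomes trivial. The Python sketch below handles the wished-for event; keep in mind that `Microsoft.KubernetesService.ClusterScaled` and its data fields are the hypothetical schema from this post, not something AKS emits, and `describe_scaling` is an illustrative name.

```python
from typing import Any, Dict


def describe_scaling(event: Dict[str, Any]) -> str:
    """Summarize the hypothetical ClusterScaled event in one sentence."""
    if event["eventType"] != "Microsoft.KubernetesService.ClusterScaled":
        raise ValueError(f"Unexpected event type: {event['eventType']}")
    data = event["data"]
    # The sample carries the counts as strings, so convert them first.
    old = int(data["oldInstanceCount"])
    new = int(data["newInstanceCount"])
    if new > old:
        direction = "out"
    elif new < old:
        direction = "in"
    else:
        direction = "to the same size"
    return f"{event['subject']} scaled {direction} from {old} to {new} nodes"
```

No parsing of authorization claims or operation names is needed; the event type alone tells us which scenario occurred.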

For what it's worth, Azure Kubernetes Service is just an example of a service where this is the case. They are good folks and always open for feedback; it's just a matter of setting the right priorities based on demand.

This is only the beginning

I truly believe that we haven't seen the best of Azure Event Grid - It has so much potential, some of which I have already discussed.

Closer integration across Azure services

Closer integration across Azure services would allow them to automate a lot more for their customers and make their services even more PaaS than they were before.

Using Event Grid as a webhook/event router for 3rd parties

Using Event Grid as a webhook/event router for 3rd parties would allow our applications to just emit updates, to which our customers, who may or may not be Azure customers, can subscribe.

Customers can then log in to an Event (Dev) Portal which allows them to browse for the events in our platform and get more information about when they occur, what the subject format is and what the event schema looks like. This could be an "Event-focused" version of Azure API Management's Developer Portal.

This information would be served from an Event Catalog which provides more information about the events themselves. Think in terms of an OpenAPI spec, but for events, which is something that could be defined via the CloudEvents project or another CNCF project.

The goal is to allow customers to have a fully self-service model to discover and subscribe to the events they need via the Event Portal, while the service provider only has to push its events and annotate them in the Event Catalog.

Event Grid could form the foundation for a serverless business rules engine

Subscribing to events can feel like drinking from a firehose, so it's crucial that you know what you are doing and only subscribe to the events you need.

Event filtering is important here: you can subscribe to only one or more event types, or even filter based on the subject of the event.
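A filter along these lines can be sketched in Python as follows. `SubscriptionFilter` is a hypothetical class; its three options are modelled on filtering by event type and by the beginning or end of the subject, and an empty value means "do not filter on this".

```python
from dataclasses import dataclass
from typing import Any, Dict, FrozenSet


@dataclass(frozen=True)
class SubscriptionFilter:
    """Hypothetical subscription filter on event type and subject."""

    included_event_types: FrozenSet[str] = frozenset()  # empty = all types
    subject_begins_with: str = ""
    subject_ends_with: str = ""

    def matches(self, event: Dict[str, Any]) -> bool:
        if self.included_event_types and event["eventType"] not in self.included_event_types:
            return False
        subject = event.get("subject", "")
        # Empty prefix/suffix always matches, so unset options are no-ops.
        return subject.startswith(self.subject_begins_with) and subject.endswith(
            self.subject_ends_with
        )
```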

However, I see a lot of value in a serverless business rules engine which allows me to take filtering & aggregation a step further by running business rules on top of events over a given timeframe - Think Stream Analytics, but for events!
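As a thought experiment, one such rule could be "fire when at least N events arrive within a time window". The Python sketch below implements that single rule in memory; `ThresholdRule` is a hypothetical name, and it assumes timestamps (in seconds) are observed in increasing order.

```python
from collections import deque
from typing import Deque


class ThresholdRule:
    """Fires when at least `threshold` events arrive within `window_seconds`."""

    def __init__(self, threshold: int, window_seconds: float) -> None:
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._timestamps: Deque[float] = deque()

    def observe(self, timestamp: float) -> bool:
        """Record one event and report whether the rule fires."""
        self._timestamps.append(timestamp)
        # Drop events that fell out of the sliding window.
        cutoff = timestamp - self.window_seconds
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps) >= self.threshold
```

For example, a rule created as `ThresholdRule(3, 60)` stays quiet for two failed-login events and fires on the third one arriving within the same minute.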

Use Event Grid as a centralized event hub and build a 3rd party event ecosystem

Azure Event Grid is a centralized hub for all events, either emitted by your own applications with custom Event Topics or by Azure services.

But why stop there?

What if we would extend the idea and allow 3rd parties to emit events to your Event Grid topic, giving you a centralized hub for all the events that are important to your application?

I see a big opportunity to create a 3rd party event ecosystem which allows Azure customers to subscribe to events in 3rd party platforms such as GitHub, SAP, Office 365, Dynamics, Salesforce, DocuSign, etc. and flow these in our topic so we can use the same mechanics to process them.

This allows Azure customers to have a central place for all their events and very easily extend applications. It would also improve other Azure services such as Azure Logic Apps, which could improve their runtime by simply reacting to events inside Azure instead of polling these 3rd parties, without needing to know how to authenticate with them.

But what's in it for the 3rd parties? Not much at first glance, other than making it easier for their customers to use and integrate with their platform, allowing them to fully automate flows.

For example, if a deal is won in Salesforce we can automatically create a new agreement in DocuSign and send it out to them. Once it's signed, we can send a thank you email and start working by creating an Azure DevOps/GitHub project, etc.

Should all 3rd parties integrate with Azure to leverage this capability? Preferably yes! But if they don't have the bandwidth for this, Azure could provide a simple serverless "mapping" service which maps one contract to another, or just handle this via Logic Apps behind the scenes for its customers.
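Such a mapping could be as small as the Python sketch below. Everything here is hypothetical: the 3rd party payload shape (`kind`, `resource`, `details`), the `ThirdParty.` event type prefix and the function name are made up for illustration; only the target fields follow the Event Grid schema shown earlier.

```python
import uuid
from datetime import datetime, timezone
from typing import Any, Dict


def map_to_event_grid(payload: Dict[str, Any], topic: str) -> Dict[str, Any]:
    """Map a hypothetical 3rd party webhook payload to the Event Grid schema."""
    return {
        "id": str(uuid.uuid4()),
        "topic": topic,
        "subject": payload["resource"],
        "eventType": f"ThirdParty.{payload['kind']}",
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "data": payload.get("details", {}),
        "dataVersion": "1.0",
    }
```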

Conclusion

Azure Event Grid is a foundational service which serves as the central eventing router for Azure services and your application.

It does not only allow you to build reactive applications, but also lowers the barrier between your app and the infrastructure it is running on.

Another great aspect is how easy it is to start using and consuming it! Over time, I see this becoming a built-in webhook router towards our 3rd party systems as well, so that we don't need to do the bookkeeping; they just subscribe to whatever they want to know!

However, the success of Azure Event Grid lies in the ecosystem - Every Azure service should plug in and become a 1st party publisher.

Am I trying to bash the services that are not integrated yet? Certainly not! I'd just love to see all Azure services emit events to Event Grid.

Azure Event Grid is the foundation for the Azure ecosystem and makes it more successful; let's make it great(er) together.

Thanks for reading,

Tom.

Photo by Juan Davila on Unsplash