GDPR, Event Sourcing and immutable events

Event Sourcing, Software Development

Data privacy was being addressed well before the internet era, and there has never been a lack of legislation. The problem is that legislation has not been very effective at preventing data breaches or at enforcing the rights of individuals against companies and governments.

Looking at a bit of history, we can see how the US and Europe have diverged in their approaches to data privacy.

GDPR history

In the US, legislation pays greater attention to not blocking companies from doing their job. This is probably why US legislation is made up of several different pieces of law: data privacy is covered by a sort of patchwork that often allows companies and government agencies to work around limits and take advantage of exceptions.

In Europe, there is more attention to the rights of individuals, despite the business need for data. The main law used to be the Data Protection Directive, which defined a set of principles. Now the GDPR is a directly applicable law that, from the 25th of May, will be the main umbrella protecting individuals and their data.

The GDPR incorporates and better defines the existing protections, and it strengthens the ability of individuals to claim these rights in a legal battle. It is not a technical document, and it doesn't innovate much in terms of techniques to avoid data breaches. In fact, it treats data breaches as something possible: data breaches will happen. What will make the difference is whether or not your company is compliant with the law when a data breach happens.

GDPR Entities

How can a company be compliant with GDPR?

Review your Privacy Policy document and explicitly ask your customers for consent. Under this law, you can only hold personal data that are strictly required, and only for the time they are required. While reviewing the policy, try to define a map of all the personal data, and eventually remove or anonymize data that are not strictly required. This work will also help you comply with the right to data portability, as an individual can ask you for a document that aggregates all of his or her data.

Appoint a Data Protection Officer who will be responsible for enforcing respect of the law.

Define a strategy to handle the cases where an individual claims one of these rights.

The Right to Erasure (Art. 17)

It entitles the data subject to have the data controller erase his/her personal data, cease further dissemination of the data, and potentially have third parties cease processing of the data.

GDPR Right to be forgotten

The Right to Data Portability (Art. 20)

You have the right to receive your personal data from an organization in a commonly used form so that you can easily share it with another.

GDPR Data Portability

The Right to Restriction of Processing (Art. 18)

Unless it is necessary for a law or a contract, decisions affecting you cannot be made on the sole basis of automated processing (strictly speaking this protection is spelled out in Art. 22; Art. 18 lets you require that the processing of your data be restricted).

In my words, use this right if you want to avoid being tagged on Facebook 😉

GDPR restrict data processing

GDPR restrict data processing

Your organizational GDPR strategies

  • Create or review your data privacy Policy and ask for Consent
  • If your company has 250 or more employees, or works with sensitive data, appoint a Data Protection Officer who will be responsible for ensuring that the obligations under the GDPR are being met
  • Set a plan for each of the new rights that the GDPR enforces

Software Strategies to comply with GDPR

…in case you are working with immutable events

  • Encryption
  • One stream per entity
  • Set retention period when possible
  • Provide a single view of the data

Always try to use

  • $correlationId's
  • a map of where the personal data are stored

Encryption

Encrypt the data related to a particular person with a symmetric key.
Delete the key when the person claims the Right to be Forgotten.

GDPR streams encryption
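To illustrate, here is a minimal C# sketch of this technique (sometimes called crypto-shredding). The in-memory key store and the event payload are assumptions made for the example; in production the keys would live in a proper key management store.

using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;

public class PersonalDataProtector
{
    // Hypothetical per-person key store: deleting an entry "forgets" that person
    private readonly Dictionary<string, byte[]> _keys = new Dictionary<string, byte[]>();

    public byte[] Encrypt(string personId, byte[] eventPayload)
    {
        if (!_keys.ContainsKey(personId))
            using (var newKey = Aes.Create())
                _keys[personId] = newKey.Key;

        using (var aes = Aes.Create())
        {
            aes.Key = _keys[personId];
            using (var ms = new MemoryStream())
            {
                ms.Write(aes.IV, 0, aes.IV.Length); // prepend the IV for decryption
                using (var cs = new CryptoStream(ms, aes.CreateEncryptor(), CryptoStreamMode.Write))
                    cs.Write(eventPayload, 0, eventPayload.Length);
                return ms.ToArray();
            }
        }
    }

    // Right to Erasure: once the key is gone, the stored events are unreadable
    public void Forget(string personId) => _keys.Remove(personId);
}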

One stream per entity

Keep the data in well defined, separate streams. Delete the related streams when the person claims the Right to be Forgotten.

GDPR one stream per entity
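A sketch of how this can look with the EventStore .NET TCP client (EventStore.ClientAPI); the stream name and event type are illustrative.

using System;
using System.Text;
using System.Threading.Tasks;
using EventStore.ClientAPI;

public class PersonStreamsExample
{
    public static async Task Run()
    {
        var conn = EventStoreConnection.Create(new Uri("tcp://admin:changeit@localhost:1113"));
        await conn.ConnectAsync();

        // One stream per person keeps all of their personal data in one boundary
        var evt = new EventData(Guid.NewGuid(), "PersonalDataAdded", true,
            Encoding.UTF8.GetBytes("{\"name\":\"John\"}"), null);
        await conn.AppendToStreamAsync("person-42", ExpectedVersion.Any, evt);

        // Right to Erasure: delete the whole stream for that person
        await conn.DeleteStreamAsync("person-42", ExpectedVersion.Any);
    }
}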

Set a retention period on incoming streams

Items older than the $maxAge set on the stream will automatically be removed from the data collection streams.

GDPR retention period
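With the same client, the retention period can be expressed as $maxAge stream metadata. A sketch, reusing the conn connection from the previous example and an illustrative 30-day window:

// Events older than 30 days become eligible for removal during scavenging
await conn.SetStreamMetadataAsync("inbound-data", ExpectedVersion.Any,
    StreamMetadata.Create(maxAge: TimeSpan.FromDays(30)));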

Provide a single view of the data

GDPR single view of data

Data ingestion with Logstash and EventStore

Event Sourcing, Software Development

Event Store is a database to store data as events in data streams

It allows you to implement CQRS and Event Sourcing. It also allows components to exchange messages using a simple Pub/Sub pattern.

It's open source: there is a free-to-use community edition as well as a commercial license with support.

This article is the written version of a Webinar and it is about data ingestion.

The challenge

To ingest data from any CSV file: load the content and send it to the Event Store HTTP API.

To be precise, the challenge is to convert this

Europe,Italy,Clothes,Online,M,12/17/2013,278155219,1/10/2014,1165,109.28,35.84,127311.20,41753.60,85557.60

into this (an example of an HTTP POST to the Event Store HTTP API):

Method POST: http://localhost:2113/streams/inbound-data
Content-Type: application/vnd.eventstore.events+json

[
  {
    "eventId": "fbf4b1a1-b4a3-4dfe-a01f-ec52c34e16e4",
    "eventType": "InboundDataReceived",
    "data": {
      "message": "Europe,Italy,Clothes,Online,M,12/17/2013,278155219,1/10/2014,1165,109.28,35.84,127311.20,41753.60,85557.60"
    },
    "metadata": { "host": "box-1", "path": "/usr/data/sales.csv" }
  }
]

As you can see, we need to set several different parts of the HTTP POST: the unique eventId, plus the enrichment of each message with some more metadata like the host and the file path.

These data will be precious later, both to find the specific mapping for that particular data source and to keep track of the origin.
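In the pipeline below, Logstash produces this POST for us, but as a sketch this is roughly what it looks like hand-rolled in C# with HttpClient:

using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

public class InboundPosterExample
{
    public static async Task Run()
    {
        var line = "Europe,Italy,Clothes,Online,M,12/17/2013,278155219,1/10/2014,1165,109.28,35.84,127311.20,41753.60,85557.60";
        var json = "[{\"eventId\":\"" + Guid.NewGuid() + "\"," +
                   "\"eventType\":\"InboundDataReceived\"," +
                   "\"data\":{\"message\":\"" + line + "\"}," +
                   "\"metadata\":{\"host\":\"box-1\",\"path\":\"/usr/data/sales.csv\"}}]";

        using (var client = new HttpClient())
        {
            var content = new StringContent(json, Encoding.UTF8, "application/vnd.eventstore.events+json");
            var response = await client.PostAsync("http://localhost:2113/streams/inbound-data", content);
            Console.WriteLine(response.StatusCode); // 201 Created on success
        }
    }
}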

Data Ingestion from CSV to Microservices

The API Gateway is our internal data collection endpoint. In my opinion it must be a simple HTTP endpoint; there is no need for a heavyweight framework that can potentially add complexity and little value.

In this case we are using the existing Event Store HTTP API as the API Gateway, and Logstash to monitor folders and convert the CSV lines into JSON messages.

Logstash is a powerful tool that can keep track of the last ingested position, in case something goes wrong, and of file changes.

There are other tools available for this task, like Telegraf, but I have more experience with Logstash.

Once the messages are flowing through the API Gateway, we can implement Domain Components or Microservices that subscribe to these messages and provide validation, mapping and Domain business logic. They will then save their results in their own data streams as Domain Events. To avoid persisting the data in the inbound stream, you can also optionally set a retention period after which the messages are automatically removed from the system.

  1. Download Logstash and Event Store
  2. Create a folder "inbound", then create a new logstash.yml file in it and copy in the content of my gist https://gist.github.com/riccardone/18c176dab737631c9a216e89b79acdb7
  3. If you are on Windows, create a logstash.bat file to easily run Logstash. The content depends on your paths, but in my case it is:
    C:\services\logstash-6.2.4\bin\logstash -f C:\inbound\logstash.yml
  4. From a command line, run logstash
  5. Create a subdirectory inbound/<yourclientname>. Example: c:\inbound\RdnSoftware
  6. Copy a csv file into the folder and let the ingestion start

With this Logstash input configuration, when you have a new client you can just add another folder and start ingesting its data. For production environments you can combine Logstash with Filebeat to distribute part of the ingestion workload across separate boxes.

In some scenarios, especially when the data has not yet been explored and you don't yet have a mapper, it can be easier to direct the data straight into a semi-structured data lake for further analysis.

Here is the webinar
https://youtu.be/2nyzvrIdnPg

Hope that you enjoy my little data ingestion example.

Domain Driven Design, Event Sourcing and Micro-Services explained for developers

Event Sourcing

And when I speak of the other division of the intelligible, you will understand me to speak of that other sort of knowledge which reason herself attains by the power of dialectic, using the hypotheses not as first principles, but only as hypotheses — that is to say, as steps and points of departure into a world which is above hypotheses, in order that she may soar beyond them to the first principle of the whole

Plato’s Republic

Here are my notes about Domain Driven Design, Event Sourcing and Micro-Service architecture and the relations between these concepts.

I'd like to write it down in order to offer the people I work with a way to get on board with my software approach for Distributed Systems, and how to shape them out of the darkness. It's not a technical "how to" guide to building a Micro-Service; instead it's my explanation of Domain Driven Design concepts and Event Sourcing. There are other concepts that are essential for a Micro-Service architecture, like horizontal scalability, that are not discussed here.

Code example

Table of contents

What are MicroServices Bounded-Contexts Aggregates?
What are the relations between them?
What is a Process Manager and do I need it?
Micro-Services interactions with the external world
Event Sourcing in a Micro-Service architecture
Difference between internal and external events
Ubiquitous language
Behaviors vs Data structures
Keep out of the Domain the external dependencies
Stay away from Shared Libraries across components

What are MicroServices Bounded-Contexts Aggregates?

  • A Micro-Service is a process that does some work around a relevant business feature
  • A Component is a process that does some work around a relevant business feature
  • A Bounded Context is a logical view of a relevant business feature
  • An Aggregate is where the behaviours (functions) related to a relevant business feature are located.
  • A relevant Business Feature for the sake of this article is composed by one or more related tasks that execute some logic to obtain some results. From a more technical point of view, a task is a function. From a less technical¬†point of view it could be a¬†requirement.

What are the relations between them?

A Bounded Context is a relevant business feature, but it doesn't represent a running process. A running process belongs to the "physical" world and is better represented by the terms Micro-Service or Component; the Bounded Context belongs to the "logical" world. These three terms are easily confused, but they represent different points of view on the same "relevant" business feature, such as "As a user I want to fill a basket with Products so I can check out the bill and own them".
What does relevant mean then? This term is used to differentiate infrastructure services from services that the business domain experts expect and can understand. Is a proxy service that calls a third party service a relevant business feature? Probably not. It is better to define a more understandable Basket Bounded Context contained in a Micro-Service interacting with the other components.
In the terminology that I use day to day, I often mix the physical with the logical using the term "Domain Component".

To summarize

  • I use both the Micro-Service and Component terms to indicate a physical running process
  • I use the term Bounded Context to indicate the boundary of a relevant business feature
  • I use the term Domain Component to indicate a relevant business feature within a running process when there is no need to differentiate between physical and logical views

The size of a component or a Micro-Service can be narrowed down to a single task or function, but there can be a wealth of behaviour if the requirement is complex. Within the same Micro-Service there is an Application Service that behaves as an orchestrator, deciding which tasks or functions need to be called depending on the received messages.

There is a good metaphor describing this layering: the Onion Architecture.
If we want to describe the Onion on a whiteboard, we can draw the external layer as an Application Service Endpoint that subscribes and listens for events and converts them to commands. The middle layer, the Handler, handles the commands and calls the behaviours (functions) exposed by the Aggregate. The Aggregate contains the logic related to the relevant business feature.

Event Sourcing, Domain Driven Design, CQRS
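To make the layering concrete, here is a minimal C# sketch of the three layers; all names are invented for the example.

// External event and command (illustrative shapes)
public class ProductAddedToCatalogue { public string ProductId { get; set; } }
public class AddProductToBasket { public string ProductId { get; set; } }

// Outer layer: the Application Service Endpoint converts events to commands
public class BasketEndpoint
{
    private readonly BasketCommandHandler _handler = new BasketCommandHandler();

    public void On(ProductAddedToCatalogue evt) =>
        _handler.Handle(new AddProductToBasket { ProductId = evt.ProductId });
}

// Middle layer: the Handler calls the behaviour exposed by the Aggregate
public class BasketCommandHandler
{
    public void Handle(AddProductToBasket cmd) => Basket.AddProduct(cmd.ProductId);
}

// Core: the Aggregate contains the logic of the relevant business feature
public static class Basket
{
    public static void AddProduct(string productId)
    {
        // domain logic lives here
    }
}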

Generally speaking, it's better to keep the size as small as we can. When I start shaping the Domain Model (Domain Model = group of components), I will probably start by grouping more than one Aggregate in the same component. After a little while, the daily discussions with other devs and business stakeholders show me whether they are using the component names easily or not. This can drive me towards a more granular size, and it's easier to split a fat Bounded Context into small parts.

What is a Process Manager and do I need it?

Another important aspect of a distributed architecture is the need for some sort of 'Process Manager'. If you are building a simple ingestion system that goes from a set of files down to the processing component and out to the report synchronizers (CQRS), then maybe you can avoid building one.
It becomes useful when there are multiple components or legacy subsystems involved and the coordination between them depends on unpredictable conditions. In that case…

  • Domain Components are responsible for processing commands and raising events
  • A Process Manager listens for events and sends commands.

Consider a Process Manager as a sort of Application Service on top of the other components. It can contain one or more stateful aggregates that keep the state of a long-running process, depending on requirements.

Micro-Services interactions with the external world

Through messages, of course. The pattern could be a simple Event Driven Pub/Sub. Why not Request/Response? We can use the Request/Response pattern by defining an HTTP API, but this is more suitable when we need to expose service endpoints outside our domain, not within it. Within our domain, the components can better use a Publish/Subscribe mechanism where a central message broker is responsible for dispatching commands and events around the distributed system. The asynchronous nature of Pub/Sub is also a good way to achieve resiliency when our apps are not always ready to process or send messages to others.

Event Sourcing in a Micro-Service architecture

With this Event Driven Pub Sub architecture we are able to describe the interaction between components. Event-Sourcing is a pattern to describe how one component processes and stores its results.

In other words, Event-Sourcing describes something that happened within a single component, whereas Pub/Sub is a pattern used to handle interactions between components.

Moving our focus down inside a single Event Sourced bounded context, we can see how it stores time-ordered internal domain events into streams. The business results are represented by a stream of events. The stream becomes the boundary of the events correlated to a specific instance of the relevant business feature.

Event Sourcing example

In a Distributed Architecture not all components are Event Sourced. There could be components used as proxy services to call external services, or components that are not required to keep track of changes over time. In that case, they can store their results by updating the current state in a store like MongoDb, Neo4j or even Sql Server, and publish external events to communicate with the rest of the interested components.

Difference between internal and external events

In a Distributed Architecture, any component can potentially publish External Events when something has happened, and any interested component can subscribe to those events and handle them if required. In that scenario, when I change the content of an external event I must carefully consider any breaking change that could stop other components from handling those events correctly.

In a Distributed Architecture, any Event Sourced component stores its internal Domain Events in its own streams. Considering one Event Sourced component, if I change the schema of its internal events I can only break that component itself. In that scenario I am freer to change the Internal Domain Events.

It can be beneficial to have a versioning strategy in place for changing the External Events, in order to reduce the friction with other components.

But not all changes are breaking changes. For breaking changes, like removing or changing an existing field, it's easier to create a new Event, as in the sketch below.
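A minimal sketch of the idea, with invented event names:

// Version 1 of an external event, already consumed by other components
public class CustomerRegistered
{
    public string Name { get; set; }
}

// Splitting Name into two fields would be a breaking change,
// so it is published as a new event type instead of changing the old one
public class CustomerRegisteredV2
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}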

Ubiquitous language

There is a need to define a ubiquitous language when you start shaping your domain components. In my experience as a developer, you can get to the right terminology by discussing use cases day after day with business stakeholders and other expert devs. You can also introduce more structured types of discussion using Event Storming techniques.

Slowly you can see the picture coming out, and then just start building the components. If I dedicate too much time to drawing components and interactions on paper I don't get it right, so it's better to balance theory with early proofs of concept.

Behaviours vs Data structures

Another difference when doing DDD, compared with an old school layered architecture, is that I need to stop thinking about the data. The data is not important; what is important are the behaviours. Think about what you have to do with the data and what business result is expected, instead of which tables and relations you need to create to store it. The behaviours that you identify will be exposed by one or more aggregates as functions. The sketch below contrasts the two mindsets.
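A minimal C# illustration, with invented names:

using System.Collections.Generic;

// Data-centric thinking (to avoid): a bag of properties that callers manipulate
public class BasketData
{
    public List<string> ProductIds { get; set; } = new List<string>();
}

// Behaviour-centric thinking: the aggregate exposes what the business needs done
public class Basket
{
    private readonly List<string> _productIds = new List<string>();

    public void AddProduct(string productId) => _productIds.Add(productId);

    public decimal CheckOut()
    {
        // compute and return the bill for the products in the basket
        return 0m; // placeholder
    }
}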

Keep out of the Domain the external dependencies

My domain project contains one or more aggregates. It is a good approach to try to keep this project clean of any external dependencies. Expand the list of references: can you see a data storage library or some proxy client for external services in there? Remove them and instead define an abstraction interface in the Domain project. Your external host process will be responsible for injecting concrete implementations of these dependencies.

In other words everything points in to the Domain. The Domain doesn’t point out.
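As a sketch, with invented names:

// In the Domain project: only the abstraction lives here
public class Basket { /* aggregate behaviours live here */ }

public interface IBasketRepository
{
    void Save(Basket basket);
}

// In the host process: the concrete implementation with the external dependency
public class MongoBasketRepository : IBasketRepository
{
    public void Save(Basket basket)
    {
        // MongoDB driver calls live here, outside the Domain project
    }
}

// The host injects the dependency; the Domain only ever sees the interface
public class BasketApplicationService
{
    private readonly IBasketRepository _repository;

    public BasketApplicationService(IBasketRepository repository) => _repository = repository;
}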

Stay away from Shared Libraries across components

The DRY coding principle is valid within one domain component but not between different components. This principle, Don't Repeat Yourself, keeps you on the right track during development of related classes with a high level of cohesion, and that is a good approach. But in a distributed architecture made of multiple separate Domain Components, it's much better if you don't apply the DRY principle across them in a way that ties all the components to a shared library (language, version, features).

I personally broke this rule and I always regret having done so. As an example, I created a shareable infrastructure library containing a few classes to offer an easy and fast way to build up well integrated aggregates and message handlers. This library doesn't contain any external dependency; it's just a bunch of classes and functions.

Once it is stable I could probably accept attaching it to all the C# components and follow the same approach in Java. In F#, Scala or Clojure I don't really need it, or I could provide a slightly different set of features.

Instead of sharing it, it’s better to include the code in each single Domain Component (copy and paste). Keep it simple and let each single Domain Component evolve independently.

Enjoy!

External References

Related code from my Github repository

Thank you

A big thank you to Eric Evans, Greg Young, Udi Dahan for all their articles and talks. Whatever is contained in this post is my filtered version of their ideas.

Run EventStore OSS on an EC2 AWS instance

Event Sourcing

This post is a quick step-by-step guide to running a single EventStore https://geteventstore.com/ node on a free EC2 micro AWS instance. You can spin this up quite easily in order to start using the EventStore Web console and play with it for testing. In a real scenario, you must use a larger EC2 instance type, as a single node or as a cluster of nodes behind a proxy and load balancer.

There is a document in the EventStore documentation that you can follow; this micro-article is just a variation of it.

Let’s start.

Launch an AWS instance

  1. Open AWS Web console and select the “Launch instance” action
  2. Select “Ubuntu Server …” from the Quick Start list
  3. Select t2.micro free tier
  4. Click next until the “Configure Security Group”
  5. Add the following rules
    22 (SSH) 0.0.0.0/0
    1113 0.0.0.0/0
    2113 0.0.0.0/0
  6. Select “Review and Launch” and “Launch”
  7. Choose an existing Key Pair or create a new one. You will use this later to connect with the instance. Select “Launch” to finally launch the instance.

Install and Run EventStore

To connect to the AWS Ubuntu instance, select the instance from the list in your EC2 Web console and select the "Connect" action. This will open a window with the link that can be used from a command line. I'm using Windows and I have Git installed on my machine, so I can use the Git Bash command line and ssh.

  1. Open Git Bash in the folder where your Key Pair key is stored
  2. To connect to your instance, select your running instance in the AWS web console, open the Connection window, and copy and paste the SSH example connection link into your bash command line
  3. Get the latest OSS EventStore binaries. You can find the link to the latest from the GetEventStore website Downloads page. At the moment of this writing, the Ubuntu link is EventStore-OSS-Ubuntu-14.04-v3.9.3.tar.gz. Copy this link address and, in your connected bash command line, run the following command to download the tar.gz with the EventStore binaries
> wget http://download.geteventstore.com/binaries/EventStore-OSS-Ubuntu-14.04-v3.9.3.tar.gz

Run the following command to unpack the tar.gz

> tar xvzf *.tar.gz

cd into the EventStore folder and start the single EventStore node with a parameter to bind the service to all IPs

> ./run-node.sh --ext-ip 0.0.0.0

Now you can connect to your instance's public DNS address on port 2113 and see the EventStore Web Console.

Functional Domain Driven Design and Event Sourcing example in C#

Event Sourcing

This is an example of an aggregate written in C# following a functional approach, similar to what is possible with languages like F#, Scala or Clojure.

What it IS NOT in the Domain code example:

  • An AggregateBase class
  • An Aggregate Stateful class

What it IS in the Domain code example:

  • An Aggregate with some functions that take a command as an input and return correlated events

You can see the codebase here https://github.com/riccardone/EventSourcingBasket

Current State is a Left Fold of previous behaviours

Using a 'pure' functional language, you would probably follow the simplest possible form, which is

f(history, command) -> events
(where history is the series of past events for a single Aggregate instance)

C# example from the Basket Aggregate Root

public static List<Event> Buy(List<Event> history, AddProduct cmd)
{
    // Some validation and some logic
    history.Add(new ProductAdded(Guid.NewGuid().ToString(), cmd.Id, cmd.CorrelationId, cmd.Name, cmd.Cost));
    return history;
}

Outside the domain, you can then save the events using a Repository, composing the left fold of previous behaviours:

yourRepository.Save(Basket.CheckOut(Basket.Buy(Basket.Create(cmd1), cmd2), cmd3));
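As an illustration, the current state can be derived with a LINQ Aggregate, which is exactly a left fold. BasketState and the trimmed-down ProductAdded below are assumptions made for the sketch, not the shapes used in the repository.

using System.Collections.Generic;
using System.Linq;

// Trimmed-down version of the event used above, for the sake of the example
public class ProductAdded
{
    public decimal Cost { get; }
    public ProductAdded(decimal cost) { Cost = cost; }
}

public class BasketState
{
    public decimal Total { get; private set; }

    public BasketState Apply(object evt)
    {
        if (evt is ProductAdded productAdded)
            Total += productAdded.Cost;
        return this;
    }
}

public static class BasketProjection
{
    // Current state = left fold of the past events of one aggregate instance
    public static BasketState Replay(IEnumerable<object> history) =>
        history.Aggregate(new BasketState(), (state, evt) => state.Apply(evt));
}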

CausationId and CorrelationId and the current state of an Aggregate Root

The CorrelationId is used to relate events together in the same conversation, while the CausationId tells you which command caused a given event. The same CorrelationId is shared across all the correlated events. In a conversation, events can share one command id as CausationId, or different events can have different command ids, depending on the execution flow. CorrelationId and CausationId can become an integral part of a Domain Event, beside the unique EventId.

CausationId And CorrelationId example
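As an illustration, a Domain Event can carry the three ids like this (a sketch, not the exact shape used in the repository):

public class ProductAdded
{
    public string EventId { get; }        // unique for this event
    public string CausationId { get; }    // the id of the command that caused it
    public string CorrelationId { get; }  // shared by every event in the conversation

    public ProductAdded(string eventId, string causationId, string correlationId)
    {
        EventId = eventId;
        CausationId = causationId;
        CorrelationId = correlationId;
    }
}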

Domain Driven Design: Commands and Events Handlers different responsibilities

Event Sourcing

“Commands have an intent of asking the system to perform an operation where as events are a recording of the action that occurred…”
Greg Young, CQRS Documents Pag. 26

In this post I'd like to state something that I've just recalled from a Greg Young video. When I start creating an aggregate root, I often ask myself: where do I put this logic? Where do I have to set these properties? Is the Command handler or the Domain Event handler responsible?

As we know, an aggregate root can be considered a boundary for a relevant business case. The public interface of an aggregate exposes behaviours and raises events.

Commands and Events responsibilities

With this in mind, when we design an aggregate root, we start adding behaviours/commands to it and then apply events. The following is a list of what a Command and a Domain-Event handler can and cannot do.

A Command handler is allowed to:

  • Throw an exception if the caller is not allowed to do that thing or passed parameters are not valid
  • Make calculations
  • Make a call to a third party system to get a state or any piece of information

A Command handler is not allowed to:

  • Mutate state (state transitions made in command handlers can't be replayed later by applying events)

It behaves like a guard against wrong changes.

A Domain Event Handler is allowed to:

  • Mutate state

A Domain Event Handler is not allowed to:

  • Throw an exception
  • Charge your credit card
  • Make a call to a third party system to get a piece of information
    (logic here can be replayed at any time… for example, your card could be charged every time you replay events)

They apply the required changes of the state. No validation, no exceptions, no logic.

For example, if we have a command handler method called Deposit that receives an amount of money and updates the balance, we can implement a simple algorithm in the method and raise a MoneyDeposited event with the result of the calculation and any other information that we want to keep stored in our event store.

Example:

public void Deposit(decimal quantity, DateTime timeStamp, Guid transactionId, bool fromATM = false)
{
    // Command handler: guard against invalid input and make the calculation
    if (quantity <= 0)
        throw new Exception("You have to deposit something");
    var newBalance = balance + quantity;
    RaiseEvent(new MoneyDeposited(newBalance, timeStamp, ID, transactionId, fromATM));
}

// Domain Event handler: only mutates state, so it can be replayed at any time
private void Apply(MoneyDeposited obj)
{
    balance = obj.NewBalance;
}

Hope this helps

CQRS WITH EVENT SOURCING USING EasynetQ, EVENT STORE, ELASTIC SEARCH, ANGULARJS AND ASP.NET MVC

Event Sourcing

"The magic for a developer is doing something cool that no one else can understand" (myself, a few minutes ago).

My first step into this maze of cool messaging terms and tools left me with the feeling that most of them were not really relevant. After years of studying and practising, all these messaging and service oriented concepts, melted together with DDD and CQRS, are making a lot of sense.
In this post I'd like to take out of my personal repository something that helped me in that journey.

Some time ago I read a very interesting blog post by Pablo Castilla: https://pablocastilla.wordpress.com/2014/09/22/cqrs-with-event-sourcing-using-nservicebus-event-store-elastic-search-angularjs-and-asp-net-mvc/
I started studying it and found it simple yet complex enough to be taken as a base template for my DDD + CQRS + Bus projects. Unfortunately, after a while, the trial version of NServiceBus expired on my home PC, so I decided to change the bus infrastructure from NServiceBus/MSMQ to EasynetQ/RabbitMq. Everything else is the same as in Pablo's original blog, except that I have updated all projects and libraries to the latest versions.

The solution's workflow…

Bus and Event Sourcing

When you switch from a generic Bus framework like NServiceBus to a Bus-specific tool like EasynetQ, a lot of configuration settings disappear.

For example, you don't need any more EndPoint config classes implementing curious interfaces like IWantCustomInitialization.
If you adopt a broker-based solution, you don't have to waste hours setting every aspect of all the nice little features that NServiceBus provides through xml config files or its fluent configuration. Instead you can immediately start to be productive: without paying much attention to configuration, you can start coding your app features and publishing and subscribing messages. Pay attention to some of the bad side effects of messaging patterns and take care of problematic scenarios if you have to. RabbitMq is a mature and solid piece of infrastructure software and it offers many solutions. EasynetQ is a small, easy-to-use tool that offers a quick and clean Publish/Subscribe mechanism, as in the sketch below.
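As an illustration, a publish/subscribe round trip with EasyNetQ can be as small as this sketch (the exact calls depend on the EasyNetQ version; recent versions expose the same operations via bus.PubSub):

using System;
using EasyNetQ;

public class MoneyDeposited { public decimal NewBalance { get; set; } }

public class PubSubExample
{
    public static void Run()
    {
        var bus = RabbitHutch.CreateBus("host=localhost");

        // Competing consumers with the same subscription id share the queue
        bus.Subscribe<MoneyDeposited>("projections",
            msg => Console.WriteLine("New balance: " + msg.NewBalance));

        bus.Publish(new MoneyDeposited { NewBalance = 100m });
    }
}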

Another consideration on NServiceBus: using Event Store, you can rely on a built-in feature that allows you to manage long-running processes, so you don't need the nice NServiceBus feature called Saga http://docs.particular.net/nservicebus/sagas/
A Saga in NSB is ultimately a class that stores everything happening around a process in a single storage such as SqlServer. With Event Store, the equivalent is defining an Aggregate that contains a stream of events, and defining projections that run continuously and allow you to react when something happens.

That said, NServiceBus is an elegant library, full of messaging features that can solve a lot of distributed architecture problems.

This is the code: https://github.com/riccardone/CQRS-NServiceBus-EventStore-ElasticSearch

References:
EasyNetQ
http://easynetq.com/

RabbitMQ
https://www.rabbitmq.com/