Dataflow in .NET

In this blog post I’m going to take a deeper dive into the Dataflow .NET (Core + Framework) Library I alluded to in my previous post.

Code for this post can be found here: https://github.com/ry8806/Blog-DataflowExample/

After stumbling across the Dataflow Library for .NET, I’ve been looking for an excuse to use it. So when a project for a well-known betting site came along, this seemed like the perfect solution.

The Dataflow Library is described as “components to help increase the robustness of concurrency-enabled applications” and “dataflow components are useful when you have multiple operations that must communicate with one another asynchronously or when you want to process data as it becomes available”.

Essentially, Dataflow is used for In-App (or In-Proc) Messaging/Queueing. You create a number of “Blocks” and stitch them together, to form a pipeline. There are many different types of blocks:

ActionBlock - runs an Action<T> for every data element it’s given
TransformBlock - similar to the ActionBlock, but it runs a Func<TInput, TOutput> and passes the returned value on to the next block
TransformManyBlock - similar to the TransformBlock, but can produce 0 or more values from one input
BufferBlock - a FIFO (first in first out) messaging structure - essentially a queue.
BroadcastBlock - a block which takes many messages, but only keeps the newest (last) message received. The value is not removed once it has been read
WriteOnceBlock - receives only one (the first) value. After the initial value has been received, other values sent to this block are ignored

You can also create your own Custom blocks if none of the above fulfill your needs.
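To make the “stitch them together” idea concrete, here is a minimal sketch of a two-block pipeline. It isn’t taken from the project; the block names (measure, print) and the sample strings are made up purely for illustration.

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class Program
{
    public static async Task Main()
    {
        // TransformBlock: takes a string and returns its length.
        var measure = new TransformBlock<string, int>(text => text.Length);

        // ActionBlock: the end of the pipeline; it consumes values and returns nothing.
        var print = new ActionBlock<int>(length => Console.WriteLine($"Length: {length}"));

        // Stitch the blocks together. PropagateCompletion forwards completion
        // from the first block to the second.
        measure.LinkTo(print, new DataflowLinkOptions { PropagateCompletion = true });

        measure.Post("hello");
        measure.Post("dataflow");

        // Signal that no more input is coming, then wait for the last block to drain.
        measure.Complete();
        await print.Completion;
        // Prints "Length: 5" then "Length: 8"
    }
}
```

Both blocks keep messages in order by default, which is why the output order matches the order of the Post calls.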

My Dataflow use case

For the project I need a simple datastore. I’m only recording the following details about a CompetitionEntry:

Username
Email Address
Date the User entered the competition
The IP Address of the User

I’m using the Dataflow Library so I don’t have to use a Database. Don’t get me wrong, databases are great, but they’re not always the silver bullet for data storage. SQL Databases are great for related data, many tables and performing in-depth queries on said data. Document Database stores are great for semi-structured and flexible data. Either way, they’re definitely overkill for my needs (4 items of data per “row”). I don’t want or need the complexity of EF Core, the DI setup, creating the DbContext class etc.

Databases can deal with concurrent access without a problem, but a single file on a server can’t. If two (or more) people enter the competition at the same time, two instances (or Threads) of our code will try to change the file at the same time. That’s a likely race condition: one or more of the requests might fail, and those users’ entries would not be saved. This is unacceptable, and we can’t ask users to retry when we could have prevented the failure. Instead, we can use a Dataflow block to throttle access to the file. Imagine 100 users all entering the competition at the same time: their entries are queued up and then processed one by one, so we only ever make one change to the file at a time.
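To see the throttling idea in isolation, here is a minimal sketch that simulates 100 simultaneous entries. It is an illustration rather than the project’s code: the integer entry IDs and the in-memory list stand in for real entries and the file, and MaxDegreeOfParallelism = 1 (which is also the default) is set explicitly to show the intent.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class Program
{
    public static async Task Main()
    {
        // A plain List<T> is not thread-safe; it stands in for the file here.
        var processed = new List<int>();

        // The block runs its delegate for one message at a time,
        // so the list is never touched by two threads at once.
        var writer = new ActionBlock<int>(
            entryId => processed.Add(entryId),
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 1 });

        // Simulate 100 users entering at the same time from different threads.
        var senders = Enumerable.Range(1, 100)
            .Select(id => Task.Run(() => writer.SendAsync(id)));
        await Task.WhenAll(senders);

        // No more entries: let the block drain its queue and finish.
        writer.Complete();
        await writer.Completion;

        Console.WriteLine($"Processed {processed.Count} entries");
        // Prints "Processed 100 entries"
    }
}
```

The senders can arrive in any order and on any thread; the block’s internal queue is what turns that into a safe, one-at-a-time sequence.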

The agreed output of the project (when the competition ends) was a CSV file of all users who entered the competition. So why not store the data in a CSV file from the start, rather than having to transform it on the way out of a database?

The code

If you want to clone the git repository please use this link: https://github.com/ry8806/Blog-DataflowExample/

The code for adding the Dataflow library and ensuring that only one Thread/Task accesses the file at a time is really simple. The full setup, with inline comments explaining what’s going on and why, is in the repository linked above.
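As a rough illustration of that kind of setup, here is a minimal sketch of a BufferBlock feeding a single-threaded ActionBlock that appends to a CSV file. Everything named here is an assumption made for the example (the CompetitionEntry property names, EntryPipeline, the CSV escaping helper and the temp-file path), and the code in the repository may well differ.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical model: property names are assumed from the four fields listed earlier.
public sealed record CompetitionEntry(string Username, string Email, DateTime EnteredAt, string IpAddress);

public static class EntryPipeline
{
    // Builds BufferBlock -> ActionBlock and returns the buffer as the "producer" end,
    // plus a Task that completes once every queued entry has been written.
    public static (ITargetBlock<CompetitionEntry> Producer, Task Completion) Create(string csvPath)
    {
        var buffer = new BufferBlock<CompetitionEntry>();

        // One message at a time, so only one append touches the file at once.
        var writer = new ActionBlock<CompetitionEntry>(
            entry => File.AppendAllText(csvPath, ToCsvLine(entry) + Environment.NewLine),
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 1 });

        buffer.LinkTo(writer, new DataflowLinkOptions { PropagateCompletion = true });
        return (buffer, writer.Completion);
    }

    private static string ToCsvLine(CompetitionEntry e) =>
        string.Join(",", Escape(e.Username), Escape(e.Email), e.EnteredAt.ToString("o"), Escape(e.IpAddress));

    // Quote a field if it contains a comma, quote or line break.
    private static string Escape(string field) =>
        field.IndexOfAny(new[] { ',', '"', '\n', '\r' }) >= 0
            ? "\"" + field.Replace("\"", "\"\"") + "\""
            : field;
}

public static class Program
{
    public static async Task Main()
    {
        // Assumed location for the demo; each run appends to the same file.
        var path = Path.Combine(Path.GetTempPath(), "entries.csv");
        var (producer, completion) = EntryPipeline.Create(path);

        // In an ASP.NET Core app the producer could instead be registered once, e.g.
        // services.AddSingleton<ITargetBlock<CompetitionEntry>>(producer);
        await producer.SendAsync(
            new CompetitionEntry("alice", "alice@example.com", DateTime.UtcNow, "203.0.113.7"));

        producer.Complete();
        await completion;
        Console.WriteLine(File.ReadAllText(path));
    }
}
```

Exposing only the ITargetBlock<CompetitionEntry> end means callers can send entries but have no way to read from, or interfere with, the part of the pipeline that writes the file.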

That’s the setup done. As part of that setup I’ve added the producer to the DI Container, so now I can inject it into a controller and send messages (CompetitionEntry) to it.

My HomeController has an enter route, which accepts the competition entry from the webpage via POST. It then takes that model and sends it to the BufferBlock.
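A controller along those lines might look roughly like the sketch below. It is meant to be dropped into an ASP.NET Core MVC project rather than run on its own, and it assumes an ITargetBlock<CompetitionEntry> has been registered in the DI container. The EntryForm and CompetitionEntry classes, their property names and the 503 fallback are all hypothetical choices for this example.

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;
using Microsoft.AspNetCore.Mvc;

// Hypothetical form model: only the two values the user types in.
public sealed class EntryForm
{
    public string Username { get; set; } = "";
    public string Email { get; set; } = "";
}

// Hypothetical entry model: the form values plus what the server adds.
public sealed class CompetitionEntry
{
    public string Username { get; set; } = "";
    public string Email { get; set; } = "";
    public DateTime EnteredAt { get; set; }
    public string IpAddress { get; set; } = "";
}

public class HomeController : Controller
{
    private readonly ITargetBlock<CompetitionEntry> _producer;

    // The producer end of the pipeline is injected by the DI container.
    public HomeController(ITargetBlock<CompetitionEntry> producer) => _producer = producer;

    [HttpPost("enter")]
    public async Task<IActionResult> Enter([FromForm] EntryForm form)
    {
        if (!ModelState.IsValid) return BadRequest(ModelState);

        var entry = new CompetitionEntry
        {
            Username = form.Username,
            Email = form.Email,
            EnteredAt = DateTime.UtcNow,
            IpAddress = HttpContext.Connection.RemoteIpAddress?.ToString() ?? ""
        };

        // SendAsync returns false if the block declines the message,
        // for example after it has been completed.
        var accepted = await _producer.SendAsync(entry);
        return accepted ? Ok() : StatusCode(503);
    }
}
```

The request only waits for the entry to be queued, not for the file write itself, which helps keep response times low.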

Apart from some really simple webpage code that shows two HTML inputs and sends (POSTs) them to the server, that’s essentially it.

Running this in Azure

I ran a version of this in Azure for 2 weeks and it cost a grand total of £10. Using a DB would have cost more (not a lot, though), and it would have added complexity and over-engineered a simple, short-lived application with one job. I’m glad I chose to keep this simple, as the Web Application was a success.

Standout Stats

Over the 2 weeks the Web Application:

Served well over 10GB of data
Had 0 HTTP (Server) errors
Averaged a response time of 22.2ms
Easily handled thousands of users visiting the page, and all submitting entries to the competition

This application did not take long to build at all. The tech choices I made along the way kept the complexity down and allowed me to just solve a very simple business need. The client and I were very happy with the outcome.

I hope that you’ve found this post useful, and that it has maybe opened your eyes to a simpler way of doing a task, which might help optimise your development workflow for smaller sites.

If you’ve got any questions or improvements, then leave them below, or connect with me on Twitter