Saturday, December 9, 2017

KAFKA - The Future Of Modern Architecture

Software architecture has evolved into completely new dimensions in the last few years, and one of the major changes is the advent of open source technology, which has enabled a whole new range of architecture styles. I must say that ten years back it was easier to design a system. Why? Because you chose from a limited set of technologies, and there was always a known set that played perfectly with each other, e.g. .NET, SQL Server, SSIS. Compare this to today's stack: we have more options than ever, with disparate technologies that make inter-communication harder to manage. Added to that, each stack has its own future growth path, with no one ensuring compatibility with the others. This poses a new maintainability challenge: architects must keep all these stacks as loosely coupled as possible. Enter Kafka, which solves most of our coupling woes.

I am going to take an example of a typical data-driven architecture -

This highlights the typical flow of a data-driven application: import raw data, batch-process and correlate it into facts, pass the facts to a data warehouse layer for analytics, serve the front-end layer, and let other systems consume the facts via pub-sub. This traditional design suffers from various downsides: it is slow and not real-time, has a low maintainability index, scales poorly, and is complex to code. The others are obvious, so let's talk about maintainability. In this example, every stack must know how the stacks it talks to work: Node.js, the import service, and the pub-sub service all need to read from and write to SQL, the database engines must know about each other, and so on. Trust me, it gets worse as you add more producers and consumers, not to mention additional analytical layers or machine learning, which will probably lead to the design's demise.
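To make the coupling concrete, here is a toy sketch of that traditional design. All class and method names are hypothetical, invented for illustration: the import service must know both the database schema and every downstream consumer it feeds, so adding a consumer means changing the producer's code.

```python
# Illustrative sketch (hypothetical names, not any real system): in the
# traditional design every component talks directly to the shared database
# and to its downstream consumers, so each one must know the storage schema
# and the wiring of everything it feeds.

class SqlDatabase:
    """Stand-in for the shared SQL store every stack reads and writes."""
    def __init__(self):
        self.facts = []

class AnalyticsLayer:
    """One of potentially many hand-wired downstream consumers."""
    def __init__(self):
        self.seen = []
    def receive(self, fact):
        self.seen.append(fact)

class ImportService:
    # The importer must know the database AND every consumer it notifies.
    def __init__(self, db, consumers):
        self.db = db
        self.consumers = consumers

    def import_raw(self, record):
        fact = {"value": record.upper()}   # toy "batch transform" into a fact
        self.db.facts.append(fact)         # direct write: schema coupling
        for c in self.consumers:           # direct push: N point-to-point links
            c.receive(fact)

db = SqlDatabase()
analytics = AnalyticsLayer()
importer = ImportService(db, consumers=[analytics])
importer.import_raw("raw event")
# Adding a machine-learning consumer here means editing ImportService's
# wiring -- exactly the maintainability problem described above.
```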

Now let's introduce Kafka into this. Kafka is a real-time, scalable, distributed streaming platform, which means minimal data-processing overhead and the power of load-sensitive scaling.
It serves as a great intermediate layer and a perfect solution to our disparate-tech-stack problem, because every stack now has to interact with just one technology: Kafka. Let's see if introducing Kafka into our broken design can make it awesome.

It does look a lot better! All the stacks are now loosely coupled and independently scalable, and guess what: adding another layer is fun. If you want to add a machine-learning or Hadoop layer, just read it off Kafka, with no impact on the rest of the system's performance. Thousands of open source projects provide Kafka producers and consumers in almost any language. I use .NET as the producer and have all types of consumers: Spark, Kafka Streams, Node.js, Drools, etc. Being real-time, I publish to the cloud for machine learning and to other clients using pub-sub. It is stable, and my current peak workload is around 50,000 messages a second, dealing with terabytes worth of data.
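A minimal in-memory sketch of the decoupling described above (toy code, not the actual Kafka API): producers publish to a named topic on a broker, consumers subscribe independently, and neither side knows the other exists, so bolting on a machine-learning consumer touches nothing else.

```python
from collections import defaultdict

# Toy in-memory broker illustrating the pub-sub decoupling described above.
# This is NOT the Kafka API -- just the shape of the interaction: producers
# and consumers know only the broker and a topic name, never each other.

class Broker:
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        """Register a consumer callback for a topic."""
        self.subscribers[topic].append(handler)

    def publish(self, topic, message):
        """Deliver a message to every subscriber of the topic."""
        for handler in self.subscribers[topic]:
            handler(message)

broker = Broker()

# An existing consumer (say, the warehouse layer) subscribes to "facts".
warehouse, ml_layer = [], []
broker.subscribe("facts", warehouse.append)

# The producer only knows the broker and the topic -- not who is listening.
broker.publish("facts", {"id": 1, "value": 42})

# Adding a new machine-learning consumer requires no producer changes:
# it simply subscribes and starts receiving subsequent messages.
broker.subscribe("facts", ml_layer.append)
broker.publish("facts", {"id": 2, "value": 99})
```

Real Kafka adds persistence, partitioning, and consumer offsets on top of this idea, which is what lets a late-joining consumer replay history instead of only seeing new messages.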
