Real-time data analysis that allows better decisions to be made: this is what so-called "data streaming platforms" offer. They have become very popular in recent years because of the increasing volume and speed of the data generated by companies and connected devices.
In this context, open source projects such as Apache Kafka and platforms such as Confluent, AWS Kinesis, IBM Streams and Microsoft Azure Stream Analytics have found a variety of use cases: detecting fraud in financial transactions, monitoring infrastructure, analyzing logs, improving customer experience, and so on.
Today we are going to talk about the main platforms you can find in this space and how they let you analyze data as it is being transmitted, instead of having to wait for it to be stored and processed in batches.
Apache Kafka
Apache Kafka is an open source data streaming platform used to collect, store and process data in real time. It was originally developed by LinkedIn engineers in 2010 to meet the company's own data processing needs, and it was open-sourced and donated to the Apache Software Foundation in 2011.
The platform stands out for its scalability, which allows it to handle huge volumes of data and high-throughput workloads. To do this, it uses a publish-subscribe model for data transmission: data producers send records to a central cluster, and the consumers (applications) that have subscribed then read those records from it.
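To make the publish-subscribe idea more concrete, here is a minimal in-memory sketch in Python. It is only a toy: the `MiniBroker` class, its `publish` and `poll` methods and the topic and consumer names are all hypothetical, invented for illustration, and do not correspond to the Kafka client API. It assumes each topic is an append-only list and that the broker remembers a read position per consumer.

```python
from collections import defaultdict


class MiniBroker:
    """Toy in-memory stand-in for a streaming cluster (not the Kafka API)."""

    def __init__(self):
        # Each topic is an append-only list of records.
        self._topics = defaultdict(list)
        # Next position to read, tracked per (consumer, topic) pair.
        self._offsets = defaultdict(int)

    def publish(self, topic, record):
        """Producer side: append a record and return its position in the topic."""
        log = self._topics[topic]
        log.append(record)
        return len(log) - 1

    def poll(self, consumer, topic, max_records=10):
        """Consumer side: return unread records and advance the read position."""
        start = self._offsets[(consumer, topic)]
        batch = self._topics[topic][start:start + max_records]
        self._offsets[(consumer, topic)] = start + len(batch)
        return batch


broker = MiniBroker()
broker.publish("payments", {"card": "A", "amount": 20})
broker.publish("payments", {"card": "B", "amount": 950})

# Two independent consumers each receive every record on the topic.
fraud_batch = broker.poll("fraud-detector", "payments")
audit_batch = broker.poll("audit-log", "payments")
```

Because the read position is kept per consumer, the producer never needs to know who is listening, and a new application can start reading the same data without affecting the others.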
Among the most common use cases are the integration of applications across heterogeneous systems, real-time event processing (financial transactions, user clicks, server logs and so on, with the aim of detecting patterns), infrastructure monitoring (through log analysis) and real-time data management in IoT scenarios.
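As an illustration of what "detecting patterns" in an event stream can look like, the sketch below flags a key (for example, a card number) that produces too many events within a sliding time window. It is plain Python rather than code for any particular platform, and the function name `flag_bursts`, the thresholds and the sample data are assumptions made up for this example.

```python
from collections import defaultdict, deque


def flag_bursts(events, window_seconds=60, max_events=3):
    """Yield (timestamp, key) whenever a key exceeds max_events in the window.

    `events` is an iterable of (timestamp, key) pairs in timestamp order.
    """
    recent = defaultdict(deque)  # key -> timestamps still inside the window
    for timestamp, key in events:
        window = recent[key]
        window.append(timestamp)
        # Drop timestamps that have fallen out of the sliding window.
        while window and window[0] <= timestamp - window_seconds:
            window.popleft()
        if len(window) > max_events:
            yield timestamp, key


# Hypothetical sample data: (seconds since start, card identifier).
transactions = [
    (0, "card-1"), (5, "card-2"), (10, "card-1"),
    (20, "card-1"), (30, "card-1"), (200, "card-1"),
]
alerts = list(flag_bursts(transactions))
```

Here the fourth `card-1` transaction within one minute triggers an alert, while the one at second 200 does not, because the earlier events have already left the window. The function is a generator, so each event is evaluated as it arrives instead of after a batch has been stored.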
Confluent
Confluent is a company that specializes in developing real-time data management solutions. Its main product is Confluent Platform, which is based on Apache Kafka. This is no coincidence: Confluent was founded in 2014 by the same engineers who had developed Kafka a few years earlier, namely Jay Kreps, Neha Narkhede and Jun Rao.
In addition to the capabilities of the original project, such as its scalability and its ability to process large amounts of data, Confluent stands out for its extensive connectivity (it integrates with a wide range of tools), its support for microservices architectures and its additional security features.
For companies, it is an attractive option because it offers a ready-to-use platform, tools for cluster management and monitoring, and pre-built connectors for integrating all kinds of systems.
AWS Kinesis
Amazon Web Services' offering for real-time data management is AWS Kinesis. It works in a similar way to Apache Kafka and is made up of three main services: Kinesis Data Streams (data transmission and storage), Kinesis Data Firehose (which sends data in real time to different destinations without the need to configure and manage infrastructure) and Kinesis Data Analytics (real-time data processing using SQL).
Kinesis offers additional features that enable integration with other AWS services, such as Lambda for data processing, CloudWatch for metrics monitoring, or Identity and Access Management (IAM) for resource access and permissions management.
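To give an idea of the kind of real-time processing mentioned above for Kinesis Data Analytics, the sketch below counts events per key in fixed one-minute windows, which is roughly what a windowed GROUP BY query expresses. It is written in plain Python rather than SQL and does not use any AWS API; the function name `tumbling_counts` and the sample click data are assumptions for illustration only. For brevity it works on an in-memory list, whereas a real streaming engine would emit each window as it closes.

```python
from collections import Counter


def tumbling_counts(events, window_seconds=60):
    """Group (timestamp, key) events into fixed windows and count per key.

    Returns {window_start: Counter({key: count})}.
    """
    windows = {}
    for timestamp, key in events:
        # Integer division assigns each event to exactly one window.
        window_start = (timestamp // window_seconds) * window_seconds
        windows.setdefault(window_start, Counter())[key] += 1
    return windows


# Hypothetical sample data: (seconds since start, page visited).
clicks = [(3, "home"), (42, "cart"), (59, "home"), (61, "home"), (130, "cart")]
per_minute = tumbling_counts(clicks)
```

Unlike a sliding window, these windows do not overlap: every event belongs to exactly one of them, which makes the results easy to store and chart minute by minute.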
IBM Streams
IBM's offering in this field is IBM Streams. Like the previous platforms, it is designed to facilitate real-time data analysis, but it has some differences worth taking into account: while the others are built on event-stream messaging systems, IBM Streams is a distributed platform based on data stream processing technology.
In terms of integration, while Apache Kafka, for example, mainly connects data systems (applications), IBM Streams can take in data directly from sources such as databases, social networks or IoT sensors. It is also an especially interesting option when we need to scale horizontally, since it allows data to be processed in parallel across different nodes.
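The idea of processing data in parallel on different nodes can be sketched as follows. This is not IBM Streams code: it is a generic Python illustration in which threads stand in for nodes, records are routed to a partition by hashing their key, and each partition is processed independently. The function names, the choice of CRC32 as the hash and the sensor readings are all assumptions for the example.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def partition_for(key, num_partitions):
    """Map a key to a partition with a stable hash (CRC32)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def process_partition(records):
    """Hypothetical per-node work: sum the values seen for each key."""
    totals = {}
    for key, value in records:
        totals[key] = totals.get(key, 0) + value
    return totals


def process_in_parallel(records, num_partitions=3):
    # Route every record to a partition based on its key.
    partitions = [[] for _ in range(num_partitions)]
    for key, value in records:
        partitions[partition_for(key, num_partitions)].append((key, value))
    # Each partition is handled by its own worker, standing in for a node.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partial_results = list(pool.map(process_partition, partitions))
    # A key never spans partitions, so partial results merge without conflicts.
    merged = {}
    for partial in partial_results:
        merged.update(partial)
    return merged


# Hypothetical sample data: (sensor identifier, reading).
readings = [("sensor-a", 2), ("sensor-b", 5), ("sensor-a", 3), ("sensor-c", 1)]
totals = process_in_parallel(readings)
```

Because all records with the same key always land in the same partition, adding more partitions (and more nodes to serve them) increases capacity without changing the result.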
IBM Streams integrates with other IBM tools, such as IBM Watson Studio, IBM Cloud and IBM BigSQL, among others.
Microsoft Azure Stream Analytics
Finally, companies have at their disposal Microsoft Azure Stream Analytics, a platform that shares many of the characteristics we have seen in both IBM Streams and AWS Kinesis. Its main differentiating factor is perhaps its strong integration with other Microsoft products.
The choice between one platform and another will depend to a large extent on the company's existing infrastructure, but also on the requirements of the project, the characteristics of the data sources involved and the data processing capabilities needed.
In addition to the ones we have seen, there are many other data streaming platforms available on the market, such as Google Cloud Pub/Sub, Apache Flink, or Spark Streaming, among others.