Differences Between Batch and Streaming Data

This article discusses batch and streaming data: what they are and what their main differences are.


What are Batch Data and Batch Processing?

Batch data is a term for large data sets, often containing millions of records, that are stored at rest, for example in files or database tables.

Batch data is processed using the “Batch Processing Model”, in which large volumes of data are processed at once at specific times (e.g. at the end of the day or the end of the month).
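
As a rough illustration of the batch model, the sketch below processes a full day's records in a single pass once they have all been collected. The function name `run_daily_batch` and the sample records are hypothetical, chosen only for this example; a real batch job would typically read its input from files or a database.

```python
from collections import defaultdict


def run_daily_batch(records):
    """Process all of a day's records in one pass; return totals per customer."""
    totals = defaultdict(float)
    for customer, amount in records:
        totals[customer] += amount
    return dict(totals)


# Hypothetical records accumulated during the day: (customer, amount).
days_records = [("alice", 20.0), ("bob", 5.5), ("alice", 4.5)]

# The job runs once, after all the data has been collected (e.g. at end of day).
daily_totals = run_daily_batch(days_records)
print(daily_totals)
```

No result is available until the whole data set has been processed, which is the defining trait of this model.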


What are Streaming Data and Stream Processing?

Streaming data is a term for data that is generated continuously by different sources. Such data is also often called “Real-Time data”.

Streaming data is processed incrementally using stream processing techniques and tools, that is, technologies that can process data in real time as it arrives. This is known as “Stream Processing” or “Real-Time Processing”.
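
Incremental processing can be sketched with a Python generator that updates its result after every incoming value. The names `sensor_readings` and `running_average` are hypothetical, and the source is a short fixed list here only so the example terminates; a real stream would be unbounded and would usually be handled by a dedicated stream processing tool.

```python
def sensor_readings():
    """Hypothetical source that yields readings one at a time."""
    for value in [21.0, 22.0, 23.0]:
        yield value


def running_average(stream):
    """Yield the updated average after each incoming reading."""
    count = 0
    total = 0.0
    for value in stream:
        count += 1
        total += value
        # A fresh result is available immediately, without waiting for more data.
        yield total / count


averages = list(running_average(sensor_readings()))
print(averages)
```

Only a small amount of state (a count and a total) is kept between readings, so the full history never needs to be stored.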


The main differences between batch and streaming data are:

  • In Batch Processing, data is collected over a period of time and then processed at specific times, usually by an analytics system (e.g. a Data Warehouse).
  • In Stream Processing, data is processed by stream processing tools in real time, as it arrives, because, as mentioned above, this data is generated continuously.
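
The contrast between the two bullets above can be seen in a small side-by-side sketch. It assumes a hypothetical list of events and two illustrative functions, `batch_total` and `stream_totals`; both reach the same final total, but the streaming version exposes an up-to-date result after every event.

```python
# Hypothetical event values used by both approaches.
events = [3, 7, 5]


def batch_total(collected):
    """Batch: compute one result after all events have been collected."""
    return sum(collected)


def stream_totals(source):
    """Streaming: emit an updated total as each event arrives."""
    total = 0
    for event in source:
        total += event
        yield total


batch_result = batch_total(events)
stream_results = list(stream_totals(iter(events)))

print(batch_result)
print(stream_results)
```

The last streamed value matches the batch result; the difference lies in when results become available, not in what they are.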


