Lambda architecture is an approach to big data management that provides access to batch processing and near real-time processing with a hybrid approach. The basic architecture of Lambda has three layers: Batch, speed and serving. The batch layer, which typically makes use of Hadoop, is the location where all the data is stored. MapReduce runs regular batch processing on the totality of this data. This information is sent to a data store and is used to gain insights into historic data trends.
Alongside this slower layer, new data is captured and processed as it comes in. The speed layer provides business users with the ability to adjust decision making and respond quickly to rapidly emerging trends. Data that passes into this real-time layer is also copied into the larger data set for slower, batch processing. Once the real-time processing is complete, the data is cleared from the speed layer to clear the way for more incoming data. The real-time layer can operate efficiently even with a steady stream of complex data because it only has to handle the volume of data that comes in between rounds of batch processing.
The speed and batch layers are merged together for querying through the serving layer which features a massively parallel processing query engine. Having access to this combined data set helps ensure that accurate reporting is available at all times with low latency.