Did you know that Facebook stores over 1,000 terabytes of user-generated data every day? That’s a huge amount of data, and that’s just one application! In total, quintillions of bytes of data are generated every day.
With so much data being generated, it becomes difficult to process it and make it efficiently available to end users. That’s where data pipelines come in.
So, what is a data pipeline? Since we’re dealing with huge amounts of data, I’ll discuss data pipelines in the context of Hadoop.
Continue reading “What Is a Data Pipeline in Hadoop? Where and How to Start”