How Does Facebook Store Its 500+ Terabytes of Daily Data?

Nishant Bhosale
4 min read · Sep 17, 2020
Big Data Image

How Does Big Data Solve This Issue?

Facebook receives more than 500 terabytes of data every day.

The question is how Facebook stores this amount of data.

Data at this scale is not easy to store and serve back to users on demand: a traditional system would need days to compute on it, store it, and return results to the user.
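To get a feel for why a single machine would need days, here is a rough back-of-the-envelope sketch. The disk speed (200 MB/s) and the cluster size (1,000 disks) are assumptions picked for illustration, not figures from Facebook.

```python
# Assumed figures for illustration only; neither comes from Facebook.
DAILY_DATA_BYTES = 500 * 10**12          # 500 TB, using decimal terabytes
DISK_WRITE_BYTES_PER_SEC = 200 * 10**6   # hypothetical 200 MB/s per disk


def seconds_to_write(total_bytes, disks):
    """Time to write total_bytes when the load is split evenly across disks."""
    return total_bytes / (disks * DISK_WRITE_BYTES_PER_SEC)


# One disk working alone versus a hypothetical 1,000 disks in parallel.
single_disk_days = seconds_to_write(DAILY_DATA_BYTES, disks=1) / 86_400
cluster_minutes = seconds_to_write(DAILY_DATA_BYTES, disks=1_000) / 60

print(f"1 disk: {single_disk_days:.1f} days")
print(f"1,000 disks: {cluster_minutes:.1f} minutes")
```

Under these assumptions, one disk would need roughly 29 days just to write a single day of data, while 1,000 disks sharing the load would need about 42 minutes. The exact numbers matter less than the shape of the result: the work only becomes manageable when it is spread across many machines.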

So how do they make this happen in seconds?

The answer is Big Data.

First, let's look at what data is.

What is data

Data are characteristics or information, usually numerical, that are collected through observation. In a more technical sense, data are a set of values of qualitative or quantitative variables about one or more persons or objects, while a datum is a single value of a single variable.

Apache Hadoop

Hadoop

Apache Hadoop is a collection of open-source software utilities that facilitate using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model.
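The MapReduce model is easier to picture with a small example. The sketch below is plain Python running in a single process, not Hadoop's actual API; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are made up for illustration. It only mimics the three steps that a real cluster would spread across many machines.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values under their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single total."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data is big", "data is everywhere"])
print(counts)
```

Each call to `map_phase` looks at only one line, and each call to `reduce_phase` looks at only one key. That independence is what lets a framework run many of these calls at the same time on different machines.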

To understand what exactly Hadoop is, we first have to understand the issues related to Big Data and traditional processing systems. From there, we will discuss what Hadoop is and how it solves the problems associated with Big Data. We will also look at the CERN case study to highlight the benefits of using Hadoop.

Big Data is emerging as an opportunity for organizations. Organizations have realized that they gain many benefits from Big Data analytics, as you can see in the image below. They examine large data sets to uncover hidden patterns, unknown correlations, market trends, customer preferences, and other useful business information.

These analytical findings help organizations achieve more effective marketing, new revenue opportunities, and better customer service. They also bring improved operational efficiency, competitive advantages over rival organizations, and other business benefits.

Distributed Storage

What is Distributed Storage? A distributed storage system is an infrastructure that can split data across multiple physical servers, and often across more than one data center. It typically takes the form of a cluster of storage units, with a mechanism for data synchronization and coordination between cluster nodes.
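As a toy illustration of this idea, the sketch below splits a piece of data into fixed-size blocks and copies each block onto more than one node, so the data can still be read if a node goes down. Everything here is an assumption made for the example: the node names, the tiny 4-byte block size, and the simple round-robin placement. Real systems use far larger blocks and smarter placement rules.

```python
def split_into_blocks(data, block_size):
    """Cut the data into consecutive chunks of at most block_size bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def store(data, nodes, block_size=4, replication=2):
    """Place each block on `replication` consecutive nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("replication cannot exceed the number of nodes")
    blocks = split_into_blocks(data, block_size)
    cluster = {node: {} for node in nodes}  # node name -> {block id: bytes}
    for block_id, block in enumerate(blocks):
        for r in range(replication):
            node = nodes[(block_id + r) % len(nodes)]
            cluster[node][block_id] = block
    return cluster, len(blocks)


def read(cluster, num_blocks, failed=frozenset()):
    """Reassemble the data, skipping any node listed in `failed`."""
    out = bytearray()
    for block_id in range(num_blocks):
        for node, held in cluster.items():
            if node not in failed and block_id in held:
                out += held[block_id]
                break
        else:
            raise IOError(f"block {block_id} has no surviving replica")
    return bytes(out)


# Hypothetical three-node cluster; the node names are made up.
data = b"hello distributed storage"
cluster, num_blocks = store(data, ["node-a", "node-b", "node-c"])
print(read(cluster, num_blocks, failed={"node-b"}) == data)
```

With two copies of every block on different nodes, losing any single node still leaves one readable copy of each block. Losing two nodes at once can make some blocks unreadable, which is why the replication factor is a trade-off between safety and storage cost.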

What is Big Data

Big Data is a term that describes the large volume of data — both structured and unstructured — that inundates a business on a day-to-day basis. But it’s not the amount of data that’s important.

Current usage of the term big data tends to refer to the use of predictive analytics, user behavior analytics, or certain other advanced data analytics methods that extract value from data, and seldom to a particular size of data set. “There is little doubt that the quantities of data now available are indeed large, but that’s not the most relevant characteristic of this new data ecosystem.” Analysis of data sets can find new correlations to “spot business trends, prevent diseases, combat crime, and so on.”

Scientists, business executives, practitioners of medicine, advertising, and governments alike regularly meet difficulties with large data sets in areas including Internet searches, fintech, urban informatics, and business informatics. Scientists encounter limitations in e-Science work, including meteorology, genomics, connectomics, complex physics simulations, biology, and environmental research.
