Big Data

Definition

Field that deals with ways to extract, store, analyse & process data in order to create value for an organization. Big data is commonly characterised by the 5 V's: volume, velocity, variety, veracity & value.

5 V's

Volume

Refers to the sheer amount of data generated or collected. This can range from terabytes to petabytes and beyond. The amount of data generated keeps growing exponentially, driven by IoT devices, social media, ...

Velocity

Velocity refers to the speed at which data is generated, collected, and processed. This speed can be very high in the case of real-time data streams, such as social media updates, sensor data, or financial transactions.
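A common first step with a real-time stream is to measure how many events arrive per second. The following is a minimal Python sketch, assuming each event carries a timestamp in seconds and events arrive in time order. The class name `SlidingWindowRate` is hypothetical and not part of any library.

```python
from collections import deque


class SlidingWindowRate:
    """Hypothetical helper: event rate over the last `window_seconds` seconds."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._timestamps = deque()  # timestamps (seconds) of recent events

    def record(self, timestamp: float) -> None:
        # Assumes timestamps arrive in non-decreasing order.
        self._timestamps.append(timestamp)
        self._evict(timestamp)

    def _evict(self, now: float) -> None:
        # Drop events that fell out of the window so memory stays bounded,
        # which matters when the stream never ends.
        while self._timestamps and self._timestamps[0] <= now - self.window_seconds:
            self._timestamps.popleft()

    def rate(self, now: float) -> float:
        """Events per second over the window ending at `now`."""
        self._evict(now)
        return len(self._timestamps) / self.window_seconds


meter = SlidingWindowRate(window_seconds=10.0)
for t in [0.0, 1.0, 2.5, 9.0, 11.0]:
    meter.record(t)
print(meter.rate(now=11.0))  # 0.3 (3 events in the last 10 s)
```

Only the events inside the window are kept in memory. That is the usual trade-off with high-velocity data: the stream is summarised on the fly instead of being stored in full before processing.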

Variety

Refers to the different types of data: structured (e.g. relational database tables), semi-structured (e.g. XML, JSON) or unstructured (e.g. free text, video, audio, ...).
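The three categories differ in how much schema the data carries with it. This small sketch uses only Python's standard library (`csv`, `json`, `xml.etree.ElementTree`). The sample records are made up for illustration.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Structured: fixed schema, every row has the same columns (e.g. a table export).
csv_text = "id,name\n1,Alice\n2,Bob\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: self-describing and possibly nested; fields may vary per record.
json_text = '{"id": 3, "name": "Carol", "tags": ["iot", "sensor"]}'
record = json.loads(json_text)

xml_text = "<user id='4'><name>Dave</name></user>"
user = ET.fromstring(xml_text)

# Unstructured: no schema at all; any structure must be extracted by the consumer.
free_text = "Dave reported the sensor in room 12 is offline."
words = free_text.split()

print(rows[1]["name"])                        # Bob
print(record["tags"])                         # ['iot', 'sensor']
print(user.get("id"), user.findtext("name"))  # 4 Dave
print(len(words))                             # 9
```

Structured and semi-structured data can be queried by field name right after parsing. Unstructured text only yields structure once you impose it yourself (here, naive whitespace tokenisation).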

Veracity

Refers to the quality of data, i.e. whether it can be trusted. Trustworthy data is essential, because data is used to make decisions, train AI models, ...
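Veracity is typically enforced with automated quality checks before data is used downstream. The following is a minimal sketch, assuming hypothetical sensor records stored as Python dicts. The field names, the plausible temperature range and the functions `check_reading` / `split_by_quality` are illustrative assumptions, not a standard.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical quality rules; real rules depend on the domain.
REQUIRED_FIELDS = ("sensor_id", "timestamp", "temperature_c")
PLAUSIBLE_RANGE_C = (-50.0, 60.0)


def check_reading(reading: Dict[str, Any]) -> List[str]:
    """Return the quality problems found in one record (empty list = OK)."""
    problems = []
    for field in REQUIRED_FIELDS:  # completeness
        if reading.get(field) is None:
            problems.append(f"missing {field}")
    temp = reading.get("temperature_c")
    if isinstance(temp, (int, float)):  # plausibility
        low, high = PLAUSIBLE_RANGE_C
        if not low <= temp <= high:
            problems.append(f"temperature_c out of range: {temp}")
    elif temp is not None:  # type validity
        problems.append(f"temperature_c not numeric: {temp!r}")
    return problems


def split_by_quality(readings: List[Dict[str, Any]]) -> Tuple[list, list]:
    """Separate clean records from rejected (record, problems) pairs."""
    clean, rejected = [], []
    seen = set()
    for r in readings:
        problems = check_reading(r)
        key = (r.get("sensor_id"), r.get("timestamp"))  # uniqueness
        if key in seen:
            problems.append("duplicate record")
        seen.add(key)
        if problems:
            rejected.append((r, problems))
        else:
            clean.append(r)
    return clean, rejected


readings = [
    {"sensor_id": "s1", "timestamp": 1, "temperature_c": 21.5},
    {"sensor_id": "s2", "timestamp": 1, "temperature_c": 480.0},  # implausible
    {"sensor_id": "s3", "timestamp": 1},                          # incomplete
    {"sensor_id": "s1", "timestamp": 1, "temperature_c": 21.5},  # duplicate
]
clean, rejected = split_by_quality(readings)
print(len(clean), len(rejected))  # 1 3
```

Keeping rejected records together with the reasons they failed, rather than silently dropping them, makes it possible to trace and fix data quality issues at the source.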

Value

Refers to the usefulness of data, i.e. the benefit it brings once it is used to make decisions. Extracting meaningful information from data is the ultimate goal of big data.

Data management

In the past, each application had its own database and requested exports (or copies) of data from other applications. This was a problem: data was duplicated, and it was hard to keep the copies consistent.

Nowadays, we prefer to centralize data in order to have a single source of truth. Note that this is mostly a concept: in practice, data is still duplicated (for good reasons, such as fault tolerance or performance) and even distributed, but it is managed by a single system within the organization.
