Big Data
Definition
The field concerned with extracting, storing, analysing, and processing data in order to create value for an organization. Big data is commonly characterized by the 5 V's: volume, velocity, variety, veracity, and value.
5 V's
Volume
Volume refers to the sheer amount of data generated or collected, which can range from terabytes to petabytes and beyond. The amount of data is growing exponentially, driven by sources such as IoT devices and social media.
Velocity
Velocity refers to the speed at which data is generated, collected, and processed. This speed can be very high in the case of real-time data streams, such as social media updates, sensor data, or financial transactions.
Variety
Variety refers to the different types of data: structured (e.g. databases), semi-structured (e.g. XML, JSON), or unstructured (e.g. text, video, audio).
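To make the three categories concrete, here is a minimal Python sketch that reads one sample of each kind. The sample data and variable names are invented for illustration; only the standard library is used.

```python
import csv
import io
import json

# Structured: fixed columns, every row follows the same schema.
structured = "id,name,age\n1,Alice,30\n2,Bob,25\n"
rows = list(csv.DictReader(io.StringIO(structured)))

# Semi-structured: self-describing, and fields may differ between records.
semi_structured = '[{"id": 1, "name": "Alice", "tags": ["admin"]}, {"id": 2, "name": "Bob"}]'
records = json.loads(semi_structured)

# Unstructured: free text with no schema; any structure has to be derived.
unstructured = "Alice reported that the sensor failed twice on Monday."
tokens = unstructured.lower().rstrip(".").split()

print(rows[0]["name"])             # a column can be addressed by name
print(records[1].get("tags", []))  # an optional field needs a default
print(len(tokens))                 # only crude features such as word count come for free
```

The further down the list, the more work is needed before the data can be queried: the CSV rows are directly addressable, the JSON records need handling for optional fields, and the text needs its own processing step.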
Veracity
Veracity refers to the quality of data and the degree to which it can be trusted. Trustworthy data is essential, because it is used to make decisions, train AI models, and so on.
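One simple way to assess veracity is to run basic quality checks before the data is used. The sketch below is only an illustration: the function name `quality_report`, the choice of checks (missing fields and duplicate ids), and the sample sensor readings are all assumptions, not a standard method.

```python
def quality_report(records, required_fields):
    """Count records that are valid, incomplete, or duplicated by id."""
    seen_ids = set()
    report = {"valid": 0, "missing_fields": 0, "duplicates": 0}
    for record in records:
        # A record is incomplete if any required field is absent or empty.
        if any(record.get(field) in (None, "") for field in required_fields):
            report["missing_fields"] += 1
        # A complete record whose id was already seen is a duplicate.
        elif record["id"] in seen_ids:
            report["duplicates"] += 1
        else:
            seen_ids.add(record["id"])
            report["valid"] += 1
    return report


# Hypothetical sensor readings with typical quality problems.
readings = [
    {"id": 1, "sensor": "t-01", "value": 21.5},
    {"id": 2, "sensor": "t-02", "value": None},  # missing value
    {"id": 1, "sensor": "t-01", "value": 21.5},  # duplicate of the first record
    {"id": 3, "sensor": "", "value": 19.0},      # empty sensor name
]
report = quality_report(readings, required_fields=["id", "sensor", "value"])
print(report)
```

Real pipelines usually add further checks (value ranges, timestamps, source reliability), but the principle is the same: measure how much of the data can be trusted before relying on it.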
Value
Value refers to the usefulness of data for making decisions. Extracting meaningful information from data is the goal of big data.
Data management
In the past, each application had its own database and requested exports (or copies) of data from other applications. This was a problem, because data was duplicated and it was hard to keep the copies consistent.
Nowadays, we prefer to centralize data in order to have a single source of truth. Note that this is a concept rather than a literal single copy: in reality, data is still duplicated (for good reasons) and even distributed, but it is managed by a single system within the organization.
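The difference between the two approaches can be sketched in a few lines of Python. This is a toy model under stated assumptions: the dictionaries `crm_db` and `billing_db` and the class `CustomerStore` are hypothetical stand-ins for real applications and a real data platform.

```python
# Siloed approach: each application keeps its own copy of the customer record.
crm_db = {"cust-1": {"email": "alice@old.example"}}
billing_db = {"cust-1": {"email": "alice@old.example"}}

# An update made in one application does not reach the other copy.
crm_db["cust-1"]["email"] = "alice@new.example"
siloed_consistent = crm_db["cust-1"]["email"] == billing_db["cust-1"]["email"]


# Centralized approach: one system owns the record, applications read through it.
class CustomerStore:
    def __init__(self):
        self._records = {}

    def update(self, customer_id, **fields):
        # Create the record if needed, then apply the changed fields.
        self._records.setdefault(customer_id, {}).update(fields)

    def get(self, customer_id):
        # Return a copy so callers cannot modify the managed record directly.
        return dict(self._records[customer_id])


store = CustomerStore()
store.update("cust-1", email="alice@old.example")
store.update("cust-1", email="alice@new.example")

crm_view = store.get("cust-1")
billing_view = store.get("cust-1")
central_consistent = crm_view["email"] == billing_view["email"]

print(siloed_consistent)   # the two silos have diverged
print(central_consistent)  # both applications see the same value
```

Internally, the single system may still replicate or distribute the data; what matters is that every application goes through it instead of maintaining a private copy.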