Data munging is the practice of preparing data sets for reporting and analysis.
Data transformation is the process of changing the format, structure, or values of data; its basic objective is to extract data, convert it into a usable format, and deliver it to the destination system.
Demographic data is information about groups of people according to certain attributes such as age, gender, and place of residence, and can include socioeconomic factors such as occupation, family status, or income.
Structured data is information that has been organized in a standardized format, has a defined structure, complies with a data model, and can easily be accessed by both humans and programs.
Data swamp is the unmanaged and unstructured data that accumulates in a data lake that has been badly designed, inadequately documented, or poorly maintained.
Descriptive analytics is a statistical technique used to search and summarize historical data in order to identify patterns or significant findings.
Unstructured data is information that has not been organized using a predetermined data model or schema and cannot be stored in a conventional relational database management system (RDBMS).
Software-as-a-Service is a method of licensing software in which access to the program is granted on a subscription basis and the program is stored on external servers rather than internal ones.
Data orchestration is the process of gathering siloed data from various data storage points, organizing it, and then making it available and ready for processing by data analysis tools.
Data enrichment is the process of appending to and expanding a first-party database with data collected from other internal sources or from third-party external sources.
Data collaboration is a strategy for digital innovation using "zero-copy" IT ecosystems to allow all stakeholders to collaborate on data-centric issues without ceding control of the data they supply.
Data augmentation is a technique for increasing the amount of data by adding copies of current data that have been minimally modified or for creating new synthetic data from existing data.
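For intuition, here is a minimal Python sketch of data augmentation on a small numeric data set; the augment helper and its noise level are hypothetical choices for illustration.

```python
import random

def augment(rows, copies=2, noise=0.05):
    """Create lightly perturbed copies of numeric records (illustrative only).

    rows   : list of lists of floats (the original data set)
    copies : how many augmented copies to generate per record
    noise  : relative magnitude of the random perturbation
    """
    augmented = list(rows)  # keep the originals
    for row in rows:
        for _ in range(copies):
            # add small multiplicative jitter to each value
            augmented.append([v * (1 + random.uniform(-noise, noise)) for v in row])
    return augmented

original = [[1.0, 2.0], [3.0, 4.0]]
print(len(augment(original)))  # 2 originals + 4 augmented copies = 6
```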
Data aggregation is the process of gathering large amounts of information from a database and organizing it into a more accessible, summarized form.
Streaming data is the continuous flow of information from a source to a destination where it is processed and evaluated in near real-time.
Electronic data interchange is the intercompany communication of business documents from one computer system to another in a standard format without the need for human intervention.
Data quality is a measure of the condition of data based on factors such as accuracy, completeness, consistency, reliability and whether it's up to date.
The data lifecycle is the sequence of stages that a particular unit of data goes through from its initial generation or capture to its eventual archival and/or deletion at the end of its useful life.
Metadata is a set of information that describes the structure, nature, data elements, interrelationships, and context of the data and is commonly known as “data about data”.
Zero-Party Data is the data that a user shares intentionally and proactively with a brand or the data that a business collects outside the standard procedures of eCommerce transactions.
Data Warehouse is a central repository of information that is designed to enable and support business intelligence (BI) activities for making more informed decisions.
Data center is a facility that centralizes an organization's shared IT operations and equipment to store, process, and disseminate data and applications.
Data Democratization is the process of making organizational data accessible and understandable to employees and stakeholders so that they can make critical business decisions regardless of their technical expertise.
Data Modeling is a process of creating a visual representation of either a whole information system or parts of it to communicate connections between data points and structures.
Data curation is the process of creating, organizing, and maintaining data sets so they can be accessed and used by people looking for information.
Customer Data Platform is software that centralizes & organizes customer data from various sources into a single database & then makes this data available to other marketing systems.
Big data is a combination of structured, unstructured and semi-structured data collected by organizations that can be mined for information using advanced analytic techniques.
First-Party Data is defined as the information about a company’s customers solely owned by the company and collected directly from the customers with their consent.
API is a set of programming code that enables data transmission between two software platforms & essentially allows two unrelated applications to communicate with each other.
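As a rough illustration, the sketch below uses only the Python standard library to call an HTTP API and decode its JSON response; the endpoint URL is a placeholder, not a real service.

```python
import json
import urllib.request

def fetch_json(url):
    """Call an HTTP API endpoint and decode its JSON response."""
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))

# Hypothetical endpoint -- replace with a real API you have access to.
# data = fetch_json("https://api.example.com/v1/customers?limit=10")
```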
Ransomware is a form of malware attack in which the attacker locks and encrypts files and demands payment to decrypt the data within a specific time, after which the ransom increases.
Data exploration is the initial step of data analysis where users examine an enormous data set in an unstructured fashion to identify early patterns, traits, and areas of interest.
Data privacy is the ability of an organization or an individual to restrict the extent to which their personal data can be shared with other parties.
Data ingestion is the process of collecting data from multiple sources and moving it to a destination where it can be accessed and analyzed, often transforming it into a uniform format using an ETL process along the way.
Cloud storage is a cloud computing architecture in which data is stored on the Internet and managed and operated by a cloud computing provider.
Graph database is a database that represents and stores data using graph structures (nodes, edges, and properties) to support semantic queries.
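A minimal, hypothetical in-memory sketch of the node/edge/property idea (not a real graph database engine):

```python
class TinyGraph:
    """Toy in-memory graph store: nodes, edges, and properties."""

    def __init__(self):
        self.nodes = {}   # node_id -> properties dict
        self.edges = []   # (source_id, relation, target_id)

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, source, relation, target):
        self.edges.append((source, relation, target))

    def neighbors(self, node_id, relation=None):
        # follow outgoing edges, optionally filtered by relation type
        return [t for s, r, t in self.edges
                if s == node_id and (relation is None or r == relation)]

g = TinyGraph()
g.add_node("alice", role="analyst")
g.add_node("sales_db", kind="dataset")
g.add_edge("alice", "OWNS", "sales_db")
print(g.neighbors("alice", "OWNS"))  # ['sales_db']
```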
Edge computing is a distributed computing platform that puts business applications closer to data sources like IoT devices and local edge servers.
Data sanitization is the process of deliberate, secure & irreversible deletion of sensitive data from memory devices, ensuring no residual data can be retrieved even through forensic investigation.
Data catalog is a detailed inventory of all data assets in an organization, designed to help data professionals quickly find the most appropriate data for any analytical or business purpose.
Data-as-a-Service is a data management approach that utilizes the cloud to supply data storage, integration, processing, and/or analytics services across a network.
Geospatial data is information that describes objects, events, or other features with a location on or near the surface of the Earth, often combined with temporal information.
Data intelligence is the practice of using artificial intelligence and machine learning tools to analyze and transform massive datasets into intelligent data insights.
Blockchain is a method of recording information in a way that makes it difficult or impossible to change, hack, or cheat the system.
Virtual reality data visualization refers to computer-generated, highly interactive 3D environments that rely on virtual reality to visualize big data.
Data masking is the process of modifying sensitive data in such a way that it is of no or little value to unauthorized intruders while still being usable by software or authorized personnel.
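A simple illustrative Python sketch of masking, which hides all but the last few characters of a sensitive field; the field names and values are hypothetical.

```python
def mask_value(value, visible=4, fill="*"):
    """Replace all but the last `visible` characters with a fill character."""
    if len(value) <= visible:
        return fill * len(value)
    return fill * (len(value) - visible) + value[-visible:]

record = {"name": "Jane Doe", "card_number": "4111111111111111"}
masked = dict(record, card_number=mask_value(record["card_number"]))
print(masked["card_number"])  # ************1111
```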
Data discovery is a process of exploring data through visual tools that can help non-technical business leaders find new patterns & outliers to help businesses better understand their data insights.
Non-Fungible Token (NFT) is a non-interchangeable unit of data stored on a blockchain, a form of digital ledger, that can be sold and traded.
Knowledge graph is a connected graph of data and associated metadata applied to model, integrate and access an organization’s information assets.
Data dictionary is a centralized repository of information about data such as meaning, relationships to other data, origin, usage, and format.
Neural network is a series of algorithms that endeavors to recognize underlying relationships in a set of data through a process that mimics the way the human brain operates.
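As a rough illustration, the sketch below trains a single artificial neuron in plain Python to learn the logical OR function; it is a toy example, not a production neural network.

```python
import math
import random

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

# Train a single artificial neuron on the logical OR function.
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
weights = [random.uniform(-1, 1) for _ in range(2)]
bias = random.uniform(-1, 1)
rate = 0.5

for _ in range(5000):
    for inputs, target in data:
        output = sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
        error = target - output
        # nudge each weight in the direction that reduces the error
        weights = [w + rate * error * x for w, x in zip(weights, inputs)]
        bias += rate * error

for inputs, target in data:
    prediction = sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
    print(inputs, round(prediction, 2), "expected", target)
```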
Metaverse is a hypothetical iteration of the Internet as a single, universal virtual world that is facilitated by the use of virtual and augmented reality headsets.
Quantum computing is a type of computation that harnesses the collective properties of quantum states, such as superposition, interference, and entanglement, to perform calculations.
Data mesh is a paradigm based on distributed architecture that enables users to access, query & discover analytical data from any data source rather than relying on getting output from a data lake.
Explainable AI is a set of processes and methods that allows human users to comprehend and trust the results and output created by machine learning algorithms.
Synthetic data is non-authentic, invented, or automatically generated data that is not produced by real-world events; it is a fundamental concept in new data technologies.
Data salting is a concept that pertains to password hashing using salt, which is a randomly generated, fixed-length value designed to be unique for each user password.
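A minimal Python sketch of salted password hashing using the standard library's PBKDF2 helper; the salt length and iteration count are illustrative choices, not recommendations.

```python
import hashlib
import os

def hash_password(password, salt=None):
    """Hash a password with a unique random salt (PBKDF2-HMAC-SHA256)."""
    salt = salt or os.urandom(16)  # fresh 16-byte salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)
    return salt, digest

def verify_password(password, salt, expected_digest):
    return hash_password(password, salt)[1] == expected_digest

salt, digest = hash_password("s3cret")
print(verify_password("s3cret", salt, digest))   # True
print(verify_password("wrong", salt, digest))    # False
```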
Database sharding is the practice of optimizing database management systems by separating the rows or columns of a larger database table into multiple smaller tables.
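A toy Python sketch of hash-based shard routing; the shard names are hypothetical, and a real system would also handle rebalancing and cross-shard queries.

```python
import hashlib

SHARDS = ["shard_0", "shard_1", "shard_2", "shard_3"]

def shard_for(key):
    """Route a record to a shard by hashing its key (hash-based sharding)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

for customer_id in ("c-1001", "c-1002", "c-1003"):
    print(customer_id, "->", shard_for(customer_id))
```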
Data pipeline is the process of extracting data from many disparate source systems, transforming, combining and validating that data, and loading it into the target repository.
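A minimal extract-transform-load sketch in Python; the inline CSV string and the in-memory "warehouse" list are stand-ins for real source and target systems.

```python
import csv
import io

RAW = "id,amount\n1, 10.5\n2,\n3, 7.25\n"  # stand-in for a source system extract

def extract(raw_text):
    return list(csv.DictReader(io.StringIO(raw_text)))

def transform(rows):
    # validate and convert: drop rows with a missing amount, cast types
    return [{"id": int(r["id"]), "amount": float(r["amount"])}
            for r in rows if r["amount"].strip()]

def load(rows, target):
    target.extend(rows)  # stand-in for writing to a warehouse table

warehouse = []
load(transform(extract(RAW)), warehouse)
print(warehouse)  # [{'id': 1, 'amount': 10.5}, {'id': 3, 'amount': 7.25}]
```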
OCR is a business solution for automating the extraction of printed or written text from a scanned document or image file & converting it into a machine-readable form.
Natural Language Processing is the branch of artificial intelligence concerned with giving computers the ability to understand text and spoken words in much the same way human beings can.
Unsupervised learning is the use of machine learning algorithms to identify patterns in data sets containing data points that are neither classified nor labeled.
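For intuition, here is a toy k-means clustering sketch in plain Python: the algorithm groups unlabeled points without being told what the groups are.

```python
import random

def kmeans(points, k=2, iterations=20):
    """Cluster 1-D points with k-means; no labels are provided."""
    centroids = random.sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            # assign each point to its nearest centroid
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # move each centroid to the mean of its assigned points
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

points = [1.0, 1.2, 0.8, 9.7, 10.1, 10.4]
centroids, clusters = kmeans(points)
print(sorted(round(c, 1) for c in centroids))  # roughly [1.0, 10.1]
```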
Generative Adversarial Network (GAN) is a machine learning model in which two neural networks compete with each other to become more accurate in their predictions.
Data provenance is the documentation of where a piece of data comes from and the processes and methodology by which it was produced.
Data credibility means that data is complete and accurate, and it is a crucial foundation for building data trust across the organization.
Data observability is an organization’s ability to fully understand the health of the data in their system by using automated monitoring, alerting & triaging to identify and evaluate data quality.
Data encryption is the process of transforming information, referred to as plaintext, using an algorithm to make it unreadable for anyone except those who possess the decryption key.
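A brief sketch of symmetric encryption and decryption, assuming the third-party cryptography package is installed; the plaintext is a made-up example.

```python
# Requires the third-party `cryptography` package: pip install cryptography
from cryptography.fernet import Fernet

key = Fernet.generate_key()        # the decryption key; keep it secret
cipher = Fernet(key)

plaintext = b"customer: jane.doe@example.com"
ciphertext = cipher.encrypt(plaintext)    # unreadable without the key
recovered = cipher.decrypt(ciphertext)

print(ciphertext != plaintext)     # True
print(recovered == plaintext)      # True
```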
Data Security is the process of protecting digital data, such as those in a database, from unauthorized access, theft & data corruption throughout its lifecycle.
Data Forensics, also known as computer forensics, refers to the study or investigation of digital data and how it is created and used.
Data cleansing is the process of preparing data for analysis by removing or modifying data that is incorrect, corrupted, incomplete, irrelevant, duplicated, or improperly formatted.
Data wrangling is the process of gathering, selecting, and programmatically transforming data into a format that makes it easier to work with.
Data fabric is a combination of architecture & technology that is designed to ease the complexities of managing many different kinds of data, using multiple database management systems.
Internet of things is the concept of connecting any device to the internet so that it can transfer data over a network without requiring human-to-human or human-to-computer interaction.
Data lake is a storage repository that holds a vast amount of raw data, including structured, semi-structured, unstructured, and binary data, in its native format until it is needed.
Data mart is a condensed version of a data warehouse that contains repositories of summarized data and is focused on a single functional area or unit within an organization.
Data anonymization is the method of preserving private or confidential information by deleting or encoding identifiers that link individuals to the stored data.
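A simple illustrative sketch that drops direct identifiers and replaces a linkable ID with a one-way hash (strictly speaking, hashing is pseudonymization rather than full anonymization); the field names are hypothetical.

```python
import hashlib

def anonymize(record, drop=("name", "email"), pseudonymize=("customer_id",)):
    """Remove direct identifiers and replace linkable IDs with one-way hashes."""
    cleaned = {k: v for k, v in record.items() if k not in drop}
    for field in pseudonymize:
        if field in cleaned:
            cleaned[field] = hashlib.sha256(str(cleaned[field]).encode()).hexdigest()[:12]
    return cleaned

record = {"customer_id": 42, "name": "Jane Doe",
          "email": "jane@example.com", "purchase_total": 99.5}
print(anonymize(record))  # identifiers removed or hashed, metrics kept
```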
Data governance is a collection of people, processes, roles, policies, standards & metrics needed to manage and protect an organization's data assets.
Dark data refers to the information assets that organizations generate, collect, & store during regular business activities, but generally fail to use for other purposes like analytics, etc.
Data monetization is the process of identifying and marketing data or data-based products to generate quantifiable economic benefits.
Data exhaust is the data generated as trails or information by-products resulting from all online or digital activities, behavior, and transactions.
Data tokenization is the process of replacing sensitive data with a non-sensitive equivalent, known as a "token", that has no intrinsic value; it helps prevent data breaches and protects organizations.
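A toy Python sketch of a token vault that swaps a sensitive value for a random token; a real implementation would keep the mapping in a hardened, access-controlled store.

```python
import secrets

class TokenVault:
    """Toy token vault: swaps sensitive values for random tokens with no intrinsic value."""

    def __init__(self):
        self._vault = {}  # token -> original value (a secure store in practice)

    def tokenize(self, value):
        token = "tok_" + secrets.token_hex(8)
        self._vault[token] = value
        return token

    def detokenize(self, token):
        return self._vault[token]

vault = TokenVault()
token = vault.tokenize("4111-1111-1111-1111")
print(token)                      # e.g. tok_3f9a...; safe to store downstream
print(vault.detokenize(token))    # original value, recoverable only via the vault
```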
Data Mining is the process of identifying anomalies, patterns, and correlations in large data sets in order to predict future outcomes. It enables organizations to make informed decisions.
Data Breach is a security incident in which malicious insiders or external attackers gain unauthorized access to confidential and protected information.
ESG data is used to assess a company's or a country's sustainability progress. It is widely regarded as the bedrock of responsible investing.
Data Asset is a structured, comprehensive, and validated database of information built for a specific use case & in response to a problem.
Alternative data is non-traditional data, gathered outside traditional sources, that can provide an indication of a company's future performance.