
Data Wiping & Its Phases: Against Data Quality Problems

 


 


Data Wiping, also called Data Cleansing or Data Shredding, is a necessary process to ensure the quality of the data that will be used for analytics. This step is essential to minimize the risk of basing decisions on inaccurate, erroneous or incomplete information.

 

Data Wiping addresses data quality problems at two levels:

 

· Problems related to data from a single source: these include issues caused by a lack of integrity constraints or by poor schema design, which mainly affect the uniqueness and referential integrity of the data. In a more practical sense, this level also covers data-entry problems such as redundant or contradictory values (see the first sketch after this list).

 

· Problems related to data from various sources: as a general rule, these arise from the heterogeneity of data models and schemas, which can cause structural conflicts; at the instance level, they show up as duplications, contradictions and inconsistencies across the data (see the second sketch below).
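To make the single-source level more concrete, here is a minimal sketch of such checks using pandas; the table, the column names (customer_id, email, country_id) and the reference table are illustrative assumptions, not part of any particular system:

```python
import pandas as pd

# Hypothetical single-source table; all names and values are illustrative.
customers = pd.DataFrame({
    "customer_id": [1, 2, 2, 4],
    "email": ["a@x.com", "b@x.com", "b@x.com", None],
    "country_id": [10, 10, 10, 99],
})
countries = pd.DataFrame({"country_id": [10, 20]})

# Uniqueness: rows whose key appears more than once.
duplicate_keys = customers[customers.duplicated("customer_id", keep=False)]

# Completeness: rows with missing values in a required column.
missing_email = customers[customers["email"].isna()]

# Referential integrity: foreign keys with no match in the reference table.
orphans = customers[~customers["country_id"].isin(countries["country_id"])]

print(len(duplicate_keys), len(missing_email), len(orphans))
```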

 
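And a similarly small sketch of the multi-source level, assuming two hypothetical sources (crm and billing) that describe the same customers under different schemas; renaming columns resolves the structural conflict, and deduplication handles the instance-level one:

```python
import pandas as pd

# Two hypothetical sources describing the same entities with different schemas.
crm = pd.DataFrame({"cust_id": [1, 2], "full_name": ["Ana Diaz", "Bo Li"]})
billing = pd.DataFrame({"customer": [2, 3], "name": ["Bo Li", "Cara Ruiz"]})

# Structural conflict: map both schemas onto a common one before integrating.
crm = crm.rename(columns={"cust_id": "customer_id", "full_name": "name"})
billing = billing.rename(columns={"customer": "customer_id"})

# Instance-level conflict: the same customer may appear in both sources.
merged = pd.concat([crm, billing], ignore_index=True)
deduplicated = merged.drop_duplicates(subset=["customer_id"], keep="first")
print(deduplicated)
```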

The phases of Data Wiping

The ultimate goal of any Data Wiping action is to improve the organization's trust in its data. A comprehensive data cleaning effort involves the following steps:

 

1. Data analysis: its purpose is to determine what kinds of errors and inconsistencies should be eliminated. In addition to manual inspection of data samples, automation is needed, in other words, programs that act on the metadata to detect the data quality problems affecting its properties.
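As an illustration only, a minimal profiling step of this kind could look as follows in pandas; the metrics (inferred type, null ratio, distinct count) and the orders table are assumptions made for the example:

```python
import pandas as pd

def profile(df: pd.DataFrame) -> pd.DataFrame:
    """Collect simple per-column metadata to flag potential quality problems."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "null_ratio": df.isna().mean(),
        "distinct_values": df.nunique(),
    })

# Columns with a high null_ratio or an unexpectedly low distinct count
# become candidates for the cleansing rules defined in the next phase.
orders = pd.DataFrame({"order_id": [1, 2, 2], "amount": [9.5, None, 7.0]})
print(profile(orders))
```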

 

2. Definition of the transformation flow and mapping rules: depending on the number of data sources, their heterogeneity and the data quality problems anticipated, more or fewer steps will be needed in the transformation and adaptation stage. The most appropriate approach is to act at two levels: an early one that corrects problems in data from a single source and prepares it for integration, and a later one that deals with problems in data coming from multiple sources. To keep control over these procedures, it is convenient to define the ETL processes within a specific framework.
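One possible way to make this two-level flow tangible is to express the mapping rules as ordered lists of small functions, one list per level; a sketch under that assumption, where the column names and rules are hypothetical:

```python
import pandas as pd

# Level 1: rules that fix problems inside a single source before integration.
source_rules = [
    lambda df: df.rename(columns={"cust_id": "customer_id"}),
    lambda df: df.assign(email=df["email"].str.strip().str.lower()),
    lambda df: df.dropna(subset=["customer_id"]),
]

# Level 2: rules applied later, to the integrated data from all sources.
integration_rules = [
    lambda df: df.drop_duplicates(subset=["customer_id"]),
]

def run(df, rules):
    for rule in rules:
        df = rule(df)
    return df

crm = pd.DataFrame({"cust_id": [1, 1, None],
                    "email": [" A@X.com ", "a@x.com", "b@x.com"]})
print(run(run(crm, source_rules), integration_rules))
```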

 

3. Verification: the adequacy and effectiveness of a transformation action must always be tested and evaluated; this is one of the principles of Data Wiping. As a general rule, this validation is applied through multiple iterations of the analysis, design and verification steps, since some errors only become evident after a certain number of transformations have been applied to the data.
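A tiny illustration of what such a verification pass might look like, assuming the transformed data is a pandas DataFrame and the two rules checked here are hypothetical:

```python
import pandas as pd

def verify(df: pd.DataFrame) -> list[str]:
    """Return the quality rules that the transformed data still violates."""
    failures = []
    if df["customer_id"].duplicated().any():
        failures.append("customer_id is not unique")
    if df["email"].isna().any():
        failures.append("email has missing values")
    return failures

sample = pd.DataFrame({"customer_id": [1, 2], "email": ["a@x.com", None]})
print(verify(sample))  # a non-empty list sends us back to analysis and design
```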

 

4. Transformation: consists of executing the ETL flow to load and refresh the data warehouse, or of applying the transformations while answering queries when the data comes from multiple source systems.
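A compressed sketch of this execution step, assuming a CSV export as the source and a SQLite database standing in for the warehouse; the file, table and column names are invented for the example:

```python
import sqlite3
import pandas as pd

def etl(source_csv: str, warehouse: sqlite3.Connection) -> None:
    df = pd.read_csv(source_csv)          # extract from the source system
    df = df.drop_duplicates()             # transform: cleansing rules go here
    df.to_sql("customers", warehouse,     # load / refresh the warehouse table
              if_exists="replace", index=False)

warehouse = sqlite3.connect("warehouse.db")
# etl("crm_export.csv", warehouse)  # "crm_export.csv" is a hypothetical export
```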

 

5. Clean data back-flow: once quality errors have been eliminated, the "clean" data should replace the dirty data in the original sources, so that legacy applications also benefit from it and future Data Wiping actions become unnecessary.
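Finally, a short sketch of the back-flow itself, assuming the original source is a SQLite table named customers; the cleansing applied is the same illustrative kind used above:

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(":memory:")

# Original ("dirty") source table; the data is illustrative.
pd.DataFrame({"customer_id": [1, 1, 2],
              "email": [" A@X.com", "a@x.com", None]}).to_sql(
    "customers", conn, index=False)

# Read, clean, and flow the clean data back into the same source table.
dirty = pd.read_sql("SELECT * FROM customers", conn)
clean = (
    dirty.assign(email=dirty["email"].str.strip().str.lower())
         .drop_duplicates(subset=["customer_id"])
)
clean.to_sql("customers", conn, if_exists="replace", index=False)
print(pd.read_sql("SELECT * FROM customers", conn))
```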

 


