Abstract:
High-quality data is essential for building accurate and trustworthy machine
learning models. However, datasets gathered in the real world often contain
serious issues such as incorrect values, missing data, outliers, or noise,
which can lead to unreliable machine learning models.
This research explores the effectiveness of different data cleaning
techniques in improving data quality for machine learning tasks. The research
compares and evaluates various data cleaning techniques and their
performance, including handling missing values, outlier detection and removal,
data normalization, and feature scaling. By comparing different datasets
and observing their behavior, the research analyzes the effect of each technique
on the datasets and the subsequent impact on the performance of the machine
learning model. The results of this research will help data scientists make
better design decisions when preparing datasets for a machine learning model.
By applying the correct data cleaning techniques, practitioners can
improve the reliability and consistency of machine learning models, which
ultimately leads to better decision-making across different ranges