Data Cleaning in HR [5 Steps] | AIHR Learning Bite

December 8, 2019 | By Stanley Isaacs

Welcome back. In today’s bite, you’ll learn how to clean
your HR analytics data in only five steps. Stay tuned. Data cleaning is a key
element in HR analytics. Before you can analyze your
data, it needs to be clean. In this bite, I will explain
why data cleaning is important and how you can do it. I’ve also included a link to
a helpful in-depth article in the description below, in which you can find a
data cleaning infographic to guide you further into the process. Let’s now dive into it. A common saying in data analysis
is “garbage in, garbage out.” This means that you
can put a lot of thought and effort into your data analysis and come up with lots of results. However, these results will mean nothing if the input data is not accurate. In fact, the results may even be harmful, as they can misrepresent reality and lead to poor decision-making. HR data is often dirty. Dirty data is any data
record that contains errors, which can
happen for various reasons. Data cleaning helps your
analysis run smoothly. It also supports routine HR reporting, as clean data can be fed
back into the HR systems. This will help improve data quality and is extremely beneficial
for later data analysis and data aggregation efforts, making data cleaning a necessary step in the HR analytics process. Now that you understand
why data cleaning matters, here are five practical
steps for data cleaning. One, always check if
the data is up-to-date. Two, check for repeated
unique identifiers. Some people hold more than one position. Systems often create separate
records for each position. These people thus end up
having multiple records in a single database,
which is a common reason why large organizations have
inaccurate headcount numbers. Depending on the situation,
these records may be condensed. Three, count missing values. When missing values are overrepresented in specific parts of the organization, they may skew your results. In addition, an analysis
with too many missing values runs the risk of becoming inaccurate, which will affect the
generalizability of your results. Four, check for numerical outliers. Calculate the descriptive statistics and the quantiles. These enable you to
identify potential outliers. The minimum and maximum values
are a good starting point, as is the mode, which is the most
frequently occurring value. In addition, you can calculate
the interquartile range, which is the difference between the third and first quartiles. If this seems
complicated, don’t worry. I’ve included a link in the video description that
will help you calculate the interquartile range.
Finally, five: define valid data values and remove all invalid ones. This applies to all data. Character data typically has a clearly defined set of valid values. For example, gender is coded as M or F. These are the valid data values, and any other values are
presumed to be invalid. Such values can easily be
flagged for inspection. Keep in mind that numeric data
is often limited in range. For example, working age
is between 15 and 100, so numeric values that fall
outside the predefined range can be flagged the same way. Creating clean data doesn’t stop here. Although it is possible to
feed the clean data back into the source systems, the data practices that
led to this dirty data will still be in place. This means that newly entered data will continue to be as
dirty as the old data. The only way to fix this
is to optimize the systems to ensure future data quality. For more insight into this, check out the link to a very useful global data integrity course that you can find in
the video description. That’s it for today’s bite,
in which you’ve learned about the importance of data cleaning and how to perform it in five steps. If you’d like to get started right away, I’ve included a link to an in-depth article in the video description, in which you can find a
data cleaning infographic to help you dive straight
into the process. Remember to stay up-to-date
with our learning bites by subscribing to our channels. And if you like this video, make sure you like and share
it and I’ll see you soon in our next learning bite.
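Step two, checking for repeated unique identifiers, can be sketched in Python as follows. This is a minimal illustration, not AIHR's own method: it assumes a hypothetical extract with one row per position and a hypothetical `employee_id` field as the unique identifier. It lists identifiers that occur more than once and compares the raw row count with the number of unique people.

```python
from collections import Counter

# Hypothetical HR extract: one row per position, so an employee who holds
# two positions appears twice with the same employee_id.
records = [
    {"employee_id": "E001", "position": "Analyst"},
    {"employee_id": "E002", "position": "Recruiter"},
    {"employee_id": "E002", "position": "Trainer"},
    {"employee_id": "E003", "position": "Manager"},
]


def find_repeated_ids(rows, key="employee_id"):
    """Return identifiers that occur more than once, mapped to their counts."""
    counts = Counter(row[key] for row in rows)
    return {ident: n for ident, n in counts.items() if n > 1}


def headcount(rows, key="employee_id"):
    """Count unique people rather than rows (positions)."""
    return len({row[key] for row in rows})


print(find_repeated_ids(records))  # {'E002': 2}
print(len(records), headcount(records))  # 4 3
```

Counting rows would report a headcount of 4 here, while only 3 people exist. Whether the flagged records should then be condensed into one depends on the situation.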
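Step three, counting missing values, matters most when the gaps cluster in one part of the organization. The minimal sketch below assumes hypothetical `department` and `engagement_score` fields, with `None` (or an absent key) marking a missing value, and computes the share of missing values per department so that overrepresented gaps stand out.

```python
from collections import defaultdict

# Hypothetical records; None marks a missing value.
records = [
    {"department": "Sales", "engagement_score": 7},
    {"department": "Sales", "engagement_score": None},
    {"department": "Sales", "engagement_score": None},
    {"department": "Finance", "engagement_score": 8},
    {"department": "Finance", "engagement_score": 6},
]


def missing_share_by_group(rows, field, group_field):
    """Return, per group, the fraction of rows where `field` is absent or None."""
    totals = defaultdict(int)
    missing = defaultdict(int)
    for row in rows:
        group = row[group_field]
        totals[group] += 1
        if row.get(field) is None:
            missing[group] += 1
    return {group: missing[group] / totals[group] for group in totals}


shares = missing_share_by_group(records, "engagement_score", "department")
print(shares)  # Sales is about 0.67, Finance is 0.0
```

In this made-up data, two thirds of Sales scores are missing while Finance has none, so any company-wide average would lean heavily on Finance.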
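Step four can be sketched with Python's standard `statistics` module: it computes the minimum, maximum, mode and quartiles, then flags values outside the interquartile-range fences. The salary figures are made up, and the 1.5 × IQR fences (Tukey's rule) are a common convention assumed here rather than something the video specifies; `k` is a hypothetical parameter you can tune.

```python
import statistics

# Hypothetical annual salaries in thousands; 250 looks like a data-entry error.
salaries = [42, 45, 47, 48, 50, 50, 52, 55, 58, 250]


def describe(values):
    """Descriptive statistics from step four: min, max, mode and quartiles."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "max": max(values),
        "mode": statistics.mode(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": q3 - q1,
    }


def iqr_outliers(values, k=1.5):
    """Flag values below Q1 - k*IQR or above Q3 + k*IQR (Tukey's fences)."""
    stats = describe(values)
    lower = stats["q1"] - k * stats["iqr"]
    upper = stats["q3"] + k * stats["iqr"]
    return [v for v in values if v < lower or v > upper]


print(describe(salaries))
print(iqr_outliers(salaries))  # [250]
```

A flagged value is only a candidate for inspection: a very high salary may be a genuine executive package rather than an error.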
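Step five's examples (gender coded as M or F, working age between 15 and 100) translate into simple validation rules. This sketch assumes hypothetical field names and flags invalid values for inspection rather than deleting them, so a person can decide whether to correct or remove each one.

```python
# Validation rules taken from the examples in step five; the field names
# employee_id, gender and age are hypothetical.
VALID_GENDER = {"M", "F"}
AGE_RANGE = (15, 100)

records = [
    {"employee_id": "E001", "gender": "F", "age": 34},
    {"employee_id": "E002", "gender": "m", "age": 41},
    {"employee_id": "E003", "gender": "M", "age": 7},
    {"employee_id": "E004", "gender": "F", "age": 29},
]


def flag_invalid(rows):
    """Return (employee_id, field, value) for each value outside its valid set or range."""
    flags = []
    low, high = AGE_RANGE
    for row in rows:
        if row["gender"] not in VALID_GENDER:
            flags.append((row["employee_id"], "gender", row["gender"]))
        if not low <= row["age"] <= high:
            flags.append((row["employee_id"], "age", row["age"]))
    return flags


for flag in flag_invalid(records):
    print(flag)
```

The lowercase "m" is flagged because the rule is strict; in practice it may simply need correcting to "M", which is why flagging before removing is the safer default.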