

Inspecting the Quality of Data and Drawing Inferences Using Machine Learning




I work as a DevOps technical lead, responsible for automating day-to-day tasks, defining processes for performing them, and checking the quality of the data we receive from various tools on a regular basis. This recurring data quality analysis involved a great deal of manual effort, so I decided to minimize that effort and shift the focus to other productive work that helps the business achieve its goals. We needed a reliable way to check the quality of data received from different sources, so I used machine learning to draw inferences from the data, applied those findings to fix process gaps, and automated the daily activities in order to maintain accuracy.
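The kind of automated data quality check described above can be sketched in a few lines of Python. This is a minimal illustration rather than the actual code used in the project; the field names (`device_id`, `timestamp`, `outage_minutes`) and the validation rules are assumptions for the example.

```python
# Minimal sketch of a rule-based data quality check on incoming records.
# Field names and validation rules are hypothetical examples.

def validate_record(record):
    """Return a list of quality issues found in a single record."""
    issues = []
    # Required fields must be present and non-empty.
    for field in ("device_id", "timestamp", "outage_minutes"):
        if not record.get(field):
            issues.append(f"missing field: {field}")
    # Numeric fields must fall in a plausible range (minutes per day).
    minutes = record.get("outage_minutes")
    if isinstance(minutes, (int, float)) and not (0 <= minutes <= 1440):
        issues.append("outage_minutes out of range")
    return issues

def validate_batch(records):
    """Map each record's index to its list of issues (clean records omitted)."""
    report = {}
    for i, record in enumerate(records):
        issues = validate_record(record)
        if issues:
            report[i] = issues
    return report
```

A daily job could run `validate_batch` over each incoming feed and surface only the records that appear in the report, replacing the manual review step.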

Because the system held inaccurate data, it was sending false outage information to customers and affecting billing, which cost the company considerable effort in sending an explanation email for each false alert. By fixing this gap, the company can now focus on enhancing its applications and growing the business with its customers. Python was the scripting tool used to automate the various tasks, and SQL was used to fetch data from the database. I also implemented a machine learning mechanism that reads through the data and provides insight into the issues in the current process, which helps us take corrective action and do our jobs better. I chose Python because it is open source and can easily be used to automate tasks locally. Our organisation uses an Oracle database; since database queries are largely similar across systems, the understanding I gained from this program helped me frame queries easily and fetch the required data.

Every single solution improves the process a little, but that does not mean the first method we apply is the correct one. To finalise the process, each method or model needs to be tested against multiple datasets, because different datasets behave differently under different models. We then stick with the best possible model, the one that gives adequate results for any type of dataset.
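The model-selection step described above, testing each candidate on multiple datasets and keeping the one that holds up across all of them, can be sketched as follows. To keep the example self-contained, the "models" here are simple stand-in predicate functions rather than trained classifiers; in practice each would be a fitted model and `evaluate` would score held-out data.

```python
# Sketch: evaluate each candidate model on several datasets and keep the
# one with the best *average* accuracy, since a model that wins on one
# dataset may do poorly on another. Models here are stand-in functions.

def evaluate(model, dataset):
    """Fraction of (features, label) pairs the model predicts correctly."""
    correct = sum(1 for x, y in dataset if model(x) == y)
    return correct / len(dataset)

def pick_best_model(models, datasets):
    """Return (name, scores) where name has the highest mean accuracy."""
    scores = {
        name: sum(evaluate(m, d) for d in datasets) / len(datasets)
        for name, m in models.items()
    }
    return max(scores, key=scores.get), scores
```

Averaging across datasets is one simple selection rule; a stricter variant would pick the model with the best worst-case score, which matches the goal of adequate results on any type of dataset.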

I created a process to validate the data and defined a predefined set of corrections for the issues it finds. Every day the process now takes the appropriate actions to fix data quality issues on its own and sends out the results once it completes. As a result, the company has gained customer confidence and the ability to win more business. The exercise saved 30% of the man-hours spent on the job and helped me understand the data far better. I gained in-depth knowledge of the project I am working on, so I can perform much better in my team and close the gaps within the organization.
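Put together, the daily validate-fix-report loop described above might look like the following sketch. The fix rules here (defaulting a missing region, clamping a negative duration) are invented examples of "predefined corrections", not the actual rules used in the project.

```python
# Sketch of a daily pipeline: detect issues, apply predefined fixes,
# and produce a summary that could be emailed out after the run.
# The issue types and fix rules are hypothetical examples.

PREDEFINED_FIXES = {
    # issue key -> function that repairs the record in place
    "missing_region": lambda r: r.update(region="UNKNOWN"),
    "negative_duration": lambda r: r.update(duration=0),
}

def detect_issues(record):
    """Return the issue keys that apply to one record."""
    issues = []
    if not record.get("region"):
        issues.append("missing_region")
    if record.get("duration", 0) < 0:
        issues.append("negative_duration")
    return issues

def run_daily_job(records):
    """Fix what can be fixed automatically and summarise the outcome."""
    fixed = 0
    for record in records:
        for issue in detect_issues(record):
            PREDEFINED_FIXES[issue](record)
            fixed += 1
    return {"records": len(records), "fixes_applied": fixed}
```

The returned summary is what a scheduled job would send out after completing the run, so the results are reported without any manual step.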