I wrote this blog on Exploratory Data Analysis (EDA) because so many enthusiastic learners want to learn Data Science, ML, and AI.
When they dive into this field, most of them get stuck at the data analysis stage,
because they don't know how to separate the required data from the junk.
I will share some points that will help you understand your data and what can be predicted from it.
Exploratory Data Analysis refers to the critical process of performing initial investigations on data to discover patterns, spot anomalies, test hypotheses, and check assumptions with the help of summary statistics and graphical representations.
One of the most important parts of EDA or Feature Engineering is understanding the business outcome and the features it requires. For that, you have to gather relevant survey data and do some research on the subject.
There are two ways to do this: Primary Research and Secondary Research.
Primary Research :
Ask questions and gather information from the stakeholders. If possible, take a dry run of the problem you are trying to investigate.
Secondary Research :
Read reports and studies by government agencies, trade associations, or other businesses in your industry. Go through any previous work and findings related to your problem.
Note: "The quality of the inputs decides the quality of the model output"
The next step should be to use the acquired business knowledge to search for relevant data.
Now, let's start with a real-life problem statement. Here I am using past property transactions to predict future property prices.
So, here price is our dependent variable, and the primary and secondary research tells us which independent variables are likely to impact the price of a property.
Import the data in Jupyter.
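As a minimal sketch of this step, the snippet below loads a small table with pandas. The column names and values are made up for illustration and are not the dataset used in this blog. The in-memory CSV text only stands in for a real file, which you would normally pass to `pd.read_csv` by path.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a real CSV file.
# In a notebook you would usually call pd.read_csv("your_file.csv") instead.
csv_text = """price,area_sqft,bedrooms,distance_to_city_km
250000,1200,3,12.5
310000,1500,3,8.0
180000,900,2,20.0
,1100,2,15.0
450000,2100,4,5.5
"""

df = pd.read_csv(io.StringIO(csv_text))

# A first look at the top rows and the column types.
print(df.head())
print(df.dtypes)
```

The empty price field in the fourth row is read as a missing value (NaN), which is the kind of thing the next step is meant to surface.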
Next, you have to build an EDD (Extended Data Dictionary). These observations help you find outliers, missing values, and so on.
Using df.describe() and df.shape (an attribute, so it takes no parentheses), you'll get the core of the EDD analysis.
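Here is a small sketch of what these EDD checks might look like. The DataFrame is invented for illustration. The 1.5 × IQR rule used to flag outliers is a common convention that I am assuming here, not something the dataset requires.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing price and one suspiciously large price.
df = pd.DataFrame({
    "price": [250000, 310000, 180000, np.nan, 450000, 2500000],
    "area_sqft": [1200, 1500, 900, 1100, 2100, 1300],
    "bedrooms": [3, 3, 2, 2, 4, 3],
})

# shape is an attribute: (number of rows, number of columns).
print(df.shape)

# describe() gives count, mean, std, min, quartiles and max per numeric column.
# A count lower than the row count hints at missing values.
print(df.describe())

# Explicit count of missing values per column.
missing_per_column = df.isna().sum()
print(missing_per_column)

# Flag outliers in price with the 1.5 * IQR rule (an assumed convention).
q1 = df["price"].quantile(0.25)
q3 = df["price"].quantile(0.75)
iqr = q3 - q1
is_outlier = (df["price"] < q1 - 1.5 * iqr) | (df["price"] > q3 + 1.5 * iqr)
outliers = df[is_outlier]
print(outliers)
```

A large gap between the 75% value and the max in the describe() output is often the first hint of an outlier, and the IQR check just makes that explicit.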
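Observe that almost all the correlation values lie in the range -1 to 1.
A pair with a negative value means the two columns move in opposite directions. It is a value close to 0, not a negative one, that suggests a column has little linear relationship with our dependent variable.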
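The range of -1 to 1 and the per-pair values described above read like a correlation matrix, so here is a minimal sketch assuming Pearson correlation through `DataFrame.corr()`. The columns and numbers are hypothetical.

```python
import pandas as pd

# Hypothetical data: price rises with area and falls with distance to the city.
df = pd.DataFrame({
    "price": [250000, 310000, 180000, 220000, 450000],
    "area_sqft": [1200, 1500, 900, 1100, 2100],
    "distance_to_city_km": [12.5, 8.0, 20.0, 15.0, 5.5],
})

# Pairwise Pearson correlation between all numeric columns.
corr = df.corr()
print(corr.round(2))

# Correlation of each independent variable with the dependent variable (price).
price_corr = corr["price"].drop("price").sort_values()
print(price_corr)
```

In this toy data the correlation between price and area is positive, while the one between price and distance is negative. A strongly negative value is still a useful signal for prediction. It is the values near 0 that deserve a second look before you keep the column.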
Attention: I cleaned the data myself, and I am sharing the code in the following screenshots!
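As a generic illustration of what cleaning can involve (not necessarily the exact steps used for this dataset), the sketch below fills missing prices with the median and caps extreme prices at the 1.5 × IQR fences. Both choices, and the sample data, are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing price and one extreme price.
df = pd.DataFrame({
    "price": [250000, 310000, 180000, np.nan, 450000, 2500000],
    "area_sqft": [1200, 1500, 900, 1100, 2100, 1300],
})

# Work on a copy so the raw data stays available for comparison.
clean = df.copy()

# Fill missing prices with the median, which is less sensitive to outliers
# than the mean.
median_price = clean["price"].median()
clean["price"] = clean["price"].fillna(median_price)

# Cap extreme prices at the 1.5 * IQR fences instead of dropping the rows.
q1, q3 = clean["price"].quantile([0.25, 0.75])
iqr = q3 - q1
clean["price"] = clean["price"].clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)

print(clean)
```

Whether to fill, cap, or drop depends on what the business research says about the data, so treat these choices as a starting point only.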
Lastly, to sum it all up, Exploratory Data Analysis is a philosophical and artistic approach to gauging every nuance of the data at the first encounter.