Data Discovery through Exploratory Data Analysis

A beginner's guide

Let me start with where the term comes from. Long before you and I were born, around the 1970s, John Tukey proposed it. EDA, which stands for exploratory data analysis, is a statistical approach to analyzing data. When working on a Machine Learning or Data Science project, being familiar with the data is extremely important.

If we know the data well before processing it, the later processing steps go as smoothly as butter. Think of solving a 1000-piece puzzle. You are not going to pick up pieces one by one without looking at the others. You first look over all the pieces, which gives you a clear idea of which ones are similar, and that helps you solve the puzzle quickly with minimum effort.

Anyone who has to solve a Machine Learning task needs to look at the data carefully to understand it well. EDA involves getting a clear understanding of the data and summarizing it through visuals and summary statistics.

This understanding is hard to get just by looking at numbers in a table or an Excel sheet. Instead, in the EDA process, data is presented through plots. Graphs and plots provide crisp insights into the data, because a visual presentation helps people grasp a concept faster and more reliably. The graphs give the first set of inferences about the data.
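To make the point about plots concrete, here is a minimal sketch that turns a list of numbers into a text histogram using only the Python standard library. The `ages` values and the `text_histogram` helper are invented for illustration; in a real project you would more likely use a plotting library.

```python
from collections import Counter

# Hypothetical toy values, invented only for illustration.
ages = [23, 25, 31, 35, 36, 38, 41, 42, 44, 45, 47, 52, 58, 63]


def text_histogram(values, bin_width=10):
    """Return one line per bin, with one '#' per value that falls in the bin."""
    # Map each value to the lower edge of its bin, e.g. 37 -> 30 for width 10.
    counts = Counter((v // bin_width) * bin_width for v in values)
    lines = []
    # Walk every bin from the lowest to the highest, including empty ones.
    for start in range(min(counts), max(counts) + bin_width, bin_width):
        label = f"{start:>3}-{start + bin_width - 1:<3}"
        lines.append(f"{label} | {'#' * counts.get(start, 0)}")
    return lines


histogram_lines = text_histogram(ages)
print("\n".join(histogram_lines))
```

Even in this crude form, the shape of the data (where most values sit, where the tail is) is easier to see than in the raw list.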

EDA is also about answering specific questions about the dataset in order to gain insights and understand the data better. The tools and techniques of EDA help analyze both small datasets and large ones running into many rows and columns. They help decide the future course of action and which model to apply. Irrespective of the type of data and the model to be built, performing the initial data preprocessing steps is essential. Skipping them can result in biased model output, and the accuracy of the model can also be distorted.
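As a rough sketch of what such initial checks can look like, the snippet below counts missing values per column and drops exact duplicate rows. The records, column names, and helper functions are all made up for this example and are not taken from any of the datasets mentioned in this post.

```python
# Hypothetical toy records, invented only for illustration.
rows = [
    {"city": "Pune", "rating": 4.1, "votes": 120},
    {"city": "Mumbai", "rating": None, "votes": 87},
    {"city": "Pune", "rating": 4.1, "votes": 120},  # exact duplicate of the first row
    {"city": None, "rating": 3.6, "votes": 45},
]


def missing_per_column(records):
    """Count how many records have None in each column."""
    columns = records[0].keys()
    return {col: sum(1 for r in records if r.get(col) is None) for col in columns}


def drop_duplicates(records):
    """Keep the first occurrence of each identical record, preserving order."""
    seen = set()
    unique = []
    for r in records:
        # Sort the items by column name so the key does not depend on dict order.
        key = tuple(sorted(r.items(), key=lambda kv: kv[0]))
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique


missing = missing_per_column(rows)
unique_rows = drop_duplicates(rows)
print("missing values per column:", missing)
print("rows before/after dropping duplicates:", len(rows), "->", len(unique_rows))
```

What to do with the missing values (drop them, fill them in, or leave them) depends on the dataset and the model, so this sketch only reports them.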

EDA is performed to:

  1. Identify trends and patterns.

  2. Develop an understanding of the data.

  3. Understand the relationship between variables.

  4. Find answers to questions relating to the data.

  5. Decide on the appropriate models to be executed on the data.

  6. Test assumptions.
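A couple of these goals, summarizing a variable and checking the relationship between two variables, can be sketched with the standard library alone. The `weight` and `mpg` numbers below are invented for illustration, and `summarize` and `pearson` are hypothetical helper names, not functions from any library.

```python
import statistics

# Hypothetical toy values, invented only for illustration (not from a real dataset).
weight = [1800, 2100, 2500, 2900, 3300, 3800, 4200]
mpg = [36.0, 33.5, 28.0, 24.5, 20.0, 17.5, 14.0]


def summarize(values):
    """Return a few basic summary statistics for one variable."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return covariance / (spread_x * spread_y)


print("weight summary:", summarize(weight))
print("mpg summary:", summarize(mpg))
print("correlation(weight, mpg):", round(pearson(weight, mpg), 3))
```

A correlation close to -1 or 1 suggests a strong linear relationship, which is the kind of hint that helps when deciding which model to try. It does not by itself prove that one variable causes the other.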

The following are some EDA projects that I have done. These projects include the most commonly used Python code for Exploratory Data Analysis.

EDA on Spotify dataset

EDA on Autompg dataset

EDA on Zomato dataset

Thank you for reading.

Have a nice day!
