SDS Project: Weather Prediction
Describe your dataset. Explain the meaning of each column in the dataset.
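A first description of a dataset usually starts from its shape, column types, and summary statistics. The following is a minimal sketch using pandas; the toy frame and its column names (`temperature`, `humidity`, `weather`) are assumptions made for illustration and are not taken from the actual project dataset.

```python
import pandas as pd

# Hypothetical toy data; in the project this frame would be loaded from the real data file.
df = pd.DataFrame({
    "temperature": [21.0, 19.5, 23.0, 25.5],
    "humidity": [0.60, 0.72, 0.55, 0.48],
    "weather": ["sunny", "rain", "sunny", "sunny"],
})

# Number of observations (rows) and variables (columns).
n_rows, n_cols = df.shape
print(f"{n_rows} rows, {n_cols} columns")

# Data type of each column: helps separate numerical from categorical variables.
print(df.dtypes)

# Summary statistics for every column, numeric and non-numeric alike.
summary = df.describe(include="all")
print(summary)
```

The meaning of each column still has to be written up by hand, since it comes from the data source's documentation and not from the values themselves.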
Data Cleaning – Handle the missing data for both categorical and numerical variables (by dropping and imputing).
(Note: Missing values must not be ignored or deleted without first examining them.)
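The two strategies named above, dropping and imputing, can be sketched as follows. This is only one possible approach under stated assumptions: the toy data and column names are hypothetical, and the choice of median for numerical columns and mode for categorical columns is a common default, not a requirement of the brief.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names are assumptions, not the real dataset.
df = pd.DataFrame({
    "temperature": [21.0, np.nan, 19.5, 23.0, np.nan],
    "humidity": [0.60, 0.72, np.nan, 0.55, 0.65],
    "weather": ["sunny", "rain", None, "rain", "rain"],
})

# Step 1: examine before acting. Count the missing values in each column.
missing_counts = df.isna().sum()
print(missing_counts)

# Step 2a: dropping. Remove every row that has at least one missing value.
dropped = df.dropna()

# Step 2b: imputing. Fill numeric columns with the median and
# categorical columns with the most frequent value (the mode).
imputed = df.copy()
for col in imputed.columns:
    if pd.api.types.is_numeric_dtype(imputed[col]):
        imputed[col] = imputed[col].fillna(imputed[col].median())
    else:
        imputed[col] = imputed[col].fillna(imputed[col].mode().iloc[0])
```

Comparing `dropped` with `imputed` shows the trade-off: dropping loses observations, while imputing keeps them at the cost of introducing estimated values.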
Remove unwanted observations – Duplicate/irrelevant/repetitive.
Fix the typos and inconsistent capitalization.
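The two cleaning steps above interact: rows that differ only in capitalization or stray whitespace are not detected as duplicates until the text is made consistent. A minimal sketch, assuming hypothetical columns `city` and `weather` and a hand-written typo mapping:

```python
import pandas as pd

# Hypothetical toy data with inconsistent case, trailing whitespace, and one typo.
df = pd.DataFrame({
    "city": ["Pune", "pune ", "Delhi", "Delhi", "DELHI"],
    "weather": ["Sunny", "sunny", "Rainy", "Rainy", "rainny"],
    "temperature": [31.0, 31.0, 27.5, 27.5, 27.5],
})

# Make the text columns consistent: strip whitespace and lower-case everything.
for col in ["city", "weather"]:
    df[col] = df[col].str.strip().str.lower()

# Fix known typos with an explicit mapping (the mapping itself is an assumption).
df["weather"] = df["weather"].replace({"rainny": "rainy"})

# Only now drop exact duplicate rows, keeping the first occurrence of each.
cleaned = df.drop_duplicates().reset_index(drop=True)
print(cleaned)
```

Listing the unique values of each categorical column (for example with `df["weather"].unique()`) is a simple way to find the typos that belong in the mapping.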
Visualize the dataset to exhibit meaningful insights from it. Use any three graph visualization techniques. Filter unwanted outliers. Numerical – Box plot / Histogram. Categorical – Bar chart.
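One common way to filter outliers in a numerical column is the 1.5 × IQR rule. The sketch below is an illustration under that assumption; the brief does not prescribe a specific outlier rule, and the `temperature` values are made up.

```python
import pandas as pd

# Hypothetical temperature readings with one implausible value.
temps = pd.Series([21.0, 22.5, 20.0, 23.0, 21.5, 22.0, 58.0], name="temperature")

# Interquartile range: the spread of the middle 50% of the data.
q1 = temps.quantile(0.25)
q3 = temps.quantile(0.75)
iqr = q3 - q1

# Values beyond 1.5 * IQR from the quartiles are flagged as outliers.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
is_outlier = (temps < lower) | (temps > upper)

# Keep only the observations inside the fences.
filtered = temps[~is_outlier]
print(f"fences: [{lower}, {upper}], removed {is_outlier.sum()} value(s)")
```

This rule matches the default whisker length of a standard box plot, so the points drawn individually beyond the whiskers are the ones this filter would remove.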
Compute the mean and variance of each numeric column. Normalize all numeric columns to mean 0 and variance 1. Discuss why normalization is needed and how it affects the dataset. Use graphs to check whether the data is normally distributed.
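Rescaling a column to mean 0 and variance 1 is often called z-score standardization: each value has the column mean subtracted and is then divided by the column standard deviation. A minimal sketch with hypothetical columns follows; it uses the population variance (`ddof=0`), which is an assumption, since pandas defaults to the sample variance (`ddof=1`).

```python
import pandas as pd

# Hypothetical numeric columns on very different scales.
df = pd.DataFrame({
    "temperature": [20.0, 22.0, 24.0, 26.0, 28.0],
    "humidity": [0.50, 0.55, 0.65, 0.70, 0.60],
})

# Mean and (population) variance of each column before scaling.
means = df.mean()
variances = df.var(ddof=0)
print(means)
print(variances)

# z-score: subtract the column mean, divide by the column standard deviation.
standardized = (df - means) / df.std(ddof=0)

# After scaling, every column has mean ~0 and variance ~1.
print(standardized.mean())
print(standardized.var(ddof=0))
```

Standardization changes only the location and scale of each column, not the shape of its distribution, so a skewed column stays skewed after scaling.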
State the research hypothesis and perform appropriate statistical tests. You are free to formulate your own hypothesis based on the columns. Decide whether the null hypothesis is rejected or not rejected.
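As an illustration of the workflow, the sketch below tests one hypothetical hypothesis: that mean humidity differs between rainy and sunny days. The data, the column names, the choice of Welch's two-sample t-test, and the 0.05 significance level are all assumptions for this example; the appropriate test depends on the hypothesis and on the column types involved.

```python
import pandas as pd
from scipy import stats

# Hypothetical toy data: humidity readings on rainy and sunny days.
df = pd.DataFrame({
    "weather": ["rain"] * 6 + ["sunny"] * 6,
    "humidity": [0.80, 0.85, 0.78, 0.90, 0.82, 0.88,
                 0.55, 0.60, 0.52, 0.58, 0.63, 0.57],
})

rainy = df.loc[df["weather"] == "rain", "humidity"]
sunny = df.loc[df["weather"] == "sunny", "humidity"]

# H0: mean humidity is the same on rainy and sunny days.
# H1: mean humidity differs between the two groups.
alpha = 0.05

# Welch's t-test (equal_var=False) does not assume equal group variances.
t_stat, p_value = stats.ttest_ind(rainy, sunny, equal_var=False)

# Reject H0 only if the p-value falls below the chosen significance level.
reject_null = p_value < alpha
print(f"t = {t_stat:.2f}, p = {p_value:.4g}, reject H0: {reject_null}")
```

A p-value above the significance level means the data do not provide enough evidence against the null hypothesis; it does not prove the null hypothesis true.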
Compute the correlations between variables and identify which pairs are positively related and which are negatively related. State the inferences you draw from them.
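A correlation matrix gives the sign and strength of the linear relationship for every pair of numeric columns at once. The following is a minimal sketch using the Pearson coefficient; the columns `temperature`, `feels_like`, and `humidity` and their values are hypothetical.

```python
from itertools import combinations

import pandas as pd

# Hypothetical toy data: one pair rises together, another moves in opposite directions.
df = pd.DataFrame({
    "temperature": [18.0, 21.0, 24.0, 27.0, 30.0, 33.0],
    "feels_like": [17.5, 21.5, 24.5, 28.0, 31.0, 35.0],
    "humidity": [0.85, 0.80, 0.70, 0.66, 0.55, 0.50],
})

# Pearson correlation between every pair of numeric columns (values in [-1, 1]).
corr = df.corr(method="pearson")
print(corr)

# Report the direction of each distinct pair.
for a, b in combinations(df.columns, 2):
    r = corr.loc[a, b]
    direction = "positive" if r > 0 else "negative"
    print(f"{a} vs {b}: r = {r:.2f} ({direction})")
```

When stating inferences, keep in mind that a strong correlation indicates association only and does not by itself show that one variable causes the other.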