cleaning-data

Here are 382 public repositories matching this topic...

Simple, automatic data cleaning in one line of code. It performs one-hot encoding, casts date and time columns to the datetime dtype, detects binary columns, safely converts non-numeric columns to numeric dtypes, cleans dirty or empty values, normalizes values, and removes unwanted columns. Get your data ready for model training an…

  • Updated May 22, 2021
  • Python
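
To make the steps named in the description above concrete, here is a minimal pandas sketch of a few of them (dropping columns, blanking empty values, datetime casting, safe numeric conversion, one-hot encoding). It is not the listed repository's API: the helper `clean_frame`, its parameters, and the sample data are all hypothetical, and normalization and binary-column detection are left out for brevity.

```python
import pandas as pd


def clean_frame(df, date_cols=(), onehot_cols=(), drop_cols=()):
    """Hypothetical helper: apply a few common cleaning steps to a copy of df."""
    # Remove unwanted columns; drop() returns a new frame, so df is untouched.
    out = df.drop(columns=list(drop_cols))
    # Treat empty or whitespace-only strings as missing values.
    out = out.replace(r"^\s*$", float("nan"), regex=True)
    # Cast the named columns to datetime; unparseable values become NaT.
    for col in date_cols:
        out[col] = pd.to_datetime(out[col], errors="coerce")
    # Convert a text column to numeric only if every non-missing value parses.
    for col in out.columns:
        if col in date_cols or col in onehot_cols:
            continue
        if not pd.api.types.is_numeric_dtype(out[col]):
            converted = pd.to_numeric(out[col], errors="coerce")
            if converted.notna().sum() == out[col].notna().sum():
                out[col] = converted
    # One-hot encode the named categorical columns.
    out = pd.get_dummies(out, columns=list(onehot_cols))
    # Drop rows that still contain missing values.
    return out.dropna().reset_index(drop=True)


# Made-up sample data for illustration only.
df = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "price": ["10.5", "7", " "],
        "signup": ["2021-05-22", "not a date", "2020-09-29"],
        "color": ["red", "blue", "red"],
    }
)
cleaned = clean_frame(
    df, date_cols=["signup"], onehot_cols=["color"], drop_cols=["id"]
)
print(cleaned)
```

Dropping every row with a missing value is only one possible policy; imputing values or flagging rows may suit a given dataset better.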

💜🌈📊 A data engineering project that implements an ETL data pipeline using Dagster, Apache Spark, Streamlit, MinIO, Metabase, dbt, Polars, and Docker. Data comes from Kaggle and the YouTube API. 🌺

  • Updated Nov 19, 2024
  • Jupyter Notebook

Notes from the author for anyone who wants to learn the process a data scientist follows, from initial data collection to making predictions with a trained model. These notes are based on what the author has learned and implemented. Enjoy!

  • Updated Sep 29, 2020
  • Jupyter Notebook
