diff --git a/.ipynb_checkpoints/Classification_Telco_customer_churn-checkpoint.ipynb b/.ipynb_checkpoints/Classification_Telco_customer_churn-checkpoint.ipynb deleted file mode 100644 index 65ae9a5..0000000 --- a/.ipynb_checkpoints/Classification_Telco_customer_churn-checkpoint.ipynb +++ /dev/null @@ -1,2276 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Project Title: \n", - "Telecommunications Customer Churn Prediction Analysis" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Business Understanding" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 1. Introduction\n", - "This project aims to assist a telecom company in understanding their data and predicting customer churn. The company has provided access to three different datasets: the first dataset with 3000 records stored in a remote Microsoft SQL Server database, the second dataset with 2000 records stored on OneDrive, and the third dataset hosted on a GitHub repository.\n", - "\n", - "### 1.1. Objectives\n", - "Understand the data: Our first objective is to provide insights into the telecom company's data, including customer demographics, services availed, and payment details. This understanding will enable the company to make informed business decisions.\n", - "\n", - "Find the lifetime value of each customer: By analyzing the data, we aim to identify factors that influence the rate at which customers churn. Understanding customer behavior and identifying key predictors will help the telecom company estimate the lifetime value of each customer.\n", - "\n", - "Predict customer churn: The primary objective is to develop a predictive model that accurately determines whether a customer is likely to churn or not. 
We will employ machine learning algorithms such as logistic regression, decision trees, support vector machines, random forest, etc., to build a model that effectively predicts customer churn.\n", - "\n", - "### 1.2. Methodology\n", - "To achieve our objectives, we will follow the CRISP-DM framework, which consists of the following steps:\n", - "\n", - "Data exploration: We will thoroughly explore the datasets to gain insights into the available variables, their distributions, and relationships. This step will provide us with an initial understanding of the data and help identify any data quality issues.\n", - "\n", - "Missing value computations: We will identify missing values in the datasets and decide on an appropriate strategy for handling them. This may involve imputing missing values or removing data points with missing values.\n", - "\n", - "Feature engineering: We will perform feature engineering to transform and create new variables that can potentially improve the predictive power of our models. This step may include encoding categorical variables, scaling numerical variables, or creating interaction terms.\n", - "\n", - "Model development: We will utilize various machine learning algorithms such as logistic regression, decision trees, support vector machines, random forest, etc., to develop predictive models for customer churn. We will train the models on a subset of the data and evaluate their performance using appropriate metrics.\n", - "\n", - "Model evaluation and interpretation: We will evaluate the trained models using evaluation metrics such as accuracy, precision, recall, and F1-score. Additionally, we will interpret the models to understand the factors driving customer churn and their relative importance.\n", - "\n", - "Model optimization and hyperparameter tuning: We will fine-tune the models by optimizing their hyperparameters to improve their performance. 
This step may involve techniques like grid search or random search to find the optimal combination of hyperparameters.\n", - "\n", - "By following this methodology, we aim to provide valuable insights to the telecom company and develop a reliable predictive model for customer churn." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Load Datasets" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Installations" - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Requirement already satisfied: pyodbc in c:\\users\\user\\anaconda3\\lib\\site-packages (4.0.34)\n", - "Note: you may need to restart the kernel to use updated packages.\n", - "Requirement already satisfied: openpyxl in c:\\users\\user\\anaconda3\\lib\\site-packages (3.0.10)\n", - "Requirement already satisfied: et_xmlfile in c:\\users\\user\\anaconda3\\lib\\site-packages (from openpyxl) (1.1.0)\n", - "Note: you may need to restart the kernel to use updated packages.\n" - ] - } - ], - "source": [ - "%pip install pyodbc\n", - "%pip install openpyxl" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Load first dataset from SQL database" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "# Import necessary libraries\n", - "import pyodbc\n", - "import pandas as pd\n", - "import warnings\n", - "warnings.filterwarnings(\"ignore\")" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [], - "source": [ - "# Establish a connection\n", - "server = 'dap-projects-database.database.windows.net'\n", - "database = 'dapDB'\n", - "username = 'dataAnalyst_LP2'\n", - "password = 'A3g@3kR$2y'\n", - "\n", - "# Create the connection string using the ODBC driver format\n", - "conn_str = f'DRIVER={{ODBC Driver 17 for SQL 
Server}};SERVER={server};DATABASE={database};UID={username};PWD={password}'\n", - "\n", - "# Establish the connection using the connection string\n", - "conn = pyodbc.connect(conn_str)" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [], - "source": [ - "# Query the database to retrieve the data\n", - "query = 'SELECT TOP 3000 * FROM LP2_Telco_churn_first_3000'\n", - "df_db = pd.read_sql(query, conn)" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [], - "source": [ - "# Close connection\n", - "conn.close()" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
" - ], - "text/plain": [ - " customerID gender SeniorCitizen Partner Dependents tenure \\\n", - "0 7590-VHVEG Female False True False 1 \n", - "1 5575-GNVDE Male False False False 34 \n", - "2 3668-QPYBK Male False False False 2 \n", - "3 7795-CFOCW Male False False False 45 \n", - "4 9237-HQITU Female False False False 2 \n", - "\n", - " PhoneService MultipleLines InternetService OnlineSecurity ... \\\n", - "0 False None DSL False ... \n", - "1 True False DSL True ... \n", - "2 True False DSL True ... \n", - "3 False None DSL True ... \n", - "4 True False Fiber optic False ... \n", - "\n", - " DeviceProtection TechSupport StreamingTV StreamingMovies Contract \\\n", - "0 False False False False Month-to-month \n", - "1 True False False False One year \n", - "2 False False False False Month-to-month \n", - "3 True True False False One year \n", - "4 False False False False Month-to-month \n", - "\n", - " PaperlessBilling PaymentMethod MonthlyCharges TotalCharges \\\n", - "0 True Electronic check 29.850000 29.850000 \n", - "1 False Mailed check 56.950001 1889.500000 \n", - "2 True Mailed check 53.849998 108.150002 \n", - "3 False Bank transfer (automatic) 42.299999 1840.750000 \n", - "4 True Electronic check 70.699997 151.649994 \n", - "\n", - " Churn \n", - "0 False \n", - "1 False \n", - "2 True \n", - "3 False \n", - "4 True \n", - "\n", - "[5 rows x 21 columns]" - ] - }, - "execution_count": 6, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# Display the dataframe\n", - "df_db.head()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Load second dataset (excelfile)" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
" - ], - "text/plain": [ - " customerID gender SeniorCitizen Partner Dependents tenure PhoneService \\\n", - "0 7613-LLQFO Male 0 No No 12 Yes \n", - "1 4568-TTZRT Male 0 No No 9 Yes \n", - "2 9513-DXHDA Male 0 No No 27 Yes \n", - "3 2640-PMGFL Male 0 No Yes 27 Yes \n", - "4 3801-HMYNL Male 0 Yes Yes 1 Yes \n", - "\n", - " MultipleLines InternetService OnlineSecurity OnlineBackup \\\n", - "0 Yes Fiber optic No No \n", - "1 No No No internet service No internet service \n", - "2 No DSL Yes No \n", - "3 Yes Fiber optic No No \n", - "4 No Fiber optic No No \n", - "\n", - " DeviceProtection TechSupport StreamingTV \\\n", - "0 No No Yes \n", - "1 No internet service No internet service No internet service \n", - "2 Yes Yes Yes \n", - "3 No Yes No \n", - "4 No No Yes \n", - "\n", - " StreamingMovies Contract PaperlessBilling PaymentMethod \\\n", - "0 No Month-to-month Yes Electronic check \n", - "1 No internet service Month-to-month No Mailed check \n", - "2 Yes One year No Electronic check \n", - "3 No Month-to-month Yes Electronic check \n", - "4 Yes Month-to-month No Mailed check \n", - "\n", - " MonthlyCharges TotalCharges \n", - "0 84.45 1059.55 \n", - "1 20.40 181.8 \n", - "2 81.70 2212.55 \n", - "3 79.50 2180.55 \n", - "4 89.15 89.15 " - ] - }, - "execution_count": 7, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# Read the excel file into a pandas dataframe\n", - "df_excel = pd.read_excel('Telco-churn-second-2000.xlsx')\n", - "\n", - "# Display the dataframe\n", - "df_excel.head()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Load third dataset (csv file)" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
" - ], - "text/plain": [ - " customerID gender SeniorCitizen Partner Dependents tenure PhoneService \\\n", - "0 5600-PDUJF Male 0 No No 6 Yes \n", - "1 8292-TYSPY Male 0 No No 19 Yes \n", - "2 0567-XRHCU Female 0 Yes Yes 69 No \n", - "3 1867-BDVFH Male 0 Yes Yes 11 Yes \n", - "4 2067-QYTCF Female 0 Yes No 64 Yes \n", - "\n", - " MultipleLines InternetService OnlineSecurity ... DeviceProtection \\\n", - "0 No DSL No ... No \n", - "1 No DSL No ... Yes \n", - "2 No phone service DSL Yes ... Yes \n", - "3 Yes Fiber optic No ... No \n", - "4 Yes Fiber optic No ... Yes \n", - "\n", - " TechSupport StreamingTV StreamingMovies Contract PaperlessBilling \\\n", - "0 Yes No No Month-to-month Yes \n", - "1 Yes No No Month-to-month Yes \n", - "2 No No Yes Two year Yes \n", - "3 No No No Month-to-month Yes \n", - "4 Yes Yes Yes Month-to-month Yes \n", - "\n", - " PaymentMethod MonthlyCharges TotalCharges Churn \n", - "0 Credit card (automatic) 49.50 312.7 No \n", - "1 Credit card (automatic) 55.00 1046.5 Yes \n", - "2 Credit card (automatic) 43.95 2960.1 No \n", - "3 Electronic check 74.35 834.2 Yes \n", - "4 Electronic check 111.15 6953.4 No \n", - "\n", - "[5 rows x 21 columns]" - ] - }, - "execution_count": 8, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# Read third dataset\n", - "df_csv = pd.read_csv('LP2_Telco-churn-last-2000.csv')\n", - "\n", - "# Display the dataframe\n", - "df_csv.head()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Questions and Hypothesis" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Hypothesis\n", - "\n", - "H0: The churn rate of customers in the telecom company is not significantly influenced by various factors related to their \n", - "demographics, services, and payment methods.\n", - "\n", - "H1: The churn rate of customers in the telecom company is influenced by various factors related to their demographics, services, and payment methods." 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Questions\n", - "1. Is there a relationship between the monthly charges and the likelihood of churn?\n", - "2. Do customers with fiber optic internet service exhibit a higher churn rate than those with DSL or no internet service?\n", - "3. Does the availability of online security, online backup, device protection, and tech support impact the churn rate?\n", - "4. How does the churn rate vary by customer gender?\n", - "5. Does the presence of a partner influence the likelihood of churn?\n", - "6. Is there a correlation between the tenure of customers and their churn rate?" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Data Exploration" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Explore individual datasets " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Explore the DataFrame from the SQL database (df_db)" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": { - "scrolled": false - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "RangeIndex: 3000 entries, 0 to 2999\n", - "Data columns (total 21 columns):\n", - " # Column Non-Null Count Dtype \n", - "--- ------ -------------- ----- \n", - " 0 customerID 3000 non-null object \n", - " 1 gender 3000 non-null object \n", - " 2 SeniorCitizen 3000 non-null bool \n", - " 3 Partner 3000 non-null bool \n", - " 4 Dependents 3000 non-null bool \n", - " 5 tenure 3000 non-null int64 \n", - " 6 PhoneService 3000 non-null bool \n", - " 7 MultipleLines 2731 non-null object \n", - " 8 InternetService 3000 non-null object \n", - " 9 OnlineSecurity 2349 non-null object \n", - " 10 OnlineBackup 2349 non-null object \n", - " 11 DeviceProtection 2349 non-null object \n", - " 12 TechSupport 2349 non-null object \n", - " 13 StreamingTV 2349 non-null object \n", - " 14 
StreamingMovies 2349 non-null object \n", - " 15 Contract 3000 non-null object \n", - " 16 PaperlessBilling 3000 non-null bool \n", - " 17 PaymentMethod 3000 non-null object \n", - " 18 MonthlyCharges 3000 non-null float64\n", - " 19 TotalCharges 2995 non-null float64\n", - " 20 Churn 2999 non-null object \n", - "dtypes: bool(5), float64(2), int64(1), object(13)\n", - "memory usage: 389.8+ KB\n" - ] - } - ], - "source": [ - "# Column information\n", - "df_db.info()" - ] - }, - { - "cell_type": "raw", - "metadata": {}, - "source": [ - "df_db has all 21 columns arranged as described in the readme file. The customerID in row 2999 has a different format from the rest. " - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "(3000, 21)" - ] - }, - "execution_count": 10, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df_db.shape" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Column: customerID - Unique Values: ['7590-VHVEG' '5575-GNVDE' '3668-QPYBK' ... 
'1891-FZYSA' '4770-UEZOX'\n", - " '1A1:U3001038-RQOST']\n", - "=============================================\n", - "Column: gender - Unique Values: ['Female' 'Male']\n", - "=============================================\n", - "Column: SeniorCitizen - Unique Values: [False True]\n", - "=============================================\n", - "Column: Partner - Unique Values: [ True False]\n", - "=============================================\n", - "Column: Dependents - Unique Values: [False True]\n", - "=============================================\n", - "Column: tenure - Unique Values: [ 1 34 2 45 8 22 10 28 62 13 16 58 49 25 69 52 71 21 12 30 47 72 17 27\n", - " 5 46 11 70 63 43 15 60 18 66 9 3 31 50 64 56 7 42 35 48 29 65 38 68\n", - " 32 55 37 36 41 6 4 33 67 23 57 61 14 20 53 40 59 24 44 19 54 51 26 0\n", - " 39]\n", - "=============================================\n", - "Column: PhoneService - Unique Values: [False True]\n", - "=============================================\n", - "Column: MultipleLines - Unique Values: [None False True]\n", - "=============================================\n", - "Column: InternetService - Unique Values: ['DSL' 'Fiber optic' 'No']\n", - "=============================================\n", - "Column: OnlineSecurity - Unique Values: [False True None]\n", - "=============================================\n", - "Column: OnlineBackup - Unique Values: [True False None]\n", - "=============================================\n", - "Column: DeviceProtection - Unique Values: [False True None]\n", - "=============================================\n", - "Column: TechSupport - Unique Values: [False True None]\n", - "=============================================\n", - "Column: StreamingTV - Unique Values: [False True None]\n", - "=============================================\n", - "Column: StreamingMovies - Unique Values: [False True None]\n", - "=============================================\n", - "Column: Contract - Unique Values: ['Month-to-month' 'One 
year' 'Two year']\n", - "=============================================\n", - "Column: PaperlessBilling - Unique Values: [ True False]\n", - "=============================================\n", - "Column: PaymentMethod - Unique Values: ['Electronic check' 'Mailed check' 'Bank transfer (automatic)'\n", - " 'Credit card (automatic)']\n", - "=============================================\n", - "Column: MonthlyCharges - Unique Values: [29.85000038 56.95000076 53.84999847 ... 33.90000153 34.\n", - " 38.59999847]\n", - "=============================================\n", - "Column: TotalCharges - Unique Values: [ 29.85000038 1889.5 108.15000153 ... 6143.14990234 144.80000305\n", - " 414.95001221]\n", - "=============================================\n", - "Column: Churn - Unique Values: [False True None]\n", - "=============================================\n" - ] - } - ], - "source": [ - "# check unique values of each column\n", - "for column in df_db.columns:\n", - " print('Column: {} - Unique Values: {}'.format(column, df_db[column].unique()))\n", - " print('==='*15)" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
" - ], - "text/plain": [ - " tenure MonthlyCharges TotalCharges\n", - "count 3000.000000 3000.000000 2995.000000\n", - "mean 32.527333 65.347400 2301.278315\n", - "std 24.637768 30.137053 2274.987884\n", - "min 0.000000 18.400000 18.799999\n", - "25% 9.000000 35.787499 415.250000\n", - "50% 29.000000 70.900002 1404.650024\n", - "75% 56.000000 90.262501 3868.725098\n", - "max 72.000000 118.650002 8564.750000" - ] - }, - "execution_count": 12, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# Summary statistics of df_db to get insights into the distribution and basic characteristics of the numerical variables\n", - "df_db.describe()" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "customerID 0\n", - "gender 0\n", - "SeniorCitizen 0\n", - "Partner 0\n", - "Dependents 0\n", - "tenure 0\n", - "PhoneService 0\n", - "MultipleLines 269\n", - "InternetService 0\n", - "OnlineSecurity 651\n", - "OnlineBackup 651\n", - "DeviceProtection 651\n", - "TechSupport 651\n", - "StreamingTV 651\n", - "StreamingMovies 651\n", - "Contract 0\n", - "PaperlessBilling 0\n", - "PaymentMethod 0\n", - "MonthlyCharges 0\n", - "TotalCharges 5\n", - "Churn 1\n", - "dtype: int64" - ] - }, - "execution_count": 13, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# Handling missing values\n", - "df_db.isnull().sum()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Explore df_excel" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
" - ], - "text/plain": [ - " customerID gender SeniorCitizen Partner Dependents tenure PhoneService \\\n", - "0 7613-LLQFO Male 0 No No 12 Yes \n", - "1 4568-TTZRT Male 0 No No 9 Yes \n", - "2 9513-DXHDA Male 0 No No 27 Yes \n", - "3 2640-PMGFL Male 0 No Yes 27 Yes \n", - "4 3801-HMYNL Male 0 Yes Yes 1 Yes \n", - "\n", - " MultipleLines InternetService OnlineSecurity OnlineBackup \\\n", - "0 Yes Fiber optic No No \n", - "1 No No No internet service No internet service \n", - "2 No DSL Yes No \n", - "3 Yes Fiber optic No No \n", - "4 No Fiber optic No No \n", - "\n", - " DeviceProtection TechSupport StreamingTV \\\n", - "0 No No Yes \n", - "1 No internet service No internet service No internet service \n", - "2 Yes Yes Yes \n", - "3 No Yes No \n", - "4 No No Yes \n", - "\n", - " StreamingMovies Contract PaperlessBilling PaymentMethod \\\n", - "0 No Month-to-month Yes Electronic check \n", - "1 No internet service Month-to-month No Mailed check \n", - "2 Yes One year No Electronic check \n", - "3 No Month-to-month Yes Electronic check \n", - "4 Yes Month-to-month No Mailed check \n", - "\n", - " MonthlyCharges TotalCharges \n", - "0 84.45 1059.55 \n", - "1 20.40 181.8 \n", - "2 81.70 2212.55 \n", - "3 79.50 2180.55 \n", - "4 89.15 89.15 " - ] - }, - "execution_count": 14, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# Call the dataset df_excel\n", - "df_excel.head()" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "RangeIndex: 2000 entries, 0 to 1999\n", - "Data columns (total 20 columns):\n", - " # Column Non-Null Count Dtype \n", - "--- ------ -------------- ----- \n", - " 0 customerID 2000 non-null object \n", - " 1 gender 2000 non-null object \n", - " 2 SeniorCitizen 2000 non-null int64 \n", - " 3 Partner 2000 non-null object \n", - " 4 Dependents 2000 non-null object \n", - " 5 tenure 2000 non-null int64 \n", - " 
6 PhoneService 2000 non-null object \n", - " 7 MultipleLines 2000 non-null object \n", - " 8 InternetService 2000 non-null object \n", - " 9 OnlineSecurity 2000 non-null object \n", - " 10 OnlineBackup 2000 non-null object \n", - " 11 DeviceProtection 2000 non-null object \n", - " 12 TechSupport 2000 non-null object \n", - " 13 StreamingTV 2000 non-null object \n", - " 14 StreamingMovies 2000 non-null object \n", - " 15 Contract 2000 non-null object \n", - " 16 PaperlessBilling 2000 non-null object \n", - " 17 PaymentMethod 2000 non-null object \n", - " 18 MonthlyCharges 2000 non-null float64\n", - " 19 TotalCharges 2000 non-null object \n", - "dtypes: float64(1), int64(2), object(17)\n", - "memory usage: 312.6+ KB\n" - ] - } - ], - "source": [ - "# Column information\n", - "df_excel.info()" - ] - }, - { - "cell_type": "code", - "execution_count": 16, - "metadata": { - "scrolled": false - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Column: customerID - Unique Values: ['7613-LLQFO' '4568-TTZRT' '9513-DXHDA' ... 
'4816-JBHOV' '8920-NAVAY'\n", - " '1699-TLDLZ']\n", - "=============================================\n", - "Column: gender - Unique Values: ['Male' 'Female']\n", - "=============================================\n", - "Column: SeniorCitizen - Unique Values: [0 1]\n", - "=============================================\n", - "Column: Partner - Unique Values: ['No' 'Yes']\n", - "=============================================\n", - "Column: Dependents - Unique Values: ['No' 'Yes']\n", - "=============================================\n", - "Column: tenure - Unique Values: [12 9 27 1 24 14 32 11 38 54 29 44 59 3 18 67 22 33 5 2 72 16 8 23\n", - " 50 17 68 25 71 46 48 61 37 65 49 64 21 10 6 51 47 52 35 45 4 60 28 39\n", - " 30 55 58 26 43 56 36 13 70 69 41 15 19 31 34 66 40 7 53 63 42 0 57 62\n", - " 20]\n", - "=============================================\n", - "Column: PhoneService - Unique Values: ['Yes' 'No']\n", - "=============================================\n", - "Column: MultipleLines - Unique Values: ['Yes' 'No' 'No phone service']\n", - "=============================================\n", - "Column: InternetService - Unique Values: ['Fiber optic' 'No' 'DSL']\n", - "=============================================\n", - "Column: OnlineSecurity - Unique Values: ['No' 'No internet service' 'Yes']\n", - "=============================================\n", - "Column: OnlineBackup - Unique Values: ['No' 'No internet service' 'Yes']\n", - "=============================================\n", - "Column: DeviceProtection - Unique Values: ['No' 'No internet service' 'Yes']\n", - "=============================================\n", - "Column: TechSupport - Unique Values: ['No' 'No internet service' 'Yes']\n", - "=============================================\n", - "Column: StreamingTV - Unique Values: ['Yes' 'No internet service' 'No']\n", - "=============================================\n", - "Column: StreamingMovies - Unique Values: ['No' 'No internet service' 'Yes']\n", - 
"=============================================\n", - "Column: Contract - Unique Values: ['Month-to-month' 'One year' 'Two year']\n", - "=============================================\n", - "Column: PaperlessBilling - Unique Values: ['Yes' 'No']\n", - "=============================================\n", - "Column: PaymentMethod - Unique Values: ['Electronic check' 'Mailed check' 'Credit card (automatic)'\n", - " 'Bank transfer (automatic)']\n", - "=============================================\n", - "Column: MonthlyCharges - Unique Values: [ 84.45 20.4 81.7 79.5 89.15 20.3 74.95 74.4 20. 25.\n", - " 80.45 19.75 65.65 71. 89.2 86.75 55.3 61.5 25.1 55.15\n", - " 34.05 19.95 89.7 26.3 84.95 20.7 43.25 48.35 79.55 71.05\n", - " 19.45 110.8 84.5 69.3 49.35 20.35 105.6 64.45 108.6 49.9\n", - " 30.3 30.4 45.4 103.3 84.15 44.45 85.4 89.9 55.05 104.1\n", - " 106.6 75.2 70.5 19.6 55.85 24.05 38.1 106.4 34.25 100.05\n", - " 68.65 45.8 75.75 84.4 96.4 20.55 50.95 90.5 79.4 58.75\n", - " 59.45 105.7 56.25 53.3 85.55 24.3 77.85 59.9 23.95 20.15\n", - " 105.35 95.65 87.05 81. 82.45 53.5 20.5 54.4 58.6 84.8\n", - " 61.4 79.65 94.45 79.8 54.2 74.05 49.15 19.4 113.65 106.\n", - " 25.95 19.1 103.4 100.55 95.4 75.15 107.9 19.5 85.95 24.95\n", - " 59.4 69.95 82.85 19. 38.85 30.6 95. 78.45 74.3 51.05\n", - " 19.2 99.55 70. 109.1 45.3 29.85 76.45 95.1 19.8 72.8\n", - " 18.95 76.65 99.15 101.75 75.45 64.1 25.65 75.1 95.85 72.75\n", - " 19.85 19.05 44.95 49.55 94.85 46.25 19.35 69.6 90.7 101.4\n", - " 20.25 48.8 74.35 68.75 100.2 20.85 95.9 45. 81.5 25.5\n", - " 48.9 84.1 81.3 95.2 36.45 83.3 25.05 89.85 49.85 54.65\n", - " 29.35 19.15 55.55 80.55 69.5 104.3 79.6 55.25 88.05 117.6\n", - " 19.65 70.55 93.85 65.8 20.05 80. 35.4 80.25 50.45 20.45\n", - " 24.7 77.3 29.75 44.9 29.8 74.65 71.95 20.75 56.3 105.25\n", - " 94.2 19.55 53.65 29.9 19.7 43.7 49.45 106.55 20.1 39.7\n", - " 54.5 83.8 111.6 86.65 106.75 62.1 104.5 101.8 110.6 84.9\n", - " 93.2 24.4 85. 
87.45 85.8 91.1 70.75 74.8 24.8 100.85\n", - " 101.35 68.25 105.1 79.15 57.2 94.8 102.5 69.2 95.45 100.95\n", - " 88.5 35. 64. 69.1 80.2 49.3 84.35 117.2 103.45 77.95\n", - " 109.95 94.75 25.2 19.9 44.8 80.05 107.35 47.85 70.8 29.5\n", - " 59.1 25.55 75.55 85.65 70.15 95.3 70.25 50.3 97.8 46.3\n", - " 106.3 75.35 89.4 88. 83.15 43.8 62.05 74.15 84.05 20.9\n", - " 105.9 99.5 44.15 53.9 85.45 85.05 44.1 90.2 50.85 59.2\n", - " 53.45 83.2 54.9 57.5 103.9 93.8 89.25 94.15 55.6 48.7\n", - " 19.25 104.9 54.85 19.3 79.85 75.5 73.75 96.05 68.4 20.65\n", - " 70.6 107.6 61.55 99.25 91.7 100.7 84.3 88.95 86.8 20.95\n", - " 50.7 53.4 101.9 59.5 87.8 41.9 83. 69.85 109.55 92.15\n", - " 97. 58.35 50.6 89.5 70.4 69.8 94.3 95.95 101.05 107.75\n", - " 54.6 71.3 94.7 104.15 90.55 60.8 98.8 98.15 35.35 103.15\n", - " 81.4 61.45 95.7 104.8 70.95 97.65 35.65 85.25 88.8 55.7\n", - " 85.2 91.15 83.85 45.9 91.4 91.5 51.3 21.1 104.75 106.15\n", - " 85.75 100.75 78.55 77.8 83.45 73.25 90.1 29.2 46.6 85.35\n", - " 54. 104.25 84.75 75.25 24.6 55.5 43.3 109.5 84.85 112.1\n", - " 95.05 50.35 74.6 74.2 69. 105.2 109.2 45.15 108.65 40.65\n", - " 55.35 90.05 68.05 96.2 102.1 23.4 92.2 43.9 80.5 89.8\n", - " 90.45 50.75 84.6 89.65 51.7 23.3 65.4 65.1 81.2 72.9\n", - " 74.5 60.3 75. 90.15 40. 99.45 69.05 59.7 86.25 45.65\n", - " 70.1 40.75 70.2 84.2 66.15 45.85 49.8 103.95 100.15 99.65\n", - " 73.7 50.05 60.25 105.75 87.3 54.25 85.3 50. 90.95 72.25\n", - " 96.1 25.15 71.25 113.8 24.55 50.15 100.5 74.45 81.9 69.7\n", - " 25.35 24.65 25.25 60. 24.1 109.9 35.5 87.55 88.4 50.8\n", - " 99. 96.55 59.75 111.5 24.25 30.55 101. 100. 98.05 71.15\n", - " 54.15 63.9 69.15 64.65 108.75 98.85 89.6 83.25 24.5 73.\n", - " 80.4 78.5 102. 48.95 18.25 54.55 89.05 96.6 77.15 35.05\n", - " 108.1 20.2 49.2 71.65 106.5 94.25 68.95 58.5 78.9 79.2\n", - " 109.45 29.15 76.05 24.45 66.5 89.35 73.6 82.65 49. 
80.35\n", - " 25.45 55.8 110.9 77.75 26.2 79.05 80.85 98.4 56.35 50.4\n", - " 109.75 91.25 54.75 81.45 49.1 100.3 65.25 94.1 73.55 104.65\n", - " 44.55 54.45 105. 88.7 74.25 30.75 112.9 94.05 78.85 78.65\n", - " 74.75 105.65 96.5 70.85 73.9 45.45 109.65 65. 114.1 86.95\n", - " 105.45 25.4 102.55 24. 25.6 73.5 98.25 101.55 103.1 34.2\n", - " 43.75 111.95 100.65 55.95 116.05 45.75 82. 65.15 88.85 106.85\n", - " 80.15 109.25 56.1 118.6 24.15 115.5 111.3 80.6 20.8 35.2\n", - " 78.8 89.95 49.4 115.25 81.25 93.55 86.4 66.3 94.65 82.05\n", - " 72.1 34.7 109.4 40.25 42.9 44. 88.9 57.65 108.05 105.3\n", - " 102.6 73.85 61.35 57.55 29.25 84.55 111.75 107.7 63.7 24.75\n", - " 50.9 60.4 79.25 110.1 25.3 24.35 76.5 81.15 38.5 92.9\n", - " 93.5 84.7 66. 101.5 74.9 99.75 67.8 25.7 56.15 86.7\n", - " 50.55 54.35 45.35 59. 69.45 64.95 18.85 114.3 45.05 51.\n", - " 110.45 84.65 60.05 44.65 93.25 20.6 34.8 60.75 51.35 64.05\n", - " 94.6 100.25 98.9 97.7 40.3 46.2 24.9 65.7 63.35 50.1\n", - " 74. 38.9 65.45 98.7 99.35 95.8 67.5 78.15 26.1 78.05\n", - " 40.35 68.9 76. 82.3 29.45 59.15 44.75 90.8 106.7 67.95\n", - " 77.4 99.7 78.95 95.55 62.85 71.55 94.95 86.1 39.3 36.25\n", - " 23.9 98.6 103.65 99.9 39.85 60.5 103.85 24.85 89. 55.\n", - " 76.15 117.35 45.2 89.75 49.95 67.05 87.95 75.7 62.15 101.25\n", - " 115.15 86.55 28.6 56.4 73.3 98.65 33.6 79.9 104.05 70.05\n", - " 23.05 59.95 78.6 116.8 43.55 65.2 102.95 90.6 108.2 92.\n", - " 112.2 70.3 75.85 80.65 68.5 115.75 59.55 36.1 94. 61.15\n", - " 110.2 106.35 65.9 52.5 88.75 75.3 26. 99.4 73.15 66.4\n", - " 115.55 104.45 92.4 25.75 49.6 97.05 105.95 91.85 40.1 110.3\n", - " 85.15 60.95 46. 58.55 86.35 69.75 65.6 82.1 79.1 90.65\n", - " 110. 67.45 89.1 69.9 51.1 94.4 78.25 76.4 48.65 59.85\n", - " 80.3 91.8 18.8 64.75 89.45 85.6 54.1 80.9 90.85 48.75\n", - " 79.7 100.4 57.95 86.5 62.45 89.55 83.55 71.45 46.35 66.1\n", - " 75.4 70.45 21.05 69.35 40.55 75.65 60.6 101.15 98. 
104.7\n", - " 93.9 86.45 98.5 78.2 88.45 69.55 83.75 98.1 53.35 69.4\n", - " 40.15 70.35 53.85 115.6 97.95 78.3 96.8 77.35 66.05 68.15\n", - " 92.45 45.55 93.4 88.15 79.35 79.75 105.15 79.3 105.5 92.7\n", - " 26.25 96.95 115.8 67.75 90.35 55.75 114.6 66.8 104.85 74.1\n", - " 118.75 85.9 101.3 21.2 24.2 102.8 99.95 115.85 35.1 99.1\n", - " 67.25 55.1 117.8 45.25 95.35 116.6 65.05 92.5 18.75 93.6\n", - " 104.4 70.7 108.95 26.45 86.2 51.2 75.8 36.15 61.2 99.85\n", - " 58.4 88.3 108.9 107.4 106.65 104.35 55.45 61.3 96.85 108.25\n", - " 105.05 66.9 110.7 38.25 54.95 79. 39.1 100.45 39.55 23.15\n", - " 72.45 60.1 91.55 35.8 113.15 53.95 99.3 51.55 78.75 54.7\n", - " 71.1 106.25 114.05 116.15 66.25 99.8 90. 54.05 97.25 83.05\n", - " 41.1 74.55 40.2 78.35 109.7 33.45 39.4 76.25 46.4 59.6\n", - " 108.5 58.95 63.05 64.4 83.9 117.45 59.05 76.55 62.5 29.4\n", - " 94.9 111.65 106.05 113.45 92.55 49.7 30.2 85.7 74.7 107.55\n", - " 23.85 76.1 39.2 39.15 59.8 49.75 35.75 60.15 84. 110.75\n", - " 76.35 18.9 98.35 91.65 44.35 47.95 63.6 53. 36.85 103.75\n", - " 56.75 59.65 45.5 106.45 30.05 44.7 ]\n", - "=============================================\n", - "Column: TotalCharges - Unique Values: [1059.55 181.8 2212.55 ... 552.95 7053.35 301.55]\n", - "=============================================\n" - ] - } - ], - "source": [ - "# check unique values of each column\n", - "for column in df_excel.columns:\n", - " print('Column: {} - Unique Values: {}'.format(column, df_excel[column].unique()))\n", - " print('==='*15)" - ] - }, - { - "cell_type": "code", - "execution_count": 17, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
| | SeniorCitizen | tenure | MonthlyCharges |
| count | 2000.000000 | 2000.000000 | 2000.000000 |
| mean | 0.161500 | 31.853000 | 63.933325 |
| std | 0.368084 | 24.632677 | 30.136858 |
| min | 0.000000 | 0.000000 | 18.250000 |
| 25% | 0.000000 | 8.000000 | 34.250000 |
| 50% | 0.000000 | 27.000000 | 69.800000 |
| 75% | 0.000000 | 55.000000 | 89.275000 |
| max | 1.000000 | 72.000000 | 118.750000 |
\n", - "
" - ], - "text/plain": [ - " SeniorCitizen tenure MonthlyCharges\n", - "count 2000.000000 2000.000000 2000.000000\n", - "mean 0.161500 31.853000 63.933325\n", - "std 0.368084 24.632677 30.136858\n", - "min 0.000000 0.000000 18.250000\n", - "25% 0.000000 8.000000 34.250000\n", - "50% 0.000000 27.000000 69.800000\n", - "75% 0.000000 55.000000 89.275000\n", - "max 1.000000 72.000000 118.750000" - ] - }, - "execution_count": 17, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# Summary statistics of df_excel to get insights into the distribution and basic characteristics of the numerical variables\n", - "df_excel.describe()" - ] - }, - { - "cell_type": "code", - "execution_count": 18, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "customerID 0\n", - "gender 0\n", - "SeniorCitizen 0\n", - "Partner 0\n", - "Dependents 0\n", - "tenure 0\n", - "PhoneService 0\n", - "MultipleLines 0\n", - "InternetService 0\n", - "OnlineSecurity 0\n", - "OnlineBackup 0\n", - "DeviceProtection 0\n", - "TechSupport 0\n", - "StreamingTV 0\n", - "StreamingMovies 0\n", - "Contract 0\n", - "PaperlessBilling 0\n", - "PaymentMethod 0\n", - "MonthlyCharges 0\n", - "TotalCharges 0\n", - "dtype: int64" - ] - }, - "execution_count": 18, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# Handling missing values\n", - "df_excel.isnull().sum()" - ] - }, - { - "cell_type": "raw", - "metadata": {}, - "source": [ - "As expected, the test dataset does not have the \"Churn\" column." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Explore df_csv" - ] - }, - { - "cell_type": "code", - "execution_count": 19, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
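| | customerID | gender | SeniorCitizen | Partner | Dependents | tenure | PhoneService | MultipleLines | InternetService | OnlineSecurity | ... | DeviceProtection | TechSupport | StreamingTV | StreamingMovies | Contract | PaperlessBilling | PaymentMethod | MonthlyCharges | TotalCharges | Churn |
| 0 | 5600-PDUJF | Male | 0 | No | No | 6 | Yes | No | DSL | No | ... | No | Yes | No | No | Month-to-month | Yes | Credit card (automatic) | 49.50 | 312.7 | No |
| 1 | 8292-TYSPY | Male | 0 | No | No | 19 | Yes | No | DSL | No | ... | Yes | Yes | No | No | Month-to-month | Yes | Credit card (automatic) | 55.00 | 1046.5 | Yes |
| 2 | 0567-XRHCU | Female | 0 | Yes | Yes | 69 | No | No phone service | DSL | Yes | ... | Yes | No | No | Yes | Two year | Yes | Credit card (automatic) | 43.95 | 2960.1 | No |
| 3 | 1867-BDVFH | Male | 0 | Yes | Yes | 11 | Yes | Yes | Fiber optic | No | ... | No | No | No | No | Month-to-month | Yes | Electronic check | 74.35 | 834.2 | Yes |
| 4 | 2067-QYTCF | Female | 0 | Yes | No | 64 | Yes | Yes | Fiber optic | No | ... | Yes | Yes | Yes | Yes | Month-to-month | Yes | Electronic check | 111.15 | 6953.4 | No |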
\n", - "

5 rows × 21 columns

\n", - "
" - ], - "text/plain": [ - " customerID gender SeniorCitizen Partner Dependents tenure PhoneService \\\n", - "0 5600-PDUJF Male 0 No No 6 Yes \n", - "1 8292-TYSPY Male 0 No No 19 Yes \n", - "2 0567-XRHCU Female 0 Yes Yes 69 No \n", - "3 1867-BDVFH Male 0 Yes Yes 11 Yes \n", - "4 2067-QYTCF Female 0 Yes No 64 Yes \n", - "\n", - " MultipleLines InternetService OnlineSecurity ... DeviceProtection \\\n", - "0 No DSL No ... No \n", - "1 No DSL No ... Yes \n", - "2 No phone service DSL Yes ... Yes \n", - "3 Yes Fiber optic No ... No \n", - "4 Yes Fiber optic No ... Yes \n", - "\n", - " TechSupport StreamingTV StreamingMovies Contract PaperlessBilling \\\n", - "0 Yes No No Month-to-month Yes \n", - "1 Yes No No Month-to-month Yes \n", - "2 No No Yes Two year Yes \n", - "3 No No No Month-to-month Yes \n", - "4 Yes Yes Yes Month-to-month Yes \n", - "\n", - " PaymentMethod MonthlyCharges TotalCharges Churn \n", - "0 Credit card (automatic) 49.50 312.7 No \n", - "1 Credit card (automatic) 55.00 1046.5 Yes \n", - "2 Credit card (automatic) 43.95 2960.1 No \n", - "3 Electronic check 74.35 834.2 Yes \n", - "4 Electronic check 111.15 6953.4 No \n", - "\n", - "[5 rows x 21 columns]" - ] - }, - "execution_count": 19, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# Call the dataset df_csv\n", - "df_csv.head()" - ] - }, - { - "cell_type": "code", - "execution_count": 20, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "RangeIndex: 2043 entries, 0 to 2042\n", - "Data columns (total 21 columns):\n", - " # Column Non-Null Count Dtype \n", - "--- ------ -------------- ----- \n", - " 0 customerID 2043 non-null object \n", - " 1 gender 2043 non-null object \n", - " 2 SeniorCitizen 2043 non-null int64 \n", - " 3 Partner 2043 non-null object \n", - " 4 Dependents 2043 non-null object \n", - " 5 tenure 2043 non-null int64 \n", - " 6 PhoneService 2043 non-null object \n", - " 7 MultipleLines 2043 non-null 
object \n", - " 8 InternetService 2043 non-null object \n", - " 9 OnlineSecurity 2043 non-null object \n", - " 10 OnlineBackup 2043 non-null object \n", - " 11 DeviceProtection 2043 non-null object \n", - " 12 TechSupport 2043 non-null object \n", - " 13 StreamingTV 2043 non-null object \n", - " 14 StreamingMovies 2043 non-null object \n", - " 15 Contract 2043 non-null object \n", - " 16 PaperlessBilling 2043 non-null object \n", - " 17 PaymentMethod 2043 non-null object \n", - " 18 MonthlyCharges 2043 non-null float64\n", - " 19 TotalCharges 2043 non-null object \n", - " 20 Churn 2043 non-null object \n", - "dtypes: float64(1), int64(2), object(18)\n", - "memory usage: 335.3+ KB\n" - ] - } - ], - "source": [ - "# Column information\n", - "df_csv.info()" - ] - }, - { - "cell_type": "code", - "execution_count": 21, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Column: customerID - Unique Values: ['5600-PDUJF' '8292-TYSPY' '0567-XRHCU' ... 
'4801-JZAZL' '8361-LTMKD'\n", - " '3186-AJIEK']\n", - "=============================================\n", - "Column: gender - Unique Values: ['Male' 'Female']\n", - "=============================================\n", - "Column: SeniorCitizen - Unique Values: [0 1]\n", - "=============================================\n", - "Column: Partner - Unique Values: ['No' 'Yes']\n", - "=============================================\n", - "Column: Dependents - Unique Values: ['No' 'Yes']\n", - "=============================================\n", - "Column: tenure - Unique Values: [ 6 19 69 11 64 39 15 25 66 61 43 12 23 71 34 5 41 72 14 1 10 7 9 48\n", - " 20 16 2 22 35 54 56 18 68 53 30 36 55 21 33 44 4 49 42 67 40 45 57 8\n", - " 65 3 17 28 52 47 50 46 29 27 13 24 62 26 60 51 70 59 38 37 0 58 31 32\n", - " 63]\n", - "=============================================\n", - "Column: PhoneService - Unique Values: ['Yes' 'No']\n", - "=============================================\n", - "Column: MultipleLines - Unique Values: ['No' 'No phone service' 'Yes']\n", - "=============================================\n", - "Column: InternetService - Unique Values: ['DSL' 'Fiber optic' 'No']\n", - "=============================================\n", - "Column: OnlineSecurity - Unique Values: ['No' 'Yes' 'No internet service']\n", - "=============================================\n", - "Column: OnlineBackup - Unique Values: ['No' 'Yes' 'No internet service']\n", - "=============================================\n", - "Column: DeviceProtection - Unique Values: ['No' 'Yes' 'No internet service']\n", - "=============================================\n", - "Column: TechSupport - Unique Values: ['Yes' 'No' 'No internet service']\n", - "=============================================\n", - "Column: StreamingTV - Unique Values: ['No' 'Yes' 'No internet service']\n", - "=============================================\n", - "Column: StreamingMovies - Unique Values: ['No' 'Yes' 'No internet service']\n", - 
"=============================================\n", - "Column: Contract - Unique Values: ['Month-to-month' 'Two year' 'One year']\n", - "=============================================\n", - "Column: PaperlessBilling - Unique Values: ['Yes' 'No']\n", - "=============================================\n", - "Column: PaymentMethod - Unique Values: ['Credit card (automatic)' 'Electronic check' 'Mailed check'\n", - " 'Bank transfer (automatic)']\n", - "=============================================\n", - "Column: MonthlyCharges - Unique Values: [ 49.5 55. 43.95 ... 78.7 60.65 103.2 ]\n", - "=============================================\n", - "Column: TotalCharges - Unique Values: ['312.7' '1046.5' '2960.1' ... '346.45' '306.6' '6844.5']\n", - "=============================================\n", - "Column: Churn - Unique Values: ['No' 'Yes']\n", - "=============================================\n" - ] - } - ], - "source": [ - "# check unique values of each column\n", - "for column in df_csv.columns:\n", - " print('Column: {} - Unique Values: {}'.format(column, df_csv[column].unique()))\n", - " print('==='*15)" - ] - }, - { - "cell_type": "code", - "execution_count": 22, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
| | SeniorCitizen | tenure | MonthlyCharges |
| count | 2043.000000 | 2043.000000 | 2043.000000 |
| mean | 0.168380 | 32.649046 | 64.712555 |
| std | 0.374295 | 24.376248 | 29.970010 |
| min | 0.000000 | 0.000000 | 18.550000 |
| 25% | 0.000000 | 9.000000 | 35.825000 |
| 50% | 0.000000 | 30.000000 | 70.250000 |
| 75% | 0.000000 | 55.000000 | 89.625000 |
| max | 1.000000 | 72.000000 | 118.350000 |
\n", - "
" - ], - "text/plain": [ - " SeniorCitizen tenure MonthlyCharges\n", - "count 2043.000000 2043.000000 2043.000000\n", - "mean 0.168380 32.649046 64.712555\n", - "std 0.374295 24.376248 29.970010\n", - "min 0.000000 0.000000 18.550000\n", - "25% 0.000000 9.000000 35.825000\n", - "50% 0.000000 30.000000 70.250000\n", - "75% 0.000000 55.000000 89.625000\n", - "max 1.000000 72.000000 118.350000" - ] - }, - "execution_count": 22, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# Summary statistics of df_csv to get insights into the distribution and basic characteristics of the numerical variables\n", - "df_csv.describe()" - ] - }, - { - "cell_type": "code", - "execution_count": 23, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "customerID 0\n", - "gender 0\n", - "SeniorCitizen 0\n", - "Partner 0\n", - "Dependents 0\n", - "tenure 0\n", - "PhoneService 0\n", - "MultipleLines 0\n", - "InternetService 0\n", - "OnlineSecurity 0\n", - "OnlineBackup 0\n", - "DeviceProtection 0\n", - "TechSupport 0\n", - "StreamingTV 0\n", - "StreamingMovies 0\n", - "Contract 0\n", - "PaperlessBilling 0\n", - "PaymentMethod 0\n", - "MonthlyCharges 0\n", - "TotalCharges 0\n", - "Churn 0\n", - "dtype: int64" - ] - }, - "execution_count": 23, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# Handling missing values\n", - "df_csv.isnull().sum()" - ] - }, - { - "cell_type": "raw", - "metadata": {}, - "source": [ - "Observations:\n", - "Dataset df_db has missing values in a number of columns. 
\n", - "SeniorCitizen is an integer flag (0/1) in df_excel but boolean (True/False) in df_db\n", - "Partner is boolean in df_db but object (Yes/No) in df_excel\n", - "Dependents is boolean in df_db but object (Yes/No) in df_excel\n", - "PhoneService is boolean in df_db but object (Yes/No) in df_excel\n", - "MultipleLines is (None/True/False) in df_db but (Yes/No/No phone service) in df_excel\n", - "OnlineSecurity is (True/False) in df_db but (Yes/No/No internet service) in df_excel\n", - "DeviceProtection is (True/False) in df_db but (Yes/No/No internet service) in df_excel\n", - "\n", - "TotalCharges should be float64 but is object in df_excel and df_csv" - ] - }, - { - "cell_type": "raw", - "metadata": {}, - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.9.13" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/.ipynb_checkpoints/LP2_Classification_Telco_customer_churn-checkpoint.ipynb b/.ipynb_checkpoints/LP2_Classification_Telco_customer_churn-checkpoint.ipynb deleted file mode 100644 index f537ed6..0000000 --- a/.ipynb_checkpoints/LP2_Classification_Telco_customer_churn-checkpoint.ipynb +++ /dev/null @@ -1,2326 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Business Understanding" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Business Understanding:\n", - "\n", - "The telecom company has provided access to three different datasets for a classification project. The first dataset consists of 3000 records and is stored in a remote database hosted on Microsoft SQL Server. The second dataset contains 2000 records and is stored on OneDrive. 
The third dataset is hosted on a GitHub repository.\n", - "\n", - "## Objectives of the project\n", - "\n", - "Help the telecom company understand its data: The project aims to provide insights into the telecom company's data, including customer demographics, services used, and payment details. This understanding will enable the company to make informed business decisions.\n", - "\n", - "Find the lifetime value of each customer: By analyzing the data, the project aims to identify factors that influence the rate at which customers churn. Understanding customer behavior and identifying key predictors will help the telecom company estimate the lifetime value of each customer.\n", - "\n", - "Predict customer churn: The project involves developing a predictive model to determine whether a customer is likely to churn. Using machine learning algorithms such as logistic regression, decision trees, support vector machines, and random forests, the project aims to build a model that accurately predicts customer churn.\n", - "\n", - "## Methodology\n", - "\n", - "To achieve these objectives, the project will follow the CRISP-DM framework and involve the following steps:\n", - "\n", - "Data exploration: Explore the datasets to gain insights into the available variables, their distributions, and relationships. This step will provide an initial understanding of the data and help identify any data quality issues.\n", - "\n", - "Missing value handling: Identify missing values in the datasets and decide on an appropriate strategy for handling them, such as imputation or removal of data points with missing values.\n", - "\n", - "Feature engineering: Perform feature engineering to transform and create new variables that can potentially improve the predictive power of the models. 
This step may involve encoding categorical variables, scaling numerical variables, or creating interaction terms.\n", - "\n", - "Model development: Utilize machine learning algorithms such as logistic regression, decision trees, support vector machines, random forest, etc., to develop predictive models for customer churn. Train the models on a subset of the data and evaluate their performance using appropriate metrics.\n", - "\n", - "Model evaluation and interpretation: Evaluate the trained models using evaluation metrics such as accuracy, precision, recall, and F1-score. Interpret the models to understand the factors driving customer churn and their relative importance.\n", - "\n", - "Model optimization and hyperparameter tuning: Fine-tune the models by optimizing their hyperparameters to improve their performance. This step may involve techniques like grid search or random search to find the optimal combination of hyperparameters." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Hypothesis\n", - "H0: The churn rate of customers in the telecom company is not significantly influenced by various factors related to their demographics, services, and payment methods.\n", - "H1: The churn rate of customers in the telecom company is influenced by various factors related to their demographics, services, and payment methods.\n", - "\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Questions\n", - "1. Is there a relationship between the monthly charges and the likelihood of churn?\n", - "2. Do customers who have internet service, specifically fiber optic, exhibit a higher churn rate compared to those with DSL or no internet service?\n", - "3. Does the availability of online security, online backup, device protection, and tech support impact the churn rate?\n", - "4. How does the churn rate vary based on the customers' gender?\n", - "5. Does the presence of a partner influence the likelihood of churn?\n", - "6. 
Is there a correlation between the tenure of customers and their churn rate?" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Load Datasets" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Load first dataset from database" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "# Import necessary libraries\n", - "import pyodbc\n", - "import pandas as pd\n" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [], - "source": [ - "# Establish a connection\n", - "server = 'dap-projects-database.database.windows.net'\n", - "database = 'dapDB'\n", - "username = 'dataAnalyst_LP2'\n", - "password = 'A3g@3kR$2y'\n", - "\n", - "conn_str = f'DRIVER={{ODBC Driver 17 for SQL Server}};SERVER={server};DATABASE={database};UID={username};PWD={password}'\n", - "conn = pyodbc.connect(conn_str)\n" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "c:\\Users\\PSL-CUDJOE\\anaconda3\\lib\\site-packages\\pandas\\io\\sql.py:761: UserWarning: pandas only support SQLAlchemy connectable(engine/connection) ordatabase string URI or sqlite3 DBAPI2 connectionother DBAPI2 objects are not tested, please consider using SQLAlchemy\n", - " warnings.warn(\n" - ] - } - ], - "source": [ - "# Query the database to retrieve the data\n", - "query = 'SELECT TOP 3000 * FROM LP2_Telco_churn_first_3000'\n", - "df_db = pd.read_sql(query, conn)\n" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [], - "source": [ - "# Close connection\n", - "conn.close()\n" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
| | customerID | gender | SeniorCitizen | Partner | Dependents | tenure | PhoneService | MultipleLines | InternetService | OnlineSecurity | ... | DeviceProtection | TechSupport | StreamingTV | StreamingMovies | Contract | PaperlessBilling | PaymentMethod | MonthlyCharges | TotalCharges | Churn |
| 0 | 7590-VHVEG | Female | False | True | False | 1 | False | None | DSL | False | ... | False | False | False | False | Month-to-month | True | Electronic check | 29.850000 | 29.850000 | False |
| 1 | 5575-GNVDE | Male | False | False | False | 34 | True | False | DSL | True | ... | True | False | False | False | One year | False | Mailed check | 56.950001 | 1889.500000 | False |
| 2 | 3668-QPYBK | Male | False | False | False | 2 | True | False | DSL | True | ... | False | False | False | False | Month-to-month | True | Mailed check | 53.849998 | 108.150002 | True |
| 3 | 7795-CFOCW | Male | False | False | False | 45 | False | None | DSL | True | ... | True | True | False | False | One year | False | Bank transfer (automatic) | 42.299999 | 1840.750000 | False |
| 4 | 9237-HQITU | Female | False | False | False | 2 | True | False | Fiber optic | False | ... | False | False | False | False | Month-to-month | True | Electronic check | 70.699997 | 151.649994 | True |
\n", - "

5 rows × 21 columns

\n", - "
" - ], - "text/plain": [ - " customerID gender SeniorCitizen Partner Dependents tenure \\\n", - "0 7590-VHVEG Female False True False 1 \n", - "1 5575-GNVDE Male False False False 34 \n", - "2 3668-QPYBK Male False False False 2 \n", - "3 7795-CFOCW Male False False False 45 \n", - "4 9237-HQITU Female False False False 2 \n", - "\n", - " PhoneService MultipleLines InternetService OnlineSecurity ... \\\n", - "0 False None DSL False ... \n", - "1 True False DSL True ... \n", - "2 True False DSL True ... \n", - "3 False None DSL True ... \n", - "4 True False Fiber optic False ... \n", - "\n", - " DeviceProtection TechSupport StreamingTV StreamingMovies Contract \\\n", - "0 False False False False Month-to-month \n", - "1 True False False False One year \n", - "2 False False False False Month-to-month \n", - "3 True True False False One year \n", - "4 False False False False Month-to-month \n", - "\n", - " PaperlessBilling PaymentMethod MonthlyCharges TotalCharges \\\n", - "0 True Electronic check 29.850000 29.850000 \n", - "1 False Mailed check 56.950001 1889.500000 \n", - "2 True Mailed check 53.849998 108.150002 \n", - "3 False Bank transfer (automatic) 42.299999 1840.750000 \n", - "4 True Electronic check 70.699997 151.649994 \n", - "\n", - " Churn \n", - "0 False \n", - "1 False \n", - "2 True \n", - "3 False \n", - "4 True \n", - "\n", - "[5 rows x 21 columns]" - ] - }, - "execution_count": 6, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df_db.head()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Load second dataset (excelfile)" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [], - "source": [ - "# Read the excel file into a pandas dataframe\n", - "df_excel = pd.read_excel('Telco-churn-second-2000.xlsx')\n" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
" - ], - "text/plain": [ - " customerID gender SeniorCitizen Partner Dependents tenure PhoneService \\\n", - "0 7613-LLQFO Male 0 No No 12 Yes \n", - "1 4568-TTZRT Male 0 No No 9 Yes \n", - "2 9513-DXHDA Male 0 No No 27 Yes \n", - "3 2640-PMGFL Male 0 No Yes 27 Yes \n", - "4 3801-HMYNL Male 0 Yes Yes 1 Yes \n", - "\n", - " MultipleLines InternetService OnlineSecurity OnlineBackup \\\n", - "0 Yes Fiber optic No No \n", - "1 No No No internet service No internet service \n", - "2 No DSL Yes No \n", - "3 Yes Fiber optic No No \n", - "4 No Fiber optic No No \n", - "\n", - " DeviceProtection TechSupport StreamingTV \\\n", - "0 No No Yes \n", - "1 No internet service No internet service No internet service \n", - "2 Yes Yes Yes \n", - "3 No Yes No \n", - "4 No No Yes \n", - "\n", - " StreamingMovies Contract PaperlessBilling PaymentMethod \\\n", - "0 No Month-to-month Yes Electronic check \n", - "1 No internet service Month-to-month No Mailed check \n", - "2 Yes One year No Electronic check \n", - "3 No Month-to-month Yes Electronic check \n", - "4 Yes Month-to-month No Mailed check \n", - "\n", - " MonthlyCharges TotalCharges \n", - "0 84.45 1059.55 \n", - "1 20.40 181.8 \n", - "2 81.70 2212.55 \n", - "3 79.50 2180.55 \n", - "4 89.15 89.15 " - ] - }, - "execution_count": 8, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df_excel.head()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Load third dataset (csv file)" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [], - "source": [ - "# Read third dataset\n", - "df_csv = pd.read_csv('LP2_Telco-churn-last-2000.csv')" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
" - ], - "text/plain": [ - " customerID gender SeniorCitizen Partner Dependents tenure PhoneService \\\n", - "0 5600-PDUJF Male 0 No No 6 Yes \n", - "1 8292-TYSPY Male 0 No No 19 Yes \n", - "2 0567-XRHCU Female 0 Yes Yes 69 No \n", - "3 1867-BDVFH Male 0 Yes Yes 11 Yes \n", - "4 2067-QYTCF Female 0 Yes No 64 Yes \n", - "\n", - " MultipleLines InternetService OnlineSecurity ... DeviceProtection \\\n", - "0 No DSL No ... No \n", - "1 No DSL No ... Yes \n", - "2 No phone service DSL Yes ... Yes \n", - "3 Yes Fiber optic No ... No \n", - "4 Yes Fiber optic No ... Yes \n", - "\n", - " TechSupport StreamingTV StreamingMovies Contract PaperlessBilling \\\n", - "0 Yes No No Month-to-month Yes \n", - "1 Yes No No Month-to-month Yes \n", - "2 No No Yes Two year Yes \n", - "3 No No No Month-to-month Yes \n", - "4 Yes Yes Yes Month-to-month Yes \n", - "\n", - " PaymentMethod MonthlyCharges TotalCharges Churn \n", - "0 Credit card (automatic) 49.50 312.7 No \n", - "1 Credit card (automatic) 55.00 1046.5 Yes \n", - "2 Credit card (automatic) 43.95 2960.1 No \n", - "3 Electronic check 74.35 834.2 Yes \n", - "4 Electronic check 111.15 6953.4 No \n", - "\n", - "[5 rows x 21 columns]" - ] - }, - "execution_count": 10, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df_csv.head()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Data Exploration" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Explore individual datasets " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Explore df_db\n" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
" - ], - "text/plain": [ - " customerID gender SeniorCitizen Partner Dependents tenure \\\n", - "0 7590-VHVEG Female False True False 1 \n", - "1 5575-GNVDE Male False False False 34 \n", - "2 3668-QPYBK Male False False False 2 \n", - "3 7795-CFOCW Male False False False 45 \n", - "4 9237-HQITU Female False False False 2 \n", - "... ... ... ... ... ... ... \n", - "2995 2209-XADXF Female False False False 1 \n", - "2996 6620-JDYNW Female False False False 18 \n", - "2997 1891-FZYSA Male True True False 69 \n", - "2998 4770-UEZOX Male False False False 2 \n", - "2999 1A1:U3001038-RQOST Male False True True 19 \n", - "\n", - " PhoneService MultipleLines InternetService OnlineSecurity ... \\\n", - "0 False None DSL False ... \n", - "1 True False DSL True ... \n", - "2 True False DSL True ... \n", - "3 False None DSL True ... \n", - "4 True False Fiber optic False ... \n", - "... ... ... ... ... ... \n", - "2995 False None DSL False ... \n", - "2996 True True DSL True ... \n", - "2997 True True Fiber optic False ... \n", - "2998 True False Fiber optic False ... \n", - "2999 True False No None ... \n", - "\n", - " DeviceProtection TechSupport StreamingTV StreamingMovies Contract \\\n", - "0 False False False False Month-to-month \n", - "1 True False False False One year \n", - "2 False False False False Month-to-month \n", - "3 True True False False One year \n", - "4 False False False False Month-to-month \n", - "... ... ... ... ... ... 
\n", - "2995 False False False False Month-to-month \n", - "2996 True False False False Month-to-month \n", - "2997 False False True False Month-to-month \n", - "2998 False False False False Month-to-month \n", - "2999 None None None None Month-to-month \n", - "\n", - " PaperlessBilling PaymentMethod MonthlyCharges TotalCharges \\\n", - "0 True Electronic check 29.850000 29.850000 \n", - "1 False Mailed check 56.950001 1889.500000 \n", - "2 True Mailed check 53.849998 108.150002 \n", - "3 False Bank transfer (automatic) 42.299999 1840.750000 \n", - "4 True Electronic check 70.699997 151.649994 \n", - "... ... ... ... ... \n", - "2995 False Bank transfer (automatic) 25.250000 25.250000 \n", - "2996 True Mailed check 60.599998 1156.349976 \n", - "2997 True Electronic check 89.949997 6143.149902 \n", - "2998 True Electronic check 74.750000 144.800003 \n", - "2999 False Mailed check 20.600000 414.950012 \n", - "\n", - " Churn \n", - "0 False \n", - "1 False \n", - "2 True \n", - "3 False \n", - "4 True \n", - "... ... 
\n", - "2995 False \n", - "2996 False \n", - "2997 True \n", - "2998 False \n", - "2999 False \n", - "\n", - "[3000 rows x 21 columns]" - ] - }, - "execution_count": 11, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# Call df_db\n", - "df_db" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "RangeIndex: 3000 entries, 0 to 2999\n", - "Data columns (total 21 columns):\n", - " # Column Non-Null Count Dtype \n", - "--- ------ -------------- ----- \n", - " 0 customerID 3000 non-null object \n", - " 1 gender 3000 non-null object \n", - " 2 SeniorCitizen 3000 non-null bool \n", - " 3 Partner 3000 non-null bool \n", - " 4 Dependents 3000 non-null bool \n", - " 5 tenure 3000 non-null int64 \n", - " 6 PhoneService 3000 non-null bool \n", - " 7 MultipleLines 2731 non-null object \n", - " 8 InternetService 3000 non-null object \n", - " 9 OnlineSecurity 2349 non-null object \n", - " 10 OnlineBackup 2349 non-null object \n", - " 11 DeviceProtection 2349 non-null object \n", - " 12 TechSupport 2349 non-null object \n", - " 13 StreamingTV 2349 non-null object \n", - " 14 StreamingMovies 2349 non-null object \n", - " 15 Contract 3000 non-null object \n", - " 16 PaperlessBilling 3000 non-null bool \n", - " 17 PaymentMethod 3000 non-null object \n", - " 18 MonthlyCharges 3000 non-null float64\n", - " 19 TotalCharges 2995 non-null float64\n", - " 20 Churn 2999 non-null object \n", - "dtypes: bool(5), float64(2), int64(1), object(13)\n", - "memory usage: 389.8+ KB\n" - ] - } - ], - "source": [ - "# Column information\n", - "df_db.info()" - ] - }, - { - "cell_type": "raw", - "metadata": {}, - "source": [ - "df_db has all 21 columns arranged as described in the README file. The customerID in row 2999 (1A1:U3001038-RQOST) does not follow the format used by the other rows. 
" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
" - ], - "text/plain": [ - " tenure MonthlyCharges TotalCharges\n", - "count 3000.000000 3000.000000 2995.000000\n", - "mean 32.527333 65.347400 2301.278315\n", - "std 24.637768 30.137053 2274.987884\n", - "min 0.000000 18.400000 18.799999\n", - "25% 9.000000 35.787499 415.250000\n", - "50% 29.000000 70.900002 1404.650024\n", - "75% 56.000000 90.262501 3868.725098\n", - "max 72.000000 118.650002 8564.750000" - ] - }, - "execution_count": 13, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# Summary statistics of df_db: distribution and basic characteristics of the numerical variables\n", - "df_db.describe()" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "customerID 0\n", - "gender 0\n", - "SeniorCitizen 0\n", - "Partner 0\n", - "Dependents 0\n", - "tenure 0\n", - "PhoneService 0\n", - "MultipleLines 269\n", - "InternetService 0\n", - "OnlineSecurity 651\n", - "OnlineBackup 651\n", - "DeviceProtection 651\n", - "TechSupport 651\n", - "StreamingTV 651\n", - "StreamingMovies 651\n", - "Contract 0\n", - "PaperlessBilling 0\n", - "PaymentMethod 0\n", - "MonthlyCharges 0\n", - "TotalCharges 5\n", - "Churn 1\n", - "dtype: int64" - ] - }, - "execution_count": 14, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# Count missing values per column\n", - "df_db.isnull().sum()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Explore df_excel" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
" - ], - "text/plain": [ - " customerID gender SeniorCitizen Partner Dependents tenure PhoneService \\\n", - "0 7613-LLQFO Male 0 No No 12 Yes \n", - "1 4568-TTZRT Male 0 No No 9 Yes \n", - "2 9513-DXHDA Male 0 No No 27 Yes \n", - "3 2640-PMGFL Male 0 No Yes 27 Yes \n", - "4 3801-HMYNL Male 0 Yes Yes 1 Yes \n", - "\n", - " MultipleLines InternetService OnlineSecurity OnlineBackup \\\n", - "0 Yes Fiber optic No No \n", - "1 No No No internet service No internet service \n", - "2 No DSL Yes No \n", - "3 Yes Fiber optic No No \n", - "4 No Fiber optic No No \n", - "\n", - " DeviceProtection TechSupport StreamingTV \\\n", - "0 No No Yes \n", - "1 No internet service No internet service No internet service \n", - "2 Yes Yes Yes \n", - "3 No Yes No \n", - "4 No No Yes \n", - "\n", - " StreamingMovies Contract PaperlessBilling PaymentMethod \\\n", - "0 No Month-to-month Yes Electronic check \n", - "1 No internet service Month-to-month No Mailed check \n", - "2 Yes One year No Electronic check \n", - "3 No Month-to-month Yes Electronic check \n", - "4 Yes Month-to-month No Mailed check \n", - "\n", - " MonthlyCharges TotalCharges \n", - "0 84.45 1059.55 \n", - "1 20.40 181.8 \n", - "2 81.70 2212.55 \n", - "3 79.50 2180.55 \n", - "4 89.15 89.15 " - ] - }, - "execution_count": 15, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# Call the dataset df_excel\n", - "df_excel.head()" - ] - }, - { - "cell_type": "code", - "execution_count": 16, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "RangeIndex: 2000 entries, 0 to 1999\n", - "Data columns (total 20 columns):\n", - " # Column Non-Null Count Dtype \n", - "--- ------ -------------- ----- \n", - " 0 customerID 2000 non-null object \n", - " 1 gender 2000 non-null object \n", - " 2 SeniorCitizen 2000 non-null int64 \n", - " 3 Partner 2000 non-null object \n", - " 4 Dependents 2000 non-null object \n", - " 5 tenure 2000 non-null int64 \n", - " 
6 PhoneService 2000 non-null object \n", - " 7 MultipleLines 2000 non-null object \n", - " 8 InternetService 2000 non-null object \n", - " 9 OnlineSecurity 2000 non-null object \n", - " 10 OnlineBackup 2000 non-null object \n", - " 11 DeviceProtection 2000 non-null object \n", - " 12 TechSupport 2000 non-null object \n", - " 13 StreamingTV 2000 non-null object \n", - " 14 StreamingMovies 2000 non-null object \n", - " 15 Contract 2000 non-null object \n", - " 16 PaperlessBilling 2000 non-null object \n", - " 17 PaymentMethod 2000 non-null object \n", - " 18 MonthlyCharges 2000 non-null float64\n", - " 19 TotalCharges 2000 non-null object \n", - "dtypes: float64(1), int64(2), object(17)\n", - "memory usage: 312.6+ KB\n" - ] - } - ], - "source": [ - "# Column information\n", - "df_excel.info()" - ] - }, - { - "cell_type": "code", - "execution_count": 18, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
" - ], - "text/plain": [ - " SeniorCitizen tenure MonthlyCharges\n", - "count 2000.000000 2000.000000 2000.000000\n", - "mean 0.161500 31.853000 63.933325\n", - "std 0.368084 24.632677 30.136858\n", - "min 0.000000 0.000000 18.250000\n", - "25% 0.000000 8.000000 34.250000\n", - "50% 0.000000 27.000000 69.800000\n", - "75% 0.000000 55.000000 89.275000\n", - "max 1.000000 72.000000 118.750000" - ] - }, - "execution_count": 18, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# Summary statistics of df_excel: distribution and basic characteristics of the numerical variables\n", - "df_excel.describe()" - ] - }, - { - "cell_type": "code", - "execution_count": 21, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "customerID 0\n", - "gender 0\n", - "SeniorCitizen 0\n", - "Partner 0\n", - "Dependents 0\n", - "tenure 0\n", - "PhoneService 0\n", - "MultipleLines 0\n", - "InternetService 0\n", - "OnlineSecurity 0\n", - "OnlineBackup 0\n", - "DeviceProtection 0\n", - "TechSupport 0\n", - "StreamingTV 0\n", - "StreamingMovies 0\n", - "Contract 0\n", - "PaperlessBilling 0\n", - "PaymentMethod 0\n", - "MonthlyCharges 0\n", - "TotalCharges 0\n", - "dtype: int64" - ] - }, - "execution_count": 21, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# Count missing values per column\n", - "df_excel.isnull().sum()" - ] - }, - { - "cell_type": "raw", - "metadata": {}, - "source": [ - "As expected, the test dataset (df_excel) does not have the \"Churn\" column." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Explore df_csv" - ] - }, - { - "cell_type": "code", - "execution_count": 17, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
" - ], - "text/plain": [ - " customerID gender SeniorCitizen Partner Dependents tenure PhoneService \\\n", - "0 5600-PDUJF Male 0 No No 6 Yes \n", - "1 8292-TYSPY Male 0 No No 19 Yes \n", - "2 0567-XRHCU Female 0 Yes Yes 69 No \n", - "3 1867-BDVFH Male 0 Yes Yes 11 Yes \n", - "4 2067-QYTCF Female 0 Yes No 64 Yes \n", - "\n", - " MultipleLines InternetService OnlineSecurity ... DeviceProtection \\\n", - "0 No DSL No ... No \n", - "1 No DSL No ... Yes \n", - "2 No phone service DSL Yes ... Yes \n", - "3 Yes Fiber optic No ... No \n", - "4 Yes Fiber optic No ... Yes \n", - "\n", - " TechSupport StreamingTV StreamingMovies Contract PaperlessBilling \\\n", - "0 Yes No No Month-to-month Yes \n", - "1 Yes No No Month-to-month Yes \n", - "2 No No Yes Two year Yes \n", - "3 No No No Month-to-month Yes \n", - "4 Yes Yes Yes Month-to-month Yes \n", - "\n", - " PaymentMethod MonthlyCharges TotalCharges Churn \n", - "0 Credit card (automatic) 49.50 312.7 No \n", - "1 Credit card (automatic) 55.00 1046.5 Yes \n", - "2 Credit card (automatic) 43.95 2960.1 No \n", - "3 Electronic check 74.35 834.2 Yes \n", - "4 Electronic check 111.15 6953.4 No \n", - "\n", - "[5 rows x 21 columns]" - ] - }, - "execution_count": 17, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# Call the dataset df_csv\n", - "df_csv.head()" - ] - }, - { - "cell_type": "code", - "execution_count": 20, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "RangeIndex: 2043 entries, 0 to 2042\n", - "Data columns (total 21 columns):\n", - " # Column Non-Null Count Dtype \n", - "--- ------ -------------- ----- \n", - " 0 customerID 2043 non-null object \n", - " 1 gender 2043 non-null object \n", - " 2 SeniorCitizen 2043 non-null int64 \n", - " 3 Partner 2043 non-null object \n", - " 4 Dependents 2043 non-null object \n", - " 5 tenure 2043 non-null int64 \n", - " 6 PhoneService 2043 non-null object \n", - " 7 MultipleLines 2043 non-null 
object \n", - " 8 InternetService 2043 non-null object \n", - " 9 OnlineSecurity 2043 non-null object \n", - " 10 OnlineBackup 2043 non-null object \n", - " 11 DeviceProtection 2043 non-null object \n", - " 12 TechSupport 2043 non-null object \n", - " 13 StreamingTV 2043 non-null object \n", - " 14 StreamingMovies 2043 non-null object \n", - " 15 Contract 2043 non-null object \n", - " 16 PaperlessBilling 2043 non-null object \n", - " 17 PaymentMethod 2043 non-null object \n", - " 18 MonthlyCharges 2043 non-null float64\n", - " 19 TotalCharges 2043 non-null object \n", - " 20 Churn 2043 non-null object \n", - "dtypes: float64(1), int64(2), object(18)\n", - "memory usage: 335.3+ KB\n" - ] - } - ], - "source": [ - "# Column information\n", - "df_csv.info()" - ] - }, - { - "cell_type": "code", - "execution_count": 19, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
| | SeniorCitizen | tenure | MonthlyCharges |
| count | 2043.000000 | 2043.000000 | 2043.000000 |
| mean | 0.168380 | 32.649046 | 64.712555 |
| std | 0.374295 | 24.376248 | 29.970010 |
| min | 0.000000 | 0.000000 | 18.550000 |
| 25% | 0.000000 | 9.000000 | 35.825000 |
| 50% | 0.000000 | 30.000000 | 70.250000 |
| 75% | 0.000000 | 55.000000 | 89.625000 |
| max | 1.000000 | 72.000000 | 118.350000 |
\n", - "
" - ], - "text/plain": [ - " SeniorCitizen tenure MonthlyCharges\n", - "count 2043.000000 2043.000000 2043.000000\n", - "mean 0.168380 32.649046 64.712555\n", - "std 0.374295 24.376248 29.970010\n", - "min 0.000000 0.000000 18.550000\n", - "25% 0.000000 9.000000 35.825000\n", - "50% 0.000000 30.000000 70.250000\n", - "75% 0.000000 55.000000 89.625000\n", - "max 1.000000 72.000000 118.350000" - ] - }, - "execution_count": 19, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# Summary statistics of df_csv to get insights into the distribution and basic characteristics of the numerical variables\n", - "df_csv.describe()" - ] - }, - { - "cell_type": "code", - "execution_count": 22, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "customerID 0\n", - "gender 0\n", - "SeniorCitizen 0\n", - "Partner 0\n", - "Dependents 0\n", - "tenure 0\n", - "PhoneService 0\n", - "MultipleLines 0\n", - "InternetService 0\n", - "OnlineSecurity 0\n", - "OnlineBackup 0\n", - "DeviceProtection 0\n", - "TechSupport 0\n", - "StreamingTV 0\n", - "StreamingMovies 0\n", - "Contract 0\n", - "PaperlessBilling 0\n", - "PaymentMethod 0\n", - "MonthlyCharges 0\n", - "TotalCharges 0\n", - "Churn 0\n", - "dtype: int64" - ] - }, - "execution_count": 22, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# Handling missing values\n", - "df_csv.isnull().sum()" - ] - }, - { - "cell_type": "raw", - "metadata": {}, - "source": [ - "Observations:\n", - "Dataset df_db has missing values in a number of columns. 
\n", - "SeniorCitizen is an integer flag (0/1) in df_excel but boolean (True/False) in df_db\n", - "Partner is supposed to be boolean type but is object in df_excel\n", - "Dependents is supposed to be boolean type but is object in df_excel\n", - "PhoneService is supposed to be boolean type but is object in df_excel\n", - "MultipleLines is (None, True, or False) in df_db; but True or False in df_excel\n", - "OnlineSecurity is (True/False) in df_db; but Yes/No/No Internet in df_excel\n", - "DeviceProtection is (True/False) in df_db; but (yes/no) in df_excel\n", - "\n", - "TotalCharges is supposed to be float64 type but is object in df_excel\n", - "\n", - " \n" - ] - }, - { - "cell_type": "raw", - "metadata": {}, - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.9.13" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/Classification_Telco_customer_churn.ipynb b/Classification_Telco_customer_churn.ipynb index ffe1dfc..2f72ccb 100644 --- a/Classification_Telco_customer_churn.ipynb +++ b/Classification_Telco_customer_churn.ipynb @@ -1,5 +1,132 @@ { +<<<<<<< HEAD + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ +<<<<<<< HEAD + "# Project Title: \n", + "Telecommunications Customer Churn Prediction Analysis" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ +======= +>>>>>>> main + "# Business Understanding" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ +<<<<<<< HEAD + "## 1. Introduction\n", + "This project aims to assist a telecom company in understanding their data and predicting customer churn. 
The company has provided access to three different datasets: the first dataset with 3000 records stored in a remote Microsoft SQL Server database, the second dataset with 2000 records stored on OneDrive, and the third dataset hosted on a GitHub repository.\n", + "\n", + "### 1.1. Objectives\n", + "Understand the data: Our first objective is to provide insights into the telecom company's data, including customer demographics, services availed, and payment details. This understanding will enable the company to make informed business decisions.\n", + "\n", + "Find the lifetime value of each customer: By analyzing the data, we aim to identify factors that influence the rate at which customers churn. Understanding customer behavior and identifying key predictors will help the telecom company estimate the lifetime value of each customer.\n", + "\n", + "Predict customer churn: The primary objective is to develop a predictive model that accurately determines whether a customer is likely to churn or not. We will employ machine learning algorithms such as logistic regression, decision trees, support vector machines, random forest, etc., to build a model that effectively predicts customer churn.\n", + "\n", + "### 1.2. Methodology\n", + "To achieve our objectives, we will follow the CRISP-DM framework, which consists of the following steps:\n", + "\n", + "Data exploration: We will thoroughly explore the datasets to gain insights into the available variables, their distributions, and relationships. This step will provide us with an initial understanding of the data and help identify any data quality issues.\n", + "\n", + "Missing value computations: We will identify missing values in the datasets and decide on an appropriate strategy for handling them. 
This may involve imputing missing values or removing data points with missing values.\n", + "\n", + "Feature engineering: We will perform feature engineering to transform and create new variables that can potentially improve the predictive power of our models. This step may include encoding categorical variables, scaling numerical variables, or creating interaction terms.\n", + "\n", + "Model development: We will utilize various machine learning algorithms such as logistic regression, decision trees, support vector machines, random forest, etc., to develop predictive models for customer churn. We will train the models on a subset of the data and evaluate their performance using appropriate metrics.\n", + "\n", + "Model evaluation and interpretation: We will evaluate the trained models using evaluation metrics such as accuracy, precision, recall, and F1-score. Additionally, we will interpret the models to understand the factors driving customer churn and their relative importance.\n", + "\n", + "Model optimization and hyperparameter tuning: We will fine-tune the models by optimizing their hyperparameters to improve their performance. This step may involve techniques like grid search or random search to find the optimal combination of hyperparameters.\n", + "\n", + "By following this methodology, we aim to provide valuable insights to the telecom company and develop a reliable predictive model for customer churn." +======= + "## Business Understanding:\n", + "\n", + "The telecom company has provided access to three different datasets for a classification project. The first dataset consists of 3000 records and is stored in a remote database hosted on Microsoft SQL Server. The second dataset contains 2000 records and is stored on OneDrive. 
The third dataset is hosted on a GitHub repository.\n", + "\n", + "## Objectives of the project\n", + "\n", + "Help the telecom company understand its data: The project aims to provide insights into the telecom company's data, including customer demographics, services used, and payment details. This understanding will enable the company to make informed business decisions.\n", + "\n", + "Find the lifetime value of each customer: By analyzing the data, the project aims to identify factors that influence the rate at which customers churn. Understanding customer behavior and identifying key predictors will help the telecom company estimate the lifetime value of each customer.\n", + "\n", + "Predict customer churn: The project involves developing a predictive model to determine whether a customer is likely to churn. By using machine learning algorithms such as logistic regression, decision trees, support vector machines, and random forest, the project aims to build a model that accurately predicts customer churn.\n", + "\n", + "## Methodology\n", + "\n", + "To achieve these objectives, the project will follow the CRISP-DM framework and involve the following steps:\n", + "\n", + "Data exploration: Explore the datasets to gain insights into the available variables, their distributions, and relationships. This step will provide an initial understanding of the data and help identify any data quality issues.\n", + "\n", + "Missing value handling: Identify missing values in the datasets and decide on an appropriate strategy for handling them, such as imputation or removal of missing data points.\n", + "\n", + "Feature engineering: Perform feature engineering to transform and create new variables that can potentially improve the predictive power of the models. 
This step may involve encoding categorical variables, scaling numerical variables, or creating interaction terms.\n", + "\n", + "Model development: Utilize machine learning algorithms such as logistic regression, decision trees, support vector machines, random forest, etc., to develop predictive models for customer churn. Train the models on a subset of the data and evaluate their performance using appropriate metrics.\n", + "\n", + "Model evaluation and interpretation: Evaluate the trained models using evaluation metrics such as accuracy, precision, recall, and F1-score. Interpret the models to understand the factors driving customer churn and their relative importance.\n", + "\n", + "Model optimization and hyperparameter tuning: Fine-tune the models by optimizing their hyperparameters to improve their performance. This step may involve techniques like grid search or random search to find the optimal combination of hyperparameters." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Hypothesis\n", + "H0: The churn rate of customers in the telecom company is not significantly influenced by various factors related to their demographics, services, and payment methods.\n", + "H1: The churn rate of customers in the telecom company is influenced by various factors related to their demographics, services, and payment methods.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Questions\n", + "1. Is there a relationship between the monthly charges and the likelihood of churn?\n", + "2. Do customers who have internet service, specifically fiber optic, exhibit a higher churn rate compared to those with DSL or no internet service?\n", + "3. Does the availability of online security, online backup, device protection, and tech support impact the churn rate?\n", + "4. How does the churn rate vary based on the customers' gender?\n", + "5. Does the presence of a partner influence the likelihood of churn?\n", + "6. 
Is there a correlation between the tenure of customers and their churn rate?" +>>>>>>> main + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Load Datasets" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ +<<<<<<< HEAD + "## Installations" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ +======= "cells": [ +>>>>>>> a61f3e857c29edc0e2b7d8a2e93f52101a5dda44 { "cell_type": "markdown", "metadata": {}, @@ -82,9 +209,26 @@ "## Load first dataset from database" ] } +<<<<<<< HEAD + ], + "source": [ + "%pip install pyodbc\n", + "%pip install openpyxl" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Load first dataset from SQL database" +======= + "## Load first dataset from database" +>>>>>>> main +======= ] } +>>>>>>> a61f3e857c29edc0e2b7d8a2e93f52101a5dda44 ] }, { @@ -95,9 +239,19 @@ "source": [ "# Import necessary libraries\n", "import pyodbc\n", +<<<<<<< HEAD +<<<<<<< HEAD "import pandas as pd\n", "import warnings\n", "warnings.filterwarnings(\"ignore\")" +======= + "import pandas as pd\n" +>>>>>>> main +======= + "import pandas as pd\n", + "import warnings\n", + "warnings.filterwarnings(\"ignore\")" +>>>>>>> a61f3e857c29edc0e2b7d8a2e93f52101a5dda44 ] }, { @@ -112,22 +266,56 @@ "username = 'dataAnalyst_LP2'\n", "password = 'A3g@3kR$2y'\n", "\n", +<<<<<<< HEAD +<<<<<<< HEAD +======= +>>>>>>> a61f3e857c29edc0e2b7d8a2e93f52101a5dda44 "# Create the connection string using the ODBC driver format\n", "conn_str = f'DRIVER={{ODBC Driver 17 for SQL Server}};SERVER={server};DATABASE={database};UID={username};PWD={password}'\n", "\n", "# Establish the connection using the connection string\n", "conn = pyodbc.connect(conn_str)" +<<<<<<< HEAD +======= + "conn_str = f'DRIVER={{ODBC Driver 17 for SQL Server}};SERVER={server};DATABASE={database};UID={username};PWD={password}'\n", + "conn = pyodbc.connect(conn_str)\n" +>>>>>>> main +======= +>>>>>>> 
a61f3e857c29edc0e2b7d8a2e93f52101a5dda44 ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, +<<<<<<< HEAD +<<<<<<< HEAD +======= +>>>>>>> a61f3e857c29edc0e2b7d8a2e93f52101a5dda44 "outputs": [], "source": [ "# Query the database to retrieve the data\n", "query = 'SELECT TOP 3000 * FROM LP2_Telco_churn_first_3000'\n", "df_db = pd.read_sql(query, conn)" +<<<<<<< HEAD +======= + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "c:\\Users\\PSL-CUDJOE\\anaconda3\\lib\\site-packages\\pandas\\io\\sql.py:761: UserWarning: pandas only support SQLAlchemy connectable(engine/connection) ordatabase string URI or sqlite3 DBAPI2 connectionother DBAPI2 objects are not tested, please consider using SQLAlchemy\n", + " warnings.warn(\n" + ] + } + ], + "source": [ + "# Query the database to retrieve the data\n", + "query = 'SELECT TOP 3000 * FROM LP2_Telco_churn_first_3000'\n", + "df_db = pd.read_sql(query, conn)\n" +>>>>>>> main +======= +>>>>>>> a61f3e857c29edc0e2b7d8a2e93f52101a5dda44 ] }, { @@ -137,7 +325,15 @@ "outputs": [], "source": [ "# Close connection\n", +<<<<<<< HEAD +<<<<<<< HEAD + "conn.close()" +======= + "conn.close()\n" +>>>>>>> main +======= "conn.close()" +>>>>>>> a61f3e857c29edc0e2b7d8a2e93f52101a5dda44 ] }, { @@ -360,7 +556,14 @@ } ], "source": [ +<<<<<<< HEAD +<<<<<<< HEAD "# Display the dataframe\n", +======= +>>>>>>> main +======= + "# Display the dataframe\n", +>>>>>>> a61f3e857c29edc0e2b7d8a2e93f52101a5dda44 "df_db.head()" ] }, @@ -375,6 +578,22 @@ "cell_type": "code", "execution_count": 7, "metadata": {}, +<<<<<<< HEAD +<<<<<<< HEAD +======= + "outputs": [], + "source": [ + "# Read the excel file into a pandas dataframe\n", + "df_excel = pd.read_excel('Telco-churn-second-2000.xlsx')\n" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, +>>>>>>> main +======= +>>>>>>> a61f3e857c29edc0e2b7d8a2e93f52101a5dda44 "outputs": [ { "data": { @@ -576,16 +795,33 @@ "4 89.15 89.15 " ] }, +<<<<<<< HEAD 
+<<<<<<< HEAD + "execution_count": 7, +======= + "execution_count": 8, +>>>>>>> main +======= "execution_count": 7, +>>>>>>> a61f3e857c29edc0e2b7d8a2e93f52101a5dda44 "metadata": {}, "output_type": "execute_result" } ], "source": [ +<<<<<<< HEAD +<<<<<<< HEAD +======= +>>>>>>> a61f3e857c29edc0e2b7d8a2e93f52101a5dda44 "# Read the excel file into a pandas dataframe\n", "df_excel = pd.read_excel('Telco-churn-second-2000.xlsx')\n", "\n", "# Display the dataframe\n", +<<<<<<< HEAD +======= +>>>>>>> main +======= +>>>>>>> a61f3e857c29edc0e2b7d8a2e93f52101a5dda44 "df_excel.head()" ] }, @@ -598,7 +834,25 @@ }, { "cell_type": "code", +<<<<<<< HEAD +<<<<<<< HEAD "execution_count": 8, +======= + "execution_count": 9, + "metadata": {}, + "outputs": [], + "source": [ + "# Read third dataset\n", + "df_csv = pd.read_csv('LP2_Telco-churn-last-2000.csv')" + ] + }, + { + "cell_type": "code", + "execution_count": 10, +>>>>>>> main +======= + "execution_count": 8, +>>>>>>> a61f3e857c29edc0e2b7d8a2e93f52101a5dda44 "metadata": {}, "outputs": [ { @@ -803,16 +1057,33 @@ "[5 rows x 21 columns]" ] }, +<<<<<<< HEAD +<<<<<<< HEAD "execution_count": 8, +======= + "execution_count": 10, +>>>>>>> main +======= + "execution_count": 8, +>>>>>>> a61f3e857c29edc0e2b7d8a2e93f52101a5dda44 "metadata": {}, "output_type": "execute_result" } ], "source": [ +<<<<<<< HEAD +<<<<<<< HEAD +======= +>>>>>>> a61f3e857c29edc0e2b7d8a2e93f52101a5dda44 "# Read third dataset\n", "df_csv = pd.read_csv('LP2_Telco-churn-last-2000.csv')\n", "\n", "# Display the dataframe\n", +<<<<<<< HEAD +======= +>>>>>>> main +======= +>>>>>>> a61f3e857c29edc0e2b7d8a2e93f52101a5dda44 "df_csv.head()" ] }, @@ -820,6 +1091,10 @@ "cell_type": "markdown", "metadata": {}, "source": [ +<<<<<<< HEAD +<<<<<<< HEAD +======= +>>>>>>> a61f3e857c29edc0e2b7d8a2e93f52101a5dda44 "# Questions and Hypothesis" ] }, @@ -852,6 +1127,11 @@ "cell_type": "markdown", "metadata": {}, "source": [ +<<<<<<< HEAD +======= +>>>>>>> main +======= +>>>>>>> 
a61f3e857c29edc0e2b7d8a2e93f52101a5dda44 "# Data Exploration" ] }, @@ -866,15 +1146,432 @@ "cell_type": "markdown", "metadata": {}, "source": [ +<<<<<<< HEAD +<<<<<<< HEAD "### Explore The Dataframe from the SQL Database(df_db)" +======= + "### Explore df_db\n" +>>>>>>> main +======= + "### Explore The Dataframe from the SQL Database(df_db)" +>>>>>>> a61f3e857c29edc0e2b7d8a2e93f52101a5dda44 ] }, { "cell_type": "code", +<<<<<<< HEAD +<<<<<<< HEAD +======= +>>>>>>> a61f3e857c29edc0e2b7d8a2e93f52101a5dda44 "execution_count": 9, "metadata": { "scrolled": false }, +<<<<<<< HEAD +======= + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
| | customerID | gender | SeniorCitizen | Partner | Dependents | tenure | PhoneService | MultipleLines | InternetService | OnlineSecurity | ... | DeviceProtection | TechSupport | StreamingTV | StreamingMovies | Contract | PaperlessBilling | PaymentMethod | MonthlyCharges | TotalCharges | Churn |
| 0 | 7590-VHVEG | Female | False | True | False | 1 | False | None | DSL | False | ... | False | False | False | False | Month-to-month | True | Electronic check | 29.850000 | 29.850000 | False |
| 1 | 5575-GNVDE | Male | False | False | False | 34 | True | False | DSL | True | ... | True | False | False | False | One year | False | Mailed check | 56.950001 | 1889.500000 | False |
| 2 | 3668-QPYBK | Male | False | False | False | 2 | True | False | DSL | True | ... | False | False | False | False | Month-to-month | True | Mailed check | 53.849998 | 108.150002 | True |
| 3 | 7795-CFOCW | Male | False | False | False | 45 | False | None | DSL | True | ... | True | True | False | False | One year | False | Bank transfer (automatic) | 42.299999 | 1840.750000 | False |
| 4 | 9237-HQITU | Female | False | False | False | 2 | True | False | Fiber optic | False | ... | False | False | False | False | Month-to-month | True | Electronic check | 70.699997 | 151.649994 | True |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2995 | 2209-XADXF | Female | False | False | False | 1 | False | None | DSL | False | ... | False | False | False | False | Month-to-month | False | Bank transfer (automatic) | 25.250000 | 25.250000 | False |
| 2996 | 6620-JDYNW | Female | False | False | False | 18 | True | True | DSL | True | ... | True | False | False | False | Month-to-month | True | Mailed check | 60.599998 | 1156.349976 | False |
| 2997 | 1891-FZYSA | Male | True | True | False | 69 | True | True | Fiber optic | False | ... | False | False | True | False | Month-to-month | True | Electronic check | 89.949997 | 6143.149902 | True |
| 2998 | 4770-UEZOX | Male | False | False | False | 2 | True | False | Fiber optic | False | ... | False | False | False | False | Month-to-month | True | Electronic check | 74.750000 | 144.800003 | False |
| 2999 | 1A1:U3001038-RQOST | Male | False | True | True | 19 | True | False | No | None | ... | None | None | None | None | Month-to-month | False | Mailed check | 20.600000 | 414.950012 | False |
\n", + "

3000 rows × 21 columns

\n", + "
" + ], + "text/plain": [ + " customerID gender SeniorCitizen Partner Dependents tenure \\\n", + "0 7590-VHVEG Female False True False 1 \n", + "1 5575-GNVDE Male False False False 34 \n", + "2 3668-QPYBK Male False False False 2 \n", + "3 7795-CFOCW Male False False False 45 \n", + "4 9237-HQITU Female False False False 2 \n", + "... ... ... ... ... ... ... \n", + "2995 2209-XADXF Female False False False 1 \n", + "2996 6620-JDYNW Female False False False 18 \n", + "2997 1891-FZYSA Male True True False 69 \n", + "2998 4770-UEZOX Male False False False 2 \n", + "2999 1A1:U3001038-RQOST Male False True True 19 \n", + "\n", + " PhoneService MultipleLines InternetService OnlineSecurity ... \\\n", + "0 False None DSL False ... \n", + "1 True False DSL True ... \n", + "2 True False DSL True ... \n", + "3 False None DSL True ... \n", + "4 True False Fiber optic False ... \n", + "... ... ... ... ... ... \n", + "2995 False None DSL False ... \n", + "2996 True True DSL True ... \n", + "2997 True True Fiber optic False ... \n", + "2998 True False Fiber optic False ... \n", + "2999 True False No None ... \n", + "\n", + " DeviceProtection TechSupport StreamingTV StreamingMovies Contract \\\n", + "0 False False False False Month-to-month \n", + "1 True False False False One year \n", + "2 False False False False Month-to-month \n", + "3 True True False False One year \n", + "4 False False False False Month-to-month \n", + "... ... ... ... ... ... 
\n", + "2995 False False False False Month-to-month \n", + "2996 True False False False Month-to-month \n", + "2997 False False True False Month-to-month \n", + "2998 False False False False Month-to-month \n", + "2999 None None None None Month-to-month \n", + "\n", + " PaperlessBilling PaymentMethod MonthlyCharges TotalCharges \\\n", + "0 True Electronic check 29.850000 29.850000 \n", + "1 False Mailed check 56.950001 1889.500000 \n", + "2 True Mailed check 53.849998 108.150002 \n", + "3 False Bank transfer (automatic) 42.299999 1840.750000 \n", + "4 True Electronic check 70.699997 151.649994 \n", + "... ... ... ... ... \n", + "2995 False Bank transfer (automatic) 25.250000 25.250000 \n", + "2996 True Mailed check 60.599998 1156.349976 \n", + "2997 True Electronic check 89.949997 6143.149902 \n", + "2998 True Electronic check 74.750000 144.800003 \n", + "2999 False Mailed check 20.600000 414.950012 \n", + "\n", + " Churn \n", + "0 False \n", + "1 False \n", + "2 True \n", + "3 False \n", + "4 True \n", + "... ... 
\n", + "2995 False \n", + "2996 False \n", + "2997 True \n", + "2998 False \n", + "2999 False \n", + "\n", + "[3000 rows x 21 columns]" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Call df_db\n", + "df_db" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, +>>>>>>> main +======= +>>>>>>> a61f3e857c29edc0e2b7d8a2e93f52101a5dda44 "outputs": [ { "name": "stdout", @@ -925,6 +1622,10 @@ }, { "cell_type": "code", +<<<<<<< HEAD +<<<<<<< HEAD +======= +>>>>>>> a61f3e857c29edc0e2b7d8a2e93f52101a5dda44 "execution_count": 10, "metadata": {}, "outputs": [ @@ -1014,6 +1715,12 @@ { "cell_type": "code", "execution_count": 12, +<<<<<<< HEAD +======= + "execution_count": 13, +>>>>>>> main +======= +>>>>>>> a61f3e857c29edc0e2b7d8a2e93f52101a5dda44 "metadata": {}, "outputs": [ { @@ -1107,7 +1814,15 @@ "max 72.000000 118.650002 8564.750000" ] }, +<<<<<<< HEAD +<<<<<<< HEAD + "execution_count": 12, +======= + "execution_count": 13, +>>>>>>> main +======= "execution_count": 12, +>>>>>>> a61f3e857c29edc0e2b7d8a2e93f52101a5dda44 "metadata": {}, "output_type": "execute_result" } @@ -1119,8 +1834,16 @@ }, { "cell_type": "code", +<<<<<<< HEAD +<<<<<<< HEAD + "execution_count": 13, +======= + "execution_count": 14, +>>>>>>> main +======= "execution_count": 13, +>>>>>>> a61f3e857c29edc0e2b7d8a2e93f52101a5dda44 "metadata": {}, "outputs": [ { @@ -1150,7 +1873,15 @@ "dtype: int64" ] }, +<<<<<<< HEAD +<<<<<<< HEAD + "execution_count": 13, +======= + "execution_count": 14, +>>>>>>> main +======= "execution_count": 13, +>>>>>>> a61f3e857c29edc0e2b7d8a2e93f52101a5dda44 "metadata": {}, "output_type": "execute_result" } @@ -1169,7 +1900,15 @@ }, { "cell_type": "code", +<<<<<<< HEAD +<<<<<<< HEAD + "execution_count": 14, +======= + "execution_count": 15, +>>>>>>> main +======= "execution_count": 14, +>>>>>>> a61f3e857c29edc0e2b7d8a2e93f52101a5dda44 "metadata": {}, "outputs": [ { @@ -1372,7 +2111,15 @@ "4 
89.15 89.15 " ] }, +<<<<<<< HEAD +<<<<<<< HEAD + "execution_count": 14, +======= + "execution_count": 15, +>>>>>>> main +======= "execution_count": 14, +>>>>>>> a61f3e857c29edc0e2b7d8a2e93f52101a5dda44 "metadata": {}, "output_type": "execute_result" } @@ -1384,7 +2131,15 @@ }, { "cell_type": "code", +<<<<<<< HEAD +<<<<<<< HEAD "execution_count": 15, +======= + "execution_count": 16, +>>>>>>> main +======= + "execution_count": 15, +>>>>>>> a61f3e857c29edc0e2b7d8a2e93f52101a5dda44 "metadata": {}, "outputs": [ { @@ -1428,6 +2183,10 @@ }, { "cell_type": "code", +<<<<<<< HEAD +<<<<<<< HEAD +======= +>>>>>>> a61f3e857c29edc0e2b7d8a2e93f52101a5dda44 "execution_count": 16, "metadata": { "scrolled": false @@ -1593,7 +2352,13 @@ { "cell_type": "code", "execution_count": 17, +<<<<<<< HEAD +======= + "execution_count": 18, +>>>>>>> main +======= +>>>>>>> a61f3e857c29edc0e2b7d8a2e93f52101a5dda44 "metadata": {}, "outputs": [ { @@ -1687,7 +2452,15 @@ "max 1.000000 72.000000 118.750000" ] }, +<<<<<<< HEAD +<<<<<<< HEAD "execution_count": 17, +======= + "execution_count": 18, +>>>>>>> main +======= + "execution_count": 17, +>>>>>>> a61f3e857c29edc0e2b7d8a2e93f52101a5dda44 "metadata": {}, "output_type": "execute_result" } @@ -1699,7 +2472,15 @@ }, { "cell_type": "code", +<<<<<<< HEAD +<<<<<<< HEAD + "execution_count": 18, +======= + "execution_count": 21, +>>>>>>> main +======= "execution_count": 18, +>>>>>>> a61f3e857c29edc0e2b7d8a2e93f52101a5dda44 "metadata": {}, "outputs": [ { @@ -1728,7 +2509,15 @@ "dtype: int64" ] }, +<<<<<<< HEAD +<<<<<<< HEAD "execution_count": 18, +======= + "execution_count": 21, +>>>>>>> main +======= + "execution_count": 18, +>>>>>>> a61f3e857c29edc0e2b7d8a2e93f52101a5dda44 "metadata": {}, "output_type": "execute_result" } @@ -1754,7 +2543,15 @@ }, { "cell_type": "code", +<<<<<<< HEAD +<<<<<<< HEAD + "execution_count": 19, +======= + "execution_count": 17, +>>>>>>> main +======= "execution_count": 19, +>>>>>>> a61f3e857c29edc0e2b7d8a2e93f52101a5dda44 
"metadata": {}, "outputs": [ { @@ -1959,7 +2756,15 @@ "[5 rows x 21 columns]" ] }, +<<<<<<< HEAD +<<<<<<< HEAD "execution_count": 19, +======= + "execution_count": 17, +>>>>>>> main +======= + "execution_count": 19, +>>>>>>> a61f3e857c29edc0e2b7d8a2e93f52101a5dda44 "metadata": {}, "output_type": "execute_result" } @@ -2016,6 +2821,10 @@ }, { "cell_type": "code", +<<<<<<< HEAD +<<<<<<< HEAD +======= +>>>>>>> a61f3e857c29edc0e2b7d8a2e93f52101a5dda44 "execution_count": 21, "metadata": {}, "outputs": [ @@ -2083,6 +2892,12 @@ { "cell_type": "code", "execution_count": 22, +<<<<<<< HEAD +======= + "execution_count": 19, +>>>>>>> main +======= +>>>>>>> a61f3e857c29edc0e2b7d8a2e93f52101a5dda44 "metadata": {}, "outputs": [ { @@ -2176,7 +2991,15 @@ "max 1.000000 72.000000 118.350000" ] }, +<<<<<<< HEAD +<<<<<<< HEAD + "execution_count": 22, +======= + "execution_count": 19, +>>>>>>> main +======= "execution_count": 22, +>>>>>>> a61f3e857c29edc0e2b7d8a2e93f52101a5dda44 "metadata": {}, "output_type": "execute_result" } @@ -2188,7 +3011,15 @@ }, { "cell_type": "code", +<<<<<<< HEAD +<<<<<<< HEAD + "execution_count": 23, +======= + "execution_count": 22, +>>>>>>> main +======= "execution_count": 23, +>>>>>>> a61f3e857c29edc0e2b7d8a2e93f52101a5dda44 "metadata": {}, "outputs": [ { @@ -2218,7 +3049,15 @@ "dtype: int64" ] }, +<<<<<<< HEAD +<<<<<<< HEAD "execution_count": 23, +======= + "execution_count": 22, +>>>>>>> main +======= + "execution_count": 23, +>>>>>>> a61f3e857c29edc0e2b7d8a2e93f52101a5dda44 "metadata": {}, "output_type": "execute_result" } @@ -2242,7 +3081,17 @@ "OnlineSecurity is (True/False) in df_db; but Yes/No/No Internet in df_excel\n", "DeviceProtection is (True/False) in df_db; but (yes/no) in df_excel\n", "\n", +<<<<<<< HEAD +<<<<<<< HEAD + "TotalCharges is supposed to be float64 type but is object in df_excel" +======= + "TotalCharges is supposed to be float64 type but is object in df_excel\n", + "\n", + " \n" +>>>>>>> main +======= "TotalCharges is supposed to 
be float64 type but is object in df_excel" +>>>>>>> a61f3e857c29edc0e2b7d8a2e93f52101a5dda44 ] }, {