This project implements the K-Means clustering algorithm to analyze customer segmentation based on annual income and spending score using Python. The goal is to identify distinct customer groups to enhance targeted marketing strategies.
In this project, we:
- Loaded and preprocessed the customer dataset.
- Scaled the features using
StandardScaler
. - Employed the Elbow Method to determine the optimal number of clusters.
- Implemented K-Means clustering to categorize customers.
- Visualized the results to provide insights into customer segmentation.
The dataset used in this project is Mall_Customers.csv
, which contains information about customers, including their annual income and spending score. Ensure you place the dataset in the correct directory for the script to load it.
To run this project, you need to have Python installed. You can set up a virtual environment and install the required packages:
# Create a virtual environment (optional)
python -m venv venv
source venv/bin/activate # On Windows use `venv\Scripts\activate`
# Install the required packages
pip install pandas numpy scikit-learn matplotlib
Usage
Clone the repository:
bash
Copy code
git clone https://github.com/yourusername/kmeans-clustering.git
cd kmeans-clustering
Run the Python script:
bash
Copy code
python kmeans_clustering.py
Make sure to adjust the file path in the script to point to your dataset location.
Results
The script will generate two plots:
Elbow Method Plot: Helps identify the optimal number of clusters based on inertia.
Cluster Visualization: Displays customer clusters along with their centroids.
License
This project is licensed under the MIT License. See the LICENSE file for details.
markdown
Copy code
### Notes
- Replace `yourusername` in the clone command with your actual GitHub username.
- Adjust the paths and filenames as necessary.
- Add any additional sections or information that you think are relevant!