kpaganopoulos/aib

Employee Terminations

This data set is from Kaggle; explanation can be found at https://www.kaggle.com/HRAnalyticRepository/employee-attrition-data

WHAT AND WHO THIS DATASET IS FOR

This dataset is for the teams NOT doing LIVE presentations for Projects 1 and 2 (Describe and Evaluate).

Note: Teams not presenting live will upload presentations to the hub by 10:45 AM on Mondays, but please also bring hardcopies of relevant views of your work for turn-in.

Whether presenting live or submitting video, all presentations must be 8 minutes or less.

ABOUT THE DATA

Here are brief notes about the dataset and the contents of its single .csv file.

Overview:

  • The data represent fictitious/fake data on terminations.
  • For each of 10 years, it shows employees who are active and those who were terminated.

Content: The data contains

  • employee id
  • employee record date (year of data)
  • birth date
  • hire date
  • termination date
  • age
  • length of service
  • city
  • department
  • job title
  • store number
  • gender
  • termination reason
  • termination type
  • status year
  • status
  • business unit

Acknowledgements: None; it's fake data.

Inspiration: A lot of turnover analyses occur at an aggregate level, such as turnover rates. But few analyses concentrate on trying to identify exactly which individuals might leave based on patterns that might be present in existing data. Machine learning algorithms often showcase customer churn examples for telcos or product marketing. Those algorithms apply equally to employee churn.
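As a rough illustration of the aggregate view mentioned above, the following sketch computes a turnover rate per status year. The rows are made up, and the field names `status_year` and `status` (with values `ACTIVE` / `TERMINATED`) are assumptions; check the actual column names in the .csv before reusing it.

```python
from collections import defaultdict

# Hypothetical rows: one record per employee per year. The field names and
# status values are assumptions, not taken from the actual .csv file.
records = [
    {"employee_id": 1, "status_year": 2014, "status": "ACTIVE"},
    {"employee_id": 2, "status_year": 2014, "status": "TERMINATED"},
    {"employee_id": 3, "status_year": 2014, "status": "ACTIVE"},
    {"employee_id": 4, "status_year": 2014, "status": "ACTIVE"},
    {"employee_id": 1, "status_year": 2015, "status": "TERMINATED"},
    {"employee_id": 3, "status_year": 2015, "status": "ACTIVE"},
]

def turnover_rate_by_year(rows):
    """Return {year: terminated records / all records} for each status year."""
    totals = defaultdict(int)
    terminated = defaultdict(int)
    for row in rows:
        year = row["status_year"]
        totals[year] += 1
        if row["status"] == "TERMINATED":
            terminated[year] += 1
    return {year: terminated[year] / totals[year] for year in sorted(totals)}

rates = turnover_rate_by_year(records)
for year, rate in rates.items():
    print(year, f"{rate:.2f}")
```

An aggregate rate like this hides who is leaving, which is why the brief asks about individual-level patterns.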

BRIEF

Your client wants to develop a program to 'save' employees who would otherwise face likely termination.

As a first step, the client is asking: are we currently terminating employees who either (a) once performed well, or (b) have backgrounds similar to those of other employees who do perform well?

Depending on the answer to these questions, the client is interested in understanding the features of potentially valuable but likely-to-be-terminated employees so that effective retention programs could be designed and put in place.

Your job is to address these questions, in part, by completing the describe and evaluate phases of what a consulting engagement would be.

PROJECT 1: DESCRIBE

As discussed in class, the describe project is about fully characterising the dataset to surface what you find to be the most interesting patterns and questions relevant to the client's brief.

Here are some tips for your presentation:

  1. Begin with the context: the project aim, what you did, and what you hope to achieve
  2. Summary of what you did
  3. Some patterns (results) you probably expect
  4. Some patterns (findings) you might find surprising
  5. What this might mean
  6. Suggested focus for project (under general aims of brief)

PROJECT 2: EVALUATE

As discussed in class, the evaluate project is about taking a sober look at whether you have the data you need to satisfactorily achieve your project goals.

If so, explain what you'll use, and explain how and why it will give you what you need. (Points for creativity if you make the existing data work despite problems, but it's not good to think you have what you need when you don't!)

If you don't have all you need, say what else you'd need, why, and where you might get it.

Here are some tips for your presentation:

  1. Begin with the context: the project aim, what you did this week, and what it means
  2. What you think is necessary
  3. How the data provides that, or not
  4. How you will either ...
     a. Make the project work with the data you have
     b. Get the data you need to make the project work
  5. Speculations about what you will find
  6. What is next and what this could all add up to

Getting Funded on Kickstarter

This data set is from Kaggle; explanation can be found at https://www.kaggle.com/codename007/funding-successful-projects#test.csv

WHO THIS DATASET IS FOR

This dataset is for all teams for Projects 3 (Explain) and 4 (Predict). Note that this applies to both live presenters and video submitters.

Note: Teams presenting live are not required to upload their presentations to the hub, but please do bring hardcopies of the relevant views of your presentation.

Whether presenting live or submitting video, all presentations must be 8 minutes or less.

ABOUT THE DATA

There are three files to download: train.csv, test.csv, and sample_submission.csv. The train data consists of sample projects from May 2009 to May 2015. The test data consists of projects from June 2015 to March 2017.
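Note that the split is by time, not random: the test projects all come after the train projects. A minimal sketch of that rule follows; the exact day-of-month boundaries and the function name `which_split` are assumptions made for illustration.

```python
from datetime import date

# Month boundaries come from the description above; the exact first/last
# days are an assumption.
TRAIN_START = date(2009, 5, 1)
TRAIN_END = date(2015, 5, 31)
TEST_START = date(2015, 6, 1)
TEST_END = date(2017, 3, 31)

def which_split(project_date: date) -> str:
    """Say which file a project dated `project_date` would fall into."""
    if TRAIN_START <= project_date <= TRAIN_END:
        return "train"
    if TEST_START <= project_date <= TEST_END:
        return "test"
    return "outside"

print(which_split(date(2012, 1, 15)))
print(which_split(date(2016, 7, 1)))
```

Because the test period is later, a model is being asked to predict forward in time, so it is worth checking whether patterns in the train years still hold in later ones.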

See the URL above for explanations of the data.

BACKGROUND

Kickstarter is a community of more than 10 million people, comprising creative and tech enthusiasts who help bring creative projects to life. To date, members have contributed more than $3 billion to fuel creative projects. The projects can be literally anything: a device, a game, an app, a film, etc.

Kickstarter works on an all-or-nothing basis, i.e., if a project doesn’t meet its goal, the project owner gets nothing. For example, if a project’s goal is $500, then even if it raises $499, the project won’t be a success.

Recently, Kickstarter released its public data repository to allow researchers and enthusiasts like us to help them solve a problem: will a project get fully funded?
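The all-or-nothing rule can be written down in a couple of lines. This is only a sketch: the function names are made up for illustration, and platform fees are ignored.

```python
def is_successful(pledged: float, goal: float) -> bool:
    """All-or-nothing rule: a project succeeds only if pledges reach the goal."""
    return pledged >= goal

def payout(pledged: float, goal: float) -> float:
    """Amount the project owner receives (fees ignored in this sketch)."""
    return pledged if is_successful(pledged, goal) else 0.0

print(is_successful(499, 500))  # False: one dollar short of the goal
print(is_successful(500, 500))  # True: goal met exactly
print(payout(499, 500))         # 0.0: the owner gets nothing
```

This is why the outcome to explain and predict is binary (funded or not) rather than the amount raised.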

BRIEF

In this challenge, you must first (a) explain why projects do or do not get funded and then (b) predict whether a project will be successfully funded.

PROJECT 3: EXPLAIN

As discussed in class, the explain project is about a combination of exploration, hypothesis, and hypothesis testing to explain the focal relationship as much as is possible with the dataset.

To be ready to do this project, you will need to lay a foundation by doing the describe and evaluate steps of the project, but you will not translate that work into full presentations as we did in the first 2 projects. Instead, summarise that work in a few pages and save time to present your explanation of the focal relationship.

To provide an explanation of the focal relationship, you should perform at least a logistic regression, but feel free to use other methods as well. In your models, focus not just on explained variance, but also on highlighting variables with larger main effects, or collections of variables with interaction effects, that are both statistically and substantively significant, i.e., practically important to entrepreneurs and inventors seeking to raise money on Kickstarter.
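To make "main effects" and "interaction effects" concrete, here is a from-scratch sketch of a logistic regression with one interaction term, fitted by plain gradient descent on made-up data. The features `a` and `b` are hypothetical stand-ins, not columns from the Kickstarter files. In practice you would use a statistics package, which also reports standard errors and p-values; this sketch only recovers coefficients and odds ratios.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=1.0, steps=10000):
    """Fit logistic regression by batch gradient descent on the mean log-loss."""
    n, k = len(X), len(X[0])
    beta = [0.0] * k
    for _ in range(steps):
        grad = [0.0] * k
        for row, target in zip(X, y):
            error = sigmoid(sum(b * v for b, v in zip(beta, row))) - target
            for j in range(k):
                grad[j] += error * row[j]
        beta = [b - lr * g / n for b, g in zip(beta, grad)]
    return beta

# Toy data (made up): two binary features a and b, with
# (successes, trials) recorded for each combination.
cells = {(0, 0): (2, 10), (1, 0): (4, 10), (0, 1): (3, 10), (1, 1): (9, 10)}
X, y = [], []
for (a, b), (wins, trials) in cells.items():
    for i in range(trials):
        X.append([1.0, a, b, a * b])  # intercept, main effects, interaction
        y.append(1.0 if i < wins else 0.0)

beta = fit_logistic(X, y)
names = ["intercept", "a", "b", "a:b"]
odds_ratios = {name: math.exp(coef) for name, coef in zip(names, beta)}
for name, coef in zip(names, beta):
    print(f"{name:>9}: coef={coef:+.3f}  odds ratio={odds_ratios[name]:.2f}")
```

In this toy data the interaction coefficient is larger than either main effect: having both `a` and `b` raises the odds of success far more than the two would separately. That is the kind of pattern worth highlighting to entrepreneurs, provided it is also statistically significant.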

Here are some tips for your presentation:

  1. Begin with the context: the project aim, what you did, and what you hope to achieve
  2. Summary of what you will show
  3. Interesting patterns you saw in describing the data
  4. Opportunities and limitations you noted in evaluating what you can say with the data
  5. Hypotheses you derived from the describe and evaluate
  6. Features of the data that have implications for model selection
  7. The model(s) you ran and the results you got
  8. What this means for Kickstarter users (entrepreneurs)

PROJECT 4: PREDICT

As we will discuss in lecture, the predict project is about making a prediction believable enough that people will act on it — potentially changing their plans or habits to leverage new insights.

To make an interesting prediction, you should do an explainable decision tree model that suits the data and gets you a good mix of accuracy and explainability.
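To show what "explainable" can mean here, the sketch below grows a very small decision tree with Gini impurity and prints it as if/else rules. It is a from-scratch illustration on made-up rows, not a recommended implementation; the feature names `goal` and `duration_days` and all the numbers are assumptions. For the real project, a library implementation with a limited depth would be the usual choice.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(rows, labels):
    """Return (feature index, threshold) minimising weighted Gini, or None."""
    best, best_score = None, gini(labels)
    for j in range(len(rows[0])):
        for t in sorted({r[j] for r in rows})[:-1]:  # candidate thresholds
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score - 1e-12:
                best, best_score = (j, t), score
    return best

def build_tree(rows, labels, max_depth=2):
    """Grow a tree; leaves hold the majority class (ties go to 1)."""
    split = best_split(rows, labels) if max_depth > 0 else None
    if split is None:
        return int(2 * sum(labels) >= len(labels))
    j, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[j] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[j] > t]
    return (j, t,
            build_tree([r for r, _ in left], [y for _, y in left], max_depth - 1),
            build_tree([r for r, _ in right], [y for _, y in right], max_depth - 1))

def predict(tree, row):
    """Follow the rules down to a leaf and return its class (1 = funded)."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if row[j] <= t else right
    return tree

def describe(tree, names, indent=""):
    """Print the tree as nested if/else rules."""
    if not isinstance(tree, tuple):
        print(f"{indent}predict {'funded' if tree else 'not funded'}")
        return
    j, t, left, right = tree
    print(f"{indent}if {names[j]} <= {t}:")
    describe(left, names, indent + "    ")
    print(f"{indent}else:")
    describe(right, names, indent + "    ")

# Made-up rows: [goal, duration_days]; label 1 = funded, 0 = not funded.
rows = [[500, 30], [800, 45], [1000, 30], [5000, 30],
        [8000, 20], [9000, 60], [20000, 30], [50000, 45]]
labels = [1, 1, 1, 1, 1, 0, 0, 0]

tree = build_tree(rows, labels, max_depth=2)
describe(tree, ["goal", "duration_days"])
```

Keeping `max_depth` small trades some accuracy for rules that fit on one slide, which is the balance the project asks for.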

Here are some tips for your presentation:

  1. Recap the context: the problem, what you did last week, and where you will take it today
  2. Executive summary of what you will show
  3. Give any refinements of the problem you want to make
  4. The models you considered, the model you chose, and why
  5. How you built the model
  6. Your results
  7. What it means (including limitations)

About

Analytics in Business Project
