Main goal: Develop a machine learning application in a distributed environment using AWS services with Spark.
Students: Carlos Danilo Tomé and Lucas Galdino de Camargo
Dataset: https://www.kaggle.com/volodymyrgavrysh/bank-marketing-campaigns-dataset
A Portuguese bank wants to use analytical solutions to support its new investments in marketing campaigns, seeking a more efficient operation and a more competitive position in the market.
The goal is to develop a machine learning model that estimates the probability of a customer subscribing to the company's product or service, so that the bank can segment its campaigns better and improve their results.
According to the company's strategy, the solution must be scalable and therefore based on cloud computing. For that reason we use Spark with AWS cloud tools, so that predictions are faster and scale with the volume of data.
- Data Wrangling.ipynb and Data Cleanning.ipynb: In these notebooks, the team corrects data-entry (registration) errors and handles missing information. This is equivalent to the "Transient" step of an ETL process.
- Data Engineering.ipynb: In this notebook, the developers perform data transformations that are useful for modeling this problem, such as creating new variables and regularization.
- Final Notebook.ipynb: This notebook combines the whole process, including the modeling step, in a single file.
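
The flow described by the notebooks above could be sketched as a single Spark ML pipeline. This is only an illustration, not the code in the notebooks: the helper names `feature_columns` and `build_pipeline`, the `_idx`/`_vec` suffixes, the label column name `y`, and the choice of logistic regression are all assumptions, and it presumes PySpark 3.x is available.

```python
from typing import List


def feature_columns(categorical: List[str], numeric: List[str]) -> List[str]:
    """Columns fed to the assembler: one encoded vector per categorical
    column (hypothetical "_vec" suffix), then the numeric columns as-is."""
    return [f"{c}_vec" for c in categorical] + list(numeric)


def build_pipeline(categorical: List[str], numeric: List[str], label_col: str = "y"):
    """Return an unfitted Spark ML Pipeline (assumed design, not the authors')."""
    # Imported lazily so feature_columns can be used without a Spark install.
    from pyspark.ml import Pipeline
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.feature import OneHotEncoder, StringIndexer, VectorAssembler

    stages = []
    for c in categorical:
        # Map each category string to a numeric index; unseen values are kept
        # in an extra bucket instead of raising an error.
        stages.append(
            StringIndexer(inputCol=c, outputCol=f"{c}_idx", handleInvalid="keep")
        )
        # Turn the index into a sparse one-hot vector.
        stages.append(OneHotEncoder(inputCol=f"{c}_idx", outputCol=f"{c}_vec"))

    # Index the target; which class becomes 0.0 or 1.0 depends on its frequency.
    stages.append(StringIndexer(inputCol=label_col, outputCol="label"))
    # Combine every feature into the single vector column Spark ML expects.
    stages.append(
        VectorAssembler(
            inputCols=feature_columns(categorical, numeric), outputCol="features"
        )
    )
    # Logistic regression adds a "probability" column when the model is applied.
    stages.append(LogisticRegression(featuresCol="features", labelCol="label"))
    return Pipeline(stages=stages)
```

With a cleaned Spark DataFrame, calling `build_pipeline(...).fit(train_df)` returns a fitted model, and `model.transform(test_df)` adds a `probability` column that can be used to rank customers for a campaign. The column lists passed in should match whatever the cleaning and engineering notebooks actually produce.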