
Using Spark to solve the KPIs of a small Retail Dataset


Retail-Data-Analysis

The objective of this project is to analyze a retail dataset and compute various KPIs using Apache Spark.

Data format: tab-delimited, with the following fields:

Date  Time  City  Product-category  Sales  Payment-mode

Prerequisites:

Any flavor of Linux with the following installed:

  • JDK 8
  • Spark 2.4
  • Scala 2.11.x
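
The KPI snippets below all operate on an RDD named data. As a minimal sketch of how it might be created (the app name and input path here are illustrative, not the project's actual values):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical setup; in the repo this would live inside each KPI's main class.
val conf = new SparkConf().setAppName("RetailDataAnalysis")
val sc = new SparkContext(conf)

// Each line: Date \t Time \t City \t Product-category \t Sales \t Payment-mode
val data = sc.textFile("Retail_Sample_Data_Set")
```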

KPIs

KPI 1

  • Find product wise sales distribution.
val breaking = data.map { line =>
  val tokens = line.split("\t")
  val product = tokens(3)              // product category
  val sales = tokens(4).toDouble       // sales amount
  (product, sales)                     // product as key, sales as value
}.reduceByKey(_ + _)                   // reduceByKey((a, b) => a + b): sum sales per product

KPI 2

  • Find store wise sales.
val result = data.map { line =>
  val tokens = line.split("\t")
  (tokens(2), tokens(4).toDouble)      // city (store) as key, sales as value
}.reduceByKey(_ + _)                   // sum sales per store

KPI 3

  • Find total sales done.
val result = data.map { line =>
  val tokens = line.split("\t")
  ("Total sales done", tokens(4).toDouble)  // single key, so all sales collapse into one total
}.reduceByKey(_ + _)                        // all values for that key are added
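
Since the job is submitted with an input_path and an output_path (see below), each KPI presumably ends by persisting its result. A sketch, assuming the output path arrives as the second command-line argument:

```scala
// Hypothetical: args(0) = input path, args(1) = output path (assumed argument order).
// Writes one part file per partition under the output directory.
result.saveAsTextFile(args(1))
sc.stop()
```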

Running the job

bin/spark-submit --class  fullClassName  jarFilePath  input_path   output_path

bin/spark-submit --class KPI_1.productWiseSalesDistrubution  ../Desktop/retail.jar   ../Desktop/Retail_Sample_Data_Set   /output1

Output

(Screenshots of the results for KPI_1, KPI_2, and KPI_3.)
