Problem Statement
Up to this point, we assume you are clear about what Market Basket Analysis, the Apriori algorithm, and the statistical concepts related to association rules are.
Process
Step 1: Download the famous grocery dataset, available here: Click Here
Step 2: Perform the following exploratory data analysis on the dataset:
- Read the CSV file into a DataFrame.
- Get the shape of the DataFrame.
- Find the top 20 "sold items" that occur in the dataset.
- Find what share of total sales they account for.
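The EDA steps above can be sketched in pandas as follows. This is a minimal sketch, not the original solution: it assumes the dataset is already one-hot encoded (one row per transaction, one 0/1 column per item), uses a tiny made-up CSV in place of the real file so that it runs on its own, and introduces a hypothetical helper, top_items_report.

```python
import io

import pandas as pd

# Hypothetical toy data standing in for the grocery file. Assumed layout:
# one row per transaction, one 0/1 column per item.
CSV_TEXT = """bread,milk,eggs,butter,yogurt
1,1,0,0,0
1,1,1,0,0
0,1,0,1,0
1,0,0,0,1
1,1,0,1,0
"""


def top_items_report(df, top_n=20):
    """Return the top_n items by sales count and their share of total sales."""
    item_counts = df.sum().sort_values(ascending=False)  # times each item was sold
    top_items = item_counts.head(top_n)
    share = top_items.sum() / item_counts.sum()  # fraction of all item sales
    return top_items, share


# With the real file you would pass its path instead,
# e.g. pd.read_csv("groceries.csv") (hypothetical filename).
grocery_df = pd.read_csv(io.StringIO(CSV_TEXT))
print(grocery_df.shape)  # (number of transactions, number of items)

top_items, share = top_items_report(grocery_df, top_n=20)
print(top_items)
print(f"Top items account for {share:.1%} of total sales")
```

On the real dataset, the share printed for the top 20 items shows how concentrated sales are in a handful of products, which is what the pruning step below relies on.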
Step 3: Create a function prune_dataset that reduces the size of our dataset based on our requirements. The function should prune the data based on the percentage of total sales. For example, the function call would look like this:
output_df, item_counts = prune_dataset(input_df=grocery_df, length_trans=2, total_sales_perc=0.4)
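The body of prune_dataset is left to the reader, so here is one possible interpretation, assuming the same one-hot layout as before: rank items by sales count, keep the best-selling items until they cover total_sales_perc of all sales, then drop transactions that contain fewer than length_trans of those items. The exact cut-off rule (the item that crosses the threshold is kept) and the toy data are assumptions, not the author's code.

```python
import pandas as pd


def prune_dataset(input_df, length_trans=2, total_sales_perc=0.4):
    """Sketch of prune_dataset for a one-hot DataFrame (assumed semantics).

    1. Rank items by how often they were sold.
    2. Keep the most-sold items until they cover total_sales_perc of all sales.
    3. Keep only transactions containing at least length_trans of those items.
    """
    item_counts = input_df.sum().sort_values(ascending=False)
    cum_share = item_counts.cumsum() / item_counts.sum()
    # Share already covered *before* each item; keep items until the target is reached.
    prev_share = cum_share.shift(fill_value=0.0)
    selected = item_counts[prev_share < total_sales_perc]

    pruned = input_df[selected.index]  # keep only the selected item columns
    output_df = pruned[pruned.sum(axis=1) >= length_trans]  # drop short transactions
    return output_df, selected


# Hypothetical toy data (one row per transaction, one 0/1 column per item).
grocery_df = pd.DataFrame({
    "bread":  [1, 1, 0, 1, 1],
    "milk":   [1, 1, 1, 0, 1],
    "eggs":   [0, 1, 0, 0, 0],
    "butter": [0, 0, 1, 0, 1],
    "yogurt": [0, 0, 0, 1, 0],
})

output_df, item_counts = prune_dataset(input_df=grocery_df,
                                       length_trans=2,
                                       total_sales_perc=0.4)
print(output_df.shape)
print(item_counts)
```

Returning the selected item counts alongside the pruned DataFrame makes it easy to check which items survived the cut before mining rules.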
Step 4: We need to specify two pieces of information for generating our rules: support and confidence. We have already defined both of them conceptually earlier on this page, so we will not define them again. An important tip is to start with a higher support, because a lower support means a higher number of frequent itemsets and hence a longer execution time.
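To make the effect of the two thresholds concrete, here is a simplified sketch in plain pandas. It only considers single items and item pairs, using the Apriori idea that a pair can be frequent only if both of its items are frequent; pair_rules is a hypothetical helper and the data is made up, and a full Apriori implementation would extend this to larger itemsets. Running it at two support levels shows how a higher minimum support leaves fewer frequent itemsets to process.

```python
from itertools import combinations

import pandas as pd


def pair_rules(df, min_support=0.3, min_confidence=0.6):
    """Pair-only simplification of Apriori on a one-hot DataFrame (hypothetical helper)."""
    # Support of each item = fraction of transactions containing it (0/1 columns).
    item_support = df.mean()
    frequent = [item for item, s in item_support.items() if s >= min_support]

    # Candidate pairs are built only from frequent single items (Apriori pruning).
    frequent_pairs = {}
    for a, b in combinations(frequent, 2):
        s = (df[a].astype(bool) & df[b].astype(bool)).mean()
        if s >= min_support:
            frequent_pairs[(a, b)] = s

    # Confidence(lhs -> rhs) = support(lhs and rhs) / support(lhs).
    rules = []
    for (a, b), s in frequent_pairs.items():
        for lhs, rhs in ((a, b), (b, a)):
            conf = s / item_support[lhs]
            if conf >= min_confidence:
                rules.append((lhs, rhs, s, conf))
    return frequent, frequent_pairs, rules


grocery_df = pd.DataFrame({
    "bread":  [1, 1, 0, 1, 1],
    "milk":   [1, 1, 1, 0, 1],
    "eggs":   [0, 1, 0, 0, 0],
    "butter": [0, 0, 1, 0, 1],
    "yogurt": [0, 0, 0, 1, 0],
})

for min_sup in (0.3, 0.5):
    frequent, pairs, rules = pair_rules(grocery_df, min_support=min_sup, min_confidence=0.6)
    print(f"min_support={min_sup}: {len(frequent)} items, {len(pairs)} pairs, {len(rules)} rules")
```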