The sample data is a fictionalized dataset for Domino's Pizza Nigeria: one day of sales data for its Lekki branch. You can download the raw data file to practice along here: https://dl.dropboxusercontent.com/u/28140414/Dominos%20Pizza.csv
So the business question we want to tackle is: is there a pattern in the quantities customers buy? More specifically, we want to examine the frequency distribution of the quantity purchased per sales transaction.
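A frequency distribution here is a count of how many transactions had each quantity, which is exactly what the histograms in this post visualize. A minimal pandas sketch of that table. The input list is made-up numbers for illustration only, not the Lekki data; with the real file you would pass the Quantity column instead.

```python
import pandas as pd


def quantity_frequency(quantities):
    """Count how many transactions had each purchased quantity."""
    # value_counts() tallies each distinct quantity; sort_index() then
    # orders the table by quantity (1, 2, 3, ...) instead of by count.
    return pd.Series(quantities).value_counts().sort_index()


# Made-up quantities for illustration only, not the Lekki branch data.
freq = quantity_frequency([1, 2, 1, 3, 1, 2])
print(freq)
```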
In Excel, this is very straightforward: just plot a histogram of the Quantity field.
Now let's do the same in R.
I use R 3.3.2 and RStudio.
First, I import the CSV file into RStudio as a data frame named Dominos_Pizza.
Though it isn't necessary for what we want to do, I like to run the summary command on any data I bring into R:
> summary(Dominos_Pizza)
Again, this step isn't required: I check the default plot of the Quantity field, which draws each transaction's quantity as a point against its row index.
> plot(Dominos_Pizza$Quantity)
Finally, I draw the histogram of the Quantity field:
> hist(Dominos_Pizza$Quantity)
For now, I don't bother customizing the graph elements (labels, color, title, etc.).
It is Python time.
I use Rodeo IDE and Anaconda.
I import pandas and use it to read in the CSV file.
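A minimal sketch of this step, assuming the file from the link above was saved as "Dominos Pizza.csv" in the working directory (that path is an assumption; point it at wherever you saved the file). It also prints describe(), a rough pandas counterpart to the summary() call from the R section.

```python
import pandas as pd


def load_sales(path="Dominos Pizza.csv"):
    """Read the one-day sales CSV into a DataFrame and print a summary.

    The default path is an assumption: the downloaded file saved under
    its original name in the working directory.
    """
    df = pd.read_csv(path)
    # describe() reports count, mean, std, min, quartiles and max for the
    # numeric columns, roughly what R's summary() shows.
    print(df.describe())
    return df


# Usage (mirrors the R data frame name):
# Dominos_Pizza = load_sales()
```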
Here is the plot of the Quantity field, like the one we did in R.
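One way to reproduce that plot, assuming the DataFrame has the Quantity column used in the R section: draw each row's quantity as a point against its row position, as R's plot() does for a single numeric vector. The Agg backend line is only there so the sketch runs without a display; drop it in Rodeo or another interactive IDE.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop in an interactive IDE
import matplotlib.pyplot as plt
import pandas as pd


def plot_quantity(df: pd.DataFrame):
    """Plot each transaction's Quantity against its row position."""
    fig, ax = plt.subplots()
    # style="o" draws markers without connecting lines, like R's default
    # point plot for a single numeric vector.
    df["Quantity"].plot(ax=ax, style="o")
    ax.set_xlabel("Index")
    ax.set_ylabel("Quantity")
    return ax


# Usage in an interactive session: ax = plot_quantity(Dominos_Pizza); plt.show()
```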
Finally, I create the histogram.
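A sketch of the histogram step with pandas and matplotlib, assuming the same DataFrame. Unlike the R version, it also sets axis labels and a title; the label text and the bins=10 default (pandas' own default) are illustrative choices, not necessarily what was used for the original chart.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop in an interactive IDE
import matplotlib.pyplot as plt
import pandas as pd


def plot_quantity_hist(df: pd.DataFrame, bins=10):
    """Draw a labelled histogram of the Quantity column."""
    fig, ax = plt.subplots()
    # Each bar counts the transactions whose Quantity falls in that bin.
    df["Quantity"].plot.hist(ax=ax, bins=bins, edgecolor="black")
    ax.set_xlabel("Quantity per transaction")
    ax.set_ylabel("Number of transactions")
    ax.set_title("Quantities purchased per transaction")
    return ax
```

R's hist() chooses its breaks with Sturges' rule and then rounds them to "pretty" values, so the bars may not line up exactly with the R chart; try a few bins values to compare.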
I will try to follow up with tutorials on more complex tasks, including some best suited to R and others best suited to Python. As for Excel, it is in a completely different class: it is a spreadsheet application.
Got any particular task you would like me to create a tutorial around? Ask away!
Hello Michael, thank you for this easy-to-follow tutorial. I am a beginner in R and was able to replicate your steps. I was also at the Bootcamp but your friend F. Okoye sent me this link. Thanks again.
Cool. Glad to hear you were also at the Bootcamp. I've got to thank Francis for the kindness.
Thanks for trying out the tutorial steps. I hope to publish more in the future.
Amazing! I went a little further and used R to sort and sum the total orders for each pizza flavor, to check which was the best seller that day. I got Pepperoni Suya. That took me quite some time as a beginner oh, but it was exciting to get it right. I would appreciate a lecture, or a link to one, on cleaning data before analyzing it. Thank you.
That's impressive!
I will try to create other tutorials around data cleaning.
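The reader's approach in the comment above (summing the quantity sold per pizza flavor, then sorting) translates to a short pandas groupby. This is a sketch, not the commenter's actual R code, and flavor_col="Pizza" is a hypothetical column name; check the CSV header for the real one.

```python
import pandas as pd


def best_seller(df: pd.DataFrame, flavor_col="Pizza", qty_col="Quantity"):
    """Return (flavor, total quantity) for the flavor with the most units sold.

    flavor_col="Pizza" is a hypothetical column name; use the CSV's real one.
    """
    # Sum the quantity per flavor, then sort the totals from largest down.
    totals = df.groupby(flavor_col)[qty_col].sum().sort_values(ascending=False)
    return totals.index[0], totals.iloc[0]
```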
Hi Michael
Useful post. Any more information about the "amazing community of Data Scientists in Nigeria" you mentioned you joined?
Hi Layibiyi,
Well, we now have a vibrant WhatsApp community and are gearing up for the next bootcamp.
I have met other amazing data analysts in the community.
Cheers.