which gives the probability of the item still being on the shelf after 30 days or, put differently, the probability that the item has not been sold within 30 days. There are several ways to represent the distribution of T; the most familiar is probably the probability density function.
The simplest parametric model for survival data is the exponential distribution, whose probability density function has a single rate parameter λ and takes the following form:
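For reference, the standard exponential density and the survival function it implies can be written as follows, with T being the time on the shelf as above and t measured in days (the choice of days as the unit is my assumption):

```latex
f(t) = \lambda e^{-\lambda t}, \qquad t \ge 0,\ \lambda > 0

S(t) = P(T > t) = \int_t^{\infty} \lambda e^{-\lambda u}\, du = e^{-\lambda t}
```

Under this model the probability of an item still being on the shelf after 30 days would be S(30) = e^{-30λ}.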
I was really excited to try out a methodology I learned in grad school called Kaplan-Meier estimation. It involves computing, at each point in time where an event is observed, the conditional probability of surviving past that point. We then multiply each of these probabilities by the ones computed for earlier time points to get the final estimate.
The total probability of a product still being on the shelf after 30 days is calculated by multiplying together the probabilities of the product still being on the shelf at every time interval up to 30 days (this is the multiplication rule of probability, applied to get a cumulative probability). For example, the probability of a product still being on the shelf after the second day is the probability of it still being there after the first day multiplied by the probability of it being there after the second day given that it was there after the first day. This second probability is therefore a conditional probability. Although the probability calculated for any single interval is not very accurate because of the small number of events in it, the overall probability of lasting to each point is more accurate.
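To make the multiplication concrete, here is a minimal from-scratch sketch of the product-limit calculation in Python. It is not the code behind the curve in this post. The function names and the toy data are hypothetical, and it assumes that items still unsold when observation ended are treated as censored rather than sold.

```python
import numpy as np

def kaplan_meier(durations, sold):
    """Product-limit estimate of S(t) at each time where a sale was observed.

    durations: days each item was observed on the shelf
    sold:      True if the item was sold at that time, False if it was
               still unsold when observation ended (censored; an assumption)
    """
    durations = np.asarray(durations, dtype=float)
    sold = np.asarray(sold, dtype=bool)
    event_times = np.unique(durations[sold])  # sorted, distinct sale times
    survival = []
    s = 1.0
    for t in event_times:
        at_risk = np.sum(durations >= t)           # still on the shelf just before t
        events = np.sum((durations == t) & sold)   # sold exactly at t
        s *= 1.0 - events / at_risk                # conditional survival past t
        survival.append(s)
    return event_times, np.array(survival)

def survival_at(day, event_times, survival):
    """Read S(day) off the step function returned by kaplan_meier."""
    idx = np.searchsorted(event_times, day, side="right")
    return 1.0 if idx == 0 else float(survival[idx - 1])

# Hypothetical toy data: eight items, three of them never sold while observed.
toy_durations = [5, 10, 10, 20, 30, 35, 40, 45]
toy_sold = [True, True, False, True, True, False, True, False]

toy_times, toy_survival = kaplan_meier(toy_durations, toy_sold)
print(survival_at(30, toy_times, toy_survival))
```

With this toy data the estimate at day 30 is 7/8 × 6/7 × 4/5 × 3/4 = 0.45, which is exactly the chain of conditional probabilities described above.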
As usual, I can count on the scikit-learn ecosystem to have an estimator I can use (scikit-survival, for instance, provides `kaplan_meier_estimator`). I plugged in all the data about how long the product has historically been on the shelf and got the following Kaplan-Meier curve:
This can be interpreted to mean that there is a less than 50% chance that a product is still on the shelves after 30 days or, in other words, a greater than 50% probability that the product will be sold within 30 days. My recommendation here is for the retailer to define what probability threshold it would like to see before offering a discount. Maybe it's fine with a better-than-even chance, or maybe it would like to get to 80%. With the right amount of historical data we could do a comparative analysis at different price points and see how much of a discount should be offered to reach that threshold. Perhaps we can come back and do that later.
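As a rough sketch of how such a threshold could be checked against a Kaplan-Meier curve, the snippet below finds the earliest day by which the estimated probability of a sale reaches a chosen level. The function name and the curve values are hypothetical placeholders, not the retailer's data.

```python
import numpy as np

def first_day_reaching(event_times, survival, threshold):
    """Earliest event time with P(sold by t) = 1 - S(t) >= threshold, else None."""
    event_times = np.asarray(event_times, dtype=float)
    sold_prob = 1.0 - np.asarray(survival, dtype=float)
    hits = np.nonzero(sold_prob >= threshold)[0]
    return None if hits.size == 0 else float(event_times[hits[0]])

# Hypothetical step-function values of S(t) at each observed sale time (days).
curve_days = [5, 10, 20, 30, 40]
curve_survival = [0.875, 0.75, 0.6, 0.45, 0.225]

day_for_50 = first_day_reaching(curve_days, curve_survival, 0.50)
day_for_80 = first_day_reaching(curve_days, curve_survival, 0.80)
print(day_for_50, day_for_80)
```

On these placeholder values the 50% threshold is first met at day 30, while 80% is never reached within the observed window, which is the kind of gap a discount would be meant to close.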
This was a particularly fun analysis because it allowed me to experiment with cumulative probability over a defined time period, which is a concept I think can be applied to a lot of commercial challenges beyond healthcare.