My Graphic Studio

The Future of Statistics... Back to the Basics


August 11, 2014

Last fall, Simply Statistics held an interesting panel discussion on the future of statistics and data analytics. Overall, I thought the discussion had a lot of great insights and gave an idea of where analytics is headed in the near future. This post explores some key takeaways and their implications.

1. We’re gaining the ability to do large-scale experiments (think 6 degrees of separation at a massive scale). This will allow us to better understand the causes and effects of different actions.

2. Data analysis is about telling stories. It involves seeing a problem, using analysis to understand its moving parts, and expressing the findings through visuals or models that are easy to understand.

3. There is a growing synthesis between statistics and computer science. The ability to prep and build models to solve real-world problems is becoming a valued skill set.

4. More and more people are expected to become comfortable with using data analysis tools, or at least to be critical consumers of data (i.e., able to interpret results and gauge the quality of an analysis).

5. More data is becoming available to the public every day. This will improve our ability to discover and share insights, review data quality, and synthesize different data sources to understand the bigger picture.

“Currently, large scale analytics is great at predictions but poor at inference.”

In many ways, this stresses the importance of fundamentals. For instance, if you shop at Amazon, you’ll notice that it recommends other items to purchase. This recommendation is based on your search and purchasing history as well as those of people who purchased that item. This is basically a traditional customer profile but on steroids.

Making recommendations is essentially a “prediction” problem. While recommendations are great, they don’t tell us “how likely” you are to actually purchase those items (which is really what businesses want to know). That question is an “inference” problem and has more to do with the level of confidence we have in our tools and models.

From my personal experience, inferences are a fundamental part of good analysis. Looking at relationships and associations between different variables is where analytics offers true insight. The fact that large-scale analytics has not yet matured in its ability to make inferences tells me this is where focus is really needed.

This development is particularly interesting for the future of retail analytics. Using indicators like p-values, confidence intervals, and coefficients can help us better understand the connections between different products.

For instance, let’s say we have excess inventory of red shirts that we need to sell. A predictive model will show all the things people purchased with red shirts. Inferences will show which of those relationships are significant and how strong each relationship is.
In other words, we might find that people who bought blue jeans and Nikes were the most likely to buy red shirts. We can then target that demographic with discounts and increase overall sales.
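
To make the red shirt example a bit more concrete, here is a minimal sketch of one way to check whether a co-purchase relationship is significant and how strong it is. It uses an odds ratio with a Wald confidence interval and p-value, which is just one reasonable choice among many. The function name and all the purchase counts below are made up for illustration; they do not come from real data or from any particular tool.

```python
import math


def odds_ratio_test(both, item_only, shirt_only, neither, z_crit=1.96):
    """Wald test on the log odds ratio of a 2x2 co-purchase table.

    both       -- customers who bought the item AND a red shirt
    item_only  -- customers who bought the item but no red shirt
    shirt_only -- customers who bought a red shirt but not the item
    neither    -- customers who bought neither
    """
    cells = [both, item_only, shirt_only, neither]
    # Haldane-Anscombe correction: add 0.5 to every cell if any cell is zero
    if min(cells) == 0:
        cells = [c + 0.5 for c in cells]
    a, b, c, d = cells

    log_or = math.log((a * d) / (b * c))          # strength of the relationship
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log odds ratio
    z = log_or / se
    p_value = math.erfc(abs(z) / math.sqrt(2))     # two-sided normal p-value
    ci95 = (math.exp(log_or - z_crit * se), math.exp(log_or + z_crit * se))
    return {"odds_ratio": math.exp(log_or), "ci95": ci95, "p_value": p_value}


# Hypothetical counts for 5,000 customers, 320 of whom bought a red shirt.
jeans = odds_ratio_test(both=120, item_only=380, shirt_only=200, neither=4300)
socks = odds_ratio_test(both=30, item_only=470, shirt_only=290, neither=4210)

for name, result in [("blue jeans", jeans), ("socks", socks)]:
    low, high = result["ci95"]
    print(f"{name}: odds ratio {result['odds_ratio']:.2f}, "
          f"95% CI ({low:.2f}, {high:.2f}), p-value {result['p_value']:.3g}")
```

With these invented counts, both products show up alongside red shirts, so a simple "bought together" list would include both. Only the blue jeans relationship has a confidence interval that stays above 1 and a small p-value, which is the kind of evidence that would justify targeting that group with discounts.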

At the end of the day, it’s all about looking at data from different perspectives, finding interesting insights and trends, and translating those insights into reliable actions that enhance our relationships with customers, keep inventory moving, and improve the flow of revenue.