Visual Diagnostics for More Informed Machine Learning
Rebecca Bilbro
@rebeccabilbro
Bytecubed / District Data Labs
Welcome and thank you for joining me for my talk on visual diagnostics for more informed machine learning
Nice to Meet You
First of all, it's nice to meet you. My name is Rebecca. My day job is at Bytecubed, where I lead a team of data scientists who are using classification, clustering, natural language processing and graph techniques to change the way business analytics are conducted. On nights and weekends, I work with District Data Labs, an open source collaborative.
How I got started
As for the first question you get asked in the Machine Learning community... yes, I have a PhD. No, it's not in machine learning. How many self-taught machine learning practitioners are out there? Well, I'm sure you can relate. My path to ML was circuitous: I studied math, I did research in technical communication, and I dabbled in a lot of things. But when I found Python and machine learning, it was basically love at first sight. Because...
Machine learning is easy
Python makes Machine Learning so easy.
Six Simple Steps
Get data
Prep data
Pick model
Fit model
Validate model
Deploy
All you have to do is (1) get some data, (2) prep your data, (3) pick a model, (4) instantiate and fit that model, (5) validate it, and (6) deploy it! You can do all of that in just a few dozen lines of code. Want to see how I did it?
Get Data
Pick Model
Pick a model. Luckily for me, it turned out the internet was just FULL of people who know exactly which model is the best.
Fit Model
from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
model.fit(X, y)
model.predict(X)
Once I picked a model, I'd fit it using something like this. Scikit-Learn is incredible and the API made this ridiculously easy.
Validate Model
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score
print(mean_squared_error(expected, predicted))
print(r2_score(expected, predicted))
from sklearn.metrics import classification_report
print(classification_report(expected, predicted))
Because I'm a nice person, next I'd validate the model using the coefficient of determination for my regressors, or the F1 score for my classifiers. I'd proceed to feel superior if I got anything over 0.8, and otherwise...
GridSearchCV
...I'd use Gridsearch to help me get my scores up.
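To make that concrete, here's roughly the kind of thing I'd write. The estimator and parameter grid are just placeholders, and X and y are whatever came out of the earlier data prep step. (In newer Scikit-Learn releases GridSearchCV lives in sklearn.model_selection; older ones used sklearn.grid_search.)

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Candidate values to sweep; picking these ranges is the hard part.
param_grid = {
    "C": [0.1, 1, 10, 100],
    "gamma": [0.001, 0.01, 0.1, 1],
}

search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)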
Pipelines
Then I'd use pipelines to take all the hacky research code and put it into something good enough for deployment.
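Something like this sketch, where the particular steps are illustrative and X_train, y_train, X_test, y_test come from an ordinary train/test split:

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Chain the wrangling and the estimator so they deploy as one object.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression()),
])

pipeline.fit(X_train, y_train)
print(pipeline.score(X_test, y_test))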
Done!
That's it! Except... at night when I'd lie in bed, I couldn't help but think...
...that maybe I had no idea what I was actually doing. I'm pretty sure I'm not the only one. Python and high-level libraries like Scikit-Learn have made Machine Learning accessible in a way that it never was before. But informed machine learning is still really hard.
Informed machine learning is hard
As the tools have become more accessible, the ML population has swelled. And it seems like the appetite for machine learning-based applications has never been bigger. Predictive methods are going to increasingly inform how we do all kinds of things, from how we shop to how we live, from how we fight to how we fall in love. Before, you used to have to go to school to do ML. My mentor has studied machine learning for a decade, and so has my boss. But the future of machine learning practitioners looks a lot more like me.
Anscombe's Quartet
Recognize this? It's Anscombe's quartet: four datasets with nearly identical statistical properties that are nonetheless strikingly different when plotted. The takeaway is that of all the analytical tools at our disposal, sometimes our eyes are the most important.
Machine Learning in color?
In other words, how do we turn on the technicolor for machine learning? Many of the tools have already been implemented in Python, but they're kind of scattered, so what I've done is cobble them together into what I like to think of as the "yellow brick road."
Follow the Yellow Brick Road
Follow the yellow brick road
The Model Selection Triple
My workflow is based on the model selection triple: feature analysis, model selection, and hyperparameter tuning.
When it comes to ML, the most important picture to have is the big picture. Our conversations about models sometimes give us tunnel vision. Whether it's random forests, SVMs, naive Bayes, or neural nets, everyone has a favorite! Picking a good model is important, but it's not enough. I propose a broader view. This is a diagram of the workflow I use to do machine learning, and it's the one I use now to teach beginners. As you'll see, visualizations play a critical role in every stage.
Visual Feature Analysis
Once I have a new dataset, I begin with feature analysis. This involves descriptive statistics, but also things like...
Boxplots
Boxplots, so that I can look at the central tendency of the data, see the distribution, and examine outliers.
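With Pandas this is a one-liner, assuming the dataset is already loaded into a DataFrame called df:

import matplotlib.pyplot as plt

# One box per numeric feature: median, quartiles, and outliers at a glance.
df.plot(kind="box")
plt.show()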
Histograms
Histograms, to bin the values of individual features and expose frequencies.
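Again with Pandas, assuming the same DataFrame df:

import matplotlib.pyplot as plt

# One histogram per numeric feature; the bin count is a judgment call.
df.hist(bins=20)
plt.tight_layout()
plt.show()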
Sploms
Scatterplot matrices, to check for pairwise relationships between features. I use these to look for covariance, for relationships that appear to be linear, quadratic, or exponential. I watch for homoscedastic or heteroscedastic behavior to understand how the features are dispersed relative to each other.
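A quick way to get a SPLOM, again assuming the features live in a DataFrame df (Seaborn's pairplot is another option):

import matplotlib.pyplot as plt
from pandas.plotting import scatter_matrix  # pandas.tools.plotting in older versions

# Pairwise scatterplots off the diagonal, kernel density estimates on it.
scatter_matrix(df, alpha=0.5, diagonal="kde")
plt.show()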
Jointplots
And often jointplots, when I need to zoom in on a single pair of features.
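In Seaborn that zoom-in looks something like this; the column names are placeholders for whichever pair caught my eye in the SPLOM:

import seaborn as sns
import matplotlib.pyplot as plt

# Scatter (or hexbin) of the pair plus marginal distributions on the axes.
sns.jointplot(x="feature_a", y="feature_b", data=df, kind="hex")
plt.show()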
Radviz
I use radial visualizations or radviz, to examine the relative pull or predictiveness of certain features within a unit circle. I can also look for class separability.
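Pandas has a radviz helper built in; here "target" is a stand-in for whatever the class label column in df is called:

import matplotlib.pyplot as plt
from pandas.plotting import radviz

# Each feature is an anchor on the unit circle; points are pulled toward
# the anchors in proportion to their normalized feature values.
radviz(df, "target")
plt.show()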
Parallel Coordinates
I also find parallel coordinates useful - here my datapoints are plotted as individual line segments and I look for thick chords or braids of lines of the same color that indicate good class separability. My analysis of the features often leads back to the data, where I take another pass through to normalize, scale, extract, or otherwise wrangle the attributes.
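This one is also built into Pandas; as before, "target" stands in for the class label column:

import matplotlib.pyplot as plt
from pandas.plotting import parallel_coordinates

# Each instance becomes a line across vertical axes, one axis per feature,
# colored by class.
parallel_coordinates(df, "target")
plt.show()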
Visual Model Selection
After more feature analysis has confirmed I'm on the right track, I identify the category of machine learning models best suited to my features and problem space, often experimenting with fit-predict on multiple models.
Choosing the Right Estimator
Many of us begin our journey through Machine Learning with Python using the Scikit-Learn "Choosing the Right Estimator" flowchart.
Cluster Comparison
There's also the cluster comparison plot, which you can use to compare different clustering algorithms across different datasets...
Classifier Comparison
...and the classifier comparison plot, which is a helpful visual comparison of the performance of nine different classifiers across three different toy datasets.
Models: Families, Forms, Fitted
Lately I've been experimenting with some different ways of visualizing model families. I think it would be useful to be able to treat model selection as a kind of graph theory traversal problem.
Classification Heatmap
Model evaluation tools, like this classification heatmap, can also feed into my model selection process...
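Here's a rough, hand-rolled version of the idea, not Yellowbrick's implementation: per-class precision, recall, and F1 rendered as a Seaborn heatmap, using the expected and predicted labels from the validation step:

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.metrics import precision_recall_fscore_support

# One row per metric, one column per class.
precision, recall, f1, _ = precision_recall_fscore_support(expected, predicted)
scores = np.array([precision, recall, f1])

sns.heatmap(scores, annot=True,
            xticklabels=sorted(set(expected)),
            yticklabels=["precision", "recall", "f1"])
plt.show()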
ROC-AUC Plots
I particularly like using small multiples as a method for comparing the relative appropriateness of different algorithms for a given dataset...
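A sketch of what I mean by small multiples: one ROC panel per classifier, all on the same data. The models here are arbitrary, the problem is assumed to be binary, and the train/test variables come from an ordinary split:

import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB

models = [LogisticRegression(), RandomForestClassifier(), GaussianNB()]
fig, axes = plt.subplots(1, len(models), figsize=(12, 4))

for ax, model in zip(axes, models):
    model.fit(X_train, y_train)
    scores = model.predict_proba(X_test)[:, 1]   # probability of the positive class
    fpr, tpr, _ = roc_curve(y_test, scores)
    ax.plot(fpr, tpr)
    ax.plot([0, 1], [0, 1], linestyle="--")      # chance line
    ax.set_title("{} AUC={:.2f}".format(model.__class__.__name__, auc(fpr, tpr)))

plt.show()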
Prediction Error Plots
And it helps to be able to visually compare the performance of each model...
Residual Plots
...so that I can build up my intuition not only about which model performs best, but about why - for instance due to bias or heteroscedasticity.
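A minimal residual plot looks something like this; the regressor is a placeholder and the train/test variables are assumed from an earlier split. A funnel shape in the residuals hints at heteroscedasticity, a curve at bias:

import matplotlib.pyplot as plt
from sklearn.linear_model import Ridge

model = Ridge().fit(X_train, y_train)
predicted = model.predict(X_test)
residuals = y_test - predicted

# Residuals should scatter evenly around zero if the model is well behaved.
plt.scatter(predicted, residuals, alpha=0.5)
plt.axhline(0, linestyle="--")
plt.xlabel("predicted")
plt.ylabel("residual")
plt.show()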
Visual tuning
This kind of evaluation of my models flows directly into a reflection on the models I initially selected, in some cases leading me to choose different models. My model evaluations also help me approach tuning...
Blind Gridsearch
Gridsearch is an incredibly powerful tool. But the problem is that picking the initial search range for the parameters requires some understanding of what parameters are available, what those parameters mean, what impact they can have on a model, and what a reasonable search space might be. Instead of just blundering around...
Validation Curves
I use validation curves to visualize the training and validation scores of a model as I explore different values of a single hyperparameter. I look for the sweet spot with the highest value for both the training and validation scores. If both scores are low, the model is underfitting; if the training score is high but the validation score is low, it's overfitting.
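Scikit-Learn will compute the curves for you; here's a sketch with a placeholder estimator and parameter range, and X, y assumed from earlier:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import validation_curve
from sklearn.svm import SVC

# Sweep one hyperparameter and record training and cross-validation scores.
param_range = np.logspace(-3, 2, 6)
train_scores, valid_scores = validation_curve(
    SVC(), X, y, param_name="gamma", param_range=param_range, cv=5)

plt.semilogx(param_range, train_scores.mean(axis=1), label="training")
plt.semilogx(param_range, valid_scores.mean(axis=1), label="validation")
plt.xlabel("gamma")
plt.ylabel("score")
plt.legend()
plt.show()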
Visual Gridsearch
I also like to use heatmaps to visualize combinations of hyperparameter values that produce the best models. Yes, hyperparameter tuning is still hard. Like I said, some folks spend years in school studying and investigating the complexities of different model parameters. Spinning up that kind of hard-won intuition doesn't happen overnight, but for me visualizations add insight and take the process out of the black box.
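One way to roll that yourself is to reshape the grid search's mean test scores into a grid; the parameter values below are placeholders and X, y are assumed from earlier:

import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

C_range = [0.1, 1, 10, 100]
gamma_range = [0.001, 0.01, 0.1, 1]
search = GridSearchCV(SVC(), {"C": C_range, "gamma": gamma_range}, cv=5)
search.fit(X, y)

# Reshape assumes C varies slowest and gamma fastest (sorted parameter order).
scores = search.cv_results_["mean_test_score"].reshape(len(C_range), len(gamma_range))
sns.heatmap(scores, annot=True, xticklabels=gamma_range, yticklabels=C_range)
plt.xlabel("gamma")
plt.ylabel("C")
plt.show()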
How to facilitate better workflows?
Visualization can play a role throughout the workflow. By following the yellow brick road, you can get to a place where machine learning is more informed. We need to make Oz a place that anyone can get to. A lot of the tools are already implemented in Python - in Scikit-Learn, Matplotlib, Pandas, Bokeh, and Seaborn. But they're spread out across a lot of different places, which makes them harder to use together. We can fix that.
Yellowbrick
Yellowbrick is a package we've been working on to integrate visual feature analysis, model selection, evaluation, and tuning with the Scikit-Learn pipeline.
Creating a “visual transformer” to inject visualization into the model workflow.
Feature visualizations come after the transformers: call fit(), draw(), then predict().
Model evaluation visualizations come after the estimator: call fit(), predict(), score(), then draw().
https://flic.kr/p/2Yj9mj
Visual Transformers
Creating a “visual transformer” to inject visualization into the model workflow. Feature visualizations come after the transformers and call fit(), draw(), then predict(). Model evaluation visualizations come after the estimator and call fit(), predict(), score(), then draw().
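To give a flavor of the pattern (this is a toy, not Yellowbrick's actual base classes): an object with a Scikit-Learn-style fit() that also knows how to draw() itself, where X is assumed to be a 2D array of features:

import matplotlib.pyplot as plt

class ToyBoxPlotVisualizer(object):
    # Hypothetical feature visualizer: fit() stores the data, draw() renders it.
    def fit(self, X, y=None):
        self.X_ = X
        return self

    def draw(self):
        plt.boxplot(self.X_)   # one box per feature column
        plt.show()
        return self

ToyBoxPlotVisualizer().fit(X).draw()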
Search for Separability
Implemented our own versions in pure Matplotlib
Explore Pairs of Features
Rank1D and Rank2D evaluate single features or pairs of features using a variety of metrics that score the features on the scale [-1, 1] or [0, 1], allowing them to be ranked. Similar in concept to a SPLOM, the scores are visualized on a lower-left triangle heatmap so that patterns between pairs of features can be easily discerned for downstream analysis.
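In Yellowbrick that looks roughly like this; the metric is illustrative, X and y are assumed from earlier, and depending on the version the final call is poof() or show():

from yellowbrick.features import Rank2D

# Score every pair of features and render the lower-left triangle heatmap.
visualizer = Rank2D(algorithm="pearson")
visualizer.fit(X, y)
visualizer.transform(X)
visualizer.poof()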
Evaluate Your Sklearn Models
Identify Potential Problems
There's no place like
cd ~
pip install yellowbrick
There's no place like home! You can pip install yellowbrick.
Testers Wanted!
bit.ly/ybtesters
Rebecca Bilbro
@rebeccabilbro
Bytecubed / District Data Labs