Happy New Year and welcome to 2019, the final year of Python 2 support! In honor of the new year, here’s a short post on how to convert pickles from your legacy Python 2 codebase to Python 3.
The first issue you’re likely to encounter is an encoding issue:
import pickle with open("old_pickle.pkl", 'rb') as f: loaded = pickle.load(f)
Traceback (most recent call last): File "<stdin>", line 2, in <module> UnicodeDecodeError: 'ascii' codec can't decode byte 0x95 in position 0: ordinal not in range(128)
This is because pickle’s default is to decode all string data as ascii, which fails in this case. Instead we have to convert Python 2 bytestring data to Python 3 using either
encoding="bytes", or for pickled NumPy arrays, Scikit-Learn estimators, and instances of
time originally pickled using Python 2,
encoding="latin1". More on this here.
with open("old_pickle.pkl", 'rb') as f: loaded = pickle.load(f, encoding="latin1")
with open("old_pickle.pkl", 'rb') as f: loaded = pickle.load(f, encoding="bytes")
Python 2 Objects vs Python 3 Objects
The next problem you might encounter is a
KeyError, when Python 3 pickle fails to recognize the key
b'ObjectType' from the Python 2 pickle:
with open("old_pickle.pkl", "rb") as f: loaded = pickle.load(f, encoding="latin1")
Traceback (most recent call last): File "<stdin>", line 2, in <module> File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/dill/_dill.py", line 568, in _load_type return _reverse_typemap[name] KeyError: b'ObjectType'
This occurs because the
ObjectType type was removed in Python 3. The solution is to use the
dill library to recast the Python 2
ObjectType to a Python 3
import dill dill._dill._reverse_typemap['ObjectType'] = object
From the Command Line
Here’s a little command line application I use for quickly converting these legacy pickles over to Python 3:
Gotchas and Addenda
Be warned that you may still get library-specific warnings, for instance when we try to unpickle an old version of a Scikit-Learn estimator using a newer version of the library:
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/sklearn/base.py:251: UserWarning: Trying to unpickle estimator TfidfTransformer from version 0.20.2 when using version 0.20.1. This might lead to breaking code or invalid results. Use at your own risk. UserWarning)
Additionally, if you’re dealing with a case where you’re extracting pickles from a pickled Spyder session, you’ll have to use
dill to load the session first, then load the individual pickles, then save those pickles individually, so that they can be used outside of a Spyder console:
import dill import pickle ## load the session, which has everything locked in dill.load_session("session_pickle.pkl") ## Load the actual pickles [classifier, count_vectorizer, tfidf_transformer] = pickle.loads(model_components) ## Save the pickles individually with open("classifier_p2.pkl", "wb") as x: pickle.dump(classifier, x) with open("count_vectorizer_p2.pkl", "wb") as y: pickle.dump(count_vectorizer, y) with open("tfidf_transformer_p2.pkl", "wb") as z: pickle.dump(tfidf_transformer, z)
Once you’ve done this, you can go ahead and convert the individual Python 2 pickles over to Python 3.