Python Serialization


There are many cases where you need to store complex data in a file or share it with another program. These are the situations where serialization comes into use. In this article, we will learn about Python serialization and how to implement it using the pickle module. Then we will briefly look at serializing and deserializing with the other modules.

What is Serialization in Python?

Serialization is the process of converting data into a byte stream that can be stored or transmitted. This process is also called pickling, flattening, or marshaling. The reverse process, converting the byte stream back into an object, is called deserialization or unpickling.

For this purpose, Python provides the following three modules:
1. Pickle Module
2. JSON Module
3. Marshal Module

This article mainly covers the pickle module, which is the simplest way to store arbitrary Python objects in binary form.

Pickle Module in Python

Pickle is a module in Python for serializing and deserializing objects. It is the faster and simpler choice when we do not need a human-readable format. To use it, first import the module:

import pickle

The module defines protocols, the rules used to convert objects to and from the binary format. There are six protocol versions:

Protocol Version: Description
0: Original, human-readable protocol. It is backward-compatible with earlier versions of Python.
1: Old binary format. It is also compatible with earlier versions of Python.
2: Added in Python 2.3. It provides more efficient pickling of new-style classes.
3: Added in Python 3.0. It supports bytes objects explicitly and cannot be unpickled by Python 2.
4: Added in Python 3.4. It supports very large objects and more kinds of objects, and includes data format optimizations.
5: Added in Python 3.8. It supports out-of-band data and speeds up in-band data.
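To see how the protocol choice affects the result, the following minimal sketch pickles one object with every protocol the running interpreter supports and prints the size of each result. The sample dictionary is made up for illustration, and the sizes depend on the Python version.

```python
import pickle

# Sample data, invented for this illustration.
data = {"name": "PythonGeeks", "values": list(range(10))}

# Pickle the same object with every protocol this interpreter supports.
for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
    payload = pickle.dumps(data, protocol=protocol)
    # loads() detects the protocol automatically, so no hint is needed.
    assert pickle.loads(payload) == data
    print(f"protocol {protocol}: {len(payload)} bytes")
```

Whatever protocol is used for writing, the reading side stays the same.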

Python Pickle Interfaces

The pickle module contains the following constants:

1. pickle.HIGHEST_PROTOCOL: This is an integer that represents the highest protocol version available. It can be passed as the protocol value to the functions used for pickling. The unpickling functions do not need it because they detect the protocol automatically.

2. pickle.DEFAULT_PROTOCOL: This is an integer that represents the default protocol used for pickling. This value may be less than the value of the highest protocol.

We can find these values as shown in the example below.
Example of finding the constants of the pickle module:

import pickle
print("Highest protocol:",pickle.HIGHEST_PROTOCOL)
print("Default protocol:",pickle.DEFAULT_PROTOCOL)

Output:

Highest protocol: 5
Default protocol: 4

This module provides the following four functions:

1. dump(): This function serializes an object and writes it to an open file object.
2. dumps(): This function serializes an object and returns the result as a bytes object.
3. load(): This function deserializes from an open file-like object.
4. loads(): This function deserializes from a bytes object.

We will discuss each of these methods in further sections.

Python Dump Functions

The dump function writes a pickled version of the object into a file. The syntax of this function is:

pickle.dump(obj, file, protocol = None, *, fix_imports = True) 

1. The obj is the object to be serialized

2. The file is an open file object (not a file name), opened in binary write mode, in which the converted result is to be stored

3. The optional argument protocol defines the protocol to be used while processing the object. We can give any version from 0 to the HIGHEST_PROTOCOL. If not mentioned, the default protocol is used.

4. If the fix_imports is True and protocol is less than 3, then the mapping is done from the new Python 3 names to the old module names used in Python 2. This allows the pickle data stream to be readable with Python 2 version.

Example of Python dump():

import pickle

content='PythonGeeks'
# open the file in binary write mode; the with block closes it automatically
with open('file.txt','wb') as f:
    pickle.dump(content,f) #dump the value of 'content' into the file

Output:
The file contains binary data that is not meant to be read by people. Opened in a text editor, it looks something like this:

€• Œ
PythonGeeks”.
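The optional protocol argument described above can be passed explicitly. The sketch below writes with the highest available protocol and reads the data back. The file name example.pkl and the sample dictionary are hypothetical, and a temporary directory is used so that nothing is left on disk.

```python
import os
import pickle
import tempfile

# Sample data, invented for this illustration.
content = {"site": "PythonGeeks", "topics": ["pickle", "json", "marshal"]}

# 'example.pkl' is a hypothetical file name inside a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.pkl")

    # dump() expects a file object opened for binary writing.
    with open(path, "wb") as f:
        pickle.dump(content, f, protocol=pickle.HIGHEST_PROTOCOL)

    # The protocol is detected automatically when reading.
    with open(path, "rb") as f:
        restored = pickle.load(f)

print(restored == content)
```

A pickle written with a newer protocol may not be readable by older Python versions, so the highest protocol is best kept for data that stays within one environment.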

Python Dumps Function

This function returns the pickled representation of the given data in bytes object form. The syntax of the function is :

pickle.dumps(obj, protocol = None, *, fix_imports = True)

This syntax is similar to the dump() function, except we do not have a file argument here.

Example of dumps() in Python:

import pickle

content=[ { 'a':1, 'b':2, 'c':3.0 } ] 
pickle.dumps(content) #dumping the content in the variable 'content' 

Output:

b'\x80\x04\x95!\x00\x00\x00\x00\x00\x00\x00]\x94}\x94(\x8c\x01a\x94K\x01\x8c\x01b\x94K\x02\x8c\x01c\x94G@\x08\x00\x00\x00\x00\x00\x00ua.'
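The bytes returned by dumps() are a sequence of instructions (opcodes) for the unpickler. As an optional aside, the standard pickletools module can print them in readable form. This sketch only inspects the stream; the exact listing depends on the protocol in use.

```python
import io
import pickle
import pickletools

content = [{'a': 1, 'b': 2, 'c': 3.0}]
pickled = pickle.dumps(content)

print(type(pickled))  # dumps() returns a bytes object

# pickletools.dis() writes a listing of the opcodes in the stream.
listing = io.StringIO()
pickletools.dis(pickled, out=listing)
print(listing.getvalue())
```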

Python Load Function

This function reads pickled data from a file and reconstructs the original object. The syntax of this function is:

pickle.load(file, *, fix_imports = True, encoding = "ASCII", errors = "strict") 

This function takes the file from which the data has to be read. Other arguments are optional and have a default value. Let us see an example of reading the data we stored in the dump() function example.

Example of load() in Python:

import pickle

# open the file in binary mode to read the pickled data
with open('file.txt','rb') as f:
    print(pickle.load(f))

Output:

PythonGeeks

From the output, we can see that we got the original content we gave to the file.

Python Loads Function

This function reads the pickled representation from a bytes object and returns the reconstructed object. The syntax is:

pickle.loads(bytes_object, *, fix_imports = True, encoding = "ASCII", errors = "strict") 

This is similar to the load() function, except that here we pass a bytes object rather than a file.

Example of loads() in Python:

import pickle

content = [ { 'a':1, 'b':2, 'c':3.0 } ] 
print ('Before pickling the content is:',content)
 
pickled_content = pickle.dumps(content) #dumping the content into an object 
 
reconstructed_content  = pickle.loads(pickled_content) #loading back the content
print ('After unpickling the content is:',reconstructed_content)

Output:

Before pickling the content is: [{'a': 1, 'b': 2, 'c': 3.0}]
After unpickling the content is: [{'a': 1, 'b': 2, 'c': 3.0}]

We can see that we got the same content after reconstruction.

Exceptions in Python Pickle

There are three important exceptions in Pickle that we need to know and these are:

1. exception pickle.PickleError: This exception inherits from Exception. It is the parent class for all other exceptions raised in this module.

2. exception pickle.PicklingError: This exception inherits from the PickleError. When the Pickler comes across any unpicklable object, it raises this error.

3. exception pickle.UnpicklingError: This exception also inherits from PickleError. This is raised when any problem like data corruption or a security violation occurs while unpickling an object.

Other exceptions that may be raised include:
1. EOFError
2. TypeError
3. ValueError
4. AttributeError
5. ImportError
6. IndexError
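The sketch below shows one way to handle these exceptions when reading data that may be damaged. The helper load_or_default is a hypothetical name introduced only for this illustration, and the exact exception raised for a damaged stream can vary, which is why more than one type is caught.

```python
import pickle

# Both specific exceptions derive from the common base class.
print(issubclass(pickle.PicklingError, pickle.PickleError))
print(issubclass(pickle.UnpicklingError, pickle.PickleError))

def load_or_default(data, default=None):
    """Unpickle data, returning default if the stream cannot be read.

    Hypothetical helper, not part of the pickle module.
    """
    try:
        return pickle.loads(data)
    except (pickle.UnpicklingError, EOFError) as exc:
        print("Unpickling failed:", type(exc).__name__)
        return default

good = pickle.dumps({"a": 1})
print(load_or_default(good))

# Empty input raises EOFError.
print(load_or_default(b"", default="fallback"))

# A stream with its last byte cut off is reported as damaged.
print(load_or_default(good[:-1], default="fallback"))
```

Catching the exceptions protects only against accidental corruption. It does not make it safe to unpickle data from an untrusted source.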

Classes imported in Pickle

The pickle module provides two classes, Pickler and Unpickler. As the names suggest, they handle the serialization and deserialization operations respectively. Let us discuss them in detail in this section.

1. Pickler Object:

This object takes a binary file and writes the pickled data stream to it. This class can be instantiated with the syntax below:

Pickler(file, protocol=None, *, fix_imports=True)

a. The file must accept bytes for writing, for example a file opened in binary write mode

b. The protocol decides the protocol used while pickling. If nothing is mentioned, it uses the default protocol

c. If the fix_imports is True and protocol is less than 3, then the mapping is done from the new Python 3 names to the old module names used in Python 2. This allows the pickle data stream to be readable with Python 2 version.

This object has the following methods and attributes:

a. dump(obj): This method writes the pickled representation of the specified object 'obj' to the file object given in the constructor of the Pickler. It is similar to the module-level dump() function.

b. persistent_id(obj): This method does nothing by default. It exists so that a subclass can override it. If it returns None, then the object obj is pickled as usual.
Otherwise, the Pickler emits the returned value as a persistent id for obj. The meaning of this id is defined by Unpickler.persistent_load().

c. dispatch_table: The dispatch table is a mapping holding classes as keys and reduction functions as values.
By default, a pickler object does not have any dispatch_table. It uses the global dispatch table that the copyreg module manages.

d. fast: This attribute enables the fast mode when set to True. This mode disables the usage of the memo and therefore speeds up the pickling process by not generating superfluous PUT opcodes. It is deprecated and should no longer be used.
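As a minimal sketch of using a Pickler directly, the example below writes several objects to one stream and reads them back one at a time. An in-memory io.BytesIO buffer stands in for a file opened in binary write mode, and the sample records are made up. Because a Pickler remembers objects it has already written, clear_memo() is called after each dump() so that every pickle in the stream can be loaded on its own.

```python
import io
import pickle

buffer = io.BytesIO()  # in-memory stand-in for a file opened with 'wb'
pickler = pickle.Pickler(buffer, protocol=pickle.DEFAULT_PROTOCOL)

# Sample records, invented for this illustration.
records = [{"id": 1}, {"id": 2}, {"id": 3}]
for record in records:
    pickler.dump(record)   # each call appends one pickle to the stream
    pickler.clear_memo()   # forget seen objects so each pickle is independent

# Read the pickles back one at a time until the stream is exhausted.
buffer.seek(0)
restored = []
while True:
    try:
        restored.append(pickle.load(buffer))
    except EOFError:
        break

print(restored == records)
```

Without the clear_memo() call, a later pickle may refer back to objects stored in an earlier one, and loading each pickle separately could then fail.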

2. Unpickler Object:

This object takes a binary file and reads the pickled data stream from it. This class can be instantiated by:

Unpickler(file, *, fix_imports = True, encoding = "ASCII", errors = "strict") 

a. The file should be opened in the binary read mode

b. fix_imports, encoding, and errors control the compatibility support for pickle streams generated by Python 2. These are optional arguments and have their corresponding default values.

This class has the following methods:

a. load() – This function reads a pickled object representation from the file object and returns the reconstructed form.

b. persistent_load(pid) – This raises an UnpicklingError by default. When a subclass defines it, it should return the object identified by the persistent id pid. If the id is invalid, it should raise an UnpicklingError.

c. find_class(module, name) – This function imports the specified module if required and returns the object called name from it. Here, the module and name arguments are str objects.
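Because find_class() is called for every global the stream refers to, a subclass can override it to limit what may be loaded. The sketch below follows the approach shown in the Python documentation. The names RestrictedUnpickler, SAFE_BUILTINS, and restricted_loads are chosen for this illustration, and the allow-list is only an example.

```python
import builtins
import collections
import io
import pickle

# Example allow-list; a real application would choose its own.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Allow only a few harmless classes from the builtins module.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        # Refuse everything else.
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")

def restricted_loads(data):
    """Like pickle.loads(), but using the restricted unpickler."""
    return RestrictedUnpickler(io.BytesIO(data)).load()

print(restricted_loads(pickle.dumps([1, 2, range(15)])))

try:
    restricted_loads(pickle.dumps(collections.OrderedDict()))
except pickle.UnpicklingError as exc:
    print("Refused:", exc)
```

This narrows what a pickle can do, but it is not a complete defense. Data from untrusted sources is better exchanged in a format such as JSON.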

Types of Objects we can pickle and Unpickle in Python

The pickle module is compatible with many of the object types including:
1. None, True, and False

2. Integers, float values, and complex numbers

3. Strings, bytes, and bytearrays

4. Tuples, lists, sets, and dictionaries that contain only picklable objects

5. Functions defined at the top level of a module using the def keyword (not lambda)

6. Built-in functions accessible from the top level of a module

7. The classes that are defined at the top level of a module

8. Instances of the classes whose __dict__ or the result of calling __getstate__() is picklable
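A quick way to check whether a particular object falls into one of these categories is simply to try pickling it. The helper is_picklable below is a hypothetical name used only for this sketch.

```python
import pickle

def is_picklable(obj):
    """Return True if pickle.dumps() accepts obj, False otherwise.

    Hypothetical helper, not part of the pickle module.
    """
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, TypeError, AttributeError):
        return False
    return True

def numbers():
    yield 1

print(is_picklable({"a": (1, 2.5, 3j), "b": {None, True}}))  # built-in containers
print(is_picklable(len))          # built-in function, pickled by reference
print(is_picklable(lambda x: x))  # a lambda cannot be looked up by name
print(is_picklable(numbers()))    # generator objects are not supported
```

The first two calls report True and the last two report False.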

Python JSON Module

JSON stands for JavaScript Object Notation. Python provides a module named json for working with JSON data and JSON files. JSON is a widely used, lightweight, human-readable data-interchange format. Using this module, we can serialize and deserialize the bool, dict, int, float, list, str, and None data types. A tuple can also be serialized, but it is written as a JSON array and comes back as a list.

Example of serializing and deserializing using JSON:

import json   
 
info = {
  "id": "5",
  "name": "ABC",
  "pass": "abc#123"
}
    
# Serializing using json 
serialized_info = json.dumps(info)
print('Serialized data:',serialized_info)

#Deserializing using json
deserialized_info = json.loads(serialized_info) #get the information in the form of a dictionary
print('Deserialized data:',deserialized_info)
   
print(deserialized_info['id']) #printing one element by indexing the dictionary with its key

Output:

Serialized data: {"id": "5", "name": "ABC", "pass": "abc#123"}
Deserialized data: {'id': '5', 'name': 'ABC', 'pass': 'abc#123'}
5
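JSON has fewer types than Python, so some values change type on the way through. The sketch below, with made-up sample data, shows how Python values map to JSON text and what comes back.

```python
import json

# Sample data, invented for this illustration.
point = {"coords": (1, 2), "tags": ["a", "b"], "active": True, "owner": None}

text = json.dumps(point)
print(text)  # True becomes true, None becomes null, the tuple becomes an array

restored = json.loads(text)
print(restored)

# JSON has no tuple type, so the tuple comes back as a list.
print(type(restored["coords"]).__name__)
```

If the exact Python types matter after loading, they have to be converted back by hand or stored with pickle instead.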

Python Marshal Module

This module is the oldest one used for serialization and deserialization purposes. It exists mainly to read and write compiled bytecode (the .pyc files), and its format is not guaranteed to stay the same between Python versions. Never unmarshal data from unknown, untrusted, or unauthorized sources.

Example of serializing and deserializing using marshal:

import marshal 
 
info = { "id": "5",
  "name": "ABC",
  "pass": "abc#123"}
 
# dumping the information. It returns the bytes object stored in variable 'marshal_obj'
marshal_obj = marshal.dumps(info)   
print('Serialized Object: ', marshal_obj)
 
# loading the bytes object back into the original value
unmarshal_obj = marshal.loads(marshal_obj)   
print('Deserialized Object : ', unmarshal_obj)

Output:

Serialized Object: b'\xfb\xda\x02id\xda\x015\xda\x04name\xda\x03ABC\xda\x04pass\xfa\x07abc#1230'
Deserialized Object : {'id': '5', 'name': 'ABC', 'pass': 'abc#123'}

Comparing Pickle with JSON and Marshal in Python

Python Pickle vs Marshal

The differences between the two modules are mentioned below:

1. Unlike marshal, pickle keeps track of the objects it has already serialized. So, when the same object is referenced again, it is not serialized a second time.

2. Marshal cannot serialize user-defined classes and their instances. Pickle can save and restore class instances, provided the class definition is importable and lives in the same module as when the object was stored.

3. The pickle format is guaranteed to be backward-compatible across Python releases, as long as a compatible protocol is chosen. The marshal format carries no such guarantee.
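The first two differences can be observed directly. In this sketch the Point class is a hypothetical user-defined class, used only to show that marshal rejects its instances.

```python
import marshal
import pickle

class Point:  # hypothetical user-defined class
    def __init__(self, x, y):
        self.x = x
        self.y = y

# pickle tracks objects: both list items refer to the same inner list,
# and that sharing survives the round trip.
shared = [1, 2, 3]
restored = pickle.loads(pickle.dumps([shared, shared]))
print(restored[0] is restored[1])

# marshal supports only a fixed set of built-in types.
try:
    marshal.dumps(Point(1, 2))
except ValueError as exc:
    print("marshal failed:", exc)
```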

Python Pickle vs JSON

JSON is a standard module used for serialization and deserialization purposes. The differences between these two modules are shown below:

1. Pickle serializes objects into a binary format, whereas JSON converts them into a text format.

2. Since pickled data is binary, it is not human-readable, but JSON is. The marshal format is not human-readable either.

3. Pickle can represent almost all Python data types. JSON, however, can represent only a few built-in types.
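The following sketch puts these differences side by side, using sample values chosen for illustration: pickle returns bytes and round-trips a complex number, a bytes object, and a set, while json returns text and rejects the same value.

```python
import json
import pickle

# Sample value containing types that JSON cannot represent.
value = {"z": complex(1, 2), "raw": b"\x00\x01", "ids": {1, 2}}

pickled = pickle.dumps(value)
print(type(pickled).__name__)          # binary output
print(pickle.loads(pickled) == value)  # all three types survive

as_json = json.dumps({"id": 5, "name": "ABC"})
print(type(as_json).__name__)          # text output

try:
    json.dumps(value)
except TypeError as exc:
    print("json failed:", exc)
```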


Conclusion

In this article, we discussed serialization and deserialization. Then we saw the pickle module in detail. We also learned about the other two modules and compared the three.
We hope the topics covered gave a clear picture of serialization in Python. Happy learning!
