Protocol Buffers in Python

Wednesday March 29, 2017

Google's data interchange format, Protocol Buffers, is pretty straightforward.

Installing Protobuf

Python support for protocol buffers can be installed with pip:

pip install protobuf

The package is imported as google.protobuf, but you likely won't need to import it.

To define new protocol buffer formats, you'll also want the protoc tool. One way to get it is to find and install a system-specific package, but you can also get it by installing another Google Python package, grpcio-tools:

pip install grpcio-tools

This won't put a protoc executable in your PATH, but it will let you run protoc via Python, as python -m grpc_tools.protoc. For convenience, you can add an alias in your shell:

alias protoc='python -m grpc_tools.protoc'

Defining Protobuf Messages

The details of protocol buffer message types are defined in .proto files like my_example.proto:

syntax = "proto3";

message Bottle {
  string note = 1;
}

Syntax version 3 has to be specified, as the default is still version 2.

We're defining a fairly dull message. A Bottle can contain one note, which is a string.

The 1 there does not set a default value. It is the field number, which identifies the field internally when messages are read from and written to their binary representation.
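To make the role of the field number concrete, here is a rough sketch of how a single string field is laid out in the binary format. It is a hand-rolled illustration using only the standard library, not the protobuf package's own code, and the helper names encode_varint and encode_string_field are made up for this example. It assumes the documented wire format: a tag built from the field number and wire type, then a length, then the UTF-8 bytes.

```python
WIRE_TYPE_LEN = 2  # wire type for length-delimited values such as strings


def encode_varint(value):
    """Encode a non-negative integer as a protobuf varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_string_field(field_number, text):
    """Encode one string field: tag, payload length, then UTF-8 bytes."""
    payload = text.encode("utf-8")
    # The field number lives in the tag, next to the wire type.
    tag = (field_number << 3) | WIRE_TYPE_LEN
    return encode_varint(tag) + encode_varint(len(payload)) + payload


# Hypothetical stand-in for Bottle(note='Ahoy!'), where note is field 1.
encoded = encode_string_field(1, "Ahoy!")
print(encoded)  # b'\n\x05Ahoy!'
```

The field name note never appears in the output; only the number 1 does, packed into the first byte. That is why renaming a field is harmless to stored data, while renumbering it is not.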

Generating Code for our Protobuf

Assuming we're in the same directory as my_example.proto, we can use protoc to generate some Python code corresponding to the message type we defined:

protoc --proto_path=./ --python_out=./ my_example.proto

This will produce a new file, my_example_pb2.py. (Syntax version 3 does not affect the filename here; the suffix is still _pb2.)
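The same invocation can also be scripted from Python, for instance as part of a build step. This is only a minimal sketch: protoc_command and run_protoc are hypothetical helper names, and it assumes grpcio-tools is installed for the interpreter that runs the script.

```python
import subprocess
import sys


def protoc_command(proto_file, proto_path="./", python_out="./"):
    """Build the argument list for `python -m grpc_tools.protoc ...`.

    Hypothetical helper; it only assembles the command, it does not run it.
    """
    return [
        sys.executable, "-m", "grpc_tools.protoc",
        f"--proto_path={proto_path}",
        f"--python_out={python_out}",
        proto_file,
    ]


def run_protoc(proto_file, **kwargs):
    """Run protoc; raises subprocess.CalledProcessError if protoc fails."""
    subprocess.run(protoc_command(proto_file, **kwargs), check=True)
```

Calling run_protoc('my_example.proto') from the directory containing the .proto file should then be equivalent to the command line above. Using sys.executable rather than a bare python keeps the call inside the current virtual environment.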

Using Protobuf Messages in Python

With the generated code, we can use our message type in Python, and take advantage of features like serialization and deserialization.

import my_example_pb2

my_bottle = my_example_pb2.Bottle(note='Ahoy!')

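# Serialize the message to its binary form and write it to disk.
with open('my_bottle.pb', 'wb') as f:
    f.write(my_bottle.SerializeToString())

# Read the bytes back and parse them into a new Bottle.
with open('my_bottle.pb', 'rb') as f:
    new_bottle = my_example_pb2.Bottle.FromString(f.read())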

This looks a lot like examples with TFRecords, because they use the same mechanism.
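For a sense of what ends up in a file like my_bottle.pb, the serialized bytes can be picked apart by hand. The sketch below uses only the standard library and handles just length-delimited string fields. The helper names decode_varint and decode_string_fields are invented for illustration, and the sample bytes are what a Bottle with note='Ahoy!' is expected to serialize to under the documented wire format. Real code should of course use the generated class instead.

```python
def decode_varint(data, pos):
    """Read a varint starting at pos; return (value, next position)."""
    value = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: this was the last byte
            return value, pos
        shift += 7


def decode_string_fields(data):
    """Return {field_number: text}, supporting length-delimited fields only."""
    fields = {}
    pos = 0
    while pos < len(data):
        tag, pos = decode_varint(data, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type != 2:
            raise ValueError(f"unsupported wire type {wire_type}")
        length, pos = decode_varint(data, pos)
        fields[field_number] = data[pos:pos + length].decode("utf-8")
        pos += length
    return fields


# Assumed serialized form of Bottle(note='Ahoy!').
sample = b"\x0a\x05Ahoy!"
print(decode_string_fields(sample))  # {1: 'Ahoy!'}
```

The decoder recovers the field number and the text but not the field name, which is why the generated my_example_pb2 module is needed to turn the bytes back into a Bottle with a note attribute.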

More Information

The example here is deliberately minimal. For more detail, the Google protocol buffers site is quite good, with a very nice Python tutorial.

I'm working on Building TensorFlow systems from components, a workshop at OSCON 2017.