Protobuf / Protocol Buffers – A Simple Introduction

Overview:

In this tutorial, I would like to introduce you to Protobuf / Protocol Buffers, an interface definition language (also called an interface description language, or IDL). An IDL is a specification which describes a software component's API in a language-neutral way.

Problems With JSON:

In a distributed systems architecture, one microservice written in Java might want to communicate with another microservice written in JavaScript. By default, most microservices use the REST architectural style for their communication, exchanging JSON over HTTP. JSON is lightweight compared to XML (which was widely used for SOAP) and easily readable. However, there are problems with JSON.

  • No strict schema enforcement. People can easily change the structure and break the contract, which causes confusion among team members. Such issues are often found and fixed very late, during the integration phase. For example, consider this scenario.
// team 1 expectation
{
    "order_id" : 1234
}

// team 2 understanding
{
    "orderId" : 1234
}

// team 3 understanding
{
    "order":{
        "id" : 1234
    }
}
  • JSON is text based. It is human friendly, but not machine friendly. It takes time to serialize and deserialize when we use it for microservice communication.

Protobuf/Protocol Buffers:

Protobuf is Google’s language-neutral, platform-neutral method for serializing and deserializing structured data between two services. Protobuf can be used to define the contract for the communication between two systems. We define the contract in a simple .proto file, and from it we can easily generate source code in any supported language.

Here the request and response payload structures are defined using .proto files. The source code to create/parse the request and response can be generated for Java or JavaScript. Services A and B can use the generated source as a dependency, and developers can focus on implementing the business logic using the auto-generated source without worrying about breaking the contract!


gRPC & Protobuf Course:

I learnt gRPC & Protobuf the hard way. But you can learn them quickly on Udemy. Yes, I have created a separate step-by-step course on gRPC & Protobuf along with Spring Boot integration for next-generation microservice development. Click here for the special link.


Protocol Buffers – Scalar Types:

Scalar types are the basic building blocks in Protocol Buffers, from which we can create other complex types.

Java Type    Proto Type
int          int32
long         int64
boolean      bool
double       double
float        float
String       string
byte[]       bytes

For example, let’s assume that we have to define a contract for an address as shown here. Address is a type which contains 4 fields in the given order.

message Address {
        int32 postbox = 1;
        string street = 2;
        string city = 3;
        string country = 4;
}
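
To see the mapping in action, here is a minimal sketch of how the generated Java builder for the Address message above could be used. The method names follow the standard protobuf-java code generation conventions, and the values are just placeholders.

// A sketch assuming the Java classes generated from address.proto
Address address = Address.newBuilder()
        .setPostbox(100)            // int32  -> int
        .setStreet("1st street")    // string -> String
        .setCity("Chennai")
        .setCountry("India")
        .build();

int postbox = address.getPostbox(); // read back as a Java int
String city = address.getCity();    // read back as a Java String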

We can use the Address type defined above to create other types if we need to. For example, we can define a Person type with an Address as shown here.

message Person {
        string first_name = 1;
        string last_name = 2;
        int32 age = 3;
        Address address = 4;        
}

The above contract is easily readable, lightweight, and language & platform independent.

Collection:

What about collections? We can define a Person type with multiple contact numbers using the repeated keyword, as shown here.

message Phone {
        int32 area_code = 1;
        int64 mobile_number = 2;
}

message Person {
        string first_name = 1;
        string last_name = 2;
        int32 age = 3;
        Address address = 4;
        repeated Phone contact_numbers = 5;      
}
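
A minimal sketch of how the repeated field could be used from the generated Java code. It assumes the standard protobuf-java conventions, where a repeated field gets add*, get*Count and get*List methods.

Phone phone = Phone.newBuilder()
        .setAreaCode(212)
        .setMobileNumber(5551234567L)
        .build();

Person person = Person.newBuilder()
        .setFirstName("vins")
        .setLastName("guru")
        .addContactNumbers(phone)                      // repeated -> add* method
        .build();

int count = person.getContactNumbersCount();           // 1
List<Phone> phones = person.getContactNumbersList();   // java.util.List of Phone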

Map:

Key/value pairs are defined using the map type, as shown here.

message Dictionary {
  map<string, string> translation = 1;
}
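
A short sketch of using the generated Dictionary type from Java. It assumes the standard protobuf-java conventions, where a map field gets put*, get*Map and get*OrDefault methods.

Dictionary dictionary = Dictionary.newBuilder()
        .putTranslation("hello", "bonjour")
        .putTranslation("bye", "au revoir")
        .build();

String french = dictionary.getTranslationOrDefault("hello", "unknown");
Map<String, String> all = dictionary.getTranslationMap();   // java.util.Map view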

Enum:

Protobuf supports enums as well. Various order statuses can be defined as shown here. PENDING (value 0) is the default Status when the field is not set.

enum Status {
  PENDING = 0;
  IN_PROGRESS = 1;
  SUCCESS = 2;
  FAILED = 3;
}
message PurchaseOrder {
  string product_name = 1;
  double price = 2;
  int32 quantity = 3;
  Status status = 4;
}
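
A short sketch of setting and reading the enum field from the generated Java code, assuming the classes generated from the Status and PurchaseOrder definitions above.

PurchaseOrder order = PurchaseOrder.newBuilder()
        .setProductName("laptop")
        .setPrice(750.0)
        .setQuantity(1)
        .setStatus(Status.IN_PROGRESS)
        .build();

Status status = order.getStatus();          // IN_PROGRESS
int statusNumber = order.getStatusValue();  // 1, the numeric value sent on the wire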

Default Values:

When we use Protocol Buffers, fields take these default values when they are not explicitly set.

Proto Type                Default Value
int32 / any number type   0
bool                      false
string                    empty string
enum                      first value
repeated                  empty list
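
For example, building a PurchaseOrder without setting any field returns exactly these defaults. This is a quick sketch using the generated Java classes from the enum section above.

PurchaseOrder empty = PurchaseOrder.newBuilder().build();

empty.getProductName();   // "" (empty string)
empty.getPrice();         // 0.0
empty.getQuantity();      // 0
empty.getStatus();        // PENDING, the first (0) enum value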

Packages & Options:

All the proto files in this tutorial declare the syntax as proto3. If the syntax is not set, it is assumed to be proto2.

syntax = "proto3";

We can also include the options below.

option java_package = "com.vinsguru.common";
option java_multiple_files = true;
  • The java_package option indicates that when we auto-generate Java source code from the proto files, the generated classes should be placed under the com.vinsguru.common package.
  • A single .proto file can have multiple message types. When we generate source code, the java_multiple_files option indicates that each message should be generated as a separate class rather than as nested classes inside a single outer class.

A complete proto file would look like this.

syntax = "proto3";

package common;

option java_package = "com.vinsguru.common";
option java_multiple_files = true;

enum Status {
  PENDING = 0;
  IN_PROGRESS = 1;
  SUCCESS = 2;
  FAILED = 3;
}
message PurchaseOrder {
  string product_name = 1;
  double price = 2;
  int32 quantity = 3;
  Status status = 4;
}
  • package common; declares the proto package. Other proto files refer to these types using this package name (for example, common.Address).
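
For example, with the options above, a Java consumer would import the generated types like this. This is a sketch; the exact layout depends on how protoc is wired into your build.

// java_package controls the Java package of the generated classes.
// java_multiple_files = true generates PurchaseOrder and Status as
// separate top-level classes instead of nesting them in one outer class.
import com.vinsguru.common.PurchaseOrder;
import com.vinsguru.common.Status;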

Importing Types:

When we have multiple message types, we can organize them into packages, and it is possible to import a library of types.

  • In my project, I have a common folder which contains multiple proto files (for example, common/address.proto).

  • person.proto might want to use a type defined under the common package, which can be done as shown here.
syntax = "proto3";

import "common/address.proto";

package person;

option java_package = "com.vinsguru.person";
option java_multiple_files = true;

message Person {
  string first_name = 1;
  string last_name = 2;
  repeated common.Address address = 3;
}

Protoc Installation:

To generate source code from the .proto files, we need the protoc tool to be installed. We can download and install it from here. Choose the installation file based on your OS, download it, and ensure that protoc is present in your PATH.

  • Let’s assume that this is my directory structure, under which I have multiple proto files in different directories/packages (common and person).

  • I navigate to the proto directory and issue the below command.
protoc --java_out=./ common/*.proto person/*.proto
  • --java_out indicates the path in which the generated Java source code should be placed.
  • common/*.proto person/*.proto specifies the paths of the proto files to compile.
  • After issuing the command, we can see that Java source files are generated from the .proto files under the appropriate packages.

  • Now, if I issue the below command, I can generate JavaScript source files which can be used in a Node.js application.
protoc --js_out=./ common/*.proto person/*.proto

Once the source code is generated, we can import the Java package and use it in our application as shown here. For example, these Address and Person classes were generated from the above .proto files.

Address address1 = Address.newBuilder()
        .setStreet("123 main st")
        .setCity("New York")
        .build();

Address address2 = Address.newBuilder()
        .setStreet("456 non-main st")
        .setCity("Las Vegas")
        .build();

Person person = Person.newBuilder()
        .setFirstName("vins")
        .setLastName("guru")
        .addAddress(address1)
        .addAddress(address2)
        .build();
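
Since the whole point of Protobuf is efficient serialization, here is a hedged sketch of turning this Person into bytes and parsing it back on the other side, using the standard protobuf-java serialization API.

// Serialize to the compact binary wire format
byte[] bytes = person.toByteArray();

// ... the bytes travel over the wire to the other service ...

try {
    // Parse the bytes back into a Person object
    Person received = Person.parseFrom(bytes);
    System.out.println(received.getFirstName());   // vins
} catch (InvalidProtocolBufferException e) {
    // com.google.protobuf.InvalidProtocolBufferException is thrown when
    // the bytes do not form a valid Person message
    e.printStackTrace();
}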

Summary:

Protobuf is a great way to define a schema and auto-generate source code which can immediately be used in our projects. It ensures type safety and faster serialization and deserialization. The real power of Protobuf becomes much clearer when we use it with gRPC.

Learn more about gRPC.

  1. gRPC – A High-Performance RPC framework – A Simple Introduction
  2. gRPC – Unary API Service Implementation
  3. gRPC Server Streaming API In Java
  4. gRPC Client Streaming API In Java

Happy learning 🙂
