Smashing Python tech debt with Polymorphism

Code Review Doctor
7 min read · Aug 5, 2022


Polymorphism is a pillar of Object Oriented Programming (OOP), which underpins modern software development in many industries. This article dives into polymorphism using a real-world, non-contrived example. That’s right: polymorphism will not be explained in terms of animals, Dogs, or Ducks, but in terms of a real-world problem that was solved with it.

After the low-level, real-world example, the article discusses polymorphism at a higher level.

Abstracting away differences in third party APIs

Imagine a cloud-based tool that analyses GitHub pull requests and adds comments to the Pull Request for any issues found. This is achievable because GitHub provides an API that the code review tool can GET information from and POST the review to. Nice and simple: use an off-the-shelf API client like PyGithub.

However, things get more complicated when you find out the code review tool also supports Bitbucket and has plans to soon support GitLab. This complicates things because GitHub, Bitbucket, and GitLab of course do not have a unified API. To comment on a pull request:

  • GitHub: /repos/{org}/{repo}/pulls/{number}/reviews
  • Bitbucket: /repositories/{space}/{repository}/pullrequests/{number}/comments
  • GitLab: /projects/{repo}/merge_requests/{number}/discussions

These different Source Control Management (SCM) systems also take different approaches to authentication and authorization, and have different statuses for PRs (draft, open, etc). They even have different names for what GitHub calls PRs!

These differences were not an issue in the beginning because initially only GitHub was supported by the code review tool, meaning only one client was needed: GitHubClient. One simple client and a few methods. As the pull request analysis tool grew and Bitbucket support was introduced, BitbucketClient was written.

Before the Bitbucket integration, everywhere in the codebase used GitHubClient and invoked its methods directly. So when a developer needed to invoke BitbucketClient to authenticate, they had to write extra code that chose between the two clients with if conditions. Same story for checking permissions, for pulling the code, and for commenting on the pull request.
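To make that concrete, here is a rough sketch of what such conditional call sites tend to look like. It is an illustration only: the method names create_review and post_comment are hypothetical stand-ins, not the tool's real code.

```python
# Hypothetical "before" code: each client has its own method names,
# so every call site has to ask which SCM it is talking to.
class GitHubClient:
    def create_review(self, body):
        return f"github review: {body}"


class BitbucketClient:
    def post_comment(self, body):
        return f"bitbucket comment: {body}"


def comment_on_pull_request(client, body):
    # The "what am I!?" check that ends up copied across the codebase.
    if isinstance(client, GitHubClient):
        return client.create_review(body)
    elif isinstance(client, BitbucketClient):
        return client.post_comment(body)
    raise ValueError(f"unsupported client: {type(client).__name__}")
```

Every new SCM means another branch in every function like this one, which is why the approach stops scaling.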

After a couple of weeks it was clear this approach was not scaling well. Lots of duplication. Lots of creepy self-aware “what am I!?” code. Conditionals everywhere. It was a mess of spaghetti. The problem was identified and a design session was held. Instead of if statements everywhere, imagine a polymorphic API client: two API clients with exactly the same methods, each talking to a different API. This approach massively simplifies the problem by abstracting away the differences. To keep things simple, we will illustrate with only two client methods, approve_pull_request and comment_on_pull_request, plus a small permissions container. You can probably guess what they do.

Abstract API client

First we use the built-in abc module (which stands for Abstract Base Classes) to define an Abstract Base Class for the API client:

import abc


class AbstractClient(abc.ABC):
    @abc.abstractmethod
    def approve_pull_request(self, commit_id):
        pass

    @abc.abstractmethod
    def comment_on_pull_request(self, comments, commit_id, body):
        pass

The idea is that next we write a number of concrete API clients that subclass AbstractClient. The advantage of this approach is that a common interface is guaranteed: if GitHubClient or BitbucketClient is missing either approve_pull_request or comment_on_pull_request, then Python raises a TypeError as soon as the class is instantiated:

class GitHubClient(AbstractClient):
    pass

>>> client = GitHubClient()
TypeError: Can't instantiate abstract class GitHubClient with abstract methods approve_pull_request, comment_on_pull_request
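The other side of that guarantee can be sketched as follows: a subclass that implements every abstract method instantiates normally, while one that forgets a method fails. IncompleteClient and CompleteClient are hypothetical names used only for this illustration.

```python
import abc


class AbstractClient(abc.ABC):
    @abc.abstractmethod
    def approve_pull_request(self, commit_id): ...

    @abc.abstractmethod
    def comment_on_pull_request(self, comments, commit_id, body): ...


class IncompleteClient(AbstractClient):
    # Hypothetical subclass that forgets comment_on_pull_request.
    def approve_pull_request(self, commit_id):
        return f"approved {commit_id}"


class CompleteClient(AbstractClient):
    def approve_pull_request(self, commit_id):
        return f"approved {commit_id}"

    def comment_on_pull_request(self, comments, commit_id, body):
        return f"{len(comments)} comments on {commit_id}"


# Defining IncompleteClient is fine; the TypeError only appears
# when something tries to create an instance of it.
try:
    IncompleteClient()
    instantiated = True
except TypeError:
    instantiated = False

client = CompleteClient()
```

Note that the check runs at instantiation time, not when the class is defined, so a test that constructs each client is a cheap way to catch a missing method early.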

Abstracting away GitHub’s API

Let’s look at the concrete implementation of the GitHub API client. To oversimplify, GitHub’s API documentation provides the guidance:

To determine if the application has the permissions needed to clone the repository (we’re trying to analyze the code in the pull request, after all) and to check if the application has permission to review and comment, the API client must:

  • Authenticate as an app and get a payload showing the permissions.
  • The payload is a dict that looks like {"checks": "write", "contents": "read"}

To approve a pull request in GitHub’s API the API client must:

  • Authenticate as an app and get a JSON Web Token (JWT)
  • Perform a POST request to /repos/{org}/{repo}/statuses/{commit_id}, sending the JWT to the API in a header.

To comment on a pull request the API client must:

  • Authenticate as an app and get a JSON Web Token (JWT)
  • Perform a POST request to /repos/{org}/{repo}/pulls/{number}/reviews, sending the JWT to the API in a header.

Putting this together we end up with a permissions container and API client like so:

class GitHubPermissions(client.AbstractPermissions):
    @property
    def has_permission_read_content(self):
        return self.permissions.get("contents") == "read"

    @property
    def has_permission_status_write(self):
        return self.permissions.get("statuses") == "write"


class GitHubClient(client.AbstractClient):
    def approve_pull_request(self, commit_id):
        # Approve by setting a "success" commit status on the commit.
        response = self.session.post(
            url=self.urls['pull_request'],
            json={
                "sha": commit_id,
                "state": "success",
                "context": "Code Review Doctor"
            }
        )
        response.raise_for_status()

    def comment_on_pull_request(self, comments, commit_id, body):
        # GitHub accepts all inline comments in a single review payload.
        data = {
            "commit_id": commit_id,
            "event": "REQUEST_CHANGES",
            "body": body,
            "comments": comments,
        }
        response = self.session.post(
            url=self.urls["reviews"],
            json=data
        )
        response.raise_for_status()
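The snippet above subclasses client.AbstractPermissions, which the article never shows. A minimal sketch of what such a base class could look like follows, assuming it simply stores the raw payload on self.permissions (the constructor signature is an assumption, not the tool's actual code):

```python
import abc


class AbstractPermissions(abc.ABC):
    def __init__(self, permissions):
        # Assumed: the raw payload returned by the SCM
        # (a dict for GitHub, a list for Bitbucket).
        self.permissions = permissions

    @property
    @abc.abstractmethod
    def has_permission_read_content(self): ...

    @property
    @abc.abstractmethod
    def has_permission_status_write(self): ...


class GitHubPermissions(AbstractPermissions):
    @property
    def has_permission_read_content(self):
        return self.permissions.get("contents") == "read"

    @property
    def has_permission_status_write(self):
        return self.permissions.get("statuses") == "write"


perms = GitHubPermissions({"contents": "read", "statuses": "write"})
```

Stacking @property on top of @abc.abstractmethod means a concrete permissions class that forgets one of the two properties cannot be instantiated, the same guarantee the API client gets.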

Performing the code analysis

Now that we have polymorphic API clients, the code that interacts with the clients can ignore any differences between the Bitbucket API, the GitHub API, and any future APIs we integrate with:

import logging

logger = logging.getLogger(__name__)


def review_pull_request(
    permissions, api_client, source_commit, issues
):
    # the real code does more stuff, but removed it for simplicity
    if not permissions.has_permission_read_content:
        logger.warning('does not have permissions: contents read')
        return
    if not permissions.has_permission_status_write:
        logger.warning('does not have permissions: statuses write')
        return
    if issues:
        api_client.comment_on_pull_request(
            comments=issues,
            commit_id=source_commit,
            body='some improvements needed'
        )
    else:
        api_client.approve_pull_request(commit_id=source_commit)
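A side benefit of this design is that the review logic can be exercised without touching any real API. The sketch below is an illustration under assumptions: RecordingClient is a hypothetical stand-in that exposes the same two methods, and the review function is a trimmed copy that passes source_commit to both calls.

```python
import logging
from types import SimpleNamespace

logger = logging.getLogger(__name__)


class RecordingClient:
    """Hypothetical stand-in with the same two methods as the real clients."""

    def __init__(self):
        self.calls = []

    def approve_pull_request(self, commit_id):
        self.calls.append(("approve", commit_id))

    def comment_on_pull_request(self, comments, commit_id, body):
        self.calls.append(("comment", commit_id, len(comments)))


def review_pull_request(permissions, api_client, source_commit, issues):
    # Trimmed copy of the review flow, using source_commit throughout.
    if not permissions.has_permission_read_content:
        logger.warning("does not have permissions: contents read")
        return
    if not permissions.has_permission_status_write:
        logger.warning("does not have permissions: statuses write")
        return
    if issues:
        api_client.comment_on_pull_request(
            comments=issues, commit_id=source_commit, body="some improvements needed"
        )
    else:
        api_client.approve_pull_request(commit_id=source_commit)


# Any object with the two boolean attributes works as a permissions container.
allowed = SimpleNamespace(
    has_permission_read_content=True, has_permission_status_write=True
)
clean, flagged = RecordingClient(), RecordingClient()
review_pull_request(allowed, clean, "abc123", issues=[])
review_pull_request(
    allowed, flagged, "abc123", issues=[{"path": "app.py", "line": 3, "body": "fix"}]
)
```

The function never asks which SCM it is talking to, so a fake client is enough to cover both branches.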

This code results in Python and Django fixes being suggested right inside the pull requests.

You can check your GitHub or Bitbucket Pull Requests, or scan your entire codebase, for free online.

Abstracting away Bitbucket’s API

Now let’s look at the concrete implementation of the Bitbucket API client. To oversimplify, Bitbucket’s API documentation provides the guidance:

To determine if the application has the permissions needed to clone the repository (we’re trying to analyze the pull request, after all) and to check if the application has permission to review and comment, the API client must:

  • Authenticate as an app and get a payload showing the permissions.
  • The payload is a list that looks like ['pullrequest']

To approve a pull request in Bitbucket’s API the API client must:

  • Authenticate as an app and get a JSON Web Token (JWT)
  • Perform a POST request to /repositories/{space}/{repo}/commit/{commit}/statuses/build/, sending the JWT to the API in a header.

To comment on a pull request the API client must:

  • Authenticate as an app and get a JSON Web Token (JWT)
  • Perform a POST request to /repositories/{space}/{repository}/pullrequests/{number}/comments, sending the JWT to the API in a header.

Putting this together we end up with a permission container and API client like:

class BitbucketPermissions(client.AbstractPermissions):
    @property
    def has_permission_read_content(self):
        return "pullrequest" in self.permissions

    @property
    def has_permission_status_write(self):
        return "pullrequest" in self.permissions


class BitbucketClient(client.AbstractClient):
    def approve_pull_request(self, commit_id):
        data = {"key": "code-review-doctor", "state": "SUCCESSFUL"}
        response = self.session.post(
            url=self.urls['status'], json=data
        )
        response.raise_for_status()

    def comment_on_pull_request(self, comments, commit_id, body):
        # Bitbucket takes one inline comment per request, so post each in turn.
        for comment in comments:
            data = {
                "content": {"raw": comment['body']},
                "inline": {
                    "to": comment['line'], "path": comment['path']
                },
            }
            response = self.session.post(
                url=self.urls['comments'], json=data
            )
            response.raise_for_status()
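Both concrete clients rely on self.session and self.urls, which the snippets above never define. One possible arrangement, sketched here purely as an assumption, is for the abstract base class to accept them in its constructor. FakeSession and FakeResponse are hypothetical test doubles that mimic a requests-style post call, and the URL is a placeholder.

```python
import abc


class AbstractClient(abc.ABC):
    def __init__(self, session, urls):
        # Assumed shape: session has a requests-style post(url=..., json=...)
        # method, and urls maps endpoint names to fully built URLs.
        self.session = session
        self.urls = urls

    @abc.abstractmethod
    def approve_pull_request(self, commit_id): ...


class BitbucketClient(AbstractClient):
    def approve_pull_request(self, commit_id):
        data = {"key": "code-review-doctor", "state": "SUCCESSFUL"}
        response = self.session.post(url=self.urls["status"], json=data)
        response.raise_for_status()


class FakeResponse:
    def raise_for_status(self):
        pass  # pretend the request succeeded


class FakeSession:
    def __init__(self):
        self.requests = []

    def post(self, url, json):
        # Record the call instead of sending anything over the network.
        self.requests.append((url, json))
        return FakeResponse()


session = FakeSession()
client = BitbucketClient(session, {"status": "https://example.invalid/status"})
client.approve_pull_request("abc123")
```

Keeping the shared plumbing in the base class leaves each concrete client with only the parts that genuinely differ between SCMs.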

High level overview of Polymorphism

The scenario mentioned above is not new. This happens every day in software development. Anyone taking a 20,000-foot view of the situation would need less than an hour to identify the root cause:

Similar classes are not being grouped by their shared characteristics and common functionality

This is where polymorphism comes into play. Applying it involves the following steps:

  1. Group classes of the same hierarchy or functionality.
  2. Identify the characteristics and functionalities of these classes.
  3. Create a base class that has these characteristics.
  4. Derive child classes out of this parent class.
  5. Override characteristics and functionalities in the derived classes.

Fixing the above situation

Once the team fixed the above situation by implementing a polymorphic API client, the following benefits materialized:

  1. Code maintenance: The resulting code can be maintained over a long period without being weighed down as more and more API classes are added. Every time a new SCM is added to the system, a class for it is created that derives from the base AbstractClient class.
  2. Improved readability: Developers could skip long if/elif chains. At runtime, Python calls the method defined on whichever client object was passed in.
  3. Decoupled code: BitbucketClient does not interfere with GitHubClient, nor will GitHubClient interfere with GitLabClient. Classes become separate and isolated in their functionality.
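The maintenance benefit can be sketched with a small registry: choosing the client becomes the only place that knows which SCMs exist, and adding a provider means one new class plus one new entry. The CLIENTS mapping, the get_client helper, and the string keys are hypothetical names for this illustration.

```python
import abc


class AbstractClient(abc.ABC):
    @abc.abstractmethod
    def approve_pull_request(self, commit_id): ...


class GitHubClient(AbstractClient):
    def approve_pull_request(self, commit_id):
        return f"github approved {commit_id}"


class BitbucketClient(AbstractClient):
    def approve_pull_request(self, commit_id):
        return f"bitbucket approved {commit_id}"


# Hypothetical registry: the single place that maps an SCM name to a class.
CLIENTS = {"github": GitHubClient, "bitbucket": BitbucketClient}


def get_client(scm_name):
    try:
        return CLIENTS[scm_name]()
    except KeyError:
        raise ValueError(f"unsupported SCM: {scm_name}") from None


# Call sites stay identical regardless of which client comes back.
results = [get_client(name).approve_pull_request("abc123") for name in CLIENTS]
```

Everything downstream of get_client works against AbstractClient and never needs to change when the registry grows.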

While polymorphism comes with a lot of benefits, it also has some downsides. Because there are multiple implementations of a single interface, debugging gains an extra step: working out which implementation is actually running. A developer reading the code can sometimes be unsure which class a given call ends up in.
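One way to take some of the guesswork out of debugging is to ask Python which class actually supplies a method. The helper below is a small sketch, not part of the original tool: defining_class and the describe method are hypothetical names, and the helper walks the method resolution order to find where a method is defined.

```python
import abc


class AbstractClient(abc.ABC):
    @abc.abstractmethod
    def approve_pull_request(self, commit_id): ...

    def describe(self):
        # Hypothetical shared helper, defined once on the base class.
        return f"client for {type(self).__name__}"


class BitbucketClient(AbstractClient):
    def approve_pull_request(self, commit_id):
        return f"approved {commit_id}"


def defining_class(obj, method_name):
    """Return the name of the first class in the MRO that defines the method."""
    for cls in type(obj).__mro__:
        if method_name in vars(cls):
            return cls.__name__
    return None


client = BitbucketClient()
overridden = defining_class(client, "approve_pull_request")
inherited = defining_class(client, "describe")
```

Logging type(client).__name__ at the point where the client is created gives similar information with even less effort.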

Applying Polymorphism

When you need to apply polymorphism, you need to know the structure of your classes. There are two ways to organize them. The first is a parent-child relationship, also known as a hierarchical relationship. The second is to implement interfaces. Python has no dedicated interface construct, so the second approach is skipped here.

To fully understand the hierarchical approach, two concepts matter: abstract classes and concrete classes. Other languages will probably use interfaces too.

Below we will see these concepts along with the methods to implement polymorphism.

Abstract Classes

In the above scenario AbstractClient is (no shock) an abstract class. The abstract class does not have any implementation of its own. It cannot be instantiated. Rather, it is meant to be subclassed so that derived classes inherit its interface.

Concrete Classes

The concrete classes are the classes that are derived from the base or abstract class. There can be any number of concrete classes from a single base class. In the scenario we saw above, GitHubClient, BitbucketClient, and (once it is written) GitLabClient are the concrete classes. They all inherit from the abstract base class called AbstractClient.

Conclusion: it made the code a pleasure to maintain

In this post, we saw what polymorphism is and how it removed a tangle of conditionals from a real codebase. Applying polymorphism can make application code a lot simpler, more readable, and more maintainable, and the resulting code can be extended further by applying the appropriate design patterns. If you have an application that you wish to maintain for a long time, we highly recommend polymorphism for decoupled and logical application code.

Follow us on Twitter.
