Perform daily developer activities (debug, rollback, diffs, logs)

Introduction

When discovering something is broken, the goal is always to get back to a working state as quickly as possible. This might include extra steps, like rolling back to a previous state if the actual fix results in too much downtime.

This tutorial will take you through a standard “break-fix” exercise from a developer’s perspective, utilizing the full support that an Internal Developer Platform provides. It assumes no specific skill level or prior experience, and the content also applies to other technically oriented personas such as platform engineers or SREs.

Prerequisites

To get started you’ll need:

  • A source code repository (e.g. on GitHub)
  • A container image registry (e.g. Amazon ECR)
  • A Kubernetes cluster (e.g. Amazon EKS)
  • A Humanitec Platform Orchestrator organization with the Resource Definitions to connect the infrastructure (e.g. the Kubernetes cluster)

The easiest way to achieve a consistent and pre-integrated installation of these prerequisites is to follow our tutorial “Setup the reference architecture in your cloud” for the cloud provider of your choice.

You also need to have:

  • An Application deployed using the Platform Orchestrator and Score

If you need to create a new application, follow the scaffolding tutorial to create one.

For the following step-by-step guide we assume you’re using an Internal Developer Platform reference architecture instance as recommended in the prerequisites. If you’re using your own variant, you may find differences. The overall process will be the same as long as the components mentioned in the prerequisites are all present.

We also assume that you followed the scaffolding tutorial and have the expected repository contents. If you bring your own repository, you must translate the steps to your structure and contents.

Break it

To simulate a break-fix cycle, you first need something that’s broken. To get to that state, inject a semantically incorrect state into the Application configuration.

One easy way to do this is to ask the Orchestrator to apply requests and limits which are semantically inconsistent by setting a memory request higher than the limit.

Use the following snippet and insert it into your Score file (named score.yaml for the scaffolding app example).

    resources:
      requests:
        memory: "250Mi"
        cpu: "50m"
      limits:
        memory: "200Mi"
        cpu: "150m"

Insert the snippet below the line that starts with “image:”, with “resources” being on the same indentation level as “image”.
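For orientation, the resulting Score file might look roughly like the sketch below. The Workload name and the image value are hypothetical placeholders, and the container name “frontend” is the one used by the scaffolding example. Keep whatever values your own score.yaml already contains and add only the “resources” block.

```yaml
apiVersion: score.dev/v1b1
metadata:
  name: my-app                                  # hypothetical Workload name
containers:
  frontend:                                     # container name from the scaffolding example; yours may differ
    image: registry.example.com/my-app:latest   # hypothetical placeholder, keep your existing value
    resources:                                  # same indentation level as "image"
      requests:
        memory: "250Mi"                         # deliberately higher than the limit below
        cpu: "50m"
      limits:
        memory: "200Mi"
        cpu: "150m"
```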

Ignoring good practices for the sake of the example, commit and push directly to the main branch. For the scaffolding app example, a GitHub action will be triggered and perform another Deployment into the Platform Orchestrator, specifically into the “development” Environment of your sample Application.

The snippet is flawed as it’s requesting more memory than the limit constraint allows. This will break our application nicely — let’s have a look.

The GitHub action should have built and triggered the Deployment by now. In the Platform Orchestrator:

  • From the left-hand navigation, select “Applications”.
  • For your sample Application, select the “Development” environment.
  • You’ll see a “Failed” Active deploy.

Revert it

<roleplay mode ON>

Let’s assume that this is a nasty error in the codebase. You might need a few hours of debugging and further analysis before you can fix-forward and deploy a fresh version. But your users are already banging on the broken front door. You need to do something right now.

<roleplay mode /OFF>

You’re going to revert the Application to the previously deployed version to gain time while you create and apply a real fix:

  • Using the breadcrumb navigation at the top of the page, click on “Env Development”.
  • Under “Previous deploys”, locate the uppermost Deployment showing a “Successful” status. This is the last known healthy state.
  • Select the “Redeploy” button for that Deployment.
  • A dialog window will show the difference between the currently running version and the one you want to deploy. You should see that the erroneous requests and limits will be removed.
  • Select “Re-deploy”. A new Deployment will be added to your Application which should take on the “Running” status after a short while.

Analyze it

Now that you have some more time to analyze the situation, let’s revisit the error message we saw earlier for the failed Deployment.

  • From the left-hand navigation, select “Applications”.
  • For your sample Application, select the “Development” environment.
  • Under “Previous deploys”, find the Deployment containing the error.
  • Expand “Errors” and click on the error message.

The Workload screen will open and display the full error message, which should look like this:

    Deployment.apps "<your_app_name>" is invalid:
    spec.template.spec.containers[0].resources.requests:
    Invalid value: "250Mi": must be less than or equal to memory limit of 200Mi

Fix it

<roleplay mode ON>

You quickly redeployed a previous version which resolved the immediate incident but also made the Application lose all the features deployed with the new version. You worked hard on debugging and now a new version is ready to be deployed with all the features and none of the bugs.

<roleplay mode /OFF>

Go back to your score.yaml and modify either the memory limit or the memory request. Whichever side you change, make sure the request ends up less than or equal to the limit (requests <= limits).
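As a minimal sketch of one possible fix, the block below raises the memory limit above the request. The value “300Mi” is an illustrative assumption, not a recommendation; choose a value that suits your Workload. Only the “resources” block is shown, and the rest of the Score file stays unchanged.

```yaml
resources:
  requests:
    memory: "250Mi"
    cpu: "50m"
  limits:
    memory: "300Mi"   # assumed example value; now greater than or equal to the 250Mi request
    cpu: "150m"
```

Lowering the memory request to “200Mi” or less while leaving the limit untouched would satisfy the same constraint.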

Commit and push to kick off the GitHub action once more and deploy the fixed version. This time the fix is permanent.

Check the result and, in particular, have a look at the container logs:

  • From the left-hand navigation, select “Applications”.
  • For your sample Application, select the “Development” environment.
  • Select the name of the “Active deploy”.
  • Select <your_app_name> of the Workload.
  • Select “Containers” → “frontend” (if you’re not using the scaffolding example, the container name might be different).

You’ll see the container log stream displayed in the UI. Note that further down the page, the newly defined container resources are also shown.

Recap

Congratulations! You have successfully completed the tutorial and learned how the Humanitec Platform Orchestrator helps you to diagnose an unhealthy application state quickly and safely. You’re able to do emergency rollbacks with support from the Orchestrator and thus keep the user impact and/or outage as low as possible.

Next Steps

Learn how to provide more support for the daily life of developers in our Ephemeral Environments tutorial.
