Troubleshooting Terraform Executions Using Cloudify

In previous articles you saw how to run Terraform with Cloudify and how to deploy an existing Terraform module using Cloudify. Enhancements in version 6.3 of Cloudify make it easier than ever to upload your existing Terraform modules into Cloudify.

What You’ll Learn

In this article we go through several techniques for following and troubleshooting Terraform executions in Cloudify. Which technique you use depends on your role and access level:

  1. Using the Cloudify Console
  2. Following Terraform execution events using the Cloudify CLI
  3. Inspecting Cloudify Manager logs and debugging locally on the Manager server

Prerequisites

You will need the following:

  1. A working Cloudify Manager.

Please see our documentation for information about installing your Manager.

  2. The Cloudify CLI, installed and configured to work with your Cloudify Manager.

Please see our documentation for information on how to install and configure it.

What Is Cloudify?

Cloudify is an open source, multi-cloud orchestration platform featuring unique technology that packages infrastructure, networking, and existing automation tools into certified blueprints, so you can automate, orchestrate, and manage your workloads and environments while using all your DevOps tools in one place.

Let’s start

At this point you have either packaged the Terraform template as part of the blueprint package and uploaded it to Cloudify Manager, or created your blueprint from the Terraform module, as discussed in previous articles. You are now ready to create a deployment and run the install workflow. Let’s begin with how to follow a Terraform execution in Cloudify and troubleshoot it if needed.
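From the CLI, creating the deployment and starting the install workflow can be wrapped in a small shell function. This is a minimal sketch, not part of Cloudify itself: the function name is hypothetical, both IDs are placeholders, and it assumes the blueprint is already uploaded and the cfy CLI is configured against your Manager.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create a deployment from an uploaded blueprint,
# then run the install workflow on it.
deploy_and_install() {
  local blueprint_id="$1" deployment_id="$2"
  # Create the deployment; stop here if the manager rejects it
  cfy deployments create -b "$blueprint_id" "$deployment_id" || return 1
  # Run install, which executes the Terraform module on the manager
  cfy executions start install -d "$deployment_id"
}

# Example (placeholder IDs):
# deploy_and_install my-tf-blueprint tf-01-from-a-file
```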

Follow Terraform execution in Cloudify console 

Once an execution starts, you can follow and monitor it in several places in the Cloudify Console.

Let’s take a look at an execution that completed successfully. Navigate to the Deployment information page and open the Services widget.

The Services widget provides all of the relevant details on a single screen.

Now let’s look at an execution that did not complete successfully and how to find its root cause.

You can see the Execution Task Graph, which displays the exact series and order of steps that Cloudify takes to convert the declarative blueprint into an actual deployment.

If anything goes wrong, you can take a look at the Deployment Events/Logs and zoom into a specific error to troubleshoot and resolve the issues.

Alternatively, you can navigate to the Executions information page and see where the error occurred, along with the error message itself, by clicking Show Error Details.

Finally, you can check the logs and events in the System Logs widget on the System Set-up page. Filter by blueprint to see all of its logs and events. You can also apply additional filters, such as log level and node instance.

Please refer to the documentation for more information.

Thanks to the detailed output of the Terraform execution in the Cloudify logs, we can see the exact Terraform error and conclude that the issue is related to the security group rules defined in the Terraform template, main.tf.

The next step is to review the main.tf, fix the issue and try again.

Below is another example of an error during a Terraform execution. You can see the exact command executed by Cloudify and the detailed Terraform error from creating the resource aws_instance.example_vm:

/opt/manager/resources/deployments/default_tenant/tf-01-from-a-file/terraform_o10t7u/terraform apply -auto-approve -no-color -input=false -var-file /opt/manager/resources/deployments/default_tenant/tf-01-from-a-file/cloud_resources_hnqkws/tmpg42ac2tj.json, exit_code: 1, stdout: aws_instance.example_vm: Creating..., stderr: Error: Error launching source instance: VPCResourceNotSpecified: The specified instance type can only be used in a VPC. A subnet ID or network interface ID is required to carry out the request.	status code: 400, request id: ffe2407d-b93e-4a06-a5fc-dc5e4d53404e  on main.tf line 8, in resource "aws_instance" "example_vm":   8: resource "aws_instance" "example_vm"

As the error suggests, there is no default VPC in the us-east-1 region, so the aws_instance resource needs an explicit subnet ID or network interface ID.
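When the logged message is long, it can help to pull out just the exit code, the Terraform error text, and the location Terraform points to. The function below is a sketch that assumes the single-line "command: ..., exit_code: N, stdout: ..., stderr: ..." layout shown in the excerpt above; the function name is hypothetical, and other message layouts are not handled.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize a Cloudify/Terraform error message read
# from stdin. Assumes the single-line "exit_code: N, ..., stderr: ..." layout.
tf_error_summary() {
  local msg
  msg=$(cat)
  # Exit code reported for the terraform process
  printf 'exit_code: %s\n' "$(printf '%s\n' "$msg" | sed -n 's/.*exit_code: \([0-9][0-9]*\),.*/\1/p')"
  # Everything after "stderr: " is the Terraform error text
  printf 'stderr: %s\n' "$(printf '%s\n' "$msg" | sed -n 's/.*stderr: //p')"
  # Terraform's pointer to the failing block, e.g. "on main.tf line 8"
  printf 'location: %s\n' "$(printf '%s\n' "$msg" | grep -o 'on [^ ]*\.tf line [0-9]*' | head -n 1)"
}
```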

The next step is to review main.tf, fix the issue, and try again.

Follow the Terraform execution using Cloudify CLI

First, let’s retrieve the deployment ID:

cfy deployments list

Then, list the executions of that deployment:

cfy executions list -d 57b35cea-26bc-445b-86f4-fc21c1fa20a6

In this example, 53f38d62-4107-45ed-848f-77c9dd3caa64 is the ID of the execution we want to follow.

Now you can list the events of that execution in sequential order:

cfy events list 53f38d62-4107-45ed-848f-77c9dd3caa64

You can also run this command while the execution is still in progress by adding the --tail option, which streams the events live.
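The two CLI steps above can be combined into one function that shows the executions of a deployment and then streams the events of the one you pick. This is a sketch: the function name is hypothetical, and the IDs are placeholders for the ones returned by the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show a deployment's executions, then follow
# the events of one execution.
follow_execution() {
  local deployment_id="$1" execution_id="$2"
  # Executions of the deployment, including their status and IDs
  cfy executions list -d "$deployment_id"
  # Events of the chosen execution; --tail keeps streaming while it runs
  cfy events list "$execution_id" --tail
}

# Example, with the IDs from this article:
# follow_execution 57b35cea-26bc-445b-86f4-fc21c1fa20a6 53f38d62-4107-45ed-848f-77c9dd3caa64
```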

The next step is to fix the issue and try again.

Inspect Cloudify Manager logs and troubleshoot locally on the Manager server

Note: this option is available only to users with SSH access to the Cloudify Manager server.

Once connected to the Cloudify Manager server, change the working directory to /var/log/cloudify/mgmtworker/logs.

There you will find a dedicated log file for each deployment, where you can follow all of its events and see any errors that occurred.

Please refer to Cloudify’s documentation for more information about Cloudify logging.
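To jump straight to the failures, you can search a deployment's log file for the ProcessException entries that carry the failed Terraform command. This is a sketch that assumes the log file is named after the deployment ID, as in this article's example, and that failures are logged as ProcessException lines; other errors may be logged differently, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print ProcessException lines from a deployment's
# mgmtworker log. The log directory defaults to the path above.
find_tf_errors() {
  local deployment_id="$1"
  local log_dir="${2:-/var/log/cloudify/mgmtworker/logs}"
  local log_file="$log_dir/$deployment_id.log"
  if [ ! -f "$log_file" ]; then
    echo "no log file: $log_file" >&2
    return 1
  fi
  # -n prefixes each match with its line number in the log file
  grep -n 'ProcessException' "$log_file"
}

# Example:
# find_tf_errors 57b35cea-26bc-445b-86f4-fc21c1fa20a6
```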

Review the log file of the specific deployment, 57b35cea-26bc-445b-86f4-fc21c1fa20a6.log in this case. It shows the exact command that was executed when the error occurred:

script_runner.tasks.ProcessException: command: /opt/manager/resources/deployments/default_tenant/57b35cea-26bc-445b-86f4-fc21c1fa20a6/terraform_9tzism/terraform apply -auto-approve -no-color -input=false -var-file /opt/manager/resources/deployments/default_tenant/57b35cea-26bc-445b-86f4-fc21c1fa20a6/cloud_resources_mar1r5/tmptlqxsyhd.json, exit_code: 1, stdout: random_id.suffix: Creating...random_id.suffix: Creation complete after 0s [id=N1rjrA]aws_security_group.allow_ssh_and_http: Creating..., stderr: Error: error updating Security Group (sg-0b5969c5a0a83c09e): error authorizing Security Group (egress) rules: InvalidGroup.NotFound: The security group 'sg-0b5969c5a0a83c09e' does not exist      status code: 400, request id: 5a704e6c-29dc-4115-b020-92ec8cf00911  on main.tf line 6, in resource "aws_security_group" "allow_ssh_and_http":   6: resource "aws_security_group" "allow_ssh_and_http" {

The error above points to a problem in the Terraform template file. To shorten the debugging cycle, you can fine-tune your Terraform template locally on the Cloudify Manager.

Switch your working directory to the Terraform template location, and execute the same command you saw in the log above.
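Rather than copying the command by hand, you can cut it out of the ProcessException line. The sed expression below assumes the "command: ..., exit_code: ..." layout seen in the log above, and the function name is hypothetical. Because terraform apply works on the current directory, run the printed command from the Terraform template directory, and check that the -var-file it references still exists.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the terraform command recorded in a
# ProcessException log line read from stdin.
extract_tf_command() {
  # Keep only the text between "command: " and ", exit_code:"
  sed -n 's/.*command: \(.*\), exit_code:.*/\1/p'
}

# Example, from /var/log/cloudify/mgmtworker/logs:
# grep 'ProcessException' 57b35cea-26bc-445b-86f4-fc21c1fa20a6.log | tail -n 1 | extract_tf_command
```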

Fix the main.tf file and run the command again.

During debugging, you can check which resources have already been created using familiar Terraform commands.
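For example, terraform state list prints the resources recorded in the Terraform state, and terraform show prints their details. The sketch below runs state list with the same terraform binary Cloudify used (the terraform_... path in the logged command), since that binary is not necessarily on your PATH. The function name is hypothetical, and both arguments are placeholders for paths from your own log and template location.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list resources already in the Terraform state.
# $1 = terraform binary used by Cloudify, $2 = Terraform template directory
tf_inspect() {
  local tf_bin="$1" module_dir="$2"
  # Subshell keeps the caller's working directory unchanged
  (
    cd "$module_dir" || exit 1
    "$tf_bin" state list
  )
}
```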

If you need more debug output to find the issue, raise the Terraform log level and try again:

export TF_LOG=DEBUG

You can also set it in the blueprint, in the environment_variables section of the cloudify.nodes.terraform.Module node. Please refer to the Cloudify documentation for more details.
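When re-running by hand, you can also scope the debug settings to a single command instead of exporting them for the whole session. TF_LOG and TF_LOG_PATH are standard Terraform environment variables (TF_LOG_PATH sends the verbose output to a file). The function name and the default log path are assumptions for this sketch, and the apply flags mirror the command Cloudify logged.

```shell
#!/usr/bin/env bash
# Hypothetical helper: re-run terraform apply with DEBUG logging.
# $1 = terraform binary, $2 = template directory, $3 = var file,
# $4 = optional debug log path (assumed default below)
tf_debug_apply() {
  local tf_bin="$1" module_dir="$2" var_file="$3"
  local log_path="${4:-/tmp/terraform-debug.log}"
  (
    cd "$module_dir" || exit 1
    # Variables apply to this one command only, not the whole shell
    TF_LOG=DEBUG TF_LOG_PATH="$log_path" \
      "$tf_bin" apply -auto-approve -no-color -input=false -var-file "$var_file"
  )
}
```

Scoping the variables this way means later terraform commands in the same shell are not left at DEBUG level, which is easy to forget after an export.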
