Terraform Essentials
— Terraform, Infrastructure-as-Code, Introduction — 16 min read
Problem with manual configuration
Manually configuring cloud infrastructure allows us to easily start using new service offerings to quickly prototype architectures, however it comes with many downsides:
- Its easy to mis-configure a service through human error
- Its hard to manage the expected state of configuration for compliance
- Its hard to transfer configuration knowledge to other team members
Infrastructure as Code (IaC) - this involves writing a configuration script to automate creating, updating or destroying cloud infrastructure. IaC is a blueprint of our infrastructure. It allows us to easily share, version or inventory our cloud infrastructure.
Popular Infrastructure as Code tools
- Declarative type
- What you see is what you get (Explicit). More verbose, but zero chance of mis-configuration
- More verbose, but zero chance of mis-configuration
- Uses scripting languages. Ex: JSON, YAML, XML
- Examples: ARM Templates and Azure Blueprints for Azure, CloudFormation for AWS, Cloud Deployment Manager for Google Cloud, Terraform
- Imperative
- You say what you want, and the rest is filled in (Implicit)
- Less verbose, you could end up with misconfiguration
- Does more than Declarative
- Uses programming languages ex: Python, Ruby, JavaScript
- Examples: AWS Cloud Development Kit (CDK), Pulumi
Infrastructure Lifecycle
It is a number of clearly defined and distinct work phases which are used by DevOps Engineers to plan, design, build, test, deliver, maintain and retire cloud infrastructure. Day 0-2 is a simplified way to describe phases of an infrastructure lifecycle. Day 0 - Plan and Design. Day 1 - Develop and Iterate. Day 2 - Go live and maintain. Days do not literally mean a 24 hour days and is just a broad way of defining where a Infrastructure project would be.
- Reliability - IaC makes changes idempotent, consistent, repeatable and predictable. Idempotent means no matter how many times you run IaC, you will always end up with the same state that is expected
- Manageability - enables mutation via code; revised, with minimal changes
- Sensibility - avoid financial and reputational losses to even loss of life when considering government and military dependencies on infrastructure
Common Terms
- Provisioning - to prepare a server with systems, data and software, and make it ready for network operation. Using Configuration Management tools like Puppet, Ansible, Chef, Bash scripts, PowerShell or Cloud-Init you can provision a server. When you launch a cloud service and configure it you are "provisioning".
- Deployment - it is the act of delivering a version of your application to run a provisioned server. Deployment could be performed via AWS CodePipeline, Harness, Jenkins, Github Actions, CircleCI
- Orchestration - it is the act of co-ordinating multiple systems or services. It is a common term when working with microservices, containers and kubernetes. Orchestration could be Kubernetes, Salt, Fabric
- Configuration Drift - it is when provisioned infrastructure has an unexpected configuration change due to:
- team members manually adjusting configuration options
- malicious actors
- side effects from APIs, SDK or CLIs Configuration Drift going unnoticed could be loss or breach of cloud services and residing data or result in interruption of services or unexpected downtime. To detect configuration drift, use a compliance tool that can detect misconfiguration, example: AWS Config, Azure Policies, GCP Security Health Analytics, built-in support for drift detection like AWS CloudFormation Drift Detection, or storing the expected state using Terraform state files. To correct configuration drift, use a compliance tool that can remediate (correct) misconfiguration, like AWS Config, or use Terraform refresh and plan commands, manually correcting the configuration (not recommended), or tearing down and setting up the infrastructure again. To prevent configuration drift, keep immutable infrastructure i.e., always create and destroy but never reuse; use Blue/Green deployment strategy. Servers should never be modified after they are deployed, done by baking AMI Images or containers via AWS Image Builder or HashiCorp Packer, or a build server like GCP Cloud Run. Use GitOps to version control IaC, and peer review every single change via Pull Requests to infrastructure.
- GitOps - is when you take Infrastructure as Code (IaC) and you use a git repository to introduce a formal process to review and accept changes to infrastructure code; once that code is accepted, it automatically triggers a deploy
- Change Management - it is a standard approach to apply change, and resolving conflicts brought about by change. In the context of IaC, it is the procedure that will be followed when resources are modified and applied via configuration script
- Change Automation - it is a way of automatically creating a consistent, systematic and predictable way of managing change request via controls and policies. Terraform uses Change Automation in the form of Execution Plans and Resource Graphs to apply and review complex changesets. Change automation allows you to know exactly what Terraform will change and in what order, avoiding many possible human errors.
- ChangeSet - it is a collection of commits that represent changes mande to a versioning repository. IaC uses ChangeSets so you can see what has changed by who over time
Introduction
Terraform is an infrastructure as code (IaC) tool that allows us to build, change, and version infrastructure safely and efficiently. This includes both low-level components like compute instances, storage, and networking, as well as high-level components like DNS entries and SaaS features. It is an infrastructure as code tool that lets us define both cloud and on-prem resources in human-readable configuration files that we can version, reuse, and share. We can then use a consistent workflow to provision and manage all of your infrastructure throughout its lifecycle. Terraform is declarative but the Terraform Language features imperative-like functionality. Notable features of TerraForm include installable modules, planning and predicting changes, dependency graphing, state management, provisioning infrastructure in familiar languages such as AWS CDK, registry with 1000+ providers.
Terraform for windows downloads an executable file, which should be put on PATH environment variable (so that it can be invoked by CLI). Terraform is written in a language called HCL (hashicorp configuration language), and stored in a file with .tf
extension. To use Terraform, we need to define a provider. Provider is a plugin allow us to talk to specific set of APIs of an IaaS/Paas/SaaS service. Terraform relies on plugins called "providers" to interact with cloud providers, SaaS providers, and other APIs.
Terraform configurations must declare which providers they require so that Terraform can install and use them. Additionally, some providers require configuration (like endpoint URLs or cloud regions) before they can be used. Each provider adds a set of resource types and/or data sources that Terraform can manage. Every resource type is implemented by a provider; without providers, Terraform can't manage any kind of infrastructure. Most providers configure a specific infrastructure platform (either cloud or self-hosted). Providers can also offer local utilities for tasks like generating random numbers for unique resource names. Terraform is logically split into 2 main parts: terraform core, which uses remote procedure calls (RPC) to communicate with Terraform Plugins, and terraform plugins, which expose an implementation for a specific service, or provisioner. Terraform core is a statically-compiled binary writter in Go programming language.
The core Terraform workflow consists of three stages:
- Write: We define resources, which may be across multiple cloud providers and services. For example, we might create a configuration to deploy an application on virtual machines in a Virtual Private Cloud (VPC) network with security groups and a load balancer. Terraform reads all files having
.tf
extension, but standard practice is to keep it tomain.tf
if only one file is being used. - Plan: Terraform creates an execution plan describing the infrastructure it will create, update, or destroy based on the existing infrastructure and our configuration.
- Apply: On approval, Terraform performs the proposed operations in the correct order, respecting any resource dependencies. For example, if we update the properties of a VPC and change the number of virtual machines in that VPC, Terraform will recreate the VPC before scaling the virtual machines.
Terraform Use Cases
- IaC for Exotic Providers - Terraform supports a variety of providers outside of GCP, AWS, Azure and sometimes is the only provider. Terraform is open-source and extendable so that any API could be used to created IaC tooling for any kind of cloud platform or technology. Ex: Heroku, Spotify playlists
- Multi-Tier Applications - Terraform by default makes it easy to divide large and complex applications into isolated configuration scripts (modules). It has a complexity advantage over cloud-native IaC tools for its flexibility while retaining simplicity over imperative tools
- Disposable Environments - Easily stand up an environment for a software demo or a temporary development environment
- Resource Schedulers - Terraform is not just defined to infrastructure of cloud resources but can be used to dynamically schedule Docker containers, Hadoop, Spark and other software tools. We can provision our own scheduling grid.
- Multi-cloud Deployment - Terraform is cloud-agnostic and allows a single configuration to be used to manage multiple providers, and to even handle cross-cloud dependencies
Example: Using the AWS Provider
terraform { required_providers { aws = { source = "hashicorp/aws" version = "~> 3.0" } }}
# Configure the AWS Providerprovider "aws" { region = "us-east-1"}
The AWS provider offers a flexible means of providing credentials for authentication. The following methods are supported, in this order:
- Static credentials: Static credentials can be provided by adding an
access_key
andsecret_key
in-line in the AWS provider block:
provider "aws" { region = "us-west-2" access_key = "my-access-key" secret_key = "my-secret-key"}
- Environment variables: We can provide our credentials via the
AWS_ACCESS_KEY_ID
andAWS_SECRET_ACCESS_KEY
, environment variables, representing our AWS Access Key and AWS Secret Key, respectively. TheAWS_DEFAULT_REGION
andAWS_SESSION_TOKEN
environment variables are also used, if applicable:
$ export AWS_ACCESS_KEY_ID="anaccesskey"$ export AWS_SECRET_ACCESS_KEY="asecretkey"$ export AWS_DEFAULT_REGION="us-west-2"$ terraform plan
- Shared credentials/configuration file: We can use an AWS credentials or configuration file to specify our credentials. The default location is
$HOME/.aws/credentials
on Linux and macOS, or%USERPROFILE%\.aws\credentials
on Windows. - CodeBuild, ECS, and EKS Roles: Using an IAM Task Role.
- EC2 Instance Metadata Service (IMDS and IMDSv2): If we're running Terraform from an EC2 instance with IAM Instance Profile using IAM Role, Terraform will just ask the metadata API endpoint for credentials.
Creating, Modifying and Deleting Resources
Terraform provides the same syntax for creating resources, regardless of cloud service we're using, like AWS, GCP or Azure. Creating resources has the following syntax:
resource "<provider>_<resource_type>" "resource_name" { config options in the form of key value pairs.... key1 = "value1" key2 = "value2"}
resource "aws_instance" "my-ec2-server" { ami = "ami-085925f297f89fce1" instance_type = "t2.micro"
tags = { Environment = "sometesting" }
}
Here resource_name
is not a name used in AWS, it is only for referencing within Terraform. First command to run is terraform init
which looks at all the .tf
config files defined, and the providers declared in them, and download the plugins to interact with necessary APIs for the providers specified. The next command is terraform plan
which does a dry run of code to display all the changes that will take place in the cloud service. terraform validate
ensures that all the declared types and values are valid, and required attributes are present. Then terraform apply
makes the changes specified in the cloud provider. Since Terraform is written in a declarative manner, running terraform apply
again would not create another resource in cloud. It will just refresh the state i.e., check if the resource is still alive and running in the cloud. Comments are written with hash (#) in Terraform. An Execution Plan is a manual review of what will add, change or destroy before you apply changes by using terraform apply
. We can visualize an execution plan as a graph using the terraform graph
command, which will output a GraphViz file. Terraform builds a dependency graph from the Terraform configurations, and walks this graph to generate plans, refresh state, and more. terraform fmt
formats the terraform config file to rewrite Terraform configuration files to a canonical format and style.
terraform plan
displays 3 kinds of changes made to cloud resources: creation marked with +, updation/modification marked with ~, and deletion marked with -. terraform destroy
deletes resources. By default, for deletion, Terraform destroys all resources declared within it, unless marked with some parameters for deleting a specific resource. Another way to delete a specific resource is to comment out its declaration and run terraform apply
.
Referencing Resources
We can reference other resources that are defined within code by using the name defined for those resources. The order in which referenced resources are declared does not matter; Terraform automatically infers which needs to be created first. --auto-approve
is used to skip prompt for running terraform commands. Example:
resource "aws_vpc" "my-vpc" { cidr_block = "10.0.0.0/16" tags = { Name = "myProdVPC" # Special tag that is used to name VPC within AWS }}
resource "aws_subnet" "my-subnet" { vpc_id = "${aws_vpc.my-vpc.id}" # every resource has an ID that can be accessed cidr_block = "10.0.1.0/24"
tags = { Name = "MyMainSubnet" }}
Terraform folder structure
.terraform
folder gets created when any plugins are initialized (terraform init
), and necessary plugins are installed. terraform.tfstate
represents all the state for terraform. It keeps track of all the resources created and deleted using Terraform. Variables can be defined in terraform.tfvars
file.
Example project
- Create VPC, Internet Gateway, Custom Route Table and a Subnet
- Associate Subnet with Route Table
- Create Security Group to allow ports 22, 80, 443, and a network interface with an IP in the subnet that was created in 2nd step.
- Assign an elastic IP to the network interface
- Create Ubuntu server and install/enable Apache2
For this, AWS requires us to use key pairs to connect to EC2 instances. Elastic IP relies on the deployment of Internet Gateway, which Terraform cannot figure out to create it first. So, depends_on
flag is used to explicitly set the order of creation of resources.
# 1. Create vpcresource "aws_vpc" "my-prod-vpc" { cidr_block = "10.0.0.0/16" tags = { Name = "production" }}
# 2. Create Internet Gatewayresource "aws_internet_gateway" "my-gateway" { vpc_id = aws_vpc.my-prod-vpc.id}
# 3. Create Custom Route Tableresource "aws_route_table" "my-prod-route-table" { vpc_id = aws_vpc.my-prod-vpc.id
route { cidr_block = "0.0.0.0/0" # default route. all traffic gets sent to the internet gateway gateway_id = aws_internet_gateway.my-gateway.id }
route { ipv6_cidr_block = "::/0" # it is same as 0.0.0.0/0 gateway_id = aws_internet_gateway.my-gateway.id }
tags = { Name = "Prod" }}
# 4. Create a Subnet
resource "aws_subnet" "my-subnet-1" { vpc_id = aws_vpc.my-prod-vpc.id cidr_block = "10.0.1.0/24" availability_zone = "us-east-1a" tags = { Name = "prod-subnet" }}
# 5. Associate subnet with Route Tableresource "aws_route_table_association" "a" { subnet_id = aws_subnet.my-subnet-1.id route_table_id = aws_route_table.my-prod-route-table.id}
# 6. Create Security Group to allow port 22,80,443resource "aws_security_group" "allow_web" { name = "allow_web_traffic" description = "Allow Web inbound traffic" vpc_id = aws_vpc.my-prod-vpc.id
ingress { description = "HTTPS" from_port = 443 # allows to specify range of ports to_port = 443 protocol = "tcp" cidr_blocks = ["0.0.0.0/0"] # best practice is to put our own IP address for this cidr block }
ingress { description = "HTTP" from_port = 80 to_port = 80 protocol = "tcp" cidr_blocks = ["0.0.0.0/0"] }
ingress { description = "SSH" from_port = 22 to_port = 22 protocol = "tcp" cidr_blocks = ["0.0.0.0/0"] }
egress { from_port = 0 # 0 means allow all ports for egress to_port = 0 protocol = "-1" # this means any protocol is allowed cidr_blocks = ["0.0.0.0/0"] } tags = { Name = "allow_web" }}
# 7. Create a network interface with an ip in the subnet that was created in step 4resource "aws_network_interface" "my-web-server-nic" { subnet_id = aws_subnet.my-subnet-1.id private_ips = ["10.0.1.50"] security_groups = [aws_security_group.allow_web.id]}
# 8. Assign an elastic IP to the network interface created in step 7resource "aws_eip" "one" { vpc = true network_interface = aws_network_interface.my-web-server-nic.id associate_with_private_ip = "10.0.1.50" depends_on = [aws_internet_gateway.my-gateway] # explicitly set the order of creation of resources}
output "server_public_ip" { value = aws_eip.one.public_ip # this will be shown in the form of: server_public_ip = 54.158.243.220}
# 9. Create Ubuntu server and install/enable apache2resource "aws_instance" "my-web-server-instance" { ami = "ami-085925f297f89fce1" instance_type = "t2.micro" availability_zone = "us-east-1a" # AWS picks a random AZ if we don't specify one key_name = "main-key" # key pair to access ec2 network_interface { device_index = 0 # we can attach multiple interfaces, this is the first one network_interface_id = aws_network_interface.my-web-server-nic.id }
user_data = <<-EOF #!/bin/bash sudo apt update -y sudo apt install apache2 -y sudo systemctl start apache2 sudo bash -c 'echo your very first web server > /var/www/html/index.html' EOF tags = { Name = "web-server" }}
output "server_private_ip" { value = aws_instance.web-server-instance.private_ip}
output "server_id" { value = aws_instance.web-server-instance.id}
Terraform State Commands
terraform state list
command lists out all the resources that has terraform state (created with terraform). To see one of the resources in detail, use terraform state show <resource-name>
command, where resource-name is the name of the resource (example: aws_subnet.my-subnet-1). terraform output
command is used to show the details for any of the resources created by terraform just after creation. The value chosen for output will be displayed using the name declared for the value. If we don't wish to change state of the resources (especially in production), but just want to see the outputs, we can run terraform refresh
command to check the outputs.
In certain cases where we want to delete an individual resource or try to roll out staged deployments where only certain resources are to be deployed, we can target individual resources by passing --target
flag with terraform apply or destroy command, and provide the name of the resource.
Terraform Variables
Terraform allows the use of variables so that we can reuse values throughout the code without the need for repeating. To define a variable, use the following syntax:
variable "my_variable" { description = "some description of the variable" default = "hello_there" # default value if no value is passed for the variable type = string # use "any" if variable can be of any type}# use this variable anywhere using: var.my_variable
It takes in three parameters, all of which are optional. If there is no value assigned for the variable, terraform will prompt the user to provide it. We can also pass values for the variable through command line in this way: terraform apply --var "my_variable=hello"
. Best practice is to create a separate file for variables called terraform.tfvars
which is automatically scanned by terraform, and define variables like this: my_variable = "hello"
in each line. If the variables are defined in a different file, we can tell terraform to look for it using --var-file
parameter. Variable types can vary from string, number, array, object, tuples and so on.