Ark Liu

Databricks Connect to Amazon Kinesis

Intro

If you are trying to connect Databricks to Amazon Kinesis by following the Connect to Amazon Kinesis guide, you may find it confusing: it tells you to use an instance profile but does not explain how to set one up. After a few hours of googling, you may come across the Cross-account Kinesis access with an AssumeRole policy document, but it is not much help either, because it expects you to have permission to create a new role in the AWS account, which you most likely do not have.

In this post, I will share with you the Terraform code to set up the instance profile based on the Cross-account Kinesis access with an AssumeRole policy document.

Preread

Before we start, I assume you have a basic understanding of the following things.

Step 1: Set up cross-account role in Kinesis account

Create a new IAM role in the Kinesis account and allow AssumeRole from the Databricks deployment account.

For inline_policy, you can follow the document, attach one of the AWS managed AmazonKinesis*Access policies, or write your own policy.

resource "aws_iam_role" "kinesis_cross_account_service_role" {
  name = "kinesis-cross-account-service-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17",
    Statement = [
      {
        Effect = "Allow",
        Action = "sts:AssumeRole"
        Principal = {
          # Trust the Databricks deployment account so its roles can assume this role.
          AWS = [
            "arn:aws:iam::<deployment-acct-id>:root"
          ],
          Service = "ec2.amazonaws.com"
        }
      }
    ]
  })

  inline_policy {
    name = "access-kinesis-stream"

    policy = jsonencode({
      Version = "2012-10-17"
      Statement = [
        {
          Effect = "Allow",
          Action = [
            "kinesis:Get*",
            "kinesis:Describe*",
            "kinesis:List*",
          ],
          Resource = "*"
        }
      ]
    })
  }
}
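If you would rather attach an AWS managed policy than maintain the inline policy above, a minimal sketch could look like the following. It assumes that AmazonKinesisReadOnlyAccess is sufficient for your streams and that the inline_policy block is removed from the role; the resource name kinesis_read_only is only illustrative.

```hcl
# Attach the AWS managed read-only Kinesis policy to the cross-account role.
# Assumption: read-only access is enough for the consumer.
resource "aws_iam_role_policy_attachment" "kinesis_read_only" {
  role       = aws_iam_role.kinesis_cross_account_service_role.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonKinesisReadOnlyAccess"
}
```

Either way, consider narrowing access to the specific stream ARNs you need once the connection works.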

Step 2: Set up Assume Role in Databricks deployment account

There are 2 tasks in this step.

2.1 IAM role that assumes the cross-account Kinesis role

This is the IAM role that Databricks clusters will use, through the instance profile created in Step 3, to assume the role in the Kinesis account.

The policy is short: it only allows sts:AssumeRole on the role created in Step 1.

resource "aws_iam_role" "kinesis_service_role" {
  name = "kinesis-service-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17",
    Statement = [
      {
        Effect = "Allow",
        Principal = {
          Service = "ec2.amazonaws.com"
        },
        Action = "sts:AssumeRole"
      }
    ]
  })

  inline_policy {
    name = "assume-kinesis-role"

    policy = jsonencode({
      Version = "2012-10-17"
      Statement = [
        {
          Effect = "Allow",
          Action = [
            "sts:AssumeRole"
          ],
          Resource = [
            "arn:aws:iam::<kinesis-owner-acct-id>:role/kinesis-cross-account-service-role"
          ]
        }
      ]
    })
  }
}
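Steps 1 and 2 create resources in two different AWS accounts. One way to express that in a single Terraform configuration is a pair of aliased providers. The sketch below is an assumption about how your credentials are set up: the alias names, region, and profile names are placeholders and do not come from the Databricks document.

```hcl
# Sketch only: profile names and region are placeholders.
provider "aws" {
  alias   = "kinesis"
  region  = "us-east-1"
  profile = "kinesis-account"
}

provider "aws" {
  alias   = "deployment"
  region  = "us-east-1"
  profile = "databricks-deployment-account"
}

# Each resource then selects its account explicitly, for example:
#   resource "aws_iam_role" "kinesis_cross_account_service_role" {
#     provider = aws.kinesis
#     ...
#   }
```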

2.2 Update Databricks deployment role to allow PassRole

Instead of adding the ec2 actions and PassRole to the deployment role's existing policy, I prefer to create a new policy and attach it to the role, because that is easier to manage and understand.

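data "aws_iam_policy_document" "kinesis_pass_role" {
  # EC2 actions for the deployment role (list as given in this post).
  statement {
    effect = "Allow"
    actions = [
      "ec2:AssociateDhcpOptions",
      "ec2:AssociateRouteTable",
      "ec2:AttachInternetGateway",
      "ec2:AttachVolume"
    ]
    resources = ["*"]
  }

  # Allow the deployment role to pass the role from step 2.1 to cluster instances.
  statement {
    effect = "Allow"
    actions = [
      "iam:PassRole"
    ]
    resources = [
      "arn:aws:iam::<deployment-acct-id>:role/kinesis-service-role"
    ]
  }
}

# Managed policy built from the document above; the attachment below needs its ARN.
# The policy name is illustrative.
resource "aws_iam_policy" "kinesis_pass_role" {
  name   = "kinesis-pass-role"
  policy = data.aws_iam_policy_document.kinesis_pass_role.json
}

# Assumes the deployment role is already managed as aws_iam_role.deployment_account_role.
resource "aws_iam_role_policy_attachment" "kinesis_pass_role" {
  role       = aws_iam_role.deployment_account_role.name
  policy_arn = aws_iam_policy.kinesis_pass_role.arn
}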

Step 3: Add the instance profile DatabricksToKinesisAssumeRole to Databricks

Create an instance profile in AWS for the role created in step 2.1, then register that instance profile in the Databricks workspace.

resource "aws_iam_instance_profile" "kinesis_instance_profile" {
  name = "kinesis-instance-profile"
  role = aws_iam_role.kinesis_service_role.name
}

resource "databricks_instance_profile" "kinesis_instance_profile" {
  provider             = databricks.workspace
  instance_profile_arn = aws_iam_instance_profile.kinesis_instance_profile.arn
}

Step 4: Create a cluster with the instance profile

Follow the databricks_cluster resource doc and set instance_profile_arn, inside the aws_attributes block of the cluster configuration, to the instance profile registered in Step 3.
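A minimal sketch of such a cluster is shown below. The cluster name, sizing, and the two data sources used to pick a runtime and node type are assumptions for illustration; only the aws_attributes block is the part this step is about.

```hcl
# Sketch only: cluster name, sizing, and node/runtime selection are placeholders.
data "databricks_spark_version" "latest_lts" {
  provider          = databricks.workspace
  long_term_support = true
}

data "databricks_node_type" "smallest" {
  provider   = databricks.workspace
  local_disk = true
}

resource "databricks_cluster" "kinesis" {
  provider                = databricks.workspace
  cluster_name            = "kinesis-reader"
  spark_version           = data.databricks_spark_version.latest_lts.id
  node_type_id            = data.databricks_node_type.smallest.id
  autotermination_minutes = 20
  num_workers             = 1

  aws_attributes {
    # The ID of a databricks_instance_profile resource is the instance profile ARN.
    instance_profile_arn = databricks_instance_profile.kinesis_instance_profile.id
  }
}
```

Referencing the databricks_instance_profile resource, rather than the AWS one directly, makes Terraform register the profile in the workspace before it creates the cluster.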

Step 5: Validate connection to Kinesis

profit!

Limitation on the Instance Profile

As far as I know, the instance profile in Databricks has been used for the use case below.

If you plan to use the instance profile for both use cases, you will have to alter Step 2 to include the S3 access as well, since a cluster can have only one instance profile attached. I will leave it to you to figure it out since I have not tried it myself yet.
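As a starting point only, and just as untested, S3 access could be added to the role from step 2.1 with a separate managed policy. Everything here is an assumption: the bucket placeholder <s3-bucket-name>, the chosen S3 actions, and the resource and policy names are hypothetical and need to be adapted to your setup.

```hcl
# Hypothetical S3 permissions for the role from step 2.1 (untested sketch).
data "aws_iam_policy_document" "kinesis_service_role_s3" {
  statement {
    effect    = "Allow"
    actions   = ["s3:ListBucket"]
    resources = ["arn:aws:s3:::<s3-bucket-name>"]
  }

  statement {
    effect    = "Allow"
    actions   = ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"]
    resources = ["arn:aws:s3:::<s3-bucket-name>/*"]
  }
}

resource "aws_iam_policy" "kinesis_service_role_s3" {
  name   = "kinesis-service-role-s3-access"
  policy = data.aws_iam_policy_document.kinesis_service_role_s3.json
}

# Attached as a managed policy so it does not clash with the role's inline_policy block.
resource "aws_iam_role_policy_attachment" "kinesis_service_role_s3" {
  role       = aws_iam_role.kinesis_service_role.name
  policy_arn = aws_iam_policy.kinesis_service_role_s3.arn
}
```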

Conclusion

I hope this post helps you to set up the instance profile for Databricks to connect to Amazon Kinesis. If you have any questions or suggestions, feel free to email me.