If you are trying to connect Databricks to Amazon Kinesis by following the Connect to Amazon Kinesis guide, you might find it confusing: it suggests using an instance profile, but the document does not explain how to set one up. After a few hours of googling, you might find the Cross-account Kinesis access with an AssumeRole policy document, but it is not very helpful either, because it expects you to have permission to create a new role in the AWS account, which most likely you do not.
In this post, I will share the Terraform code to set up an instance profile based on the Cross-account Kinesis access with an AssumeRole policy document.
Before we start, I assume you have a basic understanding of the following:
sts:AssumeRole
Databricks deployment role
Create a new IAM role in the Kinesis account and allow AssumeRole from the Databricks deployment account. For the inline_policy, you can either follow the document, use one of the AmazonKinesis*Access AWS managed policies, or create your own policy.
resource "aws_iam_role" "kinesis_cross_account_service_role" {
  name = "kinesis-cross-account-service-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = "sts:AssumeRole"
        Principal = {
          AWS = [
            "arn:aws:iam::<deployment-acct-id>:root"
          ]
          Service = "ec2.amazonaws.com"
        }
      }
    ]
  })

  inline_policy {
    name = "access-kinesis-stream"
    policy = jsonencode({
      Version = "2012-10-17"
      Statement = [
        {
          Effect = "Allow"
          Action = [
            "kinesis:Get*",
            "kinesis:Describe*",
            "kinesis:List*"
          ]
          Resource = "*"
        }
      ]
    })
  }
}
There are two tasks in this step. This is the IAM role that the Databricks deployment role will use to assume the role in the Kinesis account. The policy is simple and clean: it only allows sts:AssumeRole on the role created in Step 1.
resource "aws_iam_role" "kinesis_service_role" {
  name = "kinesis-service-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Principal = {
          Service = "ec2.amazonaws.com"
        }
        Action = "sts:AssumeRole"
      }
    ]
  })

  inline_policy {
    name = "assume-kinesis-role"
    policy = jsonencode({
      Version = "2012-10-17"
      Statement = [
        {
          Effect = "Allow"
          Action = [
            "sts:AssumeRole"
          ]
          Resource = [
            "arn:aws:iam::<kinesis-owner-acct-id>:role/kinesis-cross-account-service-role"
          ]
        }
      ]
    })
  }
}
Instead of adding the ec2 actions and PassRole to its existing policy, I prefer to create a new policy and attach it to the role, because that is easier to manage and understand.
data "aws_iam_policy_document" "kinesis_pass_role" {
  statement {
    effect = "Allow"
    actions = [
      "ec2:AssociateDhcpOptions",
      "ec2:AssociateRouteTable",
      "ec2:AttachInternetGateway",
      "ec2:AttachVolume"
    ]
    resources = ["*"]
  }

  statement {
    effect = "Allow"
    actions = [
      "iam:PassRole"
    ]
    resources = [
      "arn:aws:iam::<deployment-acct-id>:role/kinesis-service-role"
    ]
  }
}

# The policy document must be materialized as an IAM policy before it can
# be attached to the role below.
resource "aws_iam_policy" "kinesis_pass_role" {
  name   = "kinesis-pass-role"
  policy = data.aws_iam_policy_document.kinesis_pass_role.json
}

resource "aws_iam_role_policy_attachment" "kinesis_pass_role" {
  role       = aws_iam_role.deployment_account_role.name
  policy_arn = aws_iam_policy.kinesis_pass_role.arn
}
Create an instance profile in AWS, and register it in Databricks, based on the role created in the previous step.
resource "aws_iam_instance_profile" "kinesis_instance_profile" {
  name = "kinesis-instance-profile"
  role = aws_iam_role.kinesis_service_role.name
}

resource "databricks_instance_profile" "kinesis_instance_profile" {
  provider             = databricks.workspace
  instance_profile_arn = aws_iam_instance_profile.kinesis_instance_profile.arn
}
Follow the databricks_cluster resource doc to add instance_profile_arn to the cluster configuration, and profit!
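A minimal sketch of that cluster configuration might look like the following. The cluster name, Spark version, and node type are placeholders I made up for illustration; pick values available in your workspace.

```hcl
resource "databricks_cluster" "kinesis_consumer" {
  provider     = databricks.workspace
  cluster_name = "kinesis-consumer" # placeholder name

  # Illustrative values; adjust to your workspace.
  spark_version           = "13.3.x-scala2.12"
  node_type_id            = "m5.large"
  num_workers             = 1
  autotermination_minutes = 30

  aws_attributes {
    # The instance profile registered in the previous step.
    instance_profile_arn = aws_iam_instance_profile.kinesis_instance_profile.arn
  }
}
```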
As far as I know, instance profiles in Databricks are mainly used for two things: giving clusters access to S3 buckets and, as in this post, connecting to Kinesis. If you plan to use the instance profile for both use cases, you will have to alter Step 2 to include the S3 access as well. I will leave that to you to figure out, since I have not tried it myself yet.
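If you do go down that route, a rough sketch of the extra statement to add to the Statement list in Step 2's inline_policy could look like this. The bucket name is a placeholder, and the exact set of s3 actions depends on your workload, so treat this as a starting point rather than a tested policy.

```hcl
# Hypothetical extra statement for the kinesis-service-role inline policy.
# <your-bucket> is a placeholder; scope the actions down to what you need.
{
  Effect = "Allow"
  Action = [
    "s3:GetObject",
    "s3:PutObject",
    "s3:ListBucket"
  ]
  Resource = [
    "arn:aws:s3:::<your-bucket>",
    "arn:aws:s3:::<your-bucket>/*"
  ]
}
```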
I hope this post helps you set up an instance profile for Databricks to connect to Amazon Kinesis. If you have any questions or suggestions, feel free to email me.