terraform-azurerm-snowflake-loader-vmss

A Terraform module which deploys the Snowplow Snowflake Loader on an Azure Virtual Machine Scale Set (VMSS).

Telemetry

By default, this module collects telemetry information and forwards it to Snowplow so that we can understand how our applications are being used. No identifying information about your sub-account and no account fingerprints are ever forwarded to us. The telemetry is very simple information about which modules and applications are deployed and active.

If you wish to subscribe to our mailing list for updates to these modules or for security advisories, please set the user_provided_id variable to a valid email address at which we can reach you.
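As a sketch, opting in to the mailing list only means adding user_provided_id to the module block. The other inputs are elided here, and the address is a placeholder rather than a real contact:

```hcl
module "sf_loader_service" {
  source = "snowplow-devops/snowflake-loader-vmss/azurerm"

  # ... required inputs as shown in the Usage section ...

  # Placeholder address: replace with a mailbox you actually monitor.
  user_provided_id = "data-team@example.com"
}
```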

How do I disable it?

To disable telemetry, set the variable telemetry_enabled = false.
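A minimal sketch of this setting inside the module block, with the required inputs elided:

```hcl
module "sf_loader_service" {
  source = "snowplow-devops/snowflake-loader-vmss/azurerm"

  # ... required inputs as shown in the Usage section ...

  # Opt out of sending telemetry to Snowplow (the default is true).
  telemetry_enabled = false
}
```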

What are you collecting?

For details on what information is collected please see this module: https://github.com/snowplow-devops/terraform-snowplow-telemetry

Usage

```hcl
module "sf_loader_service" {
  source = "snowplow-devops/snowflake-loader-vmss/azurerm"

  # Acceptance of the SLULA terms (this input defaults to false)
  accept_limited_use_license = true

  name                = var.name
  resource_group_name = var.resource_group_name
  subnet_id           = var.subnet_id

  # Event Hubs (Kafka) topic that the loader reads messages from
  queue_topic_name           = var.queue_event_hub_name
  queue_topic_kafka_password = var.queue_event_hub_read_only_primary_connection_string
  eh_namespace_name          = var.eh_namespace_name
  kafka_brokers              = var.kafka_brokers

  # Storage account and container holding the transformer output to load
  storage_account_name                          = var.storage_account_name
  storage_container_name_for_transformer_output = var.storage_container_name

  # Snowflake connection details used by the loader
  snowflake_loader_user = var.snowflake_loader_user
  snowflake_password    = var.snowflake_loader_password
  snowflake_warehouse   = var.snowflake_warehouse
  snowflake_database    = var.snowflake_database
  snowflake_schema      = var.snowflake_schema
  snowflake_region      = var.snowflake_region
  snowflake_account     = var.snowflake_account

  # SSH public key attached to the loader VMs
  ssh_public_key = var.ssh_public_key
}
```
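The Usage example reads every value from var.*. One way to back the credential inputs is a variables.tf along the following lines. The variable names mirror those in the example. Marking them sensitive is a suggested practice rather than something this module requires:

```hcl
# Sketch of variables.tf entries backing the credential inputs above.
variable "snowflake_loader_password" {
  description = "Password for the Snowflake loader user"
  type        = string
  sensitive   = true # redacted in plan/apply output (still stored in state)
}

variable "queue_event_hub_read_only_primary_connection_string" {
  description = "Read-only Event Hubs connection string, used as the Kafka password"
  type        = string
  sensitive   = true
}

variable "ssh_public_key" {
  description = "SSH public key for access to the loader VMs"
  type        = string
}
```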

Requirements

Name Version
terraform >= 1.0.0
azuread >= 2.39.0
azurerm >= 3.58.0
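A root-module terraform block that satisfies these constraints might look like the following. The hashicorp/azuread and hashicorp/azurerm source addresses are the standard public-registry namespaces. They are assumed here and are not stated in this README:

```hcl
terraform {
  required_version = ">= 1.0.0"

  required_providers {
    azuread = {
      source  = "hashicorp/azuread"
      version = ">= 2.39.0"
    }
    azurerm = {
      source  = "hashicorp/azurerm"
      version = ">= 3.58.0"
    }
  }
}

# The azurerm provider requires a (possibly empty) features block.
provider "azurerm" {
  features {}
}
```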

Providers

Name Version
azuread >= 2.39.0
azurerm >= 3.58.0

Modules

Name Source Version
service snowplow-devops/service-vmss/azurerm 0.1.1
telemetry snowplow-devops/telemetry/snowplow 0.5.0

Resources

Name Type
azuread_application.app_registration resource
azuread_application_password.app_password resource
azuread_service_principal.sp resource
azurerm_eventhub_consumer_group.queue_topic resource
azurerm_network_security_group.nsg resource
azurerm_network_security_rule.egress_tcp_443 resource
azurerm_network_security_rule.egress_tcp_80 resource
azurerm_network_security_rule.egress_udp_123 resource
azurerm_network_security_rule.egress_udp_statsd resource
azurerm_network_security_rule.ingress_tcp_22 resource
azurerm_role_assignment.staging_blob_contributor_app_ra resource
azurerm_role_assignment.storage_account_blob_delegator_app_ra resource
azurerm_role_assignment.transformer_output_blob_contributor_app_ra resource
azuread_client_config.current data source
azurerm_resource_group.rg data source
azurerm_storage_account.storage_account data source
azurerm_storage_container.staging_sc data source
azurerm_storage_container.transformer_output_sc data source

Inputs

Name Description Type Default Required
kafka_brokers The brokers to configure for access to the Kafka cluster (note: by default, the Event Hubs namespace broker) string n/a yes
name A name which will be pre-pended to the resources created string n/a yes
queue_topic_kafka_password Password for connection to the Kafka cluster under PlainLoginModule (note: by default, the Event Hubs topic connection string for reading is expected) string n/a yes
queue_topic_name The name of the queue Event Hubs topic that the loader will read messages from string n/a yes
resource_group_name The name of the resource group to deploy the service into string n/a yes
snowflake_account Snowflake account string n/a yes
snowflake_database Snowflake database name string n/a yes
snowflake_loader_user Snowflake username used by the loader to perform loading string n/a yes
snowflake_password Password for snowflake_loader_user, used by the loader to perform loading string n/a yes
snowflake_region Snowflake region string n/a yes
snowflake_schema Snowflake schema name string n/a yes
snowflake_warehouse Snowflake warehouse name string n/a yes
ssh_public_key The SSH public key attached for access to the servers string n/a yes
storage_account_name Storage Account name where data to load is stored string n/a yes
storage_container_name_for_transformer_output Storage Container name for transformer output - must be within 'storage_account_name' string n/a yes
subnet_id The subnet ID to deploy the service into string n/a yes
accept_limited_use_license Acceptance of the SLULA terms (https://docs.snowplow.io/limited-use-license-1.0/) bool false no
app_version App version to use. This variable facilitates the development flow; the modules may not work with anything other than the default value. string "5.7.5" no
associate_public_ip_address Whether to assign a public IP address to this instance bool true no
custom_iglu_resolvers The custom Iglu Resolvers that will be used by Stream Shredder list(object({ name = string, priority = number, uri = string, api_key = string, vendor_prefixes = list(string) })) [] no
default_iglu_resolvers The default Iglu Resolvers that will be used by Stream Shredder list(object({ name = string, priority = number, uri = string, api_key = string, vendor_prefixes = list(string) })) [{ "api_key": "", "name": "Iglu Central", "priority": 10, "uri": "http://iglucentral.com", "vendor_prefixes": [] }, { "api_key": "", "name": "Iglu Central - Mirror 01", "priority": 20, "uri": "http://mirror01.iglucentral.com", "vendor_prefixes": [] }] no
eh_namespace_name The name of the Event Hubs namespace (note: if you are not using Event Hubs, leave this blank) string "" no
folder_monitoring_enabled Whether folder monitoring should be activated or not bool false no
folder_monitoring_period How often the folder should be checked by folder monitoring string "8 hours" no
folder_monitoring_since Specifies since when folder monitoring will check string "14 days" no
folder_monitoring_until Specifies until when folder monitoring will check string "6 hours" no
health_check_enabled Whether health check should be enabled or not bool false no
health_check_freq Frequency of health check string "1 hour" no
health_check_timeout How long to wait for a response for health check query string "1 min" no
java_opts Custom JAVA Options string "-XX:InitialRAMPercentage=75 -XX:MaxRAMPercentage=75" no
kafka_source The source providing the Kafka connectivity (def: azure_event_hubs) string "azure_event_hubs" no
queue_topic_kafka_username Username for connection to the Kafka cluster under PlainLoginModule (default: '$ConnectionString', which is used for Event Hubs) string "$ConnectionString" no
retry_period How often a batch of failed folders should be pulled into a discovery queue string "10 min" no
retry_queue_enabled Whether retry queue should be enabled or not bool false no
retry_queue_interval Artificial pause after each failed folder is added to the queue string "10 min" no
retry_queue_max_attempt How many attempts to make for each folder number -1 no
retry_queue_size How many failures should be kept in memory number -1 no
sentry_dsn DSN for Sentry instance string "" no
sentry_enabled Whether Sentry should be enabled or not bool false no
sp_tracking_app_id App id for Snowplow tracking string "" no
sp_tracking_collector_url Collector URL for Snowplow tracking string "" no
sp_tracking_enabled Whether Snowplow tracking should be activated or not bool false no
ssh_ip_allowlist The list of CIDR ranges to allow SSH traffic from list(string) ["0.0.0.0/0"] no
statsd_enabled Whether Statsd should be enabled or not bool false no
statsd_host Hostname of StatsD server string "" no
statsd_port Port of StatsD server number 8125 no
stdout_metrics_enabled Whether logging metrics to stdout should be activated or not bool false no
storage_container_name_for_folder_monitoring_staging Storage Container name for folder monitoring to stage data - must be within 'storage_account_name' (NOTE: must be set if 'folder_monitoring_enabled' is true) string "" no
tags The tags to append to this resource map(string) {} no
telemetry_enabled Whether or not to send telemetry information back to Snowplow Analytics Ltd bool true no
user_provided_id An optional unique identifier to identify the telemetry events emitted by this stack string "" no
vm_sku The instance type to use string "Standard_B2s" no
webhook_collector URL of webhook collector string "" no
webhook_enabled Whether webhook should be enabled or not bool false no
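The following sketch combines a few optional inputs from the table above. The container name, the Iglu registry name, URI and vendor prefix, the CIDR range, and var.iglu_api_key are all hypothetical placeholders. The table notes that storage_container_name_for_folder_monitoring_staging must be set whenever folder_monitoring_enabled is true:

```hcl
module "sf_loader_service" {
  source = "snowplow-devops/snowflake-loader-vmss/azurerm"

  # ... required inputs as shown in the Usage section ...

  # Restrict SSH to a known range instead of the default ["0.0.0.0/0"].
  ssh_ip_allowlist = ["10.0.0.0/16"] # placeholder CIDR

  # Folder monitoring needs a staging container in the same storage account.
  folder_monitoring_enabled                            = true
  storage_container_name_for_folder_monitoring_staging = "folder-monitoring-staging" # hypothetical

  # An additional private Iglu registry; all values are hypothetical.
  custom_iglu_resolvers = [
    {
      name            = "Private Iglu"
      priority        = 0
      uri             = "https://iglu.example.com/api"
      api_key         = var.iglu_api_key # hypothetical variable
      vendor_prefixes = ["com.example"]
    }
  ]
}
```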

Outputs

Name Description
nsg_id ID of the network security group attached to the Loader Server nodes
vmss_id ID of the VM scale-set
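To surface these values from a root module, they can be re-exported. For example (the output names loader_vmss_id and loader_nsg_id are arbitrary choices):

```hcl
output "loader_vmss_id" {
  description = "ID of the Snowflake Loader VM scale-set"
  value       = module.sf_loader_service.vmss_id
}

output "loader_nsg_id" {
  description = "ID of the network security group attached to the Loader Server nodes"
  value       = module.sf_loader_service.nsg_id
}
```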

Copyright and license

Copyright 2023-present Snowplow Analytics Ltd.

Licensed under the Snowplow Limited Use License Agreement. (If you are uncertain how it applies to your use case, check our answers to frequently asked questions.)