A Terraform module which deploys the Snowplow Snowflake Loader on an Azure virtual machine.
By default, this module collects and forwards telemetry information to Snowplow so that we can understand how our applications are being used. No identifying information about your sub-account or account fingerprints is ever forwarded to us; the telemetry is limited to very simple information about which modules and applications are deployed and active.
If you wish to subscribe to our mailing list for updates to these modules or security advisories, please set the user_provided_id variable to include a valid email address at which we can reach you.
To disable telemetry, simply set the variable telemetry_enabled = false.
For details on what information is collected please see this module: https://github.com/snowplow-devops/terraform-snowplow-telemetry
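As a minimal sketch of the two telemetry settings described above, the fragment below shows where they would sit in the module block. The email address is a placeholder, and the required inputs are deliberately left out, so this fragment is illustrative rather than deployable on its own.

```hcl
module "sf_loader_service" {
  source = "snowplow-devops/snowflake-loader-vmss/azurerm"

  # ... required inputs omitted for brevity (see the usage example) ...

  # Option 1: keep telemetry on (the default) and subscribe to updates.
  # The address below is a placeholder; replace it with your own.
  user_provided_id = "ops@example.com"

  # Option 2: opt out of telemetry entirely by uncommenting this line.
  # telemetry_enabled = false
}
```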
module "sf_loader_service" {
  source = "snowplow-devops/snowflake-loader-vmss/azurerm"

  accept_limited_use_license = true

  name                = var.name
  resource_group_name = var.resource_group_name
  subnet_id           = var.subnet_id

  queue_topic_name           = var.queue_event_hub_name
  queue_topic_kafka_password = var.queue_event_hub_read_only_primary_connection_string
  eh_namespace_name          = var.eh_namespace_name
  kafka_brokers              = var.kafka_brokers

  storage_account_name                          = var.storage_account_name
  storage_container_name_for_transformer_output = var.storage_container_name

  snowflake_loader_user = var.snowflake_loader_user
  snowflake_password    = var.snowflake_loader_password
  snowflake_warehouse   = var.snowflake_warehouse
  snowflake_database    = var.snowflake_database
  snowflake_schema      = var.snowflake_schema
  snowflake_region      = var.snowflake_region
  snowflake_account     = var.snowflake_account

  ssh_public_key = var.ssh_public_key
}
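The usage example passes its values through input variables that this README does not declare. The sketch below shows how the two secret-bearing ones might be declared in the calling configuration, assuming the variable names used in the example; the descriptions are illustrative. Marking them as sensitive keeps their values out of plan and apply output.

```hcl
# Declarations for the calling configuration; names match the usage example.
variable "queue_event_hub_read_only_primary_connection_string" {
  description = "Read-only connection string for the Event Hubs topic the loader reads from"
  type        = string
  sensitive   = true
}

variable "snowflake_loader_password" {
  description = "Password for the Snowflake user that performs loading"
  type        = string
  sensitive   = true
}
```

Values for these variables can then be supplied through a tfvars file or TF_VAR_ environment variables rather than being committed alongside the configuration.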
Name | Version |
---|---|
terraform | >= 1.0.0 |
azuread | >= 2.39.0 |
azurerm | >= 3.58.0 |
Name | Version |
---|---|
azuread | >= 2.39.0 |
azurerm | >= 3.58.0 |
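The version constraints listed in the two tables above can be mirrored in the calling configuration roughly as follows. The provider source addresses are the usual HashiCorp registry addresses and are an assumption here, not something stated in this README; authentication and subscription settings are left out.

```hcl
terraform {
  required_version = ">= 1.0.0"

  required_providers {
    azuread = {
      source  = "hashicorp/azuread" # assumed registry address
      version = ">= 2.39.0"
    }
    azurerm = {
      source  = "hashicorp/azurerm" # assumed registry address
      version = ">= 3.58.0"
    }
  }
}

# The azurerm provider requires a features block, even if it is empty.
provider "azurerm" {
  features {}
}
```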
Name | Source | Version |
---|---|---|
service | snowplow-devops/service-vmss/azurerm | 0.1.1 |
telemetry | snowplow-devops/telemetry/snowplow | 0.5.0 |
Name | Type |
---|---|
azuread_application.app_registration | resource |
azuread_application_password.app_password | resource |
azuread_service_principal.sp | resource |
azurerm_eventhub_consumer_group.queue_topic | resource |
azurerm_network_security_group.nsg | resource |
azurerm_network_security_rule.egress_tcp_443 | resource |
azurerm_network_security_rule.egress_tcp_80 | resource |
azurerm_network_security_rule.egress_udp_123 | resource |
azurerm_network_security_rule.egress_udp_statsd | resource |
azurerm_network_security_rule.ingress_tcp_22 | resource |
azurerm_role_assignment.staging_blob_contributor_app_ra | resource |
azurerm_role_assignment.storage_account_blob_delegator_app_ra | resource |
azurerm_role_assignment.transformer_output_blob_contributor_app_ra | resource |
azuread_client_config.current | data source |
azurerm_resource_group.rg | data source |
azurerm_storage_account.storage_account | data source |
azurerm_storage_container.staging_sc | data source |
azurerm_storage_container.transformer_output_sc | data source |
Name | Description | Type | Default | Required |
---|---|---|---|---|
kafka_brokers | The brokers to configure for access to the Kafka Cluster (note: by default the EventHubs namespace broker) | string | n/a | yes |
name | A name which will be prepended to the resources created | string | n/a | yes |
queue_topic_kafka_password | Password for connection to Kafka cluster under PlainLoginModule (note: by default the EventHubs topic connection string for reading is expected) | string | n/a | yes |
queue_topic_name | The name of the queue Event Hubs topic that the loader will read messages from | string | n/a | yes |
resource_group_name | The name of the resource group to deploy the service into | string | n/a | yes |
snowflake_account | Snowflake account | string | n/a | yes |
snowflake_database | Snowflake database name | string | n/a | yes |
snowflake_loader_user | Snowflake username used by loader to perform loading | string | n/a | yes |
snowflake_password | Password for snowflake_loader_user used by loader to perform loading | string | n/a | yes |
snowflake_region | Snowflake region | string | n/a | yes |
snowflake_schema | Snowflake schema name | string | n/a | yes |
snowflake_warehouse | Snowflake warehouse name | string | n/a | yes |
ssh_public_key | The SSH public key attached for access to the servers | string | n/a | yes |
storage_account_name | Storage Account name where data to load is stored | string | n/a | yes |
storage_container_name_for_transformer_output | Storage Container name for transformer output - must be within 'storage_account_name' | string | n/a | yes |
subnet_id | The subnet id to deploy the service into | string | n/a | yes |
accept_limited_use_license | Acceptance of the SLULA terms (https://docs.snowplow.io/limited-use-license-1.0/) | bool | false | no |
app_version | App version to use. This variable facilitates dev flow; the modules may not work with anything other than the default value. | string | "5.7.5" | no |
associate_public_ip_address | Whether to assign a public ip address to this instance | bool | true | no |
custom_iglu_resolvers | The custom Iglu Resolvers that will be used by Stream Shredder | list(object({ | [] | no |
default_iglu_resolvers | The default Iglu Resolvers that will be used by Stream Shredder | list(object({ | [ | no |
eh_namespace_name | The name of the Event Hubs namespace (note: if you are not using EventHubs leave this blank) | string | "" | no |
folder_monitoring_enabled | Whether folder monitoring should be activated or not | bool | false | no |
folder_monitoring_period | How often the folder should be checked by folder monitoring | string | "8 hours" | no |
folder_monitoring_since | Specifies since when folder monitoring will check | string | "14 days" | no |
folder_monitoring_until | Specifies until when folder monitoring will check | string | "6 hours" | no |
health_check_enabled | Whether health check should be enabled or not | bool | false | no |
health_check_freq | Frequency of health check | string | "1 hour" | no |
health_check_timeout | How long to wait for a response for health check query | string | "1 min" | no |
java_opts | Custom JAVA Options | string | "-XX:InitialRAMPercentage=75 -XX:MaxRAMPercentage=75" | no |
kafka_source | The source providing the Kafka connectivity (def: azure_event_hubs) | string | "azure_event_hubs" | no |
queue_topic_kafka_username | Username for connection to Kafka cluster under PlainLoginModule (default: '$ConnectionString' which is used for EventHubs) | string | "$ConnectionString" | no |
retry_period | How often a batch of failed folders should be pulled into a discovery queue | string | "10 min" | no |
retry_queue_enabled | Whether retry queue should be enabled or not | bool | false | no |
retry_queue_interval | Artificial pause after each failed folder being added to the queue | string | "10 min" | no |
retry_queue_max_attempt | How many attempts to make for each folder | number | -1 | no |
retry_queue_size | How many failures should be kept in memory | number | -1 | no |
sentry_dsn | DSN for Sentry instance | string | "" | no |
sentry_enabled | Whether Sentry should be enabled or not | bool | false | no |
sp_tracking_app_id | App id for Snowplow tracking | string | "" | no |
sp_tracking_collector_url | Collector URL for Snowplow tracking | string | "" | no |
sp_tracking_enabled | Whether Snowplow tracking should be activated or not | bool | false | no |
ssh_ip_allowlist | The comma-separated list of CIDR ranges to allow SSH traffic from | list(string) | [ | no |
statsd_enabled | Whether Statsd should be enabled or not | bool | false | no |
statsd_host | Hostname of StatsD server | string | "" | no |
statsd_port | Port of StatsD server | number | 8125 | no |
stdout_metrics_enabled | Whether logging metrics to stdout should be activated or not | bool | false | no |
storage_container_name_for_folder_monitoring_staging | Storage Container name for folder monitoring to stage data - must be within 'storage_account_name' (NOTE: must be set if 'folder_monitoring_enabled' is true) | string | "" | no |
tags | The tags to append to this resource | map(string) | {} | no |
telemetry_enabled | Whether or not to send telemetry information back to Snowplow Analytics Ltd | bool | true | no |
user_provided_id | An optional unique identifier to identify the telemetry events emitted by this stack | string | "" | no |
vm_sku | The instance type to use | string | "Standard_B2s" | no |
webhook_collector | URL of webhook collector | string | "" | no |
webhook_enabled | Whether webhook should be enabled or not | bool | false | no |
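Some optional inputs in the table above depend on each other; for instance, folder monitoring needs a staging container to be named. A hedged sketch of overriding a few optional inputs is shown below. The variable var.staging_container_name and the CIDR range are hypothetical placeholders, and the required inputs are omitted, so the fragment is illustrative only.

```hcl
module "sf_loader_service" {
  source = "snowplow-devops/snowflake-loader-vmss/azurerm"

  # ... required inputs omitted for brevity ...

  # Folder monitoring requires a staging container within 'storage_account_name'.
  folder_monitoring_enabled                            = true
  storage_container_name_for_folder_monitoring_staging = var.staging_container_name # hypothetical variable

  # Limit SSH ingress to one range (placeholder documentation CIDR).
  ssh_ip_allowlist = ["203.0.113.0/24"]

  # Tags are passed as a plain map of strings.
  tags = {
    environment = "dev"
  }
}
```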
Name | Description |
---|---|
nsg_id | ID of the network security group attached to the Loader Server nodes |
vmss_id | ID of the VM scale-set |
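As a short sketch, the two outputs above can be re-exported from the calling configuration so that other tooling can read them. The output names on the left-hand side are hypothetical; the module name assumes the sf_loader_service block from the usage example.

```hcl
# Re-export the module outputs from the root configuration.
output "sf_loader_vmss_id" {
  description = "ID of the VM scale-set running the Snowflake Loader"
  value       = module.sf_loader_service.vmss_id
}

output "sf_loader_nsg_id" {
  description = "ID of the network security group attached to the loader nodes"
  value       = module.sf_loader_service.nsg_id
}
```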
Copyright 2023-present Snowplow Analytics Ltd.
Licensed under the Snowplow Limited Use License Agreement. (If you are uncertain how it applies to your use case, check our answers to frequently asked questions.)