Introduction
Snowflake offers an elegant data warehouse (DWH) solution developed specifically for the cloud. Customers do not need to install or maintain software or hardware, because Snowflake DWH operates as Software-as-a-Service (SaaS) on the main cloud platforms (Amazon Web Services, Google Cloud Storage and Microsoft Azure). Snowflake’s architecture is marked by data storage being independent from compute resources. This concept enables efficient usage and scaling of both in real-time. Thus, Snowflake can meet dynamically changing requirements of modern businesses. In this blog article, we will take first steps together towards migrating a DWH into Snowflake.
Step 1: Administration
The first task is to arrange the Snowflake account access, users and roles according to your business structure and use case. The access control in Snowflake uses two models:
- Discretionary Access Control (DAC): each object (database, schema, table) has an owner; an owner grants access to objects.
- Role-Base-Access-Control (RBAC): access privileges are assigned to roles; roles are assigned to users.
Snowflake has the following default system-defined roles: ACCOUNTADMIN, SECURITYADMIN, SYSADMIN, USERADMIN, and PUBLIC. In addition, Snowflake allows custom-defined roles.
Two ACCOUNTSADMINs are better than one
When a user creates a Snowflake account and login for the first time, the user is assigned by default the ACCOUNTADMIN role, the most powerful role in a Snowflake account. Snowflake recommends creating another user who also has the ACCOUNTADMIN role. This is useful in various cases. For instance, if one account administrator forgets the password, another account administrator can reset the password immediately. Due to security issues, it may take Snowflake support up to two business days to reset the password for a user with the ACCOUNTADMIN role. The ACCOUNTADMIN role is designed to manage objects on the account level and should not be used to create low-level database objects (database, schema, table, etc.). Snowflake recommends creating a role hierarchy according to your business structure instead of doing everything with the ACCOUNTADMIN role. The ACCOUNTADMIN role owns the SECURITY and SYSADMIN roles.
SECURITYADMIN
The account administrator creates one or several users with the SECURITYADMIN role. The SECURITYADMIN role has the privileges to create and modify users/roles. To improve separation of duties, Snowflake recently introduced the USERADMIN role. Although the SYSADMIN role owns the USERADMIN role, now it is possible to separate role/user management (SECURITYADMIN) and role/user creation (USERADMIN). In the current blog, we will skip the USERADMIN role for simplicity, but it can be easily implemented if required by a use case.
SYSADMIN
The security administrator creates one or several users with the SYSADMIN role and custom roles. The SYSADMIN role has the privileges to create virtual warehouses, databases, and any database objects (schemas, tables, etc.). The custom roles in turn are defined according to your business structure and use case. Snowflake does not put any limitations on the maximum number of custom roles.
Sample user/role Use Case
In the illustration below, we demonstrate a sample user/role hierarchy together with the SQL commands to be used in a Snowflake worksheet to create the users and roles.
First, using the ACCOUNTADMIN role, we create another account administrator with the name COMPANY_CHEF. Snowflake recommends that the account administrators use the multi-factor authentication und a default role different from the ACCOUNTADMIN. The default role is the role automatically assigned to a user after each new login into the Snowflake account. Note that assigning a default role to a user does not automatically grant this role. Then, we create a user with the name SECURITY_OFFICER and the SECURITYADMIN role, that in turn creates
- the DWH_AMINISTRATOR user with the SYSADMIN role and
- the roles and users for different business units (DB_TEAM, DATASCIENCE_TEAM, BI_TEAM).
The new custom roles are also granted to the SYSADMIN role, i.e. DWH_ADMINISTRATOR.
Control Your Usage
In order to prevent usage expenses exceeding the limit, Snowflake offers resource monitors. A resource monitor helps control credit usage for running virtual warehouses in your account. When a certain credit limit is exceeded, a notification or other action can be triggered. The monitor can be automatically reset on a periodic basis. In the example shown here, we define a resource monitor that starts immediately after creation with the limit of 200 credits. The monitor is reset monthly. When 90 percent of credits are spent, the account administrator will receive a notification. Note that only the ACCOUNTADMIN role can create a resource monitor. We will use the monitor ESTIMATED_LIMIT_PER_MONTH in the next step when designing a virtual warehouse.
Step 2: Virtual Warehouse und Database Organization
Now, it is time to start arranging objects needed to work with data.
Virtual Warehouse
Virtual warehouse (VWH) is a compute cluster to perform SQL operations. Snowflake offers various sizes of virtual warehouses. Snowflake recommends beginning with a small cluster and then extending it while monitoring query performance. In addition, it is good practice to assign separate VWH for different business units. In this blog, we will stick to a single VWH example. We create a VWH with the extra-small size (1 credit per hour). If the workload for our cluster is too much (which is defined by the scaling policy, in our case ECONOMY), Snowflake automatically spins up one additional cluster (MAX_CLUSTER_COUNT=2). When the workload decreases, Snowflake automatically shuts down the second cluster.
Database/Schema
The last step is to create a database and schema for development activities. By default, all database objects in Snowflake have the PERMANENT type. For permanent objects, Snowflake has a fail-safe region that saves backup information for 7 days. If your data gets corrupted, you can contact Snowflake support to recover your data from the fail-safe region for the last 7 days. We create a transient database and schema which means that we do not have fail-safe costs related to these objects. This is a good option for development, but not for production. After we granted privileges to the users, they can start operating in their accounts.
Next Steps
During the next steps, the users can load data and establish connections to other tools and services already used by your business:
- AWS S3
- Microsoft Azure Blob
- Google Cloud Storage
- Informatica
- Kafka
- Databricks
- MicroStrategy
- and many others.
We covered the first steps required for migrating a DWH into Snowflake. Snowflake offers many other features that we did not mention here. For more information, see our companion blogs on Snowflake.