Home » Running Power BI On-Premises Data Gateway on AWS for $0.12 a day

Running Power BI On-Premises Data Gateway on AWS for $0.12 a day

Today, I’d like to discuss in detail how to deploy a fully functional (and reliable) Power BI Data Gateway, and the difficulties to overcome in doing so. As a Christmas present, you’ll get code snippets and a fully functional GitHub repository to clone with Terraform code to deploy :)

What the h… is Power BI On-Premises Data Gateway and why on earth would you use it?

Why would you even need Power BI when you have the AWS cloud and its extensive range of data (management and analytics service (just to name a few: Glue ETL, S3 storage, Athena in-place querying, Redshift warehousing and Quicksight data visualisation) ?) ?

Hum.. remember how business users love Excel? Well, Power BI is the new Excel :) Its ability to pop a beautiful dashboard(if you’ve been in the software industry, you probably know from experience that no software sells well without a dashboard feature, even though no one will ever use it for real) from both a local xlsx file and a database makes them believe that data management is a piece of cake (sweeping under the rug the hardships of data lineage, cataloging and “minor” issues like scalability, security, governance).

So your app’s data is in the cloud, and someone wants to analyze it using Power BI. Being a good professional, you won’t let them run queries on your production database. Here comes the “Power BI On-Premise Data Gateway”!

  • The Gateway essentially proxies your app database for the Power BI service, removing the need for network connectivity between the service (or its users) and the database.

 

  • If you have no control on what users will do in Power BI, I advise you to avoid the Direct Query mode and set up a schedule data refresh in what Power BI names a Dataflow (a blob storage that users will then query).

Here’s what we’ll deploy

The code in the repository at the end of this article deploys the following set up:

The reasons for setting up the Data gateway as an EC2 instance within an AWS AutoScaling Group are the following:

  • Cost (and environmental footprint): the ASG is a nice way to schedule the creation and termination of our EC2 instance. In order to perform a scheduled refresh, we just need the instance to boot a couple of minutes before the refresh is triggered and we can kill it right afterwards. In this case, I also use spot request to cut cost by ~30%.. I might skip one refresh if AWS has no resources to spare. An m5a.large (the instance type that matches Microsoft rather greedy requirements), run 3 times a day, 20 minutes every time, will cost me $0.12 a day, that is less than $50 a year.
  • Reliability without any maintenance: the instance boots from the same image every time. This way, we’ll never face issues like a full disk, memory leaks or Windows getting slow (yeah, the Gateway doit run on a Windows server). Cattle, not pets!

Since Power BI experts told me the Gateway might need regular updates to follow Power BI’s release cycle, my initial design involved an EC2 Image Builder Pipeline to generate a brand new machine image, combining a. the latest Windows AMI b. the latest version of the Power BI Gateway package, and c. a scripted installation and configuration of the gateway.

Alas! The gateway is unfortunately a rather unfinished product, with major drawbacks when it comes to devops / including it in a continuous integration pipeline :

  • Even though Microsoft engineers came up with a Powershell module to automate set up tasks, the provided cmdlets can only create a full gateway cluster, but can’t register an installation of the gateway package as an additional member of an existing cluster. So much for idempotence.
  • The gateway app relies on external drivers. The driver for PostgreSQL is npgsql (in a rather old version, 4.0.10) and needs to be installed with a non-default option (install it in the Global Assembly Cache, or GAC) that is unavailable with the unattended / quiet install So much for automation.

For these reasons I had to come to terms with the fact that I’ll have to update the windows image manually.

Deploying the infrastructure

The infrastructure itself is quite simple. In the Autoscaling module :

  • We define schedules block that tell when to scal up or down.
  • We make sure we procure instances at spot price
  • We give the instance a role that enables to connect using Remote Desktop Protocol via AWS Systems Manager
  • We make sure that the instance can reach both the database and the Power BI service

Et voilà !

The fully functional Terraform code is available in this repository in my Github account: https://github.com/psantus/powerbi-onpremises-data-gateway.terraform

First, deploy it with AWS’s standard Windows AMI. Then, once you done the initial set-up described below, make a VM Snapshot and tell the ASG to use it for subsequent machine creation.

Initial set-up of the Power BI gateway

After you’ve deployed the standard Windows image, you can follow these steps to install the Power BI Gateway packages and configure them as well as the Power BI service.

  • Connect to the machine via AWS Systems Manager
  • Install Powershell 7 (yes, the DataGateway module is only compatible with that version) :
  • msiexec.exe /package https://github.com/PowerShell/PowerShell/releases/download/v7.2.6/PowerShell-7.2.6-win-x64.msi /quiet ADD_EXPLORER_CONTEXT_MENU_OPENPOWERSHELL=1 ADD_FILE_CONTEXT_MENU_RUNPOWERSHELL=1 ENABLE_PSREMOTING=1 REGISTER_MANIFEST=1 USE_MU=1 ENABLE_MU=1 ADD_PATH=1
  • Launch a PowerShell 7 session then install the DataGateway module
    pwsh
    Install-Module DataGateway -Force
    Import-Module DataGateway -Force
  • Create an Azure AD registered App. For that part you need to be Azure AD admin. The App needs to have Tenant.ReadWrite.All access on the PowerBI Service.
  • Create a secret. Note the following for later use: - App secret - ClientID - TenantID
  • Login on the Power BI service with the aforementioned app. Here, I assume the secret has been paste in a file named “secret.txt” on the machine Desktop (don’t forget to delete it!)
  • # Set secret in a secure string
    $secureClientSecret = (cat .\Desktop\secret.txt | ConvertTo-SecureString -AsPlainText -Force)
    
    # Connect to the PowerBI Service
    Connect-DataGatewayServiceAccount -ApplicationId $AppId -ClientSecret $secureClientSecret -Tenant $TenantId
    
  • Install the Power BI Gateway package
  • Install-DataGateway -AcceptConditions
    
    # Restart the service after installation (removes some random errors)
    net stop PBIEgwService
    net start PBIEgwService
    
  • Install the Npgsql v4.0.10 PostgreSQL driver. Only this version works (we need a version of Npgsql that relies on the same .NET framework version as the Power BI Gateway app cf. Microsoft site). When running the MSI, make sure you check the “Npgsql GAC Installation” checkbox.
  • Create a DataGateway on the Power BI service and configure the app to register as this cluster:
  • $GateWayDetails = Add-DataGatewayCluster -GatewayName "My Gateway" -RecoveryKey $secureClientSecret -OverwriteExistingGateway
    # Get gateways and find the one you just created
    Get-DataGatewayCluster
    # Put its detail in a variable 
    $GateWayDetails = Get-DataGatewayCluster -Id "xxxxxx-xxxxxx-xxxxxx-xxxxx"
  • Grant admin permissions on the gateway to a user (or rather a group of users)
  • Add-DataGatewayClusterUser -GatewayClusterId $GateWayDetails.Id -PrincipalObjectId "Azure AD User or Group ID here" -AllowedDataSourceTypes $null -Role Admin
  • From there you should see your gateway appear in Power BI web interface https://app.powerbi.com/groups/me/gateways
  • You can also grant more limited permissions to your gateway, for instance just the ability to connect the gateway to a PostgreSQL DB.
  • $dsTypes = New-Object 'System.Collections.Generic.List[Microsoft.PowerBI.ServiceContracts.Api.DatasourceType]'
    $dsTypes.Add([Microsoft.PowerBI.ServiceContracts.Api.DataSourceType]::PostgreSql)
    Add-DataGatewayClusterUser -GatewayClusterId $GateWayDetails.Id -PrincipalObjectId "Azure AD User or Group ID here" -AllowedDataSourceTypes $dsTypes -Role ConnectionCreator
    

Resources

The fully functional Terraform code is available in this repository in my Github account: https://github.com/psantus/powerbi-onpremises-data-gateway.terraform

If you found this blog post useful, or are graceful enough to suggest improvements (that’s my first post, so I’m sure there’s room for some!), or have questions, just leave a comment!

Leave a Reply