Automating LookML Code Review with lkml and GitHub Actions • ainoya.dev

Looker has become an indispensable tool for many data analytics teams. Its modeling language, LookML, lets users define data models as code and manage projects through Git, which brings version control and collaborative development to the modeling layer. As teams grow, however, keeping everyone aligned on LookML coding standards becomes harder, especially when it depends on manual code review.

Spectacles is a particularly useful tool here: it helps validate LookML code. Still, teams sometimes need custom static analysis to enforce their own coding practices. The lkml Python library fills that gap: it parses LookML into plain Python data structures, which makes custom checks easy to script.

For instance, consider a team that wants to enforce a consistent data protection policy by requiring every LookML explore definition to set access_filter. With lkml, a short Python script can check this automatically.

Here’s an example script that parses LookML files, finds explore definitions, and checks that each one sets an access_filter whose user_attribute is tenant_id:

# Parse every LookML file in the repository and verify that each explore
# sets an access_filter on the tenant_id user attribute. Intended for CI.
import glob
import pprint

import lkml

for lkml_file_path in glob.glob('**/*.lkml', recursive=True):
    with open(lkml_file_path) as file:
        print("reading: {}".format(lkml_file_path))
        result = lkml.load(file)

    if 'explores' not in result:
        print("{} does not contain explore definitions. Skipping check for access_filter conditions.".format(lkml_file_path))
        continue
    print("{} contains explore definitions. Checking access_filter conditions.".format(lkml_file_path))

    for explore in result['explores']:
        print("Checking access_filter conditions for explore name: {}".format(explore['name']))
        try:
            # lkml groups repeated access_filter blocks under the plural key.
            access_filter = explore['access_filters'][0]
        except (KeyError, IndexError):
            raise Exception("Please set access_filter conditions in explore. There are issues with the scope of data exposure.")
        pprint.pprint(access_filter)
        # .get() keeps a filter without user_attribute from being reported
        # as a missing access_filter.
        if access_filter.get('user_attribute') != 'tenant_id':
            raise Exception("Please specify user_attribute: tenant_id in access_filter conditions. There are issues with the scope of data exposure.")
        print("ok")

Integrated into a continuous integration (CI) pipeline, this script enforces the standard more efficiently and reliably than manual code review. Because it raises an exception on the first violation, the process exits with a non-zero status and the CI job fails. Teams that host their repositories on GitHub can run it with GitHub Actions.
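The script above stops at the first explore that breaks the policy, so a pull request with several problems needs several CI runs to surface them all. One possible variation, sketched below, collects every violation before failing. The function name find_violations, the constant REQUIRED_USER_ATTRIBUTE, and the sample dictionary are hypothetical. The dictionary shape is assumed from the keys the script above reads from lkml.load() and is hand-written here rather than real parser output. Unlike the original, this sketch accepts the tenant_id filter in any position, not only the first.

```python
from typing import Any

REQUIRED_USER_ATTRIBUTE = "tenant_id"  # same attribute the script above checks


def find_violations(parsed: dict[str, Any], source: str = "<unknown>") -> list[str]:
    """Return one message per explore that breaks the access_filter policy.

    `parsed` is assumed to look like the dictionary the script above gets
    from lkml.load(): {"explores": [{"name": ..., "access_filters": [...]}]}.
    """
    violations = []
    for explore in parsed.get("explores", []):
        name = explore.get("name", "<unnamed>")
        filters = explore.get("access_filters", [])
        if not filters:
            violations.append(f"{source}: explore '{name}' has no access_filter")
            continue
        # Accept the required attribute on any filter, not only the first one.
        attributes = [flt.get("user_attribute") for flt in filters]
        if REQUIRED_USER_ATTRIBUTE not in attributes:
            violations.append(
                f"{source}: explore '{name}' has no access_filter on "
                f"user_attribute '{REQUIRED_USER_ATTRIBUTE}'"
            )
    return violations


# Hand-written sample in the assumed parsed shape (not real lkml output).
sample = {
    "explores": [
        {
            "name": "orders",
            "access_filters": [
                {"field": "orders.tenant_id", "user_attribute": "tenant_id"}
            ],
        },
        {"name": "customers"},
        {
            "name": "events",
            "access_filters": [{"field": "events.region", "user_attribute": "region"}],
        },
    ]
}

for message in find_violations(sample, "sample.model.lkml"):
    print(message)
```

In a CI script, the same function could be called on the result of lkml.load() for each file, with the process exiting non-zero (for example via sys.exit(1)) once all messages have been printed.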

Below is a sample GitHub Actions workflow definition that automates the execution of the script upon every push to the repository:

name: lookml test

on:
  push:

jobs:
  lookml-test:
    name: lookml-test
    timeout-minutes: 10
    runs-on: ubuntu-latest
    steps:
    # Release tag assumed here; pin the version your team uses.
    - uses: actions/checkout@v4
    - name: Set up Python 3.10
      uses: actions/setup-python@v4
      with:
        python-version: "3.10"
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install lkml
    - name: Test
      run: |
        python scripts/lkml_validator/test.py

This setup not only saves time but also enhances the reliability of code reviews by automating the validation of access_filter configurations in LookML files. It’s a practical example of how tools like lkml and GitHub Actions can streamline development processes in the data analytics domain.