Emilie Schario, a data analyst at GitLab, Inc., gave a presentation sharing her knowledge of DevOps for Data Engineering.
Every change to the web app creates a review app and runs quality tests. Why are the standards for data teams not the same? At GitLab, they’re adopting DataOps, applying the best practices of the DevOps lifecycle to data and furthering the premise that analytics is a subfield of software engineering. In this presentation, Emilie will share the merge-request-first workflow they’ve adopted at GitLab and its effects on the business. The entire analytics stack, from ELT to visualization, is version controlled. All changes are made in merge requests for testability and accountability. Every merge request has its own clone of the data warehouse so that there are no discrepancies between development and production results. Through the open source tool dbt, all transformations in the data warehouse are version controlled, and documentation is created and stored. All of the ELT jobs, tests, and builds are orchestrated by GitLab CI. Using these processes has enabled a three-person data team to support the data needs of a billion-dollar company.
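The clone-per-merge-request idea can be sketched roughly as follows. This is a minimal illustration, not GitLab's actual implementation: the naming scheme, the `ANALYTICS` warehouse name, and the helpers `clone_name` and `merge_request_plan` are all hypothetical, and the steps are plain descriptions rather than real warehouse or dbt commands.

```python
import re
from typing import List


def clone_name(branch: str, prefix: str = "ANALYTICS") -> str:
    """Derive a warehouse-safe clone name from a branch name.

    Hypothetical scheme: non-alphanumeric runs become underscores,
    and the result is upper-cased and prefixed.
    """
    slug = re.sub(r"[^A-Za-z0-9]+", "_", branch).strip("_").upper()
    return f"{prefix}_MR_{slug}"


def merge_request_plan(branch: str, production: str = "ANALYTICS") -> List[str]:
    """List the steps a CI pipeline might run for one merge request.

    Each step is only a description; a real pipeline would call the
    warehouse and the transformation tool here (assumption).
    """
    clone = clone_name(branch, production)
    return [
        f"clone warehouse {production} -> {clone}",
        f"run transformations against {clone}",
        f"run tests against {clone}",
        f"drop {clone} after merge",
    ]


if __name__ == "__main__":
    for step in merge_request_plan("feature/add-revenue-model"):
        print(step)
```

In a GitLab CI job, the branch name could plausibly come from a predefined variable such as `CI_COMMIT_REF_SLUG`, so each merge request would get a stable, isolated clone to build and test against before anything reaches production.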