In this post I describe my experience preparing for and taking the Google Cloud Professional Data Engineer exam. I cover the study materials I used and a general preparation outline, and give some focus points and overall tips.
The exam consists of 50 multiple choice and multiple select questions, and you get 2 hours to complete it. As this is a Professional certification, do not take the exam lightly. You will be expected to analyse and comprehend various problem statements in the data domain and apply a solution based on a combination of services. Most of these will come from the Google Data landscape, but some knowledge of the Hadoop ecosystem is required as well.
Study material used
Below is a shortlist of links to the study material I used to pass the exam.
- Linux Academy video course
- Flash card deck accompanying the Linux Academy course
- Google practice exam
- General documentation pages for each Google product
- GCP Data Engineer study guide
- Data engineering on GCP Cheatsheet
To get familiar with the plethora of Google Data products, I started out by following the Linux Academy video course. For people not familiar with most or any of the services Google provides in their data landscape, this (or any other video course) is a good starting point. This video course in particular provides excellent high-level overviews, hands-on labs and practice questions, in combination with an online dossier summarizing the most important terminology, use cases, best practices and pitfalls to avoid per product.
When you have completed the course and practice exam, note which points require attention and re-watch related course videos. Check out some flash card decks as they are a great help in learning key characteristics of the data products and data terminology. Knowing these will help you in distilling the correct answer during the exam.
I cannot emphasize enough how much hands-on experience with any of the products helps. So grab those documentation pages, log in to the Google Cloud Console and get started. Some inspiration for small projects:
- Create a storage bucket, configure a Cloud Function to load data from a file in this bucket into BigQuery when the file lands, and create an authorized view from this dataset.
- Configure Pub/Sub to load messages into Dataflow. Configure windowing in Dataflow and stream the windowed data into BigQuery.
- Clean up a file using Dataprep, making use of the Dataprep recommendations and field suggestions.
- Create a Cloud Datastore project and create some ‘tables’ (Datastore calls these kinds, and their rows entities). Set up parent-child relationships between the entities. As a follow-up, export Datastore to Cloud Storage and load this export into BigQuery. Please note that Datastore has its own counterparts to relational database concepts; refer to the documentation page for an overview.
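To get a feel for what windowing means before configuring it in Dataflow, here is a minimal plain-Python sketch of fixed (tumbling) windows. It does not use the Dataflow or Apache Beam API; the function name `assign_fixed_windows` and the sample events are made up for illustration.

```python
from collections import defaultdict


def assign_fixed_windows(events, window_size_seconds):
    """Group (timestamp, value) pairs into fixed, non-overlapping windows.

    Returns a dict mapping each window start time to the list of values
    whose timestamps fall in [start, start + window_size_seconds).
    """
    windows = defaultdict(list)
    for timestamp, value in events:
        # Round the timestamp down to the start of its window.
        window_start = timestamp - (timestamp % window_size_seconds)
        windows[window_start].append(value)
    return dict(windows)


# Hypothetical messages: (event time in seconds, payload).
events = [(3, "a"), (42, "b"), (61, "c"), (119, "d"), (120, "e")]
per_minute = assign_fixed_windows(events, window_size_seconds=60)
for start in sorted(per_minute):
    print(start, per_minute[start])
```

In a real pipeline the same grouping is applied to a stream of Pub/Sub messages, and the result of each window is what gets written to BigQuery.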
Knowing the following points helped me tremendously in preparing and passing the Data Engineering exam. I would recommend spending some additional time to make sure you NAIL these:
- Google IAM best practices.
- Ownership of underlying data in Data Studio dashboards.
- Decision tree for determining when to use which Google Data storage solution.
- Machine Learning terminology: which prediction and training models exist and what you would use them for, what weights and biases are, etc.
- When to use BigQuery and when to use Bigtable. (Time-series data, long and narrow tables)
- Know the storage, backup and point-in-time restore possibilities of the various Google Data products.
- Which products in the Google Data landscape compare with which products in the Hadoop ecosystem. (Hive, Spark, Kafka)
- Which IAM levels apply to specific resources in a product. (For example, Dataflow has a specific role for developers which allows for data privacy and a specific role for service accounts to manage workers.)
- Which monitoring options are available from Google. (Stackdriver, Stackdriver logging and metrics and the Stackdriver agents.)
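The storage decision tree from the list above is easier to remember once you have written it out yourself. The sketch below is a simplified reading of the commonly shared flowchart, not an official Google rule set; the function name `pick_storage` and its yes/no parameters are assumptions made for illustration, so check the documentation of each product before relying on it.

```python
def pick_storage(structured, analytics=False, low_latency=False,
                 relational=False, horizontal_scale=False):
    """Walk a simplified decision tree and return a product name."""
    if not structured:
        return "Cloud Storage"      # files, images, other unstructured blobs
    if analytics:
        # Low-latency reads and writes at scale point to Bigtable;
        # SQL analytics over large datasets points to BigQuery.
        return "Bigtable" if low_latency else "BigQuery"
    if relational:
        # Spanner when the relational data must scale horizontally.
        return "Cloud Spanner" if horizontal_scale else "Cloud SQL"
    return "Datastore"              # structured, non-relational data


print(pick_storage(structured=True, analytics=True, low_latency=True))
print(pick_storage(structured=True, relational=True))
```

Real exam questions add requirements such as cost, region and transaction support, so treat this as a memory aid rather than a complete rule.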
To wrap this up, here are some general tips that apply not only to the Data Engineering course but to studying in general.
Make sure you have a good understanding of terminology in the data engineering domain, not only the terminology pertaining to Google products.
Be able to define the strengths and characteristics of each data product, the use cases it fits well and its general shortcomings. Often you can already narrow down your options just by knowing which solution is relevant, if not best, for a specific use case.
Google will not allow you anything to write or draw on during the exam (at least in my case). It helps to practice visualising the flow of data in your head and where certain components of a data pipeline would fit.
The exam questions will require you to apply reading comprehension and critical thinking. Read each question and answer carefully, look for key characteristics, narrow down the options and visualise that data flow!
The Google Cloud Professional Data Engineer exam should not be taken lightly. The composition of the questions requires you to analyse a use case or problem statement and fit requirements from this use case to a solution. This solution is usually a combination of data products. Make sure you can narrow down to the correct solution by knowing the strengths and shortcomings of the provided options. Good luck!
I hope this short guide has been helpful. With good preparation, I am sure you will pass the exam.