
Latest [Jan 19, 2022] 100% Passing Guarantee - Brilliant Professional-Data-Engineer Exam Questions PDF
Professional-Data-Engineer Certification – Valid Exam Dumps Questions Study Guide! (Updated 253 Questions)
Training Courses Recommended for the Exam Preparation
Training courses help candidates learn the Google exam syllabus and prepare well. They include hands-on labs and expert support that give you in-depth knowledge of each domain covered in the test. These are some of the best training courses offered by Google for the Professional Data Engineer certification exam.
NEW QUESTION 133
As your organization expands its usage of GCP, many teams have started to create their own projects.
Projects are further multiplied to accommodate different stages of deployments and target audiences. Each project requires unique access control configurations. The central IT team needs to have access to all projects. Furthermore, data from Cloud Storage buckets and BigQuery datasets must be shared for use in other projects in an ad hoc way. You want to simplify access control management by minimizing the number of policies. Which two steps should you take? (Choose two.)
- A. Introduce resource hierarchy to leverage access control policy inheritance.
- B. Use Cloud Deployment Manager to automate access provision.
- C. Create distinct groups for various teams, and specify groups in Cloud IAM policies.
- D. Only use service accounts when sharing data for Cloud Storage buckets and BigQuery datasets.
- E. For each Cloud Storage bucket or BigQuery dataset, decide which projects need access. Find all the active members who have access to these projects, and create a Cloud IAM policy to grant access to all these users.
Answer: A,C
Explanation:
Google recommends granting access through the resource hierarchy, so that policies set on a parent are inherited by the projects below it, and through groups for users with similar roles.
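The inheritance idea can be sketched in a few lines of plain Python. This is a toy model, not the Cloud IAM API: the `Node` class, the resource names, and the group addresses are all hypothetical and exist only to show how two group-based policies set high in the hierarchy can cover several projects.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical resource-hierarchy node (organization, folder, or project)."""
    name: str
    parent: Optional["Node"] = None
    # Bindings set directly on this node: role -> set of members.
    bindings: dict = field(default_factory=dict)

    def grant(self, role: str, member: str) -> None:
        self.bindings.setdefault(role, set()).add(member)

    def effective_bindings(self) -> dict:
        """Union of this node's bindings and those inherited from ancestors."""
        merged = {}
        node = self
        while node is not None:
            for role, members in node.bindings.items():
                merged.setdefault(role, set()).update(members)
            node = node.parent
        return merged


org = Node("example-org")
analytics = Node("analytics-folder", parent=org)
prod = Node("analytics-prod", parent=analytics)
dev = Node("analytics-dev", parent=analytics)

# One binding at the top covers central IT for every project.
org.grant("roles/viewer", "group:central-it@example.com")
# One binding on the folder covers the team in all of its projects.
analytics.grant("roles/bigquery.dataViewer", "group:analytics-team@example.com")

# Number of nodes' direct bindings across the whole tree.
policy_count = sum(len(n.bindings) for n in (org, analytics, prod, dev))
print(prod.effective_bindings())
```

Neither project carries a binding of its own, yet both see the central IT group and the team group through inheritance.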
NEW QUESTION 134
For the best possible performance, what is the recommended zone for your Compute Engine instance and Cloud Bigtable instance?
- A. Have both the Compute Engine instance and the Cloud Bigtable instance to be in the same zone.
- B. Have the Compute Engine instance in the furthest zone from the Cloud Bigtable instance.
- C. Have both the Compute Engine instance and the Cloud Bigtable instance to be in different zones.
- D. Have the Cloud Bigtable instance to be in the same zone as all of the consumers of your data.
Answer: A
Explanation:
It is recommended to create your Compute Engine instance in the same zone as your Cloud Bigtable instance for the best possible performance. If it's not possible to create an instance in the same zone, you should create your instance in another zone within the same region. For example, if your Cloud Bigtable instance is located in us-central1-b, you could create your instance in us-central1-f. This change may result in several milliseconds of additional latency for each Cloud Bigtable request.
It is recommended to avoid creating your Compute Engine instance in a different region from your Cloud Bigtable instance, which can add hundreds of milliseconds of latency to each Cloud Bigtable request.
Reference: https://cloud.google.com/bigtable/docs/creating-compute-instance
NEW QUESTION 135
You have Cloud Functions written in Node.js that pull messages from Cloud Pub/Sub and send the data to BigQuery. You observe that the message processing rate on the Pub/Sub topic is orders of magnitude higher than anticipated, but there is no error logged in Stackdriver Log Viewer. What are the two most likely causes of this problem? (Choose two.)
- A. Error handling in the subscriber code is not handling run-time errors properly.
- B. Publisher throughput quota is too small.
- C. The subscriber code cannot keep up with the messages.
- D. Total outstanding messages exceed the 10-MB maximum.
- E. The subscriber code does not acknowledge the messages that it pulls.
Answer: A,E
Explanation:
A, E: When the subscriber does not acknowledge a message it has pulled, the message is put back in Cloud Pub/Sub and delivered again, so messages accumulate instead of being consumed and removed from Pub/Sub. The same thing can happen if the subscriber maintains the lease on a message it receives when an error occurs. This reduces the overall rate of processing because messages get stuck on the first subscriber. Also, errors in a Cloud Function do not show up in Stackdriver Log Viewer if they are not correctly handled.
B: The publisher rate is not the problem, because the observed result is a higher number of messages and not a lower number.
C: Cloud Functions scale automatically, so they should be able to keep up.
D: If messages exceed the 10 MB maximum, they cannot be published.
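The redelivery effect can be illustrated with a small simulation in plain Python. It does not use the Pub/Sub client library; `run_subscriber` and its `max_deliveries` cap are hypothetical, and the cap exists only so the simulation terminates.

```python
from collections import deque


def run_subscriber(messages, acknowledge, max_deliveries=5):
    """Deliver messages until they are acked; return the total delivery count.

    max_deliveries is a hypothetical cap for this simulation,
    not a Pub/Sub default.
    """
    queue = deque((msg, 1) for msg in messages)
    deliveries = 0
    while queue:
        msg, attempt = queue.popleft()
        deliveries += 1
        try:
            if msg == "bad":
                raise ValueError("cannot parse message")
            acked = acknowledge
        except ValueError:
            # Swallowed error: nothing is logged and the message is never acked.
            acked = False
        if not acked and attempt < max_deliveries:
            queue.append((msg, attempt + 1))  # redelivery
    return deliveries


msgs = ["m1", "m2", "m3"]
with_ack = run_subscriber(msgs, acknowledge=True)
without_ack = run_subscriber(msgs, acknowledge=False)
swallowed_error = run_subscriber(["m1", "bad"], acknowledge=True)
print(with_ack, without_ack, swallowed_error)
```

With acknowledgement each message is delivered once. Without it, or when an error is silently swallowed, the same messages keep coming back, which inflates the observed processing rate without producing any log entry.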
NEW QUESTION 136
You are creating a model to predict housing prices. Due to budget constraints, you must run it on a single resource-constrained virtual machine. Which learning algorithm should you use?
- A. Recurrent neural network
- B. Linear regression
- C. Feedforward neural network
- D. Logistic classification
Answer: B
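To show why linear regression suits a resource-constrained machine, here is a minimal sketch of a one-feature least-squares fit in plain Python. The data is made up for illustration, and `fit_line` is a hypothetical helper, not part of any library.

```python
def fit_line(xs, ys):
    """Closed-form least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data: floor area -> price, chosen so that price is exactly 3 * area.
areas = [50, 80, 100, 120]
prices = [150, 240, 300, 360]

slope, intercept = fit_line(areas, prices)
predicted = slope * 90 + intercept
print(slope, intercept, predicted)
```

The fit needs a single pass over the data and a handful of arithmetic operations, with no iterative training loop and no accelerator.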
NEW QUESTION 137
Which of the following is NOT one of the three main types of triggers that Dataflow supports?
- A. Trigger based on element size in bytes
- B. Trigger that is a combination of other triggers
- C. Trigger based on time
- D. Trigger based on element count
Answer: A
Explanation:
There are three major kinds of triggers that Dataflow supports:
1. Time-based triggers.
2. Data-driven triggers. You can set a trigger to emit results from a window when that window has received a certain number of data elements.
3. Composite triggers. These triggers combine multiple time-based or data-driven triggers in some logical way.
Reference: https://cloud.google.com/dataflow/model/triggers
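The three kinds of triggers can be imitated with a toy model in plain Python. This is not the Dataflow or Beam API: `count_trigger`, `time_trigger`, `any_of`, and `run` are hypothetical helpers, and the model only evaluates a trigger when an element arrives, which real runners do not require.

```python
def count_trigger(n):
    """Data-driven: fire once the pane holds n elements."""
    return lambda pane, now: len(pane["elements"]) >= n


def time_trigger(delay):
    """Time-based: fire once `delay` seconds have passed since the pane started."""
    return lambda pane, now: now - pane["start"] >= delay


def any_of(*triggers):
    """Composite: fire as soon as any child trigger fires."""
    return lambda pane, now: any(t(pane, now) for t in triggers)


def run(events, trigger):
    """events is a list of (timestamp_seconds, value); returns emitted panes."""
    emitted, pane = [], None
    for ts, value in events:
        if pane is None:
            pane = {"start": ts, "elements": []}
        pane["elements"].append(value)
        if trigger(pane, ts):
            emitted.append(pane["elements"])
            pane = None
    return emitted


events = [(0, "a"), (1, "b"), (2, "c"), (20, "d"), (21, "e")]
by_count = run(events, count_trigger(2))
by_time = run(events, time_trigger(10))
combined = run(events, any_of(count_trigger(3), time_trigger(10)))
print(by_count, by_time, combined)
```

Note that none of the triggers looks at element size in bytes, which matches the answer to the question.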
NEW QUESTION 138
By default, which of the following windowing behaviors does Dataflow apply to unbounded data sets?
- A. Windows at every 1 minute
- B. Single, Global Window
- C. Windows at every 10 minutes
- D. Windows at every 100 MB of data
Answer: B
Explanation:
Dataflow's default windowing behavior is to assign all elements of a PCollection to a single, global window, even for unbounded PCollections.
Reference: https://cloud.google.com/dataflow/model/pcollection
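The difference between the default global window and an explicitly chosen fixed window can be sketched in plain Python. The functions below are hypothetical stand-ins, not Dataflow calls, and the timestamps are made-up seconds.

```python
from collections import defaultdict


def global_window(events):
    """Default behavior: every element lands in one window."""
    return {"global": [value for _, value in events]}


def fixed_windows(events, size):
    """Assign each element to the [start, start + size) window holding its timestamp."""
    windows = defaultdict(list)
    for ts, value in events:
        start = (ts // size) * size
        windows[(start, start + size)].append(value)
    return dict(windows)


events = [(5, "a"), (65, "b"), (70, "c"), (130, "d")]
default = global_window(events)
per_minute = fixed_windows(events, size=60)
print(default)
print(per_minute)
```

With an unbounded source the global window never closes on its own, which is why streaming pipelines usually set a non-global window or a trigger before grouping.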
NEW QUESTION 139
You are working on a niche product in the image recognition domain. Your team has developed a model that is dominated by custom C++ TensorFlow ops your team has implemented. These ops are used inside your main training loop and are performing bulky matrix multiplications. It currently takes up to several days to train a model. You want to decrease this time significantly and keep the cost low by using an accelerator on Google Cloud. What should you do?
- A. Use Cloud GPUs after implementing GPU kernel support for your custom ops.
- B. Use Cloud TPUs after implementing GPU kernel support for your custom ops.
- C. Use Cloud TPUs without any additional adjustment to your code.
- D. Stay on CPUs, and increase the size of the cluster you're training your model on.
Answer: A
NEW QUESTION 140
Your startup has never implemented a formal security policy. Currently, everyone in the company has access to the datasets stored in Google BigQuery. Teams have freedom to use the service as they see fit, and they have not documented their use cases. You have been asked to secure the data warehouse. You need to discover what everyone is doing. What should you do first?
- A. Get the identity and access management (IAM) policy of each table.
- B. Use Stackdriver Monitoring to see the usage of BigQuery query slots.
- C. Use Google Stackdriver Audit Logs to review data access.
- D. Use the Google Cloud Billing API to see what account the warehouse is being billed to.
Answer: C
NEW QUESTION 141
You store historic data in Cloud Storage. You need to perform analytics on the historic data. You want to use a solution to detect invalid data entries and perform data transformations that will not require programming or knowledge of SQL.
What should you do?
- A. Use Cloud Dataflow with Beam to detect errors and perform transformations.
- B. Use Cloud Dataproc with a Hadoop job to detect errors and perform transformations.
- C. Use federated tables in BigQuery with queries to detect errors and perform transformations.
- D. Use Cloud Dataprep with recipes to detect errors and perform transformations.
Answer: D
NEW QUESTION 142
MJTelco Case Study
Company Overview
MJTelco is a startup that plans to build networks in rapidly growing, underserved markets around the world.
The company has patents for innovative optical communications hardware. Based on these patents, they can create many reliable, high-speed backbone links with inexpensive hardware.
Company Background
Founded by experienced telecom executives, MJTelco uses technologies originally developed to overcome communications challenges in space. Fundamental to their operation, they need to create a distributed data infrastructure that drives real-time analysis and incorporates machine learning to continuously optimize their topologies. Because their hardware is inexpensive, they plan to overdeploy the network allowing them to account for the impact of dynamic regional politics on location availability and cost.
Their management and operations teams are situated all around the globe, creating a many-to-many relationship between data consumers and providers in their system. After careful consideration, they decided public cloud is the perfect environment to support their needs.
Solution Concept
MJTelco is running a successful proof-of-concept (PoC) project in its labs. They have two primary needs:
* Scale and harden their PoC to support significantly more data flows generated when they ramp to more than 50,000 installations.
* Refine their machine-learning cycles to verify and improve the dynamic models they use to control topology definition.
MJTelco will also use three separate operating environments - development/test, staging, and production - to meet the needs of running experiments, deploying new features, and serving production customers.
Business Requirements
* Scale up their production environment with minimal cost, instantiating resources when and where needed in an unpredictable, distributed telecom user community.
* Ensure security of their proprietary data to protect their leading-edge machine learning and analysis.
* Provide reliable and timely access to data for analysis from distributed research workers.
* Maintain isolated environments that support rapid iteration of their machine-learning models without affecting their customers.
Technical Requirements
* Ensure secure and efficient transport and storage of telemetry data.
* Rapidly scale instances to support between 10,000 and 100,000 data providers with multiple flows each.
* Allow analysis and presentation against data tables tracking up to 2 years of data storing approximately 100m records/day.
* Support rapid iteration of monitoring infrastructure focused on awareness of data pipeline problems both in telemetry flows and in production learning cycles.
CEO Statement
Our business model relies on our patents, analytics and dynamic machine learning. Our inexpensive hardware is organized to be highly reliable, which gives us cost advantages. We need to quickly stabilize our large distributed data pipelines to meet our reliability and capacity commitments.
CTO Statement
Our public cloud services must operate as advertised. We need resources that scale and keep our data secure.
We also need environments in which our data scientists can carefully study and quickly adapt our models.
Because we rely on automation to process our data, we also need our development and test environments to work as we iterate.
CFO Statement
The project is too large for us to maintain the hardware and software required for the data and analysis. Also, we cannot afford to staff an operations team to monitor so many data feeds, so we will rely on automation and infrastructure. Google Cloud's machine learning will allow our quantitative researchers to work on our high-value problems instead of problems with our data pipelines.
Given the record streams MJTelco is interested in ingesting per day, they are concerned about the cost of Google BigQuery increasing. MJTelco asks you to provide a design solution. They require a single large data table called tracking_table. Additionally, they want to minimize the cost of daily queries while performing fine-grained analysis of each day's events. They also want to use streaming ingestion. What should you do?
- A. Create sharded tables for each day following the pattern tracking_table_YYYYMMDD.
- B. Create a table called tracking_table with a TIMESTAMP column to represent the day.
- C. Create a partitioned table called tracking_table and include a TIMESTAMP column.
- D. Create a table called tracking_table and include a DATE column.
Answer: C
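Why a partitioned table keeps daily queries cheap can be sketched with a toy in-memory model. This is not the BigQuery API: `PartitionedTable`, the `event_ts` field, and the sample rows are hypothetical, and "rows scanned" stands in for the bytes a query would be billed for.

```python
from collections import defaultdict
from datetime import date, datetime


class PartitionedTable:
    """Hypothetical stand-in for a single table partitioned by day."""

    def __init__(self):
        self.partitions = defaultdict(list)

    def insert(self, row):
        # Route each streamed row to the partition for its timestamp's day.
        self.partitions[row["event_ts"].date()].append(row)

    def query_day(self, day):
        """Return the day's rows and how many rows had to be scanned."""
        rows = self.partitions.get(day, [])
        return rows, len(rows)

    def total_rows(self):
        return sum(len(rows) for rows in self.partitions.values())


tracking_table = PartitionedTable()
for day, count in [(1, 2), (2, 3), (3, 1)]:
    for i in range(count):
        tracking_table.insert({"event_ts": datetime(2022, 1, day, 12, i), "flow": i})

rows, scanned = tracking_table.query_day(date(2022, 1, 2))
full_scan = tracking_table.total_rows()
print(scanned, full_scan)
```

A query restricted to one day touches only that day's partition, while an unpartitioned table would have to scan every row to answer the same question.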
NEW QUESTION 143
You work for an advertising company, and you've developed a Spark ML model to predict click-through rates at advertisement blocks. You've been developing everything at your on-premises data center, and now your company is migrating to Google Cloud. Your data center will be closing soon, so a rapid lift-and-shift migration is necessary. However, the data you've been using will be migrated to BigQuery. You periodically retrain your Spark ML models, so you need to migrate existing training pipelines to Google Cloud. What should you do?
- A. Use Cloud ML Engine for training existing Spark ML models
- B. Use Cloud Dataproc for training existing Spark ML models, but start reading data directly from BigQuery
- C. Rewrite your models on TensorFlow, and start using Cloud ML Engine
- D. Spin up a Spark cluster on Compute Engine, and train Spark ML models on the data exported from BigQuery
Answer: B
NEW QUESTION 144
Your company's customer and order databases are often under heavy load. This makes performing analytics against them difficult without harming operations. The databases are in a MySQL cluster, with nightly backups taken using mysqldump. You want to perform analytics with minimal impact on operations. What should you do?
- A. Mount the backups to Google Cloud SQL, and then process the data using Google Cloud Dataproc.
- B. Connect an on-premises Apache Hadoop cluster to MySQL and perform ETL.
- C. Add a node to the MySQL cluster and build an OLAP cube there.
- D. Use an ETL tool to load the data from MySQL into Google BigQuery.
Answer: B
NEW QUESTION 145
Your team is responsible for developing and maintaining ETLs in your company. One of your Dataflow jobs is failing because of some errors in the input data, and you need to improve reliability of the pipeline (incl. being able to reprocess all failing data).
What should you do?
- A. Add a try... catch block to your sideOutput to create a PCollection that can be stored to PubSub later.
- B. Add a try... catch block to your DoFn that transforms the data, extract erroneous rows from logs.
- C. Add a filtering step to skip these types of errors in the future, extract erroneous rows from logs.
- D. Add a try... catch block to your DoFn that transforms the data, write erroneous rows to PubSub directly from the DoFn.
Answer: D
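The try/catch idea can be sketched in plain Python as a dead-letter pattern. This is not Beam code: `parse_rows` is a hypothetical stand-in for the body of a DoFn, and the `dead_letter` list stands in for whatever sink (for example a Pub/Sub topic) would hold failing rows for later reprocessing.

```python
import json


def parse_rows(raw_rows):
    """Split input into parsed rows and a dead-letter list of failures."""
    good, dead_letter = [], []
    for raw in raw_rows:
        try:
            record = json.loads(raw)
            good.append({"id": int(record["id"]), "value": float(record["value"])})
        except (ValueError, KeyError, TypeError) as err:
            # Keep the original payload so the row can be reprocessed later.
            dead_letter.append({"raw": raw, "error": type(err).__name__})
    return good, dead_letter


raw_rows = ['{"id": 1, "value": "2.5"}', "not json", '{"id": 2}']
good, dead_letter = parse_rows(raw_rows)
print(good)
print(dead_letter)
```

The pipeline keeps running on valid rows, and every failing row is preserved together with the reason it failed instead of being lost in the logs.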
NEW QUESTION 146
Your startup has never implemented a formal security policy. Currently, everyone in the company has access to the datasets stored in Google BigQuery. Teams have freedom to use the service as they see fit, and they have not documented their use cases. You have been asked to secure the data warehouse. You need to discover what everyone is doing. What should you do first?
- A. Get the identity and access management (IAM) policy of each table.
- B. Use Stackdriver Monitoring to see the usage of BigQuery query slots.
- C. Use Google Stackdriver Audit Logs to review data access.
- D. Use the Google Cloud Billing API to see what account the warehouse is being billed to.
Answer: C
Explanation:
First we need to know who is accessing what; then we can create suitable policies. Stackdriver Audit Logs track data access for BigQuery.
NEW QUESTION 147
If a dataset contains rows with individual people and columns for year of birth, country, and income, how many of the columns are continuous and how many are categorical?
- A. 1 continuous and 2 categorical
- B. 2 continuous and 1 categorical
- C. 3 continuous
- D. 3 categorical
Answer: B
Explanation:
The columns can be grouped into two types, categorical and continuous:
A column is called categorical if its value can only be one of the categories in a finite set. For example, the native country of a person (U.S., India, Japan, etc.) or the education level (high school, college, etc.) are categorical columns.
A column is called continuous if its value can be any numerical value in a continuous range. For example, the capital gain of a person (e.g. $14,084) is a continuous column.
Year of birth and income are continuous columns. Country is a categorical column.
You could use bucketization to turn year of birth and/or income into categorical features, but the raw columns are continuous.
Reference: https://www.tensorflow.org/tutorials/wide#reading_the_census_data
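Bucketization, as mentioned above, can be sketched with the standard library. The boundaries and sample years are made up for illustration, and `bucketize` is a hypothetical helper rather than a TensorFlow function.

```python
from bisect import bisect_right


def bucketize(value, boundaries):
    """Return the index of the bucket that `value` falls into.

    Bucket 0 is everything below the first boundary; a value equal to a
    boundary goes into the bucket that starts at that boundary.
    """
    return bisect_right(boundaries, value)


# Hypothetical boundaries: <1960, 1960-1979, 1980-1999, >=2000.
year_boundaries = [1960, 1980, 2000]
years = [1955, 1960, 1979, 1995, 2003]
buckets = [bucketize(year, year_boundaries) for year in years]
print(buckets)
```

After this step, year of birth takes one of four discrete values and can be treated as a categorical feature, while the raw column remains continuous.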
NEW QUESTION 148
From which of these sources can you not load data into BigQuery?
- A. File upload
- B. Google Cloud Storage
- C. Google Drive
- D. Google Cloud SQL
Answer: D
Explanation:
You can load data into BigQuery from a file upload, Google Cloud Storage, Google Drive, or Google Cloud Bigtable. It is not possible to load data into BigQuery directly from Google Cloud SQL. One way to get data from Cloud SQL to BigQuery would be to export data from Cloud SQL to Cloud Storage and then load it from there.
Reference: https://cloud.google.com/bigquery/loading-data
NEW QUESTION 149
You need to copy millions of sensitive patient records from a relational database to BigQuery. The total size of the database is 10 TB. You need to design a solution that is secure and time-efficient. What should you do?
- A. Export the records from the database into a CSV file. Create a public URL for the CSV file, and then use Storage Transfer Service to move the file to Cloud Storage. Load the CSV file into BigQuery using the BigQuery web UI in the GCP Console.
- B. Export the records from the database as an Avro file. Create a public URL for the Avro file, and then use Storage Transfer Service to move the file to Cloud Storage. Load the Avro file into BigQuery using the BigQuery web UI in the GCP Console.
- C. Export the records from the database as an Avro file. Copy the file onto a Transfer Appliance and send it to Google, and then load the Avro file into BigQuery using the BigQuery web UI in the GCP Console.
- D. Export the records from the database as an Avro file. Upload the file to GCS using gsutil, and then load the Avro file into BigQuery using the BigQuery web UI in the GCP Console.
Answer: D
NEW QUESTION 150
How can you get a neural network to learn about relationships between categories in a categorical feature?
- A. Create a one-hot column
- B. Create a hash bucket
- C. Create a multi-hot column
- D. Create an embedding column
Answer: D
Explanation:
There are two problems with one-hot encoding. First, it has high dimensionality, meaning that instead of having just one value, like a continuous feature, it has many values, or dimensions. This makes computation more time-consuming, especially if a feature has a very large number of categories. The second problem is that it doesn't encode any relationships between the categories. They are completely independent from each other, so the network has no way of knowing which ones are similar to each other.
Both of these problems can be solved by representing a categorical feature with an embedding column. The idea is that each category has a smaller vector with, let's say, 5 values in it.
But unlike a one-hot vector, the values are not usually 0. The values are weights, similar to the weights that are used for basic features in a neural network. The difference is that each category has a set of weights (5 of them in this case).
You can think of each value in the embedding vector as a feature of the category. So, if two categories are very similar to each other, then their embedding vectors should be very similar too.
Reference: https://cloudacademy.com/google/introduction-to-google-cloud-machine-learning-engine-course/a-wide-and-deep-model.html
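The contrast between one-hot vectors and embedding vectors can be checked numerically with a small sketch. The categories and the two-value embedding weights are hand-picked for illustration; in a real model the weights are learned during training.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


categories = ["cat", "dog", "car"]

# One-hot: one dimension per category, a single 1 in each vector.
one_hot = {
    c: [1.0 if i == j else 0.0 for j in range(len(categories))]
    for i, c in enumerate(categories)
}

# Hypothetical embeddings with 2 values each (hand-picked, not learned).
embedding = {"cat": [0.9, 0.1], "dog": [0.8, 0.2], "car": [0.1, 0.9]}

one_hot_sim = cosine(one_hot["cat"], one_hot["dog"])
cat_dog = cosine(embedding["cat"], embedding["dog"])
cat_car = cosine(embedding["cat"], embedding["car"])
print(one_hot_sim, cat_dog, cat_car)
```

Any two distinct one-hot vectors have zero similarity, so they carry no information about which categories are alike. The embedding vectors can place similar categories close together.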
NEW QUESTION 151
You need to choose a database for a new project that has the following requirements:
* Fully managed
* Able to automatically scale up
* Transactionally consistent
* Able to scale up to 6 TB
* Able to be queried using SQL
Which database do you choose?
- A. Cloud Datastore
- B. Cloud Bigtable
- C. Cloud Spanner
- D. Cloud SQL
Answer: C
Explanation:
Reference: https://cloud.google.com/products/databases

