Free Amazon DAS-C01 Exam Questions

Question 1

An operations team notices that a few AWS Glue jobs for a given ETL application are failing. The AWS Glue jobs read a large number of small JSON files from an Amazon S3 bucket and write the data to a different S3 bucket in Apache Parquet format with no major transformations. Upon initial investigation, a data engineer notices the following error message in the History tab on the AWS Glue console: ''Command Failed with Exit Code 1.''
Upon further investigation, the data engineer notices that the driver memory profile of the failed jobs crosses the safe threshold of 50% usage quickly and reaches 90--95% soon after. The average memory usage across all executors continues to be less than 4%.
The data engineer also notices the following error while examining the related Amazon CloudWatch Logs.
What should the data engineer do to solve the failure in the MOST cost-effective way?

A : ange the worker type from Standard to G.2X.

B : dify the AWS Glue ETL code to use the 'groupFiles': 'inPartition' feature.

C : crease the fetch size setting by using AWS Glue dynamics frame.

D : dify maximum capacity to increase the total maximum data processing units (DPUs) used.

Answer: B

Question 2

A company has a business unit uploading .csv files to an Amazon S3 bucket. The company's data platform team has set up an AWS Glue crawler to do discovery, and create tables and schemas. An AWS Glue job writes processed data from the created tables to an Amazon Redshift database. The AWS Glue job handles column mapping and creating the Amazon Redshift table appropriately. When the AWS Glue job is rerun for any reason in a day, duplicate records are introduced into the Amazon Redshift table.
Which solution will update the Redshift table without duplicates when jobs are rerun?

A : dify the AWS Glue job to copy the rows into a staging table. Add SQL commands to replace the existing rows in the main table as postactions in the DynamicFrameWriter class.

B : ad the previously inserted data into a MySQL database in the AWS Glue job. Perform an upsert operation in MySQL, and copy the results to the Amazon Redshift table.

C : Apache Spark's DataFrame dropDuplicates() API to eliminate duplicates and then write the data to Amazon Redshift.

D : the AWS Glue ResolveChoice built-in transform to select the most recent value of the column.

Answer: A

Question 3

An advertising company has a data lake that is built on Amazon S3. The company uses AWS Glue Data Catalog to maintain the metadat
a. The data lake is several years old and its overall size has increased exponentially as additional data sources and metadata are stored in the data lake. The data lake administrator wants to implement a mechanism to simplify permissions management between Amazon S3 and the Data Catalog to keep them in sync
Which solution will simplify permissions management with minimal development effort?

A : t AWS Identity and Access Management (1AM) permissions tor AWS Glue

B : AWS Lake Formation permissions

C : nage AWS Glue and S3 permissions by using bucket policies

D : Amazon Cognito user pools.

Answer: B

Question 4

An IOT company is collecting data from multiple sensors and is streaming the data to Amazon Managed Streaming for Apache Kafka (Amazon MSK). Each sensor type has its own topic, and each topic has the same number of partitions. The company is planning to turn on more sensors. However, the company wants to evaluate which sensor types are producing the most data sothat the company can scale accordingly. The company needs to know which sensor types have the largest values for the following metrics: ByteslnPerSec and MessageslnPerSec. Which level of monitoring for Amazon MSK will meet these requirements?

A : DEFAULT level

B : PER TOPIC PER BROKER level

C : PER BROKER level

D : PER TOPIC level

Answer: B

Question 5

A company developed a new elections reporting website that uses Amazon Kinesis Data Firehose to deliver full logs from AWS WAF to an Amazon S3 bucket. The company is now seeking a low-cost option to perform this infrequent data analysis with visualizations of logs in a way that requires minimal development effort.
Which solution meets these requirements?

A : an AWS Glue crawler to create and update a table in the Glue data catalog from the logs. Use Athena to perform ad-hoc analyses and use Amazon QuickSight to develop data visualizations.

B : eate a second Kinesis Data Firehose delivery stream to deliver the log files to Amazon Elasticsearch Service (Amazon ES). Use Amazon ES to perform text-based searches of the logs for ad-hoc analyses and use Kibana for data visualizations.

C : eate an AWS Lambda function to convert the logs into .csv format. Then add the function to the Kinesis Data Firehose transformation configuration. Use Amazon Redshift to perform ad-hoc analyses of the logs using SQL queries and use Amazon QuickSight to develop data visualizations.

D : eate an Amazon EMR cluster and use Amazon S3 as the data source. Create an Apache Spark job to perform ad-hoc analyses and use Amazon QuickSight to develop data visualizations.

Answer: A