Become Amazon Certified with updated AWS-DEA-C01 exam questions and correct answers
A sales company uses AWS Glue ETL to collect, process, and ingest data into an Amazon S3 bucket. The AWS Glue pipeline creates a new file in the S3 bucket every hour. File sizes vary from 200 KB to 300 KB. The company wants to build a sales prediction model by using data from the previous 5 years. The historic data includes 44,000 files. The company builds a second AWS Glue ETL pipeline by using the smallest worker type. The second pipeline retrieves the historic files from the S3 bucket and processes the files for downstream analysis. The company notices significant performance issues with the second ETL pipeline. The company needs to improve the performance of the second pipeline. Which solution will meet this requirement MOST cost-effectively?
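A scenario like this usually hinges on small-file overhead, which AWS Glue can reduce with S3 file grouping. A minimal PySpark sketch of that technique, assuming a hypothetical bucket path and JSON input format:

```python
# Minimal AWS Glue PySpark sketch: read many small S3 files as grouped input.
# Bucket path, prefix, and JSON format are illustrative assumptions.
from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

# groupFiles/groupSize coalesce thousands of small files into larger groups per
# Spark task, cutting task-scheduling and S3 listing overhead.
historic_sales = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={
        "paths": ["s3://example-sales-bucket/hourly/"],
        "recurse": True,
        "groupFiles": "inPartition",
        "groupSize": "134217728",  # target roughly 128 MB per group (bytes, as a string)
    },
    format="json",
)
```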
A Cloud Data Engineering Team is configuring access control for their company's new multi-account AWS environment. They have a centralized security account and require the ability to allow audit teams to access AWS resources across all accounts for monitoring and compliance purposes. The data engineering team needs to establish a mechanism to grant audit team members from the security account least-privilege access to the necessary resources without creating individual IAM users in each account.
Which of the following steps should the data engineering team implement to set up IAM roles effectively for this requirement? (Select TWO)
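For context, cross-account audit access without per-account IAM users is typically built on an IAM role with a trust policy toward the security account. An illustrative boto3 sketch; the account IDs, role name, and attached policy are placeholder assumptions:

```python
# Illustrative boto3 sketch of cross-account role setup and assumption.
# Account IDs and the role name are placeholder assumptions.
import json
import boto3

# Trust policy attached to a read-only audit role in each member account,
# allowing principals in the central security account to assume it.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::111111111111:root"},  # security account
        "Action": "sts:AssumeRole",
    }],
}

iam = boto3.client("iam")
iam.create_role(
    RoleName="AuditReadOnlyRole",
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)
iam.attach_role_policy(
    RoleName="AuditReadOnlyRole",
    PolicyArn="arn:aws:iam::aws:policy/ReadOnlyAccess",
)

# Auditors in the security account then assume the role in each member account
# instead of signing in with per-account IAM users.
sts = boto3.client("sts")
credentials = sts.assume_role(
    RoleArn="arn:aws:iam::222222222222:role/AuditReadOnlyRole",  # member account
    RoleSessionName="audit-session",
)["Credentials"]
```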
A company wants to migrate an application and an on-premises Apache Kafka server to AWS. The application processes incremental updates that an on-premises Oracle database sends to the Kafka server. The company wants to use the replatform migration strategy instead of the refactor strategy.
Which solution will meet these requirements with the LEAST management overhead?
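For background, one replatform-style landing zone for a self-managed Kafka server is Amazon MSK. A minimal boto3 provisioning sketch; the subnet and security-group IDs, broker count, Kafka version, and instance type are placeholder assumptions:

```python
# Minimal boto3 sketch of provisioning an Amazon MSK cluster as a replatform
# target for an on-premises Kafka server; all identifiers are assumptions.
import boto3

msk = boto3.client("kafka")
response = msk.create_cluster(
    ClusterName="sales-cdc-cluster",
    KafkaVersion="3.5.1",
    NumberOfBrokerNodes=3,
    BrokerNodeGroupInfo={
        "InstanceType": "kafka.m5.large",
        "ClientSubnets": ["subnet-aaaa1111", "subnet-bbbb2222", "subnet-cccc3333"],
        "SecurityGroups": ["sg-0123456789abcdef0"],
    },
)
print(response["ClusterArn"])
```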
A data engineer wants to orchestrate a set of extract, transform, and load (ETL) jobs that run on AWS. The ETL jobs contain tasks that must run Apache Spark jobs on Amazon EMR, make API calls to Salesforce, and load data into Amazon Redshift. The ETL jobs need to handle failures and retries automatically. The data engineer needs to use Python to orchestrate the jobs.
Which service will meet these requirements?
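For reference, Python-defined orchestration with built-in retries is the Apache Airflow model (which Amazon MWAA runs as a managed service). A small illustrative DAG sketch; the task bodies, schedule, and retry settings are assumptions, and a real pipeline would add an EMR step operator from the Amazon provider package:

```python
# Illustrative Apache Airflow DAG sketch; task bodies and settings are assumptions.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def call_salesforce_api():
    # Placeholder for the Salesforce API call step.
    pass


def load_into_redshift():
    # Placeholder for the Amazon Redshift load step.
    pass


with DAG(
    dag_id="sales_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
    # Automatic failure handling: each task retries up to 3 times.
    default_args={"retries": 3, "retry_delay": timedelta(minutes=5)},
) as dag:
    salesforce_task = PythonOperator(
        task_id="call_salesforce", python_callable=call_salesforce_api
    )
    redshift_task = PythonOperator(
        task_id="load_redshift", python_callable=load_into_redshift
    )

    salesforce_task >> redshift_task
```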
A data engineering team at an online retail company is optimizing the performance of their Amazon Redshift data warehouse. The warehouse contains a large sales table with millions of rows and a smaller products table. Queries often join these two tables, and the team wants to optimize the query performance, especially for these join operations.
Which Redshift distribution style should the team use for the sales and products tables to enhance query performance?
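A common pattern for a large fact table joined to a small dimension table is KEY distribution on the join column plus ALL distribution for the small table, so joins avoid data redistribution. An illustrative sketch issuing such DDL through the Amazon Redshift Data API; the cluster identifier, database, user, and column definitions are assumptions:

```python
# Illustrative sketch of distribution-style DDL issued through the Amazon
# Redshift Data API; cluster, database, user, and columns are assumptions.
import boto3

sales_ddl = """
CREATE TABLE sales (
    sale_id    BIGINT,
    product_id INTEGER,
    sale_date  DATE,
    amount     DECIMAL(12, 2)
)
DISTSTYLE KEY
DISTKEY (product_id)   -- distribute on the column used to join to products
SORTKEY (sale_date)
"""

products_ddl = """
CREATE TABLE products (
    product_id   INTEGER,
    product_name VARCHAR(256)
)
DISTSTYLE ALL          -- replicate the small table to every compute node
"""

redshift_data = boto3.client("redshift-data")
redshift_data.batch_execute_statement(
    ClusterIdentifier="sales-warehouse",
    Database="dev",
    DbUser="awsuser",
    Sqls=[sales_ddl, products_ddl],
)
```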