diff --git a/Exam Dumps (Questions)/DP-203 Exam Dumps (Questions).pdf b/Exam Dumps (Questions)/DP-203 Exam Dumps (Questions).pdf new file mode 100644 index 0000000..08926c4 Binary files /dev/null and b/Exam Dumps (Questions)/DP-203 Exam Dumps (Questions).pdf differ diff --git a/Exam Dumps (Questions)/README.md b/Exam Dumps (Questions)/README.md new file mode 100644 index 0000000..707ab5f --- /dev/null +++ b/Exam Dumps (Questions)/README.md @@ -0,0 +1,1674 @@ +# DP-203 + +Author: Badal Prasad Singh [**Follow**](http://linkedin.com/in/badalprasadsingh/) + +You use Azure Stream Analytics to receive data from Azure Event Hubs and to output the data to an Azure Blob Storage account. You need to output the count of records received from the last five minutes every minute. Which windowing function should you use? +a) Session +b) Tumbling +c) Sliding +**d) Hopping** + +You are designing the folder structure for an Azure Data Lake Storage Gen2 container. Users will query data by using a variety of services including Azure Databricks and Azure Synapse Analytics serverless SQL pools. The data will be secured by subject area. Most queries will include data from the current year or current month. Which folder structure should you recommend to support fast queries and simplified folder security? +**a) /{SubjectArea}/{DataSource}/{YYYY}/{MM}/{DD}/{FileData}\_{YYYY}\_{MM}\_{DD}.csv** + +You need to ensure that the Twitter feed data can be analyzed in the dedicated SQL pool. The solution must meet the customer sentiment analytic requirements. Which three Transact-SQL DDL commands should you run in sequence? To answer, move the appropriate commands from the list of commands to the answer area and arrange them in the correct order. +NOTE: More than one order of answer choices is correct. You will receive credit for any of the correct orders you select. 
+Commands +CREATE EXTERNAL DATA SOURCE +CREATE EXTERNAL FILE FORMAT +CREATE EXTERNAL TABLE +CREATE EXTERNAL TABLE AS SELECT +CREATE EXTERNAL SCOPED CREDENTIALS +**Answer Area +CREATE EXTERNAL DATA SOURCE +CREATE EXTERNAL FILE FORMAT +CREATE EXTERNAL TABLE AS SELECT** + +You have created an external table named ExtTable in Azure Data Explorer. Now, a database user needs to run a KQL (Kusto Query Language) query on this external table. Which of the following functions should the user call to refer to this table? +**a) external_table()** +b) access_table() +c) ext_table() +d) None of the above + +You are working as a data engineer in a company. Your company wants you to ingest data onto cloud data platforms in Azure. Which data processing framework will you use? +a) Online transaction processing (OLTP) +b) Extract, Transform, and Load (ETL) +**c) Extract, Load and Transform (ELT)** + +You have an Azure Synapse workspace named MyWorkspace that contains an Apache Spark database named mytestdb. You run the following command in an Azure Synapse Analytics Spark pool in MyWorkspace. CREATE TABLE mytestdb.myParquetTable (EmployeeID int, EmployeeName string, EmployeeStartDate date) +USING Parquet +You then use Spark to insert a row into mytestdb.myParquetTable. The row contains the following data: EmployeeName = Peter, EmployeeID = 1001, EmployeeStartDate = 28-July-2022. One minute later, you execute the following query from a serverless SQL pool in MyWorkspace. What will be returned by the query? + +SELECT EmployeeID FROM mytestdb.dbo.myParquetTable WHERE name = 'Peter'; +a) 24 +**b) an error** +c) a null value + +In Structured data you define data type at query time. +True +**False** + +In Un-Structured data you define data type at query time. +**True** +False + +When you create a temporal table in Azure SQL Database, it automatically creates a history table in the same database for capturing the historical records. 
Which of the following statements are true about the temporal table and history table? [Select all options that are applicable] + +a) A temporal table must have 1 primary key. +**b) To create a temporal table, System Versioning needs to be set to On.** +c) To create a temporal table, System Versioning needs to be set to Off. +d) It is mandatory to mention the name of the history table when you create the temporal table. +**e) If you don't specify the name for the history table, the default naming convention is used for the history table.** +f) You can specify the table constraints for the history table. + +To create Data Factory instances, the user account that you use to sign into Azure must be a member of: [Select all options that are applicable] +**a) contributor** +**b) owner role** +**c) administrator of the Azure subscription** +d) write + +You need to output files from Azure Data Factory. Which file format should you use for each type +of output? To answer, select the appropriate options in the answer area. NOTE: Each correct +selection is worth one point. +Columnar format +Avro +GZIP +**Parquet** +TXT +JSON with a timestamp +**Avro** +GZIP +Parquet +TXT + +Working as a data engineer for a car sales company, you need to design an application that accepts market information as an input. Using a machine learning classification model, the application will classify the input data into two categories: + +a) Car models that sell more with buyers between 18-40 years and +b) Car models that sell more with people above 40 + +What would you recommend to train the model? +a) Power BI Models +b) Text Analytics API +c) Computer Vision API +**d) Apache Spark MLlib** + +You are designing an Azure Stream Analytics solution that will analyze Twitter data. You need to count the tweets in each 10-second window. The solution must ensure that each tweet is counted only once. +Solution: You use a session window that uses a timeout size of 10 seconds. +Does this meet the goal? 
+Yes +**No** + +You are designing an Azure Stream Analytics solution that will analyze Twitter data. You need to count the tweets in each 10-second window. The solution must ensure that each tweet is counted only once. +Solution: You use a sliding window, and you set the window size to 10 seconds. +Does this meet the goal? +Yes +**No** + +You are designing an Azure Stream Analytics solution that will analyze Twitter data. You need to count the tweets in each 10-second window. The solution must ensure that each tweet is counted only once. +Solution: You use a tumbling window, and you set the window size to 10 seconds. +Does this meet the goal? +**Yes** +No + +What are the key components of Azure Data Factory? [Select all options that are applicable] +a) Database +b) Connection String +**c) Pipelines +d) Activities +e) Datasets +f) Linked services +g) Data Flows +h) Integration Runtimes** + +Which of the following are valid trigger types of Azure Data Factory? [Select all options that are +applicable] +a) Monthly Trigger +**b) Schedule Trigger** +c) Overlap Trigger +**d) Tumbling Window Trigger** +**e) Event-based Trigger** + +You are designing an Azure Stream Analytics solution that receives instant messaging data from an Azure Event Hub. You need to ensure that the output from the Stream Analytics job counts the number of messages per time zone every 15 seconds. How should you complete the Stream Analytics query? To answer, select the appropriate options in the answer area. +NOTE: Each correct selection is worth one point. +SELECT TimeZone, count(*) AS MessageCount +FROM MessageStream [selection 1] CreatedAt +GROUP BY TimeZone, [selection 2](second, 15) + +Selection 1 +Last +Over +SYSTEM.TIMESTAMP() +**TIMESTAMP BY** + +Selection 2 +HOPPINGWINDOW +SESSIONWINDOW +SLIDINGWINDOW +**TUMBLINGWINDOW** + +Duplicating customer content for redundancy and meeting service-level agreements (SLAs) is Azure Maintainability. 
+Yes +**No** + +Duplicating customer content for redundancy and meeting service-level agreements (SLAs) is Azure High availability. +**Yes** +No + +You have an Azure Synapse Analytics dedicated SQL pool that contains a table named Contacts. Contacts contains a column named Phone. You need to ensure that users in a specific role only see the last four digits of a phone number when querying the Phone column. What should you include in the solution? +a) column encryption +**b) dynamic data masking** +c) a default value +d) table partitions +e) row level security (RLS) + +A company has a data lake that is accessible only via an Azure virtual network. You are building a SQL pool in Azure Synapse that will use data from the data lake, and you plan to load data to the SQL pool every hour. You need to make sure that the SQL pool can load the data from the data lake. Which TWO actions should you perform? +a) Create a service principal +**b) Create a managed identity** +c) Add an Active Directory Federation Services (AD FS) account +**d) Configure managed identity as credentials for the data loading process** + +You have an Azure Data Lake Storage Gen2 container. Data is ingested into the container, and then transformed by a data integration application. The data is NOT modified after that. Users can read files in the container but cannot modify the files. You need to design a data archiving solution that meets the +following requirements: +• New data is accessed frequently and must be available as quickly as possible. +• Data that is older than five years is accessed infrequently but must be available within one second when requested. +• Data that is older than seven years is NOT accessed. After seven years, the data must be persisted at the lowest cost possible. +• Costs must be minimized while maintaining the required availability. +How should you manage the data? 
To answer, select the appropriate options in the answer area. + +Five-year-old data +Delete the Blob +Move to Hot storage +**Move to Cool storage** +Move to Archive storage +Seven-year-old data +Delete the Blob +Move to Hot storage +Move to Cool storage +**Move to Archive storage** + +As a data engineer, you need to suggest a Stream Analytics data output format so that queries from Databricks and PolyBase against the files encounter fewer errors. The solution should make sure that the files can be queried fast and that the data type information is kept intact. What should you suggest? +a) JSON +b) XML +c) Avro +**d) Parquet** + +Which role works with Azure Cognitive Services, Cognitive Search, and the Bot Framework? +a) A data engineer +b) A data scientist +**c) An AI engineer** + +Which role is responsible for the provisioning and configuration of +both on-premises and cloud data platform technologies? +**a) A data engineer** +b) A data scientist +c) An AI engineer + +Who performs advanced analytics to help drive value from data? +a) A data engineer +**b) A data scientist** +c) An AI engineer + +Choose the valid examples of Structured Data. +**a) Microsoft SQL Server** +b) Binary Files +**c) Azure SQL Database** +d) Audio Files +**e) Azure SQL Data Warehouse (Azure Synapse)** +f) Image Files + +Choose the valid examples of Un-Structured Data. +a) Microsoft SQL Server +**b) Binary Files** +c) Azure SQL Database +**d) Audio Files** +e) Azure SQL Data Warehouse +**f) Image Files** + +Azure Databricks encapsulates which Apache Storage technology? +a) Apache HDInsight +b) Apache Hadoop +**c) Apache Spark** + +Which security features does Azure Databricks not support? +a) Azure Active Directory +**b) Shared Access Keys** +c) Role-based access + +Which of the following Azure Databricks components provides support for R, SQL, Python, Scala, and Java? 
+a) MLlib +b) GraphX +**c) Spark Core API** + +Which Notebook format is used in Databricks? +**a) DBC** +b) .notebook +c) spark + +You use Azure Data Factory to prepare data to be queried by Azure Synapse Analytics serverless SQL pools. Files are initially ingested into an Azure Data Lake Storage Gen2 account as 10 small JSON files. Each file contains the same data attributes and data from a subsidiary of your company. You need to move the files to a different folder and transform the data to meet the following requirements: +• Provide the fastest possible query times. +• Automatically infer the schema from the underlying files. +How should you configure the Data Factory copy activity? To answer, select the appropriate options in the answer area. +NOTE: Each correct selection is worth one point. +Copy behavior +Flatten hierarchy +**Merge files** +Preserve hierarchy +Sink File Type +csv +json +**Parquet** +TXT + +All dimension tables are replicated, while all fact tables are hash distributed. + +You are designing a data engineering solution for data stream processing. You need to recommend a solution for data ingestion, in order to meet the following requirements: +• Ingest millions of events per second +• Easily scale from streaming megabytes of data to terabytes while keeping control over when and how much to scale +• Integrate with Azure Functions +• Natively connected with Stream Analytics to build an end-to-end serverless streaming solution. +What would you recommend? +a) Azure Cosmos DB +b) Apache Spark +c) Azure Synapse Analytics +**d) Azure Event Hubs** + +You are a data engineer implementing a lambda architecture on Microsoft Azure. You use an open-source +big data solution to collect, process, and maintain data. The analytical data store performs poorly. 
+You must implement a solution that meets the following requirements: +• Provide data warehousing +• Reduce ongoing management activities +• Deliver SQL query responses in less than one second +You need to create an HDInsight cluster to meet the requirements. Which type of cluster should you create? +a) Apache HBase +b) Apache Hadoop +c) Interactive Query +**d) Apache Spark** + +Which data platform technology is a globally distributed, multi-model database that can perform queries in +less than a second? +a) SQL Database +b) Azure SQL database +c) Apache Hadoop +**d) Cosmos DB** +e) Azure SQL Synapse + +The open-source world offers four types of NoSQL databases. Select all options that are applicable. NOTE: Each correct selection is worth one point. +a) SQL Database +b) Apache Hadoop +**c) Key-value store** +**d) Document database** +**e) Graph database** +**f) Column database** +g) Cosmos DB +h) Azure SQL Synapse + +Azure Databricks is the least expensive choice when you want to store data but don't need to query it? +Yes +**No** + +Azure Storage is the least expensive choice when you want to store data but don't need to query it? +**Yes** +No + +Unstructured data is stored in nonrelational systems, commonly called unstructured or NoSQL. +No +**Yes** + +You are designing an Azure Stream Analytics job to process incoming events from sensors in retail environments. You need to process the events to produce a running average of shopper counts during the previous 15 minutes, calculated at five-minute intervals. Which type of window should you use? +a) snapshot +b) tumbling +**c) hopping** +d) sliding + +You are implementing an Azure Data Lake Gen2 storage account. You need to ensure that data will be accessible for both read and write operations, even if an entire data center (zonal or non-zonal) becomes unavailable. Which kind of replication would you use for the storage account? 
(Choose +the solution with minimum cost) +a) Locally-redundant storage (LRS) +**b) Zone-redundant storage (ZRS)** +c) Geo-redundant storage (GRS) +d) Geo-zone-redundant storage (GZRS) + +You have an Azure Data Lake Storage Gen2 container that contains 100 TB of data. You need to +ensure that the data in the container is available for read workloads in a secondary region if an +outage occurs in the primary region. The solution must minimize costs. Which type of data +redundancy should you use? +a) geo-redundant storage (GRS) +**b) read-access geo-redundant storage (RA-GRS)** +c) zone-redundant storage (ZRS) +d) locally-redundant storage (LRS) + +You plan to implement an Azure Data Lake Gen 2 storage account. You need to ensure that the +data lake will remain available if a data center fails in the primary Azure region. The solution must +minimize costs. Which type of replication should you use for the storage account? +a) geo-redundant storage (GRS) +b) geo-zone-redundant storage (GZRS) +**c) zone-redundant storage (ZRS)** +d) locally-redundant storage (LRS) + +You need to design an Azure Synapse Analytics dedicated SQL pool that meets the following +requirements: +• Can return an employee record from a given point in time. +• Maintains the latest employee information. +• Minimizes query complexity. +How should you model the employee data? +a) as a temporal table +b) as a SQL graph table +c) as a degenerate dimension table +**d) as a Type 2 slowly changing dimension (SCD) table** + +You have a SQL pool in Azure Synapse that contains a table named dbo.Customers. The table +contains a column named Email. You need to prevent non-administrative users from seeing the full +email addresses in the Email column. The users must see values in a format of abc@xxxx.com +instead. What should you do? +**a) From Microsoft SQL Server Management Studio, set an email mask on the Email column.** +b) From the Azure portal, set a mask on the Email column. 
+c) From Microsoft SQL Server Management Studio, grant the SELECT permission to the users for +all the columns in the dbo.Customers table except Email. +d) From the Azure portal, set a sensitivity classification of Confidential for the Email column. + +You have a SQL pool in Azure Synapse. A user reports that queries against the pool take longer +than expected to complete. You need to add monitoring to the underlying storage to help diagnose +the issue. Which two metrics should you monitor? +**a) Cache hit percentage** +b) Active queries +c) Snapshot Storage Size +d) DWU Limit +**e) Cache used percentage** + +You have a SQL pool in Azure Synapse. You discover that some queries fail or take a long time to complete. You need to monitor for transactions that have rolled back. Which dynamic management view should you query? +**a) sys.dm_pdw_nodes_tran_database_transactions** +b) sys.dm_pdw_waits +c) sys.dm_pdw_request_steps +d) sys.dm_pdw_exec_sessions + +You are designing an Azure Databricks table. The table will ingest an average of 20 million streaming events per day. You need to persist the events in the table for use in incremental load +pipeline jobs in Azure Databricks. The solution must minimize storage costs and incremental load +times. What should you include in the solution? +a) Partition by DateTime fields. +**b) Sink to Azure Queue storage.** +c) Include a watermark column. +d) Use a JSON format for physical data storage. + +You have a partitioned table in an Azure Synapse Analytics dedicated SQL pool. You need to design queries to maximize the benefits of partition elimination. What should you include in the Transact-SQL queries? +a) JOIN +**b) WHERE** +c) DISTINCT +d) GROUP BY + +You have an Azure Synapse Analytics dedicated SQL pool that contains a large fact table. The +table contains 50 columns and 5 billion rows and is a heap. Most queries against the table +aggregate values from approximately 100 million rows and return only two columns. 
You discover +that the queries against the fact table are very slow. Which type of index should you add to provide +the fastest query times? +a) nonclustered columnstore +**b) clustered columnstore** +c) nonclustered +d) clustered + +You need to create a partitioned table in an Azure Synapse Analytics dedicated SQL pool. How should you complete the Transact-SQL statement? To answer, drag the appropriate values to the correct targets. Each value may be used once, more than once, or not at all. You may need to drag the split bar between panes or scroll to view content. +Values +CLUSTERED INDEX +COLLATE +**DISTRIBUTION** +**PARTITION** +PARTITION FUNCTION +PARTITION SCHEME +Answer Area +CREATE TABLE table1 +( +ID INTEGER, +col1 VARCHAR(10), +col2 VARCHAR(10) +) WITH +( +______ = HASH (ID), +______ (ID RANGE LEFT FOR ) +) + +You have an Azure Databricks workspace named workspace1 in the Standard pricing tier. You need to configure workspace1 to support autoscaling all-purpose clusters. The solution must meet the following requirements: +• Automatically scale down workers when the cluster is underutilized for three minutes. +• Minimize the time it takes to scale to the maximum number of workers. +• Minimize costs. +What should you do first? +a) Enable container services for workspace1. +**b) Upgrade workspace1 to the Premium pricing tier.** +c) Set Cluster Mode to High Concurrency. +d) Create a cluster policy in workspace1. + +You have an enterprise-wide Azure Data Lake Storage Gen2 account. The data lake is accessible only through an Azure virtual network named VNET1. You are building a SQL pool in Azure Synapse that will use data from the data lake. Your company has a sales team. All the members of the team are in an Azure Active Directory group named Sales. POSIX controls are used to assign the Sales group access to the files in the data lake. You plan to load data to the SQL pool every hour. You need to ensure that the SQL pool can load the sales data from the data lake. 
Which three actions should you perform? Each correct answer presents part of the solution. NOTE: Each correct selection is worth one point. +**a) Add the managed identity to the Sales group. (2)** +**b) Use the managed identity as the credentials for the data load process. (3)** +c) Create a shared access signature (SAS). +d) Add your Azure Active Directory (Azure AD) account to the Sales group. +e) Use the shared access signature (SAS) as the credentials for the data load process. +**f) Create a managed identity. (1)** + +You are designing a monitoring solution for a fleet of 500 vehicles. Each vehicle has a GPS tracking device that sends data to an Azure event hub once per minute. You have a CSV file in an Azure Data Lake Storage Gen2 container. The file maintains the expected geographical area in which each vehicle should be. You need to ensure that when a GPS position is outside the expected area, a message is added to another event hub for processing within 30 seconds. The solution must minimize cost. What should you include in the solution? +Service +An Azure Synapse Analytics Apache Spark pool +An Azure Synapse Analytics serverless SQL pool +Azure Data Factory +**Azure Stream Analytics** +Window +**Hopping** +No Window +Session +Tumbling +Analysis Type +Event pattern matching +Lagged record comparison +**Point within polygon** +Polygon overlap + +You are moving data from an Azure Data Lake Gen2 store to Azure Synapse Analytics. Which Azure +Data Factory integration runtime would be used in a data copy activity? +a) Azure pipeline +b) Azure-SSIS +**c) Azure** +d) Self Hosted + +You are developing a solution that will use Azure Stream Analytics. The solution will accept an +Azure Blob storage file named Customers. The file will contain both in-store and online customer +details. The online customers will provide a mailing address. You have a file in Blob storage named +LocationIncomes that contains median incomes based on location. The file rarely changes. 
You +need to use an address to look up a median income based on location. You must output the data +to Azure SQL Database for immediate use and to Azure Data Lake Storage Gen2 for long-term +retention. +Solution: You implement a Stream Analytics job that has two streaming inputs, one query, and two +outputs. Does this meet the goal? +Yes +**No** + +Solution: You implement a Stream Analytics job that has one query, and two outputs. Does this +meet the goal? +Yes +**No** + +Solution: You implement a Stream Analytics job that has one streaming input, one reference input, +two queries, and four outputs. Does this meet the goal? +**Yes** +No + +You have an Azure Data Lake Storage account that contains a staging zone. You need to design a daily process to ingest incremental data from the staging zone, transform the data by executing an R script, and then insert the transformed data into a data warehouse in Azure Synapse Analytics. +Solution: You use an Azure Data Factory schedule trigger to execute a pipeline that copies the data to a staging table in the data warehouse, and then uses a stored procedure to execute the R script. +Does this meet the goal? + +Yes +**No** + +Which Azure Data Factory component contains the transformation logic or the analysis commands of the Azure Data Factory's work? +a) Linked Services +b) Datasets +**c) Activities** +d) Pipelines + +You have an Azure Data Factory that contains 10 pipelines. You need to label each pipeline with its main purpose of either ingest, transform, or load. The labels must be available for grouping and filtering when using the monitoring experience in Data Factory. What should you add to each +pipeline? +a) a resource tag +b) a user property +**c) an annotation** +d) a run group ID +e) a correlation ID + +You have an Azure Storage account and an Azure SQL data warehouse in the UK South region. You need to copy blob data from the storage account to the data warehouse by using Azure Data +Factory. 
The solution must meet the following requirements: +• Ensure that the data remains in the UK South region at all times. +• Minimize administrative effort. +Which type of integration runtime should you use? +**a) Azure integration runtime** +b) Self-hosted integration runtime +c) Azure-SSIS integration runtime + +You are planning to use Azure Databricks clusters for a single user. Which type of Databricks +cluster should you use? +**a) Standard** +b) Single Node +c) High Concurrency + +You are planning to use Azure Databricks clusters that provide fine-grained sharing for maximum +resource utilization and minimum query latencies. It should also be a managed cloud resource. +Which type of Databricks cluster should you use? +a) Standard +b) Single Node +**c) High Concurrency** + +You are planning to use an Azure Databricks cluster that has no workers and runs Spark jobs on the +driver node. Which type of Databricks cluster should you use? +a) Standard +**b) Single Node** +c) High Concurrency + +Which Azure Data Factory component orchestrates a transformation job or runs a data movement +command? +**a) Linked Services** +b) Datasets +c) Activities + +You have an Azure virtual machine that has Microsoft SQL Server installed. The server contains a table named Table1. You need to copy the data from Table1 to an Azure Data Lake Storage Gen2 +account by using an Azure Data Factory V2 copy activity. +Which type of integration runtime should you use? +**a) Azure integration runtime** +b) Self-hosted integration runtime +c) Azure-SSIS integration runtime + +Which browsers are recommended for best use with Azure Databricks? +**a) Google Chrome** +**b) Firefox** +**c) Safari** +**d) Microsoft Edge** +e) Internet Explorer +f) Mobile browsers + +How do you connect your Spark cluster to the Azure Blob? +a) By calling the .connect() function on the Spark Cluster. +**b) By mounting it** +c) By calling the .connect() function on the Azure Blob. 
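Mounting a Blob container from a Databricks notebook can be sketched as follows. This is a minimal sketch, assuming a notebook where `dbutils` is available; the account, container, and mount-point names are hypothetical, and in practice the key would usually come from a secret scope rather than a literal string. The helper only builds the arguments, so it also runs outside Databricks.

```python
def blob_mount_args(account: str, container: str, mount_point: str, account_key: str) -> dict:
    """Build keyword arguments for dbutils.fs.mount (Azure Blob Storage over wasbs)."""
    host = f"{account}.blob.core.windows.net"
    return {
        "source": f"wasbs://{container}@{host}",
        "mount_point": mount_point,
        # The config key names the storage account whose access key is supplied.
        "extra_configs": {f"fs.azure.account.key.{host}": account_key},
    }


# Hypothetical names, for illustration only.
args = blob_mount_args("mystorageacct", "raw", "/mnt/raw", "<storage-account-key>")

# Inside a Databricks notebook, the mount itself would then be:
#   dbutils.fs.mount(**args)
#   display(dbutils.fs.ls("/mnt/raw"))
```

Once mounted, the container is addressed through the mount point like any other DBFS path.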
+ +How does Spark connect to databases like MySQL, Hive and other data stores? +**a) JDBC** +b) ODBC +c) Using the REST API Layer + +You need to trigger an Azure Data Factory pipeline when a file arrives in an Azure Data Lake Storage Gen2 container. Which resource provider should you enable? +a) Microsoft.Sql +b) Microsoft.Automation +**c) Microsoft.EventGrid** +d) Microsoft.EventHub + +You plan to perform batch processing in Azure Databricks once daily. Which Azure Databricks Cluster should you choose? +a) High Concurrency +b) interactive +**c) automated** + +Which Azure Data Factory component contains the transformation logic or the analysis commands +of the Azure Data Factory's work? +a) Linked Services +b) Datasets +**c) Activities** +d) Pipelines + +You plan to ingest streaming social media data by using Azure Stream Analytics. The data will be +stored in files in Azure Data Lake Storage, and then consumed by using Azure Databricks and +PolyBase in Azure Synapse Analytics. You need to recommend a Stream Analytics data output +format to ensure that the queries from Databricks and PolyBase against the files encounter the +fewest possible errors. The solution must ensure that the files can be queried quickly, and that the +data type information is retained. What should you recommend? +a) JSON +**b) Parquet** +c) csv +d) Avro + +You have a self-hosted integration runtime in Azure Data Factory. +The current status of the integration runtime has the following +configurations: +• Status: Running +• Type: Self-Hosted +• Version: 4.4.7292.1 +• Running / Registered Node(s): 1/1 +• High Availability Enabled: False +• Linked Count: 0 +• Queue Length: 0 +• Average Queue Duration: 
0.00s +If the X-M node becomes unavailable, all executed pipelines will: +**fail until the node comes back online** +switch to another integration runtime +exceed the CPU limit + +The integration runtime has the following node details: +• Name: X-M +• Status: Running +• Version: 4.4.7292.1 +• Available Memory: 7697MB +• CPU Utilization: +• Network (In/Out): 1.21KBps/0.83KBps +• Concurrent Jobs (Running/Limit): 2/14 +• Role: Dispatcher/Worker +• Credential Status: In Sync +The number of concurrent jobs and the CPU usage +indicate that the Concurrent Jobs (Running/Limit) values +should be: +Raised +**Lowered** +Left AS-IS + +You have an Azure Databricks resource. You need to log actions that relate to compute changes +triggered by the Databricks resources. Which Databricks services should you log? +a) workspace +b) SSH +c) DBFS +**d) clusters** +e) jobs + +Which Azure data platform is commonly used to process data in an ELT framework? +**a) Azure Data Factory** +b) Azure Databricks +c) Azure Data Lake Storage + +Which Azure service is the best choice to manage and govern your data? +a) Azure Data Factory +**b) Azure Purview** +c) Azure Data Lake Storage + +Applications that publish messages to Azure Event Hub very frequently will get the best performance using Advanced Message Queuing Protocol (AMQP) because it establishes a persistent socket. +**True** +False + +You have an Azure Synapse Analytics dedicated SQL pool named Pool1. Pool1 contains a +partitioned fact table named dbo.Sales and a staging table named stg.Sales that has the matching +table and partition definitions. You need to overwrite the content of the first partition in dbo.Sales +with the content of the same partition in stg.Sales. The solution must minimize load times. +What should you do? +a) Insert the data from stg.Sales into dbo.Sales. +b) Switch the first partition from dbo.Sales to stg.Sales. +**c) Switch the first partition from stg.Sales to dbo.Sales.** +d) Update dbo.Sales from stg.Sales. 
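A partition switch is a metadata operation, which is why it loads faster than an INSERT or UPDATE. As a minimal sketch, assuming the staged rows are the ones that should end up in dbo.Sales, the table named after ALTER TABLE is the source and the table after TO is the target. The helper below is hypothetical and only builds the T-SQL text; `TRUNCATE_TARGET = ON` asks the dedicated SQL pool to empty the target partition before the switch.

```python
def switch_partition_sql(source: str, target: str, partition: int,
                         truncate_target: bool = True) -> str:
    """Return a T-SQL statement that switches one partition from source into target."""
    option = " WITH (TRUNCATE_TARGET = ON)" if truncate_target else ""
    return (
        f"ALTER TABLE {source} SWITCH PARTITION {partition} "
        f"TO {target} PARTITION {partition}{option};"
    )


# Overwrite partition 1 of dbo.Sales with partition 1 of stg.Sales.
statement = switch_partition_sql("stg.Sales", "dbo.Sales", 1)
print(statement)
```

Without `TRUNCATE_TARGET`, the target partition has to be empty before the switch can succeed.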
+ +You plan to create an Azure Databricks workspace that has a tiered structure. The workspace will +contain the following three workloads: +• A workload for data engineers who will use Python and SQL +• A workload for jobs that will run notebooks that use Python, Spark, Scala, and SQL +• A workload that data scientists will use to perform ad hoc analysis in Scala and R +The enterprise architecture team identifies the following standards for Databricks environments: +• The data engineers must share a cluster. +• The job cluster will be managed by using a request process whereby data scientists and data +engineers provide packaged notebooks for deployment to the cluster. +• All the data scientists must be assigned their own cluster that terminates automatically after 120 +minutes of inactivity. Currently, there are three data scientists. +You need to create the Databricks clusters for the workloads. +Solution: You create a High Concurrency cluster for each data scientist, a High Concurrency cluster +for the data engineers, and a Standard cluster for the jobs. Does this meet the goal? +Yes +**No** + +Solution: You create a Standard cluster for each data scientist, a High Concurrency cluster for the +data engineers, and a High Concurrency cluster for the jobs. Does this meet the goal? +Yes +**No** + +Solution: You create a Standard cluster for each data scientist, a High Concurrency cluster for the +data engineers, and a Standard cluster for the jobs. Does this meet the goal? +**Yes** +No + +If an Event Hub goes offline before a consumer group can process the events it holds, those +events will be lost. +True +**False** + +Events are persistent. + +You are a Data Engineer for Contoso. You want to view key health metrics of your Stream Analytics +jobs. Which tool in Streaming Analytics should you use? 
+**a) Dashboards** +b) Alerts +c) Diagnostics + +You are designing a real-time dashboard solution that will visualize streaming data from remote +sensors that connect to the internet. The streaming data must be aggregated to show the average +value of each 10-second interval. The data will be discarded after being displayed in the +dashboard. The solution will use Azure Stream Analytics and must meet the following +requirements: + +- Minimize latency from an Azure Event Hub to the dashboard. +- Minimize the required storage. +- Minimize development effort. +What should you include in the solution? +Azure Stream Analytics input type +**Azure Event Hub** +Azure SQL Database +Azure Stream Analytics +Azure Power BI +Azure Stream Analytics output type +Azure Event Hub +Azure SQL Database +Azure Stream Analytics +**Azure Power BI** +Aggregation query location +Azure Event Hub +Azure SQL Database +**Azure Stream Analytics** +Azure Power BI + +Publishers can use either HTTPS or AMQP. AMQP opens a socket and can send multiple messages +over that socket. How many default partitions are available? +a) 1 +b) 2 +**c) 4** +d) 8 +e) 12 + +You are designing an enterprise data warehouse in Azure Synapse Analytics that will contain a +table named Customers. Customers will contain credit card information. You need to recommend a +solution to provide salespeople with the ability to view all the entries in Customers. The solution +must prevent all the salespeople from viewing or inferring the credit card information. What should +you include in the recommendation? +a) data masking +b) Always Encrypted +**c) column-level security** +d) row-level security + +You have an enterprise data warehouse in Azure Synapse Analytics. Using PolyBase, you create an +external table named [Ext].[Items] to query Parquet files stored in Azure Data Lake Storage Gen2 +without importing the data to the data warehouse. The external table has three columns. 
You +discover that the Parquet files have a fourth column named ItemID. Which command should you +run to add the ItemID column to the external table? +ALTER EXTERNAL TABLE [Ext].[Items] +ADD [ItemID] int; + +**DROP EXTERNAL TABLE [Ext].[Items] +CREATE EXTERNAL TABLE [Ext].[Items] +([ItemID] [int] NULL, +[ItemName] nvarchar(50) NULL, +[ItemType] nvarchar(20) NULL, +[ItemDescription] nvarchar(250)) +WITH ( +LOCATION = '/Items/', +DATA_SOURCE = AzureDataLakeStore, +FILE_FORMAT = PARQUET, +REJECT_TYPE = VALUE, +REJECT_VALUE = 0)** + +ALTER TABLE [Ext].[Items] +ADD [ItemID] int; + +DROP EXTERNAL FILE FORMAT +CREATE EXTERNAL FILE FORMAT parquetfile1 +WITH ( +FORMAT_TYPE = PARQUET, +DATA_COMPRESSION = 'org.apache.hadoop.io.compress.SnappyCodec') + +You build a data warehouse in an Azure Synapse Analytics dedicated SQL pool. Analysts write a +complex SELECT query that contains multiple JOIN and CASE statements to transform data for use +in inventory reports. The inventory reports will use the data and additional WHERE parameters +depending on the report. The reports will be produced once daily. You need to implement a +solution to make the dataset available for the reports. The solution must minimize query times. +What should you implement? +a) an ordered clustered columnstore index +**b) a materialized view** +c) result set caching +d) a replicated table + +You are designing a partition strategy for a fact table in an Azure Synapse Analytics dedicated SQL +pool. The table has the following specifications: +• Contain sales data for 20,000 products. +• Use hash distribution on a column named ProductID. +• Contain 2.4 billion records for the years 2021 and 2022. +Which number of partition ranges provides optimal compression and performance for the +clustered columnstore index? +**a) 40** +b) 240 +c) 400 +d) 2400 + +Partitions = records / (1 million rows per partition per distribution * 60 distributions) = 2,400,000,000 / 60,000,000 = 40 + +You are designing an Azure Synapse Analytics dedicated SQL pool. 
You need to ensure that you +can audit access to Personally Identifiable Information (PII). What should you include in the +solution? +a) column-level security +b) dynamic data masking +c) row-level security (RLS) +**d) sensitivity classifications** + +You are designing a security model for an Azure Synapse Analytics dedicated SQL pool that will +support multiple companies. You need to ensure that users from each company can view only the +data of their respective company. Which two objects should you include in the solution? Each +correct answer presents part of the solution. NOTE: Each correct selection is worth one point. +**a) a security policy** +b) a custom role-based access control (RBAC) role +**c) a function** +d) a column encryption key +e) asymmetric keys + +You have an Azure Synapse Analytics job that uses Scala. You need to view the status of the job. +What should you do? +a) From Synapse Studio, select the workspace. From Monitor, select SQL requests. +b) From Azure Monitor, run a Kusto query against the AzureDiagnostics table. +**c) From Synapse Studio, select the workspace. From Monitor, select Apache Spark applications.** +d) From Azure Monitor, run a Kusto query against the SparkLoggingEvent_CL table. + +You have an Azure Synapse Analytics database. Within it, you have a dimension table named +Stores that contains store information. There is a total of 263 stores nationwide. Store information +is retrieved in more than half of the queries that are issued against this database. These queries +include staff information per store, sales information per store and finance information. You want +to improve the query performance of these queries by configuring the table geometry of the Stores +table. Which is the appropriate table geometry to select for the Stores table? +a) Round Robin +b) Non-Clustered +**c) Replicated table** + +What is the default port for connecting to an enterprise data warehouse in Azure Synapse Analytics? 
+a) TCP port 1344 +b) UDP port 1433 +**c) TCP port 1433** + +WITH +step1 AS (SELECT * +FROM input1 +PARTITION BY StateID +INTO 10), +step2 AS (SELECT * +FROM input2 +PARTITION BY StateID +INTO 10) +SELECT * INTO output +FROM step1 +PARTITION BY StateID +UNION +SELECT * INTO output +FROM step2 +PARTITION BY StateID +Statement +The query combines two streams of partitioned +data. **Yes** +The stream scheme key and count must match +the output scheme. **Yes** +Providing 60 streaming units will optimize the +performance of the query. **Yes** + +You have a table in an Azure Synapse Analytics dedicated SQL pool. The table was created by using the following Transact-SQL statement. +CREATE TABLE [dbo].[DimEmployee] ( +[EmployeeKey] [int] IDENTITY (1, 1) NOT NULL, +[EmployeeID] [int] NOT NULL, +[FirstName] [varchar] (100) NOT NULL, +[LastName] [varchar] (100) NOT NULL, +[JobTitle] [varchar] (100) NULL, +[LastHireDate] [date] NULL, +[StreetAddress] [varchar] (500) NOT NULL, +[City] [varchar] (200) NOT NULL, +[StateProvince] [varchar] (50) NOT NULL, +[PostalCode] [varchar] (10) NOT NULL +You need to alter the table to meet the following requirements: +- Ensure that users can identify the current manager of employees. +- Support creating an employee reporting hierarchy for your entire company. +- Provide fast lookup of the managers' attributes such as name and job title. +Which column should you add to the table? +[ManagerEmployeeID] [smallint] NULL +[ManagerEmployeeKey] [smallint] NULL +**[ManagerEmployeeKey] [int] NULL** +[ManagerName] [varchar] (200) NULL + +You need to implement a Type 3 slowly changing dimension (SCD) for product category data in an +Azure Synapse Analytics dedicated SQL pool. You have a table that was created by using the +following Transact-SQL statement. +CREATE TABLE [DBO]. 
[DimProduct] ( +[ProductKey] [int] IDENTITY (1, 1) NOT NULL, +[ProductSourceID] [int] NOT NULL, +[ProductName] [nvarchar] (100) NULL, +(15) NULL, +[SellStartDate] [date] NOT NULL, +[SellEndDate] [date] NULL, +[RowInsertedDateTime] [datetime] NOT NULL, +[RowUpdatedDateTime] [datetime] NOT NULL, +[ETLAuditID] [int] NOT NULL +Which two columns should you add to the table? +Each correct answer presents part of the solution. +a) [EffectiveStartDate] [datetime] NOT NULL, +**b) [CurrentProductCategory] [nvarchar] (100) NOT NULL,** +c) [EffectiveEndDate] [datetime] NULL, +d) [ProductCategory] [nvarchar] (100) NOT NULL, +**e) [OriginalProductCategory] [nvarchar] (100) NOT NULL,** + +You have a SQL pool in Azure Synapse. You plan to load data from Azure Blob storage to a staging table. Approximately 1 million rows of data will be loaded daily. The table will be truncated before each daily load. You need to create the staging table. The solution must minimize how long it takes to load the data to the staging table. How should you configure the table? To answer, select the +appropriate options in the answer area. +Distribution +Hash +Replicated +**Round Robin** + +Indexing +Clustered +Clustered Columnstore +**Heap** + +Partitioning +Date +**None** + +You have a table named SalesFact in an enterprise data warehouse in Azure Synapse Analytics. +SalesFact contains sales data from the past 36 months and has the following characteristics: +a) Is partitioned by month +b) Contains one billion rows +c) Has clustered columnstore indexes +At the beginning of each month, you need to remove data from SalesFact that is older than 36 months as +quickly as possible. Which three actions should you perform in sequence in a stored procedure? +**Switch the partition containing the stale data +from SalesFact to SalesFact_Work. 2** +Truncate the partition containing the stale data +**Drop the SalesFact_Work table. 
3** +**Create an empty table named SalesFact_Work +that has the same schema as SalesFact. 1** +Execute a DELETE statement where the value in +the Date column is more than 36 months ago. +Copy the data to a new table by using CREATE +TABLE AS SELECT (CTAS). + +You develop data engineering solutions for a company. A project requires analysis of real-time Twitter feeds. Posts that contain specific keywords must be stored and processed on Microsoft +Azure and then displayed by using Microsoft Power BI. You need to implement the solution. Which +five actions should you perform in sequence? +Create an HDInsight cluster with the Hadoop cluster type. +**Create a Jupyter Notebook 2** +**Run a job that uses the Spark Streaming API to ingest data +from Twitter 4** +Create a Runbook +**Create an HDInsight cluster with the Spark cluster type 1** +**Create a table 3** +**Load the hvac table to Power BI Desktop 5** + +You have an Azure SQL database named DB1 in the East US 2 region. You need to build a secondary geo-replicated copy of DB1 in the West US region on a new server. Which three actions should you perform in sequence? +Implement log shipping +**On the secondary server create logins that match the SIDs on +the primary server 3** +**Create a target server and select a pricing tier 2** +Set the quorum mode and create a failover policy +**From the Geo replication settings of DB1 select West US 1** + +You need to create an Azure Cosmos DB account that will use encryption keys managed by your +organization. Which four actions should you perform in sequence? 
+**Generate a new key in the Azure key vault 3** +**Create an Azure key vault and enable purge protection 1** +Create a new Azure Cosmos DB account and set Data +Encryption to Service Managed Key +**Add an Azure key vault access policy to grant permissions to +the Azure Cosmos DB principal 2** +**Create a new Azure Cosmos DB account, set Data Encryption +to Customer-managed key (Enter key URI), and enter the key +URI 4** + +You are planning the deployment of Azure Data Lake Storage Gen2. You have the following two +reports that will access the data lake: +• Report1: Reads three columns from a file that contains 50 columns. +• Report2: Queries a single record based on a timestamp. +You need to recommend in which format to store the data in the data lake to support the reports. +The solution must minimize read times. What should you recommend for each report? To answer, +select the appropriate options in the answer area. NOTE: Each correct selection is worth one point. +Report 1 +Avro +CSV +**Parquet** +TSV +Report 2 +**Avro** +CSV +Parquet +TSV + +How long is the Recovery Point Objective for Azure Synapse Analytics? +a) 4 hours +**b) 8 hours** +c) 12 hours +d) 16 hours + +You have an enterprise data warehouse in Azure Synapse Analytics named DW1 on a server named +Server1. You need to verify whether the size of the transaction log file for each distribution of DW1 +is smaller than 160 GB. What should you do? +**a) On the master database, execute a query against the +sys.dm_pdw_nodes_os_performance_counters dynamic management view.** +b) From Azure Monitor in the Azure portal, execute a query against the logs of DW1. +c) On DW1, execute a query against the sys.database_files dynamic management view. +d) Execute a query against the logs of DW1 by using the Get-AzOperationalInsightsSearchResult +PowerShell cmdlet. + +You have an enterprise data warehouse in Azure Synapse Analytics. 
You need to monitor the data +warehouse to identify whether you must scale up to a higher service level to accommodate the +current workloads. Which is the best metric to monitor? More than one answer choice may achieve +the goal. Select the BEST answer. +a) CPU percentage +**b) DWU used** +c) DWU percentage +d) Data IO percentage + +You are a data architect. The data engineering team needs to configure a synchronization of data between an on-premises Microsoft SQL Server database and Azure SQL Database. Ad hoc and reporting queries are overutilizing the on-premises production instance. The synchronization process must: +• Perform an initial data synchronization to Azure SQL Database with minimal downtime +• Perform bi-directional data synchronization after initial synchronization +You need to implement this synchronization solution. Which synchronization method should you +use? +a) transactional replication +b) Data Migration Assistant (DMA) +c) backup and restore +d) SQL Server Agent job +**e) Azure SQL Data Sync** + +You have an Azure subscription that contains an Azure Storage account. You plan to implement +changes to a data storage solution to meet regulatory and compliance standards. +Every day, Azure needs to identify and delete blobs that were NOT modified during the last 100 +days. +Solution: You schedule an Azure Data Factory pipeline with a delete activity. Does this meet the +goal? +Yes +**No** + +Solution: You apply an expired tag to the blobs in the storage account. Does this meet the goal? +Yes +**No** + +Solution: You apply an Azure Blob storage lifecycle policy. Does this meet the goal? +**Yes** +No + +You want to ingest data from a SQL Server database hosted on an on-premises Windows Server. +What integration runtime is required for Azure Data Factory to ingest data from the on-premises +server? 
+a) Azure integration runtime +b) Azure-SSIS integration runtime +**c) Self-hosted integration runtime** + +By default, how long are the Azure Data Factory diagnostic logs retained for? +a) 15 days +b) 30 days +**c) 45 days** + +You need to trigger an Azure Data Factory pipeline when a file arrives in an Azure Data Lake Storage Gen2 container. Which resource provider should you enable? +a) Microsoft.Sql +b) Microsoft.Automation +**c) Microsoft.EventGrid** +d) Microsoft.EventHub + +You have an Azure Data Factory instance that contains two pipelines named Pipeline1 and Pipeline2. +Pipeline1 has the activities shown in the following exhibit. +[Exhibit: a Stored procedure activity named Stored procedure1 and a Set variable activity named Set variable1] +Pipeline2 has the activities shown in the following exhibit. +[Exhibit: an Execute Pipeline activity named Execute Pipeline1 and a Set variable activity named Set variable1] +You execute Pipeline2, and Stored procedure1 in Pipeline1 fails. What is the status of the pipeline +runs? +**a) Pipeline1 and Pipeline2 succeeded.** +b) Pipeline1 and Pipeline2 failed. +c) Pipeline1 succeeded and Pipeline2 failed. +d) Pipeline1 failed and Pipeline2 succeeded. + +You are designing a financial transactions table in an Azure Synapse Analytics dedicated SQL pool. The table will have a clustered columnstore index and will include the following columns: +• TransactionType: 40 million rows per transaction type +• CustomerSegment: 4 million per customer segment +• TransactionMonth: 65 million rows per month +• AccountType: 500 million per account type +You have the following query requirements: +• Analysts will most commonly analyze transactions for a given month. +• Transactions analysis will typically summarize transactions by transaction type, customer segment, +and/or account type +You need to recommend a partition strategy for the table to minimize query times. On which column should +you recommend partitioning the table? 
+a) CustomerSegment +b) AccountType +c) TransactionType +**d) TransactionMonth** + +Your company wants to route data rows to different streams based on matching conditions. Which +transformation in the Mapping Data Flow should you use? +**a) Conditional Split** +b) Select +c) Lookup + +Which transformation is used to load data into a data store or compute resource? +a) Source +b) Destination +**c) Sink** +d) Window + +A company has a real-time data analysis solution that is hosted on Microsoft Azure. The solution uses Azure Event Hub to ingest data and an Azure Stream Analytics cloud job to analyze the data. +The cloud job is configured to use 120 Streaming Units (SU). You need to optimize performance for +the Azure Stream Analytics job. Which two actions should you perform? Each correct answer +presents part of the solution. NOTE: Each correct selection is worth one point. +a) Implement event ordering. +b) Implement Azure Stream Analytics user-defined functions (UDF). +c) Implement query parallelization by partitioning the data output. +**d) Scale the SU count for the job up.** +e) Scale the SU count for the job down. +**f) Implement query parallelization by partitioning the data input.** + +By default, how are corrupt records dealt with using spark.read.json()? +**a) They appear in a column called "_corrupt_record"** +b) They get deleted automatically +c) They throw an exception and exit the read operation + +How do you specify parameters when reading data? +**a) Using .option() during your read allows you to pass key/value pairs specifying aspects of your +read** +b) Using .parameter() during your read allows you to pass key/value pairs specifying aspects of +your read +c) Using .keys() during your read allows you to pass key/value pairs specifying aspects of your read + +You create an Azure Databricks cluster and specify an additional library to install. When you +attempt to load the library to a notebook, the library is not found. 
You need to identify the cause of +the issue. What should you review? +a) notebook logs +**b) cluster event logs** +c) global init scripts logs +d) workspace logs + +Your company analyzes images from security cameras and sends alerts to security teams that respond to unusual activity. The solution uses Azure Databricks. You need to send Apache Spark level events, Spark Structured Streaming metrics, and application metrics to Azure Monitor. Which +three actions should you perform in sequence? +Create a data source in Azure Monitor. +Configure the Databricks cluster to use the Databricks +monitoring library **1** +Deploy Grafana to an Azure virtual machine +Build a spark-listeners-loganalytics-1.0-SNAPSHOT.jar JAR +file. **2** +Create Dropwizard counters in the application code **3** + +You have an Azure Data Lake Storage Gen2 account that contains JSON files for customers. The files contain two attributes named FirstName and LastName. You need to copy the data from the JSON files to an Azure Synapse Analytics table by using Azure Databricks. A new column must be +created that concatenates the FirstName and LastName values. You create the following +components: + +- A destination table in Azure Synapse +- An Azure Blob storage container +- A service principal +Which five actions should you perform in sequence next in a Databricks notebook? +Specify a temporary folder to stage the data **4** +Write the results to Data Lake Storage +Drop the data frame +Read the file into a data frame **2** +Write the results to a table in Azure Synapse **5** +Perform transformations on the data frame **3** +Mount the Data Lake Storage onto DBFS **1** +Perform transformations on the file + +You are designing an Azure Databricks interactive cluster. You need to ensure that the cluster +meets the following requirements: + +- Enable auto-termination +- Retain cluster configuration indefinitely after cluster termination. +What should you recommend? 
+ +a) Start the cluster after it is terminated. +**b) Pin the cluster** +c) Clone the cluster after it is terminated. +d) Terminate the cluster manually at process completion. + +You are designing an Azure Databricks table. The table will ingest an average of 20 million streaming events per day. You need to persist the events in the table for use in incremental load pipeline jobs in Azure Databricks. The solution must minimize storage costs and incremental load +times. What should you include in the solution? +a) Partition by DateTime fields. +**b) Sink to Azure Queue storage.** +c) Include a watermark column. +d) Use a JSON format for physical data storage. + +You plan to implement an Azure Data Lake Storage Gen2 container that will contain CSV files. The +size of the files will vary based on the number of events that occur per hour. File sizes range from +4 KB to 5 GB. You need to ensure that the files stored in the container are optimized for batch +processing. What should you do? +a) Convert the files to JSON +b) Convert the files to Avro +c) Compress the files +**d) Merge the files** + +You are planning a solution to aggregate streaming data that originates in Apache Kafka and is +output to Azure Data Lake Storage Gen2. The developers who will implement the stream +processing solution use Java. Which service should you recommend using to process the +streaming data? +a) Azure Event Hubs +b) Azure Data Factory +c) Azure Stream Analytics +**d) Azure Databricks** + +You need to implement an Azure Databricks cluster that automatically connects to Azure Data +Lake Storage Gen2 by using Azure Active Directory (Azure AD) integration. How should you +configure the new cluster? +Tier +**Premium** +Standard +Advanced option to enable +**Azure Data Lake Storage Credential Passthrough** +Table access control + +Which Azure Data Factory process involves using compute services to produce data to feed +production environments with cleansed data? 
+a) Connect and collect +**b) Transform and enrich** +c) Publish +d) Monitor + +You have a new Azure Data Factory environment. You need to periodically analyze pipeline +executions from the last 60 days to identify trends in execution durations. The solution must use +Azure Log Analytics to query the data and create charts. Which diagnostic settings should you +configure in Data Factory? To answer, select the appropriate options in the answer area. +Log Type +ActivityRuns +AllMetrics +**PipelineRuns** +TriggerRuns +Storage Location +An Azure event hub +**An Azure storage account** +Azure Log Analytics + +You are creating dimensions for a data warehouse in an Azure Synapse Analytics dedicated SQL +pool. You create a table by using the Transact-SQL statement shown in the following exhibit. +CREATE TABLE [DBO].[DimProduct] ( +[ProductKey] [int] IDENTITY (1, 1) NOT NULL, +[ProductSourceID] [int] NOT NULL, +[ProductName] [nvarchar] (100) NOT NULL, +[ProductNumber] [nvarchar] (25) NOT NULL, +[Color] [nvarchar] (15) NULL, +[Size] [nvarchar] (5) NULL, +[Weight] [decimal] (8, 2) NULL, +[ProductCategory] [nvarchar] (100) NULL, +[SellStartDate] [date] NOT NULL, +[SellEndDate] [date] NULL, +[RowInsertedDateTime] [datetime] NOT NULL, +[RowUpdatedDateTime] [datetime] NOT NULL, +[ETLAuditID] [int] NOT NULL +Use the drop-down menus to select the answer +choice that completes each statement based +on the information presented in the graphic. +DimProduct is a [answer choice] slowly changing +dimension (SCD) +Type 0 +Type 1 +**Type 2** +Advanced option to enable +A surrogate key +**A business key** +An audit column + +You need to schedule an Azure Data Factory pipeline to execute when a new file arrives in an Azure +Data Lake Storage Gen2 container. Which type of trigger should you use? +a) on-demand +b) tumbling window +c) schedule +**d) event** + +You have two Azure Data Factory instances named ADFdev and ADFprod. ADFdev connects to an Azure DevOps Git repository. 
You publish changes from the main branch of the Git repository to ADFdev. You need to deploy the artifacts from ADFdev to ADFprod. What should you do first? +a) From ADFdev, modify the Git configuration. +b) From ADFdev, create a linked service. +**c) From Azure DevOps, create a release pipeline.** +d) From Azure DevOps, update the main branch. + +You have an Azure data factory. You need to examine the pipeline failures from the last 60 days. What should you use? +a) the Activity log blade for the Data Factory resource +b) the Monitor & Manage app in Data Factory +c) the Resource health blade for the Data Factory resource +**d) Azure Monitor** + +Your company is building a data warehouse where they want to keep track of changes in customer mailing address. You want to keep the current mailing address and the previous one. Which SCD +type should you use? +a) Type 1 SCD +b) Type 2 SCD +**c) Type 3 SCD** +d) Type 6 SCD + +Your company is building a data warehouse where they want to keep only the latest company name +of the vendor from whom your company purchases raw materials. Which SCD type should you +use? +**a) Type 1 SCD** +b) Type 2 SCD +c) Type 3 SCD +d) Type 6 SCD + +Your company is building a data warehouse where they want to keep track of changes in customer +mailing address. You want to keep the current mailing address and the previous one. Both new and +old mailing addresses should be stored as different rows. Which SCD type should you use? +a) Type 1 SCD +**b) Type 2 SCD** +c) Type 3 SCD +d) Type 6 SCD + +You are building an Azure Stream Analytics query that will receive input data from Azure IoT Hub and write +the results to Azure Blob storage. You need to calculate the difference in readings per sensor per +hour. How should you complete the query? +SELECT sensorId, +growth = reading - +**LAG** +LAST +LEAD +(reading) OVER (PARTITION BY sensorId +**LIMIT DURATION** +OFFSET +WHEN +(hour, 1)) FROM input + +You have an Azure Synapse Analytics dedicated SQL pool. 
You need to ensure that data in the pool +is encrypted at rest. The solution must NOT require modifying applications that query the data. +What should you do? +a) Enable encryption at rest for the Azure Data Lake Storage Gen2 account. +**b) Enable Transparent Data Encryption (TDE) for the pool.** +c) Use a customer-managed key to enable double encryption for the Azure Synapse workspace. +d) Create an Azure key vault in the Azure subscription and grant access to the pool. + +You have an Azure subscription that contains a logical Microsoft SQL server named Server1. Server1 hosts an Azure Synapse Analytics SQL dedicated pool named Pool1. You need to recommend a Transparent Data Encryption (TDE) solution for Server1. The solution must meet the +following requirements: + +- Track the usage of encryption keys. +- Maintain the access of client apps to Pool1 in the event of an Azure datacenter outage that +affects the availability of the encryption keys. +What should you include in the recommendation? +To track encryption key usage +Always Encrypted +**TDE with customer-managed keys** +TDE with platform-managed keys +To maintain client app access in the event of a +datacenter outage +**Create and configure Azure key vaults in two Azure +regions** +Enable Advanced Data Security on Server1 +Implement the client apps by using a Microsoft .NET +Framework data provider + +You plan to create an Azure Synapse Analytics dedicated SQL pool. You need to minimize the time +it takes to identify queries that return confidential information as defined by the company's data +privacy regulations and the users who executed the queries. Which two components should you +include in the solution? 
+**a) sensitivity-classification labels applied to columns that contain confidential information** +b) resource tags for databases that contain confidential information +**c) audit logs sent to a Log Analytics workspace** +d) dynamic data masking for columns that contain confidential information + +While using Azure Data Factory you want to parameterize a linked service and pass dynamic values +at run time. Which supported connector should you use? +a) Azure Data Lake Storage Gen2 +b) Azure Data Factory variables +**c) Azure Synapse Analytics** +d) Azure Key Vault + +Which file formats does Azure Data Factory support? +a) Avro format +b) Binary format +c) Delimited text format +d) Excel format +e) JSON format +f) ORC format +g) Parquet format +h) XML format +**i) ALL OF THE ABOVE** + +Which property indicates the parallelism that you want the copy activity to use? +**a) parallelCopies** +b) stagedCopies +c) multiCopies + +Using the Azure Data Factory user interface (UX) you want to create a pipeline that copies and +transforms data from an Azure Data Lake Storage (ADLS) Gen2 source to an ADLS Gen2 sink using +mapping data flow. Choose the correct steps in the right order. +a) Create a data factory account +**b) Create a data factory.** +c) Create a copy activity +**d) Create a pipeline with a Data Flow activity.** +e) Validate copy activity +**f) Build a mapping data flow with four transformations.** +**g) Test run the pipeline.** +**h) Monitor a Data Flow activity** + +In Azure Data Factory: What is an example of a branching activity used in control flows? +**a) The If-condition** +b) Until-condition +c) Lookup-condition + +Which activity can retrieve a dataset from any of the data sources supported by data factory and +Synapse pipelines? +a) Find activity +**b) Lookup activity** +c) Validate activity + +You build a data warehouse in an Azure Synapse Analytics dedicated SQL pool. 
Analysts write a complex SELECT query that contains multiple JOIN and CASE statements to transform data for use +in inventory reports. The inventory reports will use the data and additional WHERE parameters +depending on the report. The reports will be produced once daily. You need to implement a +solution to make the dataset available for the reports. The solution must minimize query times. +What should you implement? +a) an ordered clustered columnstore index +**b) a materialized view** +c) result set caching +d) a replicated table + +You have an Azure subscription that contains an Azure Storage account. You plan to implement changes to a data storage solution to meet regulatory and compliance standards. Every day, Azure needs to identify and delete blobs that were NOT modified during the last 100 days. + +Solution: You apply an expired tag to the blobs in the storage account. Does this meet the goal? +Yes +**No** + +You have an Azure Storage account that contains 100 GB of files. The files contain rows of text and numerical values. 75% of the rows contain description data that has an average length of 1.1 MB. You plan to copy the data from the storage account to an enterprise data warehouse in Azure +Synapse Analytics. You need to prepare the files to ensure that the data copies quickly. +Solution: You copy the files to a table that has a columnstore index. Does this meet the goal? + +Yes +**No** + +Solution: You modify the files to ensure that each row is more than 1 MB. Does this meet the goal? +**No** + +You have an Azure Storage account that contains 100 GB of files. The files contain rows of text +and numerical values. 75% of the rows contain description data that has an average length of 1.1 +MB. You plan to copy the data from the storage account to an enterprise data warehouse in Azure +Synapse Analytics. You need to prepare the files to ensure that the data copies quickly. +Solution: You convert the files to compressed delimited text files. 
+**Yes** +No + +You have an Azure Synapse Analytics workspace named WS1 that contains an Apache Spark pool +named Pool1. You plan to create a database named DB1 in Pool1. You need to ensure that when +tables are created in DB1, the tables are available automatically as external tables to the built-in +serverless SQL pool. Which format should you use for the tables in DB1? +a) CSV +b) ORC +c) JSON +**d) Parquet** + +You are planning a solution to aggregate streaming data that originates in Apache Kafka and is output to Azure Data Lake Storage Gen2. The developers who will implement the stream +processing solution use Java. Which service should you recommend using to process the +streaming data? +a) Azure Event Hubs +b) Azure Data Factory +c) Azure Stream Analytics +**d) Azure Databricks** + +You are designing a slowly changing dimension (SCD) for supplier data in an Azure Synapse Analytics dedicated SQL pool. You plan to keep a record of changes to the available fields. +The supplier data contains the following columns. +SupplierSystemID +SupplierName +SupplierDescription +SupplierCategory +Name +SupplierAddress1 +SupplierAddress2 +SupplierCity +SupplierCountry +SupplierPostalCode +Which three additional columns should you add +to the data to create a Type 2 SCD? +**a) surrogate primary key** +**b) effective start date** +c) business key +**d) last modified date** +e) effective end date +f) foreign key + +You have a Microsoft SQL Server database that uses a third normal form schema. You plan to migrate the data in the database to a star schema in an Azure Synapse Analytics dedicated SQL +pool. You need to design the dimension tables. The solution must optimize read operations. What +should you include in the solution?
+Transform data for dimension tables by +Maintaining to a third normal form +Normalizing to a fourth normal form +**Denormalizing to a second normal form** +For primary key columns in dimension tables use +**New IDENTITY columns** +A new computed column +The business key column from the source system + +You are creating dimensions for a data warehouse in an Azure Synapse Analytics dedicated SQL pool. You create a table by using the Transact-SQL statement shown in the following exhibit. +CREATE TABLE [DBO].[DimProduct] ( +[ProductKey] [int] IDENTITY (1, 1) NOT NULL, +[ProductSourceID] [int] NOT NULL, +[ProductName] [nvarchar] (100) NOT NULL, +[ProductNumber] [nvarchar] (25) NOT NULL, +[Color] [nvarchar] (15) NULL, +[Size] [nvarchar] (5) NULL, +[Weight] [decimal] (8, 2) NULL, +[ProductCategory] [nvarchar] (100) NULL, +[SellStartDate] [date] NOT NULL, +[SellEndDate] [date] NULL, +[RowInsertedDateTime] [datetime] NOT NULL, +[RowUpdatedDateTime] [datetime] NOT NULL, +[ETLAuditID] [int] NOT NULL +Use the drop-down menus to select the answer +choice that completes each statement based on +the information presented in the graphic. +DimProduct is a ---- slowly changing dimension (SCD) +Type 1 +**Type 2** +Type 3 +The ProductKey column is ---- +**a surrogate key** +A business key +An audit column + +You are creating dimensions for a data warehouse in an Azure Synapse Analytics dedicated SQL pool. You create a table by using the Transact-SQL statement shown in the following exhibit. +CREATE TABLE [DBO].
[DimProduct] ( +[ProductKey] [int] IDENTITY (1, 1) NOT NULL, +[ProductSourceID] [int] NOT NULL, +[ProductName] [nvarchar] (100) NOT NULL, +[ProductNumber] [nvarchar] (25) NOT NULL, +[Color] [nvarchar] (15) NULL, +[Size] [nvarchar] (5) NULL, +[Weight] [decimal] (8, 2) NULL, +[ProductCategory] [nvarchar] (100) NULL, +[SellStartDate] [date] NOT NULL, +[SellEndDate] [date] NULL, +[RowInsertedDateTime] [datetime] NOT NULL, +[RowUpdatedDateTime] [datetime] NOT NULL, +[ETLAuditID] [int] NOT NULL +Which two columns should you add to the table +so that the table supports storing two versions +of a dimension member as separate columns? +Each correct answer presents part of the +solution. +a) [EffectiveStartDate] [datetime] NOT NULL, +**b) [CurrentProductCategory] [nvarchar] (100) NOT NULL,** +c) [EffectiveEndDate] [datetime] NULL, +d) [ProductCategory] [nvarchar] (100) NOT NULL, +**e) [OriginalProductCategory] [nvarchar] (100) NOT NULL,** + +You are designing a data mart for the human resources (HR) department at your company. The data mart will contain employee information and employee transactions. From a source system, +you have a flat extract that has the following fields: +• EmployeeID +• FirstName +• LastName +• Recipient +• GrossAmount +• TransactionID +• GovernmentID +• NetAmountPaid +• TransactionDate +You need to design a star schema data model in an Azure Synapse +Analytics dedicated SQL pool for the data mart. Which two tables +should you create? +a) a dimension table for Transaction +b) a dimension table for EmployeeTransaction +**c) a dimension table for Employee** +d) a fact table for Employee +**e) a fact table for Transaction** + +You are designing a dimension table in an Azure Synapse Analytics dedicated SQL pool. You need +to create a surrogate key for the table. The solution must provide the fastest query performance. +What should you use for the surrogate key?
+a) a GUID column +b) a sequence object +**c) an IDENTITY column** + +You are implementing a batch dataset in the Parquet format. Data files will be produced by using +Azure Data Factory and stored in Azure Data Lake Storage Gen2. The files will be consumed by an +Azure Synapse Analytics serverless SQL pool. You need to minimize storage costs for the solution. +What should you do? +a) Use Snappy compression for the files. +b) Use OPENROWSET to query the Parquet files. +**c) Create an external table that contains a subset of columns from the Parquet files.** +d) Store all data as string in the Parquet files. + +You have an Azure subscription that contains an Azure Blob Storage account named storage1 and an Azure Synapse Analytics dedicated SQL pool named Pool1. You need to store data in storage1. +The data will be read by Pool1. The solution must meet the following requirements: +• Enable Pool1 to skip columns and rows that are unnecessary in a query. +• Automatically create column statistics. +• Minimize the size of files. +Which type of file should you use? +a) JSON +**b) Parquet** +c) Avro +d) CSV + +You are designing an Azure Data Lake Storage solution that will transform raw JSON files for use +in an analytical workload. You need to recommend a format for the transformed files. The solution +must meet the following requirements: +• Contain information about the data types of each column in the files. +• Support querying a subset of columns in the files. +• Support read-heavy analytical workloads. +• Minimize the file size. +What should you recommend? +a) JSON +b) CSV +c) Apache Avro +**d) Apache Parquet** + +A company purchases IoT devices to monitor manufacturing machinery. The company uses an IoT +appliance to communicate with the IoT devices. The company must be able to monitor the devices +in real-time. You need to design the solution. What should you recommend?
+a) Azure Data Factory instance using Azure PowerShell +b) Azure Analysis Services using Microsoft Visual Studio +**c) Azure Stream Analytics cloud job using Azure PowerShell** +d) Azure Data Factory instance using Microsoft Visual Studio + +You have an Azure Synapse Analytics dedicated SQL pool that contains a table named Contacts. +Contacts contains a column named Phone. You need to ensure that users in a specific role only +see the last four digits of a phone number when querying the Phone column. What should you +include in the solution? +a) column encryption +**b) dynamic data masking** +c) a default value +d) table partitions +e) row level security (RLS) + +You plan to ingest streaming social media data by using Azure Stream Analytics. The data will be +stored in files in Azure Data Lake Storage, and then consumed by using Azure Databricks and +PolyBase in Azure SQL Data Warehouse. You need to recommend a Stream Analytics data output +format to ensure that the queries from Databricks and PolyBase against the files encounter the +fewest possible errors. The solution must ensure that the files can be queried quickly and that the +data type information is retained. What should you recommend? +**a) Avro** +b) CSV +c) Parquet +d) JSON + +You have an Azure Storage account. You plan to copy one million image files to the storage +account. You plan to share the files with an external partner organization. The partner organization +will analyze the files during the next year. You need to recommend an external access solution for +the storage account. The solution must meet the following requirements: + +- Ensure that only the partner organization can access the storage account. +- Ensure that access of the partner organization is removed automatically after 365 days. +What should you include in the recommendation?
+a) shared keys +b) Azure Blob storage lifecycle management policies +c) Azure policies +**d) shared access signature (SAS)** + +You work at ABC company and, as a data engineer, you are given the responsibility to manage the jobs +in Azure. You decide to add a new job. While specifying the job constraints, you set the +maxWallClockTime property to 30 minutes. What is the impact of this? +a) The job can be in a ready state for a maximum of 30 minutes +b) The job can be in an inactive state for a maximum of 30 minutes +**c) The job can be in the active or running state for a maximum of 30 minutes** +d) The job will automatically start in 30 minutes + +You have an Azure data factory named ADF1. You currently publish all pipeline authoring changes +directly to ADF1. You need to implement version control for the changes made to pipeline artifacts. +The solution must ensure that you can apply version control to the resources currently defined in +the UX Authoring canvas for ADF1. Which two actions should you perform? +a) From the UX Authoring canvas, select Set up code repository. +b) Create a Git repository. +c) Create a GitHub action. +d) Create an Azure Data Factory trigger. +e) From the UX Authoring canvas, select Publish. +f) From the UX Authoring canvas, run Publish All. \ No newline at end of file diff --git a/README.md b/README.md index 9d2a5d3..05a73b2 100644 --- a/README.md +++ b/README.md @@ -1,11 +1,14 @@ # Azure Data Engineer Associate Questions + ## Suggestions + - Read this [DP203 Notes1](/files/Dp203DEnotes.pdf) - Read this [DP203 Notes2](/files/Dp203notes.pdf) - Practice this [github by Microsoft](https://microsoftlearning.github.io/dp-203-azure-data-engineer/) - Udemy [course](https://www.udemy.com/course/data-engineering-on-microsoft-azure/)[I personally opted for this, which includes practice questions too. Use Udemy for Business for free] Repo [link](https://github.com/Amrit-Hub/Azure-Data-Engineer-Associate-Questions) + ## Questions 1.
hot vs cold vs archive tiers - days/when to choose @@ -44,7 +47,7 @@ Repo [link](https://github.com/Amrit-Hub/Azure-Data-Engineer-Associate-Questions 34. Read json synapse query - fieldquote? 35. cross apply - openjson/opendataset/openrowset 36. txt file has list of table names. read those tables in adf - filter/lookup -37. %%scala, scala_df.write.______(db) - load/saveastable/synapsesql +37. %%scala, scala_df.write.**\_\_**(db) - load/saveastable/synapsesql 38. synapse spark pool measuring unit - monitor? 39. Trigger type from given scenario 40. data>10000? from the shown table, dbcc pdw_showspace.... @@ -81,3 +84,7 @@ Repo [link](https://github.com/Amrit-Hub/Azure-Data-Engineer-Associate-Questions 18. [access-tiers-overview](https://learn.microsoft.com/en-us/azure/storage/blobs/access-tiers-overview?tabs=azure-portal) 19. [monitor-using-azure-monitor](https://learn.microsoft.com/en-us/azure/data-factory/monitor-using-azure-monitor) 20. [synapse-analytics](https://learn.microsoft.com/en-us/azure/databricks/external-data/synapse-analytics) +21. [Dynamic Data Masking](https://learn.microsoft.com/en-us/sql/relational-databases/security/dynamic-data-masking?view=sql-server-ver16) +22. [Azure Storage redundancy](https://learn.microsoft.com/en-us/azure/storage/common/storage-redundancy) +23. [Transact-SQL reference (Database Engine)](https://learn.microsoft.com/en-us/sql/t-sql/language-reference?view=sql-server-ver16) +24. [Temporal Tables](https://learn.microsoft.com/en-us/sql/relational-databases/tables/temporal-tables?view=sql-server-ver16)