Published: February 14, 2017
Audiences: Data scientists
Technology: Azure Machine Learning, Bot Framework, Cognitive Services
Credit toward certification: MCSE
This exam measures your ability to accomplish the technical tasks listed below.
Please note that the questions may test on, but will not be limited to, the topics described in the bulleted text.
Prepare Data for Analysis in Azure Machine Learning and Export from Azure Machine Learning
Import and export data to and from Azure Machine Learning
Import and export data to and from Azure Blob storage, import and export data to and from Azure SQL Database, import and export data via Hive Queries, import data from a website, import data from on-premises SQL
Explore and summarize data
Create univariate summaries, create multivariate summaries, visualize univariate distributions, use existing Microsoft R or Python notebooks for custom summaries and custom visualizations, use zip archives to import external packages for R or Python
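As a minimal sketch of what univariate and multivariate summaries compute, the following uses only Python's standard `statistics` module instead of an Azure Machine Learning module or notebook. The function names `univariate_summary` and `pearson` are hypothetical, and the sample values are illustrative.

```python
import statistics

def univariate_summary(values):
    """Descriptive statistics for a single numeric column."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "q1": q1,
        "q3": q3,
        "max": max(values),
    }

def pearson(x, y):
    """Pearson correlation between two equal-length numeric columns (a multivariate summary)."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

if __name__ == "__main__":
    print(univariate_summary([1, 2, 3, 4, 5]))
    print(pearson([1, 2, 3], [2, 4, 6]))
```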
Cleanse data for Azure Machine Learning
Apply filters to limit a dataset to the desired rows, identify and address missing data, identify and address outliers, remove columns and rows of datasets
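The cleansing tasks above can be sketched in plain Python on a list of row dictionaries. This is an illustration under stated assumptions, not how Azure Machine Learning implements them: missing values are assumed to be `None`, missing data is handled by mean imputation, and outliers are defined with the common 1.5 × IQR rule. All function names are hypothetical.

```python
import statistics

def filter_rows(rows, predicate):
    """Keep only the rows for which predicate(row) is True."""
    return [row for row in rows if predicate(row)]

def impute_mean(rows, column):
    """Replace None in `column` with the mean of the non-missing values."""
    present = [r[column] for r in rows if r[column] is not None]
    fill = statistics.mean(present)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]

def drop_iqr_outliers(rows, column, k=1.5):
    """Drop rows whose value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles([r[column] for r in rows], n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [r for r in rows if low <= r[column] <= high]

def drop_columns(rows, columns):
    """Remove the named columns from every row."""
    return [{key: v for key, v in r.items() if key not in columns} for r in rows]
```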
Perform feature engineering
Merge multiple datasets by rows or columns into a single dataset by columns, merge multiple datasets by rows or columns into a single dataset by rows, add columns that are combinations of other columns, manually select and construct features for model estimation, automatically select and construct features for model estimation, reduce dimensions of data through principal component analysis (PCA), manage variable metadata, select standardized variables based on planned analysis
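To make the two kinds of merge and a derived feature concrete, here is a minimal standard-library sketch. It assumes rows are dictionaries and that a column-wise merge is an inner join on a shared key column. The helper names are hypothetical, and the ratio feature is just one example of "a column that combines other columns".

```python
def merge_by_columns(left, right, key):
    """Inner-join two row lists on `key`: matching rows are combined into wider rows."""
    index = {r[key]: r for r in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]

def merge_by_rows(*datasets):
    """Stack datasets that share the same columns into one longer dataset."""
    merged = []
    for ds in datasets:
        merged.extend(ds)
    return merged

def add_ratio_column(rows, numerator, denominator, name):
    """Add a derived feature; None when the denominator is zero or missing."""
    return [
        {**r, name: r[numerator] / r[denominator] if r[denominator] else None}
        for r in rows
    ]
```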
Develop Machine Learning Models
Select an appropriate algorithm or method
Select an appropriate algorithm for predicting continuous label data, select an appropriate algorithm for supervised versus unsupervised scenarios, identify when to select R versus Python notebooks, identify an appropriate algorithm for grouping unlabeled data, identify an appropriate algorithm for classifying labeled data, select an appropriate ensemble
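For grouping unlabeled data, k-means is the standard first choice. The following is a minimal sketch of Lloyd's k-means algorithm on 2-D points in plain Python, written to show the assign-then-update loop rather than to stand in for the Azure Machine Learning K-Means Clustering module. The function name and the sample points are illustrative.

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's k-means on 2-D points; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize from k distinct input points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: assignments no longer move the centroids
        centroids = new_centroids
    return centroids, labels

if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(kmeans(pts, 2))
```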
Initialize and train appropriate models
Tune hyperparameters manually; tune hyperparameters automatically; split data into training and testing datasets, including using routines for cross-validation; build an ensemble using the stacking method
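A hold-out split and k-fold cross-validation come down to shuffling indices and partitioning them. This standard-library sketch uses a fixed seed so results are reproducible. The function names and default fractions are illustrative choices, not Azure Machine Learning defaults.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of `rows` and split it into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def k_fold_indices(n, k=5, seed=42):
    """Yield (train_idx, test_idx) pairs; every index lands in exactly one test fold."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test
```

Averaging a metric over the k test folds gives a less noisy estimate than a single hold-out split. That is why cross-validation is the usual basis for comparing hyperparameter settings.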
Score and evaluate models
Select appropriate evaluation metrics for clustering, select appropriate evaluation metrics for classification, select appropriate evaluation metrics for regression, use evaluation metrics to choose between Machine Learning models, compare ensemble metrics against base models
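The most common classification and regression metrics follow directly from their definitions. The following sketch computes them in plain Python. The function names are illustrative, and the positive class is assumed to be encoded as `1`.

```python
import math

def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

def regression_metrics(y_true, y_pred):
    """Mean absolute error, root mean squared error, and coefficient of determination."""
    errors = [p - t for t, p in zip(y_true, y_pred)]
    n = len(errors)
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(ss_res / n),
        "r2": 1 - ss_res / ss_tot,
    }
```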
Operationalize and Manage Azure Machine Learning Services
Deploy models using Azure Machine Learning
Publish a model developed inside Azure Machine Learning, publish an externally developed scoring function using an Azure Machine Learning package, use web service parameters, create and publish a recommendation model, create and publish a language understanding model
Manage Azure Machine Learning projects and workspaces
Create projects and experiments, add assets to a project, create new workspaces, invite users to a workspace, switch between different workspaces, create a Jupyter notebook that references an intermediate dataset
Consume Azure Machine Learning models
Connect to a published Machine Learning web service, consume a published Machine Learning model programmatically using a batch execution service, consume a published Machine Learning model programmatically using a request response service, interact with a published Machine Learning model using Microsoft Excel, publish models to the marketplace
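Assuming a classic Azure Machine Learning Studio web service, a request-response call is an HTTPS POST of a JSON body that lists column names and row values. The call is authorized by sending the service's API key as a bearer token. The sketch below only builds the request with the standard library and does not send it. The endpoint URL, API key, input name `input1`, and column names are placeholders, and the exact body schema should be checked against the service's own API help page.

```python
import json
import urllib.request

def build_rrs_request(endpoint_url, api_key, column_names, rows, input_name="input1"):
    """Build (but do not send) a POST request for a request-response scoring call.

    The body layout below is an assumption based on the classic Studio format.
    """
    body = {
        "Inputs": {input_name: {"ColumnNames": column_names, "Values": rows}},
        "GlobalParameters": {},
    }
    return urllib.request.Request(
        endpoint_url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": "Bearer " + api_key},
        method="POST",
    )

if __name__ == "__main__":
    # Placeholder endpoint and key: replace with the values shown for your web service.
    req = build_rrs_request(
        "https://example.invalid/execute?api-version=2.0&details=true",
        "YOUR_API_KEY",
        ["Carrier", "HourlyPrecip"],
        [["AA", "0.1"]],
    )
    # To actually call the service: urllib.request.urlopen(req).read()
    print(req.get_method(), req.full_url)
```

Batch execution differs in that the client submits a job that reads and writes storage blobs, then polls for completion, instead of waiting for a synchronous reply.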
Consume exemplar Cognitive Services APIs
Consume Vision APIs to process images, consume Language APIs to process text, consume Knowledge APIs to create recommendations
Use Other Services for Machine Learning
Build and use neural networks with the Microsoft Cognitive Toolkit
Use N-series VMs for GPU acceleration, build and train a three-layer feed-forward neural network, determine when to implement a neural network
Streamline development by using existing resources
Clone template experiments from Cortana Intelligence Gallery, use Cortana Intelligence Quick Start to deploy resources, use a data science VM for streamlined development
Perform data science at scale by using HDInsight
Deploy the appropriate type of HDI cluster, perform exploratory data analysis by using Spark SQL, build and use Machine Learning models with Spark on HDI, build and use Machine Learning models using MapReduce, build and use Machine Learning models using Microsoft R Server
Perform database analytics by using SQL Server R Services on Azure
Deploy a SQL Server 2016 Azure VM, configure SQL Server to allow execution of R scripts, execute R scripts inside T-SQL statements
You are building an Azure Machine Learning solution for an online retailer.
When a customer selects a product, you need to recommend other products that the customer might like to purchase at the same time. The recommendations should be based on what other customers who purchased the same product also bought.
Which model should you use?
A. Collaborative Filtering
B. Boosted Decision Tree Regression Model
C. Two-Class boosted decision tree
D. K-Means Clustering
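The "customers who bought this also bought" pattern in this scenario is what collaborative filtering (option A) captures. A minimal sketch of the idea is item-to-item co-occurrence counting over purchase baskets. This is a simplified stand-in, not the recommender module that Azure Machine Learning provides, and the product names are illustrative.

```python
from collections import Counter, defaultdict
from itertools import permutations

def build_cooccurrence(baskets):
    """Count how often each ordered pair of products appears in the same basket."""
    co = defaultdict(Counter)
    for basket in baskets:
        for a, b in permutations(set(basket), 2):
            co[a][b] += 1
    return co

def recommend(co, product, top_n=3):
    """Products most often bought together with `product`."""
    return [item for item, _ in co.get(product, Counter()).most_common(top_n)]

if __name__ == "__main__":
    baskets = [
        ["tent", "stove", "lantern"],
        ["tent", "stove"],
        ["tent", "sleeping_bag"],
        ["stove", "fuel"],
    ]
    co = build_cooccurrence(baskets)
    print(recommend(co, "tent", top_n=1))
```

The other options do not fit: regression and two-class classification predict a single value or label per row, and k-means only groups customers or items without ranking co-purchases.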
You are analyzing taxi trips in New York City. You use Azure Data Factory to create data pipelines and to orchestrate data movement.
You plan to develop a predictive model for 170 million rows (37 GB) of raw data in Apache Hive by using Microsoft R Server to identify which factors contribute to passenger tipping behavior.
All of the platforms used for the analysis have the same configuration: each worker node has eight processor cores and 28 GB of memory.
Which type of Azure HDInsight cluster should you use to produce results as quickly as possible?
C. Interactive Hive
Note: This question is part of a series of questions that present the same Scenario.
Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
Start of repeated Scenario:
A travel agency named Margie’s Travel sells airline tickets to customers in the United States.
Margie’s Travel wants you to provide insights and predictions on flight delays. The agency is considering implementing a system that will notify its customers, as the departure time approaches, about possible delays due to weather conditions.
The flight data contains the following attributes:
* DepartureDate: The departure date aggregated at a per hour granularity.
* Carrier: The code assigned by the IATA and commonly used to identify a carrier.
* OriginAirportID: An identification number assigned by the USDOT to identify a unique airport (the flight’s Origin)
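* DestAirportID: An identification number assigned by the USDOT to identify a unique airport (the flight’s destination)
* DepDelay: The departure delay in minutes.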
* DepDel30: A Boolean value indicating whether the departure was delayed by 30 minutes or more (a value of 1 indicates a delay of 30 minutes or more)
The weather data contains the following attributes: AirportID, ReadingDate (YYYY/MM/DD HH), SKYConditionVisibility, WeatherType, Windspeed, StationPressure, PressureChange, and HourlyPrecip.
End of repeated Scenario:
You plan to predict flight delays that are 30 minutes or more.
You need to train a model that accurately fits the data. The solution must minimize overfitting and data leakage. Which attribute should you remove?
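Data leakage here means training on a feature that would not be known at prediction time and that effectively encodes the label. The 30-minute delay flag is derived from the departure delay in minutes, so a column holding that delay reproduces the label through a single threshold. The following is a minimal sketch of a heuristic check that flags features which perfectly separate a binary label. The column names `dep_delay_minutes`, `hourly_precip`, and `dep_del30` and the rows are hypothetical.

```python
def separates_label(rows, feature, label):
    """True if one threshold on `feature` reproduces the binary `label` exactly."""
    zeros = [r[feature] for r in rows if r[label] == 0]
    ones = [r[feature] for r in rows if r[label] == 1]
    if not zeros or not ones:
        return False
    return max(zeros) < min(ones) or max(ones) < min(zeros)

def drop_leaky(rows, label, candidates):
    """Remove candidate features that perfectly separate the label; return (rows, removed)."""
    leaky = {f for f in candidates if separates_label(rows, f, label)}
    cleaned = [{k: v for k, v in r.items() if k not in leaky} for r in rows]
    return cleaned, leaky

if __name__ == "__main__":
    # Hypothetical rows: the delay column determines the label by construction.
    rows = [
        {"hourly_precip": 0.0, "dep_delay_minutes": 5, "dep_del30": 0},
        {"hourly_precip": 0.3, "dep_delay_minutes": 45, "dep_del30": 1},
        {"hourly_precip": 0.1, "dep_delay_minutes": 12, "dep_del30": 0},
        {"hourly_precip": 0.0, "dep_delay_minutes": 31, "dep_del30": 1},
    ]
    cleaned, removed = drop_leaky(rows, "dep_del30", ["hourly_precip", "dep_delay_minutes"])
    print(removed)
```

Perfect separation is only a warning sign, because small samples can separate by chance. The decisive test is whether the value exists before departure. A delay measured in minutes does not, so it must be removed whatever the check reports.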