Skip to content

Latest commit

 

History

History
60 lines (53 loc) · 5.73 KB

File metadata and controls

60 lines (53 loc) · 5.73 KB

Machine Learning (ML)

  • ML uses historical data to make predictions.

  • It looks at patterns within that data to create a mathematical model of the data, and then it processes similar data to make predictions.

  • Diagram illustrates this sequence, ML — A Conceptual Representation

  • image

    1. This process works because the new data is not memorized.
    2. Instead, based on the initial dataset supplied for creating the model, the ML process can find the same or similar patterns in the new data, and as such, make predictions.
    3. The initial/histrorical dataset supplied must be good—in and good quality—curated data. So it will lead to a model producing good predictions.
    4. Otherwise low-quality data will lead to a model that will likely make poor or inaccurate predictions, leading to erroneous results.
    5. ML uses specific algorithms to find patterns and retrieve insights from data.
  • 3 common categories of prediction

    1. image classification (is the image a dog or a cat?)
    2. tabular data (what is the creditworthiness of the loan applicant?)
    3. Natural language(NL) processing (is this restaurant review positive or negative?).

Common ML approaches

  • 2 mainstream approaches (more, but basic) - 1. supervised and 2. unsupervised.

  • Other types of ML approaches (but not limited to) - 1. semi-supervised, 2. reinforcement, 3. meta-learning, 4. topic modeling, 5.deep learning based on the extensive use of artificial neural networks.

  • Supervised ML - your dataset will contain a column (label) whose values you’ll want to predict.

    1. For example, Analyst at Interpol and have a list of fugitives and a column that indicates the probability of finding the offender, use this column to produce a prediction.
    2. Example inputs and desired outputs (resultant columns or labels) are presented, allowing the computer to find patterns and rules that map inputs to the desired outcomes.
    3. Algorithm types
    4. Regression: Used to predict values, such as an employee's salary. An increase is often used for time-series problems.
    5. Classification: Used for predicting classes, such as classifying pictures into different class types: houses, cars, airplanes, toys, etc. 1. Classification is usually divided into binary classification (prediction with precisely two possible values) or multiclass classification (three or more possible values).
  • Unsupervised - the dataset would not have the column or label containing the probability of finding the fugitive for the algorithm to predict.

    1. In this case, the algorithm is left on its own to find structure within the input provided—thus, it is called unsupervised.
    2. No result columns or labels are given to the algorithm with unsupervised learning, so it must find patterns and structure within the data provided.
    3. Algorithm types
    4. Clustering: Used for predicting groups with similar patterns, such as grouping online banking users based on their app usage habits.
    5. Anomaly detection: Used to predict elements that don’t align with the pattern found in the rest of the data, such as financial fraud.

How machine learning works in a nutshell

  • Use the historical data, go through data preparation or data cleaning — this includes performing activities such as adding, updating, or removing missing values.
    1. Other data preparation activities include converting text values into numerical values, given that most algorithms work best with numerical values.
  • After the data preparation step, create the model, including deciding which algorithm type to use.
  • Once the model has been created, the next step is to evaluate it to check if it executes well and gives good results for new data it hasn’t processed.
    1. To avoid providing the same data used to create the model (because that data is well-known). Instead, new data is used to check whether the model can generalize well.
    2. It's an iterative process.
    3. The model may not perform well when you evaluate it with the new data.
    4. In that case, you might have to try with a different algorithm, recreate the model, then try the new model with the new data as often as required to achieve good results.
  • Once you have a good-enough model, the next step is to deploy it in production where it will be used.
    1. It is possible once the model is in production that the data will change over time, invalidating the previously created and deployed model and degrading the system's performance and results.
    2. In that case, you’ll have to go over the initial data-gathering step and repeat the process.
  • image

ML.NET

  • ML.NET is a .NET ML Nuget library package
  • Model Builder, which uses AutoML (automated ML), provides an approachable and easy user interface to create, train, and deploy ML models.
    1. With Model Builder, it’s possible to employ different algorithms, metrics, and configuration options to create the best possible model for your data.
    2. It offers several built-in scenarios with different machine-learning use cases.
    3. Beyond producing a trained model, Model Builder can work with CSV files and SQL databases and generate the source code required to load your model, make predictions, and connect to cloud resources.
  • Using the latest model builder - if VS is not showing Model Builder UI, download and install the latest version of Model Builder.

Using ML.NET with Visual Studio

  • Create the Console App project (POCML)
  • Installing the Microsoft.ML NuGet Package