TDSP Lifecycle – Modeling

Microsoft in the News

A year ago, Microsoft launched AI for Earth in London, pledging $2 million toward the effort.  The goal of AI for Earth is to provide AI and cloud tools to researchers working on the front lines of environmental challenges.  The areas covered are agriculture, water, biodiversity, and climate change.

Over the course of the year, AI for Earth has grown into a $50 million, five-year program.  It currently has 112 grantees in 27 countries, with 7 featured projects.

In the area of biodiversity, a few grantees taking advantage of AI for Earth include:

  • Disney, which is using it to study the Purple Martins that nest at its park.  The Purple Martin population has dropped 40% since 1966.
  • Protection Assistant for Wildlife Security (PAWS), which is using AI to help parks design patrol patterns in order to catch poachers.
  • Wild Me, which is using computer vision coupled with machine learning to identify individual animals, eliminating the need to tag them, which can harm the animal.  This also allows animals to be identified from photos posted to the internet, giving researchers the ability to better track and understand their behavior.

TDSP Lifecycle – Modeling

Modeling is the fun part.  This is where you get to play with the AI and see how powerful a tool it has become.


In this section, you will have three goals as shown in the diagram.

  1. Choose what you believe will be the best data features to use in the machine-learning (“ML”) model.
  2. Choose the appropriate machine-learning process and create a model that will predict your target accurately, while remaining flexible.
  3. Once you have determined which model is best, create a production-ready model.

How to do it

            Feature Engineering

A “feature” is an attribute or property that is shared by all the independent units on which your analysis will be done.  Any attribute or property can be a feature, provided it is useful in your model.

Feature engineering is the process of wading through the raw data and creating features that contribute positively to your machine learning algorithms.  This is done by:

  • Choosing what raw data to include.
  • Aggregating the raw data where appropriate.
  • Transforming the included and aggregated data into features that can be used in your model.
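As a minimal sketch, the three steps above might look like this in pandas.  The dataset, column names, and aggregations here are invented purely for illustration:

```python
import pandas as pd

# Hypothetical raw transaction data -- columns and values are made up.
raw = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 2],
    "amount":      [10.0, 20.0, 5.0, 7.0, 3.0],
    "channel":     ["web", "store", "web", "web", "store"],
})

# 1. Choose what raw data to include: keep only the columns we need.
chosen = raw[["customer_id", "amount", "channel"]]

# 2. Aggregate the raw data where appropriate: one row per customer.
features = chosen.groupby("customer_id").agg(
    total_spend=("amount", "sum"),
    avg_spend=("amount", "mean"),
    n_purchases=("amount", "count"),
)

# 3. Transform into model-ready features: turn the categorical channel
#    column into a numeric "share of purchases made on the web" feature.
features["web_share"] = (
    (chosen["channel"] == "web").groupby(chosen["customer_id"]).mean()
)

print(features)
```

Each row of `features` now describes one independent unit (a customer), which is exactly the shape most ML algorithms expect.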

By this time, you are very familiar with the data, but despite this expertise, this process is time-consuming.  Expect mistakes, and be prepared to come back to this step often to engineer new features to test.  It is rare that you will know in advance all the variables that will improve your model and all the variables that only introduce noise.

Some of the processes you may need to do to create features are:

  • Normalizing the data to ensure it falls within specified ranges.
  • Binning, which creates discrete variables that can be expanded into binary features.
  • Creating meta features that express the necessary information, but with fewer dimensions.
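To make these three processes concrete, here is a small NumPy sketch.  The values, bin edges, and the simple "wealth index" meta feature are all assumptions chosen for illustration:

```python
import numpy as np

# Illustrative raw values -- these numbers are made up for the sketch.
age = np.array([18, 25, 32, 47, 51, 63], dtype=float)

# Normalize to the [0, 1] range (min-max scaling).
age_scaled = (age - age.min()) / (age.max() - age.min())

# Bin into discrete groups, then expand each bin into binary features.
edges = [30, 50]                    # three bins: <30, 30-49, >=50
bin_idx = np.digitize(age, edges)   # discrete bin index per row
one_hot = np.eye(len(edges) + 1)[bin_idx]  # one binary column per bin

# A simple meta feature: collapse two related columns into one dimension
# that carries their shared signal.
income = np.array([20, 28, 40, 55, 60, 70], dtype=float)
spend = np.array([15, 22, 35, 50, 52, 66], dtype=float)
wealth_index = (income + spend) / 2
```

In practice you would use library tools (for example, scikit-learn's scalers, discretizers, and PCA) for the same three operations, but the underlying ideas are as shown.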

In order for you to derive insights as to what data is important and what is noise, you must have a good understanding of how your chosen features relate to each other, and how your ML algorithm processes them.
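A quick way to start building that understanding is to look at how the features correlate with one another.  The feature values below are invented for illustration:

```python
import numpy as np

# Hypothetical features -- the values are made up for the example.
sq_feet = np.array([800, 950, 1100, 1400, 2000], dtype=float)
n_rooms = np.array([2, 2, 3, 4, 5], dtype=float)
noise = np.random.default_rng(0).normal(size=5)  # pure noise feature

# A correlation matrix is a first look at how features relate:
# values near +/-1 suggest redundant features, values near 0 suggest
# an independent signal -- or, as with the last column here, just noise.
corr = np.corrcoef([sq_feet, n_rooms, noise])
print(np.round(corr, 2))
```

Correlation only captures linear relationships, so it is a starting point rather than a substitute for understanding how your ML algorithm actually uses each feature.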

I will continue with Modeling in my next blog.
