chOpen2017: Machine Learning for Software Developer

by resema — on  , 

"If A is a success in life, then A equals X plus Y plus Z. Work is X; Y is play, and Z is keeping your mouth shut." Einstein

Different categories for ML

Supervised Learning

An example with the housing prices and the known prices for a cluster of houses. This can be used to predict the housing price of a new house. The every data feature has a clear label.

Unsupervised Learning

Examples are clustering of e.g. flowers, customer segmetation. The data features have no clear label at the beginning. Cluster can become labels.

Reinforcement Learning

Characterised with missing classical inputs and outputs. Examples are Alpha Go, Robotic control, … Labels are completely unclear.

Semi-Supervised Learning

Labeling, which is generally expensive and/or time consuming, is reduced to a small set.

Recommender Systems

Used when existing data is extrem sparse. Cold-start. Examples are Amazon as seller.

Preconditions for Supervised Learning

  • Useful topic
    • Automated decision taking in a process
    • Operative decision
    • Repeated, time consuming or difficult decisions
    • Non-deterministic probleme
    • Root and cause
  • Data
    • Relevant data
    • Correct data
    • Enough data (depending also on algorithm)
    • Unaggregated data (better to have atomic data)
    • Data with labels
  • Feedback loop
    • Prediction of algorithm has value or not
    • Controlling of estimation
    • Adapt the model

Process Steps for Supervised Learning

Check follow-along.ipynb file.

  • Domain knowledge and data
  • Data preparation, clearing and correction
  • Feature engineering
  • Validation methodology
  • Training and model building
    • Hyperparams optimize
  • Ensembling (Combine multiple algorithms)
  • Prediction

Feature Engineering

Needs a lot of domain knowledge to create set of features. Has a big impact on performance. A neural network handles the feature generation by itself.

Possible steps:

  • Extraction
  • Construction
  • Selection

SW Developpers

A pyramide shows the needs on the SW devs:

  • predict
  • report & visualize
  • process & query
  • measure, structure & store

Algorithms

Extreme Gradient Boosting

Neural Networks

Is a universal function approximator. Used for supervised and unsupervised learning.

Deep Learning

Is a deep neural network, meaningly a higher number of hidden layers.

  • CNN: Convolutional Neural Network
  • RNN: Recurrent Neural Network

KMeans

Example in SciPy “iris”

Pseudo Code

1
2
3
4
5
6
7
8
Create k points for starting centroids (often randomly)
While any point has changed cluster assignment
	for every point in our dataset:
		for every centroid
			calculate the distance between the centroid and point
		assign the point to the cluster with the lowest distance
	for every cluster calculate the mean of the points in that cluster
		assign the centroid to the mean

Random Forest

Libraries

  • face_recognition 0.1.13, Python, https://face-recognition.readthedocs.io/en/latest/face_recognition.html
  • Dlib, C++, www.dlib.net

Personalities

  • Tom Mitchell
  • Jason Brownlee
  • Brian D. Ripley
  • www.kaggle.com
  • www.datarobot.com

Comments