Introduction to Machine Learning

Human beings are exposed to data from birth. The eyes, ears, nose, skin, and tongue are continuously gathering various forms of data which the brain translates to sight, sound, smell, touch, and taste. The brain then processes various forms of raw data it receives through sensory organs and translates it to speech, which is used to express opinion about the nature of raw data received.

In today's world, sensors attached to machines are applied to gather data. Data is collected from Internet through various websites and social networking sites. Electronic forms of old manuscripts that have been digitized also add to data sets. Data is also obtained from the Internet through various websites and social networking sites. Data is also gathered from other electronic forms such as old manuscripts that have been digitized. These rich forms of data gathered from multiple sources require processing so that insight can be gained and a more meaningful pattern may be understood.

Machine learning algorithms help to gather data from varied sources, transform rich data sets, and help us to take intelligent action based on the results provided. Machine learning algorithms are designed to be efficient and accurate and to provide general learning to do the following:

  • Dealing with large scale problems
  • Making accurate predictions
  • Handling a variety of different learning problems
  • Learning which can be derived and the conditions under which they can be learned

Some of the areas of applications of machine learning algorithms are as follows:

  • Price prediction based on sales
  • Prediction of molecular response for medicines
  • Detecting motor insurance fraud
  • Analyzing stock market returns
  • Identifying risk ban loans
  • Forecasting wind power plant predictions
  • Tracking and monitoring the utilization and location of healthcare equipment
  • Calculating efficient use of energy
  • Understating trends in the growth of transportation in smart cities
  • Ore reserve estimations for the mining industry

Linear regression models present response variables that are quantitative in nature. However, certain responses are qualitative in nature. Responses such as attitudes (strongly disagree, disagree, neutral, agree, and strongly agree) are qualitative in nature. Predicting a qualitative response for an observation can be referred to as classifying that observation, since it involves assigning the observation to a category or class. Classifiers are an invaluable tool for many tasks today, such as medical or genomics predictions, spam detection, face recognition, and finance.

Clustering is a division of data into groups of similar objects. Each object (cluster) consists of objects that are similar between themselves and dissimilar to objects of other groups. The goal of clustering is to determine the intrinsic grouping in a set of unlabeled data. Clustering can be used in varied areas of application from data mining (DNA analysis, marketing studies, insurance studies, and so on.), text mining, information retrieval, statistical computational linguists, and corpus-based computational lexicography. Some of the requirements that must be fulfilled by clustering algorithms are as follows:

  • Scalability
  • Dealing with various types of attributes
  • Discovering clusters of arbitrary shapes
  • The ability to deal with noise and outliers
  • Interpretability and usability

The following diagram shows a representation of clustering:

Supervised learning entails learning a mapping between a set of input variables (typically a vector) and an output variable (also called the supervisory signal) and applying this mapping to predict the outputs for unseen data. Supervised methods attempt to discover the relationship between input variables and target variables. The relationship discovered is represented in a structure referred to as a model. Usually models describe and explain phenomena, which are hidden in the dataset and can be used for predicting the value of the target attribute knowing the values of the input attributes.

Supervised learning is the machine learning task of inferring a function from supervised training data (set of training examples). The training data consists of a set of training examples. In supervised learning, each example is a pair consisting of an input object and a desired output value. A supervised learning algorithm analyzes the training data and produces an inferred function.

In order to solve the supervised learning problems, the following steps must be performed:

  1. Determine the type of training examples.
  2. Gather a training set.
  3. Determine the input variables of the learned function.
  4. Determine the structure of the learned function and corresponding learning algorithm.
  5. Complete the design.
  6. Evaluate the accuracy of the learned function.

The supervised methods can be implemented in a variety of domains such as marketing, finance, and manufacturing.

Some of the issues to consider in supervised learning are as follows:

  • Bias-variance trade-off
  • Function complexity and amount of training data
  • Dimensionality of the input space
  • Noise in the output values
  • Heterogeneity of the data
  • Redundancy in the data
  • Presence of interactions and non-linearity

Unsupervised learning studies how systems can learn to represent particular input patterns in a way that reflects the statistical structure of the overall collection of input patterns. Unsupervised learning is important since it is likely to be much more common in the brain than supervised learning. For example, the activities of photoreceptors in the eyes are constantly changing with the visual world. They go on to provide all the information that is available to indicate what objects there are in the world, how they are presented, what the lighting conditions are, and so on. However, essentially none of the information about the contents of scenes is available during learning. This makes unsupervised methods essential, and allows them to be used as computational models for synaptic adaptation.

In unsupervised learning, the machine receives inputs but obtains neither supervised target outputs, nor rewards from its environment. It may seem somewhat mysterious to imagine what the machine could possibly learn given that it doesn't get any feedback from its environment. However, it is possible to develop a formal framework for unsupervised learning, based on the notion that the machine's goal is to build representations of the input that can be used for decision making, predicting future inputs, efficiently communicating the inputs to another machine, and so on. In a sense, unsupervised learning can be thought of as finding patterns in the data above and beyond what would be considered noise.

Some of the goals of unsupervised learning are as follows:

  • Discovering useful structures in large data sets without requiring a target desired output
  • Improving learning speed for inputs
  • Building a model of the data vectors by assigning a score or probability to each possible data vector

Reinforcement learning is the problem of getting an agent to act in the world so as to maximize its rewards. It is about what to do and how to map situations to actions so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them. The two most important distinguishing features of reinforcement learning are trial and error and search and delayed reward. Some examples of reinforcement learning are as follows:

  • A chess player making a move, the choice is informed both by planning anticipating possible replies and counter replies.
  • An adaptive controller adjusts parameters of a petroleum refinery's operation in real time. The controller optimizes the yield/cost/quality trade-off on the basis of specified marginal costs without sticking strictly to the set points originally suggested by engineers.
  • A gazelle calf struggles to its feet minutes after being born. Half an hour later it is running at 20 miles per hour.
  • Teaching a dog a new trick--one cannot tell it what to do, but one can reward/punish it if it does the right/wrong thing. It has to figure out what it did that made it get the reward/punishment, which is known as the credit assignment problem.

Reinforcement learning is like trial and error learning. The agent should discover a good policy from its experiences of the environment without losing too much reward along the way. Exploration is about finding more information about the environment while Exploitation exploits known information to maximize reward. For example:

  • Restaurant selection: Exploitation; go to your favorite restaurant. Exploration; try a new restaurant.
  • Oil drilling:Exploitation; drill at the best-known location. Exploration; drill at a new location.

Major components of reinforcement learning are as follows:

  • Policy: This is the agent's behavior function. It determines the mapping from perceived states of the environment to actions to be taken when in those states. It corresponds to what in psychology would be called a set of stimulus-response rules or associations.
  • Value Function: This is a prediction of future reward. The value of a state is the total amount of reward an agent can expect to accumulate over the future, starting from that state. Whereas rewards determine the immediate, intrinsic desirability of environmental states, values indicate the long-term desirability of states after taking into account the states that are likely to follow, and the rewards available in those states.
  • Model: The model predicts what the environment will do next. It predicts the next state and the immediate reward in the next state.

Structured prediction is an important area of application for machine learning problems in a variety of domains. Considering an input x and an output y in areas such as a labeling of time steps, a collection of attributes for an image, a parsing of a sentence, or a segmentation of an image into objects, problems are challenging because the y's are exponential in the number of output variables that comprise it. These are computationally challenging because prediction requires searching an enormous space, and also statistical considerations, since learning accurate models from limited data requires reasoning about commonalities between distinct structured outputs. Structured prediction is fundamentally a problem of representation, where the representation must capture both the discriminative interactions between x and y and also allow for efficient combinatorial optimization over y.

Structured prediction is about predicting structured outputs from input data in contrast to predicting just a single number, like in classification or regression. For example:

  • Natural language processing--automatic translation (output: sentences) or sentence parsing (output: parse trees)
  • Bioinformatics--secondary structure prediction (output: bipartite graphs) or enzyme function prediction (output: path in a tree)
  • Speech processing--automatic transcription (output: sentences) or text to speech (output: audio signal)
  • Robotics--planning (output: sequence of actions)

Neural networks represent a brain metaphor for information processing. These models are biologically inspired rather than an exact replica of how the brain actually functions. Neural networks have been shown to be very promising systems in many forecasting applications and business classification applications due to their ability to learn from the data.

The artificial neural network learns by updating the network architecture and connection weights so that the network can efficiently perform a task. It can learn either from available training patterns or automatically learn from examples or input-output relations. The learning process is designed by one of the following:

  • Knowing about available information
  • Learning the paradigm--having a model from the environment
  • Learning rules--figuring out the update process of weights
  • Learning the algorithm--identifying a procedure to adjust weights by learning rules

There are four basic types of learning rules:

  • Error correction rules
  • Boltzmann
  • Hebbian
  • Competitive learning

Deep learning refers to a rather wide class of machine learning techniques and architectures, with the hallmark of using many layers of non-linear information processing that are hierarchical in nature. There are broadly three categories of deep learning architecture:

  • Deep networks for unsupervised or generative learning
  • Deep networks for supervised learning
  • Hybrid deep networks