1. Decision tree
A decision tree classifies data according to its features. Each node asks a question about one feature and, based on the answer, splits the data into two groups; the next node then asks another question, and so on. These questions are learned from existing (training) data, so when new data is input, it is routed down the tree by answering the questions until it reaches the appropriate leaf.
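A minimal from-scratch sketch of how such questions can be learned and then used. It assumes numeric features, questions of the form "is feature f <= t?", and Gini impurity as the measure for choosing the best question (a CART-style choice; the text does not name a criterion). All function names and the toy data are hypothetical, and leaf labels are assumed not to be tuples.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0 for a pure node, larger when classes are mixed."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_question(X, y):
    """Find the question 'is x[f] <= t?' whose two answer groups have the lowest
    weighted Gini impurity; return None if no question improves on the parent."""
    best, best_score = None, gini(y)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(X, y):
    """Keep asking questions until a node cannot be improved; leaves hold a class."""
    question = best_question(X, y)
    if question is None:
        return Counter(y).most_common(1)[0][0]  # leaf: majority class
    f, t = question
    yes = [i for i, row in enumerate(X) if row[f] <= t]
    no = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            build_tree([X[i] for i in yes], [y[i] for i in yes]),
            build_tree([X[i] for i in no], [y[i] for i in no]))

def predict(tree, x):
    """Route a new sample down the tree until it reaches a leaf."""
    while isinstance(tree, tuple):
        f, t, yes_branch, no_branch = tree
        tree = yes_branch if x[f] <= t else no_branch
    return tree

# Hypothetical toy data: two numeric features per animal.
X = [[1, 1], [2, 1], [3, 2], [6, 5], [7, 7], [8, 6]]
y = ["cat", "cat", "cat", "dog", "dog", "dog"]
tree = build_tree(X, y)
print(tree)                   # (0, 3, 'cat', 'dog')
print(predict(tree, [2, 2]))  # cat
```

Each internal node is stored as a (feature, threshold, yes_branch, no_branch) tuple, so "asking the questions" for new data is just a loop from the root to a leaf.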
2. Random Forest
Random forest is a type of ensemble learning: it combines many decision trees and lets them vote to determine the final classification result. Ensemble learning solves a single prediction problem by building a combination of several models. Its basic principle is to generate multiple classifiers/models, each of which learns and makes predictions independently. These predictions are then combined into a single prediction, which usually performs better than any individual model.
How a random forest is constructed: assume that N is the number of training cases (samples) and M is the number of features. The forest is then built as follows:
Choose m, the number of input features used to determine the decision at each node of a tree; m should be much smaller than M.
For each tree, sample N times with replacement from the N training cases to form its training set (a bootstrap sample), and use the cases that were not sampled (the out-of-bag samples) to make predictions and estimate the tree's error.
At each node, randomly select m features and base the node's decision only on them: calculate the best split using these m features.
Each tree is grown fully, without pruning (pruning is typically applied when building an ordinary single decision tree classifier).
Repeat the steps above to build further trees until a predetermined number of trees is reached; the resulting collection of trees is the random forest. The number of features preselected at each node (m) and the number of trees in the forest are the key parameters for tuning the system: choosing them carefully has a major effect on the accuracy and efficiency of the random forest model.
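The steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: numeric features, Gini impurity as the splitting criterion, simple majority voting, and a small Gini-based tree builder repeated here so the block is self-contained. `n_trees`, `m`, `seed`, the function names, and the toy data are all hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def grow_tree(X, y, m, rng):
    """Grow an unpruned tree; every node looks at only m randomly chosen features."""
    best, best_score = None, gini(y)
    for f in rng.sample(range(len(X[0])), m):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if left and right:
                score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
                if score < best_score:
                    best, best_score = (f, t), score
    if best is None:  # pure node, or no improving split among the m features
        return Counter(y).most_common(1)[0][0]
    f, t = best
    yes = [i for i, row in enumerate(X) if row[f] <= t]
    no = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            grow_tree([X[i] for i in yes], [y[i] for i in yes], m, rng),
            grow_tree([X[i] for i in no], [y[i] for i in no], m, rng))

def predict_tree(tree, x):
    while isinstance(tree, tuple):
        f, t, yes_branch, no_branch = tree
        tree = yes_branch if x[f] <= t else no_branch
    return tree

def build_forest(X, y, n_trees=25, m=1, seed=0):
    """Return the trees and the out-of-bag (OOB) error estimate."""
    rng = random.Random(seed)
    N = len(X)
    forest, oob_votes = [], [[] for _ in range(N)]
    for _ in range(n_trees):
        idx = [rng.randrange(N) for _ in range(N)]  # N draws with replacement
        tree = grow_tree([X[i] for i in idx], [y[i] for i in idx], m, rng)
        forest.append(tree)
        for i in set(range(N)) - set(idx):  # samples this tree never saw
            oob_votes[i].append(predict_tree(tree, X[i]))
    scored = [(Counter(v).most_common(1)[0][0], y[i])
              for i, v in enumerate(oob_votes) if v]
    oob_error = sum(p != t for p, t in scored) / len(scored) if scored else None
    return forest, oob_error

def predict_forest(forest, x):
    """Each tree votes; the most common class wins."""
    return Counter(predict_tree(tree, x) for tree in forest).most_common(1)[0][0]

# Hypothetical toy data: M = 2 features, so m = 1 feature is tried per node.
X = [[1, 1], [2, 1], [3, 2], [6, 5], [7, 7], [8, 6]]
y = ["cat", "cat", "cat", "dog", "dog", "dog"]
forest, oob_error = build_forest(X, y, n_trees=25, m=1, seed=0)
print(predict_forest(forest, [1, 1]), oob_error)
```

Because each tree sees a different bootstrap sample and different features at each node, the trees make different mistakes, and the vote smooths many of them out.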
3. Logistic regression
Logistic regression is a member of the family of supervised classification algorithms. It models the relationship between a categorical dependent variable and one or more independent variables by estimating probabilities with the logistic function.
Logistic regression is similar to linear regression, but its outcome can take only two values: whereas linear regression predicts an open-ended numeric value, logistic regression answers a yes-or-no question.
The output (Y value) of the logistic function ranges from 0 to 1 and is interpreted as a probability. The logistic function is S-shaped, and thresholding it (typically at 0.5) divides the graph into two regions, which makes it suitable for classification tasks.
For example, the logistic regression graph above shows the relationship between study time and the probability of passing a test, and can be used to predict whether a student will pass.
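A minimal sketch of fitting the logistic function sigmoid(z) = 1 / (1 + e^(-z)) to a single feature by batch gradient descent on the log-loss. The study-time data below is invented for illustration (it is not the data behind the graph), and the learning rate and epoch count are arbitrary choices.

```python
import math

def sigmoid(z):
    """Logistic function: an S-shaped curve mapping any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit P(y = 1 | x) = sigmoid(w * x + b) by batch gradient descent on the log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y  # d(log-loss)/d(w*x + b) for one sample
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

# Hypothetical data: hours studied, and whether the test was passed (1) or not (0).
hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
passed = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
w, b = fit_logistic(hours, passed)
p = sigmoid(w * 4.0 + b)
print(f"P(pass | 4 hours) = {p:.2f} -> {'pass' if p >= 0.5 else 'fail'}")
```

The fitted curve crosses 0.5 where w * x + b = 0, which is the boundary between the "fail" and "pass" regions.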
4. Linear regression
Linear regression is a statistical analysis method that uses regression analysis to determine the quantitative relationship between two or more interdependent variables.
Linear regression is probably the most popular machine learning algorithm. It models the relationship between the independent variables (x values) and a numerical outcome (y values) by fitting a straight-line equation to the data. This line can then be used to predict future values.
The most commonly used technique for fitting the line is the least squares method. It finds the best-fit line by minimizing the sum of the squared vertical distances (the green lines) between each data point and the line. In other words, the model is fitted by minimizing this squared error.
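For a single input variable the least-squares line has a closed form: slope = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean)^2) and intercept = y_mean - slope * x_mean. A minimal sketch of that formula; the function name and data points are hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the line minimising the sum of squared vertical
    distances sum((y - (slope * x + intercept)) ** 2)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept

# Hypothetical points lying exactly on y = 2x + 1.
slope, intercept = least_squares_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)       # 2.0 1.0
print(slope * 5 + intercept)  # 11.0, the predicted y for x = 5
```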
5. Naive Bayes
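Naive Bayes is based on Bayes' theorem, which relates the conditional probabilities of two events. For each class, it combines the prior probability of the class with the conditional probability of the observed feature values x given that class, "naively" assuming that the features are independent of each other within a class, and then picks the most probable class. The algorithm is used for classification problems, for example to produce a binary "yes/no" result, although it also handles more than two classes.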
The Naive Bayes classifier is a popular statistical technique whose classic application is spam filtering.
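A minimal sketch of a word-count (multinomial) Naive Bayes spam filter, assuming whitespace tokenization, add-one (Laplace) smoothing so unseen words do not zero out a class, and log probabilities to avoid numeric underflow. The tiny corpus and the "spam"/"ham" labels are invented for illustration.

```python
import math
from collections import Counter

def train_nb(docs, labels):
    """Count how often each class occurs and how often each word occurs per class."""
    priors = Counter(labels)
    word_counts = {c: Counter() for c in priors}
    for doc, c in zip(docs, labels):
        word_counts[c].update(doc.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return priors, word_counts, vocab

def classify_nb(model, doc):
    """Return the class with the highest log P(class) + sum of log P(word | class)."""
    priors, word_counts, vocab = model
    total = sum(priors.values())
    best_class, best_score = None, -math.inf
    for c in priors:
        n_words = sum(word_counts[c].values())
        score = math.log(priors[c] / total)
        for w in doc.lower().split():
            # Add-one smoothing: every word gets a small nonzero probability.
            score += math.log((word_counts[c][w] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_class, best_score = c, score
    return best_class

docs = ["win money now", "cheap money offer", "win a free prize",
        "meeting at noon", "project status meeting", "lunch at noon tomorrow"]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]
model = train_nb(docs, labels)
print(classify_nb(model, "free money"))       # spam
print(classify_nb(model, "meeting at noon"))  # ham
```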
6. Neural Networks
A neural network classifies an input into one of two or more categories. It consists of several layers of neurons and the connections between them: the first layer is the input layer, the last layer is the output layer, and the layers in between are hidden layers. Every node in the hidden layers and the output layer has its own weights and bias, so each acts like a small classifier.
The input is fed into the network and activates the first layer; the computed scores are passed to the next layer, which is activated in turn, and so on. Finally, the scores on the nodes of the output layer represent how strongly the input belongs to each category. The example in the figure below obtains class 1 as the classification result. The same input is transmitted to different nodes, and each node produces a different result because it has its own weights and bias. This process is called forward propagation.
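A minimal sketch of forward propagation through one hidden layer. The ReLU hidden activation, the softmax output, the layer sizes, and all weights are hypothetical choices for illustration; the text does not specify them, and the network is not the one in the figure.

```python
import math

def dense(inputs, weights, biases):
    """Each node computes its own weighted sum of the inputs plus its own bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def relu(scores):
    """Activation: negative scores are set to zero."""
    return [max(0.0, s) for s in scores]

def softmax(scores):
    """Turn output-layer scores into class probabilities that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def forward(x, layers):
    """Forward propagation: activate each hidden layer and pass its scores on."""
    for weights, biases in layers[:-1]:
        x = relu(dense(x, weights, biases))
    weights, biases = layers[-1]
    return softmax(dense(x, weights, biases))

# Hypothetical weights: 2 inputs -> 3 hidden nodes -> 2 output classes.
layers = [
    ([[0.5, -0.2], [0.3, 0.8], [-0.6, 0.1]], [0.1, 0.0, 0.2]),
    ([[1.0, -0.5, 0.3], [-0.4, 0.9, 0.2]], [0.0, 0.1]),
]
probs = forward([1.0, 2.0], layers)
print(probs, "-> class", probs.index(max(probs)))  # predicted class index: 1
```

Although every hidden node receives the same input, their outputs (0.2, 1.9 and 0 here) differ because each node has its own row of weights and its own bias.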
7. K-means clustering
K-means clusters a dataset by grouping similar data points together. For example, it can be used to group users based on their purchase history. It finds K clusters in the dataset. K-means is an unsupervised learning algorithm, so we only need the training data X and the number of clusters K that we want to identify.
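For example, suppose a set of values is to be divided into three clusters (in the figure, pink marks large values and yellow marks small ones). First initialize by choosing the simplest values, 3, 2, and 1, as the initial centers of the three clusters. Then compute the distance from each remaining data point to the three centers and assign the point to the cluster whose center is closest. Each center is then moved to the mean of its assigned points, and the assignment step is repeated until the clusters stop changing.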
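A minimal sketch of the procedure just described (Lloyd's algorithm), assuming Euclidean distance and caller-supplied initial centers, so K is simply the number of centers passed in. The one-dimensional data points are hypothetical; the starting centers 3, 2, and 1 follow the example.

```python
import math

def kmeans(points, centers, max_iter=100):
    """Assign every point to its nearest center, move each center to the mean of its
    points, and repeat until the assignments stop changing."""
    centers = [list(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
                      for p in points]
        if new_labels == labels:  # converged: no point changed cluster
            break
        labels = new_labels
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old center if a cluster ends up empty
                centers[k] = [sum(coords) / len(members) for coords in zip(*members)]
    return labels, centers

# Hypothetical one-dimensional data, started from the centers 3, 2 and 1.
points = [[1.0], [1.2], [2.0], [2.2], [5.0], [5.4]]
labels, centers = kmeans(points, [[3.0], [2.0], [1.0]])
print(labels)  # [2, 2, 1, 1, 0, 0]
```

The result depends on the starting centers: with poorly chosen initial values K-means can settle into a worse grouping, which is why practical implementations usually try several initializations.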
K-means is widely used in fraud detection, for example in detecting medical insurance fraud and other insurance fraud.
8. Support Vector Machines
To separate two classes, SVM looks for a hyperplane. The optimal hyperplane is the one that maximizes the margin between the two classes, where the margin is the distance between the hyperplane and the data points closest to it.
SVM is a supervised algorithm for classification problems. It tries to draw two parallel lines between the classes with the largest possible margin between them. To do this, we plot the data items as points in an n-dimensional space, where n is the number of input features. The SVM then finds an optimal boundary, called a hyperplane, which best separates the possible outputs by their class labels.
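A minimal sketch of a linear soft-margin SVM, assuming training by stochastic subgradient descent on the regularized hinge loss (one common training method; the text does not specify one). In this formulation the two lines are w·x + b = +1 and w·x + b = -1, the hyperplane is w·x + b = 0, and the margin width is 2/||w||. The hyperparameters `lam`, `lr`, and `epochs`, the +1/-1 label convention, and the toy data are assumptions.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.01, epochs=2000):
    """Stochastic subgradient descent on the soft-margin objective
    lam/2 * ||w||^2 + mean(max(0, 1 - y_i * (w . x_i + b))).
    Labels must be +1 or -1."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if t * score < 1:
                # Inside the margin or misclassified: the hinge term pushes the
                # boundary away from this point, and the regulariser shrinks w.
                w = [wi - lr * (lam * wi - t * xi) for wi, xi in zip(w, x)]
                b += lr * t
            else:
                # Safely outside the margin: only the regulariser acts.
                w = [wi * (1 - lr * lam) for wi in w]
    return w, b

def predict_svm(w, b, x):
    """Which side of the hyperplane w . x + b = 0 the point falls on."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Hypothetical, linearly separable toy data in n = 2 dimensions.
X = [[1, 1], [2, 1], [1, 2], [5, 5], [6, 5], [5, 6]]
y = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(X, y)
print(w, b, predict_svm(w, b, [7, 7]))
```

Only points that fall inside the margin (the future support vectors) keep triggering hinge updates, which is why the final boundary is determined by the points closest to it.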
9. K-nearest neighbor algorithm
Given a new data point, look at the k training points closest to it: the new point is assigned to whichever category is most common among those k neighbors.
Example: to distinguish between "cats" and "dogs" by judging two features, "claws" and "sound", suppose the classes of the circles and triangles are already known. Which category does the "star" belong to?
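A minimal sketch of this voting rule, assuming Euclidean distance and k = 3. The numeric "claws" and "sound" scores are hypothetical stand-ins for the known points and the star.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Assign x to the most common class among its k nearest training points."""
    # Sort training indices by Euclidean distance to x and keep the k closest.
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical feature values: [claws, sound] on an arbitrary scale.
train_X = [[1, 1], [1, 2], [2, 1], [6, 6], [7, 6], [6, 7]]
train_y = ["cat", "cat", "cat", "dog", "dog", "dog"]
star = [2, 2]
print(knn_predict(train_X, train_y, star))  # cat
```

An odd k avoids tied votes between two classes; there is no training step, since all the work happens when a new point is classified.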
10. Dimensionality reduction
Some datasets have a very large number of features, which makes them hard to model and visualize. Dimensionality reduction attempts to solve this problem by combining specific features into higher-level features without losing the most important information. Principal component analysis (PCA) is the most popular dimensionality reduction technique.
PCA reduces the dimensionality of a dataset by projecting it onto a lower-dimensional line, plane, or subspace chosen to preserve as much of the data's variance (its salient features) as possible.
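A minimal sketch of PCA down to one dimension, assuming the top principal direction is found by power iteration on the sample covariance matrix (libraries typically use an SVD instead). The data is invented to lie close to the line y = 2x, so the recovered direction should be close to (1, 2)/sqrt(5) up to sign. The sketch also assumes the data is not constant and that the starting vector is not orthogonal to the top direction.

```python
import math

def pca_first_component(X, iters=200):
    """Return the mean and the top principal direction of X (rows are samples)."""
    n, d = len(X), len(X[0])
    mean = [sum(col) / n for col in zip(*X)]
    centered = [[v - m for v, m in zip(row, mean)] for row in X]
    # Sample covariance matrix C = Xc^T Xc / (n - 1).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeatedly multiply by C and normalise; this converges to
    # the eigenvector with the largest eigenvalue (the direction of most variance).
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v

def project(X, mean, v):
    """Compress each sample to a single coordinate along the principal direction."""
    return [sum((a - m) * b for a, m, b in zip(row, mean, v)) for row in X]

# Hypothetical 2-D data lying close to the line y = 2x.
X = [[1, 2.1], [2, 3.9], [3, 6.2], [4, 7.8], [5, 10.1]]
mean, v = pca_first_component(X)
print(v, project(X, mean, v))
```

Each two-dimensional point is replaced by one number, its position along the fitted line, while most of the spread in the data is kept.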