Machine Learning #5 — Linear Classifiers, Logistic Regression, Regularization

Göker Güner

Hello,
This article is the fifth in my Machine Learning series, which I write to reinforce and share what I have learned. The fourth article of the series, published in the Software Development Türkiye Medium publication, can be found here: link

For the Turkish version of this article: link

In this fifth article of the Machine Learning series, we will:

  • get to know linear classifiers,
  • examine logistic regression and related concepts through examples.

What is Logistic Regression?

In the third article of our series, we covered Linear Regression and then used that method to build a house price estimator. Despite the similar names, the concepts of Linear Regression and Logistic Regression should not be confused with each other.

Logistic Regression is a Linear Classifier: a statistical method used to analyze a dataset in which one or more independent variables determine a class. The outcome is measured with a binary variable (there are only two possible outcomes): an instance either belongs to the class we are looking for, or it does not (it belongs to the other class).

For example, let’s return to our house price estimator. We had a feature set, some of whose features were discrete (for example, the number of rooms in the house) and some continuous (for example, the floor area in square meters). Based on these features, we were trying to predict the house price, which is also a continuous variable.

In Logistic Regression applications, the feature set still contains continuous values. However, our goal is to predict a class instead of a continuous variable, and we use a threshold when determining this class. For example, if the price of a house is below $50,000, it is cheap (we can represent this with y=0); if it is above $50,000, it is expensive (y=1).
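As a minimal sketch of this thresholding idea (the prices below are made up for illustration):

```python
import numpy as np

# Hypothetical predicted house prices in dollars (made-up values)
prices = np.array([35_000, 48_000, 52_000, 90_000])

# Binarize with the $50,000 threshold: 0 = cheap, 1 = expensive
labels = (prices > 50_000).astype(int)
print(labels)  # [0 0 1 1]
```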

We can start by importing the libraries we will use.
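The original notebook’s imports are not reproduced here; a plausible set for the steps that follow would be:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_curve, auc
```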

Decision Boundary, Linear Classifier, Linear Separability

Before continuing with the theoretical explanation, let’s see what a Linear Classifier looks like visually and how it works.
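The figure from the original post is not reproduced here; the sketch below builds a comparable toy example, assuming a synthetic two-class dataset from make_blobs rather than the author’s data:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

# Synthetic two-class dataset standing in for the figure's data
X, y = make_blobs(n_samples=200, centers=2, random_state=42)
clf = LogisticRegression().fit(X, y)

# The decision boundary is the line w0*x0 + w1*x1 + b = 0
w, b = clf.coef_[0], clf.intercept_[0]
xs = np.linspace(X[:, 0].min(), X[:, 0].max(), 100)
plt.scatter(X[:, 0], X[:, 1], c=y)
plt.plot(xs, -(w[0] * xs + b) / w[1], "k--")
plt.show()
```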

Although a small number of samples were predicted incorrectly, our classifier was able to divide this dataset into two classes with a straight line. This means that our classifier is a Linear Classifier and our dataset is linearly separable.

Logistic Regression Application

For practice, we will take a telecom company’s customer dataset and predict whether each customer churns or not.

churn: It means that the customer is no longer a customer of that company.

Dataset source: https://archive.ics.uci.edu/ml/datasets/Iranian+Churn+Dataset

After assigning our dataset to a variable, we continue, as usual, with exploratory data analysis.
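A typical way to load the file and take a first look (the file name here is hypothetical; adjust it to wherever you saved the UCI dataset):

```python
import pandas as pd

df = pd.read_csv("customer_churn.csv")  # hypothetical file name

df.head()       # first rows of the dataset
df.info()       # column types and missing values
df.describe()   # summary statistics (min, max, mean, ...) per feature
```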

Our dataset includes features such as how long a telecom customer has been subscribed, how frequently they send SMS, how many calls they make, and how large their bill is. The Churn column here is our target variable. We will ask our model to predict this variable as y=0 (still a customer of the company) or y=1 (no longer a customer).
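Assuming the target column is literally named "Churn", the feature/target split might look like this (the split ratio and random_state value are illustrative):

```python
from sklearn.model_selection import train_test_split

X = df.drop("Churn", axis=1)  # feature matrix
y = df["Churn"]               # target: 0 = still a customer, 1 = churned

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```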

Scaling should not be skipped in the preprocessing step, because when we compare feature statistics such as the maximum and the mean, we see that they can differ considerably from feature to feature.
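A standard way to do this is scikit-learn’s StandardScaler; note that it is fit on the training set only:

```python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
# Fit on the training data only, then apply the same transform to the test data
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```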

With grid search, we found that the optimal value of the C parameter is 0.1. So how does our model arrive at this? Let’s visualize it.
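The exact grid the author used is not shown; a sketch with six candidate C values (the article mentions six trials) could be:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import LogisticRegression

# Six assumed candidate values for C; penalty and random_state held fixed
param_grid = {"C": [0.001, 0.01, 0.1, 1, 10, 100]}
grid = GridSearchCV(
    LogisticRegression(penalty="l2", random_state=42, max_iter=1000),
    param_grid,
    cv=5,
)
grid.fit(X_train_scaled, y_train)
print(grid.best_params_)  # the article reports C=0.1 as optimal
```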

When we plot both the train and test errors against the value of the C parameter, we see that the first and largest drop occurs at 0.1; after that, the curves begin to rise again. Therefore, we can say that the optimal value of the parameter C is 0.1.
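A sketch of how such a plot can be produced, using the same assumed grid of C values as above:

```python
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression

C_values = [0.001, 0.01, 0.1, 1, 10, 100]
train_err, test_err = [], []
for C in C_values:
    model = LogisticRegression(C=C, random_state=42, max_iter=1000)
    model.fit(X_train_scaled, y_train)
    train_err.append(1 - model.score(X_train_scaled, y_train))  # train error
    test_err.append(1 - model.score(X_test_scaled, y_test))     # test error

plt.semilogx(C_values, train_err, label="train error")
plt.semilogx(C_values, test_err, label="test error")
plt.xlabel("C")
plt.ylabel("error")
plt.legend()
plt.show()
```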

While searching for the most suitable parameters, we fixed the ‘random_state’ and ‘penalty’ values ourselves. For the ‘C’ value, however, we made a total of 6 trials and found the most appropriate value to be 0.1. So what is this C parameter?

Regularization Effect in Logistic Regression

In scikit-learn, the C parameter is the inverse of the regularization strength: a bigger C means less regularization, and a smaller C means more regularization.
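This is easy to verify directly by fitting the same model with a small and a large C and comparing coefficient magnitudes (a sketch, reusing the scaled training data from above):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

for C in (0.01, 100):
    model = LogisticRegression(C=C, max_iter=1000).fit(X_train_scaled, y_train)
    # Smaller C (stronger regularization) pushes coefficients toward zero
    print(f"C={C}: largest |coefficient| = {np.abs(model.coef_).max():.3f}")
```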

In previous articles, we mentioned that regularization is “a penalty function for large coefficients”. By examining the l1 and l2 regularizations separately, we observed how they affect feature selection.

Here, too, we compared the training errors for two different C values. We found that as regularization increases (i.e., for a smaller C), training accuracy decreases. Applying the penalty too aggressively shrinks some important coefficients below their useful size and renders them less effective, which reduces training accuracy.

When we examine this effect on the test data, we see that stronger regularization increases test accuracy. This is because, as mentioned earlier, the coefficients of uninformative features can be driven to zero, which improves generalization.

note: with the ‘l1’ penalty, if you leave the solver at its default, you will get an error.
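Concretely, scikit-learn’s default solver does not support the ‘l1’ penalty, so a compatible solver such as ‘liblinear’ (or ‘saga’) has to be requested explicitly:

```python
from sklearn.linear_model import LogisticRegression

# penalty='l1' with the default solver raises an error;
# 'liblinear' (or 'saga') supports l1 regularization
l1_model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
l1_model.fit(X_train_scaled, y_train)
```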

When we use l1 regularization, we see that the coefficients of many features that are not useful to us are set to zero. This increases test accuracy.

penalty is ‘l2’ by default.

l2 shrinks the coefficients, as we would expect from regularization.
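The difference is easy to see by counting how many coefficients each penalty drives exactly to zero (a sketch, reusing the scaled data from above):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

for penalty in ("l1", "l2"):
    model = LogisticRegression(penalty=penalty, solver="liblinear", C=0.1)
    model.fit(X_train_scaled, y_train)
    n_zero = int(np.sum(model.coef_ == 0))
    # l1 typically zeroes out many coefficients; l2 only shrinks them
    print(f"{penalty}: {n_zero} coefficients are exactly zero")
```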

With this regularization, we saw that the model accuracy was 89.2%. Finally, let’s plot the ROC curve.

A ROC (Receiver Operating Characteristic) curve is another way to view model performance. It plots the rate of correctly predicted positives (True Positive Rate) against the rate of falsely predicted positives (False Positive Rate). Ideally, we would like to see a curve that approaches the upper left corner.
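A sketch of plotting the ROC curve with scikit-learn, assuming the l1-regularized model from above:

```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc

# Predicted probability of the positive class (churn)
y_scores = l1_model.predict_proba(X_test_scaled)[:, 1]

fpr, tpr, _ = roc_curve(y_test, y_scores)
plt.plot(fpr, tpr, label=f"AUC = {auc(fpr, tpr):.3f}")
plt.plot([0, 1], [0, 1], "k--")  # chance line
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.legend()
plt.show()
```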

For our example, the rate of customers correctly predicted as churn is on the vertical axis, and the rate of customers predicted as churn although they did not churn is on the horizontal axis. Note that the AUC (Area Under the Curve) value, which represents the area under the graph, happens to be the same as the accuracy here.

See you in our next articles.

