Linear Regression with Sklearn

25 November 2020

import pandas as pd
import seaborn as sns

DATA_PATH = '../input/iris-flower-dataset/IRIS.csv'

df = pd.read_csv(DATA_PATH)

df.head(1)

	sepal_length	sepal_width	petal_length	petal_width	species
0	5.1	3.5	1.4	0.2	Iris-setosa

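A linear regression model fits a linear function to the given data set: a straight line when there is one input feature, or a plane when there are two, as in this example. To train a linear regression model with sklearn, you first need to split the data into x (feature) and y (target) components: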

x_labels = ['sepal_length', 'petal_width']
y_labels = ['petal_length']

np_x = df[x_labels].to_numpy()
# Select the single target column as a 1-D array
np_y = df[y_labels[0]].to_numpy()
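To make the expected shapes concrete, here is a minimal sketch using a tiny stand-in data frame. The name `toy_df` and its three rows are made up for illustration (only the first row matches the output shown above); the point is that x ends up two-dimensional, with one column per feature, while y is a flat one-dimensional array.

```python
import pandas as pd

# Hypothetical stand-in for the Iris data frame; values are illustrative only.
toy_df = pd.DataFrame({
    'sepal_length': [5.1, 6.0, 6.9],
    'petal_width': [0.2, 1.3, 2.1],
    'petal_length': [1.4, 4.2, 5.6],
})

# Selecting with a list of columns gives a 2-D array: (n_samples, n_features)
toy_x = toy_df[['sepal_length', 'petal_width']].to_numpy()

# Selecting a single column by name gives a 1-D array: (n_samples,)
toy_y = toy_df['petal_length'].to_numpy()

print(toy_x.shape)  # (3, 2)
print(toy_y.shape)  # (3,)
```

sklearn estimators expect x in this two-dimensional layout even when there is only one feature.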

Next, we need to split the data into training and testing sets:

from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(np_x, np_y)
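Called with no extra arguments, train_test_split shuffles the rows and holds out 25% of them for testing, so the split (and any score computed from it) changes from run to run. If you want a reproducible split, a sketch like the following may help. The arrays `demo_x` and `demo_y` are placeholder data, and the values chosen for `test_size` and `random_state` are arbitrary.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 100 samples with 2 features each
demo_x = np.arange(200).reshape(100, 2)
demo_y = np.arange(100)

# Hold out 20% for testing; a fixed random_state makes the shuffle repeatable
x_tr, x_te, y_tr, y_te = train_test_split(
    demo_x, demo_y, test_size=0.2, random_state=42
)

print(x_tr.shape, x_te.shape)  # (80, 2) (20, 2)
```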

Once we've got our data split into train and test sets, we can train our model on the training data using the fit function of the LinearRegression model:

from sklearn.linear_model import LinearRegression

model = LinearRegression().fit(x_train, y_train)
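After fitting, the learned line (or plane) is available through the model's coef_ and intercept_ attributes. The sketch below uses synthetic, noise-free data generated from a known formula rather than the Iris data, so we can check that the fit recovers the weights we put in. The names `synth_x`, `synth_y` and `synth_model` are only for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data generated from y = 2*x1 + 3*x2 + 1, with no noise
rng = np.random.default_rng(0)
synth_x = rng.normal(size=(50, 2))
synth_y = 2.0 * synth_x[:, 0] + 3.0 * synth_x[:, 1] + 1.0

synth_model = LinearRegression().fit(synth_x, synth_y)

# coef_ holds one weight per feature, intercept_ is the constant term
print(np.allclose(synth_model.coef_, [2.0, 3.0]))  # True
print(np.isclose(synth_model.intercept_, 1.0))     # True
```

On real data such as the Iris measurements the fit will not be exact, but the same two attributes describe the fitted model.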

Lastly, we can use model.score on the held-out test data to get the $R^2$ value for the model:

model.score(x_test, y_test)

0.931396085809373
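For a regressor, the value returned by score is the coefficient of determination, $R^2 = 1 - SS_{res} / SS_{tot}$, where $SS_{res}$ is the sum of squared prediction errors and $SS_{tot}$ is the sum of squared deviations of y from its mean. A value of 1 means perfect predictions. As a rough check, the sketch below computes it by hand on synthetic data (not the Iris data) and compares the result with model.score. All names in it are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic noisy data, used only to illustrate the calculation
rng = np.random.default_rng(0)
r2_x = rng.normal(size=(100, 2))
r2_y = 2.0 * r2_x[:, 0] - 1.0 * r2_x[:, 1] + rng.normal(scale=0.5, size=100)

r2_model = LinearRegression().fit(r2_x, r2_y)
pred = r2_model.predict(r2_x)

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum((r2_y - pred) ** 2)
ss_tot = np.sum((r2_y - r2_y.mean()) ** 2)
manual_r2 = 1 - ss_res / ss_tot

print(np.isclose(manual_r2, r2_model.score(r2_x, r2_y)))  # True
```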

You can find out more about how the LinearRegression model works in the Sklearn docs, and you can find an interactive version of this notebook on Kaggle.