Linear Regression with Sklearn
25 November 2020
import pandas as pd
import seaborn as sns
DATA_PATH = '../input/iris-flower-dataset/IRIS.csv'
df = pd.read_csv(DATA_PATH)
df.head(1)
|   | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
A linear regression model fits a linear function to the given data set (a straight line when there is a single input feature). To train a linear regression model with sklearn, you first need to split the data into x (input features) and y (target) components:
x_labels = ['sepal_length', 'petal_width']
y_labels = ['petal_length']
np_x = df[x_labels].to_numpy()
np_y = df[y_labels].transpose().to_numpy()[0]
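To make the resulting shapes concrete, here is a minimal sketch on a tiny stand-in frame. The names toy_df, toy_x, toy_y and toy_y_alt are hypothetical, and apart from the first row the values are made up for illustration. It also shows that selecting the target column by name should give the same 1-D array as the transpose approach above.

```python
import pandas as pd

# Tiny stand-in frame with the same column names as the iris CSV.
# Only the first row comes from the notebook output; the rest is illustrative.
toy_df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 6.3],
    "petal_width": [0.2, 0.2, 1.8],
    "petal_length": [1.4, 1.4, 5.6],
})

x_labels = ["sepal_length", "petal_width"]
y_labels = ["petal_length"]

# Features: 2-D array of shape (n_samples, n_features)
toy_x = toy_df[x_labels].to_numpy()

# Target: 1-D array of shape (n_samples,), built as in the notebook
toy_y = toy_df[y_labels].transpose().to_numpy()[0]

# Alternative: select the single column as a Series, then convert
toy_y_alt = toy_df["petal_length"].to_numpy()

print(toy_x.shape, toy_y.shape)  # (3, 2) (3,)
```

sklearn estimators expect a 2-D feature array and, for a single target, a 1-D target array, which is why the target is flattened here.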
Next, we need to split our data into training and testing sets:
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(np_x, np_y)
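Called without extra arguments, train_test_split shuffles the rows randomly, so the split (and any score computed from it) changes between runs. A minimal sketch of a reproducible split, assuming you want to fix the seed and spell out the test fraction; demo_x, demo_y and the chosen values are hypothetical and not part of the original notebook:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Illustrative stand-in arrays: 8 samples with 2 features each
demo_x = np.arange(16).reshape(8, 2)
demo_y = np.arange(8)

# test_size=0.25 matches sklearn's default fraction; random_state fixes the shuffle
x_tr, x_te, y_tr, y_te = train_test_split(
    demo_x, demo_y, test_size=0.25, random_state=0
)

# Repeating the call with the same random_state gives the same split
x_tr2, x_te2, y_tr2, y_te2 = train_test_split(
    demo_x, demo_y, test_size=0.25, random_state=0
)

print(x_tr.shape, x_te.shape)  # (6, 2) (2, 2)
```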
Once we've got our data split into test and train sets, we can train our model on the train data using the fit function of the LinearRegression model:
from sklearn.linear_model import LinearRegression
model = LinearRegression().fit(x_train, y_train)
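After fitting, the learned parameters are available on the model as coef_ (one weight per feature) and intercept_. A small sketch on made-up data that follows y = 2*x1 + 3*x2 + 1 exactly, so the recovered values can be checked by eye; demo_x, demo_y and demo_model are hypothetical names:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data generated from y = 2*x1 + 3*x2 + 1 with no noise
demo_x = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]], dtype=float)
demo_y = 2 * demo_x[:, 0] + 3 * demo_x[:, 1] + 1

demo_model = LinearRegression().fit(demo_x, demo_y)

# One coefficient per feature column, plus a single intercept
print(demo_model.coef_)       # approximately [2. 3.]
print(demo_model.intercept_)  # approximately 1.0

# A prediction is the dot product of features and coefficients plus the intercept
manual_pred = demo_x @ demo_model.coef_ + demo_model.intercept_
```

On the iris model above, the same attributes would hold one weight each for sepal_length and petal_width.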
Lastly, we can use model.score to get the R² score (coefficient of determination) of the model on the test data:
model.score(x_test, y_test)
0.931396085809373
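For a regressor, score returns R², defined as 1 minus the ratio of the residual sum of squares to the total sum of squares. A value of 1.0 is a perfect fit, and 0.0 is no better than always predicting the mean. The sketch below recomputes it by hand on synthetic data to show the two agree; the data, the seed and all demo_ names are assumptions for illustration only:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: a linear signal plus a little Gaussian noise
rng = np.random.default_rng(0)
demo_x = rng.normal(size=(50, 2))
demo_y = 2.0 * demo_x[:, 0] - 1.0 * demo_x[:, 1] + rng.normal(scale=0.1, size=50)

# Fit on the first 40 rows, evaluate on the remaining 10
demo_model = LinearRegression().fit(demo_x[:40], demo_y[:40])
x_eval, y_eval = demo_x[40:], demo_y[40:]

# R^2 = 1 - SS_res / SS_tot
pred = demo_model.predict(x_eval)
ss_res = np.sum((y_eval - pred) ** 2)           # residual sum of squares
ss_tot = np.sum((y_eval - y_eval.mean()) ** 2)  # total sum of squares
r2_manual = 1.0 - ss_res / ss_tot

# The built-in score computes the same quantity
r2_sklearn = demo_model.score(x_eval, y_eval)
```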
You can find out more about how the LinearRegression model works in the Sklearn Docs, and you can find an interactive version of this notebook on Kaggle.