# Linear Regression with Sklearn

25 November 2020

```
import pandas as pd
import seaborn as sns
DATA_PATH = '../input/iris-flower-dataset/IRIS.csv'
df = pd.read_csv(DATA_PATH)
df.head(1)
```

| | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |

A linear regression model tries to fit a straight line to the given data set (or a plane, when there is more than one input feature, as here). To train a linear regression model with `sklearn`, you first need to split the data into `x` (feature) and `y` (target) components:

```
x_labels = ['sepal_length', 'petal_width']
y_labels = ['petal_length']
np_x = df[x_labels].to_numpy()
np_y = df[y_labels].transpose().to_numpy()[0]
```
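To make the shapes concrete, here is a minimal sketch using a hypothetical three-row stand-in for the iris DataFrame (`df_demo` and its values are made up for illustration). It shows that the feature array is 2-D and the target array is 1-D, and that selecting the single column by name appears to give the same target array as the transpose-and-index approach.

```python
import pandas as pd

# Hypothetical stand-in for the iris DataFrame (illustrative values only)
df_demo = pd.DataFrame({
    'sepal_length': [5.1, 4.9, 6.3],
    'petal_width': [0.2, 0.2, 1.8],
    'petal_length': [1.4, 1.4, 5.6],
})

# Features: one row per sample, one column per feature
x_demo = df_demo[['sepal_length', 'petal_width']].to_numpy()

# Target: same approach as above, giving a flat 1-D array
y_demo = df_demo[['petal_length']].transpose().to_numpy()[0]

# Alternative: select the column as a Series, then convert
y_alt = df_demo['petal_length'].to_numpy()

print(x_demo.shape)  # (3, 2)
print(y_demo.shape)  # (3,)
```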

Next, we need to split the data into training and testing sets:

```
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(np_x, np_y)
```
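Called as above, `train_test_split` shuffles the rows randomly and holds out 25% of them for testing by default, so the split (and any score computed from it) changes between runs. A minimal sketch of a reproducible split follows, assuming you want to fix the seed and the test fraction yourself. The synthetic arrays and the values `test_size=0.2` and `random_state=0` are illustrative choices, not part of the original notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 100 samples with 2 features each
x_split_demo = np.arange(200).reshape(100, 2)
y_split_demo = np.arange(100)

# Hold out 20% for testing; random_state fixes the shuffle
x_tr, x_te, y_tr, y_te = train_test_split(
    x_split_demo, y_split_demo, test_size=0.2, random_state=0
)

print(len(x_tr), len(x_te))  # 80 20
```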

Once we've got our data split into `test` and `train` sets, we can train our model on the `train` data using the `fit` function of the `LinearRegression` model:

```
from sklearn.linear_model import LinearRegression
model = LinearRegression().fit(x_train, y_train)
```
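After fitting, the learned slopes and intercept are available as the `coef_` and `intercept_` attributes. The sketch below uses synthetic data rather than the iris set, generated from a known plane (`y = 2*x0 + 3*x1 + 1`, an assumption made purely for illustration), so you can see the fitted model recover those values.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data lying exactly on the plane y = 2*x0 + 3*x1 + 1
rng = np.random.default_rng(0)
x_fit_demo = rng.random((50, 2))
y_fit_demo = 2 * x_fit_demo[:, 0] + 3 * x_fit_demo[:, 1] + 1

fit_demo = LinearRegression().fit(x_fit_demo, y_fit_demo)

# One coefficient per feature, plus a single intercept
print(fit_demo.coef_)       # approximately [2. 3.]
print(fit_demo.intercept_)  # approximately 1.0
```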

Lastly, we can use `model.score` to get the $R^2$ value for the model on the test data:

```
model.score(x_test, y_test)
```

```
0.931396085809373
```
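For a regressor, `score` returns the coefficient of determination, $R^2 = 1 - SS_{res} / SS_{tot}$, where $SS_{res}$ is the sum of squared prediction errors and $SS_{tot}$ is the sum of squared deviations of the true values from their mean. As a rough sketch, the same number can be computed by hand. The data here is synthetic (a noisy plane, chosen only for illustration), so the value will differ from the iris result above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic noisy data (illustrative only, not the iris set)
rng = np.random.default_rng(0)
x_r2_demo = rng.random((60, 2))
y_r2_demo = 2 * x_r2_demo[:, 0] + 3 * x_r2_demo[:, 1] + rng.normal(0, 0.1, 60)

r2_model = LinearRegression().fit(x_r2_demo, y_r2_demo)
y_pred = r2_model.predict(x_r2_demo)

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum((y_r2_demo - y_pred) ** 2)
ss_tot = np.sum((y_r2_demo - y_r2_demo.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

# Should agree with the built-in score on the same data
print(r2_manual, r2_model.score(x_r2_demo, y_r2_demo))
```

A value close to 1 means the model explains most of the variance in the target; a value near 0 means it does little better than always predicting the mean.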

You can find out more about how the `LinearRegression` model works in the Sklearn Docs, and you can find an interactive version of this notebook on Kaggle.