Logistic Regression 2

AI / Machine Learning

by 2^7 2022. 6. 7. 17:55

Binary Classification

1. Exploratory Data Analysis

1-1. Frequency Analysis

DF.default.value_counts()

No 9667

Yes 333

Name: default, dtype: int64
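The counts above show a heavily imbalanced target. As a rough sketch, the class proportions can be checked with `normalize=True`; the Series below is rebuilt from the reported counts and only stands in for `DF.default`.

```python
import pandas as pd

# Stand-in for DF.default, rebuilt from the counts reported above (9667 'No', 333 'Yes').
default = pd.Series(['No'] * 9667 + ['Yes'] * 333, name='default')

# normalize=True returns class proportions instead of raw counts.
ratio = default.value_counts(normalize=True)
print(ratio)
```

Only about 3.3% of the rows are 'Yes', which is worth keeping in mind when reading the accuracy figures later on.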


1-2. Distribution Visualization

import matplotlib.pyplot as plt

plt.figure(figsize = (9, 6))
plt.boxplot([DF[DF.default == 'No'].balance,
             DF[DF.default == 'Yes'].balance],
            labels = ['No', 'Yes'])
plt.show()
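A numeric summary can sit next to the boxplot. The sketch below uses a hypothetical toy frame (the values are made up, not taken from `DF`) to show how the per-class median, the center line of each box, could be computed.

```python
import pandas as pd

# Hypothetical toy frame standing in for DF; the values are made up for illustration.
toy = pd.DataFrame({
    'default': ['No', 'No', 'No', 'Yes', 'Yes'],
    'balance': [300.0, 800.0, 1000.0, 1700.0, 2100.0],
})

# Median balance per class, the quantity a boxplot draws as the center line of each box.
medians = toy.groupby('default')['balance'].median()
print(medians)
```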


2. Data Preprocessing

2-1. Standardization

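from sklearn.preprocessing import StandardScaler

# Single predictor (double brackets keep it 2-D for scikit-learn) and the target labels
X = DF[['balance']]
y = DF['default']

# Rescale balance to zero mean and unit variance
scaler = StandardScaler()
X_Scaled = scaler.fit_transform(X)

X_Scaled[:5]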
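For reference, standardization is just `(x - mean) / std`. A minimal check on made-up numbers (the array `X_toy` is hypothetical) suggests that `StandardScaler` matches the manual formula when the population standard deviation (`ddof=0`) is used.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical balances; any single numeric column behaves the same way.
X_toy = np.array([[0.0], [500.0], [1000.0], [1500.0], [2000.0]])

scaled = StandardScaler().fit_transform(X_toy)

# Manual version: subtract the column mean, divide by the population std (NumPy default ddof=0).
manual = (X_toy - X_toy.mean(axis=0)) / X_toy.std(axis=0)

print(np.allclose(scaled, manual))
```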


2-2. Train & Test Split

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X_Scaled, y,
                                                    test_size = 0.3,
                                                    random_state = 2045)

print('Train Data : ', X_train.shape, y_train.shape)
print('Test Data : ', X_test.shape, y_test.shape)

Train Data : (7000, 1) (7000,)

Test Data : (3000, 1) (3000,)
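Note that the scaler above was fit on all rows before the split, so test-set statistics influence the scaling. One possible alternative, sketched here on synthetic stand-in data (`X_raw` and `y_raw` are made up), is to split first, fit the scaler on the training rows only, and pass `stratify` so the rare 'Yes' class keeps the same share in both parts.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for DF: 1000 balances, 30 of them labeled 'Yes' (made-up data).
rng = np.random.default_rng(0)
X_raw = rng.normal(800, 400, size=(1000, 1))
y_raw = np.array(['Yes'] * 30 + ['No'] * 970)

# Split first; stratify keeps the 'Yes' share equal in train and test.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_raw, y_raw, test_size=0.3, random_state=2045, stratify=y_raw)

# Fit the scaler on the training rows only, then reuse it for the test rows.
split_scaler = StandardScaler().fit(X_tr)
X_tr_s = split_scaler.transform(X_tr)
X_te_s = split_scaler.transform(X_te)

print(X_tr_s.shape, X_te_s.shape)
```

With one predictor and 10,000 rows the effect on the scores in this post is probably small, but the ordering matters more for small or drifting datasets.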


3. Modeling

3-1. Fit the Model on Train_Data

from sklearn.linear_model import LogisticRegression

Model_lr = LogisticRegression()
Model_lr.fit(X_train, y_train)
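What the fitted model stores is a slope and an intercept. The sketch below, on synthetic data (`x`, `y`, and `model` are hypothetical names, not the post's `Model_lr`), rebuilds the predicted probability of 'Yes' by passing `coef_ * x + intercept_` through the sigmoid and compares it with `predict_proba`.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic one-feature data: larger x makes 'Yes' more likely (made-up data).
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 1))
y = np.where(x[:, 0] + rng.normal(scale=0.5, size=200) > 0.5, 'Yes', 'No')

model = LogisticRegression().fit(x, y)

# Linear score z = w * x + b, then the sigmoid maps it to a probability.
z = x @ model.coef_.T + model.intercept_
p_manual = 1.0 / (1.0 + np.exp(-z[:, 0]))

# Column 1 of predict_proba belongs to classes_[1], which is 'Yes' here.
p_sklearn = model.predict_proba(x)[:, 1]

print(model.classes_, np.allclose(p_manual, p_sklearn))
```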


3-2. Apply the Model to Test_Data

y_hat = Model_lr.predict(X_test)
y_hat

array(['No', 'No', 'No', ..., 'No', 'No', 'No'], dtype=object)
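`predict` returns hard labels, which for a binary model amounts to cutting the 'Yes' probability at 0.5. A rough sketch on synthetic data (all names below are hypothetical) shows that equivalence, and that a lower cutoff such as 0.3 can only flag the same number of 'Yes' cases or more.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic one-feature data with a minority 'Yes' class (made-up data).
rng = np.random.default_rng(1)
x = rng.normal(size=(300, 1))
y = np.where(x[:, 0] + rng.normal(scale=0.5, size=300) > 1.0, 'Yes', 'No')

model = LogisticRegression().fit(x, y)
p_yes = model.predict_proba(x)[:, 1]   # probability of classes_[1] == 'Yes'

# Cutting at 0.5 reproduces predict(); 0.3 is an arbitrary lower cutoff for comparison.
pred_default = np.where(p_yes > 0.5, 'Yes', 'No')
pred_lower = np.where(p_yes > 0.3, 'Yes', 'No')

print((pred_default == 'Yes').sum(), (pred_lower == 'Yes').sum())
```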


4. Model Validation

4-1. Accuracy

#Train Accuracy

Model_lr.score(X_train, y_train)

0.9724285714285714

#Test Accuracy

Model_lr.score(X_test, y_test)

0.9736666666666667
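An accuracy of about 0.97 looks strong, but it helps to compare it with a trivial baseline. Using the class counts from section 1-1, a model that always answers 'No' would already reach the following accuracy on the full dataset.

```python
# Class counts reported in section 1-1.
n_no, n_yes = 9667, 333

# Accuracy of a baseline that always predicts the majority class 'No'.
baseline_accuracy = n_no / (n_no + n_yes)
print(baseline_accuracy)  # 0.9667
```

The fitted model therefore improves on the baseline by well under one percentage point, which is why the class-wise metrics below matter.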


4-2. Confusion Matrix

# Default label order: 'No' (repaid) first

from sklearn.metrics import confusion_matrix

confusion_matrix(y_test, y_hat)

# Explicit label order: 'Yes' (defaulted) first

confusion_matrix(y_test, y_hat, labels = ['Yes','No'])
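The `labels` argument only changes the row and column order; rows are true classes and columns are predicted classes. A small made-up example (the five labels below are not from the dataset) makes the reordering visible.

```python
from sklearn.metrics import confusion_matrix

# Made-up labels for illustration only.
y_true = ['No', 'No', 'Yes', 'Yes', 'No']
y_pred = ['No', 'Yes', 'Yes', 'No', 'No']

# Default: labels are sorted, so row/column 0 is 'No'.
cm_no_first = confusion_matrix(y_true, y_pred)

# Explicit order: row/column 0 is 'Yes'.
cm_yes_first = confusion_matrix(y_true, y_pred, labels=['Yes', 'No'])

print(cm_no_first)   # [[2 1], [1 1]]
print(cm_yes_first)  # [[1 1], [1 2]]
```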


4-3. Accuracy, Precision, Recall - 'No' (repaid)

from sklearn.metrics import accuracy_score, precision_score, recall_score

print(accuracy_score(y_test, y_hat))
print(precision_score(y_test, y_hat, pos_label = 'No'))
print(recall_score(y_test, y_hat, pos_label = 'No'))

0.9736666666666667

0.9756838905775076

0.9975828729281768


4-4. Accuracy, Precision, Recall - 'Yes' (defaulted)

from sklearn.metrics import accuracy_score, precision_score, recall_score

print(accuracy_score(y_test, y_hat))
print(precision_score(y_test, y_hat, pos_label = 'Yes'))
print(recall_score(y_test, y_hat, pos_label = 'Yes'))

0.9736666666666667

0.8205128205128205

0.3076923076923077
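The scores in 4-3 and 4-4 can be reproduced by hand from four counts. The counts below are not printed in the original; they are back-derived from the reported accuracy, precision, and recall on the 3,000 test rows, so treat them as an assumption.

```python
# Counts back-derived from the reported scores (an assumption, not printed in the original).
# Positive class = 'Yes'.
tp, fp, fn, tn = 32, 7, 72, 2889

accuracy = (tp + tn) / (tp + fp + fn + tn)

# 'Yes' as the positive class.
precision_yes = tp / (tp + fp)
recall_yes = tp / (tp + fn)

# 'No' as the positive class: the counts swap roles.
precision_no = tn / (tn + fn)
recall_no = tn / (tn + fp)

print(accuracy, precision_yes, recall_yes, precision_no, recall_no)
```

Read this way, the model catches only 32 of the 104 actual 'Yes' cases in the test set, even though most of its 'Yes' predictions are correct.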


4-5. F1_Score - 'No' (repaid)

from sklearn.metrics import f1_score

f1_score(y_test, y_hat, pos_label = 'No')

0.9865118661430767


4-6. F1_Score - 'Yes' (defaulted)

from sklearn.metrics import f1_score

f1_score(y_test, y_hat, pos_label = 'Yes')

0.44755244755244755
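F1 is the harmonic mean of precision and recall, so it is pulled toward the lower of the two. Plugging in the precision and recall values reported in 4-3 and 4-4 should give back the two F1 scores above.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Precision and recall as reported in sections 4-3 ('No') and 4-4 ('Yes').
f1_no = f1_from(0.9756838905775076, 0.9975828729281768)
f1_yes = f1_from(0.8205128205128205, 0.3076923076923077)

print(f1_no, f1_yes)
```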


4-7. Classification Report

from sklearn.metrics import classification_report

print(classification_report(y_test, y_hat, 
                            target_names = ['No', 'Yes'],
                            digits = 5))
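When the per-class numbers are needed in code rather than as printed text, `classification_report` can also return a dictionary via `output_dict=True`. A minimal sketch on made-up labels (not the post's `y_test` and `y_hat`):

```python
from sklearn.metrics import classification_report

# Made-up labels for illustration only.
y_true = ['No', 'No', 'Yes', 'Yes', 'No']
y_pred = ['No', 'Yes', 'Yes', 'No', 'No']

# output_dict=True returns nested dicts keyed by class label instead of a string.
report = classification_report(y_true, y_pred, output_dict=True)

print(report['Yes']['recall'], report['No']['precision'], report['accuracy'])
```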

 
