Logistic Regression 2

by 2^7 2022. 6. 7. 17:55

Binary Classification

1. Exploratory Data Analysis

1-1. Frequency Analysis

DF.default.value_counts()

No 9667

Yes 333

Name: default, dtype: int64
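The counts above show a heavily imbalanced target. As a rough sanity check, the sketch below computes each class's share and the accuracy a model would get by always predicting 'No'. The variable names are illustrative and not part of the original notebook.

```python
# Class counts taken from the value_counts() output above
counts = {'No': 9667, 'Yes': 333}
total = sum(counts.values())

# Share of each class in the data
shares = {label: n / total for label, n in counts.items()}

# Accuracy of a trivial model that always predicts the majority class
majority_label = max(counts, key=counts.get)
baseline_accuracy = counts[majority_label] / total

print(shares)
print(majority_label, baseline_accuracy)
```

Any accuracy reported later is worth reading against this baseline, since a model can score high on accuracy while finding few 'Yes' cases.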

1-2. Distribution Visualization

import matplotlib.pyplot as plt

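plt.figure(figsize = (9, 6))
# Compare the balance distribution of non-defaulters ('No') and defaulters ('Yes')
plt.boxplot([DF[DF.default == 'No'].balance,
             DF[DF.default == 'Yes'].balance])
# Set tick labels separately; the labels argument of boxplot is deprecated in recent matplotlib
plt.xticks([1, 2], ['No', 'Yes'])
plt.show()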

2. Data Preprocessing

2-1. Standardization

X = DF[['balance']]
y = DF['default']

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_Scaled = scaler.fit_transform(X)

X_Scaled[:5]
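For reference, StandardScaler rescales each column to z = (x - mean) / std, where std is the population standard deviation. The sketch below reproduces that formula in plain Python on a few made-up balance values; the numbers are hypothetical and only illustrate the calculation.

```python
from statistics import mean, pstdev

# Hypothetical balance values, used only to illustrate the formula
balances = [729.5, 817.2, 1073.5, 529.3, 785.7]

mu = mean(balances)
sigma = pstdev(balances)  # population standard deviation (divides by n)

# z = (x - mean) / std, the per-column transformation StandardScaler applies
z_scores = [(x - mu) / sigma for x in balances]

print([round(z, 4) for z in z_scores])
```

After the transformation the values have mean 0 and standard deviation 1, which is what X_Scaled holds for the balance column.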

2-2. Train & Test Split

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X_Scaled, y,
                                                    test_size = 0.3,
                                                    random_state = 2045)

print('Train Data : ', X_train.shape, y_train.shape)
print('Test Data : ', X_test.shape, y_test.shape)

Train Data : (7000, 1) (7000,)

Test Data : (3000, 1) (3000,)
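One caveat: the code above fits the scaler on all rows before splitting, so test-set statistics influence the preprocessing. With one feature and 10,000 rows the effect is probably small, but a common alternative is to split first and fit the scaler on the training rows only. The sketch below shows that order on synthetic data, and also passes stratify so both splits keep a similar 'Yes' ratio. All names ending in _demo are hypothetical, and this variant would not reproduce the exact scores in this post.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the balance column and default labels (hypothetical data)
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=800, scale=400, size=(1000, 1))
y_demo = np.where(rng.random(1000) < 0.05, 'Yes', 'No')

# Split first; stratify keeps the class ratio similar in both parts
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo,
                                          test_size=0.3,
                                          random_state=2045,
                                          stratify=y_demo)

# Fit the scaler on the training rows only, then apply it to both parts
scaler_demo = StandardScaler().fit(X_tr)
X_tr_scaled = scaler_demo.transform(X_tr)
X_te_scaled = scaler_demo.transform(X_te)

print(X_tr_scaled.shape, X_te_scaled.shape)
print((y_tr == 'Yes').mean(), (y_te == 'Yes').mean())
```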

3. Modeling

3-1. Build the Model with Train_Data

from sklearn.linear_model import LogisticRegression

Model_lr = LogisticRegression()
Model_lr.fit(X_train, y_train)

3-2. Apply the Model to Test_Data

y_hat = Model_lr.predict(X_test)

y_hat

array(['No', 'No', 'No', ..., 'No', 'No', 'No'], dtype=object)
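To see how predict arrives at these labels: a binary logistic regression passes a linear score through the sigmoid function to estimate P('Yes'), and predicts 'Yes' when that probability exceeds 0.5. The sketch below shows the idea in plain Python. The intercept and slope are made-up values for illustration; the fitted model's own values are available as Model_lr.intercept_ and Model_lr.coef_.

```python
import math

def sigmoid(t):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))

# Hypothetical intercept and slope, not the values learned by Model_lr
b0, b1 = -6.0, 2.7

def predict_label(x_scaled, threshold=0.5):
    """Return 'Yes' when the estimated P(default = 'Yes') exceeds the threshold."""
    p_yes = sigmoid(b0 + b1 * x_scaled)
    return 'Yes' if p_yes > threshold else 'No'

# Standardized balance values from low to high
for x in (-1.0, 0.0, 2.0, 3.0):
    print(x, round(sigmoid(b0 + b1 * x), 4), predict_label(x))
```

With a strongly negative intercept, only customers with a very high standardized balance cross the 0.5 threshold, which is consistent with the output above being mostly 'No'.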

4. Model Validation

4-1. Accuracy

#Train Accuracy

Model_lr.score(X_train, y_train)

0.9724285714285714

#Test Accuracy

Model_lr.score(X_test, y_test)

0.9736666666666667

4-2. Confusion Matrix

# Based on 'No' (repaid)

from sklearn.metrics import confusion_matrix

confusion_matrix(y_test, y_hat)

# Based on 'Yes' (default)

from sklearn.metrics import confusion_matrix

confusion_matrix(y_test, y_hat, labels = ['Yes','No'])
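The two calls differ only in the order of rows and columns. In scikit-learn's confusion_matrix, rows are true labels and columns are predicted labels, both in the order given by labels (sorted order by default). The plain-Python sketch below mimics that layout on a tiny made-up example; the helper confusion and the _demo lists are hypothetical and are not the post's data.

```python
def confusion(y_true, y_pred, labels):
    """Rows follow the true label, columns the predicted label, both in `labels` order."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

# Tiny hypothetical example
y_true_demo = ['No', 'No', 'No', 'Yes', 'Yes', 'No']
y_pred_demo = ['No', 'Yes', 'No', 'Yes', 'No', 'No']

matrix_no_first = confusion(y_true_demo, y_pred_demo, ['No', 'Yes'])   # default sorted order
matrix_yes_first = confusion(y_true_demo, y_pred_demo, ['Yes', 'No'])  # 'Yes' in the first row/column

print(matrix_no_first)
print(matrix_yes_first)
```

Putting 'Yes' first places the correctly detected defaults in the top-left cell, which is convenient when 'Yes' is treated as the positive class.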

4-3. Accuracy, Precision, Recall - 'No' (repaid)

from sklearn.metrics import accuracy_score, precision_score, recall_score

print(accuracy_score(y_test, y_hat))
print(precision_score(y_test, y_hat, pos_label = 'No'))
print(recall_score(y_test, y_hat, pos_label = 'No'))

0.9736666666666667

0.9756838905775076

0.9975828729281768

4-4. Accuracy, Precision, Recall - 'Yes' (default)

from sklearn.metrics import accuracy_score, precision_score, recall_score

print(accuracy_score(y_test, y_hat))
print(precision_score(y_test, y_hat, pos_label = 'Yes'))
print(recall_score(y_test, y_hat, pos_label = 'Yes'))

0.9736666666666667

0.8205128205128205

0.3076923076923077
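The gap between the two sets of scores is easier to follow with the underlying counts. The confusion matrix output is not shown in this post, so the counts below are an assumption: they were inferred from the printed scores (for example, 32/39 gives the 'Yes' precision and 32/104 the 'Yes' recall) and happen to add up to the 3,000 test rows. The sketch recomputes each metric from them.

```python
from fractions import Fraction

# Test-set counts with 'Yes' as the positive class.
# Assumption: inferred from the printed scores, not shown in the original output.
tp, fp, fn, tn = 32, 7, 72, 2889

accuracy = Fraction(tp + tn, tp + fp + fn + tn)
precision_yes = Fraction(tp, tp + fp)  # of predicted 'Yes', how many were right
recall_yes = Fraction(tp, tp + fn)     # of actual 'Yes', how many were found

# Treating 'No' as the positive class swaps the roles of tp and tn
precision_no = Fraction(tn, tn + fn)
recall_no = Fraction(tn, tn + fp)

for name, value in [('accuracy', accuracy),
                    ('precision (Yes)', precision_yes),
                    ('recall (Yes)', recall_yes),
                    ('precision (No)', precision_no),
                    ('recall (No)', recall_no)]:
    print(name, float(value))
```

Under these counts the model finds only 32 of the 104 actual defaults, even though overall accuracy is above 97%.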

4-5. F1_Score - 'No' (repaid)

from sklearn.metrics import f1_score

f1_score(y_test, y_hat, pos_label = 'No')

0.9865118661430767

4-6. F1_Score - 'Yes' (default)

from sklearn.metrics import f1_score

f1_score(y_test, y_hat, pos_label = 'Yes')

0.44755244755244755
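F1 is the harmonic mean of precision and recall, so it is pulled toward the lower of the two. The short sketch below recomputes both F1 scores from the precision and recall values printed in sections 4-3 and 4-4; the variable names are illustrative.

```python
# Precision and recall for 'Yes', as printed in section 4-4
precision_yes = 0.8205128205128205
recall_yes = 0.3076923076923077

# F1 = 2 * P * R / (P + R), the harmonic mean of precision and recall
f1_yes = 2 * precision_yes * recall_yes / (precision_yes + recall_yes)

# Same calculation for 'No', using the values printed in section 4-3
precision_no = 0.9756838905775076
recall_no = 0.9975828729281768
f1_no = 2 * precision_no * recall_no / (precision_no + recall_no)

print(f1_yes)
print(f1_no)
```

The low recall for 'Yes' is what drags its F1 score down to about 0.45 despite a precision above 0.82.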

4-7. Classification Report

from sklearn.metrics import classification_report

print(classification_report(y_test, y_hat, 
                            target_names = ['No', 'Yes'],
                            digits = 5))
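Besides the per-class rows, the report prints macro avg and weighted avg rows. As a rough illustration of the difference, the sketch below computes both averages for F1 from the scores printed in sections 4-5 and 4-6. The per-class test-set sizes are an assumption inferred from the printed scores, since the report output itself is not shown in this post.

```python
# F1 scores printed in sections 4-5 and 4-6
f1 = {'No': 0.9865118661430767, 'Yes': 0.44755244755244755}

# Assumption: per-class test-set sizes inferred from the printed scores,
# not shown in the original output
support = {'No': 2896, 'Yes': 104}

# Macro average: every class counts equally
macro_f1 = sum(f1.values()) / len(f1)

# Weighted average: every class counts in proportion to its size
weighted_f1 = sum(f1[c] * support[c] for c in f1) / sum(support.values())

print(round(macro_f1, 5))
print(round(weighted_f1, 5))
```

On imbalanced data the weighted average stays close to the majority class score, so the macro average is usually the more telling of the two for the 'Yes' class.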

License: Attribution, NonCommercial, NoDerivatives