1. Exploratory Data Analysis
1-1. Frequency Analysis
DF.default.value_counts()
No 9667
Yes 333
Name: default, dtype: int64
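The counts show a heavily imbalanced target, which matters for every accuracy figure later on. A minimal sketch of that arithmetic, using only the two counts printed above (the variable names are illustrative):

```python
# Class counts taken from the value_counts() output above.
counts = {"No": 9667, "Yes": 333}
total = sum(counts.values())

# Share of each class in the full data set.
shares = {label: n / total for label, n in counts.items()}

# Accuracy of a trivial classifier that always predicts the majority class.
majority_label = max(counts, key=counts.get)
baseline_accuracy = counts[majority_label] / total

print(shares)
print(majority_label, baseline_accuracy)
```

Only about 3.3% of rows are 'Yes', so a model that always answers 'No' would already be right on roughly 96.7% of the full data set. The exact baseline on a particular split will differ slightly.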
1-2. Distribution Visualization
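import matplotlib.pyplot as plt
# Box plots of balance for non-defaulters ('No') and defaulters ('Yes')
plt.figure(figsize = (9, 6))
plt.boxplot([DF[DF.default == 'No'].balance,
             DF[DF.default == 'Yes'].balance],
            labels = ['No', 'Yes'])  # Matplotlib 3.9+ names this argument tick_labels
plt.show()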
2. Data Preprocessing
2-1. Standardization
X = DF[['balance']]
y = DF['default']
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_Scaled = scaler.fit_transform(X)
X_Scaled[:5]
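StandardScaler rescales each column as z = (x - mean) / std, using the population standard deviation. The cell above fits the scaler on all rows before splitting, so test rows contribute to the mean and std. A stricter variant learns the statistics from the training rows only. The sketch below shows that variant in plain Python. The balance values are made up, and `fit_standardizer` and `transform` are illustrative helpers, not scikit-learn functions.

```python
from statistics import fmean, pstdev

def fit_standardizer(values):
    """Return (mean, population std) learned from the given values."""
    return fmean(values), pstdev(values)

def transform(values, mean, std):
    """Apply z = (x - mean) / std with previously learned statistics."""
    return [(v - mean) / std for v in values]

# Hypothetical balances, already split into train and test portions.
train_balance = [0.0, 500.0, 1000.0, 1500.0, 2000.0]
test_balance = [250.0, 2500.0]

mean, std = fit_standardizer(train_balance)        # statistics from train only
train_scaled = transform(train_balance, mean, std)
test_scaled = transform(test_balance, mean, std)   # reuse the train statistics

print(mean, std)
print(train_scaled)
print(test_scaled)
```

With scikit-learn the same idea is `scaler.fit(X_train)` followed by `scaler.transform(X_test)`, applied after the split rather than before it.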
2-2. Train & Test Split
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X_Scaled, y,
                                                    test_size = 0.3,
                                                    random_state = 2045)
print('Train Data : ', X_train.shape, y_train.shape)
print('Test Data : ', X_test.shape, y_test.shape)
Train Data : (7000, 1) (7000,)
Test Data : (3000, 1) (3000,)
3. Modeling
3-1. Fit the Model on Train_Data
from sklearn.linear_model import LogisticRegression
Model_lr = LogisticRegression()
Model_lr.fit(X_train, y_train)
3-2. Apply the Model to Test_Data
y_hat = Model_lr.predict(X_test)
y_hat
array(['No', 'No', 'No', ..., 'No', 'No', 'No'], dtype=object)
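For a binary logistic regression, `predict()` amounts to thresholding the predicted probability of the second class at 0.5. The sketch below illustrates that rule and what happens when the cutoff is lowered. The coefficient and intercept are hypothetical values chosen for illustration, not the fitted parameters of `Model_lr`, and `predict_label` is an illustrative helper.

```python
import math

def sigmoid(z):
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_label(x_scaled, coef, intercept, threshold=0.5):
    """Return 'Yes' when P(Yes | x) exceeds the threshold, else 'No'."""
    p_yes = sigmoid(intercept + coef * x_scaled)
    return "Yes" if p_yes > threshold else "No"

# Hypothetical parameters, for illustration only.
coef, intercept = 2.5, -6.0

scaled_balances = [-1.0, 0.0, 1.0, 2.0, 3.0]
default_rule = [predict_label(x, coef, intercept) for x in scaled_balances]
lower_cutoff = [predict_label(x, coef, intercept, threshold=0.2)
                for x in scaled_balances]

print(default_rule)
print(lower_cutoff)
```

With the fitted model, the probabilities themselves are available from `Model_lr.predict_proba(X_test)`, with columns ordered as in `Model_lr.classes_`.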
4. Model Validation
4-1. Accuracy
#Train Accuracy
Model_lr.score(X_train, y_train)
0.9724285714285714
#Test Accuracy
Model_lr.score(X_test, y_test)
0.9736666666666667
4-2. Confusion Matrix
# 'No' (repaid) as the reference class
from sklearn.metrics import confusion_matrix
confusion_matrix(y_test, y_hat)
# 'Yes' (delinquent) as the reference class
from sklearn.metrics import confusion_matrix
confusion_matrix(y_test, y_hat, labels = ['Yes','No'])
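The outputs of these two cells are not shown. The four counts in the sketch below are a reconstruction: they are the only integers consistent with the accuracy, precision, and recall reported in sections 4-3 and 4-4 for the 3,000 test rows, so treat them as inferred rather than copied. The point of the sketch is that the `labels` argument only reorders rows (actual class) and columns (predicted class).

```python
# Counts inferred from the reported metrics (an assumption, see the text above).
# Keys are (actual, predicted) pairs.
counts = {
    ("No", "No"): 2889,
    ("No", "Yes"): 7,
    ("Yes", "No"): 72,
    ("Yes", "Yes"): 32,
}

def matrix_for(labels):
    """Rows follow the actual class, columns follow the predicted class."""
    return [[counts[(actual, pred)] for pred in labels] for actual in labels]

no_first = matrix_for(["No", "Yes"])    # default order: sorted labels
yes_first = matrix_for(["Yes", "No"])   # order given by labels = ['Yes', 'No']

print(no_first)
print(yes_first)
```

Read this way, 72 of the 104 actual 'Yes' cases were predicted as 'No', which is what the low 'Yes' recall in section 4-4 reflects.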
4-3. Accuracy, Precision, Recall - 'No' (Repaid)
from sklearn.metrics import accuracy_score, precision_score, recall_score
print(accuracy_score(y_test, y_hat))
print(precision_score(y_test, y_hat, pos_label = 'No'))
print(recall_score(y_test, y_hat, pos_label = 'No'))
0.9736666666666667
0.9756838905775076
0.9975828729281768
4-4. Accuracy, Precision, Recall - 'Yes' (Delinquent)
from sklearn.metrics import accuracy_score, precision_score, recall_score
print(accuracy_score(y_test, y_hat))
print(precision_score(y_test, y_hat, pos_label = 'Yes'))
print(recall_score(y_test, y_hat, pos_label = 'Yes'))
0.9736666666666667
0.8205128205128205
0.3076923076923077
4-5. F1_Score - 'No' (Repaid)
from sklearn.metrics import f1_score
f1_score(y_test, y_hat, pos_label = 'No')
0.9865118661430767
4-6. F1_Score - 'Yes' (Delinquent)
from sklearn.metrics import f1_score
f1_score(y_test, y_hat, pos_label = 'Yes')
0.44755244755244755
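The gap between the two classes is easier to see when the scores are computed by hand. The sketch below assumes the test-set counts 2889, 7, 72, and 32, which are inferred from the values printed in sections 4-3 and 4-4 rather than shown in the original. `prf` is an illustrative helper.

```python
def prf(tp, fp, fn):
    """Return (precision, recall, F1) for one choice of positive class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Inferred counts (assumption): 2889 'No' rows and 32 'Yes' rows were
# classified correctly; 7 'No' rows were predicted 'Yes' and 72 'Yes'
# rows were predicted 'No'.
yes_scores = prf(tp=32, fp=7, fn=72)      # pos_label = 'Yes'
no_scores = prf(tp=2889, fp=72, fn=7)     # pos_label = 'No'
accuracy = (32 + 2889) / 3000

print(accuracy)
print(no_scores)
print(yes_scores)
```

Swapping the positive class swaps the roles of the 7 and the 72: the same 72 missed defaulters barely dent the 'No' scores but pull 'Yes' recall down to about 0.31.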
4-7. Classification Report
from sklearn.metrics import classification_report
print(classification_report(y_test, y_hat,
                            target_names = ['No', 'Yes'],
                            digits = 5))
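Besides the per-class rows, the report includes macro and weighted averages. The sketch below shows how those two averages differ for F1, using the per-class scores reported in sections 4-5 and 4-6. The support values (2896 'No' rows and 104 'Yes' rows in the test set) are inferred from the reported recall values, not printed in the original.

```python
# Per-class F1 scores as reported in sections 4-5 and 4-6.
f1 = {"No": 0.9865118661430767, "Yes": 0.44755244755244755}

# Test-set rows per class; inferred from the reported recalls (assumption).
support = {"No": 2896, "Yes": 104}

# Macro average: every class counts equally.
macro_f1 = sum(f1.values()) / len(f1)

# Weighted average: every class counts in proportion to its support.
weighted_f1 = sum(f1[c] * support[c] for c in f1) / sum(support.values())

print(macro_f1)
print(weighted_f1)
```

The weighted average stays close to the 'No' score because 'No' dominates the test set, while the macro average exposes the weak 'Yes' class.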