Regression Analysis 3

2^7 2022. 6. 7. 14:55

Encoding

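Encoding string-valued (categorical) variables as numeric variables

  • Integer Encoding
    • from sklearn.preprocessing import LabelEncoder
    • Converts a string variable to integer codes so that it can be used in numeric operations
    • 'europe' : 0, 'korea' : 1, 'usa' : 2 (codes follow the sorted order of the labels)
  • One-Hot Encoding
    • from sklearn.preprocessing import OneHotEncoder
    • Each value becomes a vector in which exactly one entry is True (1) and all the others are False (0)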
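
As a rough illustration of the two encodings described above, the following sketch applies both to a tiny hand-written array. The variable names (`origins`, `mapping`, and so on) and the sample values are assumptions made up for this example, not part of the data set used later in the post.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Hypothetical sample values, chosen to match the mapping listed above
origins = np.array(['usa', 'korea', 'europe', 'usa'])

# Integer encoding: each distinct label gets an integer code in sorted order
label_encoder = LabelEncoder()
integer_codes = label_encoder.fit_transform(origins)
mapping = {str(label): code for code, label in enumerate(label_encoder.classes_)}
print(mapping)
print(integer_codes)

# One-hot encoding: one column per label, exactly one 1 in each row
# OneHotEncoder expects a 2-D input, hence the reshape
onehot_encoder = OneHotEncoder()
onehot = onehot_encoder.fit_transform(origins.reshape(-1, 1)).toarray()
print(onehot)
```

Integer codes imply an ordering between categories (europe < korea < usa) that usually has no meaning, which is one common reason to prefer one-hot encoding for nominal features in a regression model.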

1. Data Set

import seaborn as sns

DF = sns.load_dataset('mpg')
DF.head()

type(DF.origin[0])

str

DF.origin.value_counts()

usa       249
japan      79
europe     70
Name: origin, dtype: int64

X = DF[['origin']]
X[111:115]

2. With LabelEncoder

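# Integer encoding

from sklearn.preprocessing import LabelEncoder

encoder1 = LabelEncoder()
# LabelEncoder expects a 1-D input, so pass the column itself
# rather than the one-column DataFrame X
LE = encoder1.fit_transform(X['origin'])

# Integer encoding result
LE[111:115]

array([1, 2, 2, 0])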
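
To read integer codes like the ones above back as labels, a fitted LabelEncoder exposes `classes_` and `inverse_transform`. The sketch below is a minimal example on a hand-written list; `sample_origins` is a hypothetical stand-in for the data set column, not the actual rows 111 to 114.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample containing all three origin labels
sample_origins = ['japan', 'usa', 'usa', 'europe']

encoder = LabelEncoder()
codes = encoder.fit_transform(sample_origins)

# classes_ holds the labels in sorted order; the position is the integer code
code_to_label = dict(enumerate(encoder.classes_))

# inverse_transform maps the integer codes back to the original labels
decoded = encoder.inverse_transform(codes)

print(codes)
print(code_to_label)
print(decoded)
```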

 

3. With OneHotEncoder

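from sklearn.preprocessing import OneHotEncoder

encoder2 = OneHotEncoder()
OHE = encoder2.fit_transform(X)

# fit_transform returns a sparse matrix, so printing it lists only the nonzero positions
print(OHE[111:115])

# Convert to a dense array to see the full one-hot rows
OHE.toarray()[111:115]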
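
The dense array has no column labels, so it can be hard to tell which column corresponds to which category. One possible way to label the columns, sketched here on a small hand-written DataFrame, is to read the fitted encoder's `categories_` attribute. The names `sample`, `columns`, and `onehot_df`, and the `origin_` prefix, are assumptions for this example.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical one-column DataFrame standing in for X
sample = pd.DataFrame({'origin': ['japan', 'usa', 'usa', 'europe']})

encoder = OneHotEncoder()
dense = encoder.fit_transform(sample).toarray()

# categories_ is a list with one array per input column, in sorted label order
columns = [f"origin_{category}" for category in encoder.categories_[0]]

# Wrap the dense array in a DataFrame so each column carries its category name
onehot_df = pd.DataFrame(dense, columns=columns, index=sample.index)
print(onehot_df)
```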

 
