Credit

Code
import pandas as pd
import os
os.getcwd()
data_info = pd.read_csv(
    'c:\\Users\\HoraceTsai\\Documents\\Jupyter\\TensorFlow_FILES\\DATA\\lending_club_info.csv',
    index_col='LoanStatNew'
    )
Code
print(data_info.loc['revol_util', 'Description'])
Revolving line utilization rate, or the amount of credit the borrower is using relative to all available revolving credit.
Code
def feat_info(col_name):
    """Print the description of the feature named col_name."""
    print(data_info.loc[col_name, 'Description'])
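
A variant of this helper that returns the description instead of printing it, and falls back to a message for unknown names, may be easier to reuse. The following is only a sketch: `toy_info` is a small stand-in for `data_info`, and `describe_feature` and the sample description texts are hypothetical, not taken from `lending_club_info.csv`.

```python
import pandas as pd

# Stand-in for data_info: indexed by feature name, with one 'Description' column.
# (Hypothetical sample rows; the real table comes from lending_club_info.csv.)
toy_info = pd.DataFrame(
    {"Description": ["Revolving line utilization rate.", "The listed amount of the loan."]},
    index=pd.Index(["revol_util", "loan_amnt"], name="LoanStatNew"),
)

def describe_feature(col_name, info=toy_info):
    """Return the description for col_name, or a fallback message if it is unknown."""
    if col_name not in info.index:
        return f"No description found for '{col_name}'"
    return info.loc[col_name, "Description"]

print(describe_feature("revol_util"))
print(describe_feature("not_a_column"))
```

Returning the string rather than printing it lets the caller decide how to display or log the text, and the membership check avoids a `KeyError` on a mistyped column name.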
Code
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# might be needed depending on your version of Jupyter
%matplotlib inline
Code
df = pd.read_csv('c:\\Users\\HoraceTsai\\Documents\\Jupyter\\TensorFlow_FILES\\DATA\\lending_club_loan_two.csv')
Code
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 396030 entries, 0 to 396029
Data columns (total 27 columns):
 #   Column                Non-Null Count   Dtype  
---  ------                --------------   -----  
 0   loan_amnt             396030 non-null  float64
 1   term                  396030 non-null  object 
 2   int_rate              396030 non-null  float64
 3   installment           396030 non-null  float64
 4   grade                 396030 non-null  object 
 5   sub_grade             396030 non-null  object 
 6   emp_title             373103 non-null  object 
 7   emp_length            377729 non-null  object 
 8   home_ownership        396030 non-null  object 
 9   annual_inc            396030 non-null  float64
 10  verification_status   396030 non-null  object 
 11  issue_d               396030 non-null  object 
 12  loan_status           396030 non-null  object 
 13  purpose               396030 non-null  object 
 14  title                 394274 non-null  object 
 15  dti                   396030 non-null  float64
 16  earliest_cr_line      396030 non-null  object 
 17  open_acc              396030 non-null  float64
 18  pub_rec               396030 non-null  float64
 19  revol_bal             396030 non-null  float64
 20  revol_util            395754 non-null  float64
 21  total_acc             396030 non-null  float64
 22  initial_list_status   396030 non-null  object 
 23  application_type      396030 non-null  object 
 24  mort_acc              358235 non-null  float64
 25  pub_rec_bankruptcies  395495 non-null  float64
 26  address               396030 non-null  object 
dtypes: float64(12), object(15)
memory usage: 81.6+ MB
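
The `info()` output shows fewer than 396030 non-null values for `emp_title`, `emp_length`, `title`, `revol_util`, `mort_acc`, and `pub_rec_bankruptcies`. One way to quantify that is a per-column summary of missing counts and percentages. The sketch below assumes a small toy frame with made-up values in place of `df`; `missing_summary` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

def missing_summary(frame):
    """Return missing-value counts and percentages for columns that have any."""
    counts = frame.isnull().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": 100 * counts / len(frame),
    })
    # Keep only columns with at least one missing value, largest first
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)

# Toy frame (hypothetical values) standing in for df
toy = pd.DataFrame({
    "loan_amnt": [10000.0, 8000.0, 15600.0, 7200.0],
    "mort_acc": [0.0, np.nan, 3.0, np.nan],
    "emp_title": ["Marketing", None, "Statistician", "Client Advocate"],
})
print(missing_summary(toy))
```

With the real data loaded, the same call would be `missing_summary(df)`.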

Exploratory Data Analysis (EDA)

Code
sns.countplot(
    x="loan_status",
    data=df)
# The outcome classes are imbalanced
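
The imbalance seen in the count plot can also be put into numbers with `value_counts`. This is a minimal sketch on a toy series: the label values and their proportions are assumptions made for illustration, not figures read from `lending_club_loan_two.csv`.

```python
import pandas as pd

# Toy labels (hypothetical values and proportions), standing in for df['loan_status']
toy_status = pd.Series(["Fully Paid"] * 8 + ["Charged Off"] * 2, name="loan_status")

counts = toy_status.value_counts()                # absolute count per class
shares = toy_status.value_counts(normalize=True)  # fraction of rows per class
print(pd.DataFrame({"count": counts, "share": shares}))
```

On the real frame the equivalent call would be `df['loan_status'].value_counts(normalize=True)`. Knowing the class shares helps when judging model accuracy later, since a classifier that always predicts the majority class already scores the majority share.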