Data Analysis with Python - Data Wrangling

Data Science

Hiru_93 2022. 8. 13. 17:28

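import numpy as np
import pandas as pd

file_name = "file.csv"
df = pd.read_csv(file_name) # load the CSV file into a dataframe
df['column'] = df['column']+1 # add 1 to every value in the column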

df.dropna() # df is the dataframe

df.dropna(subset = ['price'], axis = 0, inplace = True) # this in-place call
df = df.dropna(subset=['price'], axis=0) # is equivalent to this reassignment
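The equivalence of the two forms can be checked on a small example. This is a minimal sketch; the toy data and the names `cleaned` and `in_place` are made up for illustration and are not from the original dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: "price" has one missing value.
df = pd.DataFrame({
    "price": [13495.0, np.nan, 16500.0],
    "make": ["alfa", "audi", "bmw"],
})

# Form 1: dropna returns a new dataframe and leaves df untouched.
cleaned = df.dropna(subset=["price"], axis=0)

# Form 2: inplace=True modifies the dataframe itself and returns None.
in_place = df.copy()
in_place.dropna(subset=["price"], axis=0, inplace=True)

# Both forms drop the same row.
print(cleaned.equals(in_place))
```

Note that the surviving rows keep their original index labels; `reset_index(drop=True)` can renumber them if needed.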

df.replace(missing_value, new_value)

Usually, missing values are replaced with the mean of the column.

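mean = df['column'].mean() # NaN values are skipped when computing the mean
df['column'] = df['column'].replace(np.nan, mean) # replace returns a new Series, so assign it back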
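As a small worked example of mean replacement, here is a sketch on made-up data (the values are hypothetical). It also shows `fillna`, which is a common alternative to `replace(np.nan, ...)` and not something the notes above use.

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing value.
df = pd.DataFrame({"column": [1.0, np.nan, 3.0]})

# mean() ignores NaN by default, so this is (1.0 + 3.0) / 2 = 2.0.
mean = df["column"].mean()

# Replace the missing value with the mean and store the result.
df["column"] = df["column"].replace(np.nan, mean)

# Alternative: fillna does the same job for missing values.
filled = pd.Series([1.0, np.nan, 3.0]).fillna(mean)

print(df["column"].tolist())  # [1.0, 2.0, 3.0]
```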

DataFrame.astype can be used to convert a column from one data type to another, for instance int → float. Int and float are the types most commonly converted to and from.
When converting the column "city_mpg" → "city-L/100km", the column also has to be renamed.

# The command is as follows
df.rename(columns = {"city_mpg": "city-L/100km"}, inplace = True)
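Renaming only changes the column label, so the values still need converting. The sketch below puts the type conversion, the unit conversion, and the rename together. It assumes the column holds miles per US gallon and uses the approximation L/100km ≈ 235 / mpg; the toy values are made up.

```python
import pandas as pd

# Hypothetical fuel economy values in miles per gallon, stored as integers.
df = pd.DataFrame({"city_mpg": [21, 27, 19]})

# astype converts the column's data type, here int -> float.
df["city_mpg"] = df["city_mpg"].astype("float")

# Unit conversion (assumption: US gallons): L/100km is roughly 235 / mpg.
df["city_mpg"] = 235 / df["city_mpg"]

# Rename the column so the label matches the new unit.
df.rename(columns={"city_mpg": "city-L/100km"}, inplace=True)

print(df.dtypes)
```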

df["length"] = df["length"] / df["length"].max() # can be scaled using max like this (simple feature scaling)

(df["length"] - df["length"].min()) / (df["length"].max() - df["length"].min()) # min-max scaling

df["length"] = (df["length"] - df["length"].mean()) / df["length"].std() # here mean gives the column mean and std gives the standard deviation (z-score)
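To compare the three normalization methods above side by side, here is a minimal sketch on a made-up Series. The values and the names `simple`, `min_max`, and `z_score` are hypothetical.

```python
import pandas as pd

# Hypothetical lengths.
length = pd.Series([150.0, 175.0, 200.0])

# Simple feature scaling: divide by the maximum, so the largest value becomes 1.
simple = length / length.max()

# Min-max scaling: results fall in the range [0, 1].
min_max = (length - length.min()) / (length.max() - length.min())

# Z-score: subtract the mean, divide by the sample standard deviation.
z_score = (length - length.mean()) / length.std()

print(simple.tolist())   # [0.75, 0.875, 1.0]
print(min_max.tolist())  # [0.0, 0.5, 1.0]
print(z_score.tolist())  # [-1.0, 0.0, 1.0]
```

pandas computes `std()` with `ddof=1` (sample standard deviation) by default, which is why the z-scores here come out as exactly -1, 0, and 1.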

bins = np.linspace(min(df["price"]), max(df["price"]), 4) # 4 equally spaced edges give 3 bins
group_names = ["Low", "Medium", "High"]

df["price-binned"] = pd.cut(df["price"], bins, labels = group_names, include_lowest = True)
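The binning step can be traced on a few made-up prices. This is a sketch under the assumption of equal-width bins, as in the lines above; the data is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical prices.
df = pd.DataFrame({"price": [5000, 12000, 20000, 35000]})

# 4 equally spaced edges between min and max: 5000, 15000, 25000, 35000.
bins = np.linspace(df["price"].min(), df["price"].max(), 4)
group_names = ["Low", "Medium", "High"]

# include_lowest=True makes the first bin include its left edge (the minimum).
df["price-binned"] = pd.cut(
    df["price"], bins, labels=group_names, include_lowest=True
)

print(df["price-binned"].tolist())  # ['Low', 'Low', 'Medium', 'High']
```

Without `include_lowest=True`, the minimum price would fall outside the first bin and become NaN, because the bins are open on the left by default.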

pd.get_dummies(df['fuel']) # one indicator column per category
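A short sketch of what `get_dummies` produces, using made-up fuel values. Joining the result back with `pd.concat` is one common follow-up step and is an assumption here, not something stated in the notes.

```python
import pandas as pd

# Hypothetical categorical column.
df = pd.DataFrame({"fuel": ["gas", "diesel", "gas"]})

# One indicator column per category; columns are sorted alphabetically.
dummies = pd.get_dummies(df["fuel"])

# Attach the indicator columns to the original dataframe.
df = pd.concat([df, dummies], axis=1)

print(df.columns.tolist())  # ['fuel', 'diesel', 'gas']
```

Depending on the pandas version, the indicator columns are boolean or 0/1 integers; `astype(int)` gives 0/1 in either case.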