Data Science
-
Data Analysis with Python - Exploratory Data Analysis
Data Science 2022. 8. 13. 17:57
Descriptive statistics: summarize the sample and the measures of the data. How to summarize data with the pandas library: df.describe() # any NaN values are automatically skipped in these statistics. To return counts of unique values: value_counts(). To output the result as a table: to_frame(). Box plot: Scatter plot: shows the relationship between two variables. Predictor/independent variable on the x axis. Target variable..
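The three pandas calls named in this excerpt can be tried together in a minimal sketch. The sample dataframe and its column names (drive_wheels, price) are made up for illustration and do not come from the post.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({
    "drive_wheels": ["fwd", "rwd", "fwd", "4wd", "fwd"],
    "price": [13495.0, 16500.0, np.nan, 17450.0, 15250.0],
})

# Summary statistics for the numeric columns; the NaN price is skipped.
summary = df.describe()

# Count how many times each unique value occurs in a column.
counts = df["drive_wheels"].value_counts()

# Convert the resulting Series into a one-column table (DataFrame).
counts_table = counts.to_frame()

print(summary)
print(counts_table)
```

Because one price is missing, the count row of the summary reports four values rather than five.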
-
Data Analysis with Python - Data Wrangling
Data Science 2022. 8. 13. 17:28
Data wrangling is the process of cleaning and consolidating raw data so that it can be analyzed more easily. How to add a column: file_name = "파일.csv" df = pd.read_csv(file_name) df['column'] = df['column'] + 1 How do we handle missing data? How to drop missing values? df.dropna() # df is the dataframe. df.dropna(subset=['price'], axis=0, inplace=True) and df = df.dropna(subset=['price'], axis=0) are equivalent. Make sure to set inplace=True if the change should be applied to df itself. How to replace missing val..
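The equivalence between the two dropna forms in this excerpt can be checked with a small sketch. The dataframe here is built inline with made-up values instead of being read from a CSV file, and price_plus_one is a hypothetical column name used only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the CSV file mentioned in the post.
df = pd.DataFrame({
    "make": ["audi", "bmw", "volvo"],
    "price": [13950.0, np.nan, 12940.0],
})

# Derive a new column from an existing one (NaN stays NaN).
df["price_plus_one"] = df["price"] + 1

# Form 1: dropna returns a new dataframe; df itself is unchanged.
cleaned = df.dropna(subset=["price"], axis=0)

# Form 2: inplace=True modifies the dataframe it is called on.
in_place = df.copy()
in_place.dropna(subset=["price"], axis=0, inplace=True)

print(cleaned.equals(in_place))
print(len(df), len(cleaned))
```

Without inplace=True and without assigning the result back, the rows with missing prices remain in df.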
-
Data analysis with Python - Importing dataset
Data Science 2022. 8. 13. 17:12
Data Analysis with Python by IBM series - Lecture 1: Importing dataset. Python packages for data science. Scientific computing: Pandas / NumPy / SciPy. Visualization: Matplotlib / Seaborn. Algorithmic libraries (used for linear regression and the like): Scikit-learn (machine learning library) / Statsmodels (estimates statistical models, performs statistical tests). Datatype comparison: Pandas vs Python. dataframe.describe() skips non-numeric columns. Because of this, stri..
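The point that describe() leaves out non-numeric columns can be seen in a short sketch. The data below is made up, and the include="all" call is shown as one standard pandas option for bringing those columns back; the excerpt is cut off before it says which approach the post itself uses.

```python
import pandas as pd

# Hypothetical data with one text column and one numeric column.
df = pd.DataFrame({
    "make": ["audi", "bmw", "audi", "volvo"],
    "price": [13950.0, 16430.0, 17450.0, 12940.0],
})

# Inspect the pandas datatype of each column.
print(df.dtypes)

# Default behaviour: only the numeric column is summarized.
numeric_summary = df.describe()

# include="all" also summarizes non-numeric columns
# (count, unique, top, freq), filling inapplicable cells with NaN.
full_summary = df.describe(include="all")

print(numeric_summary)
print(full_summary)
```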