[Python Data Analysis] Handling Time Series Data with Pandas Timestamp and Period

2018. 12. 15. 17:33

| Pandas Timestamp, Period

Pandas provides Timestamp and Period, which make time series data easy to work with. The examples below are Python code that uses these features to create and process time series data.

import pandas as pd
import numpy as np

time1 = pd.Timestamp('9/1/2016 10:05AM')
print(time1)
'''
2016-09-01 10:05:00
'''
period = pd.Period('3/5/2016')
print(period)
'''
2016-03-05
'''
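The two objects above behave differently. A Timestamp exposes the components of a single instant. A Period carries a frequency, so adding 1 moves it by one whole unit of that frequency. The following is a small sketch that reuses the same constructor strings as above:

```python
import pandas as pd

# A Timestamp is one instant; its components are plain attributes
time1 = pd.Timestamp('9/1/2016 10:05AM')
parts = (time1.year, time1.month, time1.day, time1.hour, time1.minute)

# A Period built from a full date gets a daily ('D') frequency
period = pd.Period('3/5/2016')
freq = period.freqstr

# Period arithmetic moves in whole units of that frequency (here: one day)
next_period = period + 1

print(parts)        # (2016, 9, 1, 10, 5)
print(freq)         # D
print(next_period)  # 2016-03-06
```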
# Series with a Timestamp index (a DatetimeIndex)
t1 = pd.Series(list('abc'), [pd.Timestamp('2016-09-01'), pd.Timestamp('2016-09-02'), pd.Timestamp('2016-09-03')])
print(t1)
'''
2016-09-01    a
2016-09-02    b
2016-09-03    c
'''
# A Period represents a span of time (a whole day, month, ...).
# A Timestamp represents a single instant, one specific point in time.
#   p = pd.Period('2017-06-13')
#   test = pd.Timestamp('2017-06-13 22:11')
#   p.start_time < test < p.end_time   # True: the instant lies inside that day
t2 = pd.Series(list('def'), [pd.Period('2016-09'), pd.Period('2016-10'), pd.Period('2016-11')])
print(t2)
'''
2016-09    d
2016-10    e
2016-11    f
'''
print(type(t2.index))
'''         
<class 'pandas.core.indexes.period.PeriodIndex'>
'''
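The commented-out check above also runs as written. This is a minimal sketch. A Period knows its own start_time and end_time, so testing whether a Timestamp falls inside it is a plain comparison. A monthly PeriodIndex like the one in t2 can be turned back into timestamps with to_timestamp(), which by default gives the first instant of each month:

```python
import pandas as pd

# A day-long Period and an instant inside that day
p = pd.Period('2017-06-13')
test = pd.Timestamp('2017-06-13 22:11')
inside = p.start_time < test < p.end_time
print(inside)  # True

# Monthly Periods, like the index of t2
idx = pd.PeriodIndex(['2016-09', '2016-10', '2016-11'], freq='M')
starts = idx.to_timestamp()  # DatetimeIndex holding the start of each month
print(starts)
```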
d1 = ['2 June 2013', 'Aug 29, 2014', '2015-06-26', '7/12/16']
ts3 = pd.DataFrame(np.random.randint(10, 100, (4,2)), index=d1, columns=list('ab'))
print(ts3)
'''
               a   b
2 June 2013   10  90
Aug 29, 2014  87  96
2015-06-26    52  35
7/12/16       29  59
'''
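# Even when the strings use completely different date formats,
# converting them to Timestamps puts them all into the same format.
# pandas 2.0+ needs format='mixed' for a list of mixed formats;
# on older versions, drop that argument (each string is parsed on its own).
# (a and b are random, so the values here come from a different run than the table above)
ts3.index = pd.to_datetime(ts3.index, format='mixed')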
print(ts3)
'''
            a   b
2013-06-02  78  76
2014-08-29  63  44
2015-06-26  34  91
2016-07-12  44  94
'''
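You may already know the format, or some strings may not be dates at all. In either case you can tell to_datetime explicitly. The following is a short sketch with made-up input strings. format= skips format guessing entirely, and errors='coerce' turns anything unparseable into NaT (a missing datetime) instead of raising an error:

```python
import pandas as pd

# Explicit format: day/month/year, no guessing involved
fixed = pd.to_datetime('26/06/2015', format='%d/%m/%Y')
print(fixed)  # 2015-06-26 00:00:00

# errors='coerce' converts entries that cannot be parsed to NaT
mixed = pd.to_datetime(['2015-06-26', 'not a date'], errors='coerce')
print(mixed)
```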
print(pd.to_datetime('4.7.12', dayfirst=True))
'''
2012-07-04 00:00:00
'''
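dayfirst only matters when a string is ambiguous. The following sketch compares the default month-first reading with dayfirst=True. It also shows an explicit format string, where the '%d.%m.%y' pattern is chosen here for illustration:

```python
import pandas as pd

# The same digits, read two ways
month_first = pd.to_datetime('4/7/12')               # default: month first -> April 7
day_first = pd.to_datetime('4/7/12', dayfirst=True)  # day first -> July 4

# An explicit format removes the ambiguity altogether
explicit = pd.to_datetime('4.7.12', format='%d.%m.%y')

print(month_first)  # 2012-04-07 00:00:00
print(day_first)    # 2012-07-04 00:00:00
print(explicit)     # 2012-07-04 00:00:00
```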
# Arithmetic between Timestamps (the result is a Timedelta)
delta = pd.Timestamp('9/3/2016') - pd.Timestamp('9/1/2016')
print(delta)
'''
2 days 00:00:00
'''
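The difference printed above is a Timedelta object, not a string. You can read its parts directly. A short sketch:

```python
import pandas as pd

delta = pd.Timestamp('9/3/2016') - pd.Timestamp('9/1/2016')

print(type(delta).__name__)   # Timedelta
print(delta.days)             # 2
print(delta.total_seconds())  # 172800.0
```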
# Adding a Timedelta to a Timestamp
delta = pd.Timestamp('9/2/2016 8:10AM') + pd.Timedelta('12D 3H')
print(delta)
'''
2016-09-14 11:10:00
'''
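A Timedelta can also be built from keyword arguments, so you don't need to remember the string abbreviations. A Timedelta is a fixed length of time. For calendar steps such as "one month later", pd.DateOffset is the usual tool. The following is a sketch of both:

```python
import pandas as pd

start = pd.Timestamp('9/2/2016 8:10AM')

# Fixed duration: 12 days and 3 hours, the same amount as the string form above
end = start + pd.Timedelta(days=12, hours=3)
print(end)  # 2016-09-14 11:10:00

# Calendar offset: same day of the month, one month later
next_month = start + pd.DateOffset(months=1)
print(next_month)  # 2016-10-02 08:10:00
```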
# date_range builds a sequence of dates: periods sets how many, freq sets the spacing
# ('2W-SUN' = every 2 weeks, anchored on Sunday)
dates = pd.date_range('10-01-2016', periods=9, freq='2W-SUN')
print(dates)
'''
DatetimeIndex(['2016-10-02', '2016-10-16', '2016-10-30', '2016-11-13',
               '2016-11-27', '2016-12-11', '2016-12-25', '2017-01-08',
               '2017-01-22'],
              dtype='datetime64[ns]', freq='2W-SUN')
'''
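Because of the Sunday anchor, the range above starts on 2016-10-02 rather than on the requested 10-01, which is a Saturday. You can also give a start and an end instead of periods. The following is a small sketch of both variants, using ISO date strings to avoid month/day ambiguity:

```python
import pandas as pd

# freq='W-SUN': weekly, on Sundays; 2016-10-01 is a Saturday, so the first date is 10-02
weekly = pd.date_range('2016-10-01', periods=3, freq='W-SUN')

# start and end instead of periods; both ends are kept when they fall on the anchor
bounded = pd.date_range('2016-10-02', '2016-10-30', freq='2W-SUN')

print(list(weekly.strftime('%Y-%m-%d')))   # ['2016-10-02', '2016-10-09', '2016-10-16']
print(list(bounded.strftime('%Y-%m-%d')))  # ['2016-10-02', '2016-10-16', '2016-10-30']
```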
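# cumsum returns the running (cumulative) total of the elements
# a = np.array([1,2,3,4,5,6])
# print(a.cumsum())
# [ 1  3  6 10 15 21]
# Note: the counts are random, and the sample outputs below were captured in
# several separate runs, so their numbers do not line up with one another.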
df = pd.DataFrame({'Count 1': 100 + np.random.randint(-5, 10, 9).cumsum(),
                   'Count 2': 120 + np.random.randint(-5, 10, 9)}, index=dates)
print(df)
'''
                Count 1  Count 2
2016-10-02       99      118
2016-10-16       95      126
2016-10-30       92      128
2016-11-13       93      122
2016-11-27       90      124
2016-12-11       85      122
2016-12-25       85      125
2017-01-08       94      116
2017-01-22       92      129
'''
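To get the same numbers on every run, you can draw the frame from a seeded generator. The following is a sketch. It assumes NumPy's Generator API (np.random.default_rng) and uses a hypothetical helper, build_frame, that is not part of the original code:

```python
import numpy as np
import pandas as pd

def build_frame(seed):
    """Hypothetical helper: the same kind of frame as above, from a seeded generator."""
    rng = np.random.default_rng(seed)
    dates = pd.date_range('10-01-2016', periods=9, freq='2W-SUN')
    # rng.integers(-5, 10, 9) draws like np.random.randint(-5, 10, 9): high end exclusive
    return pd.DataFrame({'Count 1': 100 + rng.integers(-5, 10, 9).cumsum(),
                         'Count 2': 120 + rng.integers(-5, 10, 9)}, index=dates)

df_a = build_frame(0)
df_b = build_frame(0)
print(df_a.equals(df_b))  # True: same seed, same numbers
```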
# Day-of-week name for each timestamp (older pandas versions used df.index.weekday_name)
print(df.index.day_name())
'''
Index(['Sunday', 'Sunday', 'Sunday', 'Sunday', 'Sunday', 'Sunday', 'Sunday',
       'Sunday', 'Sunday'],
      dtype='object')
'''
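day_name() is one of several date accessors on a DatetimeIndex. The following sketch shows it next to the numeric weekday and the month. Each of these returns one value per timestamp:

```python
import pandas as pd

dates = pd.date_range('10-01-2016', periods=9, freq='2W-SUN')

names = dates.day_name()   # 'Sunday' for every date
numbers = dates.dayofweek  # Monday=0 ... Sunday=6
months = dates.month       # calendar month numbers

print(set(names))               # {'Sunday'}
print(numbers[:3].tolist())     # [6, 6, 6]
print(months[:3].tolist())      # [10, 10, 10]
```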
print(df.diff())
'''
            Count 1  Count 2
2016-10-02      NaN      NaN
2016-10-16      4.0     -6.0
2016-10-30      8.0     13.0
2016-11-13      7.0     -1.0
2016-11-27      7.0     -5.0
2016-12-11      2.0      3.0
2016-12-25      7.0     -2.0
2017-01-08      6.0      2.0
2017-01-22     -2.0     -3.0
'''
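diff() subtracts the previous row from each row, so it computes the same thing as df - df.shift(1). This is also why the first row is NaN. The following sketch reuses the first three 'Count 1' values printed earlier:

```python
import pandas as pd

idx = pd.date_range('10-01-2016', periods=3, freq='2W-SUN')
s = pd.Series([99, 95, 92], index=idx)

step = s.diff()               # change from the previous row
manual = s - s.shift(1)       # the same computation spelled out
two_back = s.diff(periods=2)  # change from two rows earlier

print(step.tolist())        # [nan, -4.0, -3.0]
print(step.equals(manual))  # True
```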
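# Group the timestamps by calendar month, then take the mean of each month.
# It works much like groupby, with each month as one group.
print(df.resample('M').mean())  # pandas 2.2+ spells the month-end alias 'ME'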
'''
            Count 1  Count 2
2016-10-31  108.666667    119.0
2016-11-30  110.500000    124.5
2016-12-31  116.000000    122.0
2017-01-31  125.000000    122.5
'''
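The comparison with groupby can be checked directly. Grouping by each timestamp's monthly Period gives the same monthly buckets as resampling. The following is a sketch with toy values. It uses the 'MS' (month start) alias to avoid the 'M'/'ME' naming change between pandas versions, so the resample labels are month starts rather than the month ends shown above:

```python
import pandas as pd

idx = pd.date_range('10-01-2016', periods=4, freq='2W-SUN')  # 10-02, 10-16, 10-30, 11-13
s = pd.Series([1, 2, 3, 4], index=idx)

# resample: bucket the rows by calendar month, labelled with the month start
resampled = s.resample('MS').sum()

# the same buckets built with groupby on monthly Periods
grouped = s.groupby(s.index.to_period('M')).sum()

print(resampled.tolist())  # [6, 4]
print(grouped.tolist())    # [6, 4]
```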
# Select the rows whose timestamp falls in 2017 (partial string indexing with .loc)
print(df.loc['2017'])
'''
            Count 1  Count 2
2017-01-08      141      122
2017-01-22      137      121
'''
# Select the rows whose timestamp falls in December 2016
print(df.loc['2016-12'])
'''
            Count 1  Count 2
2016-12-11      136      125
2016-12-25      132      119
'''
# Select the rows from December 2016 onward
print(df.loc['2016-12':])
'''
            Count 1  Count 2
2016-12-11      136      125
2016-12-25      132      119
2017-01-08      141      122
2017-01-22      137      121
'''
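Slicing with .loc accepts partial strings and full dates alike. Unlike positional slicing, .loc includes both end labels. The following is a sketch with placeholder counts (range(9)) on the same date index:

```python
import pandas as pd

idx = pd.date_range('10-01-2016', periods=9, freq='2W-SUN')
df = pd.DataFrame({'Count 1': range(9)}, index=idx)

year_2017 = df.loc['2017']               # every row in 2017
december = df.loc['2016-12']             # every row in December 2016
window = df.loc['2016-12':'2017-01-08']  # both end labels are included

print(len(year_2017), len(december), len(window))  # 2 2 3
```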
# The index has a 2-week frequency. asfreq('W') converts it to a weekly frequency,
# which inserts the in-between weeks as new rows with no data.
# method='ffill' (forward fill) fills each new row with the previous row's values.
print(df.asfreq('W', method='ffill'))
'''
            Count 1  Count 2
2016-10-02      101      124
2016-10-09      101      124
2016-10-16      110      127
2016-10-23      110      127
2016-10-30      114      124
2016-11-06      114      124
2016-11-13      121      129
2016-11-20      121      129
2016-11-27      128      119
2016-12-04      128      119
2016-12-11      136      125
2016-12-18      136      125
2016-12-25      132      119
2017-01-01      132      119
2017-01-08      141      122
2017-01-15      141      122
2017-01-22      137      121
'''
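Forward fill is only one way to handle the new weeks. The following sketch uses the first three 'Count 1' values from the output above. It shows the raw gaps that asfreq creates without method, and linear interpolation as an alternative to ffill:

```python
import pandas as pd

idx = pd.date_range('10-01-2016', periods=3, freq='2W-SUN')  # 10-02, 10-16, 10-30
s = pd.Series([101, 110, 114], index=idx)

gaps = s.asfreq('W')                    # the new in-between weeks hold NaN
filled = s.asfreq('W', method='ffill')  # copy the previous value forward
interp = s.asfreq('W').interpolate()    # straight line between the known values

print(gaps.isna().sum())  # 2
print(filled.tolist())    # values: 101, 101, 110, 110, 114
print(interp.tolist())    # [101.0, 105.5, 110.0, 112.0, 114.0]
```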

import matplotlib.pyplot as plt

df.plot()
plt.show()

Reference: https://www.coursera.org/learn/python-data-analysis

License: Attribution, NonCommercial
