I want to group a dataset by id and return the maximum and minimum timestamp for each group. Here's my data:
id timestamp
1 2017-09-17 10:09:01
2 2017-10-02 01:13:15
1 2017-09-17 10:53:07
1 2017-09-17 10:52:18
2 2017-09-12 21:59:40
Here's the output that I want:
id max min
1 2017-09-17 10:53:07 2017-09-17 10:09:01
2 2017-10-02 01:13:15 2017-09-12 21:59:40
Here's what I did. The code doesn't seem efficient, and I hope there's a better way to do this in pandas:
data1 = df.sort_values('timestamp').drop_duplicates(['id'], keep='last')
data2 = df.sort_values('timestamp').drop_duplicates(['id'], keep='first')
data1['max'] = data1['timestamp']
data2['min'] = data2['timestamp']
data = data1.merge(data2, on='id', how='left')
data = data.drop(['timestamp_x', 'timestamp_y'], axis=1)
It seems that pandas has this type of pivot:
df = df.groupby('id')['timestamp'].agg(['min','max']).reset_index()
print(df)
id min max
0 1 2017-09-17 10:09:01 2017-09-17 10:53:07
1 2 2017-09-12 21:59:40 2017-10-02 01:13:15
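As a self-contained sketch of the groupby approach above, the snippet below rebuilds the sample data from the question and uses named aggregation so the columns come out in the `max`, `min` order the question asks for. It assumes the `timestamp` column may be stored as strings, so it converts it with `pd.to_datetime` first, and it assumes a pandas version that supports named aggregation (0.25 or later). The variable name `result` is only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [1, 2, 1, 1, 2],
    "timestamp": [
        "2017-09-17 10:09:01",
        "2017-10-02 01:13:15",
        "2017-09-17 10:53:07",
        "2017-09-17 10:52:18",
        "2017-09-12 21:59:40",
    ],
})

# Assumption: the column may hold strings, so parse it into datetimes
# before comparing values.
df["timestamp"] = pd.to_datetime(df["timestamp"])

# Named aggregation: each keyword is an output column name and each
# value is the aggregation applied to the grouped timestamps.
result = (
    df.groupby("id")["timestamp"]
      .agg(max="max", min="min")
      .reset_index()
)
print(result)
```

Passing a list such as `['min', 'max']` works just as well; the keyword form only makes the output column names and their order explicit.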
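Or modify your solution slightly so that it sorts only once (should be faster):
import pandas as pd

# Sort once, then reuse the sorted frame for both lookups.
data = df.sort_values('timestamp')
# The last row per id is the maximum; the first row per id is the minimum.
data1 = data.drop_duplicates(['id'], keep='last').set_index('id')
data2 = data.drop_duplicates(['id'], keep='first').set_index('id')
df = pd.concat([data1['timestamp'], data2['timestamp']], keys=('max', 'min'), axis=1)
print(df)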
max min
1 2017-09-17 10:53:07 2017-09-17 10:09:01
2 2017-10-02 01:13:15 2017-09-12 21:59:40
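To check that the two approaches agree, a minimal sketch like the following can be run on the question's sample data. The names `by_groupby` and `by_sorting` are hypothetical, and the conversion with `pd.to_datetime` is an assumption about how the column is stored. The sketch compares results only; it does not measure speed.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [1, 2, 1, 1, 2],
    "timestamp": [
        "2017-09-17 10:09:01",
        "2017-10-02 01:13:15",
        "2017-09-17 10:53:07",
        "2017-09-17 10:52:18",
        "2017-09-12 21:59:40",
    ],
})
df["timestamp"] = pd.to_datetime(df["timestamp"])

# Approach 1: aggregate each group directly.
by_groupby = df.groupby("id")["timestamp"].agg(["max", "min"])

# Approach 2: sort once, then keep the last and first row per id.
ordered = df.sort_values("timestamp")
last = ordered.drop_duplicates("id", keep="last").set_index("id")["timestamp"]
first = ordered.drop_duplicates("id", keep="first").set_index("id")["timestamp"]
by_sorting = pd.concat([last, first], keys=("max", "min"), axis=1).sort_index()

# Compare the two results cell by cell.
print(by_groupby.equals(by_sorting))
```

Which version is faster depends on the size of the data and the number of groups, so it is worth timing both on the real dataset before choosing.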