I would like to find the nearest data point around one minute before the current data point in a for-loop.
Dataframe looks like this:
Timestamp Value
36 2000-11-08 23:30:40.370 45.5
47 2000-11-13 04:52:29.410 44.5
67 2000-12-01 22:17:50.300 42.5
129 2000-11-24 00:57:11.950 43.0
176 2000-12-03 01:40:16.250 42.0
246 2000-11-12 07:32:54.000 43.5
281 2000-11-30 21:13:07.630 45.5
335 2000-11-30 20:43:11.050 43.5
The for-loop looks like this:
for i in range(1, len(df.index)-1):
    if some_condition:
        current_data_time = df.loc[i, "Timestamp"]
        one_min_ago = df.loc[i, 'Timestamp'] - datetime.timedelta(minutes = 1)
        nearest_data_one_minute_ago = df.loc[df["Timestamp"] == one_min_ago, "Value"]
How do I find nearest_data_one_minute_ago if there is no value at exactly one_min_ago?
Edit:
Expected output:
Timestamp Value nearest_data_one_minute_ago
36 2000-11-08 23:30:40.370 45.5 NaN
47 2000-11-13 04:52:29.410 44.5 (should be `Value` at `2000-11-13 04:51:29.410`, if there isn't, find the nearest, before or after doesn't matter)
67 2000-12-01 22:17:50.300 42.5 (should be `Value` at `2000-12-01 22:16:50.300`, if there isn't, find the nearest, ...)
129 2000-11-24 00:57:11.950 43.0 (should be `Value` at `2000-11-24 00:56:11.950`, if there isn't, find the nearest, ...)
176 2000-12-03 01:40:16.250 42.0 (should be `Value` at `2000-12-03 01:39:16.250`, if there isn't, find the nearest, ...)
246 2000-11-12 07:32:54.000 43.5 (should be `Value` at `2000-11-12 07:31:54.000`, if there isn't, find the nearest, ...)
281 2000-11-30 21:13:07.630 45.5 (should be `Value` at `2000-11-30 21:12:07.630`, if there isn't, find the nearest, ...)
335 2000-11-30 20:43:11.050 43.5 (should be `Value` at `2000-11-30 20:42:11.050`, if there isn't, find the nearest, ...)
Please note that the sample data is just an example for data format where in reality there is more data with smaller time intervals.
If you need to match each row to the closest earlier row within 1 minute, you can use merge_asof on the DataFrame with itself, passing allow_exact_matches=False to prevent a row from matching itself and tolerance to limit the match window to 1 minute:
print (df)
Timestamp Value
36 2000-11-08 23:30:40.370 45.5
47 2000-11-08 23:30:29.410 44.5 <- change data for match
67 2000-12-01 22:17:50.300 42.5
129 2000-11-24 00:57:11.950 43.0
176 2000-12-03 01:40:16.250 42.0
246 2000-11-12 07:32:54.000 43.5
281 2000-11-30 21:13:07.630 45.5
335 2000-11-30 20:43:11.050 43.5
df['Timestamp'] = pd.to_datetime(df['Timestamp'])
df = df.sort_values('Timestamp')
df = pd.merge_asof(df, df,
                   on='Timestamp',
                   allow_exact_matches=False,
                   tolerance=pd.Timedelta(1, 'min'))
print (df)
Timestamp Value_x Value_y
0 2000-11-08 23:30:29.410 44.5 NaN
1 2000-11-08 23:30:40.370 45.5 44.5
2 2000-11-12 07:32:54.000 43.5 NaN
3 2000-11-24 00:57:11.950 43.0 NaN
4 2000-11-30 20:43:11.050 43.5 NaN
5 2000-11-30 21:13:07.630 45.5 NaN
6 2000-12-01 22:17:50.300 42.5 NaN
7 2000-12-03 01:40:16.250 42.0 NaN
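Note that the call above relies on the default direction='backward', so it returns the most recent earlier row within one minute, not the row closest to exactly one minute earlier. If the latter is what you need, one possible sketch is to shift the lookup key back by one minute and pass direction='nearest'. The sample data, the target and matched_time column names, and the 30-second tolerance are assumptions made for illustration only.

```python
import pandas as pd

# Illustrative data (not from the question): four readings around 04:52.
df = pd.DataFrame({
    "Timestamp": pd.to_datetime([
        "2000-11-13 04:51:20.000",
        "2000-11-13 04:51:40.000",
        "2000-11-13 04:52:29.410",
        "2000-11-13 04:53:35.000",
    ]),
    "Value": [41.0, 42.0, 44.5, 46.0],
})
df = df.sort_values("Timestamp").reset_index(drop=True)

# Left side: each row is looked up at "its own time minus one minute".
left = df.assign(target=df["Timestamp"] - pd.Timedelta(minutes=1))

# Right side: the same readings, renamed so the columns do not collide.
right = df.rename(columns={"Timestamp": "matched_time",
                           "Value": "nearest_data_one_minute_ago"})

result = pd.merge_asof(
    left.sort_values("target"),          # merge_asof needs sorted keys
    right,
    left_on="target",
    right_on="matched_time",
    direction="nearest",                 # closest before OR after the target
    tolerance=pd.Timedelta(seconds=30),  # assumed cut-off; rows further away give NaN
)
print(result[["Timestamp", "Value", "matched_time", "nearest_data_one_minute_ago"]])
```

With a tolerance below one minute a row can never match itself, because its own timestamp is always exactly 60 seconds away from its target.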
I know you are looking for a for-loop solution. Here's something you can do without a for loop.
import pandas as pd
import numpy as np
c = ['Timestamp','Value']
d = [['2000-11-08 23:30:40.370', 45.5],
['2000-11-13 04:52:00.410', 31.5], #modified
['2000-11-13 04:52:15.410', 32.5], #modified 15 secs from 2000-11-13 04:52:00.410
['2000-11-13 04:52:30.410', 33.5], #modified 30 secs from 2000-11-13 04:52:00.410
['2000-11-13 04:52:45.410', 34.5], #modified 45 secs from 2000-11-13 04:52:00.410
['2000-11-13 04:53:00.410', 35.5], #modified 1 min from 2000-11-13 04:52:00.410
['2000-11-13 04:53:15.410', 36.5], #modified new set within 1 min of 2000-11-13 04:52:15.410
['2000-12-01 22:17:50.300', 42.5],
['2000-11-24 00:57:11.950', 43.0],
['2000-12-03 01:40:16.250', 42.0],
['2000-11-12 07:32:54.000', 43.5],
['2000-11-30 21:13:07.630', 45.5],
['2000-11-30 20:43:11.050', 43.5]]
df = pd.DataFrame(d,columns=c)
#convert timestamp to datetime format
df['Timestamp'] = pd.to_datetime(df['Timestamp'])
#sort dataframe by Timestamp
df = df.sort_values('Timestamp')
#gap to the next and to the previous row, in minutes
df['Diff_Next'] = (df['Timestamp'].shift(-1) - df['Timestamp']).dt.total_seconds() / 60
df['Diff_Prev'] = df['Diff_Next'].shift(1)
#assign the next row's value first, then let the previous row's value overwrite it
df.loc[df['Diff_Next'] <= 1,'Nearest_Value'] = df['Value'].shift(-1).fillna(0)
df.loc[df['Diff_Prev'] <= 1,'Nearest_Value'] = df['Value'].shift(1).fillna(0)
print (df)
This checks whether the previous timestamp is within 1 minute and, if so, uses that row's value. Otherwise it checks whether the next timestamp is within 1 minute and uses that value instead. If neither is within 1 minute, the result is NaN.
The assignment for the next row has to run before the one for the previous row, so that the previous value overwrites it and takes precedence.
The output of the modified dataframe will be:
Timestamp Value Diff_Next Diff_Prev Nearest_Value
0 2000-11-08 23:30:40.370 45.5 4802.227167 NaN NaN
10 2000-11-12 07:32:54.000 43.5 1279.106833 4802.227167 NaN
1 2000-11-13 04:52:00.410 31.5 0.250000 1279.106833 32.5
2 2000-11-13 04:52:15.410 32.5 0.250000 0.250000 31.5
3 2000-11-13 04:52:30.410 33.5 0.250000 0.250000 32.5
4 2000-11-13 04:52:45.410 34.5 0.250000 0.250000 33.5
5 2000-11-13 04:53:00.410 35.5 0.250000 0.250000 34.5
6 2000-11-13 04:53:15.410 36.5 15603.942333 0.250000 35.5
8 2000-11-24 00:57:11.950 43.0 9825.985000 15603.942333 NaN
12 2000-11-30 20:43:11.050 43.5 29.943000 9825.985000 NaN
11 2000-11-30 21:13:07.630 45.5 1504.711167 29.943000 NaN
7 2000-12-01 22:17:50.300 42.5 1642.432500 1504.711167 NaN
9 2000-12-03 01:40:16.250 42.0 NaN 1642.432500 NaN
Note that the values in Diff_Prev and Diff_Next are expressed in minutes as decimal fractions, so 0.25 is 15 seconds (a quarter of a minute).
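As a quick check of those units, here is a small sketch that computes the gap to the previous row both in fractional minutes and in seconds. The three timestamps are taken from the modified data above; the variable names are only illustrative.

```python
import pandas as pd

ts = pd.Series(pd.to_datetime([
    "2000-11-13 04:52:00.410",
    "2000-11-13 04:52:15.410",
    "2000-11-13 04:53:15.410",
]))

gap = ts.diff()                              # Timedelta to the previous row
gap_minutes = gap / pd.Timedelta(minutes=1)  # fractional minutes, as in Diff_Prev
gap_seconds = gap.dt.total_seconds()         # the same gap in seconds

print(gap_minutes.tolist())  # [nan, 0.25, 1.0]
print(gap_seconds.tolist())  # [nan, 15.0, 60.0]
```

Dividing by pd.Timedelta(minutes=1) is equivalent to the total_seconds() / 60 expression used above; either form can be compared against 1 to test for "within one minute".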
If you still want to iterate through the dataframe using loops, then use this solution. I strongly recommend NOT doing this.
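values = df['Value'].to_numpy()  #values in sorted (positional) order
#index labels are no longer consecutive after sorting, so neighbours are looked up by position
for pos, (i, row) in enumerate(df.iterrows()):
    if row['Diff_Prev'] <= 1:
        df.loc[i, 'Nearest_Value'] = values[pos - 1]
    elif row['Diff_Next'] <= 1:
        df.loc[i, 'Nearest_Value'] = values[pos + 1]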
You will get the same results. A nice property of this loop is that the first and last rows need no special handling: the NaN in Diff_Prev (first row) and in Diff_Next (last row) fails the <= 1 comparison, so the if and elif branches simply skip it.
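The reason this works is that any ordering comparison against NaN evaluates to False, both for plain floats and element-wise in pandas. A minimal sketch, with made-up gap values:

```python
import numpy as np
import pandas as pd

first_row_gap = np.nan        # what Diff_Prev holds for the first row
print(first_row_gap <= 1)     # False, so the branch is skipped

gaps = pd.Series([np.nan, 0.25, 5.0])  # illustrative gaps in minutes
mask = gaps <= 1
print(mask.tolist())          # [False, True, False]
```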
Let me know if you are looking for the value that's closest to 1 min window between current row and any previous rows.
For example if the data is as shown below,
['2000-11-13 04:52:00.410', 44.5],
['2000-11-13 04:52:15.410', 44.5],
['2000-11-13 04:52:30.410', 44.5],
['2000-11-13 04:52:45.410', 44.5],
['2000-11-13 04:53:00.410', 44.5],
['2000-11-13 04:53:15.410', 44.5],
Do you want 04:52:45 to be the closest 1 min difference to 04:52:00 and ignore other values like 04:52:15 and 04:52:30?
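If the intent is instead "the row whose timestamp is closest to exactly one minute before the current row", a lookup that can be called from inside a for-loop might look like the sketch below. nearest_value_at is a hypothetical helper, not a pandas function, and the 10-second max_gap and the sample values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

def nearest_value_at(times, values, target, max_gap):
    """Hypothetical helper: value whose timestamp is closest to target.

    times must be a sorted DatetimeIndex. Returns NaN when even the
    closest timestamp is more than max_gap away from target.
    """
    pos = times.searchsorted(target)  # first position with times[pos] >= target
    candidates = [p for p in (pos - 1, pos) if 0 <= p < len(times)]
    best = min(candidates, key=lambda p: abs(times[p] - target))
    if abs(times[best] - target) > max_gap:
        return np.nan
    return values[best]

df = pd.DataFrame({
    "Timestamp": pd.to_datetime([
        "2000-11-13 04:52:00.410", "2000-11-13 04:52:15.410",
        "2000-11-13 04:52:30.410", "2000-11-13 04:52:45.410",
        "2000-11-13 04:53:00.410", "2000-11-13 04:53:15.410",
    ]),
    "Value": [31.5, 32.5, 33.5, 34.5, 35.5, 36.5],
}).sort_values("Timestamp").reset_index(drop=True)

times = pd.DatetimeIndex(df["Timestamp"])
values = df["Value"].to_numpy()
df["nearest_data_one_minute_ago"] = np.nan

for i in range(len(df)):
    target = df.loc[i, "Timestamp"] - pd.Timedelta(minutes=1)
    df.loc[i, "nearest_data_one_minute_ago"] = nearest_value_at(
        times, values, target, max_gap=pd.Timedelta(seconds=10))

print(df)
```

With this data only the last two rows have a reading within 10 seconds of their one-minute-earlier target, so they get 31.5 and 32.5 and the other rows stay NaN.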