pandas - Find the nearest data about one minute ago in Python

I would like to find the nearest data point around one minute before the current data point in a for-loop.
Dataframe looks like this:
    Timestamp               Value
36  2000-11-08 23:30:40.370 45.5
47  2000-11-13 04:52:29.410 44.5
67  2000-12-01 22:17:50.300 42.5
129 2000-11-24 00:57:11.950 43.0
176 2000-12-03 01:40:16.250 42.0
246 2000-11-12 07:32:54.000 43.5
281 2000-11-30 21:13:07.630 45.5
335 2000-11-30 20:43:11.050 43.5
The for-loop looks like this:
for i in range(1, len(df.index)-1):
    if some_condition: 
        current_data_time = df.loc[i, "Timestamp"] 
        one_min_ago = df.loc[i, 'Timestamp'] - datetime.timedelta(minutes = 1)
        nearest_data_one_minute_ago = df.loc[df["Timestamp"] == one_min_ago, "Value"]
How do I find nearest_data_one_minute_ago if there is no value at exactly one_min_ago?
Edit:
Expected output:
    Timestamp               Value   nearest_data_one_minute_ago
36  2000-11-08 23:30:40.370 45.5    NaN
47  2000-11-13 04:52:29.410 44.5    (should be `Value` at `2000-11-13 04:51:29.410`, if there isn't, find the nearest, before or after doesn't matter)
67  2000-12-01 22:17:50.300 42.5    (should be `Value` at `2000-12-01 22:16:50.300`, if there isn't, find the nearest, ...)
129 2000-11-24 00:57:11.950 43.0    (should be `Value` at `2000-11-24 00:56:11.950`, if there isn't, find the nearest, ...)
176 2000-12-03 01:40:16.250 42.0    (should be `Value` at `2000-12-03 01:39:16.250`, if there isn't, find the nearest, ...)
246 2000-11-12 07:32:54.000 43.5    (should be `Value` at `2000-11-12 07:31:54.000`, if there isn't, find the nearest, ...)
281 2000-11-30 21:13:07.630 45.5    (should be `Value` at `2000-11-30 21:12:07.630`, if there isn't, find the nearest, ...)
335 2000-11-30 20:43:11.050 43.5    (should be `Value` at `2000-11-30 20:42:11.050`, if there isn't, find the nearest, ...)
Please note that the sample data only illustrates the data format; the real dataset has more rows at smaller time intervals.
                First off, why a for-loop? What is the input used to check the nearest one minute? Are you looking to get the nearest one-minute value for each row in the dataframe? You need to be clearer for us to take this on. Always post the expected output so we know what you are looking for.
– Joe Ferndz
                Feb 8, 2021 at 5:34
                Subtract the time you want to search for from every timestamp and sort by the absolute value of the difference. The closest time will have the smallest difference.
– Iain Shelvington
                Feb 8, 2021 at 5:38
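A minimal sketch of the suggestion in the comment above, for a single lookup. The sample rows and the variable names (current_label, others, distance) are hypothetical and not from the question. It uses idxmin on the absolute difference instead of a full sort, and it drops the current row first so that a row cannot be matched to itself when nothing else is closer.

```python
import pandas as pd

# Hypothetical sample data, denser than the sample in the question.
df = pd.DataFrame({
    "Timestamp": pd.to_datetime([
        "2000-11-13 04:52:00.410",
        "2000-11-13 04:52:15.410",
        "2000-11-13 04:52:30.410",
        "2000-11-13 04:53:05.410",
    ]),
    "Value": [31.5, 32.5, 33.5, 35.5],
})

current_label = 3  # row whose "one minute ago" neighbour we want
one_min_ago = df.loc[current_label, "Timestamp"] - pd.Timedelta(minutes=1)

# Absolute distance of every other row from the target time.
others = df.drop(index=current_label)
distance = (others["Timestamp"] - one_min_ago).abs()

nearest_label = distance.idxmin()  # label of the closest row
nearest_value = df.loc[nearest_label, "Value"]
print(nearest_label, nearest_value)
```

Each lookup scans the whole column, so calling this inside a loop over a large dataframe is likely to be slow.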
                Hi @JoeFerndz, I need a for-loop because I have several other operations to apply to the data, and a for-loop gives me that flexibility. Besides, the dataset is large, and a for-loop lets me scan through the data once and apply all the functions I need. I hope that makes sense.
– nilsinelabore
                Feb 8, 2021 at 22:03
If you need to match each row to the closest earlier row within 1 minute, you can use merge_asof on the DataFrame with itself, with allow_exact_matches=False to prevent rows from matching themselves and tolerance to limit matches to 1 minute:
print (df)
                   Timestamp  Value
36   2000-11-08 23:30:40.370   45.5
47   2000-11-08 23:30:29.410   44.5  <- data changed to produce a match
67   2000-12-01 22:17:50.300   42.5
129  2000-11-24 00:57:11.950   43.0
176  2000-12-03 01:40:16.250   42.0
246  2000-11-12 07:32:54.000   43.5
281  2000-11-30 21:13:07.630   45.5
335  2000-11-30 20:43:11.050   43.5
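# merge_asof needs datetime keys sorted in ascending order
df['Timestamp'] = pd.to_datetime(df['Timestamp'])
df = df.sort_values('Timestamp')
# default direction='backward': match the closest earlier row within 1 minute
df = pd.merge_asof(df, df,
                   on='Timestamp',
                   allow_exact_matches=False,
                   tolerance=pd.Timedelta(1, 'min'))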
print (df)
                Timestamp  Value_x  Value_y
0 2000-11-08 23:30:29.410     44.5      NaN
1 2000-11-08 23:30:40.370     45.5     44.5
2 2000-11-12 07:32:54.000     43.5      NaN
3 2000-11-24 00:57:11.950     43.0      NaN
4 2000-11-30 20:43:11.050     43.5      NaN
5 2000-11-30 21:13:07.630     45.5      NaN
6 2000-12-01 22:17:50.300     42.5      NaN
7 2000-12-03 01:40:16.250     42.0      NaN 
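The merge above matches the closest earlier row within 1 minute. If the goal is instead the row closest to the time point exactly 1 minute earlier (before or after, as the question asks), one possible variation is to merge on a shifted key with direction='nearest'. This is a sketch under assumptions, not part of the answer above: the sample rows, the Target, Nearest_Time and Nearest_Value column names, and the 20-second tolerance are all hypothetical choices.

```python
import pandas as pd

# Hypothetical sample data with small time intervals.
df = pd.DataFrame({
    "Timestamp": [
        "2000-11-08 23:30:40.370",
        "2000-11-13 04:52:00.410",
        "2000-11-13 04:52:15.410",
        "2000-11-13 04:52:30.410",
        "2000-11-13 04:52:45.410",
        "2000-11-13 04:53:00.410",
        "2000-11-13 04:53:20.410",
    ],
    "Value": [45.5, 31.5, 32.5, 33.5, 34.5, 35.5, 36.5],
})
df["Timestamp"] = pd.to_datetime(df["Timestamp"])
df = df.sort_values("Timestamp").reset_index(drop=True)

# Left side: every row plus the time point one minute before it.
left = df.assign(Target=df["Timestamp"] - pd.Timedelta(minutes=1))
# Right side: the same rows under new names, used as lookup candidates.
right = df.rename(columns={"Timestamp": "Nearest_Time",
                           "Value": "Nearest_Value"})

# For each Target, take the closest candidate in either direction.
# A tolerance below 60 s keeps a row from matching itself.
result = pd.merge_asof(left, right,
                       left_on="Target", right_on="Nearest_Time",
                       direction="nearest",
                       tolerance=pd.Timedelta(seconds=20))
print(result[["Timestamp", "Value", "Nearest_Time", "Nearest_Value"]])
```

Rows with no candidate within the tolerance of their target time get NaN in Nearest_Value. Both frames must be sorted by their key columns, which holds here because Target is just Timestamp shifted by a constant.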
                Hi jezrael, thanks for your answer. Is there a for-loop solution to do this? As I have other operations which I need to apply to the function, a for-loop would allow for that.
– nilsinelabore
                Feb 8, 2021 at 22:00
                @nilsinelabore - Hmm, unfortunately I have no idea how to use merge_asof in a loop (with good performance). By the way, what other operations do you need in the loop? Can you be more specific? If you need a loop, why not run merge_asof first and then loop over the result? I don't understand the constraint.
– jezrael
                Feb 9, 2021 at 5:39
                So there are multiple other things I'd like to calculate. One of them is that I only need to find the Value nearest to 1 minute ago if some_condition in the for-loop qualifies the current data point as a "peak", meaning it is higher than both the value before and the value after it. Besides, after finding that data point, say point A from 1 min ago, I also need to find the values within +/-2 min of point A. That's why I need the loop.
– nilsinelabore
                Feb 9, 2021 at 5:47
                @nilsinelabore - OK, unfortunately I have no idea how to do a fast merge_asof in a loop, so I can't help you with that.
– jezrael
                Feb 9, 2021 at 5:48
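For the loop-based requirement described in the comments above (peak check, nearest point about 1 minute earlier, then the values within +/-2 min of that point), one possible sketch is to keep the for-loop but do the lookups on a sorted DatetimeIndex. This is an illustration under assumptions rather than a tested solution for the real dataset: the sample rows are made up, timestamps are assumed to be unique, and the names times, peaks, nearest_value and window_values are hypothetical.

```python
import pandas as pd

# Hypothetical sample data; timestamps are assumed to be unique.
df = pd.DataFrame({
    "Timestamp": pd.to_datetime([
        "2000-11-13 10:00:00",
        "2000-11-13 10:00:20",
        "2000-11-13 10:01:15",
        "2000-11-13 10:01:50",
        "2000-11-13 10:06:00",
    ]),
    "Value": [1.0, 2.0, 5.0, 3.0, 4.0],
})
df = df.sort_values("Timestamp").reset_index(drop=True)
times = pd.DatetimeIndex(df["Timestamp"])

one_min = pd.Timedelta(minutes=1)
two_min = pd.Timedelta(minutes=2)
peaks = []

for i in range(1, len(df) - 1):
    value = df.at[i, "Value"]
    # some_condition: the point is higher than both of its neighbours
    if not (value > df.at[i - 1, "Value"] and value > df.at[i + 1, "Value"]):
        continue

    # Position of the row closest to "one minute ago" (point A).
    target = times[i] - one_min
    a = times.get_indexer([target], method="nearest")[0]

    # Rows within +/- 2 minutes of point A, found by binary search.
    start = times.searchsorted(times[a] - two_min, side="left")
    stop = times.searchsorted(times[a] + two_min, side="right")

    peaks.append({
        "peak_time": times[i],
        "nearest_value": df.at[a, "Value"],
        "window_values": df["Value"].iloc[start:stop].tolist(),
    })

print(peaks)
```

Note that if no other row is closer than 60 seconds to the target time, the nearest row is the peak itself, so a distance check may be needed depending on what should count as a match.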
I know you are looking for a for-loop solution. Here's something you can do without a for loop.
import pandas as pd
import numpy as np
c = ['Timestamp','Value']
d = [['2000-11-08 23:30:40.370', 45.5],
     ['2000-11-13 04:52:00.410', 31.5], #modified
     ['2000-11-13 04:52:15.410', 32.5], #modified 15 secs from 2000-11-13 04:52:00.410
     ['2000-11-13 04:52:30.410', 33.5], #modified 30 secs from 2000-11-13 04:52:00.410
     ['2000-11-13 04:52:45.410', 34.5], #modified 45 secs from 2000-11-13 04:52:00.410
     ['2000-11-13 04:53:00.410', 35.5], #modified  1  min from 2000-11-13 04:52:00.410
     ['2000-11-13 04:53:15.410', 36.5], #modified new set within 1 min of 2000-11-13 04:52:15.410
     ['2000-12-01 22:17:50.300', 42.5],
     ['2000-11-24 00:57:11.950', 43.0],
     ['2000-12-03 01:40:16.250', 42.0],
     ['2000-11-12 07:32:54.000', 43.5],
     ['2000-11-30 21:13:07.630', 45.5],
     ['2000-11-30 20:43:11.050', 43.5]]
df = pd.DataFrame(d,columns=c)
#convert timestamp to datetime format
df['Timestamp'] = pd.to_datetime(df['Timestamp'])
#sort dataframe by Timestamp
df = df.sort_values('Timestamp')
#gap in minutes to the next and previous rows (Timestamp is already datetime)
df['Diff_Next'] = (df['Timestamp'].shift(-1) - df['Timestamp']).dt.total_seconds() / 60
df['Diff_Prev'] = df['Diff_Next'].shift(1)
#assign the next value first, then the previous value, so the previous value wins
df.loc[df['Diff_Next'] <= 1,'Nearest_Value'] = df['Value'].shift(-1)
df.loc[df['Diff_Prev'] <= 1,'Nearest_Value'] = df['Value'].shift(1)
print (df)
This checks whether the previous timestamp is within 1 min. If so, it uses that value; otherwise it looks at the next timestamp and uses its value if that one is within 1 min. If neither is within 1 min, the result is NaN.
The next-row assignment has to run before the previous-row assignment, so that the previous value overwrites it and takes precedence.
The output of the modified dataframe will be:
                 Timestamp  Value     Diff_Next     Diff_Prev   Nearest_Value
0  2000-11-08 23:30:40.370   45.5   4802.227167           NaN             NaN
10 2000-11-12 07:32:54.000   43.5   1279.106833   4802.227167             NaN
1  2000-11-13 04:52:00.410   31.5      0.250000   1279.106833            32.5
2  2000-11-13 04:52:15.410   32.5      0.250000      0.250000            31.5
3  2000-11-13 04:52:30.410   33.5      0.250000      0.250000            32.5
4  2000-11-13 04:52:45.410   34.5      0.250000      0.250000            33.5
5  2000-11-13 04:53:00.410   35.5      0.250000      0.250000            34.5
6  2000-11-13 04:53:15.410   36.5  15603.942333      0.250000            35.5
8  2000-11-24 00:57:11.950   43.0   9825.985000  15603.942333             NaN
12 2000-11-30 20:43:11.050   43.5     29.943000   9825.985000             NaN
11 2000-11-30 21:13:07.630   45.5   1504.711167     29.943000             NaN
7  2000-12-01 22:17:50.300   42.5   1642.432500   1504.711167             NaN
9  2000-12-03 01:40:16.250   42.0           NaN   1642.432500             NaN
Note that the values in Diff_Prev and Diff_Next are in minutes, expressed as decimal fractions, so 0.25 is 15 seconds (a quarter of a minute).
If you still want to iterate through the dataframe using loops, then use this solution. I strongly recommend NOT doing this.
#i-1 and i+1 are index labels: reset the index after sorting if labels are not in time order
for i,row in df.iterrows():
    if row['Diff_Prev'] <= 1:
        df.loc[i,'Nearest_Value'] = df.loc[i-1,'Value']
    elif row['Diff_Next'] <= 1:
        df.loc[i,'Nearest_Value'] = df.loc[i+1,'Value']
You will get the same results. The good part of this loop is that you don't have to special-case the first and last rows: the NaN values in Diff_Prev and Diff_Next fail both comparisons, so neither branch runs for them.
Let me know if you are looking for the value that's closest to 1 min window between current row and any previous rows.
For example if the data is as shown below,
 ['2000-11-13 04:52:00.410', 44.5], 
 ['2000-11-13 04:52:15.410', 44.5], 
 ['2000-11-13 04:52:30.410', 44.5], 
 ['2000-11-13 04:52:45.410', 44.5], 
 ['2000-11-13 04:53:00.410', 44.5], 
 ['2000-11-13 04:53:15.410', 44.5], 
Do you want 04:52:45 to be the closest 1 min difference to 04:52:00 and ignore other values like 04:52:15 and 04:52:30 ?
                Thanks for the solution Joe. Yes I need the data closest to the 1 min gap time point, rather than any data that is within 1min time difference.
– nilsinelabore
                Feb 9, 2021 at 1:57