Finding the minimum and maximum values of a date column in Pandas

But executing the below-mentioned code is giving me the wrong output.
print(customer_final['tran_date'].max())
print(customer_final['tran_date'].min())
2014-12-02 00:00:00
2011-01-02 00:00:00
Any help would be greatly appreciated.
Edit: posting the raw data.
transaction_id  cust_id tran_date   prod_subcat_code    prod_cat_code   Qty Rate    Tax total_amt   Store_type
0   80712190438 270351  28-02-2014  1   1   -5  -772    405.300 -4265.300   e-Shop
1   29258453508 270384  27-02-2014  5   3   -5  -1497   785.925 -8270.925   e-Shop
2   51750724947 273420  24-02-2014  6   5   -2  -791    166.110 -1748.110   TeleShop
3   93274880719 271509  24-02-2014  11  6   -3  -1363   429.345 -4518.345   e-Shop
4   51750724947 273420  23-02-2014  6   5   -2  -791    166.110 -1748.110   TeleShop
... ... ... ... ... ... ... ... ... ... ...
23048   94340757522 274550  25-01-2011  12  5   1   1264    132.720 1396.720    e-Shop
23049   89780862956 270022  25-01-2011  4   1   1   677 71.085  748.085 e-Shop
23050   85115299378 271020  25-01-2011  2   6   4   1052    441.840 4649.840    MBR
23051   72870271171 270911  25-01-2011  11  5   3   1142    359.730 3785.730    TeleShop
23052   77960931771 271961  25-01-2011  11  5   1   447 46.935  493.935 TeleShop
Edit 2: data types of all columns in the DataFrame.
transaction_id               int64
cust_id                      int64
tran_date           datetime64[ns]
prod_subcat_code             int64
prod_cat_code                int64
Qty                          int64
Rate                         int64
Tax                        float64
total_amt                  float64
Store_type                  object
Unnamed: 10                 object
dtype: object
5 comments
thebernardlim:
Have you tried this? stackoverflow.com/questions/23178129/...
coco18:
Is your column of datetime type?
gm-123:
Yes sir! But no luck.
coco18:
Try getting customer_final['tran_date'].max().dt.day at the max position to see whether your data type is correct.
gm-123:
@coc018 Yes, its dtype is datetime64[ns]
python
pandas
gm-123 posted on 2020-03-08
2 answers
Valdi_Bo posted on 2020-03-09
Accepted
0 upvotes

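Apparently, the dates in your input file are formatted in
different ways.
One of your comments contains Timestamp('2014-12-02 00:00:00'),
so I see that you have %Y-%m-%d formatting (probably in most cases).
But in another place you wrote time data '12/2/2014', so at least
in some rows you have %d/%m/%Y formatting.
Put your input in order. You cannot have dates in two
different formats.
I performed the following experiment:
As the source data, I used part of your original data (2 rows from the head and 2 from the tail),
plus an additional row (the 3rd one) with a different date format,
stored as a string variable: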
txt = '''  transaction_id cust_id tran_date prod_subcat_code prod_cat_code Qty Rate Tax total_amt Store_type
3       93274880719 271509  24-02-2014  11  6   -3  -1363   429.345 -4518.345   e-Shop
4       51750724947 273420  23-02-2014  6   5   -2  -791    166.110 -1748.110   TeleShop
40      51750724947 273420  12/2/2014   6   5   -2  -791    166.110 -1748.110   TeleShop
23048   94340757522 274550  25-01-2011  12  5   1   1264    132.720 1396.720    e-Shop
23049   89780862956 270022  25-01-2011  4   1   1   677     71.085  748.085     e-Shop'''
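Note that the initial line starts with some spaces, to provide
an empty column name for the index column.
Then I defined the following date-parsing function (import re is required),
which will be used shortly: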
def dPars(txt):
    if re.match(r'\d{2}-\d{1,2}-\d{4}', txt):
        return pd.to_datetime(txt, format='%d-%m-%Y')
    if re.match(r'\d{2}/\d{1,2}/\d{4}', txt):
        return pd.to_datetime(txt, format='%d/%m/%Y')
    return txt
I read the above content using this date-parsing function:
customer_final = pd.read_csv(io.StringIO(txt), delim_whitespace=True,
    index_col=0, parse_dates=['tran_date'], date_parser=dPars)
I printed the tran_date column - print(customer_final.tran_date) - getting:
3       2014-02-24
4       2014-02-23
40      2014-02-12
23048   2011-01-25
23049   2011-01-25
Name: tran_date, dtype: datetime64[ns]
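So all dates were parsed as they should be.
I printed the min/max dates - print(customer_final.tran_date.min(), customer_final.tran_date.max()) - and got the proper result: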
2011-01-25 00:00:00 2014-02-24 00:00:00
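The date_parser and delim_whitespace arguments used in the experiment are deprecated in recent pandas releases (2.x). As a minimal sketch of the same idea that avoids them, the mixed-format strings can be converted after loading. The helper name parse_day_first and the hand-built Series (standing in for the tran_date column of the sample rows) are assumptions made for illustration, not part of the original answer:

```python
import re
import pandas as pd

def parse_day_first(text):
    """Hypothetical helper: parse day-first dates written with '-' or '/'."""
    if re.match(r'\d{1,2}-\d{1,2}-\d{4}$', text):
        return pd.to_datetime(text, format='%d-%m-%Y')
    if re.match(r'\d{1,2}/\d{1,2}/\d{4}$', text):
        return pd.to_datetime(text, format='%d/%m/%Y')
    return pd.NaT  # unrecognised layout: mark as missing instead of guessing

# Raw date strings of the five sample rows, kept as plain text (assumed stand-in
# for reading the column without parse_dates)
raw = pd.Series(
    ['24-02-2014', '23-02-2014', '12/2/2014', '25-01-2011', '25-01-2011'],
    index=[3, 4, 40, 23048, 23049],
    name='tran_date',
)

# Convert element by element, then make sure the result is a datetime column
parsed = pd.to_datetime(raw.map(parse_day_first))

print(parsed.min(), parsed.max())
```

With a real file, the same helper could be applied to the column after a plain pd.read_csv call that leaves tran_date as text.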
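Maybe you should base your code on my experiment (in your code, replace
io.StringIO(txt) with your input file name).
Note also that if you have some input rows formatted like 12/2/2014, then
12 is the month number and 2 is the day number (US date format),
whereas other rows have the day number first.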
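To make the ambiguity concrete, here is a minimal sketch that parses the same string under both readings. The variable names are chosen for illustration only:

```python
import pandas as pd

text = '12/2/2014'

# Day-first reading: 12 February 2014
day_first = pd.to_datetime(text, format='%d/%m/%Y')

# Month-first (US) reading: 2 December 2014
month_first = pd.to_datetime(text, format='%m/%d/%Y')

print(day_first.date(), month_first.date())  # 2014-02-12 2014-12-02
```

The two readings are almost ten months apart, so a column that mixes them can produce a minimum or maximum that looks wrong. The month-first result here happens to coincide with the maximum printed in the question, although that alone does not prove this is what happened with the asker's file.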
    
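gm-123:
I have tried different date formats to check whether there is any problem with the dataset.
Parfait:
If the original field is already a datetime, as the OP comments, then formatting cannot be the problem, since that type does not allow mixed types. Most likely the OP has not sorted the data.
Parfait posted on 2020-03-09
0 upvotes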

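Fundamentally, you have two issues: 1) viewing abbreviated data and 2) viewing unsorted data.
You claim: "We can clearly see that in the dataset we have data between 2011-01-25 and 2014-02-28." However, Pandas is abbreviating your unsorted data, omitting many rows of your 23,000-row data frame, as denoted by the ellipsis (...). Therefore, the dates you cite from this manual check come only from the head and tail of unordered data and will not match the min and max values.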
customer_final['tran_date']
# 0       2014-02-28       # <---- HEAD OF UNSORTED DATA
# 1       2014-02-27
# 2       2014-02-24
# 3       2014-02-24
# 4       2014-02-23
#            ...           # <---- OMITTED VALUES OF UNSORTED DATA 
# 23048   2011-01-25
# 23049   2011-01-25
# 23050   2011-01-25
# 23051   2011-01-25
# 23052   2011-01-25       # <---- TAIL OF UNSORTED DATA
You can use pd.set_option('display.max_rows', None) to show the omitted rows, but you may overwhelm yourself by displaying 23,000+ unsorted values.
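As a minimal sketch of how that display setting affects the printed output, the following uses pd.option_context to change it only temporarily. The 100-row Series is a hypothetical stand-in for the real column:

```python
import pandas as pd

# Hypothetical stand-in for a long date column
dates = pd.Series(pd.date_range('2011-01-01', periods=100, freq='D'),
                  name='tran_date')

# With a small row limit, the printed form is abbreviated with an ellipsis
with pd.option_context('display.max_rows', 10):
    short_text = repr(dates)

# With the limit removed, every row is printed
with pd.option_context('display.max_rows', None):
    full_text = repr(dates)

print(len(short_text.splitlines()), len(full_text.splitlines()))
```

Using a context manager rather than pd.set_option keeps the change local, so later output is abbreviated again.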
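Therefore, those min and max values are not wrong. To double check, actually sort your data and then print the column, or its head and tail. Once you do, the aggregate figures should match accordingly.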
# SORT DATA FRAME IN DESCENDING ORDER BY tran_date
customer_final = customer_final.sort_values(by='tran_date', ascending = False)
# VIEW ALL DATA (ABBREVIATED UNLESS YOU CHANGE SETTING)
customer_final['tran_date']
# VIEW FIRST VALUES (DEFAULT TO 5)
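As a self-contained illustration of this check, here is a minimal sketch using a small, hypothetical unsorted Series (not the asker's data). It shows that the first and last rows of unsorted data need not equal the max and min, while the ends of the sorted data do:

```python
import pandas as pd

# Hypothetical unsorted dates, for illustration only
dates = pd.Series(
    pd.to_datetime(['2014-02-28', '2011-01-02', '2014-12-02',
                    '2012-06-15', '2011-01-25']),
    name='tran_date',
)

# First and last rows as they appear in the unsorted data
first_unsorted = dates.iloc[0]
last_unsorted = dates.iloc[-1]

# Sort in descending order, as in the snippet above
sorted_desc = dates.sort_values(ascending=False)

print('unsorted ends:', first_unsorted.date(), last_unsorted.date())
print('sorted ends:  ', sorted_desc.iloc[0].date(), sorted_desc.iloc[-1].date())
print('max / min:    ', dates.max().date(), dates.min().date())
```

After sorting in descending order, the first value equals max() and the last value equals min().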