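I have a table with 6 columns. I want to reshape it based on the "record" column, grouping rows by "ID1" and "ID2". The "record" field is either "IN" or "OUT", and the rows are sorted by date.
Here is a sample of my input: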
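Judging from the sample output further down in the question, the rule seems to be this. Within each ("ID1", "ID2") group ordered by date, every IN row becomes one output row. If the row immediately after it is an OUT row, that row's date, type and name are copied into date1, type1 and name1; otherwise those columns stay empty. The plain-Python sketch below only makes that reading concrete. The pairing rule, the hypothetical function name `pair_in_out`, and the choice to drop an OUT row with no preceding IN are assumptions, not something the question states.

```python
from itertools import groupby
from operator import itemgetter


def pair_in_out(rows):
    """Hypothetical helper: pair each IN row with the OUT row that directly
    follows it in the same (ID1, ID2) group, ordered by date.

    Each row is a tuple (ID1, ID2, date, type, name, record).
    Returns tuples (ID1, ID2, date, type, name, date1, type1, name1).
    """
    result = []
    # Sort by ID1, ID2, then date (ISO-8601 strings sort chronologically).
    ordered = sorted(rows, key=itemgetter(0, 1, 2))
    for _, group in groupby(ordered, key=itemgetter(0, 1)):
        group = list(group)
        for i, (id1, id2, date, typ, name, record) in enumerate(group):
            if record != "IN":
                # Assumption: OUT rows are only emitted as part of an IN row.
                continue
            nxt = group[i + 1] if i + 1 < len(group) else None
            if nxt is not None and nxt[5] == "OUT":
                result.append((id1, id2, date, typ, name, nxt[2], nxt[3], nxt[4]))
            else:
                result.append((id1, id2, date, typ, name, None, None, None))
    return result


data = [("ACC.PXP", "7246", "2020-02-24T14:49:00", None, None, "IN"),
        ("ACC.PXP", "7246", "2021-03-09T08:20:00", "Hospital", "Foundation", "OUT"),
        ("ACC.PXP", "7246", "2021-04-05T17:17:00", "Hospital", "Foundation", "IN")]

for row in pair_in_out(data):
    print(row)
```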
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()

data = [("ACC.PXP","7246","2020-02-24T14:49:00",None,None,"IN"),
        ("ACC.PXP","7246","2021-03-09T08:20:00","Hospital","Foundation","OUT"),
        ("ACC.PXP","7246","2021-04-05T17:17:00","Hospital","Foundation","IN")]
df = spark.createDataFrame(data=data,schema=['ID1','ID2','date','type','name','record'])
df.show(truncate=False)
+-------+----+-------------------+--------+----------+------+
|ID1    |ID2 |date               |type    |name      |record|
+-------+----+-------------------+--------+----------+------+
|ACC.PXP|7246|2020-02-24T14:49:00|null    |null      |IN    |
|ACC.PXP|7246|2021-03-09T08:20:00|Hospital|Foundation|OUT   |
|ACC.PXP|7246|2021-04-05T17:17:00|Hospital|Foundation|IN    |
+-------+----+-------------------+--------+----------+------+
And here is the result I want:
data2 = [("ACC.PXP","7246","2020-02-24T14:49:00",None,None,"2021-03-09T08:20:00","Hospital","Foundation"),
         ("ACC.PXP","7246","2021-04-05T17:17:00","Hospital","Foundation",None,None,None)]
df2 = spark.createDataFrame(data=data2,schema=['ID1','ID2','date','type','name','date1','type1','name1'])
df2.show(truncate=False)
+-------+----+-------------------+--------+----------+-------------------+--------+----------+
|ID1    |ID2 |date               |type    |name      |date1              |type1   |name1     |
+-------+----+-------------------+--------+----------+-------------------+--------+----------+
|ACC.PXP|7246|2020-02-24T14:49:00|null    |null      |2021-03-09T08:20:00|Hospital|Foundation|
|ACC.PXP|7246|2021-04-05T17:17:00|Hospital|Foundation|null               |null    |null      |
+-------+----+-------------------+--------+----------+-------------------+--------+----------+