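07_pandas.DataFrame for-loop processing (iteration)

Iterating over a pandas.DataFrame directly with a for statement only yields its column names. To retrieve the data one column or one row at a time, use the iteration methods described below.
The following pandas.DataFrame is used as an example.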
import pandas as pd
df = pd.DataFrame({'age': [24, 42], 'state': ['NY', 'CA'], 'point': [64, 92]},
                  index=['Alice', 'Bob'])
print(df)
#        age state  point
# Alice   24    NY     64
# Bob     42    CA     92
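This article covers the following topics:
pandas.DataFrame in a plain for loop
Retrieving column by column
  DataFrame.iteritems()
Retrieving row by row
  DataFrame.iterrows()
  DataFrame.itertuples()
Retrieving the values of specific columns
Updating values in a loop

pandas.DataFrame in a plain for loop

Applying a for loop directly to a pandas.DataFrame yields the column names in order.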
for column_name in df:
    print(type(column_name))
    print(column_name)
    print('======\n')
# <class 'str'>
# age
# ======
# <class 'str'>
# state
# ======
# <class 'str'>
# point
# ======
This is because the for statement calls the DataFrame's __iter__() method, so the following loop is equivalent.
for column_name in df.__iter__():
    print(type(column_name))
    print(column_name)
    print('======\n')
# <class 'str'>
# age
# ======
# <class 'str'>
# state
# ======
# <class 'str'>
# point
# ======
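Since the plain loop only yields column names, reaching the individual values takes a second, nested loop. The following is a minimal sketch of that idea using the example frame; the `values` list is a hypothetical name introduced only to collect the results.

```python
import pandas as pd

df = pd.DataFrame({'age': [24, 42], 'state': ['NY', 'CA'], 'point': [64, 92]},
                  index=['Alice', 'Bob'])

# The outer loop yields column names; df[column_name] is that column as a Series.
values = []
for column_name in df:
    # The inner loop yields the values of the column in row order.
    for value in df[column_name]:
        values.append((column_name, value))

print(values)
```

This works, but the dedicated iteration methods described next are usually more convenient.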
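DataFrame.iteritems()

The iteritems() method yields (column name, Series) tuples, so each iteration gives you the column name and that column's data as a pandas.Series.
iteritems() was deprecated in pandas 1.5 and removed in pandas 2.0. items() returns the same tuples and is used in the code below.
From each pandas.Series, a row's value can be retrieved by index label, by position, or as an attribute.
for column_name, item in df.items():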
    print(type(column_name))
    print(column_name)
    print('~~~~~~')
    print(type(item))
    print(item)
    print('------')
    print(item['Alice'])
    print(item.iloc[0])  # by position; item[0] relies on a deprecated integer fallback
    print(item.Alice)
    print('======\n')
# <class 'str'>
# age
# ~~~~~~
# <class 'pandas.core.series.Series'>
# Alice    24
# Bob      42
# Name: age, dtype: int64
# ------
# 24
# 24
# 24
# ======
# <class 'str'>
# state
# ~~~~~~
# <class 'pandas.core.series.Series'>
# Alice    NY
# Bob      CA
# Name: state, dtype: object
# ------
# NY
# NY
# NY
# ======
# <class 'str'>
# point
# ~~~~~~
# <class 'pandas.core.series.Series'>
# Alice    64
# Bob      92
# Name: point, dtype: int64
# ------
# 64
# 64
# 64
# ======
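Two methods retrieve one row at a time: iterrows() and itertuples(). itertuples() is the faster of the two.
If only the values of specific columns are needed, selecting those columns and iterating over them directly in the for loop, as described later, is faster still.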
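The speed difference between these approaches can be checked with the standard timeit module. The following is a rough sketch rather than a rigorous benchmark: the frame `big`, its size, and the three helper functions are assumptions made only for this measurement, and the absolute timings depend on the machine and the pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical larger frame, used only to make the timing differences visible.
big = pd.DataFrame({'age': np.arange(2000), 'point': np.arange(2000) * 2})


def with_iterrows():
    # Each row is materialized as a Series, which is comparatively expensive.
    return sum(row['age'] + row['point'] for _, row in big.iterrows())


def with_itertuples():
    # Each row is a lightweight namedtuple.
    return sum(row.age + row.point for row in big.itertuples())


def with_zip():
    # Only the two required columns are iterated.
    return sum(a + p for a, p in zip(big['age'], big['point']))


for func in (with_iterrows, with_itertuples, with_zip):
    seconds = timeit.timeit(func, number=3)
    print(f'{func.__name__}: {seconds:.4f} s')
```

All three functions compute the same total, so only the elapsed time differs.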
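DataFrame.iterrows()

The iterrows() method yields (index, Series) tuples, so each iteration gives you the row label and that row's data as a pandas.Series.
From each pandas.Series, a column's value can be retrieved by column name, by position, or as an attribute. Because the columns here hold different types, each row Series has dtype object, as the output shows.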
for index, row in df.iterrows():
    print(type(index))
    print(index)
    print('~~~~~~')
    print(type(row))
    print(row)
    print('------')
    print(row['point'])
    print(row.iloc[2])  # by position; row[2] relies on a deprecated integer fallback
    print(row.point)
    print('======\n')
# <class 'str'>
# Alice
# ~~~~~~
# <class 'pandas.core.series.Series'>
# age      24
# state    NY
# point    64
# Name: Alice, dtype: object
# ------
# 64
# 64
# 64
# ======
# <class 'str'>
# Bob
# ~~~~~~
# <class 'pandas.core.series.Series'>
# age      42
# state    CA
# point    92
# Name: Bob, dtype: object
# ------
# 92
# 92
# 92
# ======
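One consequence of receiving each row as a single pandas.Series is that the column dtypes are not preserved across the row. The sketch below follows the example given in the pandas documentation for iterrows(); the frame `mixed` is a separate illustration and not part of the running example.

```python
import pandas as pd

# One integer column and one float column.
mixed = pd.DataFrame([[1, 1.5]], columns=['int', 'float'])

# Take the first (index, Series) pair produced by iterrows().
index, row = next(mixed.iterrows())

print(mixed['int'].dtype)  # int64: the column itself holds integers
print(row['int'].dtype)    # float64: the row Series was upcast to fit both values
```

If the original dtypes matter, itertuples() is generally the safer choice because it returns the values without packing them into one Series.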
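DataFrame.itertuples()

The itertuples() method yields one tuple per row, containing the index label followed by the row's values. The first element of the tuple is the index label.
By default it returns a namedtuple named Pandas, so each element can be accessed by position or by field name.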
for row in df.itertuples():
    print(type(row))
    print(row)
    print('------')
    print(row[3])
    print(row.point)
    print('======\n')
# <class 'pandas.core.frame.Pandas'>
# Pandas(Index='Alice', age=24, state='NY', point=64)
# ------
# 64
# 64
# ======
# <class 'pandas.core.frame.Pandas'>
# Pandas(Index='Bob', age=42, state='CA', point=92)
# ------
# 92
# 92
# ======
If the name argument is None, a plain tuple is returned.
for row in df.itertuples(name=None):
    print(type(row))
    print(row)
    print('------')
    print(row[3])
    print('======\n')
# <class 'tuple'>
# ('Alice', 24, 'NY', 64)
# ------
# 64
# ======
# <class 'tuple'>
# ('Bob', 42, 'CA', 92)
# ------
# 92
# ======
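itertuples() also accepts an index argument, and name can be any valid identifier. The following sketch drops the index from the tuples and names the namedtuple Person; the name 'Person' and the `rows` list are arbitrary choices made for illustration.

```python
import pandas as pd

df = pd.DataFrame({'age': [24, 42], 'state': ['NY', 'CA'], 'point': [64, 92]},
                  index=['Alice', 'Bob'])

# index=False omits the index label; name sets the namedtuple's type name.
rows = list(df.itertuples(index=False, name='Person'))

for row in rows:
    print(row)
# Person(age=24, state='NY', point=64)
# Person(age=42, state='CA', point=92)
```

With index=False the positions shift by one, so the point value is at row[2] rather than row[3].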
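Retrieving the values of specific columns

iterrows() and itertuples() retrieve every column of each row. If only specific columns are needed, the following approach also works.
Each column of a pandas.DataFrame is a pandas.Series.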
print(df['age'])
# Alice    24
# Bob      42
# Name: age, dtype: int64
print(type(df['age']))
# <class 'pandas.core.series.Series'>
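Applying a for loop to a pandas.Series yields its values in order, so selecting a column of a pandas.DataFrame and looping over it yields that column's values in order.
for age in df['age']:
    print(age)
# 24
# 42
The built-in zip() function lets you retrieve the values of several columns together.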
for age, point in zip(df['age'], df['point']):
    print(age, point)
# 24 64
# 42 92
To get the index (row labels), use the index attribute. As in the example above, it can be combined with other columns through zip().
print(df.index)
# Index(['Alice', 'Bob'], dtype='object')
print(type(df.index))
# <class 'pandas.core.indexes.base.Index'>
for index in df.index:
    print(index)
# Alice
# Bob
for index, state in zip(df.index, df['state']):
    print(index, state)
# Alice NY
# Bob CA
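Because zip() produces ordinary pairs, the result can be handed straight to other built-ins instead of a for loop. As a small sketch, the index and one column can be collected into a dictionary; `state_by_name` is a hypothetical variable name.

```python
import pandas as pd

df = pd.DataFrame({'age': [24, 42], 'state': ['NY', 'CA'], 'point': [64, 92]},
                  index=['Alice', 'Bob'])

# Pair each row label with its state and build a dict from the pairs.
state_by_name = dict(zip(df.index, df['state']))

print(state_by_name)
# {'Alice': 'NY', 'Bob': 'CA'}
```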
Updating values in a loop

The pandas.Series that iterrows() yields for each row is a copy, not a view, so modifying it does not update the original data.
for index, row in df.iterrows():
    row['point'] += row['age']
print(df)
#        age state  point
# Alice   24    NY     64
# Bob     42    CA     92
Selecting the element in the original DataFrame with at[] and modifying it there does update the data.
for index, row in df.iterrows():
    df.at[index, 'point'] += row['age']
print(df)
#        age state  point
# Alice   24    NY     88
# Bob     42    CA    134
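The same at[] update can be driven by itertuples() instead of iterrows(). This is a sketch of one possible variant, starting again from the original example frame; with the default name, the index label is available as the Index field of each namedtuple.

```python
import pandas as pd

df = pd.DataFrame({'age': [24, 42], 'state': ['NY', 'CA'], 'point': [64, 92]},
                  index=['Alice', 'Bob'])

for row in df.itertuples():
    # row.Index is the row label; at[] writes to the original DataFrame.
    df.at[row.Index, 'point'] += row.age

print(df)
#        age state  point
# Alice   24    NY     88
# Bob     42    CA    134
```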
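For more on at[], see the following article.
04_Pandas: getting and setting values at arbitrary positions (at, iat, loc, iloc)
Note that the at[] loop above is only an illustration. In many cases a for loop is not needed to update elements or to add a new column based on existing columns, and code written without a loop is simpler and faster.
The following does the same processing as above. The object already updated above is updated again.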
df['point'] += df['age']
print(df)
#        age state  point
# Alice   24    NY    112
# Bob     42    CA    176
New columns can be added in the same way.
df['new'] = df['point'] + df['age'] * 2
print(df)
#        age state  point  new
# Alice   24    NY    112  160
# Bob     42    CA    176  260
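Besides simple arithmetic, NumPy functions can be applied to every element of a column. The following example takes the square root. Older pandas versions exposed NumPy as pd.np, but that alias was deprecated in pandas 1.0 and later removed, so NumPy is imported directly here.
import numpy as np
df['age_sqrt'] = np.sqrt(df['age'])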
print(df)
#        age state  point  new  age_sqrt
# Alice   24    NY    112  160  4.898979
# Bob     42    CA    176  260  6.480741
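For strings, methods that operate directly on a column (Series) are available through the str accessor. The following example converts to lowercase and extracts the first character.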
df['state_0'] = df['state'].str.lower().str[0]
print(df)
#        age state  point  new  age_sqrt state_0
# Alice   24    NY    112  160  4.898979       n
# Bob     42    CA    176  260  6.480741       c
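Conditional logic, which often looks like it needs a loop, can usually be written without one as well. The sketch below uses numpy.where() and Series.map() on the original example frame; the column names 'senior' and 'state_full', the age threshold, and the lookup dictionary are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'age': [24, 42], 'state': ['NY', 'CA'], 'point': [64, 92]},
                  index=['Alice', 'Bob'])

# Element-wise if/else: 'yes' where the condition holds, 'no' elsewhere.
df['senior'] = np.where(df['age'] >= 40, 'yes', 'no')

# Element-wise lookup: each state code is replaced through the dictionary.
df['state_full'] = df['state'].map({'NY': 'New York', 'CA': 'California'})

print(df)
```

Both calls operate on the whole column at once, so no explicit iteration over rows is required.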