Sorting by Specific Column data using .csv in python

I'm trying to sort a .csv file with just over 300 entries and write it back out ordered by the numerical values in one specific column, under the header "red label". Here's the code I've written so far, but it just seems to output the data in the order it went in:

import csv
import itertools
from itertools import groupby as gb
reader = csv.DictReader(open('Full_List.csv', 'r'))
groups = gb(reader, lambda d: d['red label'])
result = [max(g, key=lambda d: d['red label']) for k, g in groups]
writer = csv.DictWriter(open('output.csv', 'w'), reader.fieldnames)
writer.writeheader()
writer.writerows(result)
Only 50 rows in the whole file contain a value under the "red label" header; all the others are left blank.
It's column Z in the .csv (though not the last column), so I'd assume the column index is 25 (with 0 being the first).
Any help would be greatly appreciated.
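The column position can be checked against the header row instead of being counted by hand. A minimal sketch, assuming the header is the first row of the file; the sample header and the helper name `column_letter_to_index` are made up for illustration and stand in for the real Full_List.csv:

```python
import csv
import io


def column_letter_to_index(letters):
    """Convert a spreadsheet column label such as 'A', 'Z' or 'AA' to a 0-based index."""
    index = 0
    for ch in letters.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


# Hypothetical header: 25 filler columns, then 'red label', then one more column.
header = ["col%d" % i for i in range(25)] + ["red label", "notes"]
sample = io.StringIO(",".join(header) + "\n")

reader = csv.DictReader(sample)
print(column_letter_to_index("Z"))           # 0-based position of spreadsheet column Z
print(reader.fieldnames.index("red label"))  # position of the header in this sample
```

With csv.DictReader the index is not actually needed, since rows are looked up by header name.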
                groupby isn't for sorting, it's for chunking an iterable.  From the docs for  itertools.groupby: "Generally, the iterable needs to already be sorted on the same key function."
                    – Steven Rumbalski
                Mar 21 '13 at 23:15
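The point about groupby can be seen on a tiny list. This is only an illustrative sketch with made-up values: groupby starts a new group whenever the key changes and emits groups in input order, so it never reorders anything by itself.

```python
from itertools import groupby

values = ["b", "a", "b", "a"]

# On unsorted input, groupby starts a new group every time the key changes.
unsorted_groups = [(key, len(list(group))) for key, group in groupby(values)]

# Sorting first puts equal keys next to each other, so each key forms one group.
sorted_groups = [(key, len(list(group))) for key, group in groupby(sorted(values))]

print(unsorted_groups)  # [('b', 1), ('a', 1), ('b', 1), ('a', 1)]
print(sorted_groups)    # [('a', 2), ('b', 2)]
```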
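import pandas as pd

df = pd.read_csv('Full_List.csv')
# sort_values replaces the older DataFrame.sort; blank cells (NaN) sort last by default
df = df.sort_values('red label')
df.to_csv('Full_List_sorted.csv', index=False)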
You may need to adjust the options to read_csv and to_csv to match the format of your CSV file.
        https://stackoverflow.com/questions/15559812/sorting-by-specific-column-data-using-csv-in-python/15561404#15561404
                I've tried using the pandas method you have told me about but whenever I run the script I get the error "No module pandas exists" even though I've installed it from my python directory using sudo apt-get install python-pandas
                    – AzKai
                Mar 26 '13 at 0:40
                Edit: I've figured out what the problem is with running pandas. It installed into my python2.7 folder, but my script runs from the python3.2 folder, which is in the same directory as the 2.7 version (/usr/local/lib). I've no idea how to change my script to run from the 2.7 folder.
                    – AzKai
                Mar 26 '13 at 17:53
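A mismatch like this can be confirmed from inside the script by asking the running interpreter where it lives and whether it can see the module. A small sketch, assuming Python 3.4 or later (importlib.util.find_spec is not available on 3.2); the helper name `module_available` is made up for illustration:

```python
import importlib.util
import sys


def module_available(name):
    """Return True if the running interpreter can locate the named top-level module."""
    return importlib.util.find_spec(name) is not None


print(sys.executable)              # path of the interpreter running this script
print(sys.version_info[:2])        # (major, minor) version of that interpreter
print(module_available("csv"))     # standard-library module, expected to be found
print(module_available("pandas"))  # False means this interpreter has no pandas
```

If pandas is missing for the interpreter shown, installing it with that same interpreter (for example `python3 -m pip install pandas`, where pip is available) usually avoids the 2.7 versus 3.x mix-up.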
                Finally got around the pandas error but the output is still the same as the above method that Steven gave me
                    – AzKai
                Mar 26 '13 at 18:53
groupby isn't for sorting, it's for chunking an iterable.  For sorting use sorted.  
import csv
reader = csv.DictReader(open('Full_List.csv', 'r'))
result = sorted(reader, key=lambda d: float(d['red label']))
writer = csv.DictWriter(open('output.csv', 'w'), reader.fieldnames)
writer.writeheader()
writer.writerows(result)
Note: I changed your lambda to cast your character data to float for correct numerical sorting.
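To see why the cast matters, compare sorting the same values as text and as numbers. The values below are made up for illustration:

```python
values = ["10", "9", "2.5", "100"]

as_text = sorted(values)                # compares character by character
as_numbers = sorted(values, key=float)  # compares numeric values

print(as_text)     # ['10', '100', '2.5', '9']
print(as_numbers)  # ['2.5', '9', '10', '100']
```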
        https://stackoverflow.com/questions/15559812/sorting-by-specific-column-data-using-csv-in-python/15559985#15559985
                I've tried that and got the following error: "ValueError: could not convert string to float:". I then changed the cast from float to str. It ran, but it completely eliminated all values in the column it's sorting.
                    – AzKai
                Mar 22 '13 at 0:04
                From the ValueError it appears that d['red label'] does not always return numeric data.  Do you have any empty fields?  As regards to "it completely eliminated all values in the column", I think that is not the case.  This code does not overwrite any values.  It would be helpful to see your actual data.
                    – Steven Rumbalski
                Mar 22 '13 at 14:39
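One way to find out which rows trigger the ValueError is to list every row whose value float() rejects. A sketch under assumptions: the sample data and the helper name `is_number` are invented here and stand in for the real file.

```python
import csv
import io

# Hypothetical sample standing in for Full_List.csv.
sample = io.StringIO(
    "name,red label\n"
    "a,3.5\n"
    "b,\n"
    "c,n/a\n"
    "d,12\n"
)


def is_number(text):
    """Return True if float() accepts the text."""
    try:
        float(text)
    except (TypeError, ValueError):
        return False
    return True


# DictReader consumes the header, so the first data row is line 2 of the file.
bad_rows = [
    (line_number, row["red label"])
    for line_number, row in enumerate(csv.DictReader(sample), start=2)
    if not is_number(row["red label"])
]
print(bad_rows)  # [(3, ''), (4, 'n/a')]
```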
                If those blank fields can be sorted as if they have a value of 0.0, change float(d['red label']) to float(d['red label']) if d['red label'] else 0.0.
                    – Steven Rumbalski
                Mar 22 '13 at 21:21
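If treating blanks as 0.0 would mix them in with genuinely small values, a variation is to sort filled cells numerically and push blank cells to the end. This is a sketch of that variation, not the suggestion above verbatim; the sample data and the name `sort_key` are invented for illustration, and an in-memory buffer stands in for the real files.

```python
import csv
import io

# Hypothetical sample standing in for Full_List.csv.
sample = io.StringIO(
    "name,red label\n"
    "a,3.5\n"
    "b,\n"
    "c,12\n"
    "d,0.5\n"
)


def sort_key(row):
    """Sort filled cells numerically and put blank cells after them."""
    text = (row["red label"] or "").strip()
    if text:
        return (0, float(text))
    return (1, 0.0)


reader = csv.DictReader(sample)
rows = sorted(reader, key=sort_key)

# Write the sorted rows back out, header first.
output = io.StringIO()
writer = csv.DictWriter(output, fieldnames=reader.fieldnames, lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
print(output.getvalue())
```

The tuple key sorts on the first element before the second, so every row with a value comes ahead of every blank row. Because sorted is stable, blank rows keep their original relative order.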
                @AzKai: Post the first ten lines of your file.  Something is not quite right here.
                    – Steven Rumbalski
                Mar 23 '13 at 1:26
I found with testing that the following works on csv files that I have. Note that all rows of the column have valid entries.
import csv
from optparse import OptionParser
# Create options.statistic using -s
# Open and set up input file
ifile = open(options.filein, 'r', newline='')
reader = csv.DictReader(ifile)
# Create the sorted list
try:
  print('Try the float version')
  sortedlist = sorted(reader, key=lambda d: float(d[options.statistic]), reverse=options.high)
except ValueError:
  print('Need to use the text version')
  # Rewind and skip the header line; the reader already knows the field names
  ifile.seek(0)
  next(ifile)
  sortedlist = sorted(reader, key=lambda d: d[options.statistic], reverse=options.high)
# Close the input file. This allows the input file to be the same as the output file
ifile.close()
# Open the output file
ofile = open(options.fileout, 'w', newline='')
# outfields is the list of output column names; its definition is not shown here
writer = csv.DictWriter(ofile, fieldnames=outfields, extrasaction='ignore', restval='')
# Output the header
writer.writeheader()
# Output the sorted list
writer.writerows(sortedlist)
ofile.close()
        https://stackoverflow.com/questions/15559812/sorting-by-specific-column-data-using-csv-in-python/21169395#21169395
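The remark about closing the input file before opening the output can be shown with a small self-contained sketch. It assumes Python 3 and sorts as text only; the function name `sort_csv_in_place` and the demo file are made up for illustration and are not part of the answer above.

```python
import csv
import os
import tempfile


def sort_csv_in_place(path, column):
    """Sort the CSV at `path` by `column` (as text) and overwrite the same file."""
    # Read everything first; the file is closed when the with-block ends.
    with open(path, newline="") as infile:
        reader = csv.DictReader(infile)
        fieldnames = reader.fieldnames
        rows = sorted(reader, key=lambda row: row[column])

    # Only now reopen the same path for writing.
    with open(path, "w", newline="") as outfile:
        writer = csv.DictWriter(outfile, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


# Demonstration on a throwaway file.
with tempfile.TemporaryDirectory() as folder:
    demo_path = os.path.join(folder, "demo.csv")
    with open(demo_path, "w", newline="") as f:
        f.write("name,score\nb,2\na,1\n")
    sort_csv_in_place(demo_path, "name")
    with open(demo_path, newline="") as f:
        result = f.read()

print(result)
```

All rows are held in memory between the read and the write, which is fine for a file of a few hundred entries.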