I'm trying to sort a .csv file with just over 300 entries and write it all back out, ordered by the numerical values in one specific column, identified by its header. Here's the code I've written so far, but it just seems to output the data in the order it went in.

import csv
import itertools
from itertools import groupby as gb
reader = csv.DictReader(open('Full_List.csv', 'r'))
groups = gb(reader, lambda d: d['red label'])
result = [max(g, key=lambda d: d['red label']) for k, g in groups]
writer = csv.DictWriter(open('output.csv', 'w'), reader.fieldnames)
writer.writeheader()
writer.writerows(result)

There are only 50 rows in the whole file that contain a value under the header "red label"; all the others are left blank. It's in the Z column of the .csv (which is not the last column), so I'd assume the index of the column is 25 (0 being the first). Any help would be greatly appreciated.

groupby isn't for sorting, it's for chunking an iterable. From the docs for itertools.groupby: "Generally, the iterable needs to already be sorted on the same key function." – Steven Rumbalski Mar 21 '13 at 23:15

import pandas as pd
df = pd.read_csv('Full_List.csv')
# sort_values replaces DataFrame.sort, which newer pandas versions no longer provide
df = df.sort_values('red label')
df.to_csv('Full_List_sorted.csv', index=False)

You may need to adjust the options to read_csv and to_csv to match the format of your CSV file.

I've tried using the pandas method you have told me about but whenever I run the script I get the error "No module pandas exists" even though I've installed it from my python directory using sudo apt-get install python-pandas – AzKai Mar 26 '13 at 0:40

Edit: I've figured out what the problem is in trying to run pandas. When I installed it, it installed into my python2.7 folder, but when I run my script it's running from the python3.2 folder, which is in the same directory as the 2.7 version (/usr/local/lib), and I've no idea how to change my script to run from that directory – AzKai Mar 26 '13 at 17:53

Finally got around the pandas error but the output is still the same as with the method that Steven gave me – AzKai Mar 26 '13 at 18:53

groupby isn't for sorting, it's for chunking an iterable. For sorting use sorted.

import csv
reader = csv.DictReader(open('Full_List.csv', 'r'))
result = sorted(reader, key=lambda d: float(d['red label']))
writer = csv.DictWriter(open('output.csv', 'w'), reader.fieldnames)
writer.writeheader()
writer.writerows(result)

Note: I changed your lambda to cast your character data to float for correct numerical sorting.
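To illustrate the difference, here is a small sketch using made-up values (the list `values` is hypothetical and not taken from the asker's file). It shows that groupby only merges adjacent equal keys without reordering anything, and that sorting strings differs from sorting with a float key.

```python
from itertools import groupby

# Hypothetical sample values, standing in for the 'red label' column
values = ['10', '9', '9', '2', '10']

# groupby only collapses runs of adjacent equal keys; it never reorders
chunks = [(key, len(list(group))) for key, group in groupby(values)]
print(chunks)

# Plain string sorting compares character by character
as_text = sorted(values)
print(as_text)

# Casting to float in the key gives numerical order
as_numbers = sorted(values, key=float)
print(as_numbers)
```

Note that '10' appears as two separate chunks in the groupby result because its two occurrences are not adjacent.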

I've tried that and gotten the following error: ValueError: could not convert string to float: I changed the casting from float to str. It compiled but it completely eliminated all values in the column it's sorting – AzKai Mar 22 '13 at 0:04

From the ValueError it appears that d['red label'] does not always return numeric data. Do you have any empty fields? As regards "it completely eliminated all values in the column", I think that is not the case. This code does not overwrite any values. It would be helpful to see your actual data. – Steven Rumbalski Mar 22 '13 at 14:39

If those blank fields can be sorted as if they have a value of 0.0, change float(d['red label']) to float(d['red label']) if d['red label'] else 0.0. – Steven Rumbalski Mar 22 '13 at 21:21

@AzKai: Post the first ten lines of your file. Something is not quite right here. – Steven Rumbalski Mar 23 '13 at 1:26
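The blank-field handling discussed in these comments could be sketched as follows. This is only an illustration under assumptions: the in-memory `sample` data and the helper name `red_label_key` are hypothetical, and placing blank rows after the numeric ones (rather than treating them as 0.0) is a design choice, not something the thread settled on.

```python
import csv
import io

# Hypothetical stand-in for Full_List.csv: most 'red label' cells are blank
sample = io.StringIO(
    "name,red label\n"
    "a,\n"
    "b,12.5\n"
    "c,3\n"
    "d,\n"
    "e,7\n"
)

def red_label_key(row):
    """Sort numeric values ascending and push blank cells to the end."""
    text = (row['red label'] or '').strip()
    if text:
        return (0, float(text))
    return (1, 0.0)

reader = csv.DictReader(sample)
result = sorted(reader, key=red_label_key)

# Write to an in-memory buffer here; a real script would open output.csv
output = io.StringIO()
writer = csv.DictWriter(output, reader.fieldnames)
writer.writeheader()
writer.writerows(result)
print(output.getvalue())
```

The tuple key sorts on the first element before the second, so every row with a value comes ahead of every blank row, and the blank rows keep their original relative order because sorted is stable.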

I found with testing that the following works on csv files that I have. Note that all rows of the column have valid entries.

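import csv
from optparse import OptionParser

# Create options.statistic using -s. The other flag names below are
# assumptions, since the original option setup was not shown.
parser = OptionParser()
parser.add_option('-s', dest='statistic')
parser.add_option('-i', dest='filein')
parser.add_option('-o', dest='fileout')
parser.add_option('--high', dest='high', action='store_true', default=False)
options, args = parser.parse_args()
# Open and set up input file
ifile = open(options.filein, 'r', newline='')
reader = csv.DictReader(ifile)
outfields = reader.fieldnames  # assumed: write the same columns that were read
# Create the sorted list
try:
  print('Try the float version')
  sortedlist = sorted(reader, key=lambda d: float(d[options.statistic]), reverse=options.high)
except ValueError:
  print('Need to use the text version')
  # The failed sort consumed part of the file, so rewind and read it again
  ifile.seek(0)
  reader = csv.DictReader(ifile)
  sortedlist = sorted(reader, key=lambda d: d[options.statistic], reverse=options.high)
# Close the input file. This allows the input file to be the same as the output file
ifile.close()
# Open the output file
ofile = open(options.fileout, 'w', newline='')
writer = csv.DictWriter(ofile, fieldnames=outfields, extrasaction='ignore', restval='')
# Output the header
writer.writeheader()
# Output the sorted list
writer.writerows(sortedlist)
ofile.close()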
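The same try-float-then-fall-back-to-text idea can also be written as a small function that works on rows already loaded into a list, which avoids rewinding the file. This is a minimal sketch: the function name `sort_rows` and the sample rows are hypothetical.

```python
def sort_rows(rows, column, reverse=False):
    """Sort a list of dict rows numerically if possible, else as text."""
    try:
        return sorted(rows, key=lambda d: float(d[column]), reverse=reverse)
    except ValueError:
        # At least one value is not numeric, so compare the raw strings
        return sorted(rows, key=lambda d: d[column], reverse=reverse)

# Hypothetical rows, as csv.DictReader would produce them
numeric = [{'score': '10'}, {'score': '9.5'}, {'score': '2'}]
mixed = [{'score': 'high'}, {'score': '3'}, {'score': 'low'}]

numeric_sorted = sort_rows(numeric, 'score')
mixed_sorted = sort_rows(mixed, 'score', reverse=True)
print(numeric_sorted)
print(mixed_sorted)
```

With a real file, the rows could be loaded first with something like rows = list(csv.DictReader(f)) and then passed to the function.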