
Python object overhead?

Subject: Python object overhead?
Poster: Matt Garman
Date: Fri, 23 Mar 2007 15:11:35 -0600
I'm trying to use Python to work with large pipe ('|') delimited data files. The files range in size from 25 MB to 200 MB.

Since each line corresponds to a record, I'm trying to create an object from each record. However, doing this seems to increase memory usage by a factor of two to three.

See the two examples below: running each on the same input file results in 3x the memory usage for Example 2. (Memory usage is checked using top.)

This happens with both Python 2.4.3 on Gentoo Linux (64-bit) and Python 2.3.4 on CentOS 4.4 (64-bit).

Is this "just the way it is" or am I overlooking something obvious?

Thanks, Matt

Example 1: read lines into list:

# begin readlines.py
import sys, time

filedata = list()
file = open(sys.argv[1])
while True:
    line = file.readline()
    if len(line) == 0:
        break  # EOF
    filedata.append(line)
file.close()
print "data read; sleeping 20 seconds..."
time.sleep(20)  # gives time to check top
# end readlines.py

Example 2: read lines into objects:

# begin readobjects.py
import sys, time

class FileRecord:
    def __init__(self, line):
        self.line = line

records = list()
file = open(sys.argv[1])
while True:
    line = file.readline()
    if len(line) == 0:
        break  # EOF
    rec = FileRecord(line)
    records.append(rec)
file.close()
print "data read; sleeping 20 seconds..."
time.sleep(20)  # gives time to check top
# end readobjects.py
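A rough way to see where the extra memory in Example 2 might be going is to measure a single record from inside the interpreter instead of watching top. The sketch below is an illustration only and is not part of the two scripts above. It assumes Python 3, because sys.getsizeof was only added in Python 2.6 and so does not exist on the 2.3/2.4 interpreters mentioned earlier. SlottedRecord and shallow_cost are hypothetical names introduced here for comparison, and the byte counts will vary by Python version and platform.

```python
import sys


class FileRecord:
    """Same shape as the class in Example 2: one attribute per instance."""

    def __init__(self, line):
        self.line = line


class SlottedRecord:
    """Hypothetical variant: __slots__ avoids a per-instance __dict__."""

    __slots__ = ("line",)

    def __init__(self, line):
        self.line = line


def shallow_cost(rec):
    """Bytes for the instance plus its own __dict__, if it has one.

    The line string itself is not counted, since both examples store it.
    """
    size = sys.getsizeof(rec)
    if hasattr(rec, "__dict__"):
        size += sys.getsizeof(rec.__dict__)
    return size


line = "field1|field2|field3\n"
plain = shallow_cost(FileRecord(line))
slotted = shallow_cost(SlottedRecord(line))

print("string alone:          ", sys.getsizeof(line), "bytes")
print("FileRecord wrapper:    ", plain, "bytes on top of the string")
print("SlottedRecord wrapper: ", slotted, "bytes on top of the string")
```

If the wrapper cost is comparable to or larger than the size of a typical line, a two- to three-fold increase in total memory would not be surprising, though this only measures one object and says nothing about allocator behaviour for the whole file.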

 
