While a program is running, its data is stored in random access memory (RAM), which is volatile, i.e., when the program ends, data in RAM disappears.
To make data available the next time the program is started, it has to be written to a non-volatile storage medium.
Data on non-volatile storage media is stored in named locations on the media called files. By reading and writing files, programs can save information between program runs.
In Python, we must open files before we can use them and close them when we are done with them.
Method | Use | Explanation |
---|---|---|
open | open(filename,'r') | Open a file called filename and use it for reading. This will return a reference to a file object. |
open | open(filename,'w') | Open a file called filename and use it for writing. This will also return a reference to a file object. |
close | filevariable.close() | File use is complete. |
Method | Use | Explanation |
---|---|---|
write | filevar.write(astring) | Add astring to the end of the file. filevar must refer to a file that has been opened for writing. |
read(n) | filevar.read() | Reads and returns a string of n characters, or the entire file as a single string if n is not provided. |
readline(n) | filevar.readline() | Returns the next line of the file with all text up to and including the newline character. If n is provided as a parameter then only n characters will be returned if the line is longer than n. |
readlines(n) | filevar.readlines() | Returns a list of strings, each representing a single line of the file. If n is not provided then all lines of the file are returned. If n is provided then n characters are read but n is rounded up so that an entire line is returned. |
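To make the differences between the reading methods in the table concrete, here is a minimal sketch. It assumes nothing beyond the standard library; the helper name `demo_read_methods` and the file name `sample.txt` are made up for this illustration, and the file lives in a temporary directory so no real files are touched.

```python
import os
import tempfile


def demo_read_methods():
    """Write a small file, then read it back with read, readline and readlines."""
    # The temporary directory is removed automatically when the with block ends.
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "sample.txt")  # hypothetical file name

        outfile = open(path, "w")
        outfile.write("alpha\nbeta\ngamma\n")
        outfile.close()

        infile = open(path, "r")
        first_three = infile.read(3)      # the next 3 characters
        rest_of_line = infile.readline()  # rest of the current line, newline included
        remaining = infile.readlines()    # every remaining line, as a list
        infile.close()

    return first_three, rest_of_line, remaining


print(demo_read_methods())
```

Each call continues from where the previous one stopped, which is why `readline` returns only the tail of the first line here.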
myfile = open("test.txt", "w")
myfile.write("My first file written from Python\n")
myfile.write("---------------------------------\n")
myfile.write("Hello, world!\n")
myfile.close()
%%bash
ls
cat test.txt
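myfile = open("test.txt", "w")              # a file in the current directory
myfile = open("subdir/test.txt", "w")       # a file in a subdirectory of the current directory
myfile = open("../../chap2/test.txt", "w")  # up two directories, then down into chap2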
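Opening a file for writing creates the file if it is missing, but it does not create missing directories, so a call such as the second one above fails when `subdir` does not exist. The following sketch shows one way to guard against that. The helper name `write_in_subdir` is hypothetical, and a temporary directory stands in for the current directory.

```python
import os
import tempfile


def write_in_subdir(base_dir):
    """Create base_dir/subdir if needed, write a file there, and return its path."""
    subdir = os.path.join(base_dir, "subdir")
    # open(..., "w") creates the file but not missing directories,
    # so make sure the directory exists first.
    os.makedirs(subdir, exist_ok=True)

    path = os.path.join(subdir, "test.txt")
    myfile = open(path, "w")
    myfile.write("Hello from a subdirectory\n")
    myfile.close()
    return path


with tempfile.TemporaryDirectory() as base:
    created = write_in_subdir(base)
    created_exists = os.path.exists(created)
    print(os.path.relpath(created, base))
```

Building the path with `os.path.join` rather than a literal `/` keeps the sketch portable across operating systems.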
Suppose we have a text file called qbdata.txt
that contains the following data representing statistics about NFL quarterbacks.
data format:
First Name, Last Name, Position, Team, Completions, Attempts, Yards, TDs, Ints, Comp%, Rating
Colt McCoy QB CLE 135 222 1576 6 9 60.8% 74.5
Josh Freeman QB TB 291 474 3451 25 6 61.4% 95.9
Michael Vick QB PHI 233 372 3018 21 6 62.6% 100.2
Matt Schaub QB HOU 365 574 4370 24 12 63.6% 92.0
Philip Rivers QB SD 357 541 4710 30 13 66.0% 101.8
Matt Hasselbeck QB SEA 266 444 3001 12 17 59.9% 73.2
Jimmy Clausen QB CAR 157 299 1558 3 9 52.5% 58.4
Joe Flacco QB BAL 306 489 3622 25 10 62.6% 93.6
Kyle Orton QB DEN 293 498 3653 20 9 58.8% 87.5
Jason Campbell QB OAK 194 329 2387 13 8 59.0% 84.5
Peyton Manning QB IND 450 679 4700 33 17 66.3% 91.9
Drew Brees QB NO 448 658 4620 33 22 68.1% 90.9
Matt Ryan QB ATL 357 571 3705 28 9 62.5% 91.0
Matt Cassel QB KC 262 450 3116 27 7 58.2% 93.0
Mark Sanchez QB NYJ 278 507 3291 17 13 54.8% 75.3
Brett Favre QB MIN 217 358 2509 11 19 60.6% 69.9
David Garrard QB JAC 236 366 2734 23 15 64.5% 90.8
Eli Manning QB NYG 339 539 4002 31 25 62.9% 85.3
Carson Palmer QB CIN 362 586 3970 26 20 61.8% 82.4
Alex Smith QB SF 204 342 2370 14 10 59.6% 82.1
Chad Henne QB MIA 301 490 3301 15 19 61.4% 75.4
Tony Romo QB DAL 148 213 1605 11 7 69.5% 94.9
Jay Cutler QB CHI 261 432 3274 23 16 60.4% 86.3
Jon Kitna QB DAL 209 318 2365 16 12 65.7% 88.9
Tom Brady QB NE 324 492 3900 36 4 65.9% 111.0
Ben Roethlisberger QB PIT 240 389 3200 17 5 61.7% 97.0
Kerry Collins QB TEN 160 278 1823 14 8 57.6% 82.2
Derek Anderson QB ARI 169 327 2065 7 10 51.7% 65.9
Ryan Fitzpatrick QB BUF 255 441 3000 23 15 57.8% 81.8
Donovan McNabb QB WAS 275 472 3377 14 15 58.3% 77.1
Kevin Kolb QB PHI 115 189 1197 7 7 60.8% 76.1
Aaron Rodgers QB GB 312 475 3922 28 11 65.7% 101.2
Sam Bradford QB STL 354 590 3512 18 15 60.0% 76.5
Shaun Hill QB DET 257 416 2686 16 12 61.8% 81.3
Now create qbdata.txt
%%bash
cat << _EOF > qbdata.txt
Colt McCoy QB CLE 135 222 1576 6 9 60.8% 74.5
Josh Freeman QB TB 291 474 3451 25 6 61.4% 95.9
Michael Vick QB PHI 233 372 3018 21 6 62.6% 100.2
Matt Schaub QB HOU 365 574 4370 24 12 63.6% 92.0
Philip Rivers QB SD 357 541 4710 30 13 66.0% 101.8
Matt Hasselbeck QB SEA 266 444 3001 12 17 59.9% 73.2
Jimmy Clausen QB CAR 157 299 1558 3 9 52.5% 58.4
Joe Flacco QB BAL 306 489 3622 25 10 62.6% 93.6
Kyle Orton QB DEN 293 498 3653 20 9 58.8% 87.5
Jason Campbell QB OAK 194 329 2387 13 8 59.0% 84.5
Peyton Manning QB IND 450 679 4700 33 17 66.3% 91.9
Drew Brees QB NO 448 658 4620 33 22 68.1% 90.9
Matt Ryan QB ATL 357 571 3705 28 9 62.5% 91.0
Matt Cassel QB KC 262 450 3116 27 7 58.2% 93.0
Mark Sanchez QB NYJ 278 507 3291 17 13 54.8% 75.3
Brett Favre QB MIN 217 358 2509 11 19 60.6% 69.9
David Garrard QB JAC 236 366 2734 23 15 64.5% 90.8
Eli Manning QB NYG 339 539 4002 31 25 62.9% 85.3
Carson Palmer QB CIN 362 586 3970 26 20 61.8% 82.4
Alex Smith QB SF 204 342 2370 14 10 59.6% 82.1
Chad Henne QB MIA 301 490 3301 15 19 61.4% 75.4
Tony Romo QB DAL 148 213 1605 11 7 69.5% 94.9
Jay Cutler QB CHI 261 432 3274 23 16 60.4% 86.3
Jon Kitna QB DAL 209 318 2365 16 12 65.7% 88.9
Tom Brady QB NE 324 492 3900 36 4 65.9% 111.0
Ben Roethlisberger QB PIT 240 389 3200 17 5 61.7% 97.0
Kerry Collins QB TEN 160 278 1823 14 8 57.6% 82.2
Derek Anderson QB ARI 169 327 2065 7 10 51.7% 65.9
Ryan Fitzpatrick QB BUF 255 441 3000 23 15 57.8% 81.8
Donovan McNabb QB WAS 275 472 3377 14 15 58.3% 77.1
Kevin Kolb QB PHI 115 189 1197 7 7 60.8% 76.1
Aaron Rodgers QB GB 312 475 3922 28 11 65.7% 101.2
Sam Bradford QB STL 354 590 3512 18 15 60.0% 76.5
Shaun Hill QB DET 257 416 2686 16 12 61.8% 81.3
_EOF
qbfile = open("qbdata.txt", "r")
for aline in qbfile:
    values = aline.split()
    print('QB ', values[0], values[1],
          'had a rating of ', values[10])
qbfile.close()
>>> infile = open("qbdata.txt", "r")
>>> aline = infile.readline()
>>> print(aline)
>>> infile = open("qbdata.txt", "r")
>>> linelist = infile.readlines()
>>> print(len(linelist))
>>> print(linelist[0:4])
infile = open("qbdata.txt", "r")
line = infile.readline()
while line:
    values = line.split()
    print('QB ', values[0], values[1],
          'had a rating of ', values[10])
    line = infile.readline()
infile.close()
infile = open("qbdata.txt", "r")
while True:
    line = infile.readline()
    if len(line) == 0:  # If there are no more lines
        break
    values = line.split()
    print('QB ', values[0], values[1], 'had a rating of ', values[10])
infile.close()
The readlines method reads all the lines of the file and returns them as a list of strings.
infile = open("qbdata.txt", "r")
lines = infile.readlines()
infile.close()
for line in lines:
    values = line.split()
    print('QB ', values[0], values[1],
          'had a rating of ', values[10])
infile = open("qbdata.txt", "r")
content = infile.read()
infile.close()
words = content.split()
print("There are {0} words in the file.".
format(len(words)))
One of the most commonly performed data processing tasks is to read data from a file, manipulate it in some way, and then write the resulting data out to a new data file to be used for other purposes later.
Now we will save QB Names to a new file.
infile = open("qbdata.txt", "r")
aline = infile.readline()
while aline:
    items = aline.split()
    dataline = items[1] + ',' + items[0]
    print(dataline)
    aline = infile.readline()
infile.close()
infile = open("qbdata.txt", "r")
outfile = open("qbnames.txt", "w")
aline = infile.readline()
while aline:
    items = aline.split()
    dataline = items[1] + ',' + items[0]
    outfile.write(dataline + '\n')
    aline = infile.readline()
infile.close()
outfile.close()
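The same read, manipulate, write pattern also covers filtering. As a rough sketch, the function below keeps only the quarterbacks whose rating reaches a threshold. The function name `save_high_rated`, the output file name and the threshold of 90.0 are assumptions made for this illustration; the three sample rows are copied from qbdata.txt so that the code runs on its own.

```python
import os
import tempfile

# Three rows copied from qbdata.txt, so the sketch is self-contained.
SAMPLE = (
    "Colt McCoy QB CLE 135 222 1576 6 9 60.8% 74.5\n"
    "Josh Freeman QB TB 291 474 3451 25 6 61.4% 95.9\n"
    "Michael Vick QB PHI 233 372 3018 21 6 62.6% 100.2\n"
)


def save_high_rated(in_path, out_path, min_rating):
    """Write 'Last,First,rating' for every QB rated at least min_rating."""
    infile = open(in_path, "r")
    outfile = open(out_path, "w")
    for aline in infile:
        items = aline.split()
        rating = float(items[10])  # the rating is the 11th field
        if rating >= min_rating:
            outfile.write(items[1] + "," + items[0] + "," + items[10] + "\n")
    infile.close()
    outfile.close()


with tempfile.TemporaryDirectory() as tmpdir:
    src = os.path.join(tmpdir, "qbdata.txt")
    dst = os.path.join(tmpdir, "high_rated.txt")  # hypothetical output name
    f = open(src, "w")
    f.write(SAMPLE)
    f.close()

    save_high_rated(src, dst, 90.0)

    f = open(dst, "r")
    result = f.read()
    f.close()

print(result)
```

Note that `split` returns strings, so the rating has to be converted with `float` before it can be compared with a number.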
%%bash
cat qbnames.txt
import urllib.request
url = "https://caodg.github.io/ic/demos/foobar.py"
destination_filename = "foobar.py"
urllib.request.urlretrieve(url, destination_filename)
%%bash
ls foo*
Rather than save the web resource to our local disk, we read it directly into a string, and return it:
import urllib.request

def retrieve_page(url):
    """ Retrieve the contents (bytes) of a web page.
        The contents are decoded to a string before being returned.
    """
    my_socket = urllib.request.urlopen(url)
    dta = my_socket.read().decode()
    my_socket.close()
    return dta

the_text = retrieve_page("https://caodg.github.io/ic/demos/foobar.py")
print(the_text)
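A possible variant of this idea lets a `with` block close the response automatically. The sketch below is an illustration rather than a replacement for the function above: the name `retrieve_text` is made up, and to keep the example runnable without a network connection it fetches a `file:` URL that points at a temporary local file, which `urlopen` also accepts.

```python
import pathlib
import tempfile
import urllib.request


def retrieve_text(url, encoding="utf-8"):
    """Return the resource at url, decoded to a string."""
    # The response object is a context manager, so it is closed
    # automatically when the with block ends.
    with urllib.request.urlopen(url) as response:
        return response.read().decode(encoding)


with tempfile.TemporaryDirectory() as tmpdir:
    # Hypothetical stand-in for a web page: a small local file.
    local = pathlib.Path(tmpdir) / "page.txt"
    local.write_bytes(b"print('foobar')\n")
    fetched = retrieve_text(local.as_uri())

print(fetched)
```

Passing an `https://` URL to the same function should work the same way, provided the machine can reach the server.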
file = open('notexistfile')  # raises FileNotFoundError
print(55/0)                  # raises ZeroDivisionError
a = []
print(a[5])                  # raises IndexError
'a'[0] = 'b'                 # raises TypeError
Sometimes we want to execute an operation that might cause an exception, but we don’t want the program to stop. We can handle the exception using the try
statement to “wrap” a region of code.
filename = input("Enter a file name: ")
try:
    f = open(filename, "r")
except:
    print("There is no file named", filename)
filename = input("Enter a file name: ")
try:
    f = open(filename, "r")
except Exception:  # general exception
    print("There is no file named", filename)
handle selected exceptions
while True:
    try:
        x = int(input("Please enter a number: "))
        break
    except ValueError:
        print("Oops! That was no valid number. Try again...")
while True:
    try:
        x = int(input("Please enter a number: "))
        break
    except ValueError:
        print("Oops! That was no valid number. Try again...")
    except KeyboardInterrupt:
        print("Oops! Keyboard Interrupted")
import sys
try:
    f = open('foobar.py')
    s = f.readline()
    i = int(s.strip())
except OSError as err:
    print("OS error: {0}".format(err))
except ValueError as err:
    print("Could not convert data to an integer: {0}".format(err))
except:
    print("Unexpected error:", sys.exc_info()[0])
    raise
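The handlers above each name a single exception type. When several types should be handled the same way, one `except` clause can list them as a tuple. The sketch below illustrates this; the helper name `failure_name` is invented for the example, and the small lambdas only exist to trigger different errors.

```python
def failure_name(action):
    """Call action() and return the name of the exception it raised, or None."""
    try:
        action()
    except (ZeroDivisionError, IndexError, ValueError) as err:
        # One except clause can name several exception types as a tuple.
        return type(err).__name__
    return None


zero_case = failure_name(lambda: 55 / 0)
index_case = failure_name(lambda: [][5])
value_case = failure_name(lambda: int("abc"))
no_error_case = failure_name(lambda: 1 + 1)

print(zero_case, index_case, value_case, no_error_case)
```

Any exception that is not listed in the tuple, such as a `TypeError`, would still propagate to the caller.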
else clause
The try ... except statement has an optional else clause, which, when present, must follow all except clauses.
It is useful for code that must be executed if the try clause does not raise an exception.
filename = input("Enter a file name: ")
try:
    f = open(filename, "r")
except:
    print("There is no file named", filename)
else:
    print(filename, 'has', len(f.readlines()), 'lines')
    f.close()
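The same structure can be wrapped in a function so that it can be tried without typing a file name. This is only a sketch: the function name `count_lines` and the file names are hypothetical, and it catches `OSError` rather than using a bare `except` so that unrelated errors are not hidden.

```python
import os
import tempfile


def count_lines(filename):
    """Return the number of lines in filename, or None if it cannot be opened."""
    try:
        f = open(filename, "r")
    except OSError:
        return None
    else:
        # Runs only when open() succeeded, so f is guaranteed to exist here.
        n = len(f.readlines())
        f.close()
        return n


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "three.txt")  # hypothetical file name
    f = open(path, "w")
    f.write("one\ntwo\nthree\n")
    f.close()
    found = count_lines(path)
    missing = count_lines(os.path.join(tmpdir, "no-such-file.txt"))

print(found, missing)
```

Keeping the line counting in the `else` clause means that only a failure of `open` itself is treated as a missing file.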
The finally clause of the try statement
A common programming pattern is to grab a resource of some kind — e.g. we may open a file for writing. Then we perform some computation which may raise an exception, or may work without any problems.
Whatever happens, we want to “clean up” the resources we grabbed — e.g. close the file.
The finally clause of the try statement is the way to do just this.
def divide(x, y):
    try:
        result = x / y
    except ZeroDivisionError:
        print("division by zero!")
    else:
        print("result is", result)
    finally:
        print("executing finally clause")

divide(10,0)
divide(10,1)
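The finally clause also runs when the exception is not handled inside the function and propagates to the caller. The following sketch makes that visible by recording each stage in a list. The name `run_with_cleanup` and the strings "acquire" and "release" are made up for the illustration and stand in for grabbing and cleaning up a real resource.

```python
def run_with_cleanup(x, y, log):
    """Divide x by y, recording each stage in the list log."""
    log.append("acquire")
    try:
        return x / y
    finally:
        # Runs on the way out of the try block, whether it returned or raised.
        log.append("release")


ok_log = []
ok_result = run_with_cleanup(10, 2, ok_log)
print(ok_result, ok_log)

bad_log = []
try:
    run_with_cleanup(10, 0, bad_log)
except ZeroDivisionError:
    print("caught ZeroDivisionError", bad_log)
```

In both calls the log ends with "release", even though the second call leaves the function through an exception.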
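A correct way to open a file, although it looks cluttered:
try:
    f = open('xxx')
except OSError:
    pass  # handle the failure to open the file
else:
    try:
        pass  # work with the open file
    except Exception:
        pass  # handle errors raised while using the file
    finally:
        f.close()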
with statement
The elegant way to open a file, do something, then automatically close the file:
try:
    with open("data.txt") as f:
        data = f.read()
        # do something with data
except Exception as err:
    print("error: {0}".format(err))
try:
    with open("foobar.py") as f:
        s = f.readline()
        i = int(s.strip())
except OSError as err:
    print("OS error: {0}".format(err))
except Exception as err:
    print(err)
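To check that the with statement really closes the file, even when the body raises an exception, one can inspect the file object's `closed` attribute afterwards. The sketch below does this; the function name `first_line_as_int` and the two file names are assumptions for the example.

```python
import os
import tempfile


def first_line_as_int(path):
    """Return (value, file_was_closed) for the first line of path."""
    value = None
    try:
        with open(path) as f:
            value = int(f.readline().strip())
    except ValueError:
        pass  # the first line was not an integer
    # f is still bound here, and the with block has already closed it.
    return value, f.closed


with tempfile.TemporaryDirectory() as tmpdir:
    good = os.path.join(tmpdir, "good.txt")  # hypothetical file names
    bad = os.path.join(tmpdir, "bad.txt")
    with open(good, "w") as out:
        out.write("42\n")
    with open(bad, "w") as out:
        out.write("not a number\n")
    good_result = first_line_as_int(good)
    bad_result = first_line_as_int(bad)

print(good_result, bad_result)
```

The second call fails inside the with block, yet the file is reported as closed in both cases. If the file cannot be opened at all, the `OSError` is not caught here and simply propagates.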
How can we persist and restore an arbitrary Python object?
o = {'k1':(1,'Name', 88.5, [1,2])}
The pickle module implements binary protocols for serializing and de-serializing a Python object structure.
Pickling: convert an object hierarchy into a byte stream
Unpickling: convert a byte stream into an object hierarchy
import pickle
o = {'k1':(1,'Name', 88.5, [1,2])}
os = pickle.dumps(o)
print(os)
o2 = pickle.loads(os)
print(o2)
print(o == o2)
print(o is o2)
pickling an object into a file
import pickle
o = {'k1':(1,'Name', 88.5, [1,2])}
with open('pickle-example.p', 'wb') as pfile:
    pickle.dump(o, pfile)
import pickle
with open('pickle-example.p', 'rb') as pfile:
    objs = pickle.load(pfile)
    print(objs)
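A single file can hold more than one pickled object: each call to `pickle.dump` appends one object to the stream, and each call to `pickle.load` reads the next one back. Here is a minimal sketch, with a made-up file name and a temporary directory so that nothing is left behind.

```python
import os
import pickle
import tempfile

first = {'k1': (1, 'Name', 88.5, [1, 2])}
second = ["a", "list", "of", "strings"]

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "two-objects.p")  # hypothetical file name

    with open(path, "wb") as pfile:
        pickle.dump(first, pfile)
        pickle.dump(second, pfile)

    with open(path, "rb") as pfile:
        # Objects come back in the order in which they were dumped.
        restored_first = pickle.load(pfile)
        restored_second = pickle.load(pfile)

print(restored_first == first, restored_second == second)
```

Unpickling can execute arbitrary code, so only load pickle files that come from a source you trust.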
Limitation of pickle
Although pickle reads and writes file objects, it does not handle the issue of naming persistent objects, nor the (even more complicated) issue of concurrent access to persistent objects.
The shelve module provides a simple interface to pickle and unpickle objects on DBM-style database files.
A shelf is a persistent, dictionary-like object.
Summary of the shelf interface
import shelve
d = shelve.open(filename) # open -- file may get suffix added by low-level
# library
d[key] = data # store data at key (overwrites old data if
# using an existing key)
data = d[key] # retrieve a COPY of data at key (raise KeyError
# if no such key)
del d[key] # delete data stored at key (raises KeyError
# if no such key)
flag = key in d # true if the key exists
klist = list(d.keys()) # a list of all existing keys (slow!)
# as d was opened WITHOUT writeback=True, beware:
d['xx'] = [0, 1, 2] # this works as expected, but...
d['xx'].append(3) # *this doesn't!* -- d['xx'] is STILL [0, 1, 2]!
# having opened d without writeback=True, you need to code carefully:
temp = d['xx'] # extracts the copy
temp.append(5) # mutates the copy
d['xx'] = temp # stores the copy right back, to persist it
# or, d=shelve.open(filename,writeback=True) would let you just code
# d['xx'].append(5) and have it work as expected, BUT it would also
# consume more memory and make the d.close() operation slower.
d.close() # close it
import shelve
with shelve.open('shelf-example', 'c') as shelf:
    shelf['list'] = [1,2,3,[4]]
    shelf['int'] = 100
    shelf['str'] = 'string'
    shelf['tuple'] = (1.0,2.0)
    shelf['dict'] = {'a':1, 'b':2}
import shelve
with shelve.open('shelf-example', 'r') as shelf:
    for key in shelf.keys():
        print(key, shelf[key])
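The warning in the interface summary about mutating stored values is easy to try out. The sketch below opens a shelf without `writeback=True`, shows that an in-place `append` is lost, and then persists the change by storing the modified copy back. The shelf name `shelf-demo` is hypothetical and the shelf is created in a temporary directory.

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    name = os.path.join(tmpdir, "shelf-demo")  # hypothetical shelf name

    with shelve.open(name) as shelf:   # writeback is False by default
        shelf["xx"] = [0, 1, 2]
        shelf["xx"].append(3)          # mutates a temporary copy only
        after_append = shelf["xx"]

        temp = shelf["xx"]             # extract a copy
        temp.append(3)                 # mutate the copy
        shelf["xx"] = temp             # store it back to persist the change

    # Reopen read-only to confirm what was actually saved.
    with shelve.open(name, "r") as shelf:
        after_reassign = shelf["xx"]

print(after_append, after_reassign)
```

Every lookup such as `shelf["xx"]` unpickles a fresh copy of the stored value, which is why the first `append` has no lasting effect.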
open
You must open a file before you can read its contents.
close
When you are done with a file, you should close it.
read
Will read the entire contents of a file as a string. This is often used in an assignment statement so that a variable can reference the contents of the file.
readline
Will read a single line from the file, up to and including the first instance of the newline character.
readlines
Will read the entire contents of a file into a list where each line of the file is a string and is an element in the list.
write
Will add characters to the end of a file that has been opened for writing.
exception
An error that occurs at runtime.
handle an exception
To prevent an exception from crashing our program by wrapping the block of code in a try ... except construct.