11. Files, Exceptions

About Files

  • While a program is running, its data is stored in random access memory (RAM). RAM is volatile: its contents are lost when the power goes off, and a program's data in RAM is discarded when the program ends.

  • To make data available the next time the program is started, it has to be written to a non-volatile storage medium.

  • Data on non-volatile storage media is stored in named locations on the media called files. By reading and writing files, programs can save information between program runs.

Working with files

In Python, we must open files before we can use them and close them when we are done with them.

Method | Use | Explanation
open | open(filename,'r') | Open a file called filename and use it for reading. This will return a reference to a file object.
open | open(filename,'w') | Open a file called filename and use it for writing. This will also return a reference to a file object.
close | filevariable.close() | File use is complete.

Reading & writing methods

Method | Use | Explanation
write | filevar.write(astring) | Add astring to the end of the file. filevar must refer to a file that has been opened for writing.
read(n) | filevar.read() | Reads and returns a string of n characters, or the entire file as a single string if n is not provided.
readline(n) | filevar.readline() | Returns the next line of the file with all text up to and including the newline character. If n is provided as a parameter, then only n characters will be returned if the line is longer than n.
readlines(n) | filevar.readlines() | Returns a list of strings, each representing a single line of the file. If n is not provided, then all lines of the file are returned. If n is provided, then n characters are read, but n is rounded up so that an entire line is returned.
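
As a rough illustration of the methods in the table, the sketch below writes a three-line file into a temporary directory and then tries each reading method in turn. The file name demo.txt and its contents are made up for this example.

```python
import os
import tempfile

# Hypothetical file used only for this demonstration.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "demo.txt")

f = open(path, "w")
f.write("alpha\n")
f.write("beta\n")
f.write("gamma\n")
f.close()

f = open(path, "r")
first_three = f.read(3)       # the first 3 characters of the file
rest_of_line = f.readline()   # the rest of the current line, newline included
remaining = f.readlines()     # every line that has not been read yet
at_end = f.read()             # an empty string once the end is reached
f.close()

print(repr(first_three), repr(rest_of_line), remaining, repr(at_end))
```

Each call continues from where the previous one stopped, which is why readlines returns only the last two lines here.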

Writing to our first file

In [ ]:
myfile = open("test.txt", "w")
myfile.write("My first file written from Python\n")
myfile.write("---------------------------------\n")
myfile.write("Hello, world!\n")
myfile.close()
In [ ]:
%%bash
ls
cat test.txt

Finding a File on your Disk

myfile = open("test.txt", "w")

myfile = open("subdir/test.txt", "w")

myfile = open("../../chap2/test.txt", "w")
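
The three calls above use relative paths, which Python resolves against the current working directory. The sketch below only computes where each path would point, using os.getcwd and os.path.abspath; it does not open or create anything.

```python
import os

cwd = os.getcwd()  # the directory that relative paths are resolved against
print("Current working directory:", cwd)

for relative in ["test.txt", "subdir/test.txt", "../../chap2/test.txt"]:
    # abspath joins the path with the working directory and collapses ".."
    print(relative, "->", os.path.abspath(relative))

same_dir = os.path.abspath("test.txt")
parent_dir = os.path.abspath("../../chap2/test.txt")
```

Note that opening a file with "w" creates the file if needed, but not any missing directories, so the second and third calls fail unless subdir and chap2 already exist.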

Reading a File

Suppose we have a text file called qbdata.txt that contains the following data representing statistics about NFL quarterbacks.

data format:

First Name, Last Name, Position, Team, Completions, Attempts, Yards, TDs, Ints, Comp%, Rating

Colt McCoy QB CLE  135 222 1576    6   9   60.8%   74.5
Josh Freeman QB TB 291 474 3451    25  6   61.4%   95.9
Michael Vick QB PHI    233 372 3018    21  6   62.6%   100.2
Matt Schaub QB HOU 365 574 4370    24  12  63.6%   92.0
Philip Rivers QB SD    357 541 4710    30  13  66.0%   101.8
Matt Hasselbeck QB SEA 266 444 3001    12  17  59.9%   73.2
Jimmy Clausen QB CAR   157 299 1558    3   9   52.5%   58.4
Joe Flacco QB BAL  306 489 3622    25  10  62.6%   93.6
Kyle Orton QB DEN  293 498 3653    20  9   58.8%   87.5
Jason Campbell QB OAK  194 329 2387    13  8   59.0%   84.5
Peyton Manning QB IND  450 679 4700    33  17  66.3%   91.9
Drew Brees QB NO   448 658 4620    33  22  68.1%   90.9
Matt Ryan QB ATL   357 571 3705    28  9   62.5%   91.0
Matt Cassel QB KC  262 450 3116    27  7   58.2%   93.0
Mark Sanchez QB NYJ    278 507 3291    17  13  54.8%   75.3
Brett Favre QB MIN 217 358 2509    11  19  60.6%   69.9
David Garrard QB JAC   236 366 2734    23  15  64.5%   90.8
Eli Manning QB NYG 339 539 4002    31  25  62.9%   85.3
Carson Palmer QB CIN   362 586 3970    26  20  61.8%   82.4
Alex Smith QB SF   204 342 2370    14  10  59.6%   82.1
Chad Henne QB MIA  301 490 3301    15  19  61.4%   75.4
Tony Romo QB DAL   148 213 1605    11  7   69.5%   94.9
Jay Cutler QB CHI  261 432 3274    23  16  60.4%   86.3
Jon Kitna QB DAL   209 318 2365    16  12  65.7%   88.9
Tom Brady QB NE    324 492 3900    36  4   65.9%   111.0
Ben Roethlisberger QB PIT  240 389 3200    17  5   61.7%   97.0
Kerry Collins QB TEN   160 278 1823    14  8   57.6%   82.2
Derek Anderson QB ARI  169 327 2065    7   10  51.7%   65.9
Ryan Fitzpatrick QB BUF    255 441 3000    23  15  57.8%   81.8
Donovan McNabb QB WAS  275 472 3377    14  15  58.3%   77.1
Kevin Kolb QB PHI  115 189 1197    7   7   60.8%   76.1
Aaron Rodgers QB GB    312 475 3922    28  11  65.7%   101.2
Sam Bradford QB STL    354 590 3512    18  15  60.0%   76.5
Shaun Hill QB DET  257 416 2686    16  12  61.8%   81.3

Now create qbdata.txt

In [ ]:
%%bash
cat << _EOF > qbdata.txt
Colt McCoy QB CLE  135 222 1576    6   9   60.8%   74.5
Josh Freeman QB TB 291 474 3451    25  6   61.4%   95.9
Michael Vick QB PHI    233 372 3018    21  6   62.6%   100.2
Matt Schaub QB HOU 365 574 4370    24  12  63.6%   92.0
Philip Rivers QB SD    357 541 4710    30  13  66.0%   101.8
Matt Hasselbeck QB SEA 266 444 3001    12  17  59.9%   73.2
Jimmy Clausen QB CAR   157 299 1558    3   9   52.5%   58.4
Joe Flacco QB BAL  306 489 3622    25  10  62.6%   93.6
Kyle Orton QB DEN  293 498 3653    20  9   58.8%   87.5
Jason Campbell QB OAK  194 329 2387    13  8   59.0%   84.5
Peyton Manning QB IND  450 679 4700    33  17  66.3%   91.9
Drew Brees QB NO   448 658 4620    33  22  68.1%   90.9
Matt Ryan QB ATL   357 571 3705    28  9   62.5%   91.0
Matt Cassel QB KC  262 450 3116    27  7   58.2%   93.0
Mark Sanchez QB NYJ    278 507 3291    17  13  54.8%   75.3
Brett Favre QB MIN 217 358 2509    11  19  60.6%   69.9
David Garrard QB JAC   236 366 2734    23  15  64.5%   90.8
Eli Manning QB NYG 339 539 4002    31  25  62.9%   85.3
Carson Palmer QB CIN   362 586 3970    26  20  61.8%   82.4
Alex Smith QB SF   204 342 2370    14  10  59.6%   82.1
Chad Henne QB MIA  301 490 3301    15  19  61.4%   75.4
Tony Romo QB DAL   148 213 1605    11  7   69.5%   94.9
Jay Cutler QB CHI  261 432 3274    23  16  60.4%   86.3
Jon Kitna QB DAL   209 318 2365    16  12  65.7%   88.9
Tom Brady QB NE    324 492 3900    36  4   65.9%   111.0
Ben Roethlisberger QB PIT  240 389 3200    17  5   61.7%   97.0
Kerry Collins QB TEN   160 278 1823    14  8   57.6%   82.2
Derek Anderson QB ARI  169 327 2065    7   10  51.7%   65.9
Ryan Fitzpatrick QB BUF    255 441 3000    23  15  57.8%   81.8
Donovan McNabb QB WAS  275 472 3377    14  15  58.3%   77.1
Kevin Kolb QB PHI  115 189 1197    7   7   60.8%   76.1
Aaron Rodgers QB GB    312 475 3922    28  11  65.7%   101.2
Sam Bradford QB STL    354 590 3512    18  15  60.0%   76.5
Shaun Hill QB DET  257 416 2686    16  12  61.8%   81.3
_EOF

Iterating over lines in a file

In [ ]:
qbfile = open("qbdata.txt", "r")

for aline in qbfile:
    values = aline.split()
    print('QB ', values[0], values[1], 
          'had a rating of ', values[10] )

qbfile.close()

Alternative File Reading Methods

In [ ]:
infile = open("qbdata.txt", "r")
aline = infile.readline()       # read only the first line
print(aline)
infile.close()

infile = open("qbdata.txt", "r")
linelist = infile.readlines()   # read all lines into a list
print(len(linelist))
print(linelist[0:4])
infile.close()
In [ ]:
infile = open("qbdata.txt", "r")
line = infile.readline()
while line:
    values = line.split()
    print('QB ', values[0], values[1], 
          'had a rating of ', values[10] )
    line = infile.readline()

infile.close()
In [ ]:
infile = open("qbdata.txt", "r")
while True:
    line = infile.readline()
    if len(line) == 0:   # If there are no more lines
        break
        
    values = line.split()
    print('QB ', values[0], values[1], 'had a rating of ', values[10] )

infile.close()

Turning a file into a list of lines

The readlines method reads all the lines of a file and returns them as a list of strings.

Alert:
readlines loads the whole file into RAM at once, so the file must be small enough to fit in memory.
In [ ]:
infile = open("qbdata.txt", "r")
lines = infile.readlines()
infile.close()

for line in lines:
    values = line.split()
    print('QB ', values[0], values[1], 
          'had a rating of ', values[10] )
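
When a file may be too large for readlines, iterating over the file object keeps only one line in memory at a time. The sketch below is a minimal illustration; the file big.txt and its 1000 generated lines are assumptions made up for this example.

```python
import os
import tempfile

# Hypothetical data file, generated here so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "big.txt")
out = open(path, "w")
for i in range(1000):
    out.write("line {0}\n".format(i))
out.close()

count = 0
longest = 0
infile = open(path, "r")
for line in infile:              # only the current line is held in memory
    count += 1
    longest = max(longest, len(line))
infile.close()

print(count, "lines; the longest has", longest, "characters")
```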

Reading the whole file at once

In [ ]:
infile = open("qbdata.txt", "r")
content = infile.read()
infile.close()

words = content.split()
print("There are {0} words in the file.".
      format(len(words)))

Writing Text Files

One of the most commonly performed data processing tasks is to read data from a file, manipulate it in some way, and then write the resulting data out to a new data file to be used for other purposes later.

Now we will save QB Names to a new file.

In [ ]:
infile = open("qbdata.txt", "r")
aline = infile.readline()
while aline:
    items = aline.split()
    dataline = items[1] + ',' + items[0]
    print(dataline)
    aline = infile.readline()

infile.close()
In [ ]:
infile = open("qbdata.txt", "r")
outfile = open("qbnames.txt", "w")

aline = infile.readline()
while aline:
    items = aline.split()
    dataline = items[1] + ',' + items[0]
    outfile.write(dataline + '\n')
    aline = infile.readline()

infile.close()
outfile.close()
In [ ]:
%%bash

cat qbnames.txt
Note:
When writing data to files, it is the programmer's job to include the newline characters as part of the string if they are wanted.
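
A small sketch of what this means in practice: write adds nothing to the string it is given, whereas print with its file parameter appends a newline by default. The file name newline_demo.txt is made up for this example.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "newline_demo.txt")  # hypothetical file

out = open(path, "w")
out.write("one")            # no newline is added automatically
out.write("two\n")          # the explicit newline ends the first line
print("three", file=out)    # print() appends a newline by default
out.close()

src = open(path, "r")
lines = src.readlines()
src.close()

print(lines)
```

The first two writes end up on the same line because only the second string contains a newline.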

Fetching something from the web

In [ ]:
import urllib.request

url = "https://caodg.github.io/ic/demos/foobar.py"
destination_filename = "foobar.py"

urllib.request.urlretrieve(url, destination_filename)
In [ ]:
%%bash
ls foo*

A slightly different example

Rather than save the web resource to our local disk, we read it directly into a string, and return it:

In [ ]:
import urllib.request

def retrieve_page(url):
    """ Retrieve the contents (bytes) of a web page.
        The contents are decoded to a string before being returned.
    """
    my_socket = urllib.request.urlopen(url)
    dta = my_socket.read().decode()
    my_socket.close()
    return dta

the_text = retrieve_page("https://caodg.github.io/ic/demos/foobar.py")
print(the_text)

Opening a non-existent file

In [ ]:
file = open('notexistfile')

Other exceptions

In [ ]:
print(55/0)
In [ ]:
a = []
print(a[5])
In [ ]:
'a'[0] = 'b'

Handle exceptions

Sometimes we want to execute an operation that might cause an exception, but we don’t want the program to stop. We can handle the exception using the try statement to “wrap” a region of code.

In [ ]:
filename = input("Enter a file name: ")
try:
    f = open(filename, "r")
except:
    print("There is no file named", filename)
In [ ]:
filename = input("Enter a file name: ")
try:
    f = open(filename, "r")
except Exception: # general exception
    print("There is no file named", filename)
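
The handlers above print the same message whatever went wrong. As a rough sketch of why that can mislead, the code below runs the failing operations from the earlier cells and records the name of the exception each one raises. The helper exception_name and the small wrapper functions are hypothetical names introduced only for this illustration.

```python
import os
import tempfile

def exception_name(action):
    """Run action() and return the name of the exception it raises, or None."""
    try:
        action()
    except Exception as err:
        return type(err).__name__
    return None

# A path inside a fresh temporary directory, so the file cannot exist.
missing_path = os.path.join(tempfile.mkdtemp(), "notexistfile")

def open_missing():
    open(missing_path)

def divide_by_zero():
    return 55 / 0

def index_empty_list():
    a = []
    return a[5]

def modify_string():
    s = 'a'
    s[0] = 'b'      # strings are immutable

names = [exception_name(f) for f in
         (open_missing, divide_by_zero, index_empty_list, modify_string)]
print(names)
```

Because each failure has its own exception type, a handler can name the type it expects and let everything else propagate.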

Handling selected exceptions

In [ ]:
while True:
    try:
        x = int(input("Please enter a number: "))
        break
    except ValueError:
        print("Oops!  That was no valid number.  Try again...")
In [ ]:
while True:
    try:
        x = int(input("Please enter a number: "))
        break
    except ValueError:
        print("Oops!  That was no valid number.  Try again...")
    except KeyboardInterrupt:
        print("Oops!  Keyboard Interrupted")
In [3]:
import sys

try:
    f = open('foobar.py')
    s = f.readline()
    i = int(s.strip())
except OSError as err:
    print("OS error: {0}".format(err))
except ValueError as err:
    print("Could not convert data to an integer: {0}".format(err))
except:
    print("Unexpected error:", sys.exc_info()[0])
    raise
Could not convert data to an integer: invalid literal for int() with base 10: '#!/usr/bin/env python3'

else clause

The try ... except statement has an optional else clause, which, when present, must follow all except clauses.

It is useful for code that must be executed if the try clause does not raise an exception.

In [6]:
filename = input("Enter a file name: ")
try:
    f = open(filename, "r")
except:
    print("There is no file named", filename)
else:
    print(filename, 'has', len(f.readlines()), 'lines')
    f.close()
Enter a file name: foobar.py
foobar.py has 26 lines

The finally clause of the try statement

A common programming pattern is to grab a resource of some kind — e.g. we may open a file for writing. Then we perform some computation which may raise an exception, or may work without any problems.

Whatever happens, we want to “clean up” the resources we grabbed — e.g. close the file.

The finally clause of the try statement is the way to do just this.

In [7]:
def divide(x, y):
    try:
        result = x / y
    except ZeroDivisionError:
        print("division by zero!")
    else:
        print("result is", result)
    finally:
        print("executing finally clause")

divide(10,0)
divide(10,1)
division by zero!
executing finally clause
result is 10.0
executing finally clause
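
The divide example shows the order of execution. The resource clean-up pattern described at the start of this section can be sketched with a file instead. The function write_ratio, the list opened, and the file name log.txt are assumptions made up for this example.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "log.txt")  # hypothetical file name
opened = []   # keeps every file object so we can inspect it afterwards

def write_ratio(x, y):
    """Write x / y to the file; the file is closed even if the division fails."""
    f = open(path, "w")
    opened.append(f)
    try:
        f.write(str(x / y) + "\n")
    finally:
        f.close()     # runs on success and on ZeroDivisionError alike

write_ratio(10, 2)

try:
    write_ratio(10, 0)
except ZeroDivisionError:
    print("division failed, but the file was still closed")

print([f.closed for f in opened])
```

There is no except clause inside write_ratio, so the exception still reaches the caller; finally only guarantees that the clean-up happens first.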

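Correct way to open a file (but it looks messy)

try:
    f = open('xxx')
except OSError as err:
    print("Could not open file:", err)
else:
    try:
        data = f.read()          # work with the open file
    except OSError as err:
        print("Could not read file:", err)
    finally:
        f.close()                # always close a file that was opened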

with statement

The elegant way to open a file, do something, then automatically close the file:

try:
    with open("data.txt") as f:
        data = f.read()
        print(len(data), "characters read")
except Exception as err:
    print("error: {0}".format(err))
In [8]:
try:
    with open( "foobar.py" ) as f :
        s = f.readline()
        i = int(s.strip())
except OSError as err :
    print("OS error: {0}".format(err))
except Exception as err:
    print(err)
invalid literal for int() with base 10: '#!/usr/bin/env python3'
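
To check that the with statement really closes the file when an exception escapes its body, the sketch below inspects the closed attribute of the file object afterwards. The file with_demo.txt and its contents are made up for this example.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "with_demo.txt")  # hypothetical file

with open(path, "w") as out:
    out.write("not a number\n")

message = None
try:
    with open(path) as f:
        value = int(f.readline().strip())   # raises ValueError
except ValueError as err:
    message = str(err)
    print("error:", message)

# The name f is still bound after the with block, but the file is closed.
print(f.closed)
```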

Object persistence

How can we persist and restore an arbitrary Python object?

o = {'k1':(1,'Name', 88.5, [1,2])}

pickle — Python object serialization

The pickle module implements binary protocols for serializing and de-serializing a Python object structure.

Pickling: convert an object hierarchy into a byte stream

Unpickling: convert a byte stream into an object hierarchy

In [9]:
import pickle

o = {'k1':(1,'Name', 88.5, [1,2])}

pickled = pickle.dumps(o)   # not named os, which would shadow the standard os module

print(pickled)
b'\x80\x03}q\x00X\x02\x00\x00\x00k1q\x01(K\x01X\x04\x00\x00\x00Nameq\x02G@V \x00\x00\x00\x00\x00]q\x03(K\x01K\x02etq\x04s.'
In [10]:
o2 = pickle.loads(pickled)

print(o2)
print(o == o2)
print(o is o2)
{'k1': (1, 'Name', 88.5, [1, 2])}
True
False

Pickling an object into a file

In [11]:
import pickle

o = {'k1':(1,'Name', 88.5, [1,2])}

with open('pickle-example.p', 'wb') as pfile:
    pickle.dump(o, pfile)
In [12]:
import pickle

with open('pickle-example.p', 'rb') as pfile:
    objs = pickle.load(pfile)
    print(objs)
{'k1': (1, 'Name', 88.5, [1, 2])}
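
A pickle file is not limited to a single object: each call to pickle.dump appends one pickled object, and each call to pickle.load reads the next one back. The sketch below assumes a made-up file name, multi-example.p, and two made-up objects.

```python
import os
import pickle
import tempfile

path = os.path.join(tempfile.mkdtemp(), "multi-example.p")  # hypothetical file

first = {'k1': (1, 'Name', 88.5, [1, 2])}
second = ['a', 'list', 'of', 'strings']

with open(path, 'wb') as pfile:
    pickle.dump(first, pfile)     # objects are written one after another
    pickle.dump(second, pfile)

with open(path, 'rb') as pfile:
    restored_first = pickle.load(pfile)    # objects come back in the same order
    restored_second = pickle.load(pfile)

print(restored_first)
print(restored_second)
```

The objects can only be told apart by the order in which they were written, since the file stores no names for them.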

Limitation of pickle

Although pickle reads and writes file objects, it does not handle the issue of naming persistent objects, nor the (even more complicated) issue of concurrent access to persistent objects.

shelve — Python object persistence

The shelve module provides a simple interface to pickle and unpickle objects on DBM-style database files.

A shelf is a persistent, dictionary-like object.

Shelf interface summary

import shelve

d = shelve.open(filename)  # open -- file may get suffix added by low-level
                           # library

d[key] = data              # store data at key (overwrites old data if
                           # using an existing key)
data = d[key]              # retrieve a COPY of data at key (raise KeyError
                           # if no such key)
del d[key]                 # delete data stored at key (raises KeyError
                           # if no such key)

flag = key in d            # true if the key exists
klist = list(d.keys())     # a list of all existing keys (slow!)

# as d was opened WITHOUT writeback=True, beware:
d['xx'] = [0, 1, 2]        # this works as expected, but...
d['xx'].append(3)          # *this doesn't!* -- d['xx'] is STILL [0, 1, 2]!

# having opened d without writeback=True, you need to code carefully:
temp = d['xx']             # extracts the copy
temp.append(5)             # mutates the copy
d['xx'] = temp             # stores the copy right back, to persist it

# or, d=shelve.open(filename,writeback=True) would let you just code
# d['xx'].append(5) and have it work as expected, BUT it would also
# consume more memory and make the d.close() operation slower.

d.close()                  # close it
In [13]:
import shelve

with shelve.open('shelf-example', 'c') as shelf:
    shelf['list'] = [1,2,3,[4]]
    shelf['int'] = 100
    shelf['str'] = 'string'
    shelf['tuple'] =(1.0,2.0)
    shelf['dict'] = {'a':1, 'b':2}
In [14]:
import shelve

with shelve.open('shelf-example', 'r') as shelf:
    for key in shelf.keys():
        print(key, shelf[key])
list [1, 2, 3, [4]]
tuple (1.0, 2.0)
dict {'a': 1, 'b': 2}
int 100
str string
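
The writeback caveat from the interface summary can be checked with a short experiment. This is a sketch only; the shelf name writeback-demo and the key 'xx' are chosen for illustration, and the shelf lives in a temporary directory.

```python
import os
import shelve
import tempfile

filename = os.path.join(tempfile.mkdtemp(), "writeback-demo")  # hypothetical shelf

with shelve.open(filename) as shelf:              # default: writeback=False
    shelf['xx'] = [0, 1, 2]
    shelf['xx'].append(3)                         # mutates a temporary copy only
    without_writeback = shelf['xx']

with shelve.open(filename, writeback=True) as shelf:
    shelf['xx'].append(3)                         # cached entry is saved on close

with shelve.open(filename, 'r') as shelf:
    with_writeback = shelf['xx']

print(without_writeback)
print(with_writeback)
```

With writeback=True every entry that was accessed is cached and written back when the shelf is closed, which is convenient but uses more memory.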

Glossary

open

You must open a file before you can read its contents.

close

When you are done with a file, you should close it.

read

Will read the entire contents of a file as a string. This is often used in an assignment statement so that a variable can reference the contents of the file.

readline

Will read a single line from the file, up to and including the first instance of the newline character.

readlines

Will read the entire contents of a file into a list where each line of the file is a string and is an element in the list.

write

Will add characters to the end of a file that has been opened for writing.

exception

An error that occurs at runtime.

handle an exception

To prevent an exception from crashing the program by wrapping the code that may raise it in a try ... except construct.
