9. Strings

A Collection Data Type

  • Simple or primitive data types: int, float, bool

    • cannot be broken down
  • Collection data types: str, list

    • are made up of smaller pieces
Types that are comprised of smaller pieces are called collection data types.

String: sequential collection of characters

The individual characters that make up the string are assumed to be in a particular order from left to right.

Empty string:

''

or

""

Operations on Strings

Illegal string operations (message is a str):

message - 1
"Hello" / 123
message * "Hello"
"15" + 2

Legal string operations: string + string (concatenation) and string * integer (repetition)
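
The four illegal expressions above all fail with a TypeError, while + between two strings and * between a string and an integer succeed. A minimal sketch that checks this, assuming message holds some string (the value below is only for illustration):

```python
message = "Hello"  # assumed value for illustration; any str behaves the same

# Each lambda wraps one illegal expression so it is only evaluated inside try.
illegal_ops = {
    'message - 1': lambda: message - 1,
    '"Hello" / 123': lambda: "Hello" / 123,
    'message * "Hello"': lambda: message * "Hello",
    '"15" + 2': lambda: "15" + 2,
}

results = {}
for label, op in illegal_ops.items():
    try:
        op()
        results[label] = "no error"
    except TypeError as err:
        results[label] = "TypeError"
        print(label, "->", type(err).__name__)

# The two legal forms: concatenation and repetition.
legal_concat = message + " there"
legal_repeat = message * 2
print(legal_concat, legal_repeat)
```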

In [ ]:
fruit = "banana"
bakedGood = " nut bread"
print(fruit + bakedGood)
In [ ]:
print("Go" * 6)

name = "Packers"
print(name * 3)

print(name + "Go" * 3)

print((name + "Go") * 3)

Exercise

1. What is printed by the following statements?
s = "python"
t = "rocks"
print(s + t)
(A) python rocks
(B) python
(C) pythonrocks
(D) Error, you cannot add two strings together.

2. What is printed by the following statements?

s = "python"
excl = "!"
print(s+excl*3)
(A) python!!!
(B) python!python!python!
(C) pythonpythonpython!
(D) Error, you cannot perform concatenation and repetition at the same time.

Index Operator

The indexing operator (Python uses square brackets [] to enclose the index) selects a single character from a string.

The characters are accessed by their position or index value.

In [ ]:
school = "Peking University"
m = school[2]
print(m)

lastchar = school[-1]
print(lastchar)

Exercise

1. What is printed by the following statements?
s = "python rocks"
print(s[3])
(A) t
(B) h
(C) c
(D) Error, you cannot use the [ ] operator with a string.
2. What is printed by the following statements?
s = "python rocks"
print(s[2] + s[-5])
(A) tr
(B) ps
(C) nn
(D) Error, you cannot use the [ ] operator with the + operator.

String Methods

In [ ]:
ss = "Hello, World"
print(ss.upper())

tt = ss.lower()
print(tt)

Method      Parameters  Description
upper       none        Returns a string in all uppercase
lower       none        Returns a string in all lowercase
capitalize  none        Returns a string with the first character capitalized
strip       none        Returns a string with the leading and trailing whitespace removed
lstrip      none        Returns a string with the leading whitespace removed
rstrip      none        Returns a string with the trailing whitespace removed
count       item        Returns the number of occurrences of item
replace     old, new    Returns a string with all occurrences of old substring replaced by new
center      width       Returns a string centered in a field of width spaces
ljust       width       Returns a string left justified in a field of width spaces
rjust       width       Returns a string right justified in a field of width spaces
find        item        Returns the leftmost index where item is found, or -1 if not found
rfind       item        Returns the rightmost index where item is found, or -1 if not found
index       item        Like find except causes a runtime error if item is not found
rindex      item        Like rfind except causes a runtime error if item is not found
In [ ]:
ss = "    Hello, World    "

els = ss.count("l")
print(els)

print("***" + ss.strip() + "***")
print("***" + ss.lstrip() + "***")
print("***" + ss.rstrip() + "***")

news = ss.replace("o", "***")
print(news)
In [ ]:
food = "banana bread"
print(food.capitalize())
print("*" + food.center(25) + "*")
print("*" + food.ljust(25) + "*")
print("*" + food.rjust(25) + "*")
print(food.find("b"))
print(food.rfind("b"))
print(food.index("z"))
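
The last call above stops with a ValueError because "z" does not occur in food. A small sketch of the difference between find and index when the item is missing; the variable names pos and pos2 are only for illustration:

```python
food = "banana bread"

# find signals "not found" with -1, so no exception handling is needed.
pos = food.find("z")
print(pos)

# index raises ValueError instead, so it has to be guarded with try/except.
try:
    pos2 = food.index("z")
except ValueError:
    pos2 = None
    print("'z' does not occur in", repr(food))
```

find is convenient when a missing item is an ordinary case to test for; index is convenient when a missing item should stop the program.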

Exercise

1. What is printed by the following statements?
s = "python rocks"
print(s.count("o") + s.count("p"))
(A) 0
(B) 2
(C) 3
2. What is printed by the following statements?
s = "python rocks"
print(s[1] * s.index("n"))
(A) yyyyy
(B) 55555
(C) n
(D) Error, you cannot combine all those things together.

Length

In [ ]:
fruit = "Banana"
print(len(fruit))
In [ ]:
fruit = "Banana"
sz = len(fruit)
last = fruit[sz]  # ERROR!
print(last)
In [ ]:
fruit = "Banana"
sz = len(fruit)
lastch = fruit[sz-1]
print(lastch)
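
Valid indices run from 0 to len(fruit) - 1, which is why fruit[sz] fails and fruit[sz-1] works. As a hedged sketch, the same last character can also be reached with the negative index -1 shown earlier; the variable names are only for illustration:

```python
fruit = "Banana"
sz = len(fruit)

# Two ways to reach the last character.
last_by_length = fruit[sz - 1]
last_by_negative = fruit[-1]   # -1 counts from the right-hand end
print(last_by_length, last_by_negative)

# Index sz itself is one past the end and raises IndexError.
try:
    fruit[sz]
    out_of_range = False
except IndexError:
    out_of_range = True
print(out_of_range)
```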

Exercise

1. What is printed by the following statements?
s = "python rocks"
print(len(s))
(A) 11
(B) 12
2. What is printed by the following statements?
s = "python rocks"
print(s[len(s)-5])
(A) o
(B) r
(C) s
(D) Error, len(s) is 12 and there is no index 12.

The Slice Operator

A substring of a string is called a slice.

In [ ]:
singers = "Peter, Paul, and Mary"
print(singers[0:5])
print(singers[7:11])
print(singers[17:21])
In [ ]:
fruit = "banana"
print(fruit[:3])
print(fruit[3:])

Exercise

1. What is printed by the following statements?
s = "python rocks"
print(s[3:8])
(A) python
(B) rocks
(C) hon r
(D) Error, you cannot have two numbers inside the [ ].
2. Output a given string in reverse order. E.g., s = 'boy', output 'yob'

String Comparison

In [ ]:
word = "banana"
if word == "banana":
    print("Yes, we have bananas!")
else:
    print("Yes, we have NO bananas!")
In [ ]:
word = "zebra"

if word < "banana":
    print("Your word, " + word + 
          ", comes before banana.")
elif word > "banana":
    print("Your word, " + word + 
          ", comes after banana.")
else:
    print("Yes, we have no bananas!")
In [ ]:
print("apple" < "banana")

print("apple" == "Apple")
print("apple" < "Apple")
In [ ]:
print(ord("A"))
print(ord("B"))
print(ord("5"))

print(ord("a"))
print("apple" > "Apple")
In [ ]:
print(chr(65))
print(chr(66))

print(chr(49))
print(chr(53))

print("The character for 32 is", chr(32), "!!!")
print(ord(" "))

Exercise

1. Evaluate the following comparison:
"Dog" < "Doghouse"
(A) True
(B) False
2. Evaluate the following comparison:
"dog" < "Dog"
(A) True
(B) False
(C) They are the same word
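3. Two children play a military chess match. Each side picks any three pieces from army commander, division commander, brigade commander, regiment commander, battalion commander, company commander, and platoon leader (listed from highest to lowest rank) and plays three rounds. In each round the pieces are compared in order by rank: a win scores 3 points, a draw 1 point, and a loss 0 points, and the side with the higher three-round total wins. For example, ['platoon leader', 'army commander', 'company commander'] < ['company commander', 'division commander', 'brigade commander']. Write a Python program that implements this.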

Strings are Immutable

In [ ]:
greeting = "Hello, world!"
greeting[0] = 'J'            # ERROR!
print(greeting)

Strings are immutable, which means you cannot change an existing string.

The best you can do is create a new string that is a variation on the original.

In [ ]:
greeting = "Hello, world!"
newGreeting = 'J' + greeting[1:]
print(newGreeting)
print(greeting)            # same as it was
In [ ]:
s='hello'
t=s.replace('l','g')
print(s)
print(t)

Exercise

What is printed by the following statements:
s = "Ball"
s[0] = "C"
print(s)
(A) Ball
(B) Call
(C) Error

Traversal by item

A lot of computations involve processing a collection one item at a time.

For strings, we often start at the beginning, select each character in turn, do something to it, and continue until the end.

This pattern of processing is called a traversal.

In [ ]:
for aname in ["Joe", "Amy", "Brad", "Angelina", 
              "Zuki", "Thandi", "Paris"]:
    invitation = "Hi " + aname + \
        ".  Please come to my party on Saturday!"
    print(invitation)
In [ ]:
for avalue in range(10):
    print(avalue)
In [ ]:
for achar in "Go Spot Go":
    print(achar)

Exercise

1. How many times is the word HELLO printed by the following statements?
s = "python rocks"
for ch in s:
    print("HELLO")
(A) 10
(B) 11
(C) 12
(D) Error, the for statement needs to use the range function.
2. How many times is the word HELLO printed by the following statements?
s = "python rocks"
for ch in s[3:8]:
    print("HELLO")
(A) 4
(B) 5
(C) 6
(D) Error, the for statement cannot use slice.

Traversal by index

In [ ]:
fruit = "apple"
for idx in range(5):
    currentChar = fruit[idx]
    print(currentChar)
In [ ]:
from IPython.display import IFrame
IFrame("http://pythontutor.com/iframe-embed.html#code=fruit%20%3D%20%22apple%22%0Afor%20idx%20in%20range(5%29%3A%0A%20%20%20%20currentChar%20%3D%20fruit%5Bidx%5D%0A%20%20%20%20print(currentChar%29&codeDivHeight=400&codeDivWidth=350&cumulative=false&curInstr=0&heapPrimitives=false&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false", width='100%', height=450)
In [ ]:
fruit = "apple"
for idx in range(len(fruit)):
    print(fruit[idx])
In [ ]:
fruit = "apple"
for idx in range(len(fruit)-1, -1, -1):
    print(fruit[idx])

Exercise

Remove repeated letters from a string, keeping only the first occurrence of each. For example, 'abcabb' becomes 'abc'.

Exercise

How many times is the letter o printed by the following statements?
s = "python rocks"
for idx in range(len(s)):
    if idx % 2 == 0:
        print(s[idx])
(A) 0
(B) 1
(C) 2
(D) Error, the for statement cannot have an if inside.

Traversal and the while Loop

In [ ]:
fruit = "apple"

position = 0
while position < len(fruit):
    print(fruit[position])
    position = position + 1
In [ ]:
from IPython.display import IFrame
IFrame("http://pythontutor.com/iframe-embed.html#code=fruit%20%3D%20%22apple%22%0A%0Aposition%20%3D%200%0Awhile%20position%20%3C%20len(fruit%29%3A%0A%20%20%20%20print(fruit%5Bposition%5D%29%0A%20%20%20%20position%20%3D%20position%20%2B%201&codeDivHeight=400&codeDivWidth=350&cumulative=false&curInstr=0&heapPrimitives=false&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false", width='100%', height=450)

Exercise

How many times is the letter o printed by the following statements?
s = "python rocks"
idx = 1
while idx < len(s):
    print(s[idx])
    idx = idx + 2
(A) 0
(B) 1
(C) 2

The in and not in operators

The in operator tests if one string is a substring of another.

In [ ]:
print('a' in 'a')
print('apple' in 'apple')
print('' in 'a')
print('' in 'apple')
In [ ]:
print('x' not in 'apple')

The Accumulator Pattern with Strings

In [ ]:
def removeVowels(s):
    vowels = "aeiouAEIOU"
    noVowels = ""
    for eachChar in s:
        if eachChar not in vowels:
            noVowels = noVowels + eachChar
    return noVowels

print(removeVowels("compsci"))
print(removeVowels("aAbEefIijOopUus"))

Alternatives to: if eachChar not in vowels

if (eachChar != 'a' and eachChar != 'e' and eachChar != 'i' and
        eachChar != 'o' and eachChar != 'u' and eachChar != 'A' and
        eachChar != 'E' and eachChar != 'I' and eachChar != 'O' and
        eachChar != 'U'):
    noVowels = noVowels + eachChar

Exercise

Given two strings s1 and s2, remove from s1 every character that occurs in s2.

def remove2(s1:str ,s2:str)->str:
    '''remove all chars in s2 from s1 and return the result'''

Exercise

What is printed by the following statements:
s = "ball"
r = ""
for item in s:
    r = item.upper() + r
print(r)
(A) Ball
(B) BALL
(C) LLAB

Looping and Counting

In [ ]:
def count(text, aChar):
    lettercount = 0
    for c in text:
        if c == aChar:
            lettercount = lettercount + 1
    return lettercount

print(count("banana","a"))

A find function

In [ ]:
def find(astring, achar):
    """
    Find and return the index of achar in astring.
    Return -1 if achar does not occur in astring."""
    ix = 0
    found = False
    while ix < len(astring) and not found:
        if astring[ix] == achar:
            found = True
        else:
            ix = ix + 1
    if found:
        return ix
    else:
        return -1
In [ ]:
print(find("Compsci", "p"))
print(find("Compsci", "C"))
print(find("Compsci", "i"))
print(find("Compsci", "x"))
In [ ]:
print("Compsci".find("p"))
print("Compsci".find("C"))
print("Compsci".find("i"))
print("Compsci".find("x"))

Optional parameters

In [ ]:
def find2(astring, achar, start):
    """
    Find and return the index of achar in astring.
    Return -1 if achar does not occur in astring."""
    ix = start
    found = False
    while ix < len(astring) and not found:
        if astring[ix] == achar:
            found = True
        else:
            ix = ix + 1
    if found:
        return ix
    else:
        return -1
In [ ]:
print(find2('banana', 'a', 2))

print('banana'.find('a',2))
In [ ]:
def find3(astring, achar, start=0):
    """find and return index of achar"""
    ix = start
    found = False
    while ix < len(astring) and not found:
        if astring[ix] == achar:
            found = True
        else:
            ix = ix + 1
    if found:
        return ix
    else:
        return -1
In [ ]:
print(find3('banana', 'a', 2))
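
Because start has a default value, the third argument can be omitted, passed by position, or passed by keyword. A minimal sketch; find_from is a hypothetical condensed stand-in for find3 so that the block runs on its own:

```python
def find_from(astring, achar, start=0):
    """Hypothetical stand-in for find3: index of achar at or after start, else -1."""
    for ix in range(start, len(astring)):
        if astring[ix] == achar:
            return ix
    return -1

first = find_from('banana', 'a')             # start defaults to 0
second = find_from('banana', 'a', 2)         # start passed by position
third = find_from('banana', 'a', start=4)    # start passed by keyword
print(first, second, third)
```
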
In [ ]:
def find4(astring, achar, start=0, end=None):
    """Find and return the index of achar"""
    ix = start
    if end is None:
        end = len(astring)
    found = False
    while ix < end and not found:
        if astring[ix] == achar:
            found = True
        else:
            ix = ix + 1
    if found:
        return ix
    else:
        return -1
In [ ]:
ss = "Python strings have some interesting methods."

print(find4(ss, 's'))
print(find4(ss, 's', 7))
print(find4(ss, 's', 8))
print(find4(ss, 's', 8, 13))
print(find4(ss, '.'))
In [ ]:
ss = "Python strings have some interesting methods."

print(ss.find('s'))
print(ss.find('s', 7))
print(ss.find('s', 8))
print(ss.find('s', 8, 13))
print(ss.find('.'))

Exercise

Given a string, output the character that occurs most often and the number of times it occurs.

Character classification

In [ ]:
import string

print(string.ascii_lowercase)
print(string.ascii_uppercase)
print(string.digits)
print(string.punctuation)
print(string.whitespace)
In [ ]:
print('ab'.isalpha())
print('123'.isdigit())
print('str'.isidentifier())
print(' '.isspace())
print('ab'.islower())

Join, replace, partition, split

In [ ]:
', '.join(['We','Are','Friends'])
In [ ]:
'Hello'.replace('e', 'a')
In [ ]:
'We are friends'.partition(' ')
In [ ]:
'We are friends'.rpartition(' ')
In [ ]:
'We are friends'.split(' ')
In [ ]:
name_value = input('input the name and value in one line, separated by space: ')
name, value = name_value.split(' ')
print(name, value)
In [ ]:
print('1 2 3'.split())
print('1 2 3'.split(maxsplit=1))
print('  1   2   3   '.split())
In [ ]:
name_value = input('input the name and value in one line, separated by space: ')
name, value = name_value.split()
print(name, value)
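
The two input cells differ only in split(' ') versus split(). A hedged sketch of why that matters when the user types more than one space; the sample line is an assumption standing in for real input:

```python
line = "pi   3.14"   # assumed sample input with repeated spaces

by_space = line.split(' ')     # every single space is a separator
by_whitespace = line.split()   # a run of whitespace counts as one separator
print(by_space)
print(by_whitespace)

# Unpacking into two names only works when exactly two fields come back.
name, value = by_whitespace
print(name, float(value))
```

With split(' ') the same unpacking would fail here, because the result contains empty strings for the extra spaces.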

Line boundaries

Representation  Description
\n              Line Feed
\r              Carriage Return
\r\n            Carriage Return + Line Feed
In [ ]:
print('ab c\n\nde fg\rkl\r\n'.splitlines())
print('ab c\n\nde fg\rkl\r\n'.splitlines(
        keepends=True))
print("".splitlines())
print("One line\n".splitlines())
print('Two lines\n'.split('\n'))

String formatting: old

format % values

format: string

values: a single non-tuple value, or a tuple with exactly the number of items specified by the format string, or a single mapping object (for example, a dictionary)

In [ ]:
print('I love %s' % 'basketball')
print('I love %s in %d, %8.2f' % 
      ('basketball', 2016, 200.56))
print('%(language)s has %(number)03d quote types.' 
      % {'language': "Python", "number": 2})
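
The rule that values is either a single non-tuple value or a tuple with one item per specifier has a practical consequence when the value to print is itself a tuple. A small sketch, with point as an assumed sample value:

```python
point = (3, 4)   # assumed sample tuple

# A tuple on the right is unpacked, one item per conversion specifier.
two_fields = '%d, %d' % point

# To format the tuple itself with a single %s, wrap it in a one-item tuple.
whole_tuple = '%s' % (point,)

# Passing the bare tuple to a single %s supplies too many items.
try:
    '%s' % point
    mismatch = False
except TypeError:
    mismatch = True

print(two_fields, whole_tuple, mismatch)
```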

String formatting: new

str.format(*args, **kwargs)
In [ ]:
print('I love {}'.format('basketball'))
print('I love {} in {:d}, {:>8.2f}'.format(
        'basketball', 2016, 200.56))
print('{language} has {number:03d} quote types.'.
      format(
           **{'language': "Python", "number": 2}))
In [ ]:
print('{0}, {1}, {2}'.format('a', 'b', 'c'))
print('{}, {}, {}'.format('a', 'b', 'c'))
print('{2}, {1}, {0}'.format('a', 'b', 'c'))
print('{2}, {1}, {0}'.format(*'abc'))
print('{0}{1}{0}'.format('abra', 'cad'))
In [ ]:
print('Coordinates: {latitude}, {longitude}'.
      format(latitude='37.24N', 
             longitude='-115.81W'))

coord = {'latitude': '37.24N', 
         'longitude': '-115.81W'}
print('Coordinates: {latitude}, {longitude}'.
      format(**coord))
In [ ]:
print('{:<30}'.format('left aligned'))
print('{:>30}'.format('right aligned'))
print('{:^30}'.format('centered'))
print('{:*^30}'.format('centered'))
print('{:#^30}'.format('centered'))
In [ ]:
# format also supports binary numbers
"int: {0:d};  hex: {0:x};  oct: {0:o};\
  bin: {0:b}".format(42)
In [ ]:
# with 0x, 0o, or 0b as prefix:
"int: {0:d};  hex: {0:#x};  oct: {0:#o};  \
bin: {0:#b}".format(42)
In [ ]:
# Using the comma as a thousands separator
'{:,}'.format(1234567890)
In [ ]:
# Expressing a percentage:

points = 19
total = 22
'Correct answers: {:.2%}'.format(points/total)
In [ ]:
# Using type-specific formatting:

import datetime
d = datetime.datetime(2010, 7, 4, 12, 15, 58)
print(d)
print('{:%Y/%m/%d %H:%M:%S}'.format(d))

The print function

print(...)
    print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)
    Prints the values to a stream, or to sys.stdout by default.
    Optional keyword arguments:
    file:  a file-like object (stream); defaults to the current sys.stdout.
    sep:   string inserted between values, default a space.
    end:   string appended after the last value, default a newline.
    flush: whether to forcibly flush the stream.
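
The keyword arguments described above can be tried out directly. A minimal sketch; the io.StringIO buffer is just one convenient file-like object chosen for illustration:

```python
import io

# sep replaces the single space normally placed between values.
print(2016, 7, 4, sep="-")

# end replaces the trailing newline, so the next print continues the line.
print("no newline here", end="")
print(" ... continued")

# file redirects output to any file-like object; StringIO keeps it in memory.
buffer = io.StringIO()
print("a", "b", "c", sep=",", end="!\n", file=buffer)
captured = buffer.getvalue()
print(repr(captured))
```
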
In [ ]:
for i in range(100):
    if i % 2 == 0:
        print(i)
In [ ]:
for i in range(100):
    if i % 2 == 0:
        print(i, end=" ")
In [ ]:
c = 0
for i in range(100):
    if i % 2 == 0:
        c += 1
        print(i, end=" ")
        if c == 5:
            c = 0
            print('\n')
In [ ]:
c = 0
for i in range(100):
    if i % 2 == 0:
        c += 1
        print(i, end="\t")
        if c == 5:
            c = 0
            print('\n')
In [ ]:
for i in range(100):
    if i % 2 == 0:
        print(i, end=",")

Exercise

1. Count the characters (with and without spaces), words, and lines in a text string.

Summary

indexing ([])

Access a single character in a string using its position (starting from 0). Example: 'This'[2] evaluates to 'i'.

length function (len)

Returns the number of characters in a string. Example: len('happy') evaluates to 5.

for loop traversal (for)

Traversing a string means accessing each character in the string, one at a time. For example, the following for loop:

for ix in 'Example':
    ...

executes the body of the loop 7 times with different values of ix each time.

slicing ([:])

A slice is a substring of a string. Example: 'bananas and cream'[3:6] evaluates to ana (so does 'bananas and cream'[1:4]).

string comparison (>, <, >=, <=, ==, !=)

The six common comparison operators work with strings, evaluating according to lexicographical order. Examples: 'apple' < 'banana' evaluates to True. 'Zeta' < 'Apricot' evaluates to False. 'Zebra' <= 'aardvark' evaluates to True because all upper case letters precede lower case letters.
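
The ordering comes from the character codes returned by ord, compared one position at a time. A short sketch under that reading; the variable names and the lower() normalization are illustrative choices, not part of the summary:

```python
# Upper case letters have smaller codes than lower case letters.
code_Z = ord("Z")
code_a = ord("a")
print(code_Z, code_a)

upper_first = "Zebra" <= "aardvark"    # decided by "Z" versus "a"
prefix_smaller = "app" < "apple"       # a proper prefix sorts before the longer string

# Normalizing case on both sides gives a case-insensitive comparison.
ignoring_case = "Zebra".lower() <= "aardvark".lower()
print(upper_first, prefix_smaller, ignoring_case)
```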

in and not in operators (in, not in)

The in operator tests whether one string is contained inside another string. Examples: 'heck' in "I'll be checking for you." evaluates to True. 'cheese' in "I'll be checking for you." evaluates to False.

Exercise

1. Determine whether a "word" W occurs within a main "text string" S.

2. Find the common substring of str1 and str2.
