Learn Programming with Python — Introduction to Compound Data Types: Lists

Posted on May 05, 2020 in Learn Python

We’re finally getting to the heart of what makes Python magical! Lists are an enormously powerful tool for creating beautiful software.

What is the List Compound Data Type?

Lists are exactly what you might expect: the list compound data type holds an ordered sequence of objects. Unlike a set, a list may contain duplicates and keeps its elements in a specific order. Unlike a tuple, the elements of a list don’t together make up a single record.

Let’s say you spend some time recording the species of each bird you see from your window:

recorded_birds = ["crow", "sparrow", "sparrow", "robin", "crow", "sparrow"]

This list contains duplicates, which lets us know two separate things: 1) how many individual birds were observed (six) and 2) how many different species were observed (three). A set could only represent the species. In Python, we create a list using square brackets [] or the list() constructor. A Python list is similar to what other programming languages might call an array.

Let’s cast (convert) this list to a set:

In my code editor I get this output:

What’s going on here?

  • On line 1, I create the list recorded_birds. Each observed bird is an entry in the list.
  • On line 2, I cast the list to a set called recorded_species.
  • On lines 3 and 4, I print the list and the set to the console.
  • From the output, you can identify the list because it is enclosed in square brackets []. The original order is preserved.
  • From the output, you can identify the set because it is enclosed in curly brackets {}. The original order is not relevant to the set.
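
The steps above can be sketched as follows. The first four lines mirror the walkthrough; the two len() calls at the end are an addition, included to confirm the counts mentioned earlier (six birds, three species). The printed order of the set may differ from run to run.

```python
recorded_birds = ["crow", "sparrow", "sparrow", "robin", "crow", "sparrow"]
recorded_species = set(recorded_birds)
print(recorded_birds)
print(recorded_species)

# Extra check, not part of the walkthrough above:
print(len(recorded_birds))    # 6 individual birds
print(len(recorded_species))  # 3 distinct species
```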

Constructing a List

The list() constructor takes a sequence (or any other iterable) and returns a list object. Called with no argument, it returns an empty list, and a list with zero elements is a perfectly fine list, too! This code is functionally identical to what you saw above:

recorded_birds = list(["crow", "sparrow", "sparrow", "robin", "crow", "sparrow"])

Try doing this in your code editor without the square brackets, passing the elements to list() as separate arguments. It will fail like this:

>>> recorded_birds = list("crow", "sparrow", "sparrow", "sparrow", "crow", "robin")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: list expected at most 1 arguments, got 6

Why? We passed six arguments to the list() constructor but we should have passed a single sequence with six elements. Silly!
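
For contrast, here is a short sketch of calls that do work. list() accepts either no argument or exactly one iterable; the tuple passed in the second call is just one illustrative choice of iterable.

```python
# No arguments: an empty list, same as [].
empty = list()

# One argument: a single iterable (here a tuple) holding six elements.
recorded_birds = list(("crow", "sparrow", "sparrow", "robin", "crow", "sparrow"))

# Six separate arguments raise TypeError, as in the traceback above.
try:
    list("crow", "sparrow", "sparrow", "robin", "crow", "sparrow")
except TypeError as error:
    print("TypeError:", error)

print(empty)           # []
print(recorded_birds)  # ['crow', 'sparrow', 'sparrow', 'robin', 'crow', 'sparrow']
```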

It is easy to create a list of letters from all the letters in a string and then manipulate individual parts of the list.

In my code editor, I get this result:

What’s going on here?

  • On line 1, we define a new str called mytext, and assign familiar words to it.
  • On line 2, we define a new list called characters, and assign the output of list(mytext) to it. The list() constructor is receiving a string — a sequence of characters — and returns a list object, one element for every letter in mytext.
  • On line 4, we print() the length of characters — 13 including spaces and punctuation.
  • On line 5, we print() the element at index position 7 — a “W”. Don’t forget that Python indices begin with 0!
  • On line 6, we modify the list object by insert()ing a new element, the character “-”, at index position 6.
  • On line 7, we append two exclamation marks to the end of the list as a new element (with two characters!).
  • On line 9, we print() the list. Every element has a single character except the last, which has two. Notice that square brackets are used to indicate that this is a list object.
  • On line 10, we use the join() method on the empty string “”, joining every element in characters. Our modified string is shown!
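
The steps above can be sketched as follows. The value "Hello, World!" is an assumption: it is a familiar 13-character string with "W" at index 7, which matches the description, but any string would behave the same way.

```python
mytext: str = "Hello, World!"   # assumed value, see the note above
characters: list = list(mytext)

print(len(characters))          # 13
print(characters[7])            # W
characters.insert(6, "-")       # new element placed before the space
characters.append("!!")         # one new element holding two characters

print(characters)
print("".join(characters))      # Hello,- World!!!
```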

List Comprehension Demystified

A powerful way to construct a list is to enclose a looping statement inside square brackets:

characters: list = [current_letter for current_letter in mytext]

Wait, what!? What does this even mean? The key to understanding is in the following construct:

[looped_value for looped_value in source_sequence]

  • The square brackets indicate that we are building a list. You guessed correctly, we could use curly brackets to return a set.
  • looped_value is the element being added to the list. It is the same variable that takes the value of each element of source_sequence on each loop. We could write it as x for x if we don’t care about names.
  • source_sequence is the source of the elements.

It’s really just an awkward for loop, which we would usually begin with for looped_value in source_sequence.
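
To make the comparison concrete, here is a sketch of the same list built both ways. The string assigned to mytext is an assumed example value.

```python
mytext = "Hello, World!"  # assumed example value

# The comprehension form.
from_comprehension = [current_letter for current_letter in mytext]

# The equivalent explicit for loop.
from_loop = []
for current_letter in mytext:
    from_loop.append(current_letter)

print(from_comprehension == from_loop)  # True
```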

We can add modifications to the returned value inside a list comprehension:

characters: list = [letter.title() for letter in mytext]

This calls the string method title() on every element before it is added to the list.

We can also filter the values in the source sequence to skip over elements we dislike, such as non-letters:

characters: list = [letter for letter in mytext if letter.isalpha()]

This is what my program looks like now. The goal is to use list comprehension to return the alphabetical characters in a string — in uppercase!

This is what I get in my code editor:

What’s going on here?

  • On line 2, the big giveaway is the square brackets []. This is a sure sign we are working with a list.
  • The second big giveaway is the for inside the []. We have a loop being used to create a list, therefore we have list comprehension happening! In front of our very eyes!
  • What follows the in keyword is the source of the elements for the list. mytext contains a string, and each character of that string is a potential element of the list we’re constructing.
  • The for current_letter part indicates that current_letter contains the value being acted on every time we loop over an element in mytext.
  • The if clause says that we’ll only act upon current_letter if it meets the condition of isalpha().
  • The first expression, current_letter.title(), is considered last. It produces each element being inserted into the list being built. For every element in mytext which passes the test of being an alphabetical letter, we will return its title() case.
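
Putting those pieces together gives something like the sketch below. The value of mytext is assumed, and the final print() joins the list back into a string for display, which is one possible way to show the result.

```python
mytext: str = "Hello, World!"  # assumed value
characters: list = [current_letter.title() for current_letter in mytext if current_letter.isalpha()]

print(characters)           # ['H', 'E', 'L', 'L', 'O', 'W', 'O', 'R', 'L', 'D']
print("".join(characters))  # HELLOWORLD
```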

What would happen if we replaced the square brackets [] on line 2 with curly brackets {} ? Exactly! We’d be building a set, and sets only contain unique values in no particular order:

ELORDHW
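
A sketch of that set comprehension, again assuming mytext holds "Hello, World!". The order of the printed letters can vary between runs, so ELORDHW is only one possible arrangement; the sorted() call is added here to give a stable result.

```python
mytext = "Hello, World!"  # assumed value
unique_letters = {letter.title() for letter in mytext if letter.isalpha()}

print("".join(unique_letters))          # seven letters, order not guaranteed
print("".join(sorted(unique_letters)))  # DEHLORW
```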

OK, smarty pants. Now you’re thinking “hey, what about tuple comprehension?”. No. Sorry. Tuples don’t work like that. Do you recall that I claimed that a tuple is used as a record data structure, with each element specifying some aspect of the record’s identity? Think of the elements of a street address: only a fully defined tuple with all its identity attributes specified makes any kind of sense. However, if you really need to, it is entirely valid to pass a list as the argument to the tuple() constructor:

mytuple: tuple = tuple([x for x in range(10)])

This uses list comprehension to create a tuple with 10 elements, 0 to 9.
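
A short sketch of that conversion in both directions. Passing a generator expression straight to tuple() is shown as an alternative that avoids building the intermediate list; the extra variable names are illustrative.

```python
# Build a tuple from a list comprehension, as above.
mytuple: tuple = tuple([x for x in range(10)])

# Same result without the intermediate list: a generator expression.
same_tuple: tuple = tuple(x for x in range(10))

# And back again: a tuple can be passed to list().
back_to_list: list = list(mytuple)

print(mytuple)                # (0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
print(mytuple == same_tuple)  # True
print(back_to_list)           # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```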

Remember that we used curly brackets {} for a set and normal brackets () for a tuple. It is good to always remember SC-TN-LS: set is curly, tuple is normal and list is square.

Lists as a Data Structure

Think of a spreadsheet file with multiple sheets, each containing data in a table structure. Each table can be represented, in Python, as a list, with one list element per row. Each row (a horizontal line in the table) is best represented by a tuple. Let’s take a snippet of a public data set (hospitals in the USA) provided to us as a comma-separated values (CSV) document.

[Image: hospital data in table format]

In Python, each row can best be represented as a tuple. If we store each tuple in a list, we can represent the entire table data structure within the Python compound data type of list.

This is a fragment of a Python program which implements the table structure shown above as a list of namedtuples.

In my code editor, I see this:

What’s going on here?

  • On line 1 we’re importing the collections module, and on line 2 the “Pretty Print” (pprint) module. It produces output like print(), but prettier!
  • On line 4, using type hints, we are creating an empty list named facilities.
  • On line 5 we’re defining a new namedtuple named medicare_facility and specifying which attributes it has. These are the same attributes as in the CSV file pictured above.
  • On lines 8 and 18 we’re appending a new element to the facilities list. On lines 9 and 19 that element is created as a new namedtuple, with its values copied from the CSV file.
  • Line 28 prints the length of the facilities list, that is, the number of elements in it. We get 2 as output! Yay!
  • Line 29 uses pretty print to print out the entire list. You can see that the list is enclosed in square brackets. Each list element contains a facility namedtuple, and the tuple’s values are enclosed in normal brackets.
  • On line 31 we use for to loop through every element in facilities.
  • On line 32 we are printing the two attributes id and facility_name.
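
The overall shape of such a program might look like the sketch below. Only the names facilities, medicare_facility, id and facility_name come from the description above; the remaining fields and all values are made-up placeholders, not rows from the real data set, and the line numbers will not match the walkthrough.

```python
import collections
from pprint import pprint

facilities: list = []
# Field names other than id and facility_name are hypothetical.
medicare_facility = collections.namedtuple(
    "medicare_facility", ["id", "facility_name", "city", "state"]
)

# Placeholder rows standing in for values copied from the CSV file.
facilities.append(
    medicare_facility(
        id="000001", facility_name="EXAMPLE GENERAL HOSPITAL",
        city="SPRINGFIELD", state="XX",
    )
)
facilities.append(
    medicare_facility(
        id="000002", facility_name="SAMPLE MEDICAL CENTER",
        city="RIVERTON", state="XX",
    )
)

print(len(facilities))  # 2
pprint(facilities)

# Each element is a namedtuple, so attributes are read by name.
for facility in facilities:
    print(facility.id, facility.facility_name)
```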

What Have We Achieved?

So much! If you’ve understood everything so far: kudos! You’re awesome!

  • You’ve learned that square brackets are a sure sign that a list is being used
  • You’ve learned that lists are mutable (they can be modified), they retain the original order of entry, and they may contain duplicates
  • You’ve learned that lists are often called arrays in other programming languages
  • You’ve seen how list elements can be accessed by using their 0-based index position
  • You’ve learned to cast between sets and lists, and tuples and lists
  • You’ve understood the concept of list comprehension
  • You’ve had a refresher on named tuples
  • You’ve earned a double gold star

We’ll explore dictionaries soon. You’ll need all of the knowledge acquired so far when we tackle this next beast :)

Articles in this series so far: