Learn Programming with Python — Introduction to Compound Data Types: Sets and Tuples

Posted on May 03, 2020 in Learn Python

Let’s take a close look at compound data types in Python. Sets and tuples allow us to build and use richer, more expressive programs.

Photo credit to @kokilsh https://www.pexels.com/@kokilsh

The Set Data Type in Python

In a previous instalment, I introduced you to the set compound data type. A set is an unordered collection of unique objects which share something in common.

Here is a set, which we can iterate over by passing it to a for loop statement:
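A minimal sketch of such a loop follows. The fruit names are assumed for illustration and are not taken from the original example:

```python
# Hypothetical example data: any small set of strings will do.
treefruit: set = {"apples", "pears", "plums"}

# A set has no defined order, so the elements may print in any order.
for fruit in treefruit:
    print(fruit)
```

Run it a few times in fresh interpreter sessions and the order of the printed fruit may change, because a set makes no promise about ordering.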

The set data type has some amazing capabilities! These are derived directly from the set theory branch of mathematics. You’ll be most familiar with this from what you already know about Venn diagrams.

A Venn Diagram of two intersecting sets.

Let’s create a bunch of sets and see how they work in Python.

In my code editor, I see the following:

What’s going on here?

  • On each of lines 1, 2, and 3 I define a set (using a :set type hint) containing related elements. (Don’t shoot me if you’re a botanist, I’m doing my best!)
  • On line 6 I’m printing out the intersection of treefruit and citrusfruit. This returns all elements appearing in both sets.
  • On lines 12 and 13 I’m also printing intersections.
  • On line 15 I’m creating the union: all the elements of two sets combined into a new set.
  • On line 18 I’m finding the difference: which of the stonefruit and treefruit are not citrusfruit.
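The operations in the list above can be sketched as follows. Only citrusfruit is taken from this article; the contents of treefruit and stonefruit are assumed for illustration, and the line numbers will not match the ones mentioned above:

```python
# citrusfruit comes from the article; the other two sets are assumed.
treefruit: set = {"apples", "pears", "oranges", "lemons", "plums", "nectarines"}
citrusfruit: set = {"oranges", "lemons", "limes", "satsumas", "nectarines"}
stonefruit: set = {"plums", "peaches", "cherries", "nectarines"}

# Intersection: elements appearing in both sets.
tree_and_citrus = treefruit & citrusfruit   # same as treefruit.intersection(citrusfruit)

# Union: all the elements of two sets combined into a new set.
tree_or_stone = treefruit | stonefruit      # same as treefruit.union(stonefruit)

# Difference: stone fruit and tree fruit which are not citrus fruit.
not_citrus = (stonefruit | treefruit) - citrusfruit

# sorted() gives a list with a predictable order for printing.
print(sorted(tree_and_citrus))
print(sorted(tree_or_stone))
print(sorted(not_citrus))
```

The operators &, | and - are shorthand for the intersection(), union() and difference() methods. None of them changes the original sets; each returns a new one.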

We can modify the set in-place using the methods pop(), remove() and discard(). pop() removes and returns an arbitrary element of the set (you can’t choose which one), remove() removes the given element but fails with a KeyError if it wasn’t present, and discard() removes the given element but silently does nothing if it wasn’t present. See the following screenshot of me producing a runtime error on line 9 by calling remove() with a fruit which was already popped off the set on line 4.
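Here is a small sketch of the three methods side by side. The set contents are assumed for illustration:

```python
fruit: set = {"apples", "pears", "plums"}

# pop() removes and returns an arbitrary element; we cannot choose which one.
popped = fruit.pop()
print(f"Popped: {popped}")

# discard() on an element which is no longer present does nothing at all.
fruit.discard(popped)

# remove() on an element which is no longer present raises a KeyError.
removal_failed = False
try:
    fruit.remove(popped)
except KeyError:
    removal_failed = True
    print(f"remove() failed: {popped} is not in the set")
```

If you aren’t sure whether an element is present, discard() is the safer choice. Use remove() when a missing element means something has gone wrong and you want to know about it.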

Because the Python set is based on set theory, programmers often use sets when they wish to test for membership of an element in multiple collections, or simply as an easy way to remove duplicates from a collection.
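Both uses can be sketched in a few lines. The list of names is made up for this example:

```python
# Hypothetical data: a list in which some names appear more than once.
visitors = ["ana", "ben", "ana", "cy", "ben"]

# Passing the list to set() drops the duplicates.
unique_visitors = set(visitors)
print(len(visitors), len(unique_visitors))

# The in operator tests for membership.
print("ana" in unique_visitors)
print("dee" in unique_visitors)
```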

Python also offers us the frozenset compound data type. A frozenset is immutable: it has no methods such as add() or discard(), so its elements can’t be changed after it has been created.
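A minimal sketch of the difference, with set contents assumed for illustration:

```python
citrus = frozenset({"oranges", "lemons", "limes"})

# Reading from a frozenset works just like reading from a set.
print("lemons" in citrus)
print(len(citrus))

# Modifying it does not: a frozenset simply has no discard() method.
modification_failed = False
try:
    citrus.discard("lemons")
except AttributeError as error:
    modification_failed = True
    print(f"Cannot modify a frozenset: {error}")
```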

The set() Constructor

So far, we have only been using curly brackets to create a new set:

citrusfruit:set = {"oranges", "lemons", "limes", "satsumas", "nectarines"}

However, the built-in function set() can create a new set based on the argument given. This is called a constructor because it is used to construct, and to return, a new object. For example, we can create a new set based on a string (remember: a string is a sequence of characters) using set():

characters:set = set("The quick brown fox jumped over the lazy dog.")
print(len(characters))

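How many unique characters are in the sentence? There are 25 different lowercase letters (the sentence says “jumped”, so there is no “s”), an uppercase “T” which counts separately from the lowercase “t”, a space, and a period: 28.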
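To see exactly which characters make up that count, we can compare the set against the lowercase alphabet. This sketch uses the standard string module, which the example above does not:

```python
import string

sentence = "The quick brown fox jumped over the lazy dog."
characters: set = set(sentence)

# Which lowercase letters of the alphabet never appear in the sentence?
missing = set(string.ascii_lowercase) - characters

print(len(characters))                   # 28
print(missing)                           # {'s'}
print("T" in characters, "t" in characters)
```

Upper and lower case letters are different characters, so “T” and “t” each get their own place in the set.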

The Tuple Data Type in Python

The tuple compound data type contains comma-separated elements which, taken together, are considered to form a record. Programmers often use tuples when a single value is not enough to identify something. Like the address of a house! In computer science, we might formally say something like: the identity of any house object is composed of its attributes (street name, house number, city, postal code, and country). These 5 attributes of the house’s identity can easily be stored in a tuple with 5 elements: an address.

In a previous instalment, we considered using a tuple with two elements to represent the geolocation of a point on the Earth’s surface, using its latitude and its longitude.

geolocation = (48.858093, 2.294694) #The Eiffel Tower

When we created a set, we enclosed its values in curly brackets {}. When we create a tuple, we use round brackets () to enclose the values.

After we’ve created a tuple, we can access its elements using their index. Indexing always begins at 0, and is written using square brackets after the variable name, such as geolocation[0].

>>> geolocation = (48.858093, 2.294694) #The Eiffel Tower
>>> print(f"Latitude: {geolocation[0]} Longitude: {geolocation[1]}")
Latitude: 48.858093 Longitude: 2.294694

An interesting thing about the tuple is that, once it has been created, it can’t be modified. The property which describes whether an object can be modified is called mutability, and every tuple is immutable. Attempting to change the value at index 0 of our tuple will result in a TypeError:

>>> geolocation[0] = 50.0000
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment

Python has a secret, hidden in the collections module! Remembering which index in a tuple contains which value can be a real pain. Was latitude at index 0, or was that longitude? Let’s use a named tuple to resolve the issue! To use the namedtuple data type, we first need to import it from the collections module.

Here is what this looks like in my code editor:
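A minimal sketch of a named tuple for our geolocation follows. The type name Geolocation and the field names latitude and longitude are assumed for illustration:

```python
from collections import namedtuple

# namedtuple() builds a new tuple type with a name for each position.
Geolocation = namedtuple("Geolocation", ["latitude", "longitude"])

eiffel_tower = Geolocation(latitude=48.858093, longitude=2.294694)

# Access by name is easier to read than access by index...
print(f"Latitude: {eiffel_tower.latitude} Longitude: {eiffel_tower.longitude}")

# ...but the index still works, because this is still a tuple.
print(eiffel_tower[0])
```

A named tuple is just as immutable as a plain tuple, so assigning to eiffel_tower.latitude fails with an error.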

It is really simple to convert a tuple to a set (losing any duplicates) or a set into a tuple:

After executing this in my code editor, I get:

What’s going on here?

  • On line 1 we define a variable named geolocation, and hint that we intend this to be a tuple by writing geolocation:tuple. We use round brackets to create a comma-separated list of the tuple’s elements. Note that two elements have the same value.
  • On line 2 we define a variable named students, and hint that we intend this to be a set. We use curly brackets to create a comma-separated list of the set’s elements.
  • On line 4 we use the set() function with the geolocation tuple as its argument. We print the new set, which has only a single unique value. Note that the output uses curly brackets to indicate that this is a set.
  • On line 5 we use the tuple() function with the students set as its argument. We print the new tuple, which contains all of the elements of the set. Note that the output uses round brackets to indicate that this is a tuple.
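The steps in the list above can be sketched like this. The values are assumed for illustration, and the line numbers will not match exactly:

```python
# A tuple in which both elements have the same value.
geolocation: tuple = (48.858093, 48.858093)
students: set = {"Alice", "Bob", "Charlie"}

# Tuple to set: the duplicate value collapses into a single element.
geolocation = set(geolocation)
print(geolocation)

# Set to tuple: every element is kept, but in no guaranteed order.
students = tuple(students)
print(students)
```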

A note on type hints. In other programming languages, it is not normal to hint at the intended data type of the variable we’re creating. Declaring the type is either a hard requirement (e.g. Java), or it is seen as pointless (e.g. JavaScript). Even though on line 1 we hinted that the geolocation variable is intended to be a tuple, Python does not prevent us from assigning a set to it on line 4. Type hints were introduced in Python 3.5 (the variable annotation syntax used here followed in Python 3.6). The documentation gives us this very helpful text:

Note
The Python runtime does not enforce function and variable type annotations. They can be used by third party tools such as type checkers, IDEs, linters, etc.
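Here is a small sketch of what that note means in practice. The function describe() is made up for this example:

```python
def describe(point: tuple) -> str:
    """Report what we were actually given, whatever the hint says."""
    return f"{type(point).__name__} with {len(point)} elements"

# The hint asks for a tuple, and we pass one.
print(describe((48.858093, 2.294694)))

# The hint asks for a tuple, but Python happily accepts a set.
print(describe({48.858093, 2.294694}))
```

Both calls run without an error. A third party type checker could flag the second call, but the Python runtime itself will not.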

Tuples can Contain Other Tuples!

You may be familiar with how to specify a colour’s name and its RGB value. Here is a handy table of common HTML colour names and their RGB values using hexadecimal notation. Let’s make this Pythonic, using tuples!

In my code editor, this is the output I get:

What’s going on here?

  • First and foremost, notice my typo on line 21. I wrote “Nane” and not “Name”. Everyone makes mistakes! Feel free to fix it for me ;)
  • On line 1 we’re importing the collections module again. I’ve grown to like it!
  • On lines 3 and 4 we’re defining two new namedtuples. We’re also specifying that the element at index 1 of htmlcolour is named “rgbvalue”. At this index we will be storing an entire tuple, within the tuple.
  • On line 6 we’re defining a set which will hold all the tuples we’ll define next.
  • On lines 8, 11, 14, and 17 we create a variable called htmltuple. We assign a new namedtuple to this variable. We create the tuple value by calling the constructor htmlcolour and passing it the values we want. The first argument contains the name of the colour. The second argument contains a new namedtuple, this time constructed from the rgbvalue namedtuple defined on line 3.
  • Notice that I’m using red=, blue=, and green= instead of the indices. Also notice that I’ve changed the order around a little to prove that using named indices works fine.
  • You’ve noticed that the RGB values use the hexadecimal system, where hex FF is equal to decimal 255. In Python, we can directly use hexadecimal values by prefixing them with 0x.
  • On lines 9, 12, 15, and 18 we add the new tuple to the set called colours.
  • On lines 21 to 25 I’m printing out (for each element in the set of colours) some user-friendly information. The print statement was too long for my editor, so I broke it up into smaller lines using multiple f-strings.
  • Notice that I’m using dot notation colour.rgbvalue.blue to keep digging deeper into the tuples. I think this is more programmer-friendly than writing colour[1][2] to reach the value of blue!
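The walkthrough above can be sketched as follows. The names htmlcolour, rgbvalue, htmltuple and colours come from the description, but the field layout, the two colours chosen, and the output format are assumptions, and the line numbers will not match:

```python
from collections import namedtuple

# Two named tuple types: index 1 of htmlcolour holds an entire rgbvalue tuple.
rgbvalue = namedtuple("rgbvalue", ["red", "green", "blue"])
htmlcolour = namedtuple("htmlcolour", ["name", "rgbvalue"])

# A set to hold all the colour tuples.
colours: set = set()

htmltuple = htmlcolour("Red", rgbvalue(red=0xFF, green=0x00, blue=0x00))
colours.add(htmltuple)

# Keyword arguments may be given in any order.
htmltuple = htmlcolour("Teal", rgbvalue(blue=0x80, red=0x00, green=0x80))
colours.add(htmltuple)

for colour in colours:
    # Adjacent f-strings are joined into one string, keeping each line short.
    print(
        f"Name: {colour.name} "
        f"Red: {colour.rgbvalue.red} "
        f"Green: {colour.rgbvalue.green} "
        f"Blue: {colour.rgbvalue.blue}"
    )
```

Named tuples can be stored in a set because they are immutable, as long as everything inside them is immutable too.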

What have we Achieved?

Really, such a lot! If you’ve got this far you should be really proud of your achievement! I hope you’re having fun.

  • We’ve discussed sets and tuples, again.
  • We’ve considered the constructor, used to create new objects.
  • We’ve considered mutability, the property of an object which tells us if it is modifiable or not.
  • We’ve explored type hints, noting that they are nothing more than a helpful hint.
  • We’ve looked at the frozenset, an immutable variant of the set data type.
  • We explored using indexes to get to a specific element in a tuple.
  • We’ve used the namedtuple data type to write tuples in a more readable way.
  • We’ve learned about casting, or converting, between sets and tuples.

I hope you enjoyed this! If you spotted any errors please do let me know! In the next instalment in this series we’ll explore two extremely useful compound data types, list and dict.

Articles in this series so far: