Learn Programming with Python — Introduction to Compound Data Types: Dictionaries

Posted on May 08, 2020 in Learn Python

Dictionaries are a very versatile compound data type, allowing us to create mappings between keys and values. Dictionaries are fun!

Image credit to @pixabay https://www.pexels.com/@pixabay

Getting Started with Key-Value Pairs

Dictionaries operate on a sequence of key-value pairs, so let’s first take a little diversion and look at key-value pairs. Almost all books can be identified by their ISBN number. The ISBN number is an identifier of a book: its key. There are other ways to identify a book, but few are as universally useful as the ISBN number. Here is a list of key-value pairs, ISBN number and title, of some top-selling books:

+------------+-------------------------------------------+
| ISBN       | Title                                     |
+------------+-------------------------------------------+
| 1510752242 | Restoring Faith in the Promise of Science |
+------------+-------------------------------------------+
| 031670704X | Midnight Sun                              |
+------------+-------------------------------------------+
| 1524763136 | Becoming                                  |
+------------+-------------------------------------------+
| 0735219095 | Where the Crawdads Sing                   |
+------------+-------------------------------------------+
| 1338635174 | The Ballad of Songbirds and Snakes        |
+------------+-------------------------------------------+

We can represent one of these books, Midnight Sun, in Python using a dict like this:

books: dict = {'031670704X': 'Midnight Sun'} #1 key-value pair

The first thing to notice here is that we create a dict with curly brackets {}, assigning a comma-separated sequence of key-value pairs, with a colon : between each key and its value. Without the colons, curly brackets around individual values create a set instead:

books: set = {'031670704X', 'Midnight Sun'} #two elements 
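
A quick check makes the difference visible. This is a minimal sketch; the variable names as_dict, as_set and empty are hypothetical.

```python
# Curly brackets with key: value pairs build a dict.
as_dict = {'031670704X': 'Midnight Sun'}

# Curly brackets with bare values build a set.
as_set = {'031670704X', 'Midnight Sun'}

# Empty curly brackets build a dict, not a set; use set() for an empty set.
empty = {}

print(type(as_dict).__name__, len(as_dict))  # dict 1
print(type(as_set).__name__, len(as_set))    # set 2
print(type(empty).__name__)                  # dict
```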

Using Python, let’s create the dict data type with 5 key-value pairs:

When executed in my code editor, this is what I see:

What’s going on here?

  • On line 6, I define a new namedtuple named Book, it has two named attributes isbn and title.
  • On line 10, I append() a new Book object to the books_as_list variable. This continues for the remaining 4 books through to line 24.
  • On line 26, I use a for loop to iterate over all the elements in the books_as_list sequence.
  • On line 27, I use the construct of multiple assignment to assign values to 2 variables in one operation: key, value = book.isbn, book.title .
  • On line 28, I create a new element in the dictionary books_as_dict. I assign the new element’s key to be equal to the key variable, same for value.
  • Looking at the output in the terminal window, you can see that we have two separate sequences: there is a list with 5 elements, each element is a Book namedtuple. There is also a dictionary with 5 key-value pairs.
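
The walkthrough above can be sketched roughly as follows. This is a reconstruction under assumptions, not the original listing, so its line numbers will not match the ones quoted. The names Book, books_as_list and books_as_dict are taken from the text; the "List of books" print label is an assumption.

```python
from collections import namedtuple

# A Book has two named attributes: isbn and title.
Book = namedtuple('Book', ['isbn', 'title'])

books_as_list: list = []
books_as_list.append(Book('1510752242', 'Restoring Faith in the Promise of Science'))
books_as_list.append(Book('031670704X', 'Midnight Sun'))
books_as_list.append(Book('1524763136', 'Becoming'))
books_as_list.append(Book('0735219095', 'Where the Crawdads Sing'))
books_as_list.append(Book('1338635174', 'The Ballad of Songbirds and Snakes'))

books_as_dict: dict = {}
for book in books_as_list:
    # Multiple assignment: two variables are set in one statement.
    key, value = book.isbn, book.title
    # Assigning to a key that does not exist yet creates the key-value pair.
    books_as_dict[key] = value

print(f'List of books: {books_as_list}')
print(f'Dict of books: {books_as_dict}')
```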

Using Complex Values in a Dictionary

In the books example above, we kept everything super-simple. We only had two data elements for each of only 5 books, each represented as a str simple data type. What should we do if we have more than one value attribute we need to store with a key? For example:

+------------+------------------------------------+----------------+
| ISBN       | Title                              | Author         |
+------------+------------------------------------+----------------+
| 1510752242 | Restoring Faith in ... Science     | K Heckenlively |
+------------+------------------------------------+----------------+
| 031670704X | Midnight Sun                       | S Meyer        |
+------------+------------------------------------+----------------+
| 1524763136 | Becoming                           | M Obama        |
+------------+------------------------------------+----------------+
| 0735219095 | Where the Crawdads Sing            | S Owens        |
+------------+------------------------------------+----------------+
| 1338635174 | The Ballad of Songbirds and Snakes | S Collins      |
+------------+------------------------------------+----------------+

We have a couple of options. The easiest is to store both title and author as a tuple inside the value part of each key-value pair.

Which gives us the following terminal output:

Dict of books: {'1510752242': ('Restoring Faith in ... of Science', 'K Heckenlively'), '031670704X': ('Midnight Sun', 'S Meyer'), '1524763136': ('Becoming', 'M Obama'), '0735219095': ('Where the Crawdads Sing', ' S Owens'), '1338635174': ('The Ballad of Songbirds and Snakes', 'S Collins')}
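
The listing that produced the output above is not reproduced here. A minimal sketch of the tuple approach, shortened to two books and assuming a variable named books, might look like this:

```python
# Each value is a (title, author) tuple.
books: dict = {
    '031670704X': ('Midnight Sun', 'S Meyer'),
    '1524763136': ('Becoming', 'M Obama'),
}

# Tuple unpacking gives the parts of the value readable names.
title, author = books['1524763136']
print(f'{title} by {author}')  # Becoming by M Obama

print(f'Dict of books: {books}')
```

The drawback of this approach is that the reader has to remember which position in the tuple holds which attribute.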

We could also do the following:

Which gives us the following terminal output:

Dict of books: {'1510752242': {'title': 'Restoring Faith in ... of Science', 'author': 'K Heckenlively'}, '031670704X': {'title': 'Midnight Sun', 'author': 'S Meyer'}, '1524763136': {'title': 'Becoming', 'author': 'M Obama'}, '0735219095': {'title': 'Where the Crawdads Sing', 'author': ' S Owens'}, '1338635174': {'title': 'The Ballad of Songbirds and Snakes', 'author': 'S Collins'}}

This looks very, very similar to a JSON formatted text file except the quote marks are single, not double. Printing a dict in Python first converts the elements to a str object suitable for printing, and here we get single quote marks. To get a valid JSON string from a dictionary, we can import the json module and call its dumps() function, which returns the dictionary serialised as a string.

This version of the program does just that:

In my terminal I receive valid JSON text:

{
    "1510752242": {
        "title": "Restoring Faith in ... of Science",
        "author": "K Heckenlively"
    },
    "031670704X": {
        "title": "Midnight Sun",
        "author": "S Meyer"
    },
    "1524763136": {
        "title": "Becoming",
        "author": "M Obama"
    },
    "0735219095": {
        "title": "Where the Crawdads Sing",
        "author": " S Owens"
    },
    "1338635174": {
        "title": "The Ballad of Songbirds and Snakes",
        "author": "S Collins"
    }
}

When we store a dictionary as the value part, we end up with a nested dictionary structure. This can be as deep as we want and allows us to store complex data structures, such as networks and trees.
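
As a sketch of the nested approach, shortened to two books and with an assumed variable name books, the following builds a dictionary of dictionaries, reads a nested value, and serialises the structure with json.dumps(). The indent value of 4 is an assumption.

```python
import json

# Each value is itself a dict, so attributes are looked up by name.
books: dict = {
    '031670704X': {'title': 'Midnight Sun', 'author': 'S Meyer'},
    '1524763136': {'title': 'Becoming', 'author': 'M Obama'},
}

# Chain the square brackets to reach into the nested dictionary.
print(books['031670704X']['author'])  # S Meyer

# dumps() returns a str containing JSON, with double-quoted keys and values.
as_json: str = json.dumps(books, indent=4)
print(as_json)

# loads() parses the JSON text back into an equal dictionary.
assert json.loads(as_json) == books
```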

Using Dictionaries, Sets and Lists Together

In an earlier instalment we discussed sets, using the following “fruits” example:

Why don’t we rewrite the fruits as a dictionary? :) This will be challenging: some fruits are both citrusfruit and treefruit, or treefruit and stonefruit. But we want our dictionary to be indexed by the fruit’s name, with the value showing which kinds of fruit it belongs to, right? A fruit is a fruit, and which types it belongs to should be secondary.

The following seems a reasonable way to store this information:

'lemons': ['treefruit', 'citrusfruit']

In this key-value mapping, the fruit’s name is used as the key, and a list of categories is the value. The dictionary’s keys may not contain duplicates. Just like a real dictionary of words, each key appears only once, but perhaps with several separate meanings. The following code shows that keys are overwritten if added multiple times to the dictionary.
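
A minimal sketch of that overwriting behaviour, assuming a dictionary named fruits_dict:

```python
fruits_dict: dict = {}

# The first assignment creates the key.
fruits_dict['oranges'] = ['treefruit']
# A second assignment to the same key replaces the value; no second entry appears.
fruits_dict['oranges'] = ['citrusfruit']

print(fruits_dict)  # {'oranges': ['citrusfruit']}
```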

You can see that only one entry for “oranges” exists and that the “treefruit” value was overwritten. The following program completes the exercise:

When I run this, I see the following in my code editor:

What’s going on here?

  • On line 1, I’m creating an empty dictionary named fruits_dict
  • On lines 3–6 we have a repeat of what we saw in the previous instalment in this series on the topic of sets.
  • On line 8, I’m looping through the union of all three sets of fruit.
  • On line 9, I’m adding each fruit as the key in fruits_dict, and setting the default value to be an empty list [].
  • On line 11, I’m looping through the key-value pairs in fruits_dict using the built-in method items(), which yields each pair as a tuple. The first element of the tuple is assigned to the variable named key, the second element to the variable named value.
  • On lines 12, 14, and 16, using in, I’m testing the membership of the fruit in each of the three sets.
  • If the fruit is present in the set, on lines 13, 15, and 17 I’m appending the appropriate text as an entry in the list of values.

You can see from the terminal output that my program has correctly transformed the three sets into a single dictionary.

{'nectarines': ['citrusfruit', 'stonefruit'], 'cherries': ['treefruit', 'stonefruit'], 'pears': ['treefruit'], 'limes': ['citrusfruit', 'treefruit'], 'plums': ['treefruit', 'stonefruit'], 'peaches': ['treefruit', 'stonefruit'], 'satsumas': ['citrusfruit'], 'apples': ['treefruit'], 'lemons': ['citrusfruit', 'treefruit'], 'oranges': ['citrusfruit', 'treefruit']}
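
The listing is not reproduced here, so the following is a sketch consistent with the walkthrough rather than the original program, and its line numbers will not match those quoted. The contents of the three sets are inferred from the output above.

```python
fruits_dict: dict = {}

citrusfruit: set = {'oranges', 'lemons', 'limes', 'satsumas', 'nectarines'}
treefruit: set = {'apples', 'pears', 'lemons', 'limes', 'oranges',
                  'cherries', 'plums', 'peaches'}
stonefruit: set = {'nectarines', 'cherries', 'plums', 'peaches'}

# The union of the three sets contains every fruit exactly once.
for fruit in citrusfruit | treefruit | stonefruit:
    fruits_dict[fruit] = []

# items() yields (key, value) tuples; value is the list created above.
for key, value in fruits_dict.items():
    if key in citrusfruit:
        value.append('citrusfruit')
    if key in treefruit:
        value.append('treefruit')
    if key in stonefruit:
        value.append('stonefruit')

print(fruits_dict)
```

Because sets are unordered, the order of the keys in the printed dictionary can differ from one run to the next.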

Just for fun, how would you reverse this process? If you had the previous dictionary and wanted to create a set of citrusfruit, how would you do it?

citrusfruit: set = {key for key, value in fruits_dict.items() if "citrusfruit" in value}

Using a set comprehension, of course! It works just like the list comprehension we saw and explained in the previous article in this series: Lists.
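
To try the comprehension on its own, here is a self-contained sketch that uses a shortened, hypothetical fruits_dict and then repeats the idea for every category at once. The name by_category is also an assumption.

```python
fruits_dict: dict = {
    'lemons': ['citrusfruit', 'treefruit'],
    'pears': ['treefruit'],
    'plums': ['treefruit', 'stonefruit'],
    'satsumas': ['citrusfruit'],
}

# Set comprehension: keep the key when its list of categories matches.
citrusfruit: set = {key for key, value in fruits_dict.items() if 'citrusfruit' in value}
print(sorted(citrusfruit))  # ['lemons', 'satsumas']

# The same idea for all categories: map each category to its set of fruits.
by_category: dict = {}
for fruit, categories in fruits_dict.items():
    for category in categories:
        by_category.setdefault(category, set()).add(fruit)

print(sorted(by_category['treefruit']))  # ['lemons', 'pears', 'plums']
```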

Mutability and Dictionary Keys

Python’s dictionary data type is very, very fast. Retrieving an item by its key takes roughly the same time on average, regardless of how large the dictionary is. This is impressive, and it’s useful to understand how it is achieved.

Not every Python data type can be used as a key in the dictionary. You can use str, but you can’t use list. You can use int, but you can’t use set. Why? The key must be hashable, which for the built-in types means immutable. A dictionary is implemented as a hash table: it uses the key’s hash to work out where the entry is stored, so it never has to search through all the entries. If a key could change after it was added, its hash would change too, and the dictionary would look in the wrong place. An object’s hash therefore must never change over its lifetime. Immutable data types such as int and str are hashable. Mutable data types such as list, dict and set are not hashable.

>>> dict1: dict = {{1,2}: "value"}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'set'
>>> dict1: dict = {{"key":"value"}: "value"}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'dict'
>>> dict1: dict = {[1,2]: "value"}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'

Amusingly, it is possible to use the special object None as a dictionary’s key, but please don’t try this at work!
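
The rule can be checked directly with the built-in hash() function. This is a minimal sketch; the helper is_hashable and the example keys are hypothetical.

```python
def is_hashable(obj: object) -> bool:
    """Return True if obj can be used as a dictionary key."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


print(is_hashable('isbn'))                # True
print(is_hashable(42))                    # True
print(is_hashable((1, 2)))                # True: a tuple of hashable items
print(is_hashable(frozenset({1, 2})))     # True: the immutable cousin of set
print(is_hashable(None))                  # True
print(is_hashable([1, 2]))                # False
print(is_hashable({1, 2}))                # False
print(is_hashable({'key': 'value'}))      # False
print(is_hashable(([1, 2], 3)))           # False: the tuple contains a list

# Hashable objects work as keys, including tuples.
grid: dict = {(0, 0): 'origin', (1, 0): 'east'}
print(grid[(1, 0)])  # east
```

The last False is worth noting: a tuple is immutable, but it is only hashable if everything inside it is hashable too.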

Specialist Dictionary Data Types: OrderedDict

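The list retains its order, but the dict compound data type has not always done so. If your code is executed by a Python version lower than 3.6, the dict will be unordered. In CPython 3.6 the dict keeps insertion order as an implementation detail, and from Python 3.7 onwards insertion order is guaranteed by the language. If your code must run on older versions, assume that your dictionary is unordered. The author of dict includes the note: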

Obviously, we recommend that any portable Python program continues to use OrderedDict when ordering is important.

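OrderedDict objects are used much like regular dict objects:

import collections as c
# Pass (key, value) tuples: before Python 3.6 a dict literal could lose its order before OrderedDict sees it.
ordered_dict: c.OrderedDict = c.OrderedDict([('banana', 3), ('apple', 4), ('pear', 1), ('orange', 12)])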
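
OrderedDict also offers a few order-aware extras that a plain dict lacks. A minimal sketch, reusing the fruit counts above (stock, a and b are hypothetical variable names):

```python
from collections import OrderedDict

stock = OrderedDict([('banana', 3), ('apple', 4), ('pear', 1), ('orange', 12)])

# Keys come back in insertion order.
print(list(stock))  # ['banana', 'apple', 'pear', 'orange']

# move_to_end() moves an existing key to the end, in place.
stock.move_to_end('banana')
print(list(stock))  # ['apple', 'pear', 'orange', 'banana']

# popitem(last=False) removes and returns the entry at the front.
print(stock.popitem(last=False))  # ('apple', 4)

# Two OrderedDicts with the same items in a different order are not equal,
# whereas the equivalent plain dicts are.
a = OrderedDict([('x', 1), ('y', 2)])
b = OrderedDict([('y', 2), ('x', 1)])
print(a == b)              # False
print(dict(a) == dict(b))  # True
```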

Dictionary Comprehensions

List comprehensions, set comprehensions, now: dictionary comprehensions! The full trio of very handy things provided to us by Python!

Let’s start with a trivial exercise: make a dictionary of all capital letters and their ASCII code. Let’s go!

>>> {i: chr(i) for i in range(65, 91, 1)}

  • This is a dictionary comprehension because we’re using curly brackets {}, the value being returned is a key-value pair separated by a colon :, and there is a for loop inside.
  • We are looping over the in value: range(65, 91, 1). This yields the integers from 65 up to, but not including, 91 in increments of 1.
  • The value in each loop is assigned to i, and is being given by range()
  • The key being returned is i
  • The value being returned is chr(i), a built-in function that returns the character with that code. For codes 65 to 90 these are the capital letters of the ASCII table.

The output we get is as expected, a dictionary of key-value pairs:

{65: 'A', 66: 'B', 67: 'C', 68: 'D', 69: 'E', 70: 'F', 71: 'G', 72: 'H', 73: 'I', 74: 'J', 75: 'K', 76: 'L', 77: 'M', 78: 'N', 79: 'O', 80: 'P', 81: 'Q', 82: 'R', 83: 'S', 84: 'T', 85: 'U', 86: 'V', 87: 'W', 88: 'X', 89: 'Y', 90: 'Z'}
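
Two small variations on the same comprehension, as a sketch: swapping key and value gives a lookup from letter to code, and an if clause filters the items. The names letter_to_code and vowels are hypothetical.

```python
# Letter -> code: swap the expressions on either side of the colon.
letter_to_code: dict = {chr(i): i for i in range(65, 91)}
print(letter_to_code['A'], letter_to_code['Z'])  # 65 90

# A trailing if clause keeps only the items that pass the test.
vowels: dict = {i: chr(i) for i in range(65, 91) if chr(i) in 'AEIOU'}
print(vowels)  # {65: 'A', 69: 'E', 73: 'I', 79: 'O', 85: 'U'}
```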

We can do all kinds of really fancy dictionary comprehensions! Like creating a dictionary from 2 lists of equal length:

In my code editor, I get this:

What’s going on here?

  • Lines 1–3 and 7–9 are docstrings. This is how we document the module (file) and the function in Python.
  • On line 4, I’m importing the typing module. This is useful next:
  • On line 6, I’m defining a function’s signature. It is named: dict_from_list. The first argument it accepts is named keys, and this must be a list of hashable objects. Remember, dictionary keys must be hashable. The second argument is a list named values. The function returns a dict data type.
  • On line 10, I test to ensure that both the keys and the values are the same length.
  • On line 11, if the lengths are different, I am raising, or throwing, an exception of the type ValueError.
  • On line 13, I am calling the built-in dict() function and passing in the result of the zip() operation. zip() takes two sequences and returns an iterator of tuples, pairing up the items at the same position.

You can see from the output in my terminal that the two lists, one containing the names of countries, the other containing their populations in millions, are now in a dictionary data type.
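
The listing is not reproduced here; the following is a sketch consistent with the walkthrough, so its line numbers will not match those quoted. The function name dict_from_list and its parameters come from the text, while the docstring wording, the error message and the placeholder data are assumptions.

```python
"""Build a dictionary from two lists of equal length."""
from typing import Any, Dict, Hashable, List


def dict_from_list(keys: List[Hashable], values: List[Any]) -> Dict[Hashable, Any]:
    """Return a dict mapping each item of keys to the item of values at the same index."""
    if len(keys) != len(values):
        raise ValueError('keys and values must be the same length')
    # zip() pairs the items up; dict() consumes the resulting (key, value) tuples.
    return dict(zip(keys, values))


# Placeholder data for illustration only.
print(dict_from_list(['a', 'b', 'c'], [1, 2, 3]))  # {'a': 1, 'b': 2, 'c': 3}

# The same result written as a dictionary comprehension.
print({key: value for key, value in zip(['a', 'b', 'c'], [1, 2, 3])})
```

Checking the lengths first matters because zip() stops silently at the end of the shorter sequence, which would otherwise drop items without warning.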

What have we achieved?

Almost too much to list!

  • zip(), range() built-in functions
  • raise to generate a ValueError exception at runtime
  • Building dictionaries programmatically from sets and lists
  • The concepts of hashable and mutable
  • Creating dictionary comprehensions
  • Getting the key-value pairs from a dictionary using items()

In the next instalment, we’ll move on to using Python to interact with external data.

Articles in this series so far: