Learn Programming with Python — Introduction to Data Types: Strings

Posted on April 30, 2020 in Learn Python

Image credit to @goumbik https://www.pexels.com/@goumbik

Let’s explore the concept of data types and how Python represents text data as strings.

A Quick Review

In earlier instalments of this series we have already worked hands-on with different types of data:

  • String data was input from the console or output to the screen
  • Integer data was used to test different age conditions
  • Boolean data was used to control program flow

We also used Sets and Tuples to collect individual pieces of data into data structures. We won’t go deep into data structures in this instalment, but we will reuse the tuple and set structures we’ve already seen.

In this instalment we’re going to get our fingers typing Python code to illustrate how Python uses data types, specifically data types to work with text data.

Text Data Types in Python3

Text data, made up of characters, is managed in Python with a single data type:

  • The str data type holds a string of characters (even an empty sequence), such as Hello, World!

There are no other data types for textual data in Python. str is all you get, and is likely all you really need!

Here is some code which manipulates a String, using some of the built-in string functions. For a full list of these functions (known as methods), take a look at the official docs.

When I execute this code, I get the following:

But how does this work?

  • We defined a new variable named sentence in line #3 above. We assigned some text to this variable. Python automatically figured out that sentence contains a string of characters! Other programming languages make this much harder.
  • Because sentence is of data type str, all of Python’s built-in methods become available. The dot in sentence.lower() indicates that lower() is a part of sentence. Automatically. Python knows that all strings can have the method lower(), and it knows that sentence is a string. So these built-in methods became available for us to use.
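
The points above can be sketched as follows. This is a minimal illustration rather than the original listing: the text assigned to sentence is an assumed example value, and the methods shown are only a few of those available on every str.

```python
# Illustrative only: the text assigned to sentence is an assumed example value.
sentence = "The quick brown fox jumps over the lazy dog"

# Every str object carries the built-in string methods.
print(sentence.lower())                     # all lower case
print(sentence.upper())                     # all upper case
print(sentence.replace("lazy", "sleepy"))   # swap one word for another

# len() is a built-in function rather than a method, but it works on strings too.
print(len(sentence))
```

Notice that none of these calls change sentence itself; each method returns a new string.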

We can do many things with strings, but we frequently test them to see if they match some conditions. Let’s define a new set, containing the names of the capital cities I can currently think of. I want to go through each element in this set, and see if any capital cities are comprised of more than one word. This would mean that the city’s name has a hyphen or a space in it. Right? Let’s go!

When I execute this program in VS Code, I get:

What’s going on here?

  • Line one contains a set — we encountered sets earlier. Don’t forget the curly brackets and a list of elements which indicate that we’re defining a set.
  • We’re using a for loop to iterate over every element in the set. As we encounter an element, we assign it to the variable city.
  • We’re using an if … elif … else construct to control the program flow, executing different statements depending on whether each condition is met.
  • In each condition (line numbers 4 and 6) we’re using the built-in method of all strings: find(). If find() does find the characters being searched for, it returns the position at which they were first found, counting from 0. If it doesn’t find the search term, it returns -1. We’re testing to see that find() returns a value greater than -1.
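
The steps above can be sketched like this. It is a minimal sketch rather than the original listing, so its line numbers will not match the ones mentioned above; the city names, the results dictionary and the labels stored in it are all assumed for illustration.

```python
# Assumed example data: any set of city names would do.
capitals = {"London", "Port-au-Prince", "New Delhi", "Paris", "Addis Ababa"}

# Record what we learn about each city so the result can be inspected later.
results = {}

for city in capitals:
    # find() returns the index of the first match, or -1 if there is no match.
    if city.find("-") > -1:
        results[city] = "hyphen"
    elif city.find(" ") > -1:
        results[city] = "space"
    else:
        results[city] = "single word"
    print(city, "->", results[city])
```

The comparison is against -1 rather than 0 because a match at the very start of a string has position 0.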

If you look at this program’s output, you might have noticed that the order of the cities in the output is different to the order of the cities in my set. That’s because — to Python and to mathemigicians — the order of the elements of a set is irrelevant. We’ll talk about sequences with a desired ordering later in this article :)
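
If repeatable output matters, one option is to sort the elements before looping. A small sketch, with an assumed set of city names:

```python
# Assumed example data for illustration.
capitals = {"Paris", "London", "New Delhi"}

# sorted() returns a new list in alphabetical order; the set itself is unchanged.
ordered = sorted(capitals)

for city in ordered:
    print(city)
```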

Most modern programming languages add built-in methods to data types. This makes the programmer’s job much, much easier!

As an exercise, you could create a program which finds all of the words in a set of the names of fruit which are more than 6 characters long. No cheating!

Identifying the str Data Type

You might not have thought of this. But how do you know if a piece of data is a str data type? Sometimes we write code which receives data from somebody else’s code. We might be expecting a str, but if we don’t verify this, then the methods we want to use won’t be available if it’s not, actually, a str.

This code will use the type() function to return the data type of a variable:

print(type('Hello, World!'))

The output of this is:

<class 'str'>

What’s going on here?

  • We used the type() function to determine the variable’s data type
  • We’re informed that this variable is of the class ‘str’

We’ve stumbled into a super-important concept: Object Oriented Programming. All values in Python are objects, and every object belongs to a specific class of objects! All variables of data type str actually refer to objects belonging to the class ‘str’. Weird! We’ll dive really deep into objects and classes later in this series. For now, here’s some more code using the built-in function isinstance().

print(isinstance("Hello, World!", str))

And we get the bool answer:

True

That’s because the string “Hello, World!” is in fact an object instance of the class str, and therefore is also of the str data type. In Python2, text was split between two types, str and unicode, which made checks like this less straightforward. Luckily, we’re using Python3, which has fixed a lot of inconsistencies.
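
Coming back to the worry about data arriving from somebody else’s code, isinstance() can guard a call to a string method. A minimal sketch; the function name shout and its fallback of returning None are hypothetical, chosen only for illustration:

```python
def shout(value):
    """Return value in upper case if it is a str, otherwise None.

    Hypothetical helper for illustration only.
    """
    if isinstance(value, str):
        # Safe: value is a str, so upper() is available.
        return value.upper()
    # Not a str: upper() may not exist, so we decline to call it.
    return None


print(shout("Hello, World!"))
print(shout(42))
```

Without the check, shout(42) would stop the program with an AttributeError, because an int has no upper() method.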

Duck Typing, and Forcing Type Conversion

I mentioned earlier that Python guesses the correct data type for your variable. Let’s get into some details.

In many programming languages it is expected that the programmer always declare which data type a variable should be. In Python we can just write:

myvariable = "Hello, World!"
print (isinstance(myvariable, str))

And verify that Python has given myvariable the data type str. In a language such as Java, we need to explicitly state that we want to create a string before we can assign a value.

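This behaviour is often described as “duck typing”. No, ducks can’t use a keyboard. The saying is: “If it waddles like a duck, and if it quacks like a duck, it’s a duck.” The value assigned to myvariable looks like a string and behaves like a string, so Python goes ahead and treats it as a string. (Strictly speaking, working out the type from the value is dynamic typing; duck typing is the related idea that what matters is what an object can do, not what it was declared to be.) That’s fine, but what does Python do in these cases?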

n = 2
m = "3"
print (n * 5)
print (m * 5)
print (n * m)

This is a head scratcher! n is an int data type with the numerical value of 2. m is a str data type with the text value of “3”. Multiplying n by 5 results in 10. Multiplying m by 5 results in five copies of the string “3”: “33333”. Multiplying the string “3” by 2 via n * m results in two copies of the string “3”: “33”.

A new feature in Python3 is the availability of type hints. This isn’t about strictly specifying what data type we want to use — Python will still use duck typing. It does help other programmers know what the intent was. It also helps some automated checking tools catch unintended consequences.

n:int = 2
m:str = "3"

In the example above, I’ve used :int and :str to hint about what I intend the data type of the variables n and m to be.
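
Hints can also be attached to function parameters and return values. A small sketch, with a hypothetical function repeat_text; note that Python itself does not enforce hints when the program runs, which is why a separate checking tool is needed to catch a mismatch:

```python
def repeat_text(text: str, times: int) -> str:
    """Return text repeated the given number of times (hypothetical example)."""
    return text * times


print(repeat_text("3", 5))

# The hints are not enforced at run time: passing an int where a str was
# hinted still runs, and simply multiplies the two numbers.
print(repeat_text(3, 5))
```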

Let’s convert n to a str data type and m to an int, just because we can!

n:int = 2
m:str = "3"
n = str(n)
m = int(m)
print (n * 5)
print (m * 5)
print (m * n)

When I run this code, I get the following output. Hopefully you should, too!

22222
15
222

What Have we Achieved?

Quite a lot, again!

  • We’ve discussed the str data type in Python3, and that it is the only data type in Python used for managing text data
  • We’ve explored how the built-in methods of all string objects become available to our own strings
  • We’ve touched on objects, instances and classes and learned how to test if a variable belongs to a specific data type
  • We’ve discussed how Python will automatically guess the correct data type, but we can force the conversion from integer representations to strings

In the next instalment we’ll look at numerical data types before moving on to collections of data.

Articles in this series so far: