Learn Programming with Python — Introduction to Data Types: Strings
Let’s explore the concept of data types and how Python represents text data as strings.
A Quick Review
In earlier instalments of this series we already got hands-on with different types of data:
- String data was input from the console or output to the screen
- Integer data was used to test different age conditions
- Boolean data was used to control program flow
We also used Sets and Tuples to collect individual pieces of data into data structures. We won’t go deep into data structures in this instalment, but we will reuse the set structures we’ve already seen.
In this instalment we’re going to get our fingers typing Python code to illustrate how Python uses data types, specifically the data type used to work with text data.
Text Data Types in Python3
Text data, made up of characters, is managed in Python as:
- str: this data type contains a string of characters (even an empty sequence)
There are no other data types for textual data in Python. str is all you get, and is likely all you really need!
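A few str values, as a minimal sketch. The variable names and the text inside the quotes are illustrative assumptions.

```python
# Illustrative values only; the names and the text are assumptions.
greeting = "Hello, World!"   # double quotes
city = 'Port-au-Prince'      # single quotes work the same way
empty = ""                   # an empty sequence of characters is still a str

print(len(greeting))  # 13
print(len(city))      # 14
print(len(empty))     # 0
```

Whichever quotes you pick, the result is the same kind of value: a str.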
Here is some code which manipulates a String, using some of the built-in string functions. For a full list of these functions (known as methods), take a look at the official docs.
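A minimal sketch of what such a listing might look like. The sentence text, and every method other than lower(), are assumptions chosen for illustration; sentence is assigned on the third line so that the line reference in the walkthrough lines up.

```python
# Sample text to experiment with; the wording is an assumption.

sentence = "The Quick Brown Fox Jumps Over The Lazy Dog"

lowered = sentence.lower()                # every character in lower case
shouted = sentence.upper()                # every character in upper case
swapped = sentence.replace("Dog", "Cat")  # swap one piece of text for another
words = sentence.split()                  # break the sentence into a list of words

print(lowered)     # the quick brown fox jumps over the lazy dog
print(shouted)     # THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG
print(swapped)     # The Quick Brown Fox Jumps Over The Lazy Cat
print(len(words))  # 9
```

None of these methods change sentence itself. Each one hands back a new value, which is why the results are stored in their own variables.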
When I execute this code, I get the following:
But how does this work?
- We defined a new variable named sentence in line #3 above. We assigned some text to this variable. Python automatically figured out that sentence contains a string of characters! Some other programming languages make you spell this out yourself.
- Because sentence is of data type str, all of Python’s built-in string methods become available. The dot in sentence.lower() indicates that lower() is a part of sentence. Automatically. Python knows that all strings can have the method lower(), and it knows that sentence is a string. So these built-in methods became available for us to use.
We can do many things with strings, but we frequently test them to see if they match some conditions. Let’s define a new set, containing the names of the capital cities I can currently think of. I want to go through each element in this set, and see if any capital city’s name is made up of more than one word. This would mean that the city’s name has a hyphen or a space in it. Right? Let’s go!
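A minimal sketch of such a program. The city names and the variable names capitals, city and results are assumptions for illustration. The set sits on line one and the two conditions sit on lines 4 and 6.

```python
capitals = {"London", "New Delhi", "Port-au-Prince", "Paris", "Kuala Lumpur"}  # assumed names
results = {}  # remembers what we found out about each city
for city in capitals:
    if city.find("-") > -1:
        results[city] = "contains a hyphen"
    elif city.find(" ") > -1:
        results[city] = "contains a space"
    else:
        results[city] = "is a single word"
    print(city, results[city])
```

The lines may be printed in a different order each time the program runs, because a set does not keep its elements in any particular order.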
When I execute this program in VS Code, I get:
What’s going on here?
- Line one contains a set; we encountered sets earlier. Don’t forget the curly brackets around the list of elements, which indicate that we’re defining a set.
- We’re using a for loop to iterate over every element in the set. As we encounter an element, we assign it to the loop variable.
- We’re using an If … Elif … Else construct to control the program flow, executing different statements depending on which condition is met.
- In each condition (line numbers 4 and 6) we’re using a built-in method of all strings: find(). If find() does find the characters being searched for, it will return the position (counting from 0) at which they were first found. If it doesn’t find the search term, it returns -1. We’re testing to see that find() returns a value greater than -1.
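The return values of find() can be checked directly. This is a small sketch; the city names are illustrative assumptions.

```python
# The city names are illustrative assumptions.
print("New Delhi".find(" "))       # 3: the space sits at position 3
print("Port-au-Prince".find("-"))  # 4: position of the first hyphen
print("London".find(" "))          # -1: there is no space at all
print("London".find("L"))          # 0: a match at the very start

# When only a yes/no answer is needed, the in operator reads more naturally.
print(" " in "New Delhi")          # True
print(" " in "London")             # False
```

Positions start at 0, so a match at the very beginning of a string returns 0. That is why the test compares against -1 rather than against 0 or 1.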
If you look at this program’s output, you might have noticed that the order of the cities in the output is different to the order of the cities in my set. That’s because, to Python and to mathemagicians, the order of the elements of a set is irrelevant. We’ll talk about sequences with a desired ordering later in this series :)
Most modern programming languages add built-in methods to data types. This makes the programmer’s job much, much easier!
As an exercise, you could create a program which finds all of the words in a set of the names of fruit which are more than 6 characters long. No cheating!
Identifying the str Data Type
You might not have thought of this. But how do you know if a piece of data is a str data type? Sometimes we write code which receives data from somebody else’s code. We might be expecting a str, but if we don’t verify this, then the methods we want to use won’t be available if it’s not, actually, a str.
This code will use the type() function to return the data type of a variable:
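A minimal sketch of such code. The variable name message and its text are assumptions for illustration.

```python
# The variable name and its text are assumptions.
message = "Hello, World!"

print(type(message))         # <class 'str'>
print(type(message) is str)  # True
```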
The output of this is:
What’s going on here?
- We used the type() function to determine the variable’s data type
- We’re informed that this variable is of the str class
We’ve stumbled into a super-important concept: Object Oriented Programming. All variables in Python are objects, and all objects belong to a specific class of objects! All variables of data type str are actually objects belonging to the class of ‘str’. Weird! We’ll dive really deep into objects and classes later in this series.
For now, here’s some more code using the built-in function isinstance():
print(isinstance("Hello, World!", str))
And we get the result True.
That’s because the string “Hello, World!” is in fact an object instance of the class str, and therefore is also of the str data type. In Python2, this worked very differently. Luckily, we’re using Python3 which has fixed a lot of inconsistencies.
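Here is one way isinstance() could guard code that receives data from somewhere else. This is a minimal sketch, and shout() is a hypothetical helper invented for this example.

```python
def shout(value):
    """Return value in upper case, or None if value is not a str.

    shout() is a hypothetical helper made up for this example.
    """
    if isinstance(value, str):
        return value.upper()  # safe: we know the str methods are available
    return None               # not a str, so upper() might not exist


print(shout("hello"))  # HELLO
print(shout(42))       # None
```

Checking first means the program can decide what to do with unexpected data, rather than stopping with an error when a method turns out to be missing.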
Duck Typing, and Forcing Type Conversion
I mentioned earlier that Python guesses the correct data type for your variable. Let’s get into some details.
In many programming languages it is expected that the programmer always declare which data type a variable should be. In Python we can just write:
myvariable = "Hello, World!"
print (isinstance(myvariable, str))
And verify that Python has given myvariable the data type str. In a language such as Java, we need to explicitly state that we want to create a string before we can assign a value.
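This is called “Duck Typing”. No, ducks can’t use a keyboard. The saying is: “If it waddles like a duck, and if it quacks like a duck: it’s a duck.” Python sees that the value assigned to myvariable looks like a string and behaves like a string, so it just goes ahead and calls it a string. (Strictly speaking, working out the type from the assigned value is known as dynamic typing; duck typing is the closely related idea that what matters is how an object behaves, not what it was declared to be.) That’s fine, but what does Python do in these cases?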
n = 2         # an int
m = "3"       # a str
print(n * 5)  # int multiplied by int
print(m * 5)  # str multiplied by int
print(n * m)  # int multiplied by str
This is a head scratcher! n is an int data type with the numerical value of 2. m is a str data type with the text value of “3”. Multiplying n by 5 results in 10. Multiplying m by 5 results in five copies of the string “3”: “33333”. Multiplying the string “3” by 2 via n * m results in two copies of the string “3”: “33”.
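The three results described above can be confirmed with a short sketch. The final part goes one step beyond the original example: it tries adding n and m, which Python refuses to do because it cannot tell whether a number or a piece of text is wanted.

```python
n = 2
m = "3"

print(n * 5)  # 10
print(m * 5)  # 33333
print(n * m)  # 33

# Adding an int to a str is different: Python will not guess what we meant.
try:
    print(n + m)
except TypeError as error:
    print("TypeError:", error)
```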
A new feature in Python3 is the availability of type hints. This isn’t about strictly specifying what data type we want to use — Python will still use duck typing. It does help other programmers know what the intent was. It also helps some automated checking tools catch unintended consequences.
n:int = 2
m:str = "3"
In the example above, I’ve used :str to hint about what I intend the data type of m to be.
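Type hints can also be attached to function parameters and return values. The sketch below uses a hypothetical function named repeat(), invented for this example, to show that the hints are recorded but not enforced when the program runs.

```python
def repeat(text: str, times: int) -> str:
    """Return text repeated the given number of times.

    repeat() is a hypothetical function made up for this example.
    """
    return text * times


print(repeat("3", 5))          # 33333
print(repeat.__annotations__)  # the hints are stored on the function

# Python does not check hints at run time, so this call still runs
# even though the first argument is an int rather than a str.
print(repeat(2, 5))            # 10
```

A separate checking tool can read the hints and flag the last call as a likely mistake before the program is ever run.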
Let’s convert n to become a str data type and m to become an int data type, just because we can!
n:int = 2
m:str = "3"
n = str(n)    # n now holds a str
m = int(m)    # m now holds an int
print(n * 5)
print(m * 5)
print(m * n)
When I run this code, I get the following output. Hopefully you should, too!
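The results can be checked with a sketch like the one below. The names n_text and m_number are assumptions, introduced so that each variable keeps a single data type. The last few lines are an extra illustration of what happens when the text does not look like a number.

```python
n: int = 2
m: str = "3"

n_text = str(n)    # the string "2"
m_number = int(m)  # the integer 3

print(n_text * 5)         # 22222
print(m_number * 5)       # 15
print(m_number * n_text)  # 222

# Conversion only works when the text really does look like a whole number.
try:
    int("three")
except ValueError as error:
    print("ValueError:", error)
```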
What Have we Achieved?
Quite a lot, again!
- We’ve discussed the str data type in Python3, and that it is the only data type in Python used for managing text data
- We’ve explored how the built-in methods of all string objects become available to our own strings
- We’ve touched on objects, instances and classes and learned how to test if a variable belongs to a specific data type
- We’ve discussed how Python will automatically work out the data type of a value, and how we can force conversion between integers and strings
In the next instalment we’ll look at numerical data types before moving on to collections of data.
Articles in this series so far:
- Learn Programming with Python — An Introduction
- Learn Programming with Python — Introduction to Functions
- Learn Programming with Python — Controlling Execution Flow
- Learn Programming with Python — Introduction to Data Types: Strings
- Learn Programming with Python — Introduction to Data Types: Numbers
- Learn Programming with Python — Introduction to Compound Data Types: Sets and Tuples
- Learn Programming with Python — Introduction to Compound Data Types: Lists
- Learn Programming with Python — Introduction to Compound Data Types: Dictionaries