Introduction
While learning Python or reading someone else’s code, you may have encountered the ‘u’ and ‘r’ prefixes and raw string literals. But what do these terms mean? How do they affect our Python code? In this article, we will attempt to demystify these concepts and understand their usage in Python.
String Literals in Python
A string literal in Python is a sequence of characters enclosed in quotes. We can use either single quotes (' ') or double quotes (" ") to define a string.
my_string = 'Good day, StackAbuse readers!'
print(my_string)
my_string = "Good day, StackAbuse readers!"
print(my_string)
Running this code gives you the following:
$ python string_example.py
Good day, StackAbuse readers!
Good day, StackAbuse readers!
Pretty simple, right? In my opinion, the thing that confuses most people is the “literal” part. We’re used to calling them just “strings”, so when you hear one being called a “string literal”, it sounds like something more complicated. A literal is simply a value written directly in the source code.
Python also offers other ways to define strings. We can prefix our string literals with certain characters to change their behavior. This is where the ‘u’ and ‘r’ prefixes come in, which we’ll talk about later.
Python also supports triple quotes (''' ''' or """ """) to define strings. These are especially useful when we want to define a string that spans multiple lines.
Here’s an example of a multi-line string:
my_string = """
Good day,
StackAbuse readers!
"""
print(my_string)
Running this code will output the following:
$ python multiline_string_example.py
Good day,
StackAbuse readers!
Notice the newlines in the output? That’s because a triple-quoted string keeps the line breaks you type between the quotes.
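To see exactly where those newlines sit, it can help to inspect the string with repr(). This is only a small sketch; the variable names `greeting` and `same_value` are illustrative and not part of the example above.

```python
# A triple-quoted string keeps every line break typed between the quotes.
greeting = """
Good day,
StackAbuse readers!
"""

# repr() shows the newlines as \n escape sequences instead of rendering them.
print(repr(greeting))

# The same value written as an ordinary single-line literal.
same_value = "\nGood day,\nStackAbuse readers!\n"
print(greeting == same_value)  # True
```

The comparison shows that triple quotes are just a more convenient way to write a string that contains newline characters.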
What are ‘u’ and ‘r’ String Prefixes?
In Python, string literals can have optional prefixes that provide additional information about the string. Two such prefixes are ‘u’ and ‘r’, and they’re placed before the opening quote to specify how the literal is interpreted. The ‘u’ prefix stands for Unicode, and the ‘r’ prefix stands for raw.
Now, you may be wondering what Unicode and raw strings are. Well, let’s break them down one at a time, starting with the ‘u’ prefix.
The ‘u’ String Prefix
The ‘u’ prefix in Python stands for Unicode. It’s used to define a Unicode string. But what is a Unicode string?
Unicode is an international encoding standard that provides a unique number for every character, regardless of the platform, program, or language. This makes it possible to use and display text from multiple languages and symbol sets in your Python programs.
In Python 3.x, all strings are Unicode by default. However, in Python 2.x, you need to use the ‘u’ prefix to define a Unicode string.
For example, if you want to create a string with Chinese characters in Python 2.x, you would need to use the ‘u’ prefix like so:
chinese_string = u'你好'
print(chinese_string)
When you run this code, you’ll get the output:
你好
Which is “Hello” in Chinese.
Note: In Python 3.x, you can still use the ‘u’ prefix, but it’s not necessary because all strings are Unicode by default.
So, that’s the ‘u’ prefix. It helps you work with international text in your Python programs, especially if you’re using Python 2.x. But what about the ‘r’ prefix? We’ll dive into that in the next section.
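As a quick check of the note about Python 3.x, the following sketch compares a ‘u’-prefixed literal with a plain one. It assumes a Python 3 interpreter, and the names `prefixed` and `plain` are just illustrative.

```python
# In Python 3 the 'u' prefix is accepted but changes nothing.
prefixed = u'你好'
plain = '你好'

print(prefixed == plain)               # True
print(type(prefixed) is type(plain))   # True: both are str

# Each character maps to a Unicode code point.
print([hex(ord(ch)) for ch in plain])  # ['0x4f60', '0x597d']

# Encoding turns the text into bytes; UTF-8 uses three bytes per character here.
print(len(plain.encode('utf-8')))      # 6
```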
The ‘r’ String Prefix
The ‘r’ prefix in Python denotes a raw string literal. When you prefix a string with ‘r’, it tells Python to take the string exactly as written and not to interpret backslashes as the start of escape sequences.
Consider this code:
normal_string = "\tTab character"
print(normal_string)
Output:
Tab character
Here, \t is interpreted as a tab character. But if we prefix this string with ‘r’:
raw_string = r"\tTab character"
print(raw_string)
Output:
\tTab character
You can see that \t is no longer interpreted as a tab character. It’s treated as two separate characters: a backslash and ‘t’.
This is particularly useful when dealing with regular expressions, or when you need to include a lot of backslashes in your string.
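One way to confirm the difference is to compare the lengths of the two literals. The sketch below uses shortened strings of its own; the variable names simply mirror the ones used above.

```python
normal_string = "\tTab"
raw_string = r"\tTab"

print(len(normal_string))  # 4: tab, T, a, b
print(len(raw_string))     # 5: backslash, t, T, a, b

# A raw string is equivalent to doubling each backslash in a normal string.
print(raw_string == "\\tTab")  # True
```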
Working with ‘u’ and ‘r’ Prefixes in Python 2.x
Now, let’s talk about Python 2.x. In Python 2.x, the ‘u’ prefix was used to denote a Unicode string, while the ‘r’ prefix was used to denote a raw string, just like in Python 3.x.
However, the difference lies in the default string type. In Python 3.x, all strings are Unicode by default. But in Python 2.x, plain string literals were byte strings, with ASCII as the default encoding. So, if you needed to work with Unicode strings in Python 2.x, you had to prefix them with ‘u’.
unicode_string = u"Good day, world!"
print(unicode_string)
Output:
Good day, world!
But what if you needed a string to be both Unicode and raw in Python 2.x? You could use both the ‘u’ and ‘r’ prefixes together, like this:
unicode_raw_string = ur"\tHello, world!"
print(unicode_raw_string)
Output:
\tHello, world!
Note: The ‘ur’ syntax is not supported in Python 3.x. If you need a string to be both raw and Unicode in Python 3.x, you can use the ‘r’ prefix alone, because all strings are Unicode by default.
The key point here is that the ‘u’ prefix was more important in Python 2.x because of its byte-string default. In Python 3.x, all strings are Unicode by default, so the ‘u’ prefix is no longer essential. However, the ‘r’ prefix is still very useful for working with raw strings in both versions.
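For Python 3.x, the note above can be sketched as follows. The use of compile() to probe the ‘ur’ prefix is only an illustration chosen for this sketch, and the names `raw_unicode` and `ur_supported` are hypothetical.

```python
# Python 3: 'r' alone gives a raw string that is also Unicode.
raw_unicode = r"\t你好"
print(raw_unicode)        # \t你好
print(len(raw_unicode))   # 4: backslash, t, and two Chinese characters

# The combined 'ur' prefix is rejected by the Python 3 parser.
try:
    compile('ur"text"', "<example>", "eval")
    ur_supported = True
except SyntaxError:
    ur_supported = False
print(ur_supported)       # False
```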
Using Raw String Literals
Now that we understand what raw string literals are, let’s look at more examples of how we can use them in our Python code.
One of the most common uses for raw string literals is in regular expressions. Regular expressions often include backslashes, which can lead to problems if Python interprets them as escape sequences before the regex engine sees them. By using a raw string literal, we can more easily avoid these problems.
Another common use case for raw string literals is when working with Windows file paths. As you may know, Windows uses backslashes in its file paths, which can cause issues in Python because of the backslash’s role as an escape character. By using a raw string literal, we can avoid these issues entirely.
Here’s an example:
# Without a raw string, \t becomes a tab and \f a form feed
path = "C:\path\to\file"
print(path)
# With a raw string, every backslash is kept as written
path = r"C:\path\to\file"
print(path)
As you can see, the raw string literal allows us to correctly represent the file path, while the standard string does not.
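Both use cases from this section can be sketched in a few lines. The sample date text and the variable names below are made up for illustration; PureWindowsPath is used only because it parses Windows-style paths on any operating system.

```python
import re
from pathlib import PureWindowsPath

# Raw literal: each backslash is kept exactly as typed.
raw_path = r"C:\path\to\file"

# Equivalent normal literal: every backslash has to be doubled.
escaped_path = "C:\\path\\to\\file"
print(raw_path == escaped_path)  # True

# Split the path into its components.
parts = PureWindowsPath(raw_path).parts
print(parts[-1])  # file

# In a regex, a raw literal lets \d reach the regex engine untouched.
date_pattern = re.compile(r"(\d{4})-(\d{2})-(\d{2})")
match = date_pattern.search("Backup created on 2024-01-15")
print(match.groups())  # ('2024', '01', '15')
```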
Common Mistakes and How to Avoid Them
When working with ‘u’ and ‘r’ string prefixes and raw string literals in Python, there are a number of common mistakes that developers often make. Let’s go through some of them and see how you can avoid them.
First, one common mistake is using the ‘u’ prefix in Python 3.x. Remember, the ‘u’ prefix is not needed in Python 3.x because strings are Unicode by default in this version. Using it won’t cause an error, but it’s redundant and could potentially confuse other developers reading your code.
u_string = u'Good day, World!'
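Second, forgetting to use the ‘r’ prefix when working with regular expressions can lead to unexpected results, because Python processes escape sequences before the regex engine ever sees the pattern. Always use the ‘r’ prefix when dealing with regular expressions in Python.
regex = '\bword\b'   # \b is a backspace character here
regex = r'\bword\b'  # \b reaches the regex engine as a word boundary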
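The effect of the missing prefix can be checked with the re module. The sample text and the names `plain_pattern` and `raw_pattern` are assumptions made for this sketch.

```python
import re

text = "a word here"

# Normal string: '\b' becomes a backspace character before re sees it.
plain_pattern = '\bword\b'
print(len(plain_pattern))                # 6
print(re.search(plain_pattern, text))    # None

# Raw string: the regex engine receives \b and treats it as a word boundary.
raw_pattern = r'\bword\b'
print(len(raw_pattern))                  # 8
print(re.search(raw_pattern, text).group())  # word
```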
Last, it’s easy to assume that the backslash has no special role at all in a raw string literal. That’s not quite true: a backslash still stops the quote that follows it from ending the literal, so a raw string cannot end in a single backslash. Doubling the backslash doesn’t help either, because a raw string keeps both backslashes. To get a single backslash at the end, append it from a normal string, where it has to be escaped.
# raw_string = r'C:\path\'  # SyntaxError: the literal is never closed
# This is the correct way
raw_string = r'C:\path' + '\\'
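A short sketch can confirm how Python treats a trailing backslash in a raw literal. The compile() call is just one way to test whether a piece of source text parses; the variable names are illustrative.

```python
# Source text of a raw literal ending in a single backslash: r'C:\path\'
try:
    compile("r'C:\\path\\'", "<example>", "eval")
    parses = True
except SyntaxError:
    parses = False
print(parses)  # False

# Doubling the backslash parses, but both backslashes stay in the string.
doubled = r'C:\path\\'
print(len(doubled))  # 9

# Appending an escaped backslash from a normal string gives exactly one.
appended = r'C:\path' + '\\'
print(len(appended))  # 8
print(appended)       # C:\path\
```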
Conclusion
In this article, we’ve explored the ‘u’ and ‘r’ string prefixes in Python, as well as raw string literals. We’ve learned that the ‘u’ prefix is used to denote Unicode strings, while the ‘r’ prefix is used for raw strings, which treat backslashes as literal characters rather than escape characters. We also looked at common mistakes when using these prefixes and raw string literals, and how to avoid them.