[Regular Expressions] Python’s Regex Symbols
... ... ...
# Sample Code1:
import re
phoneNumRegex1 = re.compile(r'\d{3}-\d{3}-\d{4}')
mo1 = phoneNumRegex1.search('My number is 415-555-4242.')
print('Phone number found: ' + mo1.group())
# Phone number found: 415-555-4242
1. Grouping with Parentheses: ( )
2. Matching Multiple Groups with the Pipe: | (that is, "or")
3. Optional Matching with the Question Mark: (wo)? (optional: matches zero or one)
4. Matching Zero or More with the Star: (wo)* (matches zero or more)
5. Matching One or More with the Plus: (wo)+ (matches one or more, that is, at least one)
6. Matching Specific Repetitions with Braces: {3}
7. The findall() Method: returns a list of all matches
8. Making Your Own Character Classes: [aeiouAEIOU]
# Sample Code2:
import re
vowelRegex = re.compile(r'[aeiouAEIOU]')
mo2 = vowelRegex.findall('RoboCop eats baby food. BABY FOOD.')
print(type(mo2))
# <class 'list'>
print(mo2)
# ['o', 'o', 'o', 'e', 'a', 'a', 'o', 'o', 'A', 'O', 'O']
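Items 1 to 6 above can be tried out in the same way. The following is only a minimal sketch: the regex names (phoneNumRegex2, heroRegex, and so on) and the test strings are illustrative and not part of the original samples.

```python
import re

# All names and sample strings below are illustrative assumptions.

# 1. Grouping with parentheses: group(1) and group(2) return the captured parts.
phoneNumRegex2 = re.compile(r'(\d{3})-(\d{3}-\d{4})')
mo = phoneNumRegex2.search('My number is 415-555-4242.')
area_code = mo.group(1)
main_number = mo.group(2)
print(area_code)
# 415
print(main_number)
# 555-4242

# 2. Pipe: match either alternative; search() returns the first one found.
heroRegex = re.compile(r'Batman|Tina Fey')
first_hero = heroRegex.search('Batman and Tina Fey.').group()
print(first_hero)
# Batman

# 3-5. ?, * and + applied to the group (wo).
optionalRegex = re.compile(r'Bat(wo)?man')  # zero or one "wo"
starRegex = re.compile(r'Bat(wo)*man')      # zero or more "wo"
plusRegex = re.compile(r'Bat(wo)+man')      # one or more "wo"
optional_hit = optionalRegex.search('Batwoman').group()
star_hit = starRegex.search('Batwowowoman').group()
plus_miss = plusRegex.search('Batman')      # no "wo" at all, so no match
print(optional_hit)
# Batwoman
print(star_hit)
# Batwowowoman
print(plus_miss)
# None

# 6. Braces: exactly three repetitions of the group (Ha).
haRegex = re.compile(r'(Ha){3}')
ha_hit = haRegex.search('HaHaHa').group()
print(ha_hit)
# HaHaHa
```

Note that search() returns None when nothing matches, so calling .group() on a failed match raises an AttributeError; checking the result first is usually safer.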
9. The Caret (^: begins with) and Dollar Sign ($: ends with) Characters
10. The Wildcard Character: . (dot) matches any character except a newline.
11. Matching Everything with Dot-Star (.*):
dot (.) means: any single character except a newline
star (*) means: zero or more of the preceding character
# Sample Code3:
import re
nameRegex = re.compile(r'First Name: (.*) Last Name: (.*)')
mo3 = nameRegex.search('First Name: Al Last Name: Amzshar')
print(mo3.group(1))
# Al
print(mo3.group(2))
# Amzshar
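The caret, dollar sign, and wildcard from items 9 and 10 can be sketched in the same style. The regex names and sample strings here are illustrative assumptions, not part of the original samples.

```python
import re

# Illustrative names and strings (assumptions, not from the original post).
beginsWithHello = re.compile(r'^Hello')   # must start with "Hello"
endsWithNumber = re.compile(r'\d$')       # must end with a digit
wholeStringIsNum = re.compile(r'^\d+$')   # digits from start to end

hello_hit = beginsWithHello.search('Hello world!')
hello_miss = beginsWithHello.search('He said Hello.')
print(hello_hit.group())
# Hello
print(hello_miss)
# None

number_hit = endsWithNumber.search('Your number is 42')
print(number_hit.group())
# 2

whole_hit = wholeStringIsNum.search('1234567890')
whole_miss = wholeStringIsNum.search('12345xyz67890')
print(whole_hit.group())
# 1234567890
print(whole_miss)
# None

# The dot matches exactly one character, so "flat" only yields "lat".
atRegex = re.compile(r'.at')
at_words = atRegex.findall('The cat in the hat sat on the flat mat.')
print(at_words)
# ['cat', 'hat', 'sat', 'lat', 'mat']
```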
12. Matching Newlines with the Dot Character: newlineRegex = re.compile('.*', re.DOTALL)
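A short comparison may make the effect of re.DOTALL clearer. This is a sketch with an illustrative multi-line string; noNewlineRegex and the text are assumptions added for the example.

```python
import re

# Illustrative multi-line input (an assumption for this sketch).
text = 'Serve the public trust.\nProtect the innocent.\nUphold the law.'

noNewlineRegex = re.compile('.*')            # dot stops at the first newline
newlineRegex = re.compile('.*', re.DOTALL)   # dot also matches newlines

first_line_only = noNewlineRegex.search(text).group()
everything = newlineRegex.search(text).group()

print(first_line_only)
# Serve the public trust.
print(everything == text)
# True
```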
The ? matches zero or one of the preceding group.
The * matches zero or more of the preceding group.
The + matches one or more of the preceding group.
The {n} matches exactly n of the preceding group.
The {n,} matches n or more of the preceding group.
The {,m} matches 0 to m of the preceding group.
The {n,m} matches at least n and at most m of the preceding group.
{n,m}? or *? or +? performs a non-greedy (also called lazy) match of the preceding group.
^spam means the string must begin with spam.
spam$ means the string must end with spam.
The . (dot) wildcard matches any character except newline characters.
\d, \w, and \s match a digit, word, or space character, respectively.
\D, \W, and \S match anything except a digit, word, or space character, respectively.
[abc] matches any character between the brackets (such as a, b, or c).
[^abc] matches any character that isn’t between the brackets.
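A few of the rules in this summary have no sample above, namely greedy versus non-greedy matching, the shorthand classes, and negated character classes. The following sketch covers them; all regex names and input strings are illustrative assumptions.

```python
import re

# Illustrative names and strings (assumptions, not from the original post).

# Greedy vs. non-greedy: {3,5} takes as many as possible, {3,5}? as few.
greedyHaRegex = re.compile(r'(Ha){3,5}')
lazyHaRegex = re.compile(r'(Ha){3,5}?')
greedy_hit = greedyHaRegex.search('HaHaHaHaHa').group()
lazy_hit = lazyHaRegex.search('HaHaHaHaHa').group()
print(greedy_hit)
# HaHaHaHaHa
print(lazy_hit)
# HaHaHa

# Shorthand classes: one or more digits, one whitespace, one or more word characters.
giftRegex = re.compile(r'\d+\s\w+')
gifts = giftRegex.findall('12 drummers, 11 pipers, 10 lords')
print(gifts)
# ['12 drummers', '11 pipers', '10 lords']

# \D is the opposite of \d: anything that is not a digit.
non_digits = re.compile(r'\D').findall('a1b2')
print(non_digits)
# ['a', 'b']

# Negated character class: every character that is not a vowel.
consonantRegex = re.compile(r'[^aeiouAEIOU]')
non_vowels = consonantRegex.findall('RoboCop')
print(non_vowels)
# ['R', 'b', 'C', 'p']
```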
(1) Basic Syntax