Friday, February 23, 2024

[Python][Regular Expressions] Python’s Regex Symbols



# Sample Code1:
import re
phoneNumRegex1 = re.compile(r'\d{3}-\d{3}-\d{4}')
mo1 = phoneNumRegex1.search('My number is 415-555-4242.')
print('Phone number found: ' + mo1.group())
# Phone number found: 415-555-4242


1. Grouping with Parentheses:  ( )
2. Matching Multiple Groups with the Pipe:  |  (i.e., or)
3. Optional Matching with the Question Mark: (wo)? (i.e., optional: match zero or one)
4. Matching Zero or More with the Star: (wo)* (i.e., match zero or more)
5. Matching One or More with the Plus: (wo)+ (i.e., match one or more; at least one)
6. Matching Specific Repetitions with Braces: {3}
7. The findall() Method: returns a list of all matches
8. Making Your Own Character Classes: [aeiouAEIOU]

# Sample Code2:
import re
vowelRegex = re.compile(r'[aeiouAEIOU]')
mo2 = vowelRegex.findall('RoboCop eats baby food. BABY FOOD.')
print(type(mo2))
# <class 'list'>
print(mo2)
# ['o', 'o', 'o', 'e', 'a', 'a', 'o', 'o', 'A', 'O', 'O']
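
Items 1 to 7 in the list above have no sample of their own, so here is a minimal sketch of each. The variable names and the test strings (Batman, HaHaHa, the phone numbers) are illustrative assumptions, not part of the original notes.

```python
import re

# 1. Grouping with parentheses: group(1) is the area code, group(2) the rest.
phone_regex = re.compile(r'(\d{3})-(\d{3}-\d{4})')
area_code, main_number = phone_regex.search('My number is 415-555-4242.').groups()

# 2. Pipe: either alternative may match; search() returns the first one found.
hero_regex = re.compile(r'Batman|Tina Fey')
first_hero = hero_regex.search('Batman and Tina Fey').group()

# 3-5. ?, * and + applied to the group (wo).
optional_regex = re.compile(r'Bat(wo)?man')  # zero or one 'wo'
star_regex = re.compile(r'Bat(wo)*man')      # zero or more 'wo'
plus_regex = re.compile(r'Bat(wo)+man')      # one or more 'wo'

# 6. Braces: exactly three repetitions of the group (Ha).
ha_regex = re.compile(r'(Ha){3}')

# 7. findall() returns a list of strings when the pattern has no groups.
numbers = re.compile(r'\d{3}-\d{3}-\d{4}').findall(
    'Cell: 415-555-9999 Work: 212-555-0000')

print(area_code, main_number)                     # 415 555-4242
print(first_hero)                                 # Batman
print(optional_regex.search('Batwoman').group())  # Batwoman
print(star_regex.search('Batwowowoman').group())  # Batwowowoman
print(plus_regex.search('Batman'))                # None
print(ha_regex.search('HaHaHa').group())          # HaHaHa
print(numbers)                                    # ['415-555-9999', '212-555-0000']
```

Note that if the pattern passed to findall() contains groups, it returns a list of tuples (one string per group) instead of a list of strings.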

9. The Caret (^: begins with) and Dollar Sign ($: ends with) Characters
10. The Wildcard Character: . (dot) = wildcard, i.e., match any character except a newline.
11. Matching Everything with Dot-Star (.*)
  dot .  means: any single character except the newline
  star * means: zero or more of the preceding character

# Sample Code3:
import re
nameRegex = re.compile(r'First Name: (.*) Last Name: (.*)')
mo3 = nameRegex.search('First Name: Al Last Name: Amzshar')
print(mo3.group(1))
# Al
print(mo3.group(2))
# Amzshar
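
Items 9 and 10 (caret, dollar sign, and the single-dot wildcard) can be sketched the same way. The regex names and sample strings below are assumptions chosen for illustration.

```python
import re

# 9. ^ anchors the match at the start of the string, $ at the end.
begins_with_hello = re.compile(r'^Hello')
ends_with_digit = re.compile(r'\d$')
whole_string_digits = re.compile(r'^\d+$')  # the entire string must be digits

print(begins_with_hello.search('Hello world!') is not None)      # True
print(begins_with_hello.search('He said Hello.') is not None)    # False
print(ends_with_digit.search('Your number is 42').group())       # 2
print(whole_string_digits.search('12 34567890') is not None)     # False

# 10. The dot matches exactly one character (anything except a newline).
at_regex = re.compile(r'.at')
at_words = at_regex.findall('The cat in the hat sat on the flat mat.')
print(at_words)  # ['cat', 'hat', 'sat', 'lat', 'mat']
```

The 'lat' result shows that the dot stands for a single character only: in 'flat' the pattern picks up 'lat', not the whole word.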

12. Matching Newlines with the Dot Character: newlineRegex = re.compile('.*', re.DOTALL)
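
A short sketch of what re.DOTALL changes, comparing the same pattern with and without the flag. The three-line sample text and the variable names are assumptions for illustration.

```python
import re

text = 'Serve the public trust.\nProtect the innocent.\nUphold the law.'

# Without re.DOTALL the dot stops at the first newline.
noNewlineRegex = re.compile('.*')
first_line_only = noNewlineRegex.search(text).group()

# With re.DOTALL the dot also matches newline characters.
newlineRegex = re.compile('.*', re.DOTALL)
whole_text = newlineRegex.search(text).group()

print(first_line_only)     # Serve the public trust.
print(whole_text == text)  # True
```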



The  ?  matches zero or one of the preceding group.
The  *  matches zero or more of the preceding group.
The  +  matches one or more of the preceding group.
The  {n}  matches exactly n of the preceding group.
The  {n,}  matches n or more of the preceding group.
The  {,m}  matches 0 to m of the preceding group.
The  {n,m}  matches at least n and at most m of the preceding group.
{n,m}?  or  *?  or  +?  performs a non-greedy (also called lazy) match of the preceding group.
^spam  means the string must begin with spam.
spam$  means the string must end with spam.
The  .  (dot) is a wildcard: it matches any character except newline characters.
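
The brace ranges and the greedy versus non-greedy distinction in the summary above are easiest to see side by side. This is a minimal sketch; the sample strings and variable names are assumptions for illustration.

```python
import re

ha_text = 'HaHaHaHaHa'  # five repetitions of 'Ha'

# {3,5} is greedy by default and takes as many repetitions as it can;
# {3,5}? is non-greedy and stops at the minimum.
greedy_match = re.search(r'(Ha){3,5}', ha_text).group()
lazy_match = re.search(r'(Ha){3,5}?', ha_text).group()
print(greedy_match)  # HaHaHaHaHa
print(lazy_match)    # HaHaHa

# {n,} means n or more; {,m} means 0 to m. fullmatch() requires the
# whole string to match the pattern.
at_least_three = re.fullmatch(r'(Ha){3,}', 'HaHaHaHa') is not None
at_most_two = re.fullmatch(r'(Ha){,2}', 'HaHaHa') is not None
print(at_least_three)  # True
print(at_most_two)     # False

# The same greedy/non-greedy difference applies to dot-star.
bracket_text = '<To serve man> for dinner.>'
greedy_brackets = re.search(r'<.*>', bracket_text).group()
lazy_brackets = re.search(r'<.*?>', bracket_text).group()
print(greedy_brackets)  # <To serve man> for dinner.>
print(lazy_brackets)    # <To serve man>
```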


\d ,  \w , and  \s  match a digit, word, or space character, respectively.
\D ,  \W , and  \S  match anything except a digit, word, or space character, respectively.
[abc]  matches any character between the brackets (such as a, b, or c).
[^abc]  matches any character that isn’t between the brackets.
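
The shorthand and bracket character classes above can be sketched as follows. The sample strings and variable names are assumptions for illustration.

```python
import re

# \d+ one or more digits, \s one whitespace character, \w+ one or more word characters.
gifts = re.findall(r'\d+\s\w+', '12 drummers, 11 pipers, 10 lords, 9 ladies')
print(gifts)  # ['12 drummers', '11 pipers', '10 lords', '9 ladies']

# \D matches anything that is not a digit, so removing it leaves digits only.
digits_only = re.sub(r'\D', '', '415-555-4242')
print(digits_only)  # 4155554242

# \S+ matches runs of non-whitespace characters.
tokens = re.findall(r'\S+', 'a b  c')
print(tokens)  # ['a', 'b', 'c']

# [abc] matches any one of a, b, or c.
abc_chars = re.findall(r'[abc]', 'cabbage')
print(abc_chars)  # ['c', 'a', 'b', 'b', 'a']

# [^aeiouAEIOU] is a negative class: every character that is not a vowel,
# including spaces and punctuation.
non_vowels = re.findall(r'[^aeiouAEIOU]', 'RoboCop eats.')
print(non_vowels)  # ['R', 'b', 'C', 'p', ' ', 't', 's', '.']
```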



(1) Basic Syntax


(2) Regex Character Classes

