介绍

福哥今天要给大家讲讲Python的正则表达式的使用技巧，正则表达式（Regular expressions）就是通过一系列的特殊格式的匹配符号去描述一个字符串的工具。

使用正则表达式可以快速检测字符串的格式，也可以从字符串里面查找出符合特定规则的字符串片断，还可以将字符串按照特定的规则替换或者重组成新的字符串。

正则表达式

表达式

re.compile

使用re.compile方法可以定义一个pattern，用来使用其他方法调用这个pattern。

url = "https://tongfu.net/home/35.html"

pattern = re.compile(r"tongfu\.net", re.I)

print(re.findall(pattern, url))

home/topic/2021/0619/15/d51aab73147c81e9f7ed7c128b874d98.png

re.template

re.template方法和re.compile方法类似，可以达到相同的目的。

url = "https://tongfu.net/home/35.html"

pattern = re.template(r"tongfu\.net", re.I)

print(re.findall(pattern, url))

匹配

re.match

re.match可以实现使用pattern去匹配字符串，结果是一个对象，可以有很多功能可以使用。

re.match是从字符串开头进行匹配的，pattern如果不包含字符串开头部分的话，匹配一定会失败！

url = "https://tongfu.net/home/35.html"

match = re.match(r"https\:\/\/([^\/]+)\/home\/(\d+)\.html", url)

print(match.group())
print(match.groups())

home/topic/2021/0619/15/91ddca1216cecaa3287e2810cc4cc831.png

re.search

re.search和re.match类型，区别在于re.search不是从字符串开头匹配的。

如果我们的pattern本身就是从字符串开头匹配的话建议使用re.match，因为效率它更快！

url = "https://tongfu.net/home/35.html"

match = re.search(r"home\/(\d+)\.html", url)

print(match.group())
print(match.groups())

home/topic/2021/0619/15/1be428d83eb35c70354aa79eb46c62f6.png

re.findall

re.findall可以直接返回一个tuple数组，而且可以实现多组匹配。

urls = "https://tongfu.net/home/35.html," \
       "https://tongfu.net/home/8.html"

matches = re.findall(r"https\:\/\/([^\/]+)\/home\/(\d+)\.html", urls)

print(matches)

home/topic/2021/0619/15/15160986ffb35d85c404f87b213e8a34.png

替换

re.sub

使用re.sub可以将pattern匹配的字符串片断替换为我们想要的内容，这里面还可以将pattern中的匹配组应用到替换内容里面。

urls = "https://tongfu.net/home/35.html," \
       "https://tongfu.net/home/8.html"

matches = re.sub(r"\/home\/(\d+)\.html", r"/homepage/\1.htm", urls)

print(matches)

home/topic/2021/0619/15/532ead7d6056e24387905fda4402a061.png

re.subn

re.subn和re.sub在字符串替换功能上面没有区别，re.subn比re.sub多了一个替换次数的统计，这个会在返回值里面体现出来。

urls = "https://tongfu.net/home/35.html," \
       "https://tongfu.net/home/8.html"

matches = re.subn(r"\/home\/(\d+)\.html", r"/homepage/\1.htm", urls)

print(matches)

home/topic/2021/0619/15/c63554cfe55e3620aa1805dabe2dc478.png

修饰符

修饰符就是参数flags，用来对pattern进行一个补充。

修饰符	描述
re.I	忽略大小写敏感，就是不管大小写问题，字母对就算匹配了。
re.L	本地化识别匹配。
re.M	多行匹配，默认正则表达式会在遇到换行符后结束匹配，设置这个之后就会一直匹配到末尾。
re.S	使字符“.”匹配换行符，默认字符“.”是不包括换行符的。
re.U	使用Unicode解析字符串，它会影响“\w”，“\W”，“\b”，“\B”的作用。
re.X	这个福哥还没有研究过，官方说法就是可以让编写pattern更加简单。

re.match

org = "福哥说他已经35岁了，大叔的年纪亚里士多德呀！福哥的英文名字就是Tongfu，注意大小写哦~"

print(re.match(r"福哥[^\d]*", org, re.IGNORECASE))

re.search

org = "福哥说他已经35岁了，大叔的年纪亚里士多德呀！福哥的英文名字就是Tongfu，注意大小写哦~"

print(re.search(r"([a-z]+)", org, re.IGNORECASE))

re.findall

org = "福哥说他已经35岁了，大叔的年纪亚里士多德呀！福哥的英文名字就是Tongfu，注意大小写哦~"

print(re.findall(r"福哥", org, re.IGNORECASE))

re.sub

org = "福哥说他已经35岁了，大叔的年纪亚里士多德呀！福哥的英文名字就是Tongfu，注意大小写哦~"

print(re.sub(r"福哥", "鬼谷子叔叔", org, 0, re.IGNORECASE))

re.subn

org = "福哥说他已经35岁了，大叔的年纪亚里士多德呀！福哥的英文名字就是Tongfu，注意大小写哦~"

print(re.subn(r"福哥", "鬼谷子叔叔", org, 0, re.IGNORECASE))

home/topic/2021/1130/11/9e50d0d3850ed1894de91f8efab3e306.png

总结

今天福哥带着童鞋们学习了Python的正则表达式库re的使用技巧，正则表达式在各种语言的编程时候都是非常重要的库，使用正则表达式可以让我们处理字符串变得更加简单、更加优雅~~

Python正则表达式的使用技巧V1.2【20210618】

介绍

介绍

正则表达式

表达式

re.compile

re.template

匹配

re.match

re.search

re.findall

替换

re.sub

re.subn

修饰符

re.match

re.search

re.findall

re.sub

re.subn

总结

鬼谷子叔叔

相关日志