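Python: finding the longest substring with the most repeats under forward matching
I previously wrote an article titled "Python: getting the most-repeated pattern string in a given string".
The method there was simple: an overlapping sliding window enumerates every possible substring, and the occurrences are then counted. What I need now is the substring that, under forward matching, is as long as possible while also repeating the most times, which requires one extra layer of statistics. The approach is still easy to follow, and the complete implementation is given below.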
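To make the difference between the two window styles concrete, here is a minimal sketch. The function name `windows` and its `step` parameter are illustrative assumptions and do not appear in the original script; a step of 1 gives the overlapping slide described for the earlier article, while a step equal to the window length gives the forward-matching, non-overlapping cut.

```python
def windows(items, n, step):
    """Return every length-n window of items, advancing by `step` each time."""
    return ["/".join(items[i:i + n]) for i in range(0, len(items) - n + 1, step)]


items = ["4", "3", "4", "3", "4"]
overlapping = windows(items, 2, step=1)  # slide one position at a time
forward = windows(items, 2, step=2)      # jump a whole window at a time
print(overlapping)  # ['4/3', '3/4', '4/3', '3/4']
print(forward)      # ['4/3', '4/3']
```

With the overlapping slide, "3/4" is also seen as a pattern. With the forward cut, only the windows that start on a multiple of the window length are counted.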
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
__Author__: 沂水寒城
Purpose: find the most-repeated pattern in a given string
"""


def slice_window(str_list, n):
    """
    Cut str_list into consecutive, non-overlapping windows of length n,
    starting from index 0, and join each window with "/".
    """
    result_list = []
    for i in range(0, len(str_list) - n + 1, n):
        result_list.append("/".join(str_list[i:i + n]))
    return result_list


def find_repeat_pattern(str_list):
    """
    Count how often each window occurs, for every window length
    from 2 up to len(str_list) - 1.
    """
    result_list = []
    result_dict = {}
    for i in range(2, len(str_list)):
        result_list += slice_window(str_list, i)
    for one in result_list:
        if one in result_dict:
            result_dict[one] += 1
        else:
            result_dict[one] = 1
    # return sorted(result_dict.items(), key=lambda e: e[1], reverse=True)[0]
    return result_dict


def find_real_pattern(result_dict):
    """
    Keep every repeated pattern (count > 1) that contains another,
    different repeated pattern as a substring.
    """
    new_result_dict = {}
    # list() is needed so the keys can be indexed in Python 3
    keys_list = list(result_dict.keys())
    for i in range(len(keys_list)):
        first = keys_list[i]
        for j in range(len(keys_list)):
            second = keys_list[j]
            if (first in second and first != second
                    and result_dict[first] > 1 and result_dict[second] > 1):
                new_result_dict[second] = result_dict[second]
    return new_result_dict


if __name__ == "__main__":
    str_list = ["1", "2", "3", "4", "5", "6", "7", "8", "4", "3", "4", "3", "4", "3", "4", "3", "4", "3"]
    result_list = slice_window(str_list, 2)
    print(result_list)
    result_dict = find_repeat_pattern(str_list)
    print(result_dict)
    print(find_real_pattern(result_dict))

The output is as follows (dictionary key order may vary with the Python version):
["1/2", "3/4", "5/6", "7/8", "4/3", "4/3", "4/3", "4/3", "4/3"]
{"1/2/3": 1, "1/2/3/4/5/6/7/8/4/3/4/3/4/3": 1, "4/3/4/3/4/3": 1, "1/2/3/4/5/6/7/8/4/3/4/3": 1, "1/2/3/4/5/6/7/8/4": 1, "8/4/3/4/3/4/3": 1, "1/2/3/4/5": 1, "4/3/4/3/4": 1, "1/2/3/4": 1, "4/3/4/3": 2, "6/7/8/4/3": 1, "7/8/4/3/4/3": 1, "7/8/4": 1, "1/2/3/4/5/6/7/8/4/3": 1, "3/4/3/4/3/4/3/4/3": 1, "1/2/3/4/5/6/7/8/4/3/4/3/4": 1, "4/5/6": 1, "4/3/4": 1, "1/2": 1, "1/2/3/4/5/6/7/8/4/3/4/3/4/3/4": 1, "1/2/3/4/5/6/7/8/4/3/4/3/4/3/4/3/4": 1, "1/2/3/4/5/6/7/8/4/3/4/3/4/3/4/3": 1, "3/4": 1, "1/2/3/4/5/6/7/8/4/3/4": 1, "1/2/3/4/5/6/7": 1, "5/6": 1, "4/3": 5, "7/8": 1, "5/6/7/8": 1, "3/4/3": 2, "1/2/3/4/5/6/7/8": 1, "4/3/4/3/4/3/4/3": 1, "1/2/3/4/5/6": 1}
{"3/4/3": 2, "4/3/4/3": 2}
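[Finished in 0.4s]
As you can see:
[4,3,4,3] is the substring we are looking for: among the patterns that survive the filter under forward matching, it is the longest one with the highest repeat count. To print only this one, add a comparison before the output: when two patterns have equal counts, output the longer string.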
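The comparison described above could be sketched as follows. The helper name `pick_best_pattern` is hypothetical and not part of the original script; it assumes the input is a dictionary shaped like the one returned by `find_real_pattern`, with "/"-joined patterns as keys and counts as values.

```python
def pick_best_pattern(pattern_counts):
    """Return the (pattern, count) pair with the highest count.

    Ties on count are broken in favour of the pattern with more elements.
    Returns None for an empty dictionary.
    """
    if not pattern_counts:
        return None
    return max(
        pattern_counts.items(),
        # Compare by count first, then by number of "/"-separated elements.
        key=lambda item: (item[1], len(item[0].split("/"))),
    )


candidates = {"3/4/3": 2, "4/3/4/3": 2}
print(pick_best_pattern(candidates))  # ('4/3/4/3', 2)
```

Length is measured in elements rather than characters, so multi-character elements do not distort the comparison.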
