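Finding the longest, most frequently repeated substring pattern with forward matching in Python
I previously wrote an article titled "Getting the most frequently repeated pattern in a given string with Python".
The method there was simple: an overlapping sliding window enumerates every possible substring, and the occurrences are then counted. What I want now is the substring that, under forward matching, is as long as possible while also repeating the most. That needs one more layer of counting, but the approach is still easy to follow. The implementation is as follows: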
#!/usr/bin/env python
# encoding: utf-8

"""
__Author__: 沂水寒城
Purpose: find the most-repeated pattern in a given string
"""


def slice_window(str_list, n):
    """
    Cut str_list into consecutive, non-overlapping windows of n tokens
    and join each window with "/".
    """
    result_list = []
    for i in range(0, len(str_list) - n + 1, n):
        result_list.append("/".join(str_list[i:i + n]))
    return result_list


def find_repeat_pattern(str_list):
    """Count how often each window appears, for window sizes 2 .. len(str_list) - 1."""
    result_list = []
    result_dict = {}
    for i in range(2, len(str_list)):
        result_list += slice_window(str_list, i)
    for one in result_list:
        if one in result_dict:
            result_dict[one] += 1
        else:
            result_dict[one] = 1
    # return sorted(result_dict.items(), key=lambda e:e[1], reverse=True)[0]
    return result_dict


def find_real_pattern(result_dict):
    """
    Keep every pattern that occurs more than once and contains a shorter
    pattern that also occurs more than once.
    """
    new_result_dict = {}
    # list() so the keys can be indexed on Python 3 as well as Python 2
    keys_list = list(result_dict.keys())
    for i in range(len(keys_list)):
        first = keys_list[i]
        for j in range(len(keys_list)):
            second = keys_list[j]
            if first in second and first != second and result_dict[first] > 1 and result_dict[second] > 1:
                new_result_dict[second] = result_dict[second]
    return new_result_dict


if __name__ == "__main__":
    str_list = ["1", "2", "3", "4", "5", "6", "7", "8", "4", "3", "4", "3", "4", "3", "4", "3", "4", "3"]
    result_list = slice_window(str_list, 2)
    print(result_list)
    result_dict = find_repeat_pattern(str_list)
    print(result_dict)
    print(find_real_pattern(result_dict))
The output is as follows (the order of the dictionary entries depends on the Python version):
["1/2", "3/4", "5/6", "7/8", "4/3", "4/3", "4/3", "4/3", "4/3"]
{"1/2/3": 1, "1/2/3/4/5/6/7/8/4/3/4/3/4/3": 1, "4/3/4/3/4/3": 1, "1/2/3/4/5/6/7/8/4/3/4/3": 1, "1/2/3/4/5/6/7/8/4": 1, "8/4/3/4/3/4/3": 1, "1/2/3/4/5": 1, "4/3/4/3/4": 1, "1/2/3/4": 1, "4/3/4/3": 2, "6/7/8/4/3": 1, "7/8/4/3/4/3": 1, "7/8/4": 1, "1/2/3/4/5/6/7/8/4/3": 1, "3/4/3/4/3/4/3/4/3": 1, "1/2/3/4/5/6/7/8/4/3/4/3/4": 1, "4/5/6": 1, "4/3/4": 1, "1/2": 1, "1/2/3/4/5/6/7/8/4/3/4/3/4/3/4": 1, "1/2/3/4/5/6/7/8/4/3/4/3/4/3/4/3/4": 1, "1/2/3/4/5/6/7/8/4/3/4/3/4/3/4/3": 1, "3/4": 1, "1/2/3/4/5/6/7/8/4/3/4": 1, "1/2/3/4/5/6/7": 1, "5/6": 1, "4/3": 5, "7/8": 1, "5/6/7/8": 1, "3/4/3": 2, "1/2/3/4/5/6/7/8": 1, "4/3/4/3/4/3/4/3": 1, "1/2/3/4/5/6": 1}
{"3/4/3": 2, "4/3/4/3": 2}
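As can be seen:
[4,3,4,3] is the substring we are after: the most-repeated pattern found during forward matching. To print only that one, add a comparison before the output: when two patterns have the same count, print the longer one.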
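The final comparison described above could look roughly like the sketch below. The helper name pick_best_pattern is hypothetical and not part of the original script. It assumes the input is a dictionary shaped like the one returned by find_real_pattern ("/"-joined patterns mapped to counts), and it measures "longer" by the number of "/"-separated tokens rather than by character count.

```python
def pick_best_pattern(pattern_counts):
    """Return the (pattern, count) pair with the highest count.

    Ties on count go to the pattern with more "/"-separated tokens.
    Hypothetical helper, not part of the original script.
    """
    if not pattern_counts:
        return None
    return max(
        pattern_counts.items(),
        # Compare by count first, then by number of tokens in the pattern.
        key=lambda item: (item[1], len(item[0].split("/"))),
    )


if __name__ == "__main__":
    # The dictionary printed by find_real_pattern in the run above.
    candidates = {"3/4/3": 2, "4/3/4/3": 2}
    best = pick_best_pattern(candidates)
    print(best)                # ('4/3/4/3', 2)
    print(best[0].split("/"))  # ['4', '3', '4', '3']
```

Counting tokens instead of characters keeps the tie-break meaningful when individual tokens are more than one character long.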