
Finding the longest, most frequently repeated substring pattern with forward matching in Python

Created: 2017-07-11

    I previously wrote an article titled "Getting the substring with the most-repeated pattern in a given string with Python".

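    The method there was simple: a sliding window that moves with overlap, enumerates every possible substring, and then counts the occurrences. What I want now is the substring that, under forward matching, is as long as possible while also repeating the most. That requires one more layer of counting, but the approach is still easy to follow. The implementation is as follows: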

#!/usr/bin/env python
# encoding: utf-8

"""
__Author__: 沂水寒城
Purpose: find the most-repeated pattern in a given string
"""


def slice_window(str_list,n):
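    """
    Forward chunking: cut str_list into consecutive, non-overlapping
    chunks of n tokens and join each chunk with "/".
    """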
    result_list=[]
    for i in range(0,len(str_list)-n+1,n):
        result_list.append("/".join(str_list[i:i+n]))
    return result_list


def find_repeat_pattern(str_list):
    """
    Count how often each chunk occurs, over all chunk sizes
    from 2 to len(str_list)-1.
    """
    result_list=[]
    result_dict={}
    for i in range(2,len(str_list)):
        result_list+=slice_window(str_list,i)
    for one in result_list:
        if one in result_dict:
            result_dict[one]+=1
        else:
            result_dict[one]=1
    # return sorted(result_dict.items(), key=lambda e:e[1], reverse=True)[0]
    return result_dict


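def find_real_pattern(result_dict):
    """
    Keep the repeated patterns (count > 1) that contain another,
    shorter repeated pattern.
    """
    new_result_dict={}
    # list() so the keys can also be indexed on Python 3
    keys_list=list(result_dict.keys())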
    for i in range(len(keys_list)):
        first=keys_list[i]
        for j in range(len(keys_list)):
            second=keys_list[j]
            if first in second and first!=second and result_dict[first]>1 and result_dict[second]>1:
                new_result_dict[keys_list[j]]=result_dict[keys_list[j]]
    return new_result_dict

if __name__ == "__main__":
    str_list=["1","2","3","4","5","6","7","8","4","3","4","3","4","3","4","3","4","3"]
    result_list=slice_window(str_list,2)
    print(result_list)
    result_dict=find_repeat_pattern(str_list)
    print(result_dict)
    print(find_real_pattern(result_dict))

The output is as follows (the key order of the printed dictionaries may vary with the Python version):

["1/2", "3/4", "5/6", "7/8", "4/3", "4/3", "4/3", "4/3", "4/3"]
{"1/2/3": 1, "1/2/3/4/5/6/7/8/4/3/4/3/4/3": 1, "4/3/4/3/4/3": 1, "1/2/3/4/5/6/7/8/4/3/4/3": 1, "1/2/3/4/5/6/7/8/4": 1, "8/4/3/4/3/4/3": 1, "1/2/3/4/5": 1, "4/3/4/3/4": 1, "1/2/3/4": 1, "4/3/4/3": 2, "6/7/8/4/3": 1, "7/8/4/3/4/3": 1, "7/8/4": 1, "1/2/3/4/5/6/7/8/4/3": 1, "3/4/3/4/3/4/3/4/3": 1, "1/2/3/4/5/6/7/8/4/3/4/3/4": 1, "4/5/6": 1, "4/3/4": 1, "1/2": 1, "1/2/3/4/5/6/7/8/4/3/4/3/4/3/4": 1, "1/2/3/4/5/6/7/8/4/3/4/3/4/3/4/3/4": 1, "1/2/3/4/5/6/7/8/4/3/4/3/4/3/4/3": 1, "3/4": 1, "1/2/3/4/5/6/7/8/4/3/4": 1, "1/2/3/4/5/6/7": 1, "5/6": 1, "4/3": 5, "7/8": 1, "5/6/7/8": 1, "3/4/3": 2, "1/2/3/4/5/6/7/8": 1, "4/3/4/3/4/3/4/3": 1, "1/2/3/4/5/6": 1}
{"3/4/3": 2, "4/3/4/3": 2}
[Finished in 0.4s]

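As we can see:

    [4,3,4,3] (the key "4/3/4/3") is the substring we are after: among the patterns that repeat during forward matching, it is the longest. To output only this one, add a comparison before printing: when two patterns have the same count, output the longer one.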
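
The tie-breaking comparison described above could be sketched roughly as follows. This is a minimal illustration, not the author's code: `pick_best_pattern` is a hypothetical helper name, and it assumes its input is a dictionary shaped like the one returned by `find_real_pattern`, with "/"-joined tokens as keys and counts as values. It picks the highest count and, when counts are equal, prefers the pattern with more tokens.

```python
def pick_best_pattern(pattern_dict):
    """Return the (pattern, count) pair with the highest count.

    On a tie, the pattern with more "/"-separated tokens wins.
    Returns None when the dictionary is empty.
    (Hypothetical helper, not part of the original script.)
    """
    if not pattern_dict:
        return None
    # Sort key: count first, then number of tokens in the pattern.
    return max(
        pattern_dict.items(),
        key=lambda item: (item[1], len(item[0].split("/"))),
    )


if __name__ == "__main__":
    candidates = {"3/4/3": 2, "4/3/4/3": 2}
    print(pick_best_pattern(candidates))  # ('4/3/4/3', 2)
```

Comparing by token count rather than by raw string length keeps the tie-break meaningful if the tokens are ever longer than one character.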
