KMP算法及应用(hdu2087剪花布条 )Power Strings (POJ2046)Cyclic Nacklace(HDU3746)
KMP由Knuth(D.E.Knuth)、Morris(J.H.Morris)和Pratt(V.R.Pratt)三人设计的线性时间字符串匹配算法。所以叫做KMP。。。。。
字符串匹配,就是从一个字符串中查找出另一个字符串所在位置,当然也可能出现查询不到的情况。
比如给出目标字符串 ss:
- abcabcabce
所要匹配的模式串 s:
- abcabce
当匹配到前6位是,都是成功的,但是到第7位就失败了(ss[6]!=s[6]),我们通长的做法是回溯到ss的第2位,然后让s重新开始从第一位匹配,这样的时间复杂度明显是n*m(n和m分别是ss串和s串的长度),所以就可以想能不能用其他方法解决。
以下是我的理解:
1.我们观察模式串的特点:
前三位abc和第4-6位abc 是相同的,如果说当前6位都是匹配的,把s字符串往后移动三位,是不是前三位也是匹配的。
虽然我们知道后面一定是匹配的,但是通常情况下,还是得让计算机跑一遍,(虽然是无用功,但是形式还是要走滴,所以就有TLE……不扯了,扎心qaq)
2.再来看这一个字符串
目标串(蓝色)的每一位都不相同,但是依然匹配到第六位失败,注意是每一位都不相同,所以当回溯到下一步的时候,很明显,一定没有一个字符是能匹配成功的,所以又做了无用功
3.额,,再看一对串
这个很有特点,前面的字符都是一样的,只有最后一个不一样,看到最后一个适配后,肯定知道,应该将模式字符向后移动6位嘛(但是计算机没有人的智商,所以我们需要写一个程序然她,他,它,听话)
为了避免不必要的流程,让时间复杂度降到最低,所以,K,M,P,三人总结出,—->KMP
- ####KMP是如何工作的
如果是通常算法的话,需要16步比较,而kmp就只要。。。。
为什么能过省略那么多比较过程呢?———KMP的核心next[]跳转表
其实,模式串往往含有一定的信息——前缀包含。
对于模式串而言,其前缀字符串,有可能也是模式串中的非前缀子串,这个问题我称之为前缀包含问题。
如果不理解上述,,请忽略,看下面的解释
如图这个模式字符串,后面三位和前面的三位相同,所以就可以姑且理解为前缀包含。。
以模式串a b c a b c a c a b为例,其前缀的4个字符a b c a,正好也是模式串的一个子串a b c (a b c a) c a b,所以当目标串与模式串执行匹配的过程中,如果直到第 8 个字符才匹配失败,同时也意味着目标串当前字符之前的 4 个字符,与模式串的前 4 个字符是相同的,所以当模式串向后移动的时候,可以直接将模式串的第 5 个字符与当前字符对齐,执行比较,这样就实现了模式串一次性向前跳跃多个字符。
next数组就起到能一次性跳跃多个字符的作用
next中下标k含义:对于模式串的第 j 个字符 s[j],是所有满足使 s[1···k-1] =s[j-(k-1)···j-1] (k < j) 成立的 k 的最大值。next[j]=k
简单来说就是当前后缀字符中所包含的前缀的最大值
先不管next怎么用,先来看是怎么得到的:
void getnex(int nex[],char *s)
{int len=strlen(s);int i,j=-1;nex[0]=-1;for(i=1;i<len;i++){while(j>-1&&s[j+1]!=s[i])j=nex[j];if(s[j+1]==s[i])j++;nex[i]=j;}
}
//这里用nex表示next数组,在c++中next已经变成保留字符串,所以不能用next当做跳转数组名,否则报错
还有这种写法
void get_nex(int nex[],char *s)
{int len=strlen(s);int i=0,j=-1;nex[0]=-1;while(i!=len) {if(j==-1||s[i]==s[j])nex[++i]=++j;else j=nex[j];}
}
//和上面那种写法区别,建议自己试试
看next是如何完成跳转的
int kmp(char *ss,char *s,int nex[])
{getnex(nex,s);int lens=strlen(s);int lenss=strlen(ss);int i,j=-1;for(i=0;i<lenss;i++){while(j>-1&&s[j+1]!=ss[i])j=nex[j];if(s[j+1]==ss[i])j++;if(j==lens-1){return i-lens+1;//找到模式串,返回在在目标串第一个字符位置}}
}
具体看一个问题
问题(hdu2087剪花布条 )
一块花布条,里面有些图案,另有一块直接可用的小饰条,里面也有一些图案。对于给定的花布条和小饰条,计算一下能从花布条中尽可能剪出几块小饰条来呢?
Input
输入中含有一些数据,分别是成对出现的花布条和小饰条,其布条都是用可见ASCII字符表示的,可见的ASCII字符有多少个,布条的花纹也有多少种花样。花纹条和小饰条不会超过1000个字符长。如果遇见#字符,则不再进行工作。
Output
输出能从花纹布中剪出的最多小饰条个数,如果一块都没有,那就老老实实输出0,每个结果之间应换行。
Sample Input
abcde a3
aaaaaa aa
#
Sample Output
0
3
这个题的思路很明显,就是要找出目标串中的“花纹字符串”,虽然暴力能过,但还举荐用kmp
#include<iostream>
#include<cstdio>
#include<algorithm>
#include<cstring>
#include<set>
#include<map>
#include<cmath>using namespace std;
void getnex(char *s,int nex[])
{int j=-1;int len=strlen(s);nex[0]=-1;int i;for(i=1;i<len;i++){while(j>-1&&s[j+1]!=s[i])j=nex[j];if(s[j+1]==s[i])j++;nex[i]=j;}
}
int kmp(char *ss,char *s,int nex[])
{getnex(s,nex);int i,j,lenss,lens;lenss=strlen(ss);lens=strlen(s);j=-1;int ans=0;for(i=0;i<lenss;i++){while(j>-1&&s[j+1]!=ss[i])j=nex[j];if(s[j+1]==ss[i])j++;if(j==lens-1){j=-1;ans++;}}return ans;
}
int main()
{char ss[1005],s[1005];while(~scanf("%s",ss)){if(strcmp(ss,"#")==0) break;scanf("%s",s);int nex[1005];int ans=kmp(ss,s,nex);printf("%d\n",ans);}return 0;
}
循环节问题
什么是循环节?
—->baidu
利用KMP算法中的next值可以求出字符串的循环节,如ababab的循环节为ab,abcd的循环节为abcd,具体做法如下:假设字符串的长度为len,next[len]为字符串的最后一个字符的下一个字符的next值(下标从0开始),如果len % (len - next[len]) == 0,那么循环节的循环次数为len / (len - next[len]),否则为1
循环节问题
Power Strings (POJ2046)
Given two strings a and b we define a*b to be their concatenation. For example, if a = “abc” and b = “def” then a*b = “abcdef”. If we think of concatenation as multiplication, exponentiation by a non-negative integer is defined in the normal way: a^0 = “” (the empty string) and a^(n+1) = a*(a^n).
Input
Each test case is a line of input representing s, a string of printable characters. The length of s will be at least 1 and will not exceed 1 million characters. A line containing a period follows the last test case.
Output
For each s you should print the largest n such that s = a^n for some string a.
Sample Input
abcd
aaaa
ababab
.
Sample Output
1
4
3
Hint
This problem has huge input, use scanf instead of cin to avoid time limit exceed.
#include<iostream>
#include<cstdio>
#include<algorithm>
#include<cstring>
#include<set>
#include<map>
#include<cmath>using namespace std;
char s[1000000+100];
int nex[1000000+100];
void get_nex(int nex[],char *s)
{int len=strlen(s);int i=0,j=-1;nex[0]=-1;while(i!=len){if(j==-1||s[i]==s[j])nex[++i]=++j;else j=nex[j];}
}
int main()
{int T,i;while(scanf("%s",s)){if(strcmp(s,".")==0) break;int len=strlen(s);get_nex(nex,s);int t=len-nex[len];if(len%t==0&&nex[len]!=0)printf("%d\n",len/t);else printf("%d\n",1);}return 0;
}
还有循环节的扩展问题
Cyclic Nacklace(HDU3746)
CC always becomes very depressed at the end of this month, he has checked his credit card yesterday, without any surprise, there are only 99.9 yuan left. he is too distressed and thinking about how to tide over the last days. Being inspired by the entrepreneurial spirit of “HDU CakeMan”, he wants to sell some little things to make money. Of course, this is not an easy task.
As Christmas is around the corner, Boys are busy in choosing christmas presents to send to their girlfriends. It is believed that chain bracelet is a good choice. However, Things are not always so simple, as is known to everyone, girl’s fond of the colorful decoration to make bracelet appears vivid and lively, meanwhile they want to display their mature side as college students. after CC understands the girls demands, he intends to sell the chain bracelet called CharmBracelet. The CharmBracelet is made up with colorful pearls to show girls’ lively, and the most important thing is that it must be connected by a cyclic chain which means the color of pearls are cyclic connected from the left to right. And the cyclic count must be more than one. If you connect the leftmost pearl and the rightmost pearl of such chain, you can make a CharmBracelet. Just like the pictrue below, this CharmBracelet’s cycle is 9 and its cyclic count is 2:
Now CC has brought in some ordinary bracelet chains, he wants to buy minimum number of pearls to make CharmBracelets so that he can save more money. but when remaking the bracelet, he can only add color pearls to the left end and right end of the chain, that is to say, adding to the middle is forbidden.
CC is satisfied with his ideas and ask you for help.
Input
The first line of the input is a single integer T ( 0 < T <= 100 ) which means the number of test cases.
Each test case contains only one line describe the original ordinary chain to be remade. Each character in the string stands for one pearl and there are 26 kinds of pearls being described by ‘a’ ~’z’ characters. The length of the string Len: ( 3 <= Len <= 100000 ).
Output
For each case, you are required to output the minimum count of pearls added to make a CharmBracelet.
Sample Input
3
aaa
abca
abcde
Sample Output
0
2
5
*这个题就是要求给出字符串,,问还差几位,能形成循环节
所以,直接上代码*
#include<iostream>
#include<stdio.h>
#include<cstring>
using namespace std;
char a[100001];
int nexts[100001];
int main()
{int T,len,i,j,nima;while(scanf("%d",&T)!=EOF){getchar();while(T--){scanf("%s",a+1);len=strlen(a+1);i=1;j=0;nexts[1]=0;while(i<=len){if(j==0||a[i]==a[j]){i++;j++;nexts[i]=j;}elsej=nexts[j];}nima=(len+1)-nexts[len+1];if(len%nima==0&&len!=nima)//这里要注意个是循环节的长度不能等于它自己本身就是len!=nimaprintf("%d\n",0);elseprintf("%d\n",nima-len%nima);}}return 0;
}
//别在意nima这个变量名
这次kmp先写到这,讲真的真不好理解。。。。。。可能我太菜 了吧
KMP算法及应用(hdu2087剪花布条 )Power Strings (POJ2046)Cyclic Nacklace(HDU3746)
KMP由Knuth(D.E.Knuth)、Morris(J.H.Morris)和Pratt(V.R.Pratt)三人设计的线性时间字符串匹配算法。所以叫做KMP。。。。。
字符串匹配,就是从一个字符串中查找出另一个字符串所在位置,当然也可能出现查询不到的情况。
比如给出目标字符串 ss:
- abcabcabce
所要匹配的模式串 s:
- abcabce
当匹配到前6位是,都是成功的,但是到第7位就失败了(ss[6]!=s[6]),我们通长的做法是回溯到ss的第2位,然后让s重新开始从第一位匹配,这样的时间复杂度明显是n*m(n和m分别是ss串和s串的长度),所以就可以想能不能用其他方法解决。
以下是我的理解:
1.我们观察模式串的特点:
前三位abc和第4-6位abc 是相同的,如果说当前6位都是匹配的,把s字符串往后移动三位,是不是前三位也是匹配的。
虽然我们知道后面一定是匹配的,但是通常情况下,还是得让计算机跑一遍,(虽然是无用功,但是形式还是要走滴,所以就有TLE……不扯了,扎心qaq)
2.再来看这一个字符串
目标串(蓝色)的每一位都不相同,但是依然匹配到第六位失败,注意是每一位都不相同,所以当回溯到下一步的时候,很明显,一定没有一个字符是能匹配成功的,所以又做了无用功
3.额,,再看一对串
这个很有特点,前面的字符都是一样的,只有最后一个不一样,看到最后一个适配后,肯定知道,应该将模式字符向后移动6位嘛(但是计算机没有人的智商,所以我们需要写一个程序然她,他,它,听话)
为了避免不必要的流程,让时间复杂度降到最低,所以,K,M,P,三人总结出,—->KMP
- ####KMP是如何工作的
如果是通常算法的话,需要16步比较,而kmp就只要。。。。
为什么能过省略那么多比较过程呢?———KMP的核心next[]跳转表
其实,模式串往往含有一定的信息——前缀包含。
对于模式串而言,其前缀字符串,有可能也是模式串中的非前缀子串,这个问题我称之为前缀包含问题。
如果不理解上述,,请忽略,看下面的解释
如图这个模式字符串,后面三位和前面的三位相同,所以就可以姑且理解为前缀包含。。
以模式串a b c a b c a c a b为例,其前缀的4个字符a b c a,正好也是模式串的一个子串a b c (a b c a) c a b,所以当目标串与模式串执行匹配的过程中,如果直到第 8 个字符才匹配失败,同时也意味着目标串当前字符之前的 4 个字符,与模式串的前 4 个字符是相同的,所以当模式串向后移动的时候,可以直接将模式串的第 5 个字符与当前字符对齐,执行比较,这样就实现了模式串一次性向前跳跃多个字符。
next数组就起到能一次性跳跃多个字符的作用
next中下标k含义:对于模式串的第 j 个字符 s[j],是所有满足使 s[1···k-1] =s[j-(k-1)···j-1] (k < j) 成立的 k 的最大值。next[j]=k
简单来说就是当前后缀字符中所包含的前缀的最大值
先不管next怎么用,先来看是怎么得到的:
void getnex(int nex[],char *s)
{int len=strlen(s);int i,j=-1;nex[0]=-1;for(i=1;i<len;i++){while(j>-1&&s[j+1]!=s[i])j=nex[j];if(s[j+1]==s[i])j++;nex[i]=j;}
}
//这里用nex表示next数组,在c++中next已经变成保留字符串,所以不能用next当做跳转数组名,否则报错
还有这种写法
void get_nex(int nex[],char *s)
{int len=strlen(s);int i=0,j=-1;nex[0]=-1;while(i!=len) {if(j==-1||s[i]==s[j])nex[++i]=++j;else j=nex[j];}
}
//和上面那种写法区别,建议自己试试
看next是如何完成跳转的
int kmp(char *ss,char *s,int nex[])
{getnex(nex,s);int lens=strlen(s);int lenss=strlen(ss);int i,j=-1;for(i=0;i<lenss;i++){while(j>-1&&s[j+1]!=ss[i])j=nex[j];if(s[j+1]==ss[i])j++;if(j==lens-1){return i-lens+1;//找到模式串,返回在在目标串第一个字符位置}}
}
具体看一个问题
问题(hdu2087剪花布条 )
一块花布条,里面有些图案,另有一块直接可用的小饰条,里面也有一些图案。对于给定的花布条和小饰条,计算一下能从花布条中尽可能剪出几块小饰条来呢?
Input
输入中含有一些数据,分别是成对出现的花布条和小饰条,其布条都是用可见ASCII字符表示的,可见的ASCII字符有多少个,布条的花纹也有多少种花样。花纹条和小饰条不会超过1000个字符长。如果遇见#字符,则不再进行工作。
Output
输出能从花纹布中剪出的最多小饰条个数,如果一块都没有,那就老老实实输出0,每个结果之间应换行。
Sample Input
abcde a3
aaaaaa aa
#
Sample Output
0
3
这个题的思路很明显,就是要找出目标串中的“花纹字符串”,虽然暴力能过,但还举荐用kmp
#include<iostream>
#include<cstdio>
#include<algorithm>
#include<cstring>
#include<set>
#include<map>
#include<cmath>using namespace std;
void getnex(char *s,int nex[])
{int j=-1;int len=strlen(s);nex[0]=-1;int i;for(i=1;i<len;i++){while(j>-1&&s[j+1]!=s[i])j=nex[j];if(s[j+1]==s[i])j++;nex[i]=j;}
}
int kmp(char *ss,char *s,int nex[])
{getnex(s,nex);int i,j,lenss,lens;lenss=strlen(ss);lens=strlen(s);j=-1;int ans=0;for(i=0;i<lenss;i++){while(j>-1&&s[j+1]!=ss[i])j=nex[j];if(s[j+1]==ss[i])j++;if(j==lens-1){j=-1;ans++;}}return ans;
}
int main()
{char ss[1005],s[1005];while(~scanf("%s",ss)){if(strcmp(ss,"#")==0) break;scanf("%s",s);int nex[1005];int ans=kmp(ss,s,nex);printf("%d\n",ans);}return 0;
}
循环节问题
什么是循环节?
—->baidu
利用KMP算法中的next值可以求出字符串的循环节,如ababab的循环节为ab,abcd的循环节为abcd,具体做法如下:假设字符串的长度为len,next[len]为字符串的最后一个字符的下一个字符的next值(下标从0开始),如果len % (len - next[len]) == 0,那么循环节的循环次数为len / (len - next[len]),否则为1
循环节问题
Power Strings (POJ2046)
Given two strings a and b we define a*b to be their concatenation. For example, if a = “abc” and b = “def” then a*b = “abcdef”. If we think of concatenation as multiplication, exponentiation by a non-negative integer is defined in the normal way: a^0 = “” (the empty string) and a^(n+1) = a*(a^n).
Input
Each test case is a line of input representing s, a string of printable characters. The length of s will be at least 1 and will not exceed 1 million characters. A line containing a period follows the last test case.
Output
For each s you should print the largest n such that s = a^n for some string a.
Sample Input
abcd
aaaa
ababab
.
Sample Output
1
4
3
Hint
This problem has huge input, use scanf instead of cin to avoid time limit exceed.
#include<iostream>
#include<cstdio>
#include<algorithm>
#include<cstring>
#include<set>
#include<map>
#include<cmath>using namespace std;
char s[1000000+100];
int nex[1000000+100];
void get_nex(int nex[],char *s)
{int len=strlen(s);int i=0,j=-1;nex[0]=-1;while(i!=len){if(j==-1||s[i]==s[j])nex[++i]=++j;else j=nex[j];}
}
int main()
{int T,i;while(scanf("%s",s)){if(strcmp(s,".")==0) break;int len=strlen(s);get_nex(nex,s);int t=len-nex[len];if(len%t==0&&nex[len]!=0)printf("%d\n",len/t);else printf("%d\n",1);}return 0;
}
还有循环节的扩展问题
Cyclic Nacklace(HDU3746)
CC always becomes very depressed at the end of this month, he has checked his credit card yesterday, without any surprise, there are only 99.9 yuan left. he is too distressed and thinking about how to tide over the last days. Being inspired by the entrepreneurial spirit of “HDU CakeMan”, he wants to sell some little things to make money. Of course, this is not an easy task.
As Christmas is around the corner, Boys are busy in choosing christmas presents to send to their girlfriends. It is believed that chain bracelet is a good choice. However, Things are not always so simple, as is known to everyone, girl’s fond of the colorful decoration to make bracelet appears vivid and lively, meanwhile they want to display their mature side as college students. after CC understands the girls demands, he intends to sell the chain bracelet called CharmBracelet. The CharmBracelet is made up with colorful pearls to show girls’ lively, and the most important thing is that it must be connected by a cyclic chain which means the color of pearls are cyclic connected from the left to right. And the cyclic count must be more than one. If you connect the leftmost pearl and the rightmost pearl of such chain, you can make a CharmBracelet. Just like the pictrue below, this CharmBracelet’s cycle is 9 and its cyclic count is 2:
Now CC has brought in some ordinary bracelet chains, he wants to buy minimum number of pearls to make CharmBracelets so that he can save more money. but when remaking the bracelet, he can only add color pearls to the left end and right end of the chain, that is to say, adding to the middle is forbidden.
CC is satisfied with his ideas and ask you for help.
Input
The first line of the input is a single integer T ( 0 < T <= 100 ) which means the number of test cases.
Each test case contains only one line describe the original ordinary chain to be remade. Each character in the string stands for one pearl and there are 26 kinds of pearls being described by ‘a’ ~’z’ characters. The length of the string Len: ( 3 <= Len <= 100000 ).
Output
For each case, you are required to output the minimum count of pearls added to make a CharmBracelet.
Sample Input
3
aaa
abca
abcde
Sample Output
0
2
5
*这个题就是要求给出字符串,,问还差几位,能形成循环节
所以,直接上代码*
#include<iostream>
#include<stdio.h>
#include<cstring>
using namespace std;
char a[100001];
int nexts[100001];
int main()
{int T,len,i,j,nima;while(scanf("%d",&T)!=EOF){getchar();while(T--){scanf("%s",a+1);len=strlen(a+1);i=1;j=0;nexts[1]=0;while(i<=len){if(j==0||a[i]==a[j]){i++;j++;nexts[i]=j;}elsej=nexts[j];}nima=(len+1)-nexts[len+1];if(len%nima==0&&len!=nima)//这里要注意个是循环节的长度不能等于它自己本身就是len!=nimaprintf("%d\n",0);elseprintf("%d\n",nima-len%nima);}}return 0;
}
//别在意nima这个变量名
这次kmp先写到这,讲真的真不好理解。。。。。。可能我太菜 了吧
发布评论