martix
Joined: 20 Apr 2007 Posts: 17
Posted: Tue Apr 29, 2008 12:03 am Post subject: Problem matching multiple words
|
How do I match multiple instances of a regex?
For example:
Quote: | 07 This_Weekends__Vacation 02.03.2008
03 My.Summer.Camp 210 pic
12 The Villa 12 pics
... |
How do I make it match all of the words and replace everything between them with a single space " "?
Something like:
Code: | (([^a-zA-Z]*)([a-zA-Z]+))+?.*? |
Right now this matches only the first word.
Basically, only the words should remain, nothing else:
Quote: | This Weekends Vacation
My Summer Camp
The Villa |
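For comparison, here is a minimal sketch in plain Python (the standard `re` module, not PFrank) of the behavior being asked for: instead of one match with a repeated group, it asks the engine for every non-overlapping match and joins the results. The name `words_only` is hypothetical, and a "word" is assumed to be a run of ASCII letters.

```python
import re

# Assumption: a "word" is a run of ASCII letters; everything else is a separator.
WORD = re.compile(r"[a-zA-Z]+")


def words_only(name: str) -> str:
    """Return every letter run in `name`, joined by single spaces."""
    # findall returns all non-overlapping matches, not just the first or last one.
    return " ".join(WORD.findall(name))


for n in ["07 This_Weekends__Vacation 02.03.2008",
          "03 My.Summer.Camp 210 pic",
          "12 The Villa 12 pics"]:
    print(words_only(n))
```

Because "pic" and "pics" are also runs of letters, this keeps them ("My Summer Camp pic", "The Villa pics"); dropping them needs an extra rule.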
admin Site Admin
Joined: 09 Mar 2007 Posts: 448 Location: Canada
Posted: Thu May 01, 2008 7:07 am
|
In principle you can take a regex, surround it in parentheses, and then add a repeat specifier.
Your example could become:
((([^a-zA-Z]*)([a-zA-Z]+))+?.*?)+
or
((([^a-zA-Z]*)([a-zA-Z]+))+?.*?){1,3}
If that doesn't solve the problem then I need more information.
For instance, what replace specifier are you using?
Another approach might be to eliminate the numbers and the word "pic" (or "pics"), as follows:
Either cut and paste the search/replace row contents below (without the single quotes), or use the CTL-F2 trick: copy everything below with the mouse, select the PFrank window with the mouse, and press CTL-F2.
Row: 1
Search: '(?E)[_ .]*[0-9][_.]*'
Replace:
Row: 2
Search: '(?E)[_.]'
Replace: ' '
Row: 3
Search: '(?E)[ ]+pic[s]*([ ]|$)'
Replace: ' '
Row: 4
Search:
Replace: '*Delete All Extra Whitespace in*Prefix*'
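The rows above can be approximated in plain Python as a chain of `re.sub` calls applied in order. This is a minimal sketch, not PFrank's implementation: the `(?E)` prefix is assumed to be PFrank-specific (it is not a standard Python flag) and is left out, the function name `apply_rows` is hypothetical, and Row 4's special replace is assumed to behave like collapsing runs of whitespace and trimming the ends.

```python
import re

# Hypothetical translation of Rows 1-3; "(?E)" is assumed PFrank-specific and dropped.
ROWS = [
    (r"[_ .]*[0-9][_.]*", ""),    # Row 1: delete digits plus adjacent _ . separators
    (r"[_.]", " "),               # Row 2: underscores and dots become spaces
    (r"[ ]+pic[s]*([ ]|$)", " "), # Row 3: drop " pic"/" pics" before a space or the end
]


def apply_rows(name: str) -> str:
    for pattern, replacement in ROWS:
        name = re.sub(pattern, replacement, name)
    # Row 4 stand-in (assumption): collapse whitespace runs and trim both ends,
    # approximating PFrank's "*Delete All Extra Whitespace in*Prefix*" replace.
    return " ".join(name.split())


for n in ["07 This_Weekends__Vacation 02.03.2008",
          "03 My.Summer.Camp 210 pic",
          "12 The Villa 12 pics"]:
    print(apply_rows(n))
```

On the three sample names this yields "This Weekends Vacation", "My Summer Camp" and "The Villa".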
Cheers.
Peter. |
martix
Joined: 20 Apr 2007 Posts: 17
Posted: Tue May 06, 2008 9:51 pm
|
Well...
It's an interesting problem for seeing just how powerful the Python engine really is. That's kind of the main point. There aren't many items (22), so I could rename them by hand in less than 5 minutes, but I wanted to see whether it could tackle such a task with a regex. I had nothing better to do.
But
((([^a-zA-Z]*)([a-zA-Z]+))+?.*?)+
or
((([^a-zA-Z]*)([a-zA-Z]+))+?.*?){1,3}
Both give only the last matched string; {1,#} gives only the #-th word.
And I have no idea what (?E) is supposed to mean.
Leave the specific example out of it. The general problem here is matching multiple repetitions of a regex, not just the last occurrence, which is what happens now.
I guess it's because the group has a single reference, which stores only the last of the repetitions.
Wrapping it in another set of ()'s and using the new group's reference doesn't change anything either. |
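This matches how Python's standard `re` module treats a repeated capturing group: one match can span every repetition, but the group only remembers the text from the last one. The sketch below uses plain `re` and a simplified pattern (not the exact one from this thread) to show that, and shows that scanning for successive matches with `re.finditer` recovers every word. Outside the standard library, the third-party `regex` module offers a `captures()` method on match objects that returns all captures of a repeated group.

```python
import re

name = "07 This_Weekends__Vacation 02.03.2008"

# Simplified pattern: optional non-letters, then a captured run of letters,
# with the whole non-capturing group repeated by "+".
repeated = re.match(r"(?:[^a-zA-Z]*([a-zA-Z]+))+", name)
print(repeated.group(0))  # 07 This_Weekends__Vacation  (span of all repetitions)
print(repeated.group(1))  # Vacation  (only the last repetition is kept)

# Scanning for successive matches instead yields one capture per word.
words = [m.group(1) for m in re.finditer(r"[^a-zA-Z]*([a-zA-Z]+)", name)]
print(words)  # ['This', 'Weekends', 'Vacation']
```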
bitmonger
Joined: 03 Oct 2007 Posts: 3
Posted: Sat Jun 07, 2008 11:41 pm
|
Maybe I don't understand the problem correctly, but from the example given, I think a better approach would be to search for the non-letter groups and replace them with a single space, rather than capturing the letters and putting spaces in between.
I used
(?x)[^a-zA-Z]+
as the search pattern and a single space as the replacement. This replaces every run of non-letters with a single space, and the result is:
This Weekends Vacation
My Summer Camp
The Villa
There will be a single space at the start of the name (if it started with non-letters), but if that is a problem, just add a second row that searches for
(?Ex)^\s+
and replaces it with nothing.
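As a rough cross-check, the two rows can be run through Python's standard `re` module. This is a sketch under assumptions: `(?E)` is not a standard Python flag, so it is presumed to be a PFrank-specific prefix and is left out; `(?x)` is Python's verbose flag and has no effect on this pattern. The function name `letters_only` is hypothetical.

```python
import re


def letters_only(name: str) -> str:
    # First row: every run of non-letters becomes a single space.
    # (?x) is Python's verbose flag; it changes nothing for this pattern.
    name = re.sub(r"(?x)[^a-zA-Z]+", " ", name)
    # Second row: delete leading whitespace (the "(?E)" prefix is assumed
    # PFrank-specific and dropped here).
    return re.sub(r"^\s+", "", name)


for n in ["07 This_Weekends__Vacation 02.03.2008",
          "03 My.Summer.Camp 210 pic",
          "12 The Villa 12 pics"]:
    print(repr(letters_only(n)))
```

Run this way on the sample names, it gives "This Weekends Vacation " (with a trailing space, because that name ends in non-letters), "My Summer Camp pic" and "The Villa pics": "pic" and "pics" are letters, so this approach keeps them. A further row such as `\s+$` replaced with nothing would trim the trailing space.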
Cheers,
bit |