
Various Topics on Python



Python - "How to write this regular expression?" in Programming Languages


05-04-2005   #11
..re.. ..we..

Re: How to write this regular expression?

On Wed, 04 May 2005 20:24:51 +0800, could ildg wrote:

> Thank you.
>
> I just learned how to use re, so I want to find a way to solve it by
> using re. I know that splitting it into pieces will do it quickly.


I'll say this much: you have two problems, splitting out the numbers and
verifying that they conform to some validity rule.

I strongly recommend treating those two problems separately. While I'm not
willing to guarantee that an RE can't be written for something like ("[A
number A]_[A number B]" such that A < B) in the general case, it won't be
anywhere near as clean or as easy to follow as writing an RE to extract the
numbers and then verifying the constraints in conventional Python.

In that case, if you know in advance that the numbers are guaranteed to be
in that format, I'd just use the regular expression "\d+" and the
"findall" method of the compiled expression:

Python 2.3.5 (#1, Mar 3 2005, 17:32:12)
[GCC 3.4.3 (Gentoo Linux 3.4.3, ssp-3.4.3-0, pie-8.7.6.6)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> m = re.compile(r"\d+")
>>> m.findall("344mmm555m1111")
['344', '555', '1111']
>>>
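Carrying the two-step idea through, here is a minimal Python 3 sketch: `findall` extracts the numbers and ordinary Python checks them. The validity rule used here (each number strictly larger than the one before, echoing the "A < B" example above) and the helper name `parse_and_check` are assumptions for illustration, not the original poster's actual rule.

```python
import re

NUMBER_RE = re.compile(r"\d+")

def parse_and_check(s):
    """Hypothetical helper: return the numbers in s as ints if they are
    strictly increasing (an assumed rule), otherwise None."""
    numbers = [int(n) for n in NUMBER_RE.findall(s)]
    # The validity rule lives in plain Python, not in the regex.
    if numbers and all(a < b for a, b in zip(numbers, numbers[1:])):
        return numbers
    return None

print(parse_and_check("344mmm555m1111"))  # [344, 555, 1111]
print(parse_and_check("555m344"))         # None
```

Changing the rule then means editing one line of Python rather than reworking the pattern.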


If you're checking the string's overall format against the parameters
you've given, I'd feel no shame in checking it against r"^(_\d+){1,3}$"
with .match and then using the above to get the numbers, if you prefer
that. (Note that .match already anchors at the start of the string, so the
initial ^ is implied, but I tend to write it anyway as a good habit.
Explicit is better than implicit and all that.)
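A minimal Python 3 sketch of that "check the shape, then extract" approach, using the pattern above for the check and "\d+" for the extraction. The function name `extract_numbers` is hypothetical. One caveat worth knowing: `$` also matches just before a trailing newline, so "_1\n" passes this check; in Python 3.4 and later, `SHAPE_RE.fullmatch(s)` is a stricter alternative.

```python
import re

# Shape check: one to three "_<digits>" chunks and nothing else.
SHAPE_RE = re.compile(r"^(_\d+){1,3}$")
NUMBER_RE = re.compile(r"\d+")

def extract_numbers(s):
    """Return the numbers as ints if s has the expected shape, else None."""
    if not SHAPE_RE.match(s):
        return None
    return [int(n) for n in NUMBER_RE.findall(s)]

print(extract_numbers("_2_544_44000000"))  # [2, 544, 44000000]
print(extract_numbers("_1_2_3_4"))         # None: four chunks
print(extract_numbers("x_1"))              # None: must start with "_"
```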

(I just tried to capture the three numbers by putting a set of parentheses
around the \d+, but it only gives me the first. I've never tried that
before; is there a way to get it to give me all of them? I don't think so,
so two REs may be required after all.)
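On the question in that parenthetical: when a capturing group sits inside a repetition such as `{1,3}`, Python's `re` keeps only the text from the last iteration, so a single match cannot hand back all three numbers. A small Python 3 sketch contrasting that with `findall` on a pattern that has exactly one group (the sample string is made up for illustration):

```python
import re

s = "_12_34_56"

# A capturing group inside a repetition keeps only its last iteration.
m = re.match(r"^(_(\d+)){1,3}$", s)
print(m.groups())                # ('_56', '56')

# findall with one group returns that group's text for every match,
# so it recovers all the numbers in one call.
print(re.findall(r"_(\d+)", s))  # ['12', '34', '56']
```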
 
05-05-2005   #12
..u.. ....

Re: How to write this regular expression?

On 5/5/05, Jeremy Bowers <jerf@jerf.org> wrote:
> (I just tried to capture the three numbers by adding a parentheses set
> around the \d+ but it only gives me the first. I've never tried that
> before; is there a way to get it to give me all of them? I don't think so,
> so two REs may be required after all.)

You can capture each number with a group, and each group can have a name.


 
05-05-2005   #13
..re.. ..we..

Re: How to write this regular expression?

On Thu, 05 May 2005 09:30:21 +0800, could ildg wrote:
> Jeremy Bowers wrote:
>> Python 2.3.5 (#1, Mar 3 2005, 17:32:12)
>> [GCC 3.4.3 (Gentoo Linux 3.4.3, ssp-3.4.3-0, pie-8.7.6.6)] on linux2
>> Type "help", "copyright", "credits" or "license" for more information.
>> >>> import re
>> >>> m = re.compile("\d+")
>> >>> m.findall("344mmm555m1111")
>> ['344', '555', '1111']
>>
>> (I just tried to capture the three numbers by adding a parentheses set
>> around the \d+ but it only gives me the first. I've never tried that
>> before; is there a way to get it to give me all of them? I don't think
>> so, so two REs may be required after all.)


> You can capture each number with a group, and each group can have a name.


I think you missed what I meant:

Python 2.3.5 (#1, Mar 3 2005, 17:32:12)
[GCC 3.4.3 (Gentoo Linux 3.4.3, ssp-3.4.3-0, pie-8.7.6.6)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> m = re.compile(r"((?P<name>\d+)_){1,3}")
>>> match = m.match("12_34_56_")
>>> match.groups("name")
('56_', '56')
>>>
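One detail in the transcript above: `Match.groups()` takes a default value for groups that did not participate in the match, not a group name, so `groups("name")` simply returns every group. A minimal Python 3 sketch of the difference, reusing the same pattern; either way, only the last repetition (56) is kept.

```python
import re

m = re.compile(r"((?P<name>\d+)_){1,3}")
match = m.match("12_34_56_")

# groups() takes a *default* for non-participating groups, not a name,
# so "name" is unused here and every group is returned.
print(match.groups("name"))  # ('56_', '56')

# To fetch one named group, use group() or groupdict().
print(match.group("name"))   # 56
print(match.groupdict())     # {'name': '56'}
```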


Can you also get 12 & 34 out of it? (Interesting, as the non-named groups
give you the *first* match....)

I guess I've never wanted this because I usually end up using "findall"
instead, but I can still see it being useful: when parsing a function
call, for instance, getting a tuple of the arguments directly, instead of
one string to be broken up later.
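Since one match can't return a variable number of captures, the function-call case can be handled with the same two-step approach: one regex isolates the argument list, then `findall` splits it into a tuple. The call syntax below (no nested calls, no commas inside arguments) and the name `parse_call` are assumptions for illustration; real Python source is better handled with the standard `ast` module.

```python
import re

# Hypothetical, deliberately simple call syntax: name(arg, arg, ...)
# with no nested calls and no commas inside arguments.
CALL_RE = re.compile(r"^\s*(\w+)\s*\((.*)\)\s*$")
ARG_RE = re.compile(r"[^,\s][^,]*")

def parse_call(text):
    """Return (name, args_tuple), or None if text doesn't look like a call."""
    m = CALL_RE.match(text)
    if not m:
        return None
    name, arglist = m.groups()
    # Step two: pull each argument out of the captured argument list.
    args = tuple(a.strip() for a in ARG_RE.findall(arglist))
    return name, args

print(parse_call("f(1, x, 'y')"))  # ('f', ('1', 'x', "'y'"))
print(parse_call("g()"))           # ('g', ())
```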
 
05-05-2005   #14
..u.. ....

Re: How to write this regular expression?

Sorry, Jeremy, I just sent my email directly to your mailbox.

Groups are very useful.
On 5/5/05, Jeremy Bowers <jerf@jerf.org> wrote:
> I think you missed out on what I meant:
>
> Python 2.3.5 (#1, Mar 3 2005, 17:32:12)
> [GCC 3.4.3 (Gentoo Linux 3.4.3, ssp-3.4.3-0, pie-8.7.6.6)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import re
> >>> m = re.compile(r"((?P<name>\d+)_){1,3}")
> >>> match = m.match("12_34_56_")
> >>> match.groups("name")
> ('56_', '56')
> >>>
>
> Can you also get 12 & 34 out of it? (Interesting, as the non-named groups
> give you the *first* match....)

Yes, you can extract **anything** you want. Getting each number is easy:
all you need to do is give each number its own name.

import re
s = r"_2_544_44000000"
r = re.compile(r'^(?P<slice1>_(?P<number1>[1-3]?\d))'
               r'(?P<slice2>_(?P<number2>(3[2-9])|([4-9]\d)|(\d{3,})))?'
               r'(?P<slice3>_(?P<number3>(3[2-9])|([4-9]\d)|(\d{3,})))?$', re.VERBOSE)
mo = r.match(s)
if mo:
    print(mo.groupdict())
else:
    print("doesn't match")

The code above gives the following result:
{'slice1': '_2', 'number1': '2', 'slice2': '_544', 'number2': '544',
'slice3': '_44000000', 'number3': '44000000'}
>
> I guess I've never wanted this because I usually end up using "findall"
> instead, but I could still see this being useful... parsing a function
> call, for instance, and getting a tuple of the arguments instead of all of
> them at once to be broken up later could be useful.
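The named-group pattern in ildg's reply passes `re.VERBOSE` without making use of it. With that flag, whitespace in the pattern is ignored and `#` starts a comment, so the same pattern can be laid out one piece per line. A Python 3 sketch, with the inner unnamed alternation groups dropped (they do not change what matches) and the captured numbers converted to `int`; the comments describe what each piece accepts as read from the pattern, not a rule stated in the thread.

```python
import re

# The same pattern laid out with re.VERBOSE: whitespace is ignored
# and "#" starts a comment.
PATTERN = re.compile(r"""
    ^
    (?P<slice1>_(?P<number1>[1-3]?\d))                # first: 0-39
    (?P<slice2>_(?P<number2>3[2-9]|[4-9]\d|\d{3,}))?  # 32-99, or any run of 3+ digits
    (?P<slice3>_(?P<number3>3[2-9]|[4-9]\d|\d{3,}))?  # same rule as the second
    $
""", re.VERBOSE)

mo = PATTERN.match("_2_544_44000000")
if mo:
    # Keep only the numberN groups that matched, converted to int.
    numbers = {k: int(v) for k, v in mo.groupdict().items()
               if k.startswith("number") and v is not None}
    print(numbers)  # {'number1': 2, 'number2': 544, 'number3': 44000000}
else:
    print("doesn't match")
```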

 
05-05-2005   #15
.. ..

Re: How to write this regular expression?

Peter Hansen wrote:
> could ildg wrote:
>
>> I need a regular expression to check if a string matches it.

>
>
> Why do you think you need a regular expression?
>
> If another approach that involved no regular expressions worked much
> better, would you reject it for some reason?
>
> -Peter


A regular expression will work fine for his problem.
Just match the digits separated by underscores using a regular
expression, then afterward check if the values are valid.
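For comparison with Peter's question, the "split it into pieces" route that the original poster mentioned needs no regex at all. A minimal Python 3 sketch, assuming the same shape as Jeremy's r"^(_\d+){1,3}$" (one to three `_`-prefixed runs of digits); `split_numbers` is a hypothetical name.

```python
def split_numbers(s):
    """Regex-free sketch: expect 1 to 3 '_'-prefixed runs of digits.
    Returns a list of ints, or None if the shape is wrong."""
    if not s.startswith("_"):
        return None
    parts = s[1:].split("_")
    # Each part must be a non-empty run of decimal digits.
    if not 1 <= len(parts) <= 3 or not all(p.isdecimal() for p in parts):
        return None
    return [int(p) for p in parts]

print(split_numbers("_2_544_44000000"))  # [2, 544, 44000000]
print(split_numbers("_1__2"))            # None: empty chunk
```

Any value checks would then follow on the returned list, exactly as in the regex version.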
 
05-05-2005   #16
..edr.. ..n..

Re: How to write this regular expression?

"D H" <a@b.c> wrote:

> > Why do you think you need a regular expression?
> >
> > If another approach that involved no regular expressions worked much
> > better, would you reject it for some reason?

>
> A regular expression will work fine for his problem.
> Just match the digits separated by underscores using a regular
> expression, then afterward check if the values are valid.


you forgot to mention Boo here, Doug. nice IronPython announcement,
btw. the Boo developers must be so proud of you.

</F>



 
05-09-2005   #17
.. ..

Fredrik Lundh

Fredrik Lundh wrote:
> "D H" <a@b.c> wrote:
>
>
>>>Why do you think you need a regular expression?
>>>
>>>If another approach that involved no regular expressions worked much
>>>better, would you reject it for some reason?

>>
>>A regular expression will work fine for his problem.
>>Just match the digits separated by underscores using a regular
>>expression, then afterward check if the values are valid.

>
>
> you forgot to mention Boo here, Doug. nice IronPython announcement,
> btw. the Boo developers must be so proud of you.
>
> </F>


You never learn, do you, Fredrik? I guess that explains why Boo will
never be mentioned on the Python daily site your Pythonware business
controls.

Here are some of Fredrik's funnier crazy rants right here:
http://www.oreillynet.com/pub/wlg/6291

Anything you perceive as competition and a threat to your consulting
business really draws out your true nature.
 
05-09-2005   #18
..be.. ....

Re: Fredrik Lundh

D H wrote:
> Fredrik Lundh wrote:


>>you forgot to mention Boo here, Doug. nice IronPython announcement,
>>btw. the Boo developers must be so proud of you.
>>
>></F>

>
> You never learn, do you Fredrik. I guess that explains why Boo will
> never be mentioned on the python daily site your pythonware business
> controls.


It's called Daily Python-URL not Daily Python-Like-Languages-URL. *That*
explains it. It's not like Pythonware is hiding its relationship.

> Here are some of Fredrik's funnier crazy rants right here:
> http://www.oreillynet.com/pub/wlg/6291


Funny you should mention that article, since I showed that Fredrik's
benchmarks were correctly done (if not diligently reported) while Uche's
were wrong on both counts.

http://www.oreillynet.com/cs/user/view/cs_msg/51158

> Any that you perceive as competition and threatening to your consulting
> business really draws out your true nature.


Oy, my head hurts. Take it off-list, both of you. The rest of us don't
care about your bickering.

--
Robert Kern
rkern@ucsd.edu

"In the fields of hell where the grass grows high
Are the graves of dreams allowed to die."
-- Richard Harter

 
