email - Scan a file using Python for a specific string


This is related to a question about writing an XML document. I'm trying to read an email file (txt or HTML, the format doesn't matter). Does anyone know how to find a specific string (e.g. "build") that is never in the same place twice and has an associated string that is of interest to me? By the way, I'm writing the script in Python. I can provide an example of the type of email I'm referring to and the information I'm trying to use.

My code as it stands:

    with open('daily build email  07012013.txt','r') as x:
        b = 1
        linka = b
        linkm = b
        for line in x:
            print b,' + ',line
            if "link1" in line:
                linka = line
                string.strip (s[link1: ])
                print "link ", linka
            #else:
            #   continue
            if "link2" in line:
                linkb = line
                print "link ", linkm
            else:
                continue
            b += 1
    x.close()

The string strip is meant to make the line contain only the network location for linka and linkm. Because of the leading characters in each line before the `\` in the opened file, I need to remove those characters from the lines that contain links. I also need to write both links to a file (build.xml) so that I can use build.xml to automate the test process every time a new build email arrives. Finally, I need to allow for 2 or more builds per email message (I'm not sure about that yet).

I think the main problem is `string.strip` - I'm assuming you've seen this in the documentation somewhere. The word `string` in `string.strip` is not meant to be there literally; it's meant to be replaced by the name of the string you want to strip. You are telling the string (in this case, `linka`) to strip characters from itself. The argument is treated as a set of characters to remove from both ends of the string, not as a substring to look for. It does not modify the string itself; it returns a new string, which you can put in the same variable or in another one. I think you're after something more like `string.replace(fromstr,tostr)`, in this case `linka=linka.replace("link1: ","")`.
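To make the difference concrete, here is a small sketch comparing the two calls. The sample line and its network path are made up for illustration, since the real email is not shown in the question.

```python
# Hypothetical sample line; the real email's wording is not shown in the question.
line = "link1: \\\\server\\builds\\0701\n"

# strip() treats its argument as a SET of characters and trims both ends.
# It removes the label, but it also eats the newline and the trailing "1",
# because both of those characters are in the set.
stripped = line.strip("link1: \n")

# replace() removes the exact substring and returns a new string.
replaced = line.replace("link1: ", "")

print(repr(stripped))
print(repr(replaced))
print(repr(line))  # the original string is unchanged by either call
```

The damaged path in `stripped` is the reason `replace` (or a regular expression) is the safer choice here.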

Assuming `b` is a line counter, you don't want the `else: continue` either - it skips the rest of the loop body, so `b` is not incremented for any line that doesn't contain "link2".
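The counter problem can be seen on a few made-up lines. This sketch also shows `enumerate`, which is one way (my suggestion, not part of the original code) to get a line counter that no branch can skip.

```python
# Hypothetical sample lines standing in for the email body.
lines = ["hello\n", "link1: \\\\server\\a\n", "thanks\n", "link2: \\\\server\\b\n"]

# Manual counter with "else: continue": the increment is skipped for every
# line that does not contain "link2", so the count is wrong.
b = 1
for line in lines:
    if "link2" in line:
        pass
    else:
        continue
    b += 1
manual_count = b

# enumerate() supplies the counter itself, starting from 1 here.
numbered = [(n, line.strip()) for n, line in enumerate(lines, start=1)]
last_number = numbered[-1][0]

print(manual_count, last_number)
```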

Also, you should read up on regular expressions (regex) - they are perfect for what you're trying to do here. They have a steep learning curve (especially if you try to start from practical examples, which can be hard to read at first), but they are worthwhile.

I realise there is debugging code in there.

I would rewrite the above as follows:

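    import re

    linka = linkb = None  # stay None if the email has no such line
    with open('daily build email  07012013.txt','r') as x:
        for line in x:
            # re.match only matches at the start of the line
            match=re.match(r'link1: (.*)',line)
            if (match):
                linka = match.group(1)

            match=re.match(r'link2: (.*)',line)
            if (match):
                linkb = match.group(1)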

So the main thing that's different (other than stripping the debugging code...) is the use of regular expressions, via the module `re`. The instruction `match=re.match(r'link1: (.*)',line)` is where the magic happens. `link1: (.*)` is the pattern we're looking for. In patterns, letters stand for themselves, so this looks for `link1: ` at the beginning of the string (in this case, the beginning of the line). A single dot `.` can represent any character (except a newline, by default), and a `*` means zero or more of whatever comes before it. So the bit between the brackets says: any number (including 0) of any character. Regular expressions, unless otherwise specified, are "greedy" - they match as much as possible. So this will match up to the end of the line. Because that bit is in brackets, it is assigned to a "group" (more on that later).
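A short sketch of those pattern pieces in action. The sample line is hypothetical, and the `a-b-c` string is only there to show what "greedy" means.

```python
import re

# Hypothetical sample line; the path is made up for illustration.
line = "link1: \\\\server\\builds\\0701\n"

match = re.match(r'link1: (.*)', line)
whole = match.group(0)     # everything the pattern matched
captured = match.group(1)  # only the part inside the brackets

# Greedy vs non-greedy: .* takes as much as it can,
# .*? takes as little as it can while still letting the pattern match.
greedy = re.match(r'(.*)-', "a-b-c").group(1)
lazy = re.match(r'(.*?)-', "a-b-c").group(1)

print(repr(whole), repr(captured), greedy, lazy)
```

Note that neither `whole` nor `captured` includes the trailing newline, because the dot does not match it by default.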

So `re.match` takes the second parameter (`line`) and tries to match the pattern against the start of it. If it finds a match, it returns an object holding information about the match; otherwise it returns `None`.
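Since the question mentions leading characters before the link, it may be worth seeing how `re.match` and `re.search` differ. This is a sketch on invented lines; whether the real email puts anything before the label is an assumption.

```python
import re

pattern = r'link1: (.*)'

# Hypothetical lines: the second has extra text before the label.
at_start = "link1: \\\\server\\a"
with_prefix = "> link1: \\\\server\\a"
unrelated = "nothing to see here"

m1 = re.match(pattern, at_start)       # match object: label is at the start
m2 = re.match(pattern, with_prefix)    # None: match() only looks at the start
m3 = re.search(pattern, with_prefix)   # match object: search() scans the whole string
m4 = re.search(pattern, unrelated)     # None: the label is not there at all

print(m1 is not None, m2 is None, m3.group(1), m4 is None)
```

If the label can appear anywhere in the line, `re.search` is the call to use in place of `re.match`.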

On the next line, `if (match):` - a match object passes this test and `None` fails it, so the code block is run only if there was a match. `group(1)` of the match (i.e. the bit in the first [and in this case only] set of brackets), which is the information after "link1: ", is put in `linka`, and, hey! We're done!

Repeat the same for link2/linkb.

Then, go on to the next line.

Done!
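The question also asks about handling two or more builds per email and writing the links to build.xml. Here is one possible sketch, not a definitive implementation: the `link1`/`link2` labels come from the question, but the function names, the sample lines and the XML element and attribute names (`builds`, `link`, `name`) are all invented for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Labels assumed from the question; the real email may use different ones.
LINK_PATTERN = re.compile(r'(link1|link2): (.*)')

def extract_links(lines):
    """Return a list of (label, location) pairs, in the order found."""
    found = []
    for line in lines:
        match = LINK_PATTERN.search(line)
        if match:
            found.append((match.group(1), match.group(2).strip()))
    return found

def links_to_xml(links):
    """Build a small XML document; element and attribute names are made up."""
    root = ET.Element("builds")
    for label, location in links:
        item = ET.SubElement(root, "link", name=label)
        item.text = location
    return ET.tostring(root, encoding="unicode")

# Hypothetical email body containing two builds.
sample = [
    "Build 1 is ready\n",
    "link1: \\\\server\\builds\\a\n",
    "link2: \\\\server\\builds\\b\n",
    "Build 2 is ready\n",
    "link1: \\\\server\\builds\\c\n",
]

links = extract_links(sample)
xml_text = links_to_xml(links)
print(xml_text)
```

Collecting the matches in a list, instead of overwriting `linka` and `linkb`, is what lets this cope with several builds in one message. To produce the file, `xml_text` could be written to build.xml with an ordinary `open('build.xml', 'w')`.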

