Parsing Tables With Beautiful Soup
Just a quick snippet, since it is obvious after writing it but was not obvious while searching for it:
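from bs4 import BeautifulSoup

with open("whatever.html") as html:
    soup = BeautifulSoup(html, "html.parser")
t = soup.find(id=label)  # label holds the id attribute of the target table
dat = [[str(td) for td in row.find_all("td")] for row in t.find_all("tr")]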
… or map a different function if you need to further parse the individual table cells. At any rate, with Beautiful Soup, many things become trivial; it really is an amazing library.
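As a rough sketch of what mapping a different function could look like: the helper names parse_table and cell_text, the sample markup, and the id "prices" below are all made up for illustration and are not part of Beautiful Soup. Only find, find_all, and get_text are library calls.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document; the id "prices" is invented for this sketch.
SAMPLE_HTML = """
<table id="prices">
  <tr><td> apples </td><td>3</td></tr>
  <tr><td> pears </td><td>5</td></tr>
</table>
"""


def cell_text(cell):
    """Return the visible text of one <td>, with surrounding whitespace removed."""
    return cell.get_text(strip=True)


def parse_table(html, table_id, parse_cell=str):
    """Return the table's rows as lists, applying parse_cell to each <td>."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find(id=table_id)
    return [
        [parse_cell(cell) for cell in row.find_all("td")]
        for row in table.find_all("tr")
    ]


rows = parse_table(SAMPLE_HTML, "prices", parse_cell=cell_text)
print(rows)  # [['apples', '3'], ['pears', '5']]
```

Leaving parse_cell at its default of str gives the same raw "<td>…</td>" strings as the snippet above, so the only thing that changes from one use to the next is the function you pass in.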
Thanks! That was very helpful :D
Thanks, this is pretty much what I needed. I tweaked it a little to get the contents of the columns (no td tags):
rows = [[c.string for c in row.find_all("td")] for row in t.find_all("tr")]
It works seamlessly, but it doesn’t work if the person has inserted other tags inside the cells.
E.g. …
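One possible way around cells that contain nested tags, sketched under the assumption that you only want the text: .string is None whenever a tag has more than one child, while get_text() joins all the text underneath it. The sample markup and the id "people" are invented for this example.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second cell wraps part of its text in <b>.
SAMPLE_HTML = """
<table id="people">
  <tr><td>Ada</td><td>born <b>1815</b></td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
t = soup.find(id="people")

# .string is None for the second cell, because it has two child nodes
# (the text "born " and the <b> tag).
with_string = [[c.string for c in row.find_all("td")] for row in t.find_all("tr")]

# get_text() concatenates every text node under the cell, nested tags included.
with_get_text = [[c.get_text() for c in row.find_all("td")] for row in t.find_all("tr")]

print(with_string)
print(with_get_text)  # [['Ada', 'born 1815']]
```

If the nested markup itself matters (links, for instance), this throws it away, so you would want to map a function that inspects the child tags instead.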