Brool brool (n.) : a low roar; a deep murmur or humming

Parsing Tables With Beautiful Soup


Just a quick snippet, since it is obvious after writing it but was not obvious while searching for it:

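from bs4 import BeautifulSoup  # Beautiful Soup 4 on Python 3

label = "mytable"  # hypothetical id of the table you want
with open("whatever.html") as html:
    soup = BeautifulSoup(html, "html.parser")
t = soup.find(id=label)
# one list per row; each cell is kept as a string, td tags included
dat = [list(map(str, row.find_all("td"))) for row in t.find_all("tr")]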

… or map a different function if you need to further parse the individual table contents. At any rate, with Beautiful Soup, many things become trivial; it really is an amazing library.
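
As a rough sketch of what "a different function" might look like: the version below pulls the plain text out of each cell instead of keeping the td markup. It assumes Beautiful Soup 4 and Python 3; the sample HTML, the "scores" id, and the helper names cell_text and parse_table are all made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document; the id "scores" is invented for this example.
HTML = """
<table id="scores">
  <tr><th>Name</th><th>Score</th></tr>
  <tr><td>Ann</td><td> 12 </td></tr>
  <tr><td><b>Bob</b> Jr.</td><td>7</td></tr>
</table>
"""

def cell_text(cell):
    # get_text() collects the text of the cell and any tags nested inside it;
    # strip=True trims each piece, and " " is the separator used to join them.
    return cell.get_text(" ", strip=True)

def parse_table(html, table_id):
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find(id=table_id)
    if table is None:  # no element with that id
        return []
    rows = []
    for row in table.find_all("tr"):
        cells = row.find_all("td")
        if cells:  # skip rows that contain only th header cells
            rows.append([cell_text(c) for c in cells])
    return rows

rows = parse_table(HTML, "scores")
print(rows)
```

Using get_text() rather than .string is a deliberate choice here: .string is None whenever a cell holds more than one child, such as a bold tag followed by plain text, while get_text() still returns the combined text.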


Comments are moderated whenever I remember that I have a blog.

tim | 2015-07-12 19:13:01
There are a few really rare cases where you might parse HTML with a regex (say, machine-generated HTML that you need to parse into a database, you'll only run it once, and you just need to get the data imported already), but otherwise... no, just don't do it. The canonical reference on this topic is this answer on StackOverflow.
David Underhill | 2010-12-18 05:05:16
Thanks, this is pretty much what I needed. I tweaked it a little to get the contents of the columns (no td tags): rows = [[c.string for c in row.findAll("td")] for row in t.findAll("tr")]
Sankalp Agarwal | 2011-01-06 13:24:42
It works seamlessly but doesn't work if the person has inserted tags in between .. E.g. ... .... . . </b> ..... .....
Minder | 2009-12-08 19:12:22
Thanks! That was very helpful :D
Amos | 2015-06-27 01:08:02
I've tried using Beautiful Soup as well and didn't like the overhead or documentation. But maybe it has gotten better now. Sounds like you have an aversion to using regular expressions for this. I'm curious why? In my mind, the nice thing about them is that they are pretty much cross-language compatible.