brool

brool \brool\ (n.) : a low roar; a deep murmur or humming

Parsing Tables With Beautiful Soup

Just a quick snippet, since it is obvious after writing it but was not obvious while searching for it:

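from bs4 import BeautifulSoup

with open("whatever.html") as html:
    soup = BeautifulSoup(html, "html.parser")
t = soup.find(id=label)  # label: the id attribute of the table you want
# One list per row, each holding the row's <td> cells as strings (tags included)
dat = [[str(td) for td in row.find_all("td")] for row in t.find_all("tr")]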

… or map a different function if you need to further parse the contents of the individual table cells. At any rate, with Beautiful Soup, many things become trivial; it really is an amazing library.
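For instance, here is a rough sketch of mapping a different function over the cells, assuming the bs4 package (Beautiful Soup 4). The sample HTML, the table id "prices", and the helper names cell_text and parse_table are all made up for illustration. It uses get_text() to flatten each cell, so a cell with tags nested inside it still comes back as plain text:

```python
from bs4 import BeautifulSoup

# Hypothetical sample document; the table id "prices" is invented for this sketch.
HTML = """
<table id="prices">
  <tr><th>Item</th><th>Price</th></tr>
  <tr><td>Widget</td><td><b>4</b>.50</td></tr>
  <tr><td> Gadget </td><td>12</td></tr>
</table>
"""

def cell_text(cell):
    """Return a cell's text with nested tags flattened and whitespace trimmed."""
    return cell.get_text(strip=True)

def parse_table(html, table_id):
    """Return the table's data rows as lists of cell strings."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find(id=table_id)
    rows = []
    for row in table.find_all("tr"):
        cells = row.find_all("td")
        if cells:  # skip header rows, which hold only <th> cells
            rows.append([cell_text(c) for c in cells])
    return rows

rows = parse_table(HTML, "prices")
print(rows)  # [['Widget', '4.50'], ['Gadget', '12']]
```

Swap cell_text for whatever per-cell parsing you need (float(), a date parser, and so on); the row and cell loops stay the same.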

4 Responses to “Parsing Tables With Beautiful Soup”

  1. Minder ()

    Thanks! That was very helpful :D

  2. David Underhill ()

    Thanks, this is pretty much what I needed. I tweaked it a little to get the contents of columns (no td tags):

    rows = [[c.string for c in row.findAll("td")] for row in t.findAll("tr")]

  3. Sankalp Agarwal ()

    It works seamlessly but doesn’t work if the person has inserted tags in between ..

    E.g.


    ….

    .
    .

    …..
    …..

  4. Tweets that mention brool » Parsing Tables With Beautiful Soup -- Topsy.com ()

    [...] This post was mentioned on Twitter by Marcel Caraciolo, quippd Python News. quippd Python News said: RT @onelinetips #python scrape html tables with a few quick lines of BeautifulSoup http://bit.ly/hyN1I4 [...]
