Let’s turn {on|off} the TV

August 1, 2008

I am not a TV addict, but sometimes I like to watch movies on TV. Since I don’t have money to spend on pay-per-view or cable TV, before I turn on my TV and tune in to a channel, I check Folha Ilustrada for good films.

I think Folha Ilustrada is the better source to consult, because it rates every movie as good, bad, not so bad, and so on.

Yesterday I got tired of typing the URL into my Firefox browser to open Folha Ilustrada and look for something interesting. So I wrote a Python script to fetch the information for me. Here it is:

#!/usr/bin/python

import urllib2
import datetime
import re
from textwrap import TextWrapper
from BeautifulSoup import BeautifulSoup

class Films():

 _url          = 'http://www1.folha.uol.com.br/folha/ilustrada/filmes/'

 _today        = datetime.date.today().strftime('%A')

 _days_of_week = { 'Monday':'segunda',
 'Tuesday':'terca',
 'Wednesday':'quarta',
 'Thursday':'quinta',
 'Friday':'sexta',
 'Saturday':'sabado',
 'Sunday':'domingo'
 }

 def __init__(self):
     self.view_films()

 def view_films(self):
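     # Match any class name that starts with 'localItem'
     regex = re.compile('localItem')
     # Strip the known tags; [^"]* keeps the class match inside a single tag
     clean_tags = re.compile('<(/|)(div|p|h1|h3|b|i)(| class="[^"]*")>')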

     text_wrapper = TextWrapper()
     text_wrapper.width = 72

     page = self._url + self._days_of_week[self._today] + '.shtml'
     resp = urllib2.urlopen(page)
     html = resp.read()
     resp.close()

     for i in BeautifulSoup(html).findAll('div'):
         # Skip divs that have no class attribute or a different class
         if not re.match(regex, i.get('class', '')):
             continue
         # Decode before wrapping so the width counts characters, not bytes
         text = re.sub(clean_tags, '', str(i)).decode('utf8')
         for paragraph in text_wrapper.wrap(text):
             print paragraph
         print '\n'

if __name__ == "__main__":
    Films()
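
The two steps that do the real work in the script, picking the page for the day and cleaning the HTML, can be tried on their own without touching the network. The sketch below is only an illustration, not part of the script above: the names page_url, strip_and_wrap, PAGE_NAMES and CLEAN_TAGS are made up for this example, the sample HTML is invented, and it uses only the standard library. It picks the page with date.weekday() instead of strftime('%A'), on the assumption that you want the lookup to keep working even if the locale is changed and the day names are no longer in English.

```python
import datetime
import re
from textwrap import TextWrapper

BASE_URL = 'http://www1.folha.uol.com.br/folha/ilustrada/filmes/'

# Indexed by date.weekday(): Monday is 0, Sunday is 6.
PAGE_NAMES = ['segunda', 'terca', 'quarta', 'quinta',
              'sexta', 'sabado', 'domingo']

# Same idea as clean_tags in the script: drop a fixed set of tags,
# with or without a class attribute.
CLEAN_TAGS = re.compile(r'</?(?:div|p|h1|h3|b|i)(?: class="[^"]*")?>')


def page_url(day):
    """Return the listing URL for a datetime.date (hypothetical helper)."""
    return BASE_URL + PAGE_NAMES[day.weekday()] + '.shtml'


def strip_and_wrap(fragment, width=72):
    """Remove the known tags from an HTML fragment and wrap the text."""
    text = CLEAN_TAGS.sub('', fragment)
    return TextWrapper(width=width).wrap(text)


if __name__ == '__main__':
    print(page_url(datetime.date(2008, 8, 1)))
    # Invented sample markup, only loosely shaped like the real page.
    sample = ('<div class="localItem"><h3>Some Film</h3> '
              '<p>A <b>good</b> crime film.</p></div>')
    for line in strip_and_wrap(sample):
        print(line)
```

Keeping these two pieces as plain functions makes them easy to check by hand before pointing the script at the real page.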

Bye. ;)