floating in space… Ladies and Gentlemen, we are

13 Nov 2010

Top TV schedule in XMLTV format

Played around a bit with the Top TV schedule and created a scraper. Expect bugs.
Get the Top TV schedule in XMLTV format here:
http://bluegray.co.za/xmltv/toptv/

Comments (31) Trackbacks (0)
  1. Thanks for the service. I was just about to write my own scrubber for the TopTV site and found you on the ZAXMLTV.flash forum.
    The individual downloading works, but for some reason downloading the complete list gives me a binary blob?

  2. Thanks for the feedback.
    The file is in .gz format. You can now choose between .zip, .gz or uncompressed.

  3. Found the problem with the GZIP file. It seems to be gzipped twice. Can you confirm this?
    Thanks

  4. Yeah, I had that problem before, but I thought I fixed it. If it continues to be double gzipped, please let me know.

  5. Working now. thanks

  6. Oh yes, one other thing.
    How difficult would it be to parse the Title field and split it at the “:” character into “Title” and “Sub Title” (the sub title being the episode title)?
    This would make scheduling recordings of series work in MythTV. (At the moment each series title is unique, so it is impossible to tell MythTV to record all upcoming titles in one go, if you know what I mean.)

  7. Done ;)
    Please test and report back.

  8. Perfect. Thanks.

    (zip file is not complete, gzip is fine)

  9. The listings don’t seem to be updating past a day or two.

  10. Should be fixed now.

  11. Thanks for the guide,

    but some of the programme times are off.
    Most of them have the start time after the stop time.
    Can you please have a look?

  12. Sure, which programmes and channels are these? Can you check if it’s correct on the TopTV site?

  13. Hi,
    I see the xmltv_all.xml.gz is not updated past 20110315?
    Can you please take a look?
    Thanks

  14. RE: Mostly have the start time after the stop time

    I had a look at the import errors. It’s mainly the last programme on each channel, since there is no way of knowing when it ends. Your script seems to set the stop time to the date on which the script was run.

    eg:
    programme channel="4.toptv.co.za" start="20110330223000 +0200" stop="20110317020000 +0200"

    The start date is 30 March, but
    the stop date is 17 March, which is the day the XML file was created.

  15. guide is blank again.
    Thanks

  16. Yeah, TopTV’s site had a makeover. Will have to wait till I get time to see what changed.

  17. Anyone else also having problems with MythTV and mythfilldatabase?
    The spaces in the channel ids are causing mythfilldatabase to skip them completely?

  18. I don’t use MythTV, but spaces should now be replaced by underscores in channel ids.

  19. ETA for the updated guide after the TopTV site change??

  20. Need any help getting the new schedule parsed?
    What tools do you use for the scraping and packing into XMLTV?
    I contacted TopTV and asked if they won’t just post the schedule in XLS or CSV format. They responded, “We are currently updating our TV Schedule on an ongoing basis. We will provide a solution soon.” So I don’t think they understood the question :)

  21. Yeah, I don’t expect any help from TopTV ;)
    Thanks for offering help, but I’ll have to look into the changes myself.

    I don’t have an ETA, I’ll get to it when I get to it – seeing as they made quite a few changes recently over a few weeks, I think it’s best to let them settle before I update.

  22. Hi, I took this opportunity to learn a bit of XPath parsing in Python using the lxml lib. If you are interested I have a very much alpha parser for the new site that kinda works.

  23. Sure, I’ll have a look at it – send it to bluegraydragon at gmail dot com

  24. Awesome to have the guide back. I’ve picked up some issues though.
    Can you remove the line

    It fails to import with that line; even IE doesn’t format it properly with that line in.

    Also, your DOCTYPE tv SYSTEM reference to the DTD file must be in " " and not ' '.

  25. oops html tags….

    The line with xml-stylesheet

  26. The xml is generated by the python modules and the quotes used are beyond my control. Besides, it’s perfectly valid: http://stackoverflow.com/questions/2045257/are-single-quotes-valid-in-a-doctype

    The xml-stylesheet line is also valid, where are you trying to import it to?

  27. Thank you so much for this service. No one else is providing guide data for TopTV any more. Was getting a bit desperate to find guide data for my media center.

  28. Any thoughts on creating a guide for DSTV? ZAXMLTV (Robert) seems to be hanging up his guns so to speak. I am battling to find another site that I can use for MythTV.

  29. Hi, thanks for the xmltv generator. I recently got TopTV and had to schedule recordings manually; now I can just point and click. :-)
    I use MythTV, but I noticed that the dates for Jan are all for 2011 and not 2012.
    At the moment I use sed to fix it before I import it:
    sed -i 's/="201101/="201201/g' xmltv_toptv_all.xml

  30. @Andrew
    Will see. DSTV is constantly making it difficult to write a scraper script, so don’t hold your breath… it’s not worth the trouble.

    @Andre
    Yup, dates are wrong with the jump to the new year because the TopTV schedule does not include the year with the scheduled time, so I have to guess. But should be ok now. Will see if I can implement a workaround for next year ;)
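
The year problem described in the last comment could be handled along these lines. This is only a sketch, not the scraper's actual code: the function name `guess_year` and the rule "pick the year that puts the date closest to today" are assumptions made for illustration. The rule works as long as the schedule only covers a few weeks around the current date.

```python
from datetime import date


def guess_year(month, day, today):
    """Guess the year for a schedule entry that only gives month and day.

    Hypothetical helper: tries last year, this year and next year, and
    returns the year whose date lies closest to `today`.
    """
    candidates = []
    for year in (today.year - 1, today.year, today.year + 1):
        try:
            candidates.append(date(year, month, day))
        except ValueError:
            # Not a real date in that year, e.g. 29 February in a non-leap year.
            continue
    if not candidates:
        raise ValueError("no valid date for month=%r day=%r" % (month, day))
    closest = min(candidates, key=lambda d: abs((d - today).days))
    return closest.year


# Scraping on 30 December 2011, an entry dated 2 January belongs to 2012.
print(guess_year(1, 2, date(2011, 12, 30)))   # 2012
# Scraping on 1 January 2012, an entry dated 31 December belongs to 2011.
print(guess_year(12, 31, date(2012, 1, 1)))   # 2011
```

Passing `today` in explicitly, rather than calling `date.today()` inside the function, makes the new-year case easy to test without waiting for January.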

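The title split requested in comment 6 can be sketched in a few lines. Again, this is an illustration under assumptions, not the code the feed actually uses: `split_title` is a made-up name, and splitting at the first ":" is just one possible rule. It will cut a series whose own name contains a colon in the wrong place.

```python
def split_title(raw_title):
    """Split "Series: Episode" into (title, sub_title).

    Hypothetical helper: splits at the first ":" only. Returns None as
    the sub title when there is no colon or nothing follows it.
    """
    title, sep, sub_title = raw_title.partition(":")
    if not sep:
        return raw_title.strip(), None
    return title.strip(), (sub_title.strip() or None)


print(split_title("House: Pilot"))          # ('House', 'Pilot')
print(split_title("News"))                  # ('News', None)
print(split_title("CSI: Miami: Lost Son"))  # ('CSI', 'Miami: Lost Son')
```

The last call shows the limitation: a series name that itself contains a colon would need a list of known series names or some other rule.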
