Radio Shows As Podcasts: Part 3b – Scheduling the Python Code

This is the second part of the post on how I daemonised my Podcast XML auto generating Python programme. Part 3a can be found here.

When writing the scan_for_podcasts.py file, I’d already written a check to see whether the current time (i.e. datetime.datetime.now()) was x minutes from the time of the last scan for podcasts. I had initially thought that I would be able to generalise this in some way, in that I would have the script constantly check whether the current time was appropriately later than the last scan time, and then execute the scan when this was the case.

In the end this approach proved wildly inefficient, using far more CPU time than it should have. Fixing it meant learning how to use the threading module, partly by picking SickRage apart a little.

A Test Case and First Attempt

I wrote the following test Python script, which exhibited the desired behaviour (i.e. wrote to the log every 5 seconds).

import logging, time, datetime

# find out when first executed
then = datetime.datetime.now()

# and 5 seconds from then
fiveSec = then + datetime.timedelta(seconds=5)

while True:
       # find out now
       now = datetime.datetime.now()

       # compare now to five seconds from when first executed
       if not (now < fiveSec):
               logging.info("It's now 5 seconds from when the script first ...")
               # reset then to time this if is triggered
               then = datetime.datetime.now()
               # and 5 seconds from then / reset fiveSec
               fiveSec = then + datetime.timedelta(seconds=5)

During my debugging of this, I had included the line print now within the while True loop. A while True loop will execute its body continuously until it is broken out of, which meant I was greeted with a stream of times being printed out about as quickly as my Pi could manage. This prompted me to consider exactly how much CPU this method would take up.

After finding out the process ID with ps -aef | grep "myservice" (where myservice was the name I gave my tester scripts; let’s say the ID was 1234), I could find out the current %CPU with the command top -p 1234. The little bit of code contained in the test Python script was using roughly 99% of the Pi’s CPU: a highly undesirable situation for a process I wanted to run in the background. The obvious cause was that the while True loop was continuously checking what the current time was, where “continuously” meant every microsecond, or perhaps on an even shorter timescale. This was completely unnecessary, especially as the script was designed to be run at most once every hour (or maybe half hour, not decided on that yet!).

A better method would be to find some way of making the script execute itself every x minutes. The above method actually consisted of the script running continuously, hence the high CPU time. A script which executed itself, then waited x minutes, executed itself and then waited again, would be far more efficient. I knew that I could set a timer and have a Python script sleep (i.e. wait) for a set period, but I was unsure how to use this to make a script which would re-execute itself without causing looping issues or writing a programme which would become too self-referential to be understandable.
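The contrast between polling and sleeping can also be measured from inside Python. The following is a minimal sketch, assuming Python 3 (for time.process_time() and time.monotonic()); the function names cpu_used_by and busy_wait are hypothetical and are not part of the podcast scripts. It compares the CPU time consumed by a polling loop with that consumed by a single sleep over the same wall-clock interval.

```python
import time


def cpu_used_by(wait, seconds):
    """Return the CPU time (in seconds) this process spends inside wait(seconds)."""
    start = time.process_time()
    wait(seconds)
    return time.process_time() - start


def busy_wait(seconds):
    """Poll the clock continuously, as the while True loop above does."""
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        pass


busy_cpu = cpu_used_by(busy_wait, 0.5)
sleep_cpu = cpu_used_by(time.sleep, 0.5)
print("busy-wait used %.3f s of CPU, sleep used %.3f s" % (busy_cpu, sleep_cpu))
```

The polling loop should report CPU time close to the full half second, while the sleep should report almost none.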

A quick Google search turned up this Stack Overflow answer, which appeared to be along the right lines. Copying the example code into my test Python script and running the top command again showed me that, even set at a 2 second interval, the script was only using ~0.4% CPU, and only at roughly the moments it was scheduled to run (i.e. it would use 0.0% CPU one second, then ~0.4% CPU the next). This looked like a promising solution.
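The general shape of a sleep-based loop can be sketched as follows. This is an illustration only, not the code from the linked answer, and it assumes Python 3; run_every, job and the iterations cap are hypothetical names (the cap exists purely so that the example terminates).

```python
import time


def run_every(interval, job, iterations):
    """Call job(), then sleep for interval seconds, iterations times over."""
    for _ in range(iterations):
        job()
        time.sleep(interval)  # the process uses almost no CPU while asleep


timestamps = []
run_every(0.1, lambda: timestamps.append(time.monotonic()), iterations=3)
print(["%.2f" % (t - timestamps[0]) for t in timestamps])
```

Between calls the process is simply parked by the operating system, which is why top shows 0.0% CPU for most of the time.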

A problem outlined in the answers, and one I noticed when checking the timestamps in the log, is that this approach does not account for the amount of time the code takes to execute. This causes a “creep” in the times at which the code is executed. For example, if one wants the code to be executed every 5 minutes starting at 1 am (so at 1:00 am, 1:05 am, etc.), and the script takes 1 minute to execute, then after the first run at 1:00 am the script will next execute at 1:06 am, then 1:12 am, etc.
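The arithmetic behind the creep can be checked without any actual waiting. This is a small sketch using plain datetime arithmetic; start_times is a hypothetical helper and the date is arbitrary.

```python
import datetime


def start_times(first_start, interval, run_time, runs):
    """Start times when each run is followed by a fixed sleep of interval."""
    times = []
    start = first_start
    for _ in range(runs):
        times.append(start)
        # the sleep only begins once the run has finished
        start = start + run_time + interval
    return times


first = datetime.datetime(2015, 1, 1, 1, 0)
drifting = start_times(first,
                       interval=datetime.timedelta(minutes=5),
                       run_time=datetime.timedelta(minutes=1),
                       runs=3)
print([t.strftime("%H:%M") for t in drifting])  # ['01:00', '01:06', '01:12']
```

Each run starts one run time later than intended, so the error accumulates rather than staying constant.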

Dealing With Drift: SickRage & Threading

There were a couple of suggestions for how to approach this problem in other answers to the parent Stack Overflow question I linked to above (question linked to here). For example, one answer suggests timing how long the code takes to execute and subtracting this from the desired delay (i.e. 60 seconds, minus 0.1 seconds for how long the code takes to execute, leaves 59.9 seconds to wait), while others suggest making use of either the Twisted or the APScheduler packages.
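The first suggestion, subtracting the execution time from the delay, might look something like the sketch below. It assumes Python 3, and run_every_compensated, job and iterations are hypothetical names chosen for illustration; it is not the code from any of the answers.

```python
import time


def run_every_compensated(interval, job, iterations):
    """Run job() every interval seconds, subtracting its run time from the sleep."""
    starts = []
    for _ in range(iterations):
        started = time.monotonic()
        starts.append(started)
        job()
        elapsed = time.monotonic() - started
        # sleep only for what is left of the interval (never a negative amount)
        time.sleep(max(0.0, interval - elapsed))
    return starts


# a job that takes 0.05 s, scheduled every 0.2 s
starts = run_every_compensated(0.2, lambda: time.sleep(0.05), iterations=3)
gaps = [later - earlier for earlier, later in zip(starts, starts[1:])]
print(["%.2f" % gap for gap in gaps])
```

The gaps between start times should now sit close to 0.2 seconds rather than 0.25. A small residual creep remains, because the bookkeeping around the job also takes time.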

I’d already considered the first approach, and wanted to avoid adding any further dependencies to this application, especially for something which I was fairly confident I would be able to code a standard Python solution for. In general I’ve been inspired by how SickRage and CouchPotato operate in constructing this application, so I decided to try to find out how SickRage schedules its daily scans.

The scheduling of the daily scans appears to be handled by the Scheduler class, which is found in sickbeard/scheduler.py (the def __init__ part of which I’ve included below[1]):

class Scheduler(threading.Thread):
    def __init__(self,
                 action,
                 cycleTime=datetime.timedelta(minutes=10),
                 run_delay=datetime.timedelta(minutes=0),
                 start_time=None,
                 threadName="ScheduledThread",
                 silent=True):
        super(Scheduler, self).__init__()

but also by the dailySearchScheduler object, which is instantiated (defined?) in sickbeard/__init__.py:

update_interval = datetime.timedelta(minutes=DAILYSEARCH_FREQUENCY)
dailySearchScheduler = scheduler.Scheduler(dailysearcher.DailySearcher(),
                                           cycleTime=update_interval,
                                           run_delay=update_interval,
                                           threadName="DAILYSEARCHER")

If I understand the code properly, the dailySearchScheduler instantiation sets the “action” to the DailySearcher() object (a class defined in sickbeard/dailysearcher.py), uses the value given for the daily search in the settings (e.g. 60 minutes) for both the cycle time and the run delay, and names the thread “DAILYSEARCHER”. Importantly, it does not set a value for the start_time variable, meaning any code that is only executed when start_time has a value can be ignored for my purposes.

The class itself checks whether a start time for the search (which is the last run time plus the run delay, minus the cycle time) has been given. If not, it sets that value to now, plus the run delay, minus the cycle time. Presumably for some instantiations of the Scheduler class the run delay and the cycle time are different values; for the daily search they are the same, the “update interval”. There are three functions: timeLeft, which calculates how long we have to wait until the next scan; run, which runs the thread, i.e. the actual search; and one which concerns whether a daily search has been forced (which does not concern me now).

The timeLeft function is fairly straightforward, in that it returns the cycle time (the update interval) minus the time elapsed since the last run (i.e. the current time minus the last run time). This will be a value which counts down from whatever the cycle time is set to, to 0. This calculation is contained within an if statement:

if self.isAlive():
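Leaving the isAlive() check aside, the countdown can be sketched as below. This is not SickRage’s code: time_left is a hypothetical stand-alone function, and it assumes the countdown is the cycle time minus the time elapsed since the last run, which is what a value counting down to 0 implies. The dates are arbitrary.

```python
import datetime


def time_left(cycle_time, last_run, now):
    """Time remaining until the next run: the cycle time minus the time elapsed."""
    return cycle_time - (now - last_run)


cycle = datetime.timedelta(minutes=60)
last_run = datetime.datetime(2015, 1, 1, 12, 0)

print(time_left(cycle, last_run, datetime.datetime(2015, 1, 1, 12, 0)))   # 1:00:00
print(time_left(cycle, last_run, datetime.datetime(2015, 1, 1, 12, 45)))  # 0:15:00
print(time_left(cycle, last_run, datetime.datetime(2015, 1, 1, 13, 0)))   # 0:00:00
```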

The isAlive() method returns True or False depending on whether the thread is alive.[2] What is a thread? It is:[3]

A Thread or a Thread of Execution is defined in computer science as the smallest unit that can be scheduled in an operating system. Threads are normally created by a fork of a computer script or program in two or more parallel (which is implemented on a single processor by multitasking) tasks. Threads are usually contained in processes. More than one thread can exist within the same process. These threads share the memory and the state of the process. In other words: They share the code or instructions and the values of its variables.

By using the threading module, we can treat threads as objects, hence the ability to call the .isAlive() method.
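A short illustration of a thread as an object follows. Note that it uses the is_alive() spelling: newer versions of Python only provide that name, the isAlive() alias having been removed in Python 3.9. The variable names are hypothetical.

```python
import threading
import time

# a thread whose only job is to sleep for half a second
worker = threading.Thread(target=time.sleep, args=(0.5,))

before = worker.is_alive()   # not started yet
worker.start()
during = worker.is_alive()   # started and still sleeping
worker.join()                # block until the thread has finished
after = worker.is_alive()    # finished

print(before, during, after)  # False True False
```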

The significance of SickRage employing an isAlive() check is that it is using threads to schedule the scan, which allows it to use far less CPU time. This is basically what the run function does.

Searching for “scheduling code threading python” allowed me to find this example Python script for event scheduling, which makes use of the thread and threading modules. This example has a similar structure, with a function to handle the timings and a run function. It also contained the line self.setDaemon(True). Reading the documentation on the threading module, I recognised that the thread that schedules the scan can be started as a daemon thread itself, meaning that this thread would run separately to the code, which would then be executed when the thread’s counter ran down.

With this example and what I’d learned from reading through SickRage, I was able to write the two scripts found in my Test Scheduler repository, which printed “test” every 30 seconds. I then modified these slightly to run the (modified, as I briefly explained in the parent post to this post) programme itself. Satisfyingly, this approach results in ~0.3% CPU usage every few seconds.
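The overall pattern, a daemon thread that wakes up once per cycle and runs an action, can be sketched as follows. This is an illustrative sketch assuming Python 3, not the code in the Test Scheduler repository or in SickRage: the class is much simplified, the stop_event attribute and the cycle_time and thread_name parameter names are hypothetical, and the interval is shortened so that the example finishes quickly.

```python
import datetime
import threading
import time


class Scheduler(threading.Thread):
    """Run action() once per cycle_time on a daemon thread (illustrative sketch)."""

    def __init__(self, action, cycle_time, thread_name="ScheduledThread"):
        super(Scheduler, self).__init__(name=thread_name)
        self.daemon = True  # do not keep the interpreter alive for this thread
        self.action = action
        self.cycle_time = cycle_time
        self.stop_event = threading.Event()

    def run(self):
        # wait() idles without burning CPU; it returns True once stop_event is set
        while not self.stop_event.wait(self.cycle_time.total_seconds()):
            self.action()


calls = []
scheduler = Scheduler(lambda: calls.append("test"),
                      cycle_time=datetime.timedelta(seconds=0.1))
scheduler.start()
time.sleep(0.35)            # the main thread is free to do other work here
scheduler.stop_event.set()  # ask the scheduler thread to finish
scheduler.join()
print(calls)
```

Using an Event for the wait, rather than time.sleep(), means the thread can be stopped promptly instead of having to sleep out the rest of its cycle.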

[1] One piece of code I’ve missed out from here is the declaration of the variable self.enable = True in the definition of the __init__ function of the Scheduler class. In modifying this code I had to set this variable to True for the script to work (there is an if self.enable: conditional block). It appears to be a variable set through the Web UI, and the altering of the variable is carried out somewhere in a .mako file (which I’ve yet to find). Mako is a non-XML template library written in Python. It’s (as its homepage declares) used by Reddit. Prima facie, it appears to plug into Python quite nicely; the ability to present the status of variables on a webpage is quite a neat thing.

After figuring out what this variable was, and where it came from, I simply commented out the variable and the conditional block, as I would not be enabling a choice of enabling and disabling the scan in the first instance.

[2] See here for the rather short section on this method in the documentation.

[3] This definition is taken from the Threads in Python lesson from Python-Course.eu. I learnt some of what I know about threads from there, the rest from messing around with Python and reading Stack Overflow etc.
