Django, Web Sockets & AJAX

As part of the re-write of the curated Funko Vault, I wanted to implement Twitter-style pagination and instant searching. After looking at how CouchPotato achieves this, I started to read up on WebSockets. In this post I’ll outline the steps I took in attempting to implement instant search via WebSockets for the vaulted Pops!, the problems I faced, and the AJAX solution I eventually settled on.

The theory behind WebSockets is pretty simple. Traditional websites involve a single request from a client to the server, which then serves the page to the client. That’s it: the client receives a static object. Any new information results in a new HTTP request and a new page load. In order to have content that can be pushed to a client without them reloading the page, a persistent connection has to be established between the client and the server. Before WebSockets, this wasn’t really possible, and workarounds had to be used, such as forcing a continuous connection through “long polling”[1]. WebSockets are an attempt to solve this: a protocol standardised alongside HTML5 that establishes a separate (non-HTTP) connection between the client and the server, which remains open until it is closed (or the page is navigated away from).

WebSockets on Django: A Multitude of Issues

My first attempt at getting WebSockets working on Django was to install Channels, which is supposed to be an “easy to understand extension of the Django view model”. Before I could even get to testing that claim, however, I struggled to simply get it to run. Following the “Get Started” guide to install the app and Redis was fairly straightforward, and sending a message through the console, as in the first step, worked. But moving beyond this proved too challenging. After asking for help, I was pointed towards Channels API, but I found no help there.

I decided to abandon Channels (unfortunately only after about a month or two of trying to get it to work) in favour of Tornado, which I knew was capable of providing WebSockets, as this is what CouchPotato uses. Installing Tornado was simple, and using just a basic script (though I eventually used the one provided by Jorge below), I had Tornado and Django running simultaneously. It was a little tricky to get the URLs right in order to open the WebSocket connection, but fairly quickly I was able to send a message over WebSockets (opening a simple alert box) via the console.
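
For anyone attempting the same, a minimal Tornado WebSocket handler looks something like the sketch below. This isn’t the exact script I used, and the URL and port are placeholders; the browser side simply connects with new WebSocket("ws://localhost:8888/ws") and sends messages from the console.

# A minimal Tornado WebSocket echo server -- a sketch, not my actual script.
import tornado.ioloop
import tornado.web
import tornado.websocket


class EchoSocketHandler(tornado.websocket.WebSocketHandler):
    def open(self):
        print("WebSocket opened")

    def on_message(self, message):
        # Send whatever the client sent straight back to it
        self.write_message(u"You said: " + message)

    def on_close(self):
        print("WebSocket closed")


application = tornado.web.Application([
    (r"/ws", EchoSocketHandler),  # placeholder URL
])

if __name__ == "__main__":
    application.listen(8888)  # placeholder port
    tornado.ioloop.IOLoop.instance().start()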

Now that I had the WebSocket connection and could send messages, the challenge was figuring out how to adapt the extant Django installation and apps to work via WebSockets. This is where I ran into my next big problem. I was following this blog post (by Jorge Silva) to get Tornado and Django to work together, which resulted in a message workflow that isolated the WebSocket connection (and hence its messages) from the Django core. That is, the WebSocket messages could not easily be passed on to the Django apps. Unfortunately I could not see a way of solving this issue.

AJAX to the Rescue

At this point, I decided that I would abandon WebSockets and see what I could achieve with the older technology of AJAX. Within about 10 minutes, with the help of this blog post, I had established an AJAX connection and was returning search results as JSON to the console. Within about half an hour, I had a functioning AJAX-based instant search.
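
The server side of that really is just an ordinary Django view that filters on the search term and returns JSON. A rough sketch is below; the Pop model and its name field are stand-ins rather than the actual vault code.

# views.py -- a sketch; the model and field names are stand-ins
import json

from django.http import HttpResponse

from .models import Pop  # hypothetical model with a 'name' field


def instant_search(request):
    term = request.GET.get('q', '')
    results = []
    if term:
        matches = Pop.objects.filter(name__icontains=term)[:20]
        results = [{'id': pop.id, 'name': pop.name} for pop in matches]
    return HttpResponse(json.dumps(results), content_type='application/json')

Hook that up to a URL, call it from jQuery’s $.get() on each keyup, rebuild the results list from the returned JSON, and that’s essentially the whole feature.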

In conclusion, I think the problem I was facing was two-fold. First, Django itself was not designed with WebSockets in mind. Its class- and view-based structure does not easily lend itself to the data flow required by a WebSocket application. Second, most of the WebSockets examples I looked at involved building simple chat rooms, a far more complicated application than the one I was attempting to design. I think that using WebSockets for instant search and Twitter-style pagination is effectively over-engineering a solution.

[1] “Long polling” is one of the ways in which AJAX is leveraged to keep a connection up for a sustained (perhaps indefinite) period of time. See this Stack Overflow answer for a brief explanation.


Dynamic Select Based Date Entry Forms

The code I’m going to talk about can be found in my Dynamic Date Selects repository.

Been a while since I’ve posted anything! One of the reasons is that I’ve spent a fair amount of time over the last few months building and developing my curated Funko Pop! Vault. I developed the backend (simple web-scraping) using Python, and had to re-learn some PHP to get the front end working again (it’s currently using a hacked version of this blog). For the last couple of months, I’ve split my time between learning Django and auditing some Coursera courses on Data Science.

This post is to share a small piece of code I worked on yesterday, which aims to help with the search form for the vault. Here I give users the option to search for when a Pop! might have been vaulted. One of the advantages of Django is that it comes with built-in form validation. However, one of the recommended ways to build a field such as a multi-select date entry field is to build it as a “widget” that extends a class.[1] Unfortunately, because I was building a custom widget, I also needed to build my own date validation.
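
For reference, the general shape of such a widget is below. This is a cut-down sketch along the lines of the MultiWidget example in the documentation, not my actual widget, and the choice ranges are arbitrary.

from django import forms


class DateSelectWidget(forms.MultiWidget):
    """Three select boxes (day, month, year) presented as a single widget."""

    def __init__(self, attrs=None):
        days = [(d, d) for d in range(1, 32)]
        months = [(m, m) for m in range(1, 13)]
        years = [(y, y) for y in range(2000, 2017)]  # arbitrary range for the sketch
        widgets = (
            forms.Select(attrs=attrs, choices=days),
            forms.Select(attrs=attrs, choices=months),
            forms.Select(attrs=attrs, choices=years),
        )
        super(DateSelectWidget, self).__init__(widgets, attrs)

    def decompress(self, value):
        # Split a date back out into [day, month, year] for the three selects
        if value:
            return [value.day, value.month, value.year]
        return [None, None, None]

Because the three selects are independent, nothing stops a user picking something like 31 February, which is where the validation problem comes from.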

I decided that it would be easier for users if I did my best to prevent them from entering an invalid date in the first place, rather than checking the validity of the date and returning them to the search form if they had made a mistake. Having recently taken the Codecademy course on jQuery, I figured this would be the easiest way to implement what I had in mind: dynamically alter the number of days a user could select when they chose a specific month.

The code was, initially, fairly straightforward, until I hit a stumbling block. I had set up a JavaScript object to hold the days of the month, with the number of days as the keys and arrays of months as the values, like so:

var days = {
    31 : [1, 3, 5, 7, 8, 10, 12],
    30 : [4, 6, 9, 11],
    28 : [2]
};

I set up the rest of the code to grab the selected month and pass it into a function that would cycle through this days object: for each number of days (each key of the object), it would cycle through the months in the array and, upon finding the selected month, return the number of days, i.e. the key. Unfortunately, every time I ran this function, the variable I had assigned its result to came back undefined. I’d stumbled upon a problem caused by function closure. In order to cycle through the object and its array values properly, I was having to use jQuery’s built-in $.each() function, which takes a callback as its second argument to separate out the keys and values:

$.each(obj, function(keys, values) {
   ...
});

This meant that when I attempted to return the key once I had found the correct number of days for the selected month, I was only returning it from the child callback to $.each() itself, not to the variable that was calling the outer function. And as this child function was anonymous and lived inside the $.each() call, I had no way of assigning the returned value to a variable in the parent function.

Fortunately I’m not the first person to come across this issue. This great answer on Stack Overflow explains what I found to be the easiest solution: create a variable within the scope of the parent function, which can be accessed by the child $.each() callback. Assign the value you want to return to this variable from inside the callback and, crucially, return false from the callback to break out of the loop set up by $.each().

[1] I started by copying & adjusting the example for this, given in the documentation.


Quick Update: Numpy & Pandas on Raspberry Pi

Just a quick update. I’ve been working on an app to scrape & analyse the prices and availability of records released on Record Store Day from the Discogs marketplace. I’ve also started to write something to analyse Pops that have been vaulted. As part of this, I needed to install NumPy and pandas on the Raspberry Pi.

While this appeared to be easy enough using apt-get install python-pandas, my set-up seemed to be stuck on installing pandas 0.8.0, despite version 0.14.1-2 showing as available in apt-cache show python-pandas. I found that one can install a specific version of a package by appending =<version> to the package name in the apt-get command.

However, this meant that apt-get would no longer install dependencies automatically, and once these were listed explicitly, it attempted to install old versions of them, so the newest versions had to be specified as well. The full command, with working versions of the dependencies at the time of writing, is as follows:

sudo apt-get install python-pandas=0.14.1-2 python-pandas-lib=0.14.1-2 python-tables python-numexpr python-xlrd python-statsmodels python-openpyxl python-xlwt python-bs4 python-numpy=1:1.8.2-2

Radio Shows As Podcasts: Part 3b – Scheduling the Python Code

This is the second part of the post on how I daemonised my podcast-XML-generating Python programme. Part 3a can be found here.

When writing the scan_for_podcasts.py file, I’d already written a check to see whether the current time (i.e. datetime.datetime.now()) was x minutes after the time of the last scan for podcasts. I had initially thought I would be able to generalise this in some way: the script would constantly check whether the current time was appropriately later than the last scan time, and then execute the scan when it was.
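
The check itself was just a comparison of the current time against the last scan time, something along these lines (simplified; in reality the interval is read from the configuration file):

import datetime

SCAN_INTERVAL = datetime.timedelta(minutes=30)  # in reality read from the config file


def scan_is_due(last_scan_time):
    """True if at least SCAN_INTERVAL has passed since the last scan."""
    return datetime.datetime.now() - last_scan_time >= SCAN_INTERVAL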

In the end, constantly polling like this was wildly inefficient, using far more CPU time than it should have. Instead, I had to learn how to use the threading module, partly by hacking SickRage apart a little.
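
The general shape of the threading approach is sketched below. This is a simplified illustration rather than the code in the repository: a thread that sleeps on an Event between runs, so it sits idle instead of burning CPU in a busy loop.

import threading


class Scheduler(threading.Thread):
    """Call `action` every `interval` seconds, sleeping in between."""

    def __init__(self, action, interval):
        super(Scheduler, self).__init__()
        self.action = action
        self.interval = interval
        self._stop_event = threading.Event()

    def run(self):
        # Event.wait() blocks for `interval` seconds (or until stop() is called),
        # so the loop is idle rather than busy-waiting.
        while not self._stop_event.wait(self.interval):
            self.action()

    def stop(self):
        self._stop_event.set()


# Usage (hypothetical names):
# scheduler = Scheduler(scan_for_podcasts, 30 * 60)
# scheduler.start()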

Continue reading


Radio Shows As Podcasts: Part 3a – The init.d Script

This is the first part of the post on how I daemonised my podcast-XML-generating Python programme. Part 3b can be found here.

After some research, I found this blog post to contain most of the information I needed.

I already knew that I could generate a daemon by altering the skeleton init.d file found in /etc/init.d/ to point to the appropriate file. However, I was unsure how to set the flags on the start-stop-daemon command to ensure a Python script behaved properly. The skeleton file is structured as follows:

  • Start daemon function
  • Stop daemon function
  • Restart daemon function
  • A case block to process which function is called through sudo service myservice command

The start, stop and restart functions contain the appropriate flags to achieve the wanted behaviour from the start-stop-daemon command. Details on how I edited the skeleton file can be found below the fold.

Continue reading


Radio Shows As Podcasts: Part 3 – Daemonisation

Note: this post was originally published before I’d had a chance to write up Part 2, which discusses how I generated the XML for the podcast feed, hence the out-of-order numbering. It also got far bigger than I initially intended, hence it being broken up into two parts!

The hardest part in making this programme has been finding out how to run a Python script as a service on the Raspberry Pi. This is known as making the programme a daemon. Achieving this required three steps:

  1. Writing an init.d script to launch the Python script, allow it to be controlled via service myservice start|stop|status commands, and allow the script to be run at start-up. My explanation of how to modify the skeleton init.d file can be found here, and the init.d script itself can be found here.
  2. Creating a way to run the Python code every x minutes, a time period set by the user (and stored in the configuration file). My (rather long) explanation of how I figured out how to achieve this with the threading Python module can be found here, and the Python scripts I ended up using can be found here.
  3. Re-arranging and editing the code so that the Python files could be imported properly into the new ipodcasts.py file. Some of the important steps I undertook are outlined below the fold.

All told, daemonising my programme took about 10 hours. If I hadn’t missed some obvious mistakes, it would probably have taken less time than that. Perhaps the most important lesson I’ve learnt is NOT TO USE RELATIVE PATHS, especially not when writing scripts which might be run by any user.

I had planned to leave the feature-demonise fork open on GitHub for a while, to allow other people to look at, comment on, edit and contribute to the code, but I’ve now closed & deleted it, as I’m happy with how everything is working.

Continue reading


Python for Data Analysis: Quick Update & Fix

Been a little while since I’ve been able to update this blog, or do much Python programming. Finally had a chance to start working my way through the first proper chapter and rather quickly ran into a problem with using matplotlib within the virtual environment.

Briefly, I needed to run Python as a framework build within the virtual environment for matplotlib to be able to use the GUI elements to draw plots. A quick Google search turned up this script, which solved the problem nice and easily.


 

Edit: I (of course) hadn’t fully tested this; the fix meant that matplotlib was importing fine, but I hadn’t checked whether it would actually draw the plots. To get the plots to display, I had to alter the backend parameter in the matplotlib/mpl-data/matplotlibrc file. It was set to MacOSX (the current backend can easily be found by running matplotlib.get_backend()); setting it to GTK3Agg as explained here, and running show() as below, made all of the plots appear.

import matplotlib.pyplot as p
p.show()

And, of course, making this change caused further problems at another stage. After closing my IPython session and starting a new one, I discovered that attempting to import pandas with matplotlib‘s backend set to GTK3Agg produced an “illegal line” warning. Resetting the backend to MacOSX fixed this.

After some testing, the other options for the backend setting all produced the “illegal line” warning. Strangely, however, the warning didn’t really seem to affect anything: pandas was still loading fine and working.
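
Incidentally, the backend can also be checked and switched from inside Python rather than by editing matplotlibrc; something like the following should be roughly equivalent (matplotlib.use() has to be called before pyplot is imported):

import matplotlib

print(matplotlib.get_backend())  # 'MacOSX' on the default set-up

matplotlib.use('GTK3Agg')        # must come before the pyplot import

import matplotlib.pyplot as p
p.plot([1, 2, 3])
p.show()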


 

While the Python for Data Analysis book provides a lot of data to play around with, I’ve also found the Leeds Data Mill, which includes a lot of data sets originating from Leeds, the local council and so on. This should give me a great way to apply what I learn from the Data Analysis book to some real-world data sets. It will be interesting to see whether I learn anything from the data itself as well.


Python for Data Analysis: First Steps

I’ve been looking at Data Scientist roles, and a little more at what skills one needs for analytics roles. A frequent requirement is some history or experience of using R or other statistical analysis programmes. I don’t have any history of using these types of programmes. I had used Mathematica in my Physics masters research project back in 2010, but not for a lot of statistical work. I have used Python, however, and I knew that O’Reilly sell a book entitled “Python for Data Analysis”. Hopefully I can gain some skills and experience from this book, which I’ll summarise here.

The first job was setting up Python properly. I tried to install all of the required packages directly into the Python that ships with OS X (I’m running 10.11, El Capitan), but something went very wrong with the installation of pandas (lots of reports of unused functions). In googling for answers, I found this blog post, which explained how to set up and install the relevant packages within virtual environments.

The steps laid out in the blog post were all correct, except that I found I had to paste the following lines into a newly created .bash_profile and then run source .bash_profile, rather than into an already existing .bashrc (that is, there was no .bashrc).

export WORKON_HOME=$HOME/.virtualenvs
source /usr/local/bin/virtualenvwrapper.sh

A second difference was finding the version of pandas. Rather than pandas.version.version, as in the blog post, I had to use pandas.__version__.
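
That is, to check which version of pandas the virtual environment has picked up:

import pandas

print(pandas.__version__)  # rather than pandas.version.version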


Rare Tasting Night: Alpaca

I’ve had a personal rule for a few years now: when I’m eating out (especially somewhere new) I’ll always eat the strangest, rarest thing on the menu. It’s a nice way of trying new things, and of stopping oneself from eating the same things all the time. This has led me to eat some rather different things, from snails, to pig’s trotter, to ox heart and soft-shell crabs, to sea urchin. So imagine my delight to find that the great steak restaurant Rare in Leeds offers special tasting nights every couple of weeks, including not only nights dedicated to particular cuts of cow, but unusual animals as well. And to make things even more appealing, it’s £30 for 4 courses and drinks.

I’d been waiting to go to one of these nights since earlier this year, when I first heard of them while in Canada, so I was very excited to go to their most recent night on Wednesday: Alpaca and Rum. Alpaca is most commonly known for its very soft wool; I had no idea what it would be like to eat, what kind of dishes would be prepared for us, or why rum might be the drink to pair with the meat. It was a slight shock, then, to discover that the menu was mainly Indian/sub-continent inspired, and the rum was served in cocktails. Given that, as our menu told us, Alpaca has a taste somewhere between beef and lamb, the choice to use it in curried and spiced dishes seemed more obvious. And it was used very well.

Starter

Kofta
This dish was probably my favourite. The Alpaca was incredibly well spiced and soft. It smelled smokey, with a hint of cumin. The flatbread was very light, and just the right size to hold the two cubes of kofta. There was a delightful crunch coming from one of the spices, which complemented the soft meat and bread very well.

Though this was the only non-Indian/sub-continent inspired dish, it still maintained the theme of strongly spiced meat.

The cocktail was a lovely, sweet mixture of white rum, rose liqueur, lime & grenadine.

Mains

Chaat, with Aloo
This was a multifaceted dish that had several high points. The Alpaca meat was again very well spiced. The little potato cubes were on point and, along with the chickpeas, provided a nice mix of textures with the Alpaca. Possibly the highlight of this dish was actually the bun. It was an incredibly light and soft bun, and I was rather surprised at the quality of the bread. I knew they could cook meat well at Rare, but I was not expecting such great breads!

Biryani
The biryani was probably the least good of the four courses. The rice was cooked well, however the pieces of Alpaca meat were inconsistent. Nearly all of the pieces on my plate were soft and easy to eat; Vi was less fortunate however. She complained that around half of her pieces were tough. I was a little disappointed to hear this, but I think it’s probably the nature of the animal, and I’m willing to cut the chefs some slack here. The lamb-like nature of the alpaca meat really stood out here, with both the colour and texture being very reminiscent.

Dessert

Lemon Desi Tea
We had no idea what Desi Tea was. A quick Google search between courses told us that it is a spiced tea, which made us wonder whether we were getting two drinks for our dessert. The cocktail for this course was a rather strong white rum, white cacao and chocolate affair, which tasted great but was hard work to get through.

The Desi Tea ended up being a kind of ice cream: lemon ice cream on top of a softer, slightly melted tea ice cream. The two ice creams each had a nice flavour on their own, but together they were quite exquisite.


Note: I will be returning to this page at some point to update the way the gallery is displayed here. I’m currently using the slickr-flickr plug-in, which does an okay job, but I want something more dynamic and customisable. I’m guessing I’ll probably have to write it myself. Yet another learning opportunity.

Deluge & Private Internet Access

I’ve been using Private Internet Access for about a year and a half now, and I’m pleased enough with the results and the cost. It’s simple enough to use on the Mac, and it can be run through OpenVPN, which is how I use it on my Raspberry Pis. Several of their servers offer port forwarding, which some applications, such as BitTorrent clients like Deluge, need in order to work properly. Finding out which port is open through the Mac application is simple (hover the mouse over the system tray icon and it appears at the end of the tool tip).

Finding it out on the Raspberry Pi requires a little more work. Fortunately, PIA provide an API for this purpose. The information is retrieved using one’s login details, local IP address and a curl command:

curl -d "user=USERNAME&pass=PASSWORD&client_id=$(cat ~/.pia_client_id)&local_ip=LOCAL_IP" https://www.privateinternetaccess.com/vpninfo/port_forward_assignment

This returns the port in the following format:

{ "port": 23423 }

If we set the incoming port in Deluge to this, we find that the active port works when using the VPN. Success!

However: PIA say that we should run this at least every hour to check whether the port has changed. Manually checking the port every hour would be silly, and completely impractical if one wants an automated system. This is exactly the right situation for using cron to run a script every hour, check the open port and change the incoming port options in Deluge if there is a discrepancy. I therefore took this opportunity to learn a little more about shell scripting and to finally get myself set up on GitHub.

About the Script

The script itself is pretty simple. It calls the PIA port forwarding API with user-specific information set in the variables, cleans up the output, and compares it to the start and end ports currently in use by Deluge. If there is a mismatch, the script stops the daemon, modifies the configuration file (preserving the whitespace of the JSON) and then relaunches the daemon.

There are two caveats to running the script:

  1. It uses ifconfig to find out the local IP address of the VPN tunnel. I have provided a variable in case your relevant VPN tunnel is named something other than tun0.
  2. It has to be run as root/sudo, because it uses ifconfig and because it stops/starts a service.

Hopefully there are no security issues arising from this, but please let me know if there are! Mistakes are a great way to learn, but I’d like to find out about them before they cause problems on my own system, or on anyone else’s.

I have saved the script in my /usr/local/bin folder so that I can run it as a command from anywhere on the system. Remember to set the permissions of the script to allow it to be executed:

chmod +x deluge_check_ports

GitHub

I’d created a GitHub account for myself three years ago. There I had uploaded my first useful piece of Python code: a script which edited the ID3 tags of a large number of old In Our Time podcast files I had downloaded. No one had touched that repository, including me, since I’d first uploaded it.

Anyway, the new script is held in this repository, entitled “PIA Deluge Ports” (at least for now). It contains the script, a licence and a readme. Pretty simple. Hopefully Git will help me keep track of what I’ve done (that is the point, after all), and let others contribute if they find anything I’m doing useful.

What I’ve immediately found useful in the desktop client is that it displays the lines that have changed in red, and the new lines (with the changes) in green. This makes it very easy to comment on what the updates are, and see easily how much one has changed the file.

Set Up

In order to run the script, we have to schedule a cron job to run every hour, and as we have to run it as root/sudo, we should run it with the sudo user’s crontab:

sudo crontab -e

We then add the command we want to run, and how frequently we want it to run. For every hour, enter wildcards for the hour, day of the month, month and day of the week, and specify the particular minute of the hour at which we want it to run. I’ve gone for 36 here, as it’s good practice to run scheduled tasks at random times so that they are not all running at once.

36 * * * * deluge_check_ports &> /var/log/deluge/check_ports_out.log

Note that here I’ve also piped standard out and standard error to a separate log file, just in case something goes wrong that I haven’t attempted to capture. This also means (hopefully) that the progress bar from curl will not appear in my terminal window on the 36th minute of every hour, interrupting other commands.

Edit: the PATH variable

Unfortunately the script wasn’t running with this set-up. Why? Well, as I had feared (but hadn’t feared enough to investigate), the PATH variable used by the root crontab did not include /usr/local/bin/. So while I can run the script from anywhere on my system by issuing sudo deluge_check_ports, this fails when run as a cron job.

I followed the first post about failures of cron jobs here, which suggested adding the following line to your crontab in order to see what environment variables are being used by cron:

* * * * * env > /tmp/env.output

I found that my sudo crontab was using the following PATH variable:

PATH=/usr/bin:/bin

I decided to create a symbolic link to the script in one of the directories on that PATH:

ln -s /usr/local/bin/deluge_check_ports deluge_check_ports

The Script

So, the script itself can be found here, but as it’s not too long, here it is as well, after the fold.

Continue reading
