Showing posts with label python. Show all posts

Sunday, September 8, 2013

Correlation of investment funds - python pandas

While playing around with pandas, the Python data mining framework, I really liked how easy it is to calculate pairwise correlations between data series. Let's look at an example: we download historical data for a couple of investment funds and calculate the correlation between them.

Note 1: If you are using Windows, the easiest way to install Python with all the necessary packages is the Anaconda distribution. Just download and run the installer and you are ready to start :)


Note 2: It wasn't easy to find historical data about investment funds. In the end I got the data from the Bloomberg website. It was a kind of reverse engineering: I checked the network communication while the site was drawing its graphs, so there is no guarantee that the data format won't change in the future. If somebody knows a better way to get this data, I would be happy to hear about it.

First, let's download the data into an array:


Now we load it into a pandas DataFrame and plot it (you may have to import matplotlib first):
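
A minimal sketch of the loading step, assuming the downloaded array is a list of (date, price) pairs. The sample values and the names raw_fund_a and fund_a are made up for illustration and are not the real Bloomberg data or format:

```python
import pandas as pd

# Hypothetical stand-in for the downloaded array: (date, price) pairs.
raw_fund_a = [
    ("2013-09-02", 100.0),
    ("2013-09-03", 101.5),
    ("2013-09-04", 100.8),
    ("2013-09-05", 102.3),
]

# Build the DataFrame and use the parsed dates as the index.
fund_a = pd.DataFrame(raw_fund_a, columns=["date", "fund_a"])
fund_a["date"] = pd.to_datetime(fund_a["date"])
fund_a = fund_a.set_index("date")

print(fund_a)
# fund_a.plot() would draw the series; it needs matplotlib to be installed.
```

Using the date as the index makes it easy to line the series up with another fund later.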


The two DataFrames have to contain data for the same period of time, so we can simply merge them:
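
One way the merge could look, as a sketch with two made-up funds indexed by date (fund_a, fund_b and their values are assumptions for illustration):

```python
import pandas as pd

# Two hypothetical funds covering the same dates.
dates = pd.to_datetime(["2013-09-02", "2013-09-03", "2013-09-04", "2013-09-05"])
fund_a = pd.DataFrame({"fund_a": [100.0, 101.5, 100.8, 102.3]}, index=dates)
fund_b = pd.DataFrame({"fund_b": [50.0, 50.4, 50.1, 51.0]}, index=dates)

# Join on the date index; an inner join keeps only dates present in both frames.
merged = fund_a.merge(fund_b, left_index=True, right_index=True, how="inner")

print(merged)
```

The inner join is a deliberate choice here: any date missing from one of the funds is dropped, so every remaining row has a value in each column.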

The result is:

Finally let's calculate the correlation:
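
A sketch of the correlation step on a small made-up merged frame (column names and values are assumptions). DataFrame.corr() computes the pairwise correlation of the columns and uses the Pearson method by default:

```python
import pandas as pd

# Hypothetical merged frame: one column per fund, one row per date.
merged = pd.DataFrame(
    {
        "fund_a": [100.0, 101.5, 100.8, 102.3],
        "fund_b": [50.0, 50.4, 50.1, 51.0],
    },
    index=pd.to_datetime(["2013-09-02", "2013-09-03", "2013-09-04", "2013-09-05"]),
)

# Pairwise correlation matrix of the columns (Pearson by default).
corr = merged.corr()

print(corr)
```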

And we get the pairwise correlation of the numeric columns:

Friday, August 30, 2013

Django unchained - python in the cloud

I was experimenting with the Django web framework. First I found this helpful tutorial: http://effectivedjango.com/tutorial/getting-started.html. I really liked that it starts from the basics, which makes it usable for Python beginners too.

After reading the tutorial I wanted to create a Django project in the cloud. My first idea was to use Google App Engine because it has Python support and it's free for small projects. I found this tutorial http://www.allbuttonspressed.com/projects/djangoappengine about running Django projects on Google App Engine, but I realized that I would have to extend Django with support for non-relational databases. I found this step a bit complicated. There was another tutorial about running Django with Google Cloud SQL, but that turned out to have no free version.

To keep the story short: after spending several hours trying to find out how to run my Django project on Google App Engine, I found myself searching for other free Django hosting solutions. I came across pythonanywhere.com, and it was a really nice surprise. It doesn't have as many features as Google App Engine, but it's optimized for Python. I finished the registration in 5 minutes and it was really easy to find a description of how to run a Django project there (https://www.pythonanywhere.com/wiki/DjangoTutorial). It took me just 30 minutes to configure everything, write a hello world application from scratch in their in-browser console (yes, they have an in-browser bash console) and open it in my browser for the first time.

So just to summarize my positive impressions about pythonanywhere.com:
  • free plan with MySQL support
  • web based bash console
  • really easy to set up
  • good tutorial (at first I missed how to refresh the server, but it was right at the end of the tutorial)
  • they have Dropbox sync, which makes synchronization very straightforward
  • easy way to check access and error logs
I also heard good things about Heroku. They have a good getting started guide too, so it may be worth trying as well: https://devcenter.heroku.com/articles/django

Additionally here is a really short introduction video:

Sunday, July 21, 2013

Vectorize Image with Python scikit-image

Short story: a friend of mine wanted to display an interactive dental chart on the web, but most of the images he found were hand-drawn and didn't fit the look and feel of his site. So I decided to vectorize one image; it shouldn't be a hard task ...
After some research I ended up here: http://scikit-image.org/docs/dev/auto_examples/ and I managed to get vectorized outlines within an hour. Let's go through it step by step:

0. This is the image of teeth I wanted to get in vectorized format, with each tooth separate:


I have Anaconda on my Windows machine, but if you have Python with the general science tools (numpy, matplotlib, skimage, scipy), this code should work for you.

1. Loading the image from the file. With imread we get a 3D numpy array; the third dimension holds the RGB values:


2. In the samples the algorithms were used on grayscale images, so first I had to convert the image to grayscale. This means I get a 2D array from the 3D one.

If needed, the image can be cropped simply with an array slice (e.g.: cropedimg = gimg[330:480, 50:480]), and with matplotlib the image can be displayed at any time by calling imshow(gimg).
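
A minimal sketch of the grayscale conversion and cropping, assuming skimage.color.rgb2gray is used for the conversion. The synthetic image below is a made-up stand-in for the dental picture, and the slice bounds are arbitrary:

```python
import numpy as np
from skimage.color import rgb2gray

# Synthetic stand-in for the loaded picture: white background, dark square.
# Shape is (rows, columns, RGB), like the array returned by imread.
img = np.ones((100, 120, 3), dtype=float)
img[30:70, 40:80, :] = 0.2

# Collapse the RGB dimension: 3D (rows, cols, 3) -> 2D (rows, cols).
gimg = rgb2gray(img)

# Cropping is plain numpy slicing: rows first, then columns.
cropped = gimg[20:80, 30:90]

print(img.shape, gimg.shape, cropped.shape)
```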

3. Detecting the contours with skimage.
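
A sketch of contour detection with skimage.measure.find_contours, run on a synthetic grayscale image rather than the real one. The level value 0.6 is an assumption chosen to sit between the dark shape and the white background:

```python
import numpy as np
from skimage import measure

# Synthetic grayscale image: white background (1.0) with a dark square (0.2).
gimg = np.ones((100, 120))
gimg[30:70, 40:80] = 0.2

# Find iso-valued contours at the given level.
# Each contour is an (n, 2) array of (row, column) coordinates.
contours = measure.find_contours(gimg, 0.6)

print(len(contours), contours[0].shape)
# With matplotlib, a contour can be drawn with
# plt.plot(contour[:, 1], contour[:, 0]), since columns are x and rows are y.
```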

As a result we get a list of arrays, each containing the vector representation of one of the contour lines found. Let's display the results:

4. For me the contour line was too detailed and rough, so I wanted a more schematic result. The tolerance parameter sets how detailed the approximation is. Finally we print an original and an approximated contour.
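
A sketch of the simplification step, assuming the approximation is done with skimage.measure.approximate_polygon, which takes a tolerance argument. The disk image and the tolerance value of 2.0 are made up for illustration:

```python
import numpy as np
from skimage import measure

# Synthetic grayscale image: a dark disk on a white background,
# which yields a contour with many small steps.
rr, cc = np.mgrid[0:100, 0:100]
gimg = np.ones((100, 100))
gimg[(rr - 50) ** 2 + (cc - 50) ** 2 < 30 ** 2] = 0.2

contour = measure.find_contours(gimg, 0.6)[0]

# A larger tolerance drops more points and gives a coarser polygon.
approx = measure.approximate_polygon(contour, tolerance=2.0)

print(len(contour), len(approx))
```

Trying a few tolerance values and comparing the point counts is a quick way to pick a level of detail that suits the target drawing.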