Piwik, a free open-source analytics tool

I have been using Piwik for a couple of months and I love it.

I first started using it during my internship at leadelephant.com, not really as an analytics tool but more as a data collection tool.

Piwik is primarily an analytics tool, a direct competitor to Google Analytics. But it’s really more than that. Besides the data collection use I just mentioned, it can also serve as a community monitoring tool, which is how we use it at Habbies.nl.

Piwik is a JavaScript tracking snippet backed by a PHP dashboard and MySQL backend. Installation is really simple and it works out of the box.
If you’re site is having 1000 visitors a day it’s a good idea to read through the Optimization docs. If you’re using Cloudflare, you might want to tweak the config file to start using X-FORWARDED-FOR headers.
Other than that it’s just a quick install and you’re done!
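
For the Cloudflare case, the tweak boils down to a line or two in Piwik’s config/config.ini.php. Something along these lines should do it (I’m quoting the key from memory, so check the Piwik docs for the exact name):

    [General]
    ; Trust the forwarded header so Piwik logs the visitor's IP
    ; instead of Cloudflare's edge IP.
    proxy_client_headers[] = HTTP_X_FORWARDED_FOR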

Piwik underwent a big redesign in 2014 and is looking quite good. (It used to look a little dated!)

Piwik dashboard in 2014

I run both Google Analytics and Piwik. I use Piwik daily to check up on real-time visitors and trends. I use Google Analytics for end-of-the-month kinds of tasks, looking at the aggregated data. There are a lot of blogposts already comparing GA with Piwik, but as you can see, you don’t have to pick one.

You can use both!

So take 5 minutes out of your day and download and install Piwik.

Akismet doesn’t cut it anymore, utilize Cloudflare against spam

Spam queue in WordPress

I have noticed a sharp drop in spam in my comment queue. With only Akismet I used to delete anywhere from 5 to 10 comments a day from the queue; now there’s not a single one.. 99.99% of the time.

I didn’t really think anything of it until I realized why the change occurred: I had enabled Cloudflare for my blog.

Cloudflare saving requests and thus MBs

Cloudflare acts as something of a firewall and CDN in one: it blocks malicious requests and serves cached versions of your blog/site/resources. For this particular blog the stats aren’t that overwhelming, but for one of my other sites we’re talking about multiple gigabytes!

The Cloudflare plugin for WordPress doesn’t just adjust the incoming IPs (it has to swap in the X-FORWARDED-FOR IP, because my webserver would otherwise only see Cloudflare’s IPs); it also reports any spam comment (with its accompanying metadata) back to the CF system as a spammer.
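
The IP-restoring part is conceptually tiny. Here’s a minimal PHP sketch of the idea, not the plugin’s actual code; the header name is the one Cloudflare documents (CF-Connecting-IP), and a real setup would first verify that the request actually came from one of Cloudflare’s published ranges:

    <?php
    // Sketch: recover the real visitor IP when requests come through Cloudflare.
    // Cloudflare puts the original client IP in the CF-Connecting-IP header
    // (and also appends it to X-Forwarded-For).
    function real_visitor_ip()
    {
        $cfIp = isset($_SERVER['HTTP_CF_CONNECTING_IP']) ? $_SERVER['HTTP_CF_CONNECTING_IP'] : '';
        if ($cfIp !== '' && filter_var($cfIp, FILTER_VALIDATE_IP)) {
            // In production: only trust this header after checking that
            // REMOTE_ADDR is actually a Cloudflare address.
            return $cfIp;
        }
        // Fallback: whatever the webserver saw (the Cloudflare edge IP).
        return $_SERVER['REMOTE_ADDR'];
    }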

So with my efforts and those of the thousands of other Cloudflare users combined, the spam queue is getting smaller and smaller every month. The spammers don’t even reach the stage where Akismet has to kick in.

SocialSpace at ESTEC in Noordwijk

A day’s worth of swag!

Yesterday I went to the open day at ESTEC (European Space Research and Technology Centre) in Noordwijk, the Netherlands. This is the ESA (European Space Agency) facility that houses most of its technology research and project development. I can’t even remember what it was exactly, but they have some rooms and utilities over there that make even NASA jealous!

The problem(s) with Open Data

To put this blogpost into context: it’s written by a 25-year-old developer who uses the Dutch open data (http://data.overheid.nl) and doesn’t like to read too much. I’d rather hack through tutorials than read specs, and I code stuff for the general public, i.e. the data needs to accommodate ordinary people.

With that out of the way, and so you can see this from my perspective, here goes: the biggest problem(s) with Open Data.

data.overheid.nl

I love Open Data. I love all kinds of data really; I can easily browse my phpMyAdmin pages for an entire day and not get bored. I also love real-timeness, and my first use of data/APIs was with the Twitter API. It doesn’t get more real-time than that!
But the open data ecosystem (in the Netherlands at least) is just like a retirement home, where one zealous intern occasionally organizes bingo events and gets the place buzzing before it’s time for pudding.

Static data

As you can see, I like real-time data, and here’s where we hit the first problem: dead tree data, a.k.a. static data.
Sure, it’s nice to have any data at all, but what’s the added benefit of forging it into something interactive, like an app? Static data is only useful for infographics, graphs or historical references.

The Dutch open data collection contains a lot of this dead-weight data (sorry, I like to make up silly synonyms): big files full of numbers which, in their appropriate use case, can no doubt provide a lot of insight. But it’s the category of infographics and historical references that this data is destined for. I’m sure that adding geographical data makes it slightly more interesting, since you can view the information on a map, but it’s still just a pile of information whether you view it in an infographic or on a map.

Data formats and delivery

There’s a big movement within the Open Knowledge sphere to push for Open Formats. This is wonderful, and most formats are indeed ‘open’. But what does open mean? It clearly doesn’t mean accessible.

Most of the Open Data sets at Overheid.nl that contain geo-data (which I find the best kind of data) use this wonderful X,Y coordinate system called ‘Rijksdriehoekscoördinaten’, a word that is long even by Dutch standards. The system dates back to the days when we had to plot positions by triangulating from 3 points, preferably high points like churches or hills.

RD to WGS, look at all them pow()s!

But then we got satellites and GPS, and Europe even has its own Galileo project (what’s happening with that?), so coordinates became global. Truth be told, a lot of internet innovation has been led by the United States, to the point where GPS coordinates, actually called WGS 84 (oh god no, specs!), are the de-facto digits we use when it comes to geographical data. So why do we still use this RD system? The answer is probably legacy code..
The image above shows how to convert RD to WGS, by brute-forcing it with pow()s and floats. You don’t want to convert this data on the fly! Especially not in Javascript.
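
To give you an idea of the brute force involved, here is a rough PHP version of the commonly published approximation formulas (Schreutelkamp/Strang van Hees, accurate to about a meter). I’m writing the coefficients down from memory, so verify them against the original publication before relying on the output:

    <?php
    // Approximate conversion from Rijksdriehoek (RD) X/Y to WGS 84 lat/lon.
    function rd_to_wgs84($x, $y)
    {
        // Offsets from the RD base point (Amersfoort), scaled down.
        $dx = ($x - 155000) / 100000;
        $dy = ($y - 463000) / 100000;

        // Latitude correction, in seconds of arc.
        $latSum = 3235.65389 * $dy
                -   32.58297 * pow($dx, 2)
                -    0.24750 * pow($dy, 2)
                -    0.84978 * pow($dx, 2) * $dy
                -    0.06550 * pow($dy, 3)
                -    0.01709 * pow($dx, 2) * pow($dy, 2)
                -    0.00738 * $dx
                +    0.00530 * pow($dx, 4)
                -    0.00039 * pow($dx, 2) * pow($dy, 3)
                +    0.00033 * pow($dx, 4) * $dy
                -    0.00012 * $dx * $dy;

        // Longitude correction, in seconds of arc.
        $lonSum = 5260.52916 * $dx
                +  105.94684 * $dx * $dy
                +    2.45656 * $dx * pow($dy, 2)
                -    0.81885 * pow($dx, 3)
                +    0.05594 * $dx * pow($dy, 3)
                -    0.05607 * pow($dx, 3) * $dy
                +    0.01199 * $dy
                -    0.00256 * pow($dx, 3) * pow($dy, 2)
                +    0.00128 * $dx * pow($dy, 4)
                +    0.00022 * pow($dy, 2)
                -    0.00022 * pow($dx, 2)
                +    0.00026 * pow($dx, 5);

        return array(
            'lat' => 52.15517440 + $latSum / 3600,
            'lon' =>  5.38720621 + $lonSum / 3600,
        );
    }

    // The RD base point itself should come out near Amersfoort.
    print_r(rd_to_wgs84(155000, 463000));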

Another example literally just appeared. A new dataset about road measurements was published, and when I skimmed the text it mentioned ‘live data up to 1 minute’. Totally had to check this out. “Oh hey, they have sample data..” and I scrolled down “183MB.. umm okay.. fine”. So I downloaded it and unzipped it. There I found 1437 files (somehow 3 minutes were missing?) named “{hh:mm}_traveltime.gz”. I opened one in Notepad++ wondering what the Open Data Saint had put in my shoe (Sinterklaas reference!).. binary data. Great, Notepad++ can’t do anything with it and I have no idea how to open .gz files. Gotta love ‘open’ formats ;)

Edit: Okay, so apparently I still had to decompress them (sure, who needs documentation!), but now I have a SOAP XML file with this beautiful schema URL: xsi:schemaLocation=”http://datex2.eu/schema/2_0/2_0 D:\NDW\CSS\DataGenerator\DATEXIISchema_2_0_2_0.xsd”. The XML structure itself looks pretty okay, but I can’t quickly figure out what’s going on, so maybe in another blogpost..
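
For anyone else who runs into these files: PHP at least makes the unwrapping painless. A quick sketch, assuming the inner payload really is gzip again (which is what it looked like to me) and using a made-up filename that matches the pattern:

    <?php
    // Sketch: read one of the {hh:mm}_traveltime.gz files from the unzipped archive.
    $raw = file_get_contents('08:00_traveltime.gz'); // made-up example filename
    $xmlString = gzdecode($raw);                     // undo the inner gzip layer
    if ($xmlString === false) {
        die("Not gzip after all?\n");
    }

    $xml = simplexml_load_string($xmlString);
    if ($xml === false) {
        die("Could not parse the XML\n");
    }

    // From here on you can walk the DATEX II structure;
    // what's actually in there is a story for another blogpost.
    echo $xml->getName() . "\n";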

So if you have to pick a format, please make it so that it can be read in a text editor. JSON, XML, comma-separated, tab-separated.. anything really.

Delivery
So there are some annoying formats that, while open, are not accessible at all. But how does the data get delivered?
For anyone whose native tongue is not Dutch, it’s impossible to use any of the data. A lot of open data sets are hidden behind websites, weird link structures, FTP servers with deeply nested Dutch-named folders, and whatever other horrible solution you can think of.

I haven’t checked everything out, but I don’t think there’s any dataset that comes with a RESTful API. But that was to be expected, since most data is static.

Wieg tot Weg (From cradle to road)

I have actively used 3 datasets though: meteorological data from the KNMI (weather service), sensor data from Rijkswaterstaat (agency for infrastructure and environment) and car registration data from the RDW.
For that last one I made “Wieg tot Weg” when they first opened their data to the public along with a competition, in which I came 3rd! They actually used Microsoft’s Azure cloud platform to offer some kind of API, though not a properly RESTful one.
For the first one I wrote what I call an “API”, but it isn’t really one, just see for yourself. And at the moment I’m making a library for the sensor data from RWS that combines some of the files (.dat and .adm. Wtf?) that they put in their zip file. The library is at a hacked-together stage; yes it works, but no, it won’t win any “Best code ever” awards.

If you look at the class I wrote, you can see what kind of hoops I have to jump through to collect and combine the data: 1) download a .zip, 2) unzip it, 3) collect the .adm and .dat files, 4) explode() them on newlines, 5) parse each line as comma-separated values, 6) trim the data values (whitespace everywhere!), then, when both files are in memory as arrays, 7) combine them on their line number (so fragile!) and 8) format them a bit.
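
Stripped of everything specific to my library, those hoops look roughly like this (a sketch, not the real class; the URL and paths are made up):

    <?php
    // Rough sketch of the steps above; not the actual library code.
    $zipUrl  = 'http://example.org/rws-export.zip'; // made-up URL
    $zipFile = '/tmp/rws-export.zip';
    $workDir = '/tmp/rws-export';

    // 1) download the .zip and 2) unzip it
    file_put_contents($zipFile, file_get_contents($zipUrl));
    $zip = new ZipArchive();
    if ($zip->open($zipFile) === true) {
        $zip->extractTo($workDir);
        $zip->close();
    }

    // 3) collect the .adm and .dat files (assuming one of each, for simplicity)
    $admFiles = glob($workDir . '/*.adm');
    $datFiles = glob($workDir . '/*.dat');

    // 4) explode() on newlines, 5) parse each line as comma-separated values
    // and 6) trim the data values (whitespace everywhere!)
    function parseLines($path)
    {
        $rows = array();
        foreach (explode("\n", trim(file_get_contents($path))) as $line) {
            $rows[] = array_map('trim', explode(',', $line));
        }
        return $rows;
    }
    $adm = parseLines($admFiles[0]);
    $dat = parseLines($datFiles[0]);

    // 7) combine them on their line number (so fragile!) and 8) format them a bit
    $combined = array();
    foreach ($dat as $i => $row) {
        $combined[] = array(
            'meta'   => isset($adm[$i]) ? $adm[$i] : null,
            'values' => $row,
        );
    }

    print_r(array_slice($combined, 0, 3));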

At the moment I still haven’t incorporated the locations XML file; that file does, however, contain GPS coords instead of RD, so a plus point there!

But the point is: at least now ‘normal’ people can use the data. You don’t need a PhD in Fluid Mechanics (yeah, I actually had to look that stuff up to decipher the dataset) or the tools of the trade, some obscure Visual Basic program that only runs on government-supplied Windows XP machines.

Bureaucracy

Which brings me to the last point. I’m not sure this is the right heading for it, but we all hate bureaucracy, so it will do!

What I mean here is that you have to open this data in ancient programs which are used in government circles, and while they probably work, they’re not what’s being used in tech. Just like the RD coordinates I already mentioned, there are other bumps in the open data road. A favourite format/extension (I don’t even know exactly what it is!) is ‘GIS’. Which is ‘open’, of course. But you need a program like OpenGIS or ArcGIS or VB6XPOpenGovSuperExclusiveGIS; okay, maybe that last one won’t work, but I hope you catch my drift.

Why not JSON or XML with GPS coordinates? Or be totally hip and provide them as GeoJSON so they instantly work on Github.
Ask yourself “Who am I opening up this data for?” If the answer is your colleague in another department or ministry, then carry on. If not, hire an adviser and ‘youthify’ your data.
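
And the GeoJSON suggestion really isn’t much work either. A sketch of what a single measurement point could look like, built with plain PHP (the coordinates and properties are made-up examples; note that GeoJSON wants [longitude, latitude]):

    <?php
    // Sketch: publish a single point as GeoJSON.
    $collection = array(
        'type'     => 'FeatureCollection',
        'features' => array(
            array(
                'type'       => 'Feature',
                'geometry'   => array(
                    'type'        => 'Point',
                    'coordinates' => array(5.3872, 52.1551), // made-up point near Amersfoort
                ),
                'properties' => array(
                    'name'  => 'Example sensor', // made-up properties
                    'value' => 42,
                ),
            ),
        ),
    );

    header('Content-Type: application/json');
    echo json_encode($collection);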

How to fix Open Data

Fixing open data

This surely isn’t a checklist, merely some of my ideas and, I hope, some discussion material for whoever (that damn bureaucracy!) is in charge.

Note that when I say ‘fix’, I mean: make it accessible to all the young hipster programmers out there who can actually skyrocket the open data initiative, open it up to the broader public, and turn interest in and passion for open data into something that could end up in the Maslow pyramid. Or at least on the public agenda.

  • Open AND accessible formats (XML, JSON, CSV, TSV)
  • Cut the bureaucracy, show code (Use Github, it’s hip!)
  • Focus on live data (that’s the most interesting, sorry infographics!)

Even if I finish my water sensor library without finding a use case for it, and even if that license plate app is just a gimmick, I know that one day the ideas will overflow and we’ll want all the data we can get. That’s why I think it’s important to make sure, at this point in time, that we open up more (preferably live) data and have it formatted and delivered in the way the tech industry wants it.

P.S. I do like infographics, I just needed a black sheep for this blogpost!
P.P.S. This once was a concrete idea, but while writing it became a big blur of stuff I wanted to say. Congratulations if you read through all of it!