software engineering

Catching up on dependencies

Are you someone who wants to keep your projects’ dependencies up to date? Well, I am. My reasoning: when you don’t, and wait too long to upgrade, there will come a time when the cost of upgrading is too high and you’re stuck with a legacy project.

I’ve found the perfect service to keep me posted on updates to all my favorite projects. It’s called Bintray. Although Bintray is mainly focused on software publishers, it has great features for content consumers as well. It is also the home of the popular jcenter repository.

A hidden gem for consumers is the watch feature. Since I discovered it, I have started watching the packages I depend on most. Back when Cloud Dataflow was in beta, a new SDK tied to the service was published every few weeks. All I had to do was put a watch on the package page (the google-cloud-dataflow-java-sdk-all package) and every time a new SDK was published I got a notification by mail.

The service is free for consumers, as well as for publishers of open-source projects.

embedded, software engineering

Building an Eddystone™ beacon

Vacation, hot sun, swimming pool, relaxing walks, no distractions… what better time to do some hacking. And I know just the project: an Eddystone beacon. I was looking for a cheap Eddystone™ beacon, but most of the current vendors tie their beacons tightly to a proprietary cloud platform and are damn expensive for a single beacon, certainly once you add shipping. Luckily the Eddystone protocol is completely open and license-free, and I had a BLE board lying around anyway, so I could start making my own.

But what is Eddystone™ and why am I so excited about it? Well, Eddystone is a protocol specification that defines a Bluetooth Low Energy (BLE) message format for proximity beacon messages. Think of it as an iBeacon™ on steroids, but open. An Eddystone beacon fits all the scenarios where an iBeacon (with its unique ID per beacon) would be applicable, but it adds two important features: discoverability and fleet management.

Without discoverability, a classic beacon without a matching app installed on your phone is just a meaningless ID broadcast to its surroundings. The Eddystone-URL message adds discoverability by sending the URL of the service that’s associated with the beacon; the sketch below shows how compactly such a frame is laid out.
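To make that concrete, here is a minimal sketch of how an Eddystone-URL frame is encoded, written in Python for readability. This is my own illustration based on the spec linked below, not code from an actual beacon; the frame type, scheme prefixes and expansion codes are taken from the published spec, and on air these bytes travel inside a BLE service-data structure with the Eddystone UUID 0xFEAA.

EDDYSTONE_URL_FRAME_TYPE = 0x10

# Scheme prefixes and URL expansion codes defined by the Eddystone-URL spec.
SCHEMES = {
    "http://www.": 0x00, "https://www.": 0x01,
    "http://": 0x02, "https://": 0x03,
}
EXPANSIONS = {
    ".com/": 0x00, ".org/": 0x01, ".net/": 0x03,
    ".com": 0x07, ".org": 0x08, ".net": 0x0A,
}

def encode_url_frame(url, tx_power):
    """Build the service-data payload of an Eddystone-URL frame."""
    for scheme, code in SCHEMES.items():
        if url.startswith(scheme):
            frame = bytearray([EDDYSTONE_URL_FRAME_TYPE, tx_power & 0xFF, code])
            rest = url[len(scheme):]
            break
    else:
        raise ValueError("URL must start with a known scheme")
    # Replace well-known substrings with their one-byte expansion codes.
    while rest:
        for token, code in EXPANSIONS.items():
            if rest.startswith(token):
                frame.append(code)
                rest = rest[len(token):]
                break
        else:
            frame.append(ord(rest[0]))
            rest = rest[1:]
    return bytes(frame)

# "http://www.example.com" at -12 dBm -> 10 f4 00 'example' 07
print(encode_url_frame("http://www.example.com", -12).hex())

The encoded URL has to fit in the 17 bytes of URL space the frame leaves, which is exactly why the expansion codes exist.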

The second addition is the telemetry (TLM) message. That message contains information about the health of the beacon itself, like temperature, battery voltage and uptime. With this information you can manage all the beacons you have deployed, knowing when to replace a battery or a beacon in your fleet.

This is only the beginning; I expect a lot more message types to be added to the spec in the future. For more information, read the spec at https://github.com/google/eddystone.

That was the spec, now on to the implementation. The first thing you need is a Bluetooth developer board. After playing around with BLE chips from other vendors, I now swear by the nRF5x System-on-a-Chip series from Nordic Semiconductor. Not only are they the most powerful BLE chips, they also have great developer support, with good documentation, a Stack Overflow-style forum and dev boards. Important for me is the support for the gcc compiler with the SoftDevices. SoftDevices are full BLE stacks in software that you flash onto the chip. The gcc support makes it possible to start programming without spending a fortune on a bad embedded IDE: I can use JetBrains CLion (an IntelliJ-style IDE for C/C++) for writing my code and build my program with the gcc toolchain.

The board I used was the nRF51-DK kit. It’s a full board built around the nRF51 and a SEGGER debugger chip. It also has Arduino shield support, giving instant access to a lot of Arduino-compatible hardware, and it comes with the usual buttons and LEDs. If you want to play with Bluetooth Low Energy, this is the board to get. As soon as the nRF52-DK (its successor) is out of preview, I’ll certainly order that kit as well. The nRF52 is even more powerful and energy efficient, and adds NFC support.

nRF51-DK

With the board, the SDK and my IDE, I could start building the Eddystone implementation. I started from the classic-beacon example included in the nRF51 SDK, and in a few hours I had the UID frame implemented. Because the folks from Nordic Semi already supported Eddystone frames in one of their Android debugging apps (nRF Master Control), I could validate the correctness of the frame. Once the UID frame worked, it was trivial to implement the URL and TLM (telemetry) frames. With the frames in place, it was just a matter of interleaving the messages at the correct interval, as in the sketch below. The whole implementation is only a few hundred lines of code. You can find it at https://github.com/alex…/eddystone_nrf5x_beacon.
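The interleaving itself is plain round-robin scheduling. In Python pseudo-code it boils down to something like this (the real beacon drives the nRF51 advertising API in C; the interval and the mix of frame types are my illustrative choices, not spec requirements):

import itertools
import time

frames = ["UID", "URL", "TLM"]  # stand-ins for the pre-built frame payloads
ADV_INTERVAL = 0.7              # seconds between advertisements (illustrative)

for frame in itertools.cycle(frames):
    # Hand the next payload to the BLE stack, then wait one interval.
    print("advertising", frame)
    time.sleep(ADV_INTERVAL)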

Now that I have my own implementation, I can start experimenting on Android, and I’m ready for all the Eddystone frames that will come out in the future. No need to wait for a vendor to implement them.

big data, software engineering

Luigi and Google Cloud in production – retrospective

We’ve been running Luigi in production for three weeks now without any issues, so I thought it was time to share the code I wrote to link Luigi with Google Cloud (see the previous article). I have to warn you though: it’s a first iteration and far from perfect, but it will still give you an idea of how to start linking the two products. I thought I would have more time to clean it up. Anyway, fetch it here: github/luigiext-gcloud

The tasks concentrate on BigQuery, because for storage we can get by with the Google Cloud Storage connector for HDFS. But we’re planning to add some GCS tasks this sprint to speed up some of the processes.

Luigi_CPU

I also wanted to show you the CPU graph of our nightly tasks, as shown in the cloud dashboard. Until now, most of our Hadoop MR tasks were written in Pig and started sequentially from a batch process. You can imagine that this is all but optimal; writing disaster recovery in batch is almost impossible. You see it clearly in the left part of the graph: only during a small slice of about 45 minutes is the cluster used optimally (by one big Pig script); the rest of the time most of the CPUs are idle because the tasks weren’t big enough.

But if you look at the right side of the graph, you see the cluster being used much more optimally, reaching almost 100% CPU usage across the 32 nodes. This was achieved by using Luigi and configuring multiple workers; for our relatively small cluster, six seems to be the sweet spot. Granted, you see almost no CPU usage in the beginning, and that’s because building the complete dependency graph of all our new Luigi tasks is quite slow. This is mainly because every output check spawns the hadoop fs command-line utility, as in the sketch below. That will be solved once I’ve written native support for Google Cloud Storage. The dip at the end comes from the BigQuery import tasks that run last.
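To show why that hurts: a target whose exists() shells out to the CLI pays the JVM start-up cost of the hadoop command for every single output in the graph. A minimal sketch of the pattern (the class name and path handling are illustrative, not the actual luigiext-gcloud code):

import subprocess

import luigi

class HdfsCliTarget(luigi.Target):
    """Existence check that spawns the `hadoop fs` CLI: simple but slow."""

    def __init__(self, path):
        self.path = path

    def exists(self):
        # `hadoop fs -test -e` exits with 0 when the path exists, but every
        # call boots a JVM, so a graph with hundreds of outputs crawls.
        return subprocess.call(["hadoop", "fs", "-test", "-e", self.path]) == 0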

I’m a lot happier since we adopted Luigi for running our data pipeline (the new part anyway), and if you’re looking for a tool to manage your hundreds of Hadoop jobs, make sure to look at Luigi.

big data, software engineering

Using Luigi on the Google Cloud Platform

When you hear about big data, you only hear about the hottest compute engines, the fastest algorithms and the sexiest visualisations. But almost nobody talks about the plumbing needed to link everything together and move your data from one algorithm to the next. Strange, because for me that’s as important as the algorithms you are using.

Due to this lack of information we had some setbacks and made some wrong choices in the past. We started out with LinkedIn’s Azkaban, but never got it reliable enough to run in production. It was cumbersome to update workflows, and extending it seemed completely out of the question with the limited time we have. So we switched to the pragmatic approach and linked everything together with batch scripts and Python.

But as time goes on and more and more algorithms get implemented, it’s hard to keep writing scripts that are easy to maintain and still fault tolerant. Luckily I discovered Luigi, an open-source, no-nonsense workflow manager created by Spotify. The beauty lies in its simplicity: a task in the workflow graph is described in Python, which means you can edit your workflow in your favorite IDE.

An important feature to look for in a workflow manager is extensibility. You will never find a product that supports all of your exotic components out of the box. As we at Vente-Exclusive.com are big consumers of Google services (like BigQuery, Cloud Storage and the GCS connector for Hadoop), these needed to be integrated into the Luigi workflows.

That proved to be amazingly simple. Once you understand the internals, writing a Luigi Task and Target for a service is a simple job (the example below targets Google BigQuery). It took a bit over a day to write and less than 200 lines of code. And once that’s done, a node in your workflow is only a few lines of code. Look at the following task node:

class SEG_BQ_Day(gcloud.BqTableLoadTask):
    day = luigi.DateParameter(default=dateutils.yester_day())

    def requires(self):
        return SEG_Export4BQ_Day(self.day)

    def source(self):
        return self.day.strftime('/datasets/output/bigquery/segment/%Y/%m/%d/part*')

    def table(self):
        return "XXXXXXXX:XXXXXXX.Segment"

    def output(self):
        return gcloud.BqQueryTarget(
            table="XXXXXXXX:XXXXXXX.Segment",
            query=self.day.strftime("SELECT 0 < count(bucket_date) "
                                    "FROM [XXXXXXX.Segment] "
                                    "WHERE bucket_date = '%Y-%m-%d 00:00:00 UTC'"),
        )

It describes everything needed to load some data into BigQuery. requires tells Luigi which dependencies produce the data we need to load; those dependencies can in turn have dependencies of their own. output tells Luigi how to check whether the task has already run and what output it produces; here it returns one of our custom targets, which queries BigQuery.

If the required dependencies are satisfied and output reports that the data is not yet loaded, the task loads the source into the table. If we run the task again with the same parameters, output will report that the data is already in our BigQuery table, and the task will not run again (a sketch of how such a target can work follows below). Small and simple, but a lot of small blocks quickly build up to a big graph.
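For the curious, the core of such a target is just an exists() method that runs a boolean query. A rough sketch of the idea (illustrative only; the real BqQueryTarget in luigiext-gcloud differs, and this version shells out to the bq command-line tool rather than using the BigQuery API):

import subprocess

import luigi

class BqQueryTarget(luigi.Target):
    """Target that counts as 'done' when a boolean query returns true."""

    def __init__(self, table, query):
        self.table = table
        self.query = query

    def exists(self):
        # The query is of the form "SELECT 0 < count(...)", so the CSV
        # output contains the word "true" once the data has been loaded.
        out = subprocess.check_output(["bq", "--format=csv", "query", self.query])
        return b"true" in out.lower()

Because the check is just a query, re-running a task with the same parameters becomes a cheap no-op.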

Luigi Task Visualiser

So, if you’re still searching for a way to plumb all your algorithms together, be it Hadoop MR, Spark, BigQuery, DataFlow, DIY stuff… be sure to check out Spotify’s Luigi.

Oh, and I got approval to open-source the code we’ve written for integrating with the Google services, so you can expect it on GitHub in the near future. I’ll keep you posted.

software engineering

Devoxx 2014: Looking back

NFC readers

Only part of the NFC readers used

It’s been a week since I got back from Devoxx 2014. It’s always fun, but it’s exhausting. It’s my 3rd year as a steering member now, and it’s still hard work. We’re all a bunch of crazy people, and every year we dedicate a lot of our free time to organising the best community-driven conference in Europe. This year the Bluetooth LE beacon game made it extra fun. I do regret that I couldn’t dedicate much time to helping out with it, but other things had to be done.

This year Google was kind enough to lend us 30 Chromebooks, so one of the tasks at hand was porting the partner app to Chrome. Last year we got help from Google engineers to get started with NFC on ChromeOS, and it was a fun interaction (I even got to go to Google France to do some pair programming with François Beaufort). This year we needed to do it alone. Last year’s NFC library was conference-driven and full of shortcuts, so I wanted to refactor it to make it easier to extend to other NFC readers and tag technologies. I spent about six months of evenings on that refactor. Although it’s not quite ready for prime time yet, I’m quite happy with the result: the partner app using the library proved that it worked, and we didn’t have any issues worth mentioning.

Me on the big screen

Me on a big Kinepolis screen

A surprisingly fun experience was my guest appearance during Alexis’s talk about NFC at Google I/O. As François could not come, I had to jump in. I was surprised how calm I was compared to how nervous I was two years ago. I think I will try to give a few more full talks this year.

Well, after a week of rest it’s time to prepare the Chrome NFC library so it can be pushed back to the main Chrome repository.

software engineering

NFC your Google I/O 2014 badge, part 1

Oh boy. Am I the only one feeling the “Post-Google-I/O depression”, better known as jet lag? I do know a good remedy though: some hacking. And what better to hack than the Google I/O NFC badge.

In general, NFC programming is hard to get started with because most people don’t have a tag at their disposal. But now the people from Google were so nice to leave some of the tag’s empty space writable, giving more than 5,000 people access to a tag to experiment with. I saw this as a good opportunity to start a mini-series teaching the I/O masses some “Near Field Communication” development. Didn’t attend I/O 2014? Don’t disappear: you can still order a blank “Ultralight C” tag and do the same exercises.

In this first post we’ll start simple with a quick look at the content of the NFC tag. After that we’ll use the Android APIs for a simple page read and write (+1 our badge), simulating a counter on the tag. In later posts we’ll dive deeper into the details. The best tool to examine the content of an NFC tag is NFC TagInfo by NXP. If you use it to read the tag, it will show you a lot of details about the tag; for now we’ll just look at the tag as a plain block of memory. Here’s a part of the content of my badge:

# Memory content:
[00] * 04:71:B5 48 (UID0-UID2, BCC0)
[01] * 2A:0A:28:80 (UID3-UID6)
[02] * 88 48 FF FF (BCC1, INT, LOCK0-LOCK1)
[03] * E1:11:12:00 (OTP0-OTP3)
[04] * 01 03 A0 10 |....|
[05] * 44 03 5B 91 |D.[.|
[06] * 01 26 55 04 |.&U.|
[07] * 70 6C 75 73 |plus|
...
[25] . 00 00 00 00 |....|
[26] . 00 00 00 00 |....|
[27] . 00 00 00 00 |....|
[28] . 00 00 -- -- (LOCK2-LOCK3)
[29] . 00 00 -- -- (CNT0-CNT1, value: 0)
[2A] . 30 -- -- -- (AUTH0)
[2B] . 00 -- -- -- (AUTH1)
[2C] .- 42 52 45 41 |BREA|
[2D] .- 4B 4D 45 49 |KMEI|
[2E] .- 46 59 4F 55 |FYOU|
[2F] .- 43 41 4E 21 |CAN!|

The tag is split into 48 pages (0x00–0x2F) of 4 bytes each, 192 bytes in total. But don’t be fooled: you don’t have access to all of those pages for storing data. Only pages 0x04–0x27 (36 pages, 144 bytes) are available for writing arbitrary data. The other pages have special purposes that I’ll explain in a later article.

Now it’s time to start a bit of programming: the small tag +1 application. The sample is available on GitHub. Let’s use one of the pages to read a 4-byte integer value, add 1 to it and write it back. We use a page near the back of the tag, one that is still writable and doesn’t contain anything yet: page 0x26.

First you need the NFC permission in your manifest:

<uses-permission android:name="android.permission.NFC" />

and in onCreate you get hold of the NFC manager and its default adapter.

nfcManager = (NfcManager) getSystemService(NFC_SERVICE);
nfcAdapter = nfcManager.getDefaultAdapter();

We want to “auto +1” the badge as soon as you touch the tag with your device. For this we need to set up a PendingIntent and some intent filters. For simplicity we’re only interested in detecting any tag, so we only need to filter on NfcAdapter.ACTION_TAG_DISCOVERED. Do this once in onCreate.

private void setupNfcFilters() {
  mPendingIntent = PendingIntent.getActivity(this, 0, new Intent(this, getClass()).addFlags(Intent.FLAG_ACTIVITY_SINGLE_TOP), 0);

  IntentFilter tag = new IntentFilter(NfcAdapter.ACTION_TAG_DISCOVERED);
  mFilters = new IntentFilter[]{tag};
}

We only want to listen for tags while the activity is in the foreground. We do this by enabling the foreground dispatcher with the filters we created in onResume, and by disabling it again when the activity pauses:

public void onResume() {
  super.onResume();
  nfcAdapter.enableForegroundDispatch(this, mPendingIntent, mFilters, null);
}

public void onPause() {
  super.onPause();
  // Stop listening for tags while we're not in the foreground.
  nfcAdapter.disableForegroundDispatch(this);
}

Now comes the surprisingly simple part. Once a tag is detected by the device, it calls the onNewIntent method of your activity. The intent contains the Tag object in its bundle. The Tag object is the way you interact with the tag, although you need to create a technology-specific tag object from it. We know the Google I/O badge is a MIFARE Ultralight C tag, so we create the most compatible tech object Android has: MifareUltralight. Enough for what we need now.

public void onNewIntent(Intent intent) {
  Bundle bundle = intent.getExtras();
  final Tag tag = (Tag) bundle.get("android.nfc.extra.TAG");

  // IOException handling is omitted for brevity.
  MifareUltralight ultraC = MifareUltralight.get(tag);
  ultraC.connect();
  byte[] buffer = ultraC.readPages(0x26);  // reads 4 pages (16 bytes) starting at page 0x26
  ultraC.writePage(0x26, plusOne(buffer)); // writes the incremented 4-byte value back
  ultraC.close();
}

// A minimal plusOne: treat the first page (4 bytes) as a big-endian integer
// and increment it (uses java.nio.ByteBuffer).
private byte[] plusOne(byte[] buffer) {
  int value = ByteBuffer.wrap(buffer, 0, 4).getInt() + 1;
  return ByteBuffer.allocate(4).putInt(value).array();
}

Once you have your tech object, you need to connect to the tag. This creates a physical communication channel with the tag, and now it’s possible to start reading. We planned to read page 0x26 and then write the value +1 to the same page. Note that readPages reads 4 pages (16 bytes) in one go, as an optimisation, so for the +1 we just take the first 4 bytes, increment them, and write them back. The only thing left to do is close the tag.

That’s it for now. Next time we’ll learn a bit about the NDEF messages that are written on the badge. Happy NFC’ing and get the code at GitHub.

devops

Trouble with the clock

You know what they say: time is money. But for us engineers, time also needs to be correct. When it isn’t, you lose a lot of it, and thus money. Trouble with time can even pop up in unexpected places, like your lab.

The lab is a playground for software engineers, where they get things done. It’s viewed by corporate IT as a necessary evil and blocked from the rest of the world. IT will generally give little to no support for machines in the lab, so the engineers are left maintaining the machines themselves. And although they are creative, they are not the best operators.

The latest thing I came across in our lab was trouble with the clock. Over the last months I noticed that, during rapid build/test cycles, the latest build was not always picked up by the machines running the tests. In general it was OK, but when you needed it the most, it sometimes failed to pick up the most recent build.

We have a fairly complex setup with different servers and agents. Our Bamboo CI server pushes artifacts to the repository, and our build agents let Maven poll the repository for the latest artifacts. The Maven parent POMs on the build agents are slightly different from the POMs the developers have on their machines: the agent POMs allow fetching SNAPSHOT builds, so we can move our not-yet-released artifacts to the agents for further processing and testing.

Over the past months the problem got worse, so it was time to start investigating. The cause was quickly found: clock drift. Because our agents were in an IP range with no access to the internet, the default time-sync servers were not reachable. In about half a year we had built up a drift of about 30 minutes. So whenever a build/test cycle fell in that 30-minute slot, we were testing an older release. Let’s go over what happened:

Cycle 1:

  1. Bamboo Server (14:30) -> Repo (14:30)
  2. Agent (14:35 + 30m drift) asks for a new artifact: my last is from 9:12?
  3. Repo -> I have a newer file (14:30)
  4. Agent downloads it and saves it to disk (14:35 + 30m drift = 15:05)

Cycle 2:

  1. Bamboo Server (14:45) -> Repo (14:45)
  2. Agent (14:50 + 30m drift) asks for a new artifact: my last is from 15:05?
  3. Repo -> My file is older, it’s from 14:45
  4. Agent keeps the previous, older version

The biggest problem was that this went unnoticed for a long time, because it only occurred when a new cycle started on the same agent (we have multiple agents) within the slot created by the clock drift. The toy snippet below shows the comparison that goes wrong.
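A few lines of Python make the failure concrete (a toy model of the staleness check, not Maven’s actual logic; dates and times are illustrative):

from datetime import datetime, timedelta

DRIFT = timedelta(minutes=30)  # the drift our agents had built up

def agent_now(true_time):
    """What the agent's drifting clock reports."""
    return true_time + DRIFT

# Cycle 1: artifact published at 14:30, downloaded at 14:35 and stamped
# with the agent's local (drifted) time.
local_stamp = agent_now(datetime(2015, 2, 1, 14, 35))  # stored as 15:05

# Cycle 2: a genuinely newer artifact published at 14:45 now looks older
# than our local copy, so the agent keeps the stale build.
new_artifact = datetime(2015, 2, 1, 14, 45)
print(new_artifact > local_stamp)  # False -> no download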

The solution is simple though: use one of your servers that does have access to the standard time servers, and let it act as a delegate for the sync requests.

# if not yet installed
sudo yum -y install ntp
# sync the time from your internal NTP server
sudo /usr/sbin/ntpdate -v 192.168.42.42
# edit the ntp config and start/restart the NTP daemon
sudo vim /etc/ntp.conf # set server to 192.168.42.42
sudo /etc/init.d/ntpd start
sudo /sbin/chkconfig ntpd on

Conclusion

Making sure that the clocks of your machines are in sync is not only a matter for the production servers; it’s also important in your lab. It would be easier if IT didn’t disconnect the lab from the rest of the world, but that is just reality. We’ve now added a new item to our agent build checklist: make sure the machine is synced automatically with a time server it can access. It could be an interesting feature for our Atlassian Bamboo CI server though: send a notification to the admins when the agents have clock drift.
