Planet LILUG

April 10, 2014

Justin Dearing

The case for open sourcing the SQL Saturday Website

My name is Justin Dearing. I write software for a living. I also write software for free as a hobby and for personal development. When I’m not writing code, I speak at user groups, events and conferences about code and code related topics. One such event is SQL Saturday. I haven’t spoken in a while because I became a dad in June. However, my daughter is 9 months old now and the weather is warm. I feel comfortable attending a regional SQL Saturday or two.

So last night I submitted to SQL Saturday Philadelphia. The submission process (I mean the mechanical process of using the website to submit my abstract) was annoying, as usual. What really got me going though was when I realized two things:

  • My newlines were not being preserved, so the asterisks that were supposed to punctuate bullet points no longer fell at the beginnings of lines.
  • I could not edit my submission once submitted.

I like bullet points, a lot. However, I digress. In response to my anger, I complained on Twitter that the site should be open sourced, so that I, the end user, could create a better experience for myself and my fellow SQL Saturday speakers.

I got three retweets. At least I wasn’t completely alone in my sentiment. I complained again in the morning, started a conversation, and eventually Tim sent out this:

So the site was being rewritten, but it would not be open sourced.

Should I have been happy at that point, or at least patiently awaited the changes? One could presume that session editing and submission would be improved. At the very least, things would get progressively better as there were revisions to the code. If the federal government could pull off the ObamaCare site, with some hiccups, why can’t a group of DBAs launch a much smaller website, with much simpler requirements and lower load?

I’d be willing to bet they will. I’d be willing to bet that this site will suck a lot less than the old site, and that it will continue to progress. I’m sure smart people are working on it, and a passionate BoD is guiding the process. At the very least I’ll withhold judgement until the new site is live.

Despite my confidence in the skills of the unknown (to me) parties working on the site, there are only so many hours in the day and only so many things a team of finite size can do. However, a sizable minority of PASS’s membership are .NET developers. Many of them speak at SQL Saturdays. They have to submit to the site. Some of them will no doubt be annoyed by some aspect of the site. Some of them might fix that annoyance, or scratch their itch in OSS parlance, if the site were open source and there was a process to accept pull requests.

I’m not describing a hypothetical nirvana. I’ve seen the process I describe work, because I’ve submitted a lot of patches to a lot of OSS projects. I’ve submitted a patch to sp_blitz (not actually open source, as Brent will be the first to state) and Brent accepted it. I’ve contributed to NancyFX. I once contributed a small patch to PHP to make it consume WCF services better. I’ve contributed to several other OSS projects as well.

Perhaps you’re saying SQL Server is a Microsoft product, not some hippie Linux thing. Perhaps you share the same sentiment as Noel McKinney:

However, as I pointed out to Noel, the mothership’s (i.e., Microsoft’s; Editor’s Note: Noel has stated to me he meant Microsoft) beliefs are not anti-OSS. Microsoft has fully embraced Open Source. You can become an MVP purely for OSS work, without any speaking or forum contributions. One of the authors of NancyFX is an example of such a recipient. F#, ASP.NET and Entity Framework are all open source. Just this week Microsoft open sourced Roslyn. As a matter of fact, I’ve even submitted a patch to the NuGet Gallery website, which is operated by Microsoft and owned by the OuterCurve Foundation. The patch was accepted, and my code, along with the code of others, was pushed to nuget.org. So I’ve already submitted source code for a website owned and operated by an independent organization set up by Microsoft, they’ve already accepted it, and the world seems a slightly better place as a result.

So I ask the PASS BoD to consider releasing the SQL Saturday Website source code on github, and I ask the members of PASS to ask their BoD to release the source code as well.

by Justin at April 10, 2014 03:04 AM

April 07, 2014

Josef "Jeff" Sipek

Happy 50th, System/360

It’s been a while since I blahged about mainframes. Rest assured, I’m still a huge fan; I’m just too preoccupied with other things to continuously extoll their virtues.

The reason I’m writing today is because it is the 50th anniversary of the System/360 announcement. Aside from the “50 years already?” sentiment, I have a couple of images to share. (I found these several years ago on someone’s GeoCities site. It’s a good thing I made a mirror :) )

I also came across this video from 1964:

by JeffPC at April 07, 2014 03:44 PM

March 24, 2014

Josef "Jeff" Sipek

Netflix Chaos Monkey

Somehow, I managed to miss that about two years ago Netflix open sourced their chaos monkey.

Based on my quick look over the code, it appears to be written in Java. Meh. Regardless of the language, it’s great to see large companies open source their code.

by JeffPC at March 24, 2014 07:18 PM

March 02, 2014

Josef "Jeff" Sipek

Comment Spam Filtering Experiments

Just a heads up: I’m getting fed up with all the comment spam that ends up in the moderation queue, so I’m working on some code to reject comment spam before it gets there. As the title of this post implies, these are experiments; I’ll try my best not to reject any valid comments. I apologize if a valid comment does get rejected.

If you end up being a victim of my overzealous filters, please email me: jeffpc@josefsipek.net.

by JeffPC at March 02, 2014 03:33 PM

February 22, 2014

Josef "Jeff" Sipek

Greetings from Nexenta

In case you missed it, back in mid-2011 I discovered Illumos and OpenIndiana. At that point, I already missed hacking on the (Linux) kernel. Based on my blahg posts [1,2], it shouldn’t surprise you that it didn’t take long before I wanted to hack on the Illumos kernel…and so I did.

If you ever contributed to an open source project in your free time while employed full-time, you understand that there’s only so much time you can devote to the open source project and therefore there is only so much you can do.

A couple of months ago, I decided to explore the possibility of working full-time on Illumos. There are only a handful of companies that visibly participate in the Illumos ecosystem, but their use of Illumos is pretty varied (from public clouds to virtualized databases to SAN/NAS appliances). As of this past Tuesday (Monday was a holiday), I’m at Nexenta. At least for now, I’m working remotely (from Ann Arbor) with the fine folks in the Wikipedia article: Lowell office. It feels great to work on open source again.

by JeffPC at February 22, 2014 06:11 PM

February 16, 2014

Nate Berry

Arch Linux on bootable, persistent USB drive

I recently got a new laptop from work. It’s a refurbished Dell Latitude E6330 with an Intel Core i5 processor, a 13″ screen and a 120GB SSD drive that came with Windows 7 Pro. I haven’t used Windows regularly in quite some time (I’ve been using a WinXP VM on the rare occasion I need […]

by Nate at February 16, 2014 08:07 PM

Justin Dearing

Creating a minimally viable CentOS OpenLogic rapache instance

Recently I’ve been dealing with R and rapache at work. R is a language for statisticians. rapache is an Apache module for executing R scripts in Apache; it’s like mod_perl or mod_php for R. I’ve been writing simple RESTful scripts that return graphics and JSON, and calling them from static HTML pages. I’ve also been using my MSDN Azure subscription to engage in R self study at home. In the spirit of my last post, I’ve posted the setup notes here to get you started with a new Azure VM for running an rapache instance. Azure uses a special cloud-enabled version of CentOS 6.3 called OpenLogic. However, it seems to work similarly to the vanilla CentOS 6.4 instances I’ve used at work, so everything should apply there. If something doesn’t work, leave a comment.

  • First, CentOS is very conservative, but Fedora makes EPEL to give you a more modern set of RPMs.
    • rpm -Uvh http://epel.mirror.freedomvoice.com/6/i386/epel-release-6-8.noarch.rpm
  • Now let’s install the packages we need. The kernel will be updated, so we will need to reboot.
    • yum update -y
    • yum install -y vim-x11 vim-enhanced xauth R terminator xterm rxvt R httpd git httpd-devel gcc cairo cairo-devel libXt-devel
    • yum groupinstall -y fonts
    • ldconfig
    • shutdown -r now
  • Now, as a regular user, let’s compile rapache.
    • mkdir ~/src
    • cd ~/src
    • git clone https://github.com/jeffreyhorner/rapache.git
    • cd rapache
    • ./configure && make && sudo make install
  • Now let’s configure rapache. Create a file called /etc/httpd/conf.d/rapache.conf with the following:
# rapache configuration by Justin Dearing <zippy1981@gmail.com>
LoadModule R_module modules/mod_R.so
<Location /RApacheInfo>
 SetHandler r-info
</Location>
AddHandler r-script .R
RHandler sys.source
  • Now restart Apache and make sure it’s working:
    • service httpd restart
    • elinks http://localhost/RApacheInfo
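Once RApacheInfo renders, you can sanity-check the r-script handler itself with a trivial script. A minimal sketch (the path is hypothetical; any .R file under the document root will do):

## /var/www/html/test.R -- sourced by rapache on each request
setContentType("application/json")      # helper provided by rapache
cat('{"status": "rapache is alive"}')   # cat() output becomes the response body
DONE                                    # rapache status constant

Then elinks http://localhost/test.R should show the JSON.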

Azure doesn’t configure swap space by default. You’re going to absolutely need some swap space if you’re using an extra small instance. A good howto for that is here.
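If you want to let Azure manage the swap for you, the Azure Linux agent (waagent) can create it on the ephemeral resource disk. A sketch, assuming waagent is present on the image (the size is illustrative):

# /etc/waagent.conf -- relevant settings
ResourceDisk.Format=y
ResourceDisk.EnableSwap=y
ResourceDisk.SwapSizeMB=2048

# then restart the agent and verify:
service waagent restart
swapon -s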

by Justin at February 16, 2014 05:52 AM

February 15, 2014

Justin Lintz

Pager Huety

For a hack week project at Chartbeat, I hooked my Philips Hue light bulbs into PagerDuty so that whenever I get paged my lights start flashing. Read about the hack over on the PagerDuty blog.

by justin at February 15, 2014 06:52 PM

January 07, 2014

Josef "Jeff" Sipek

Google Traffic

Ever wonder how Google gets its traffic information?


Apparently, there are two sources. The first is the Department of Transportation. The second consists of Android users.

You can always check Google Location History to see what sort of data Google has. (Of course, they may always have more than they show.) Seeing the data can be a bit unnerving. Since I’m not really into giving Google more data than they already have to begin with, and I see no reason for Google to know exactly where I spend my time, I decided to turn this feature off.

Turning it off

You can find the setting by running the “Google Settings” app. That’s right, not “Settings”. Once there, select “Location”.


As you can see, I want to treat Google apps like any other vendor’s apps. As an added bonus, it looks like my GPS is on way less often.

by JeffPC at January 07, 2014 06:22 PM

January 05, 2014

Josef "Jeff" Sipek

x2APIC, IOMMU, Illumos

About a week ago, I hinted at a boot hang I was debugging. I’ve made some progress with it, and along the way I found some interesting things about which I’ll blog over the next few days. Today, I’m going to talk about the Wikipedia article: APIC, xAPIC, and Wikipedia article: x2APIC and how they’re handled in Illumos.

APIC, xAPIC, x2APIC

I strongly suggest you become at least a little familiar with APIC architecture before reading on. The Wikipedia articles above are a good start.

First things first, we need some definitions. APIC can refer to either the architecture or to the very old (pre-Pentium 4) implementation. Since I’m working with a Sandy Bridge, I’m going to use APIC to refer to the architecture and completely ignore that these chips existed. Everything they do is a subset of xAPIC. xAPIC is an extension to APIC. xAPIC chips started showing up in NetBurst architecture Intel CPUs (i.e., Pentium 4). xAPIC included some goodies such as upping the limit on the number of CPUs to 256 (from 16). x2APIC is an extension to xAPIC. x2APIC chips started appearing around the same time Sandy Bridge systems started showing up. It is a major update to how interrupts are handled, but as with many things in the PC industry, the x2APIC is fully backwards compatible with xAPICs. x2APIC includes some goodies such as upping the limit on the number of CPUs to 2^32.

Regardless of which exact flavor you happen to use, you will find two components: the local APIC and the I/O APIC. Each processor gets its own local APIC, and I/O buses get I/O APICs. An I/O APIC can service more than one device, and in fact many systems have only one I/O APIC.

The xAPIC uses Wikipedia article: MMIO to program the local and I/O APICs.

x2APIC has two modes of operation. First, there is the xAPIC compatibility mode, which makes the x2APIC behave just like an xAPIC. This mode doesn’t give you all the new bells and whistles. Second, there is the new x2APIC mode. In this mode, the APIC is programmed using Wikipedia article: MSRs.

One interesting fact about x2APIC is that it requires an Wikipedia article: iommu. My Sandy Bridge laptop has an Intel iommu as part of the VT-d feature.

Illumos /etc/mach

For the x2APIC, Illumos has two APIC drivers. First, there is pcplusmp, which knows how to handle APIC and xAPIC. Second, there is apix, which targets x2APIC but knows how to operate it in both modes. On boot, the kernel consults /etc/mach to get a list of machine specific modules to try to load. Currently, the default contents (trimmed for display here) are:

#
# CAUTION!  The order of modules specified here is very important. If the
# order is not correct it can result in unexpected system behavior. The
# loading of modules is in the reverse order specified here (i.e. the last
# entry is loaded first and the first entry loaded last).
#
pcplusmp
apix
xpv_psm

Since I’m not running Xen, xpv_psm will fail to load, and apix gets its chance to load.

pcplusmp + apix Code Sharing

The code in these two modules can be summarized with one word: mess. Just following what happens where is enough of an adventure. The code for the two modules lives in four directories: usr/src/uts/i86pc/io, usr/src/uts/i86pc/io/psm, usr/src/uts/i86pc/io/pcplusmp, and usr/src/uts/i86pc/io/apix. But the sharing isn’t as straightforward as one would hope.

Directory          | pcplusmp                                   | apix
-------------------+--------------------------------------------+--------------------------------------------
i86pc/io           | mp_platform_common.c, mp_platform_misc.c,  | mp_platform_common.c, hpet_acpi.c
                   | hpet_acpi.c                                |
i86pc/io/psm       | psm_common.c                               | psm_common.c
i86pc/io/pcplusmp  | *                                          | apic_regops.c, apic_common.c, apic_timer.c
i86pc/io/apix      |                                            | *

This is of course not clear at all when you look at the code. (Reality is a bit messier because of the i86xpv platform which uses some of the i86pc source.)

apix_probe

When the apix module gets loaded, its probe function (apix_probe) is called. This is the place where the module decides if the hardware is worthy. Specifically, if it finds that the CPU reports x2APIC support via Wikipedia article: cpuid, it goes on to call the common APIC probe code (apic_probe_common). Unless that fails, the system will use the apix module — even if there is no iommu and therefore the x2APIC needs to operate in xAPIC mode.

What mode are you using? Easy, just check the apic_mode global in the kernel:

# echo apic_mode::whatis | mdb -k
fffffffffbd0ee4c is apic_mode, in apix's data segment
# echo apic_mode::print | mdb -k
0x2

2 (LOCAL_APIC) indicates xAPIC mode, while 3 (LOCAL_X2APIC) indicates x2APIC mode.

Because this part is as clear as mud, I made a table that tells you what module and mode to expect given your hardware, what CPUID says, and the presence and state of the iommu.

APIC hw | CPUID | IOMMU   | IOMMU state | Module   | apic_mode
--------+-------+---------+-------------+----------+--------------
xAPIC   | off   |         |             | pcplusmp | LOCAL_APIC
x2APIC  | off   |         |             | pcplusmp | LOCAL_APIC
x2APIC  | on    | absent  |             | apix     | LOCAL_APIC
x2APIC  | on    | present | off         | apix     | LOCAL_APIC
x2APIC  | on    | present | on          | apix     | LOCAL_X2APIC

Defaults

I’ve never seen apic_mode equal to LOCAL_X2APIC in the wild. This was very puzzling. Yesterday, I discovered why. As I mentioned earlier, in order for the x2APIC to operate in x2APIC mode an iommu is required. Long story short, the default config that Illumos ships disables iommus on boot. Specifically:

$ cat /platform/i86pc/kernel/drv/rootnex.conf | grep -v '^\(#.*\|\)$'
immu-enable="false";

In order to get LOCAL_X2APIC mode, you need to set:

immu-enable="true";
immu-intrmap-enable="true";

Once you put those into the config file, update your boot archive and reboot. You should be set… except the iommu support in Illumos is… shall we say… poor.
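For reference, rebuilding the boot archive on Illumos is typically just:

# bootadm update-archive
# reboot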

(I should point out that it is possible for the BIOS to enable x2APIC mode before handing control off to the OS. This is pretty rare unless you have a really big x86 system.)

1394

It would seem that the hci1394 driver doesn’t quite know how to deal with an iommu “messing” with its I/Os, and its interrupt service routine shuts down the driver. (On a debug build, it throws in an ASSERT(0) for good measure.) I just disabled 1394 in the BIOS since I don’t have any FireWire devices handy and therefore no use for the port at the moment.

immu-enable Details

In case you want to know how iommu initialization affects the apix initialization…

During boot, immu_init gets called to initialize iommus. If the config option (immu-enable) is not true, the function just returns instead of calling immu_subsystems_setup, which calls immu_intrmap_setup, which sets psm_vt_ops to a non-NULL value.

Later on, when apix is loaded and is initializing itself in apix_picinit, it calls apic_intrmap_init. This function does nothing if psm_vt_ops are NULL.

The Hang

I might as well tell you a bit about my progress on tracking down the hang. It happens only if I’m using the apix module and I allow deep C states in the idle thread (technically, it could also be an mwait related issue since I cannot disable just mwait without disabling deep C states). It does not matter if the apic_mode is LOCAL_APIC or LOCAL_X2APIC.

Assorted Documentation

  1. Intel 64 Architecture x2APIC Specification
  2. Intel MP Spec 1.4

by JeffPC at January 05, 2014 07:38 PM

January 04, 2014

Josef "Jeff" Sipek

Post Preview

One of the blogs I’ve been reading for a few months now just had a post about partial vs. full entries on blog front pages. Since I have some opinions on the subject, I decided to comment. My response turned into something sufficiently content-full that I decided that my blahg would be a better place for it. Sorry, Chris :P

First of all, my blog doesn’t support partial post display because… technical reasons. (The sinking feeling of discovering a design mistake in your code really resonated with me about this exact thing.) With that said, I don’t think that partial display is necessarily bad. I feel like any reasonable (this is of course subjective) blogging software should follow these rules:

  1. if we’re displaying an atom/rss feed, display full post
  2. if we’re displaying a single post, display full post
  3. if the post contains magical marker that denotes where to stop the preview, display everything above the marker
  4. display full post
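In code, those rules collapse to something tiny. A sketch with hypothetical types and names:

#include <stddef.h>
#include <string.h>

enum view { VIEW_FEED, VIEW_SINGLE, VIEW_INDEX };

struct post {
	const char *body;
	const char *marker;	/* points at the "fold" within body, or NULL */
};

/* How many bytes of the post body to display. */
static size_t preview_len(const struct post *p, enum view v)
{
	if (v == VIEW_FEED || v == VIEW_SINGLE)
		return strlen(p->body);		/* rules 1 & 2 */
	if (p->marker)
		return p->marker - p->body;	/* rule 3 */
	return strlen(p->body);			/* rule 4 */
}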

I really dislike when feeds give me the first sentence and I have to click a link to read more. At the very least, it is inconvenient, and in extreme cases it feels outright insulting.

I think the post-by-post-basis Chris suggests is the way to go, but in the absence of a user-defined division point I would display the whole thing.

Do I write many posts where I wish I could use this magical marker? No. If that were the case, I’d make supporting this a higher priority. However, there have been a handful of times where I believe that the rest of the post is uninteresting to…well…just about everyone and it is really long. So long, that you might get bored trying to scroll past it. (If you are reading my blahg, I don’t want you to be bored because you had to scroll for too long to skip over an entry — you are my guest, and I am here to entertain you.) This is the time I believe displaying a partial post is good.

I’m hoping that eventually I’ll wrestle with my blogging software sufficiently to eliminate the technical reasons preventing me from introducing and processing this special marker. Not that you’ll really notice anything different. :)

by JeffPC at January 04, 2014 03:10 PM

January 02, 2014

Josef "Jeff" Sipek

Designated Initializers

Designated initializers are a neat feature in C99 that I’ve used for about 6 years. I can’t fathom why anyone would not use them if C99 is available. (Of course if you have to support pre-C99 compilers, you’re very sad.) In case you’ve never seen them, consider this example that’s perfectly valid C99:

int abc[7] = {
	[1] = 0xabc,
	[2] = 0x12345678,
	[3] = 0x12345678,
	[4] = 0x12345678,
	[5] = 0xdef,
};

As you may have guessed, indices 1–5 will have the specified value. Indices 0 and 6 will be zero. Cool, eh?
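The same feature works for struct members; any field you don’t name is zeroed, just like the unnamed array elements:

struct point {
	int x;
	int y;
	int z;
};

struct point p = {
	.x = 1,
	.y = 2,
	/* .z is implicitly zero */
};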

GCC Extensions

Today I learned about a neat GNU extension in GCC to designated initializers. Consider this code snippet:

int abc[7] = {
	[1] = 0xabc,
	[2 ... 5] = 0x12345678,
	[5] = 0xdef,
};

Mind blowing, isn’t it?

Beware, however… GCC’s -std=c99 will not error out if you use ranges! You need to throw in -pedantic to get a warning.

$ gcc -c -Wall -std=c99 test.c
$ gcc -c -Wall -pedantic -std=c99 test.c
test.c:2:5: warning: ISO C forbids specifying range of elements to initialize [-pedantic]

by JeffPC at January 02, 2014 02:46 PM

December 30, 2013

Josef "Jeff" Sipek

Rebooter

I briefly mentioned that I was debugging a boot hang. Since the hang does not happen every time I try to boot, it may take a couple of reboots to get the kernel to hang. Doing this manually is tedious; thankfully, it can be scripted. Therefore, I made a simple script and an SMF manifest that runs the script at the end of boot. If the system boots fine, my script reboots it. If the system hangs mid-boot, well, my script never executes, leaving the system in a hung state. Then, I can break into the kernel debugger (mdb) and investigate.

I’m sharing the two here mostly for my benefit… in case one day in the future I decide that I need my system automatically rebooted over and over again.

The script is pretty simple. Hopefully, 60 seconds is long enough to log in and disable the service if necessary. (In reality, I set up a separate boot environment that’s the default choice in Grub. I can just select my normal boot environment and get back to a non-timebomb system.)

#!/bin/sh

sleep 60

reboot -p

The tricky part is of course in the manifest. Not because it is hard, but because XML is … verbose.

<?xml version="1.0"?>
<!DOCTYPE service_bundle SYSTEM "/usr/share/lib/xml/dtd/service_bundle.dtd.1">
<service_bundle type='manifest' name='rebooter'>
	<service name='site/rebooter' type='service' version='1'>
		<dependency name='booted'
		    grouping='require_all'
		    restart_on='none'
		    type='service'>
			<service_fmri
			    value='svc:/milestone/multi-user-server:default'/>
		</dependency>

		<property_group name="startd" type="framework">
			<propval name="duration" type="astring" value="child"/>
			<propval name="ignore_error" type="astring"
				value="core,signal"/>
		</property_group>

		<instance name='system' enabled='true'>
			<exec_method
				type='method'
				name='start'
				exec='/home/jeffpc/illumos/rebooter/script.sh'
				timeout_seconds='0' />

			<exec_method
				type='method'
				name='stop'
				exec=':true'
				timeout_seconds='0' />
		</instance>

		<stability value='Unstable' />
	</service>
</service_bundle>
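To install it, assuming the manifest is saved as rebooter.xml, importing it is enough (the instance is marked enabled, so it’ll be live):

# svccfg import rebooter.xml
# svcs -l site/rebooter:system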

That’s all, carry on what you were doing. :)

by JeffPC at December 30, 2013 08:44 PM

December 29, 2013

Josef "Jeff" Sipek

CPU Pause Threads

Recently, I ended up debugging a boot hang. (I’m still working on it, so I don’t have a resolution to it yet.) The hang seems to occur during the mp startup. That is, when the boot CPU tries to online all the other CPUs on the system. As a result, I spent a fair amount of time reading the code and poking around with mdb. Given the effort I put in, I decided to document my understanding of how CPUs get brought online during boot in Illumos. In this post, I’ll talk about the CPU pause threads.

Each CPU has a special thread — the pause thread. It is a very high priority thread that’s supposed to preempt everything on the CPU. If all CPUs are executing this high-priority thread, then we know for a fact that nothing can possibly be dereferencing the CPU structures’ (cpu_t) pointers. Why is this useful? Here’s a comment from right above cpu_pause — the function pause threads execute:

/*
 * This routine is called to place the CPUs in a safe place so that
 * one of them can be taken off line or placed on line.  What we are
 * trying to do here is prevent a thread from traversing the list
 * of active CPUs while we are changing it or from getting placed on
 * the run queue of a CPU that has just gone off line.  We do this by
 * creating a thread with the highest possible prio for each CPU and
 * having it call this routine.  The advantage of this method is that
 * we can eliminate all checks for CPU_ACTIVE in the disp routines.
 * This makes disp faster at the expense of making p_online() slower
 * which is a good trade off.
 */

The pause thread is pointed to by the CPU structure’s cpu_pause_thread member. A new CPU does not have a pause thread until after it has been added to the list of existing CPUs. (cpu_pause_alloc does the actual allocation.)

CPU pausing is pretty strange. First of all, let’s call the CPU requesting other CPUs to pause the controlling CPU, and all online CPUs that will pause the pausing CPUs. (The controlling CPU does not pause itself.) Second, there are two global structures: (1) a global array called safe_list, which contains an 8-bit integer for each possible CPU, where each element holds a value ranging from 0 to 4 (PAUSE_*) denoting the state of that CPU’s pause thread, and (2) cpu_pause_info, which contains some additional goodies used for synchronization.

Pausing

To pause CPUs, the controlling CPU calls pause_cpus (which uses cpu_pause_start), where it iterates over all the pausing CPUs setting their safe_list entries to PAUSE_IDLE and queueing up (using setbackdq) their pause threads.

Now, just because the pause threads got queued doesn’t mean that they’ll get to execute immediately. That is why the controlling CPU then waits for each of the pause threads to up a semaphore in the cpu_pause_info structure. Once all the pause threads have upped the semaphore, the controlling CPU sets the cp_go flag to let the pause threads know that it’s time for them to go to sleep. Then the controlling CPU waits for each pause thread to signal (via the safe_list) that they have disabled just about all interrupts and that they are spinning (mach_cpu_pause). At this point, pause_cpus knows that all online CPUs are in a safe place.

Starting

Starting the CPUs back up is pretty easy. The controlling CPU just needs to set all the CPUs’ safe_list entries to PAUSE_IDLE. That will cause the pausing CPUs to break out of their spin-loop. Once out of the spin loop, interrupts are re-enabled and CPU control is relinquished (via swtch). The controlling CPU does some cleanup of its own, but that’s all there is to it.
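To make the handshake concrete, here is a toy userspace model of the protocol. This is emphatically not the Illumos code, just a sketch using pthreads and C11 atomics with invented names:

#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

#define NCPUS		4
#define PAUSE_IDLE	0	/* released (or not yet spinning) */
#define PAUSE_WAIT	1	/* spinning in a "safe" place */

static _Atomic int safe_list[NCPUS];
static atomic_int cp_count;	/* models the cpu_pause_info semaphore */
static atomic_int cp_go;	/* models the cp_go flag */

static void *pause_thread(void *arg)
{
	int id = (int)(uintptr_t)arg;

	cp_count++;			/* check in ("up the semaphore") */
	while (!cp_go)
		;			/* wait for the controlling CPU */
	safe_list[id] = PAUSE_WAIT;	/* interrupts would be disabled here */
	while (safe_list[id] == PAUSE_WAIT)
		;			/* spin until released */
	return NULL;
}

int main(void)
{
	pthread_t tid[NCPUS];
	int i;

	for (i = 0; i < NCPUS; i++)	/* "queue up" the pause threads */
		pthread_create(&tid[i], NULL, pause_thread,
		    (void *)(uintptr_t)i);
	while (cp_count != NCPUS)
		;			/* wait for everyone to check in */
	cp_go = 1;			/* let them enter their spin loops */
	for (i = 0; i < NCPUS; i++)
		while (safe_list[i] != PAUSE_WAIT)
			;		/* all "CPUs" are now paused */
	printf("all paused; releasing\n");
	for (i = 0; i < NCPUS; i++)	/* starting is just flipping the state */
		safe_list[i] = PAUSE_IDLE;
	for (i = 0; i < NCPUS; i++)
		pthread_join(tid[i], NULL);
	return 0;
}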

Synchronization

Why not use a mutex or semaphore for everything? The problem lies in the fact that we are in a really fragile state. We don’t want to lose the CPU because we blocked on a semaphore. That’s why this code uses custom synchronization primitives.

by JeffPC at December 29, 2013 06:45 PM

December 14, 2013

Josef "Jeff" Sipek

iSCSI boot - Success

In my previous post, I documented some steps necessary to get OpenIndiana to boot from iSCSI.

I finally managed to get it to work cleanly. So, here are the remaining details necessary to boot your OI box from iSCSI.

Installation

First, boot from one of the OI installation media. I used a USB flash drive. Then, before starting the installer, drop into a shell and connect to the target.

# iscsiadm add discovery-address 172.16.0.1
# iscsiadm modify discovery -t enable

At this point, you should have all the LUs accessible:

# format
Searching for disks...done


AVAILABLE DISK SELECTIONS:
       0. c5t600144F000000000000052A4B4CE0002d0 <SUN-COMSTAR-1.0 cyl 13052 alt 2 hd 255 sec 63>
          /scsi_vhci/disk@g600144f000000000000052a4b4ce0002
Specify disk (enter its number): 

Exit the shell and start the installer.

Now, the tricky part… When you get to the network configuration page, you must select the “None” option. Selecting “Automatically” will cause nwam to try to start on boot, and it’ll step on the already configured network interface. That’s it. Finish the installation normally. Once you’re ready to reboot, either configure your network card or use iPXE as I’ve shared before.

e1000g

For the curious, here’s what the iSCSI booted (from the e1000g NIC) system looks like:

# svcs network/physical
STATE          STIME    FMRI
disabled       17:13:10 svc:/network/physical:nwam
online         17:13:15 svc:/network/physical:default
# dladm show-link
LINK        CLASS     MTU    STATE    BRIDGE     OVER
e1000g0     phys      1500   up       --         --
# ipadm show-addr
ADDROBJ           TYPE     STATE        ADDR
e1000g0/?         static   ok           172.16.0.179/24
lo0/v4            static   ok           127.0.0.1/8
lo0/v6            static   ok           ::1/128

nge

Does switching back to the on-board nge NICs work now? No. We still get a lovely panic:

WARNING: Cannot plumb network device 19

panic[cpu0]/thread=fffffffffbc2f400: vfs_mountroot: cannot mount root

Warning - stack not written to the dump buffer
fffffffffbc71ae0 genunix:vfs_mountroot+75 ()
fffffffffbc71b10 genunix:main+136 ()
fffffffffbc71b20 unix:_locore_start+90 ()

by JeffPC at December 14, 2013 06:55 PM

December 08, 2013

Josef "Jeff" Sipek

iSCSI boot

I decided a couple of days ago to try to see if OpenIndiana would still fail to boot from iSCSI like it did about two years ago. This post exists to remind me later what I did. If you find it helpful, great.

First, I got to set up the target. There is a bunch of documentation on how to use COMSTAR to export an LU, so I won’t explain. I made a 100 GB LU.

I dug up an older system to act as my test box and disconnected its SATA disk. Booting from the OI USB image was uneventful. Before starting the installer, I dropped into a shell and connected to the target (using iscsiadm). Then I installed OI onto the LU. Finally, I dropped back into the shell to modify Grub’s menu.lst to use the serial port for the Grub menu as well as to make the kernel direct its console output there.

Since the two on-board NICs can’t boot off iSCSI, I ended up using iPXE. First, I made a script file:

#!ipxe

dhcp
sanboot iscsi:172.16.0.1:::0:iqn.2010-08.org.illumos:02:oi-test

Then it was time to grab the source and build it. I did run into a simple problem in a test file, so I patched it trivially.

$ git clone git://git.ipxe.org/ipxe.git
$ cd ipxe
$ cat /tmp/ipxe.patch
diff --git a/src/tests/vsprintf_test.c b/src/tests/vsprintf_test.c
index 11512ec..2231574 100644
--- a/src/tests/vsprintf_test.c
+++ b/src/tests/vsprintf_test.c
@@ -66,7 +66,7 @@ static void vsprintf_test_exec ( void ) {
 	/* Basic format specifiers */
 	snprintf_ok ( 16, "%", "%%" );
 	snprintf_ok ( 16, "ABC", "%c%c%c", 'A', 'B', 'C' );
-	snprintf_ok ( 16, "abc", "%lc%lc%lc", L'a', L'b', L'c' );
+	//snprintf_ok ( 16, "abc", "%lc%lc%lc", L'a', L'b', L'c' );
 	snprintf_ok ( 16, "Hello world", "%s %s", "Hello", "world" );
 	snprintf_ok ( 16, "Goodbye world", "%ls %s", L"Goodbye", "world" );
 	snprintf_ok ( 16, "0x1234abcd", "%p", ( ( void * ) 0x1234abcd ) );
$ patch -p1 < /tmp/ipxe.patch
$ make bin/ipxe.usb EMBED=/tmp/ipxe.script
$ sudo dd if=bin/ipxe.usb of=/dev/rdsk/c8t0d0p0 bs=1M

Now, I had a USB flash drive with iPXE that’d get a DHCP lease and then proceed to boot from my iSCSI target.

Did the system boot? Partially. iPXE did everything right — DHCP, storing the iSCSI information in the Wikipedia article: iBFT, reading from the LU, and handing control over to Grub. Grub did the right thing too. Sadly, once within the kernel, things didn’t quite work out the way they should.

iBFT

Was the iBFT getting parsed properly? After reading the code for a while and using mdb to examine the state, I found a convenient tunable (read: global int that can be set using the debugger) that will cause the iSCSI boot parameters to be dumped to the console. It is called iscsi_print_bootprop. Setting it to non-zero will produce nice output:

Welcome to kmdb
kmdb: unable to determine terminal type: assuming `vt100'
Loaded modules: [ unix krtld genunix ]
[0]> iscsi_print_bootprop/W 1
iscsi_print_bootprop:           0               =       0x1
[0]> :c
OpenIndiana Build oi_151a7 64-bit (illumos 13815:61cf2631639d)
SunOS Release 5.11 - Copyright 1983-2010 Oracle and/or its affiliates.
All rights reserved. Use is subject to license terms.
Initiator Name : iqn.2010-04.org.ipxe:00020003-0004-0005-0006-000700080009
Local IP addr  : 172.16.0.179
Local gateway  : 172.16.0.1
Local DHCP     : 0.0.0.0
Local MAC      : 00:02:b3:a8:66:0c
Target Name    : iqn.2010-08.org.illumos:02:oi-test
Target IP      : 172.16.0.1
Target Port    : 3260
Boot LUN       : 0000-0000-0000-0000

nge vs. e1000g

So, the iBFT was getting parsed properly. The only “error” message indicating that something was wrong was the “Cannot plumb network device 19”. Searching the code reveals that this is in the rootconf function. After more tracing, it became apparent that the kernel was trying to set up the NIC but was failing to find a device with the MAC address the iBFT indicated. (19 is ENODEV.)

At this point, it dawned on me that the on-board NICs are mere nge devices. I popped in a PCI-X e1000g, moved the cable over, and rebooted. Things got a lot farther!

unable to connect

Currently, I’m looking at this output.

NOTICE: Configuring iSCSI boot session...
NOTICE: iscsi connection(5) unable to connect to target iqn.2010-08.org.illumos:02:oi-test
Loading smf(5) service descriptions: 171/171
Hostname: oi-test
Configuring devices.
Loading smf(5) service descriptions: 6/6
NOTICE: iscsi connection(12) unable to connect to target iqn.2010-08.org.illumos:02:oi-test

The odd thing is, while these messages appear, SMF is busy loading manifests, and tracing the iSCSI traffic to the target shows that the kernel is doing a bunch of reads and writes. I suspect that all the successful I/O was done over one connection and then something happens and we lose the link. This is where I am now.

by JeffPC at December 08, 2013 04:48 PM

December 01, 2013

Josef "Jeff" Sipek

Meili upgrades

A couple of months ago, I decided to update my almost two and a half year old laptop. Twice.

First, I got more RAM. This upped it to 12 GB. While still on the low side for a box which actually gets to see some heavy usage (compiling illumos takes a couple of hours and generates a couple of GB of binaries), it was better than the 4 GB I used for way too long.

Second, I decided to bite the bullet and replaced the 320 GB disk with a 256 GB SSD (Samsung 840 Pro). Sadly, in the process I had the pleasure of reinstalling the system — both Windows 7 and OpenIndiana. Overall, the installation was uneventful as my Windows partition has no user data and my OI storage is split into two pools (one for system and one for my data).

The nice thing about reinstalling OI was getting back to a stock OI setup. A while ago, I managed to play with software packaging a bit too much, and before I knew it I was using a customized fork of OI that I had no intention of maintaining. Of course, I didn’t realize this until it was too late to roll back. Oops. (Specifically, I had a custom pkg build which was incompatible with all versions of OI ever released.)

One of the painful things about my messed-up OI install was that I was running a debug build of illumos. This made some things pretty slow. One such thing was boot. The ZFS related pieces alone took about a minute to complete. The whole boot procedure took about 2.5 minutes. Currently, with a non-debug build and an SSD, my laptop goes from Grub prompt to gdm login in about 40 seconds. I realize that this is an apples to oranges comparison.

I knew SSDs were supposed to be blazing fast, but I resisted getting one for the longest time mostly due to reliability concerns. What changed my mind? I got to use a couple of SSDs in my workstation at work. I saw the performance and I figured that ZFS would take care of alerting me of any corruption. Since most of my work is version controlled, chances are that I wouldn’t lose anything. Lastly, SSDs got a fair amount of improvements over the past few years.

by JeffPC at December 01, 2013 01:30 AM

November 29, 2013

Josef "Jeff" Sipek

Biometrics

Last week I got to spend a bit of time in NYC with obiwan. He’s never been in New York, so he did the tourist thing. I got to tag along on Friday. We went to the Statue of Liberty, Ellis Island, and a pizza place.

You may have noticed that this post is titled “Biometrics,” so what’s NYC got to do with biometrics? Pretty simple. In order to get into the Statue of Liberty, you have to first surrender your bags to a locker and then you have to go through a metal detector. (This is the second time you go through a metal detector — the first is in Battery Park before you get on the boat to Liberty Island.) Once on Liberty Island, you go into a tent before the entrance where you get to leave your bags and $2. Among the maybe 500–600 lockers, there are two or three touch screen interfaces. You use these to rent a locker. After selecting the language you wish to communicate in and feeding in the money, a strobe light goes off blinding you — this is to indicate where you are supposed to place your finger to have your finger print scanned. Your desire to rent a locker aside, you want to put your finger on the scanner to make the strobe go away. Anyway, once the system is happy it pops a random (unused) locker open and tells you to use it.

What could possibly go wrong.

After visiting the statue, we got back to the tent to liberate the bags. At the same touch screen interface, we entered the locker number and, when prompted, scanned the correct finger. The fingerprint was not recognized. After repeating the process about a dozen times, it was time to talk to the people running the place about the malfunction. The person asked for the locker number, went to the same interface we had used, used what looked like a Wikipedia article: one-wire key fob near the top of the device to get an admin interface, and then unlocked the locker. That’s it. No verification that we actually owned the contents of the locker.

I suppose this is no different from a (physical) key operated locker for which you lost the key. The person in charge of renting the lockers has no way to verify your claim to the contents of the locker. Physical keys, however, are extremely durable compared to the rather finicky fingerprint scanners that won’t recognize you if you look at them the wrong way (or if your fingers are oily or dirty in a different way than they expect). My guess is that the reason the park service went with a fingerprint based solution instead of a more traditional physical key based solution is simple: people can’t lose locker keys if you don’t use them. Now, are cheap fingerprint readers accurate enough to not malfunction like this often? Are the people supervising the locker system generally this apathetic about opening a locker without any questions? I do not know, but my observations so far are not very positive.

I suspect more expensive fingerprint readers will perform better. It just doesn’t make sense for something as cheap as a locker to use the more expensive readers.

by JeffPC at November 29, 2013 11:33 PM

November 26, 2013

Nate Berry

Increase disk size of Ubuntu guest running in VMware

A while ago I created a virtual machine (VM) under VMware 5.1 with Ubuntu as the guest OS. I wasn’t giving the task my full attention and I made a couple choices without thinking when setting this one up. The problem I ended up with is that I only allocated about 10GB to the VM […]

by Nate at November 26, 2013 03:25 AM

November 04, 2013

Eitan Adler

Two Factor Authentication for SSH (with Google Authenticator)

Two factor authentication is a method of ensuring that a user has a physical device in addition to their password when logging in to some service. This works by using a time (or counter) based code which is generated by the device and checked by the host machine. Google provides a service which allows one to use their phone as the physical device using a simple app.

This service can be easily configured and greatly increases the security of your host.

Installing Dependencies

  1. There is only one: the Google-Authenticator software itself:
    # pkg install pam_google_authenticator
  2. On older FreeBSD installations you may use:
    # pkg_add -r pam_google_authenticator
    On Debian derived systems use:
    # apt-get install libpam-google-authenticator

User configuration

Each user must run "google-authenticator" once prior to being able to log in with ssh. This will be followed by a series of yes/no prompts which are fairly self-explanatory. Note that the alternative to time-based codes is counter-based codes. It is easy to lose track of which number you are at, so most people prefer time-based.
  1. $ google-authenticator
    Do you want authentication tokens to be time-based (y/n)
    ...
    Make sure to save the URL or secret key generated here as it will be required later.

Host Configuration

To enable the use of Authenticator, the host must be set up to use PAM, which must be configured to prompt for Authenticator.
  1. Edit the file /etc/pam.d/sshd and add the following in the "auth" section prior to pam_unix:
    auth requisite pam_google_authenticator.so
  2. Edit /etc/ssh/sshd_config and uncomment
    ChallengeResponseAuthentication yes

Reload ssh config

  1. Finally, the ssh server needs to reload its configuration:
    # service sshd reload

Configure the device

  1. Follow the instructions provided by Google to install the authentication app and setup the phone.

That is it. Try logging into your machine from a remote machine now.

Thanks bcallah for proof-reading this post.

by Eitan Adler (noreply@blogger.com) at November 04, 2013 12:56 AM

October 21, 2013

Josef "Jeff" Sipek

Private Pilot, Honeymooning, etc.

Early September was a pretty busy time for me. First, I got my private pilot certificate. Then, three days later, Holly and I got married. We used this as an excuse to take four weeks off and have a nice long honeymoon in Europe (mostly in Prague).

Our flight to Prague (LKPR) had a layover at KJFK. While waiting at the gate at KDTW, I decided to talk to the pilots. They said I should stop by and say hi after we land at JFK. So I did. Holly tagged along.

A little jealous about the left seat

I am impressed with the types of displays they use. Even with direct sunlight you can easily read them.

After about a week in Prague, we rented a plane (a 1982 Cessna 172P) with an instructor and flew around Czech Republic looking at the castles.

OK-TUR

I did all the flying, but I let the instructor do all the radio work, and since he was way more familiar with the area he ended up acting sort of like a tour guide. Holly sat behind me and had a blast with the cameras. The flight took us over Wikipedia article: Bezděz, Wikipedia article: Ještěd, Wikipedia article: Bohemian Paradise, and Wikipedia article: Jičín where we stopped for tea. Then we took off again, and headed south over Wikipedia article: Konopiště, Wikipedia article: Karlštejn, and Wikipedia article: Křivoklát. Overall, I logged 3.1 hours in European airspace.

by JeffPC at October 21, 2013 03:27 PM

October 05, 2013

dotCOMmie

Debian GNU / Linux on Samsung ATIV Book 9 Plus

Samsung just recently released a new piece of kit, the ATIV Book 9 Plus. It's their top-of-the-line Ultrabook. Being in the market for a new laptop, when I heard the specs, I was hooked. Sure, it doesn't have the best CPU available in a laptop or even an amazing amount of RAM; in that regard it's kind of run of the mill. But that was enough for me. The really amazing thing is the screen, with 3200x1800 resolution at 275 DPI. If you were to get a standalone monitor with a similar resolution, you'd be forking over anywhere from 50-200% of the value of the ATIV Book 9 Plus. Anyway, this is not a marketing pitch. As a GNU / Linux user, buying bleeding edge hardware can be a bit intimidating. The problem is that it's not clear whether the hardware will work without too much fuss. I couldn't find any reports of folks running GNU / Linux on it, but decided to order one anyway.

My distro of choice is Debian GNU / Linux. So when the machine arrived, the first thing I did was try Debian Live. It did take some tinkering in the BIOS (press F2 on boot to enter config) to get it to boot, mostly because the BIOS UI is horrendous. In the end, disabling secure boot was pretty much all it took. Out of the box, most things worked, the exceptions being Wi-Fi and brightness control. At this point I was more or less convinced that getting GNU / Linux running on it would not be too hard.

I proceeded to install Debian from the stable net-boot CD, at first with UEFI enabled but secure boot disabled. The installation went over fine, but when it came time to boot the machine, it would simply not work; it looked like the boot loader wasn't starting properly. I didn't care too much about UEFI, so I disabled it completely and re-installed Debian. This time things worked and Debian Stable booted up. I tweaked /etc/apt/sources.list, switching from Stable to Testing, rebooted the machine, and noticed that on boot the screen went black. It was rather obvious that the problem was with KMS. Likely the root of the problem was the new kernel (linux-image-3.10-3-amd64) which got pulled in during the upgrade to testing. The short term workaround is simple: disable KMS (add nomodeset to the kernel boot line in grub).
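To make that workaround stick across reboots (assuming the stock Debian GRUB 2 setup), the change looks something like this:

# /etc/default/grub -- add nomodeset to the default kernel command line
GRUB_CMDLINE_LINUX_DEFAULT="quiet nomodeset"

# then regenerate grub.cfg:
update-grub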

So now I had a booting base system, but there was still the problem of Wi-Fi and KMS. I installed the latest firmware-iwlwifi, which had the required firmware for the Intel Corporation Wireless 7260. However, Wi-Fi still did not work; fortunately, I came across this post on the Arch Linux wiki which states that the Wi-Fi card is only supported in Linux kernel >=3.11.

After an hour or so of tinkering with kernel configs, I got the latest kernel (3.11.3) to boot with working KMS and Wi-Fi. Long story short, until Debian moves to a kernel >3.11 you'll need to compile your own or install my custom compiled package. With the latest kernel, pretty much everything on this machine works, including the things that are often tricky: suspend, backlight control, touchscreen, and obviously Wi-Fi. The only remaining things to figure out are the volume and keyboard backlight control keys. But for now I'm making do with a software sound mixer. And the keyboard backlight can be adjusted with (values: 0-4):

echo "4" > /sys/class/leds/samsung\:\:kbd_backlight/brightness

So if you are looking to get a Samsung ATIV Book 9 Plus and wondering if it'll play nice with GNU / Linux, the answer is yes.

by dotCOMmie at October 05, 2013 08:11 PM

August 28, 2013

Josef "Jeff" Sipek

Optimizing for Failure

For the past two years, I’ve been working at Barracuda Networks on a key-value storage system called Moebius. As with any other software project, the development was more focused on stability and basic functionality at first. Lately, however, we managed to get some spare cycles to consider tackling some of the big features we’ve been wishing for, as well as revisiting some of the initial decisions. This includes error handling — specifically, how and what sorts of hardware failures should be handled. During this brainstorming, I made an interesting (in my opinion) observation regarding optimizing systems.

If you take any computer architecture or organization course, you will hear about Wikipedia article: Amdahl’s law. Even if you never took an architecture course or just never heard of Amdahl, eventually you came to the realization that one should optimize for the common case. (Technically, Amdahl’s law is about parallel speedup but the idea of an upper bound on performance improvement applies here as well.) A couple of years ago, when I used to spend more time around architecture people, a day wouldn’t go by when I didn’t hear them focus on making the common case fast, and the uncommon case correct — as well as always guaranteeing forward progress.

My realization is that straightforward optimization for the common case is not sufficient. I’m not claiming that my realization is novel in any way. Simply that it surprised me more than it should.

Suppose you are writing a storage system. The common case (all hardware and software operate correctly) has been optimized, and the whole storage system is performing great. Now, suppose that a hardware failure (or even a bug in other software!) occurs. Since this is a rare occurrence, you did not optimize for it. The system is still operating, but you want to take some corrective action. Sadly, the failure has caused the system to no longer operate under the common case. So, you have a degraded system whose performance is hindering your corrective action! Ouch!

The answer is to optimize not just for the common case, but for some uncommon cases. Which uncommon cases? Well, the most common ones. :) The problem in the above scenario could have been (hopefully) avoided by not just optimizing for the common case, but also optimizing for the common failure! This is the weird bit… optimize for failures because you will see them.

In the case of a storage system, some failures to consider include:

  • one or more disks failing
  • random bit flips on one or more disks
  • one or more disks responding slowly
  • one or more disks temporarily disappearing and shortly after reappearing
  • low memory conditions

This list is far from exhaustive. You may even decide that some of these failures are outside the scope of your storage system’s reliability guarantees. But no matter what you decide, you need to keep in mind that your system will see failures and it must still behave well enough to not be a hindrance.

None of what I have written here is ground breaking. I just found it sufficiently different from what one normally hears that I thought I would write it up. Sorry architecture friends, the uncommon case needs to be fast too :)

by JeffPC at August 28, 2013 04:06 PM

August 03, 2013

John Lutz

a theorical p2p dynamic messaging and/or voting system which is open sourced for the people.

  The standard model of internet activity is client/server: one server to each client. Another paradigm, which is much less often used, is Peer to Peer (p2p).

  Peer to Peer allows each client on the internet to serve as well as receive as a client. This allows for a self-administering and self-correcting design. But what is most useful and trusted is that control is decentralized. This is good for many reasons: uptime improves, and the potential for abuse of power is *greatly* diminished.

  Some common examples of p2p in action are Bitcoin, Tor and Bittorrent. With the exception of Tor, all of these p2p systems are Open Source. Open Source provides the user with the optional ability to compile the software themselves, and is critical in extending control to every user involved. It also allows those black boxes called 'Apps', 'Programs', 'Systems' or 'Applications' to be peer reviewed, so that nothing suspicious happens without you knowing (for example, Bluetooth and web cams automatically set to record and run as the default behaviour). [I have band aids covering all my personal laptop cameras.]

  There are many services in the systems provided by Apple and, most notably, Windows that prevent us from knowing what traffic gets transmitted from our personal technological devices. Open Source allows us not only to conserve our personal information but also to extend and share what we've done with those we see fit. Open Source is equated here with Power to The People. And services such as support, administration or development can also use the monetary model. It all depends on each individual situation. The possibilities of Open Source are indeed amazing.

  There are many forms of Open Source software, but none so widely known worldwide as an operating system called Linux. There are even many forms of individually and group modified Linux. I have, with mixed success, used and administrated Debian, Ubuntu, Red Hat, CentOS and SuSE. Depending on which of these (or the many, many other publicly downloadable variations) you choose, it can take as little as 25 minutes or as long as 3 days to successfully install and configure these softwares and typical internet apps. You can literally change the source code (if you had a little swagger) to make them change their default behaviour.

  A p2p system in which a bulletin board system (a la a variation of the standard non-p2p models phpBB or vBulletin) delivered messages with time stamps and backups to other nodes could theoretically be created along the lines of how famous p2p systems like Bitcoin operate. Except in the case of this theoretical framework, instead of virtual currency it would be messages, votes, blogs: any kind of data. This framework would be a nonstop behemoth using potentially hundreds of thousands or even millions of clients, which in themselves also act as servers. Being free from a centralized control system, in which the system changes according to the whims of the few, is vital to producing, like all good policies, a check and balance system free from tyranny.

 I would suggest that if ISPs started to ban protocol ports (any port from 1 to 65535), a programmer could creatively reprogram this new theoretical p2p messaging system to alternate between different ports dynamically. That way each very powerful ISP could not ban the people's p2p messaging system as they have done, in my case, with bittorrent.

 I hope you have fully understood what I have written here. If you have any more questions or would like to go more in depth on what I've presented here, please let me know at @john_t_lutz on twitter, or here in this blog. Thank you.

Worldy Yours,
John Lutz





by JohnnyL (noreply@blogger.com) at August 03, 2013 07:22 PM

July 16, 2013

Josef "Jeff" Sipek

nftw(3)

I just found out about nftw — a libc function to walk a file tree. I did not realize that libc had such a high-level function. I bet it’ll end up saving me time (and code) at some point in the future.

int nftw(const char *path, int (*fn) (const char *, const struct stat *,
				      int, struct FTW *), int depth,
	 int flags);

Given a path, it executes the callback for each file and directory in that tree. Very cool.

Ok, as I write this post, I am told that nftw is a great way to write dangerous code. Aside from easily writing dangerous things equivalent to rm -rf, I could see not specifying FTW_PHYS being dangerous, as symlinks will get followed without any notification.
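Here is a minimal sketch of cautious usage, passing FTW_PHYS so symlinks are reported rather than followed (the callback name is mine):

#define _XOPEN_SOURCE 500	/* for nftw(3) */
#include <ftw.h>
#include <stdio.h>

/* Called once for every object under the starting path. */
static int print_entry(const char *path, const struct stat *sb,
		       int type, struct FTW *ftw)
{
	printf("%s%s\n", path, (type == FTW_D) ? "/" : "");
	return 0;	/* returning non-zero terminates the walk */
}

int main(int argc, char **argv)
{
	return nftw(argc > 1 ? argv[1] : ".", print_entry, 20, FTW_PHYS);
}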

I guess I’ll play around with it a little to see what the right way (if any?) to use it is.

by JeffPC at July 16, 2013 04:14 PM

June 29, 2013

Josef "Jeff" Sipek

Isis

After several years of having a desktop at home that’s been unplugged and unused, I decided that it was time to make a home server to do some of my development on and just to keep files stored safely and redundantly. This was in August 2011. A lot has happened since then. First of all, I rebuilt the OpenIndiana (an Illumos-based distribution) setup with SmartOS (another Illumos-based distribution). Since I wrote most of this a long time ago, some of the information below is obsolete. I am sharing it anyway since others may find it useful. Toward the end of the post, I’ll go over the SmartOS rebuild. As you may have guessed, the hostname for this box ended up being Wikipedia article: Isis.

First of all, I should list my goals.

storage box
The obvious mix of digital photos, source code repositories, assorted documents, and email backups is easy enough to store. It however becomes a nightmare if you need to keep track of where they are (i.e., which of the two external disks, public server (Odin), laptop drives, or desktop drives they are on). Since none of them are explicitly public, it makes sense to keep them near home instead of on my public server that’s in a data-center with a fairly slow uplink (1 Mbit/s burstable to 10 Mbit/s, billed at 95th percentile).
dev box
I have a fast enough laptop (Thinkpad T520), but a beefier system that I can let compile large amounts of code is always nice. It will also let me run several virtual machines and zones comfortably — for development, system administration experiments, and other fun stuff.
router
I have an old Linksys WRT54G (rev. 3) that has served me well over the years. Sadly, it is getting a bit in my way — IPv6 tunneling over IPv4 is difficult, the 100 Mbit/s switch makes it harder to transfer files between computers, etc. If I am making a server that will be always on, it should effortlessly handle NAT’ing my Comcast internet connection. Having a full-fledged server doing the routing will also let me do better traffic shaping & filtering to make the connection feel better.

Now that you know what sort of goals I have, let’s take a closer look at the requirements for the hardware.

  1. reliable
  2. friendly to OpenIndiana and ZFS
  3. low-power
  4. fast
  5. virtualization assists (to run virtual machines at reasonable speed)
  6. cheap
  7. quiet
  8. spacious (storage-wise)

While each one of them is pretty easy to accomplish on its own, their combination is much harder to achieve. Also note that the list is ordered from most to least important. As you will see, reliability dictated many of my choices.

The Shopping List

CPU
Intel Xeon E3-1230 Sandy Bridge 3.2GHz LGA 1155 80W Quad-Core Server Processor BX80623E31230
RAM
Kingston ValueRAM 4GB 240-Pin DDR3 SDRAM DDR3 1333 ECC Unbuffered Server Memory Model KVR1333D3E9S/4G
Motherboard
SUPERMICRO MBD-X9SCL-O LGA 1155 Intel C202 Micro ATX Intel Xeon E3 Server Motherboard
Case
SUPERMICRO CSE-743T-500B Black Pedestal Server Case
Data Drives (3)
Seagate Barracuda Green ST2000DL003 2TB 5900 RPM SATA 6.0Gb/s 3.5"
System Drives (2)
Western Digital WD1600BEVT 160 GB 5400RPM SATA 8 MB 2.5-Inch Notebook Hard Drive
Additional NIC
Intel EXPI9301CT 10/100/1000Mbps PCI-Express Desktop Adapter Gigabit CT

To measure the power utilization, I got a P3 International P4400 Kill A Watt Electricity Usage Monitor. All my power usage numbers are based on watching the digital display.

Intel vs. AMD

I’ve read Constantin’s OpenSolaris ZFS Home Server Reference Design and I couldn’t help but agree that ECC should be a standard feature on all processors. Constantin pointed out that many more AMD processors support ECC and that as long as you got a motherboard that supported it as well you are set. I started looking around at AMD processors but my search was derailed by Joyent’s announcement that they ported KVM to Illumos — the core of OpenIndiana including the kernel. Unfortunately for AMD, this port supports only Intel CPUs. I switched gears and started looking at Intel CPUs.

In a way I wish I had a better reason for choosing Intel over AMD but that’s the truth. I didn’t want to wait for AMD’s processors to be supported by the KVM port.

So, why did I get a 3.2GHz Xeon (E3-1230)? I actually started by looking for motherboards. At first, I looked at desktop (read: cheap) motherboards. Sadly, none of the Intel-based boards I’ve seen supported ECC memory. Looking at server-class boards made the search for ECC support trivial. I was surprised to find a Supermicro motherboard (MBD-X9SCL-O) for $160. It supports up to 32 GB of ECC RAM (4x 8 GB DIMMs). Rather cheap, ECC memory, dual gigabit LAN (even though one of the LAN ports uses the Intel 82579 which was unsupported by OpenIndiana at the time), 6 SATA II ports — a nice board by any standard. This motherboard uses the LGA 1155 socket. That more or less means that I was “stuck” with getting a Sandy Bridge processor. :-D The E3-1230 is one of the slower E3 series processors, but it is still very fast compared to most of the other processors in the same price range. Additionally, it’s “only” an 80 Watt chip compared to many 95 or even 130 Watt chips from the previous series.

There you have it. The processor was more or less determined by the motherboard choice. Well, that’s being rather unfair. It just ended up being a good combination of processor and motherboard — a cheap server board and near-bottom-of-the-line processor that happens to be really sweet.

Now that I had a processor and a motherboard picked out, it was time to get RAM. In the past, I’ve had good luck with Kingston, and since Kingston’s happened to be the cheapest ECC 4 GB DIMMs on NewEgg, I got four — for a grand total of 16 GB.

Case

I will let you know a secret. I love hotswap drive bays. They just make your life easier — from being able to lift a case up high to put it on a shelf without having to lift all those heavy drives at the same time, to quickly replacing a dead drive without taking the whole system down.

I like my public server’s case (Supermicro CSE-743T-645B) but the 645 Watt power supply is really overkill for my needs. The four 5000 RPM fans on the midplane are pretty loud when they go full speed. I looked around, and I found a 500 Watt (80%+ efficiency) variant of the case (CSE-743T-500B). Still a beefy power supply but closer to what one sees in high-end desktops. With this case, I get eight 3.5" hot-swap bays, and three 5.25" external (non-hotswap) bays. This case shouldn’t be a limiting factor in any way.

I intended to move my DVD+RW drive from my desktop but that didn’t work out as well as I hoped.

Storage

At the time I was constructing Isis, I was experimenting with Wikipedia article: ZFS on OpenIndiana. I was more than impressed, and I wanted it to manage the storage on my home server. ZFS is more than just a filesystem; it is also a volume manager. In other words, you can give it multiple disks and tell it to put your data on them in several different ways that closely resemble RAID levels. It can stripe, mirror, or calculate one to three parities. Wikipedia has a nice article outlining ZFS’s features. Anyway, I strongly support ZFS’s attitude toward losing data — do everything to prevent it in the first place.

Hard drives are very interesting devices. Their reliability varies with so many variables (e.g., manufacturing defects, firmware bugs). In general, manufacturers give you fairly meaningless-looking, yet impressive-sounding numbers about their drives’ reliability. Richard Elling made a great blog post where he analyzed ZFS RAID space versus Mean-Time-To-Data-Loss, or MTTDL for short. (Later, he analyzed a different MTTDL model.)

The short version of the story is nicely summed up by this graph (taken from Richard’s blog):

While this scatter plot is for a specific model of a high-end server, it applies to storage in general. I like how the various types of redundancy clump up.

Anyway, how much do I care about my files? Most of my code lives in distributed version control systems, so losing one machine wouldn’t be a problem for those. The other files would be a bigger problem. While it wouldn’t be a complete end of the world if I lost all my photos, I’d rather not lose them. This goes back to the requirements list — I prefer reliable over spacious. That’s why I went with a 3-way mirror of 2 TB Seagate Barracuda Green drives. It gets me only 2 TB of usable space, but at the same time I should be able to keep my files forever. These are the data drives. I also got two 2.5" 160 GB Western Digital laptop drives to hold the system files — mirrored of course.

Around the same time I was discovering that the only sane way to keep your files was mirroring, I stumbled across Constantin’s RAID Greed post. He basically says the same thing — use 3-way mirror and your files will be happy.

Now, you might be asking… 2 TB, that’s not a lot of space. What if you outgrow it? My answer is simple: ZFS handles that for me. I can easily buy three more drives, plug them in, add them as a second 3-way mirror, and ZFS will happily stripe across the two mirrors. I considered buying 6 disks right away, but realized that it’ll probably be at least 6-9 months before I’ll have more than 2 TB of data. So, if I postpone the purchase of the 3 additional drives, I can save money. It turns out that a year and a half later, I’m still below 70% of the 2 TB.
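
As a sketch, growing the pool later would be a one-liner (the new device names here are made up):

# zpool add storage mirror c2t6d0 c2t7d0 c2t8d0

From then on, new writes stripe across both mirrors automatically.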

Miscellaneous

I knew that one of the on-board LAN ports was not yet supported by Illumos, and so I threw a PCI-e Gigabit ethernet card into the shopping cart. I went with an Intel gigabit card. Illumos has since gained support for 82579-based NICs, but I’m lazy and so I’m still using the PCI-e NIC.

Base System

As the ordered components started showing up, I started assembling them. Thankfully, the CPU, RAM, motherboard, and case showed up at the same time preventing me from going crazy. The CPU came with a stock Intel heatsink.

The system started up fine. I went into the BIOS and did the usual new-system tweaking — make sure SATA ports are in AHCI mode, stagger the disk spinup to prevent unnecessary load peaks at boot, change the boot order to skip PXE, etc. While roaming around the menu options, I discovered that the motherboard can boot from iSCSI. Pretty neat, but useless for me on this system.

The BIOS has a menu screen that displays the fan speeds and the system and processor temperatures. With the fan on the heatsink and only one midplane fan connected the system ran at about 1°C higher than room temperature and the CPU was about 7°C higher than room temperature.

OS Installation

Anyway, it was time to install OpenIndiana. I put my desktop’s DVD+RW in the case and then realized that the motherboard doesn’t have any IDE ports! Oh well, time to use a USB flash drive instead. At this point, I had only the 2 system drives. I connected one to the first SATA port, put a 151 development snapshot (text installer) on my only USB flash drive. The installer booted just fine. Installation was uneventful. The one potentially out of the ordinary thing I did was to not configure any networking. Instead, I set it up manually after the first boot, but more about that later.

With OI installed on one disk, it was time to set up the rpool mirror. I used Constantin’s Mirroring Your ZFS Root Pool as the general guide even though it is pretty straightforward — duplicate the partition (and slice) scheme on the second disk, add the new slice to the root pool, and then install grub on it. Everything worked out nicely.
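
For reference, the three steps boil down to something like this (a sketch; the disk names match the zpool status below):

# prtvtoc /dev/rdsk/c2t0d0s2 | fmthard -s - /dev/rdsk/c2t1d0s2
# zpool attach rpool c2t0d0s0 c2t1d0s0
# installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c2t1d0s0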

# zpool status rpool
  pool: rpool
 state: ONLINE
  scan: scrub repaired 0 in 0h5m with 0 errors on Sun Sep 18 14:15:24 2011
config:

        NAME          STATE     READ WRITE CKSUM
        rpool         ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            c2t0d0s0  ONLINE       0     0     0
            c2t1d0s0  ONLINE       0     0     0

errors: No known data errors

Networking

Since I wanted this box to act as a router, the network setup was a bit more…complicated (and quite possibly over-engineered). This is why I elected to do all the network setup by hand later, rather than having to “fix” whatever damage the installer did. :)

I powered it off, put in the extra ethernet card I got, and powered it back on. To my surprise, the new device didn’t show up in dladm. I remembered that I should trigger the device reconfiguration. A short touch /reconfigure && reboot later, dladm listed two physical NICs.

network diagram

As you can see, I decided that the routing should be done in a zone. This way, all the routing settings are nicely contained in a single place that does nothing else.

Setting up the virtual interfaces was pretty easy thanks to dladm. Setting the static IP on the global zone was equally trivial.

# dladm create-vlan -l e1000g0 -v 11 vlan11
# dladm create-vnic -l e1000g0 vlan0
# dladm create-vnic -l e1000g0 internal0
# dladm create-vnic -l e1000g1 isp0
# dladm create-etherstub zoneswitch0
# dladm create-vnic -l zoneswitch0 zone_router0

# ipadm create-if internal0
# ipadm create-addr -T static -a local=10.0.0.2/24 internal/v4

You might be wondering about the vlan11 interface that’s on a separate Wikipedia article: VLAN. The idea was to have my WRT54G continue serving as a wifi access point, but have all the traffic end up on VLAN #11. The router zone would then get to decide whether the user is worthy of LAN or Internet access. I never finished poking around the WRT54G to figure out how to have it dump everything on a VLAN #11 instead of the default #0.

Router zone

OpenSolaris (and therefore all Illumos derivatives) has a wonderful feature called Wikipedia article: zones. It is essentially a super-lightweight virtualization mechanism. While talking to a couple of people on IRC, I decided that I, like them, would use a dedicated zone as a router.

Just before I set up the router zone, the storage disks arrived. The router zone ended up being stored on this array. See the storage section below for details about this storage pool.

After installing the zone via zonecfg and zoneadm, it was time to set up the routing and firewalling. First, install the ipfilter package (pkg install pkg:/network/ipfilter). Now, it is time to configure the NAT and filter rules.

NAT is easy to set up. Just plop a couple of lines into /etc/ipf/ipnat.conf:

map isp0 10.0.0.0/24 -> 0/32 proxy port ftp ftp/tcp
map isp0 10.0.0.0/24 -> 0/32 portmap tcp/udp auto
map isp0 10.0.0.0/24 -> 0/32

map isp0 10.11.0.0/24 -> 0/32 proxy port ftp ftp/tcp
map isp0 10.11.0.0/24 -> 0/32 portmap tcp/udp auto
map isp0 10.11.0.0/24 -> 0/32

map isp0 10.1.0.0/24 -> 0/32 proxy port ftp ftp/tcp
map isp0 10.1.0.0/24 -> 0/32 portmap tcp/udp auto
map isp0 10.1.0.0/24 -> 0/32

IPFilter is a bit trickier to set up. The rules need to handle more cases. In general, I tried to be a bit paranoid about the rules. For example, I drop all traffic for IP addresses that don’t belong on that interface (I should never see 10.0.0.0/24 addresses on my ISP interface). The only snag was in the defaults for the ipfilter Wikipedia article: SMF service. By default, it expects you to put your rules into SMF properties. I wanted to use the more old-school approach of using a config file. Thankfully, I quickly found a blog post which helped me with it.
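
To give a flavor of the anti-spoofing idea mentioned above, a few illustrative lines for /etc/ipf/ipf.conf might look like this (a sketch, not my actual rule set):

# drop private addresses arriving on the ISP side
block in quick on isp0 from 10.0.0.0/8 to any
block in quick on isp0 from 192.168.0.0/16 to any
# let the LAN out, statefully
pass out quick on isp0 proto tcp/udp from any to any keep state
pass in quick on internal0 from 10.0.0.0/24 to any keep state
block in all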

Storage, part 2

As the list of components implies, I wanted to make two arrays. I already mentioned the rpool mirror. Once the three 2 TB disks arrived, I hooked them up and created a 3-way mirror (zpool create storage mirror c2t3d0 c2t4d0 c2t5d0).

# zpool status storage
  pool: storage
 state: ONLINE
  scan: scrub repaired 0 in 0h0m with 0 errors on Sun Sep 18 14:10:22 2011
config:

        NAME        STATE     READ WRITE CKSUM
        storage     ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c2t3d0  ONLINE       0     0     0
            c2t4d0  ONLINE       0     0     0
            c2t5d0  ONLINE       0     0     0

errors: No known data errors

Deduplication & Compression

I suspected that there would be enough files that would be stored several times — system binaries for zones, clones of source trees, etc. ZFS has built-in online Wikipedia article: deduplication. This stores each unique block only once. It’s easy enough to turn on: zfs set dedup=on storage.

Additionally, ZFS has transparent data (and metadata) compression featuring Wikipedia article: LZJB and gzip algorithms.

I enabled dedup and kept compression off. Dedup did take care of the duplicate binaries between all the zones. It even took care of duplicates in my photo stash. (At some point, I managed to end up with several diverged copies of my photo stash. One of the first things I did with Isis was to dump all of them in the same place and start sorting them. Adobe Lightroom helped here quite a bit.)

After a while, I came to the realization that for most workloads I run, dedup was wasteful and I would be better off disabling dedup and enabling light compression (i.e., LZJB).
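
The switch itself is just two properties (a sketch; note that both settings only affect newly written blocks, so existing data keeps its on-disk form until it is rewritten):

# zfs set dedup=off storage
# zfs set compression=lzjb storage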

$HOME

The installer puts the non-privileged user’s home directory onto the root pool. I did not want to keep it there since I now had the storage pool. After a bit of thought, I decided to zfs create storage/home and then transfer over the current home directory. I could have used cp(1) or rsync(1), but I thought it would be more fun (and a learning experience) to use zfs send and zfs recv. It went something like this:

# zfs snapshot rpool/export/home/jeffpc@snap
# zfs send rpool/export/home/jeffpc@snap | zfs recv storage/home/jeffpc

In theory, any modifications to my home directory after the snapshot got lost, but since I was just ssh’d in, there wasn’t much that changed. (I am ok with losing the last update to .bash_history this one time.) The last thing that needed changing was /etc/auto_home — which tells the automounter where my $HOME really is. This is the resulting file after the change (without the copyright comment):

jeffpc	localhost:/storage/home/&
+auto_home

For good measure, I rebooted to make sure things would come up properly — they did.

Since the server is not intended just for me, I created the other user account with a home directory in storage/home/holly.

Zones

I intend to use zones extensively. To keep their files out of the way, I decided on storage/zones/$ZONE_NAME. I’ll talk more about the zones I set up later in the Zones section.

CIFS

Local storage is great, but there is only so much you can do with it. Sooner or later, you will want to access it from a different computer. There are many different ways to “export” your data, but as one might expect, they all have their benefits and drawbacks. ZFS makes it really easy to export data via NFS and CIFS. After a lot of thought, I decided that CIFS would work a bit better. The major benefit of CIFS over NFS is that it Just Works™ on all the major operating systems. That’s not to say that NFS does not work, but rather that it needs a bit more…convincing at times. This is especially true on Windows.

I followed the documentation for enabling CIFS on Solaris 11. Yes, I know, OpenIndiana isn’t Solaris 11, but this aspect was the same. This ended with me enabling sharing of several datasets like this:

# zfs set sharesmb=name=photos storage/photos

ACLs

The home directory shares are all done. The photos share, however, needs a bit more work. Specifically, it should be fully accessible to the users that are supposed to have access (i.e., jeffpc & holly). The easiest way I can find is to use ZFS ACLs.

First, I set the aclmode to passthrough (zfs set aclmode=passthrough storage). This will prevent a chmod(1) on a file or directory from blowing away all the ACEs (Access Control Entries). Then, on the share directory, I added two ACL entries that allow everything.

# /usr/bin/ls -dV /share/photos
drwxr-xr-x   2 jeffpc   root           4 Sep 23 09:12 /share/photos
                 owner@:rwxp--aARWcCos:-------:allow
                 group@:r-x---a-R-c--s:-------:allow
              everyone@:r-x---a-R-c--s:-------:allow
# /usr/bin/chmod A+user:jeffpc:rwxpdDaARWcCos:fd:allow /share/photos
# /usr/bin/chmod A+user:holly:rwxpdDaARWcCos:fd:allow /share/photos
# /usr/bin/chmod A2- /share/photos # get rid of user
# /usr/bin/chmod A2- /share/photos # get rid of group
# /usr/bin/chmod A2- /share/photos # get rid of everyone
# /usr/bin/ls -dV /share/photos
drwx------+  2 jeffpc   root           4 Sep 23 09:12 /share/photos
            user:jeffpc:rwxpdDaARWcCos:fd-----:allow
             user:holly:rwxpdDaARWcCos:fd-----:allow

The first two chmod commands prepend two ACEs. The next three remove ACE number 2 (the third entry). Since the directory started off with three ACEs (representing the standard Unix permissions), the second set of chmods removes those, leaving only the two user ACEs behind.

Clients

That was easy! In case you are wondering, the Solaris/Illumos CIFS service does not allow guest access. You must login to use any of the shares.
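
For example, mapping the photos share from a Windows client is the usual one-liner (the drive letter is arbitrary, and you’ll be prompted for the password):

C:\> net use P: \\isis\photos /user:jeffpc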

Anyway, here’s the end result:

Pretty neat, eh?

Zones

Aside from the router zone, there were a number of other zones. Most of them were for Illumos and OpenIndiana development.

I don’t remember much of the details since this predates the SmartOS conversion.

Power

When I first measured the system, it was drawing about 40-45 Watts while idle. Now, I have Isis along with the WRT54G and a gigabit switch on a UPS that tells me that I’m using about 60 Watts when idle. The draw can spike up quite a bit if I put load on the 4 Xeon cores and give the disks something to do. (After all, it is an 80 Watt CPU!) While this is by no means super low-power, it is low enough, and at the same time I have the capability to actually get work done instead of waiting for hours for something to compile.

SmartOS

As I already mentioned, I ended up rebuilding the system with SmartOS. SmartOS is not a general purpose distro. Rather, it strives to be a hypervisor with utilities that make guest management trivial. Guests can either be zones, or KVM-powered virtual machines. Here are the major changes from the OpenIndiana setup.

Storage — pools

SmartOS is one of those distros you do not install. It always boots live: from the network, a USB stick, or a CD. As a result, you do not need a system drive. This immediately obsoleted the two laptop drives. Conveniently, around the same time, Holly’s laptop suffered a disk failure, so Isis got to donate one of the unused 2.5" system disks.

SmartOS calls its data pool “zones”, which took a little bit of getting used to. There’s a way to import other pools, but I wanted to keep the settings as vanilla as possible.

At some point, I threw in an Intel 160 GB SSD to use for L2ARC and Wikipedia article: ZIL.

Here’s what the pool looks like:

# zpool status
  pool: zones
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
        still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not
        support the features. See zpool-features(5) for details.
  scan: scrub repaired 0 in 2h59m with 0 errors on Sun Jan 13 08:37:37 2013
config:

        NAME        STATE     READ WRITE CKSUM
        zones       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c1t5d0  ONLINE       0     0     0
            c1t4d0  ONLINE       0     0     0
            c1t3d0  ONLINE       0     0     0
        logs
          c1t1d0s0  ONLINE       0     0     0
        cache
          c1t1d0s1  ONLINE       0     0     0

errors: No known data errors

In case you are wondering about the feature-related status message, I created the zones pool way back when Illumos (and therefore SmartOS) had only two ZFS features. Since then, Illumos added one and Joyent added another to SmartOS.

# zpool get all zones | /usr/xpg4/bin/grep -E '(PROP|feature)'
NAME   PROPERTY                   VALUE                      SOURCE
zones  feature@async_destroy      enabled                    local
zones  feature@empty_bpobj        active                     local
zones  feature@lz4_compress       disabled                   local
zones  feature@filesystem_limits  disabled                   local

I haven’t experimented with either of the disabled features enough to enable them on a production system I rely on so much.

Storage — deduplication & compression

The rebuild gave me a chance to start with a clean slate. Specifically, it gave me a chance to get rid of the dedup table. (The dedup table, DDT, is built as writes happen to the filesystem with dedup enabled.) Data deduplication relies on some form of data structure (the most trivial one is a hash table) that maps the hash of the data to the data. In ZFS, the DDT maps the Wikipedia article: SHA-256 of the block to the block address.

The reason I stopped using dedup on my systems was pretty straightforward (and not specific to ZFS). Every entry in the DDT has an overhead. So, ideally, every entry in the DDT is referenced at least twice. If a block is referenced only once, then one would be better off without the block taking up an entry in the DDT. Additionally, every time a reference is taken or released, the DDT needs to be updated. This causes very nasty random I/O under which spinning disks want to weep. It turns out that a “normal” user will have mostly unique data, rendering deduplication impractical.

That’s why I stopped using dedup. Instead, I became convinced that most of the time light compression is the way to go. Lightly compressing the data will result in I/O bandwidth savings as well as capacity savings with little overhead given today’s processor speeds versus I/O latencies. Since I haven’t had time to experiment with the recently integrated LZ4, I still use LZJB.

by JeffPC at June 29, 2013 03:22 PM

June 22, 2013

Josef "Jeff" Sipek

First Solo Cross-Country

A week ago (June 15), I went on my first solo cross country flight. The plan was to fly KARB → KMBS → KAMN → KARB. In case you don’t happen to have the Detroit sectional chart in front of you, this might help you visualize the scope of the flight.

leg distance time
KARB → KMBS 79 nm 47 min
KMBS → KAMN 29 nm 20 min
KAMN → KARB 79 nm 46 min
Total 187 nm 113 min

Here’s the ground track (as recorded by the G1000) along with red dots for each of my checkpoints and a pink line connecting them. (Sadly, there’s no convenient zoom level that covers the entire track without excessive waste.)

ground track

As you can see, I didn’t quite overfly all the checkpoints. In my defense, the forecast winds were about 40 degrees off from reality during the first half of the flight. :)

Let’s examine each leg separately.

KARB → KMBS

ground track

My checkpoint by I-69 (southwest of Flint) was supposed to be the intersection of I-69 and the Pontiac VORTAC (PSI) radial 311. However, when I called up the FSS briefer, I found out that it was out of service. Thankfully, Salem VORTAC (SVM) is very close, so I just used its radial 339 instead. Next time I’m using a VOR for any part of my planning, I’m going to check for any NOTAMs before I make it part of my plan — redoing portions of the plan is tedious and not fun.

On the way to Saginaw, I was planning to go at 3500. (Yes, I know, it is a westerly direction and the rule (FAR 91.159) says even thousand + 500, but the clouds were not high enough to fly at 4500 and the rule only applies 3000 AGL and above — the ground around these parts is 700-1000 feet MSL.)

altitude

Right when I entered the downwind for runway 23, the tower cleared me to land. My clearance was quickly followed by the tower instructing a commuter jet to hold short of 23 because of landing traffic — me! Somehow, it is very satisfying to see a real plane (CRJ-200) have to wait for little ol’ me to land. (FlightAware tells me that it was flight FLG3903 to KDTW.)

While I was on taxiway C, they got cleared to take off. I couldn’t help but snap a photo.

CRJ-200

It was a pretty slow day for Saginaw. The whole time I was on the radio with Saginaw approach, I got to hear maybe 5 planes total. The tower was even less busy. There were no planes around except for me and the commuter jet.

KMBS → KAMN

This leg of the flight was the hardest. First of all, it was only 29 nm. This equated to about 25 minutes of flying. The first four-ish minutes and the last five-ish were spent climbing and descending, so really there were about 15 minutes of cruising. Not much time to begin with. I flew this leg by following the MBS VOR radial 248. My one and only checkpoint on this leg was midway — the beginning of a wind turbine farm. It took about 2 minutes longer to get there than planned, but the wind turbines were easy to see from a distance, so no problems there.

ground track

Following the VOR wasn’t difficult, but you can see in the ground track that I was meandering across it. As expected, it got easier the farther away from the station I got. Here’s the plot of the CDI deflection for this leg. The CSV file says that the units are “fsd” — presumably full-scale deflection, though I’m not certain.

CDI deflection

I can’t really draw any conclusions because…well, I don’t know what the graph is telling me. Sure, it seems to get closer and closer to zero (which I assume is a good thing), but I can’t honestly say that I understand what the graph is saying.

The most difficult part was trying to stay at 2500 feet. For whatever reason, it felt like I was flying in sizable thermals. Since there were no thunderstorms in the area, I flew on, fighting the updrafts. I suspect the wind turbines were built there because the area is windy.

altitude

KAMN is a decent-size airport. Both runways (5004x75 feet and 3197x75 feet) are plenty long for a 172, even on a hot day. I didn’t stop by the FBO, so I have no idea how they are. I did not notice anyone else around during the couple of minutes I spent on the ground taxiing and getting ready for the next leg. Maybe it was just the overcast that made people stay indoors. Oh well. It is a nice airport, and I wouldn’t mind stopping there in the future if the need arose.

KAMN → KARB

Flying back to Ann Arbor was the easy part of the trip. The air calmed down enough that once trimmed, the plane more or less stayed at 3500 feet.

altitude

It apparently was a slow day for Lansing approach as well, as I got to hear a controller chatting with a pilot of a skydiving plane about how fast the skydivers fell to the ground. Sadly, I didn’t get to hear the end of the conversation since the controller told me to contact Detroit approach.

As far as the ground track is concerned, you can see two places where I stopped flying the planned heading and instead flew toward the next checkpoint visually. The first instance is a few miles north of KOZW. I spotted the airport, and since I knew I was supposed to overfly it, I turned to it and flew right over it. The second instance is by Whitmore Lake — there I looked into the distance and saw Ann Arbor. Knowing that the airport is on the south side, I just headed right toward it, ignoring the planned heading. As I mentioned before, in both cases the planned course was slightly off because the winds weren’t quite what the forecast said they would be.

ground track

You can’t tell from the rather low resolution of the map, but I got to fly right over the Wikipedia article: Michigan stadium. Sadly, I was a bit too busy flying the plane to take a photo of the field below me.

Next

With one solo cross country out of the way, I’m still trying to figure out where I want to go next. Currently, I am considering one of these flights (in no particular order):

path distance time
KARB KGRR KMOP KARB 239 nm 2h19m
KARB KBIV KJXN KARB 220 nm 2h08m
KARB KFWA KTOL KARB 210 nm 2h03m
KARB CRUXX KFWA KTOL KARB 210 nm 2h06m
KARB LFD KFWA KTOL KARB 225 nm 2h12m
KARB KAZO KOEB KTOL KARB 207 nm 2h01m
KARB KMBS KGRR KARB 243 nm 2h21m
KARB KGRR KEKM KARB 266 nm 2h40m

by JeffPC at June 22, 2013 05:23 PM

Benchmark Assumptions

Today I came across a blog post about Running PostgreSQL on Compression-enabled ZFS. I found the article because (1) I am a fan of Wikipedia article: ZFS, and (2) transparent storage compression interests me. (Maybe I’ll talk about the later in the future.)

Whoever ran the benchmark decided to compare ZFS with Wikipedia article: lzjb and ZFS with gzip against ext3. Their analysis states that ZFS-gzip is faster than ZFS-lzjb, which is faster than ext3. They admit that the benchmark is I/O bound. Then they state that compression effectively speeds up the disk I/O by making every byte transferred contain more information. The analysis goes down the drain right after that.

While doing background research for this blog post we also got a chance to investigate some of the other features besides compression that differentiate ZFS from older file system architectures like ext3. One of the biggest differences is ZFS’s approach to scheduling disk IOs which employs explicit IO priorities, IOP reordering, and deadline scheduling in order to avoid flooding the request queues of disk controllers with pending requests.

Anyone who’s benchmarked a system should have a red flag going off after reading those sentences. My reaction was something along the lines: “What?! You know that there are at least three major differences between ZFS and ext3 in addition to compression and you still try to draw conclusions about compression effectiveness by comparing ZFS with compression against ext3?!”

All they had to do to make their analysis so much more interesting and keep me quiet was to include another set of numbers — ZFS without compression. That way, one can compare ext3 with ZFS-uncompressed to see how much difference the radically different filesystem design makes. Then one could compare ZFS-uncompressed with the lzjb and gzip data to see if compression helps. Based on the data presented, we have no idea if compression helps — we just know that ZFS with compression outperforms ext3. What if ZFS without compression is 5x faster than ext3? Then gzip (~4x faster than ext3) would actually be slowing things down.

To be fair, knowing how modern disk drives behave, chances are that compressed ZFS is faster than uncompressed ZFS. Since CPU cycles are so plentiful these days, all my systems have lzjb compression enabled everywhere. I do this mostly to conserve space, but also in hopes of transferring less data to disk. Yes, this is exactly what their benchmark attempts to show. (I haven’t had a chance to experiment with the new-ish lz4 compression algorithm in ZFS.) My point here is solely about benchmark analysis and unfounded (or at least unstated) assumptions found in just about every benchmark out there.

by JeffPC at June 22, 2013 02:15 AM

June 09, 2013

Josef "Jeff" Sipek

Plotting G1000 EGT

It would seem that my two recent posts are getting noticed. On one of them, someone asked for the EGT R code I used.

After I get the CSV file off the SD card, I first clean it up. Currently, I just do it manually using Vim, but in the future I will probably script it. It turns out that Garmin decided to put a header of sorts at the beginning of each CSV. The header includes version and part numbers. I delete it. The next line appears to have units for each of the columns. I delete it as well. The remainder of the file is an almost normal CSV. I say almost normal, because there’s an inordinate number of spaces around the values and commas. I use the power of Vim to remove all the spaces in the whole file by using :%s/ //g. Then I save and quit.
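
If I ever do script it, the same cleanup can be done in R itself. Something like this should work (a sketch; the filename is made up, and it assumes the Garmin header is exactly two lines):

> raw <- readLines("flightlog.csv")  # straight off the SD card
> raw <- raw[-(1:2)]                 # drop the version header and the units line
> raw <- gsub(" ", "", raw)          # the equivalent of :%s/ //g
> data <- read.csv(text = raw)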

Now that I have a pretty standard looking CSV, I let R do its thing.

> data <- read.csv("munged.csv")
> names(data)
 [1] "LclDate"   "LclTime"   "UTCOfst"   "AtvWpt"    "Latitude"  "Longitude"
 [7] "AltB"      "BaroA"     "AltMSL"    "OAT"       "IAS"       "GndSpd"   
[13] "VSpd"      "Pitch"     "Roll"      "LatAc"     "NormAc"    "HDG"      
[19] "TRK"       "volt1"     "volt2"     "amp1"      "amp2"      "FQtyL"    
[25] "FQtyR"     "E1FFlow"   "E1OilT"    "E1OilP"    "E1RPM"     "E1CHT1"   
[31] "E1CHT2"    "E1CHT3"    "E1CHT4"    "E1EGT1"    "E1EGT2"    "E1EGT3"   
[37] "E1EGT4"    "AltGPS"    "TAS"       "HSIS"      "CRS"       "NAV1"     
[43] "NAV2"      "COM1"      "COM2"      "HCDI"      "VCDI"      "WndSpd"   
[49] "WndDr"     "WptDst"    "WptBrg"    "MagVar"    "AfcsOn"    "RollM"    
[55] "PitchM"    "RollC"     "PichC"     "VSpdG"     "GPSfix"    "HAL"      
[61] "VAL"       "HPLwas"    "HPLfd"     "VPLwas"   

As you can see, there are lots of columns. Before doing any plotting, I like to convert the LclDate, LclTime, and UTCOfst columns into a single Time column. I also get rid of the three individual columns.

> data$Time <- as.POSIXct(paste(data$LclDate, data$LclTime, data$UTCOfst))
> data$LclDate <- NULL
> data$LclTime <- NULL
> data$UTCOfst <- NULL

Now, let’s focus on the EGT values — E1EGT1 through E1EGT4. E1 refers to the first engine (the 172 has only one), I suspect that a G1000 on a twin would have E1 and E2 values. I use the ggplot2 R package to do my graphing. I could pick colors for each of the four EGT lines, but I’m way too lazy and the color selection would not look anywhere near as nice as it should. (Note, if you have only two values to plot, R will use a red-ish and a blue-ish/green-ish color for the lines. Not exactly the smartest selection if your audience may include someone color-blind.) So, instead I let R do the hard work for me. First, I make a new data.frame that contains the time and the EGT values.

> tmp <- data.frame(Time=data$Time, C1=data$E1EGT1, C2=data$E1EGT2,
                    C3=data$E1EGT3, C4=data$E1EGT4)
> head(tmp)
                 Time      C1      C2      C3      C4
1 2013-06-01 14:24:54 1029.81 1016.49 1019.08 1098.67
2 2013-06-01 14:24:54 1029.81 1016.49 1019.08 1098.67
3 2013-06-01 14:24:55 1030.94 1017.57 1019.88 1095.38
4 2013-06-01 14:24:56 1031.92 1019.05 1022.81 1095.84
5 2013-06-01 14:24:57 1033.16 1020.23 1022.82 1092.38
6 2013-06-01 14:24:58 1034.54 1022.33 1023.72 1085.82

Then I use the reshape2 package to reorganize the data.

> library(reshape2)
> tmp <- melt(tmp, "Time", variable.name="Cylinder")
> head(tmp)
                 Time Cylinder   value
1 2013-06-01 14:24:54       C1 1029.81
2 2013-06-01 14:24:54       C1 1029.81
3 2013-06-01 14:24:55       C1 1030.94
4 2013-06-01 14:24:56       C1 1031.92
5 2013-06-01 14:24:57       C1 1033.16
6 2013-06-01 14:24:58       C1 1034.54

The melt function takes a data.frame along with a name of a column (I specified “Time”), and reshapes the data.frame. For each row, in the original data.frame, it takes all the columns not specified (e.g., not Time), and produces a row for each with a variable name being the column name and the value being that column’s value in the original row. Here’s a small example:

> df <- data.frame(x=c(1,2,3),y=c(4,5,6),z=c(7,8,9))
> df
  x y z
1 1 4 7
2 2 5 8
3 3 6 9
> melt(df, "x")
  x variable value
1 1        y     4
2 2        y     5
3 3        y     6
4 1        z     7
5 2        z     8
6 3        z     9

As you can see, the x values got duplicated since there were two other columns. Anyway, the one difference in my call to melt is the variable.name argument. I don’t want my variable name column to be called “variable” — I want it to be called “Cylinder.”

At this point, the data is ready to be plotted.

> library(ggplot2)
> p <- ggplot(tmp)
> p <- p + ggtitle("Exhaust Gas Temperature")
> p <- p + ylab(expression(Temperature~(degree*F)))
> p <- p + geom_line(aes(x=Time, y=value, color=Cylinder))
> print(p)

That’s all there is to it! There may be a better way to do it, but this works for me. I use the same approach to plot the different altitude numbers, the speeds (TAS, IAS, GS), CHT, and fuel quantity.

You can download an R script with the above code here.

by JeffPC at June 09, 2013 07:38 PM

June 02, 2013

Josef "Jeff" Sipek

Garmin G1000 Data Logging: Cross-Country Edition

About a week ago, I talked about G1000 data logging. In that post, I mentioned that cross-country flying would be interesting to visualize. Well, on Friday I got to do a mock pre-solo cross country phase check. I had the G1000 logging the trip.

First of all, the plan was to fly from KARB to KFPK. It’s a 51 nm trip. I had four checkpoints. For the purposes of plotting the flight, I had to convert the pencil marks on my sectional chart to latitude and longitude.

> xc_checkpoints
          Name Latitude Longitude
1      Chelsea 42.31667 -84.01667
2       Munith 42.37500 -84.20833
3       Leslie 42.45000 -84.43333
4 Eaton Rapids 42.51667 -84.65833

Now, let’s take a look at the ground track.

ground track

In addition to just the ground track, I plotted here the first three checkpoints in red, the location of the plane every 5 minutes in blue (excluding all the data points near the airport), and some other places of interest in green.

As you can see, I was always a bit north of where I was supposed to be. Right after passing Leslie, I was told to divert to 69G. I figured out the true course, and tried to take the wind into account, but as you can see it didn’t go all that well at first. When I found myself next to some oil tanks way north of where I wanted to be, I turned southeast…a little bit too much. Eventually, I made it to Richmond, which, much like all grass fields, was way too hard to spot. (I’m pretty sure that I will avoid all grass fields while on my solo cross countries.)

So, how about the altitude? The plan was to fly at 4500 feet, but due to the clouds being at about 3500, Wikipedia article: pilotage being the purpose of this exercise, and not planning on going all the way to KFPK anyway, we just decided to stay at 3000. At one point, 3000 seemed a bit too close to the clouds, so I ended up at 2900. Below is the altitude graph. For your convenience, I plotted horizontal lines at 2800, 2900, 3000, and 3100 feet. (Near the end, you can see 4 touch and gos and a full stop at KARB.)

altitude

While approaching my second checkpoint, Munith, I realized that it would be pretty hard to find. It’s a tiny little town, but sadly it is the biggest “landmark” around. So, I tuned in the JXN Wikipedia article: VOR and estimated that the 50 degree radial would go through Munith. While that wouldn’t give me my location, it would tell me when I was abeam Munith. Shortly after, I changed my estimate to the 60 degree radial. (It looks like 65 is the right answer.)

> summary(factor(data$NAV1))
109.6 114.3 
 3192  1406 
> summary(factor(data$CRS))
  36   37   42   44   47   48   49   50   52   57   59   60 
1444    1    1    1    1    1    1  135    1    1    1 3010 
> head(subset(data, HSIS=="NAV1")$Time, 1)
[1] "2013-05-31 09:43:23 EDT"
> head(subset(data, NAV1==109.6)$Time, 1)
[1] "2013-05-31 09:43:42 EDT"
> head(subset(data, CRS==50)$Time, 1)
[1] "2013-05-31 09:44:26 EDT"
> head(subset(data, CRS==60)$Time, 1)
[1] "2013-05-31 09:46:48 EDT"

When I got the plane, the NAV1 radio was tuned to 114.3 (SVM) with the 36 degree radial set. At 9:43:23, I switched the input for the HSI from GPS to NAV1; at 9:43:42, I tuned into 109.6 (JXN). 44 seconds later, I had the 50 degree radial set. Over two minutes later, I changed my mind and set the 60 degree radial, which stayed there for the remainder of the flight.

In my previous post about the G1000 data logging abilities, I mentioned that the engine related variables would be more interesting on a cross-country. Let’s take a look.

engine RPM

As you can see, when reaching 3000 feet (cf. the altitude graph) I pulled the power back to a cruise setting. Then I started leaning the mixture.

fuel flow

Interestingly, just pulling the power back saves a large amount of fuel. Leaning helped save about one gallon per hour. While that’s not bad (~11%), it is not as significant as I thought it would be.

fuel

Since there was nowhere near as much maneuvering as previously, the fuel quantity graphs look way more useful. Again, we can see that the left tank is being used more.

The cylinder head temperature and exhaust gas temperature graphs are mostly boring. Unlike the previous graphs of CHT and EGT, these clearly show a nice 30 minute long period of cruising. To be honest, I thought these graphs would be more interesting. I’ll probably keep plotting them in the future but not share them unless they show something interesting.

cylinder head temperature exhaust gas temperature

Same goes for the oil pressure and temperature graphs. They are kind of dull.

oil pressure oil temperature

Anyway, that’s it for today. Hopefully, next time I’ll try to look at how close the plan was to reality.

by JeffPC at June 02, 2013 07:52 PM

May 26, 2013

Josef "Jeff" Sipek

Garmin G1000 Data Logging

About a month ago I talked about using R for plotting GPS coordinates. Recently I found out that the Wikipedia article: Cessna 172 I fly in has had its G1000 avionics updated. Garmin has added the ability to store various flight data to a CSV file on an SD card every second. Aside from the obvious things such as date, time and GPS latitude/longitude/altitude it stores a ton of other variables. Here is a subset: indicated airspeed, vertical speed, outside air temperature, pitch attitude angle, roll attitude angle, lateral and vertical G forces, the NAV and COM frequencies tuned, wind direction and speed, fuel quantity (for each tank), fuel flow, volts and amps for the two buses, engine RPM, cylinder head temperature, and exhaust gas temperature. Neat, eh? I went for a short flight that was pretty boring as far as a number of these variables are concerned. Logs for cross-country flights will be much more interesting to examine.

With that said, I’m going to have fun with the 1-hour recording I have. If you don’t find plotting time series data interesting, you might want to stop reading now. :)

First of all, let’s take a look at the COM1 and COM2 radio settings.

> unique(data$COM1)
[1] 120.3
> unique(data$COM2)
[1] 134.55 120.30 121.60

Looks like I had 3 unique frequencies tuned into COM2 and only one for COM1. I always try to get the Wikipedia article: ATIS on COM2 (134.55 at KARB), then I switch to the ground frequency (121.6 at KARB). This way, I know that COM2 both receives and transmits. Let’s see how long I’ve been on the ATIS frequency…

> summary(factor(data$COM2))
 120.3  121.6 134.55 
     1   3303     70 

It makes sense: between listening to the ATIS and tuning in the ground frequency, I spent 70 seconds listening to 134.55. The tower frequency (120.3 at KARB) showed up for a second because I switched away from the ATIS only to realize that I hadn’t tuned in the ground frequency yet. Graphing these values doesn’t make sense.

I didn’t use the NAV radios, so they stayed tuned to 114.3 and 109.6. Those are the Salem and Jackson VORs, respectively. (Whoever used the NAV radios last left these tuned in.)

To keep track of one’s altitude, one must set the Wikipedia article: altimeter to what a nearby weather station says. The setting is in inches of mercury. The ATIS said that 30.38 was the setting to use. The altimeter was set to 30.31 when I got it. You can see that it took me a couple of seconds to turn the knob far enough. Again, graphing this variable is pointless. It would be more interesting during a longer flight where the barometric pressure changed a bit.

> summary(factor(data$BaroA))
30.31 30.32 30.36 30.38 
  262     1     1  3110 

Ok, ok… time to make some graphs… First up, let’s take a look at the outside air temperature (in °C).

> summary(data$OAT)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    4.0     6.8    12.2    11.5    16.0    18.5 

OAT

In case you didn’t know, the air temperature drops about 2°C for every 1000 feet of altitude. Given that, you might already be guessing that after I took off, I climbed a couple of thousand feet.

altitude

Here, I plotted both the altitude given by the GPS (Wikipedia article: MSL as well as Wikipedia article: WGS84) and the altitude given by the altimeter. You can see that around 12:12, I set the altimeter, which caused the indicated altitude to jump up a little bit. Let’s take a look at the difference between the two.

altitude difference

Again, we can see the altimeter setting changing with the sharp ~60 foot jump at about 12:12. The discrepancy between the indicated altitude and the actual (GPS) altitude may be alarming at first, but keep in mind that even though the altimeter may be off from where you truly are, the whole air traffic system plays the same game. In other words, every aircraft and every controller uses the altimeter-based altitudes so there is no confusion. In yet other words, if everyone is off by the same amount, no one gets hurt. :)

Ok! It’s time to look at all the various speeds. The G1000 reports indicated airspeed (IAS), true airspeed (TAS), and ground speed (GS).

speed

We can see the taxiing to and from the runway — ground speed around 10 Wikipedia article: kts. (Note to self, taxi slower.) The ground speed is either more or less than the airspeed depending on the wind speed.

Moving along, let’s examine the lateral and normal accelerations. The normal acceleration is the seat pushing “up”, while the lateral acceleration is the side-to-side “sliding in the seat” acceleration. (Note: I am not actually sure which direction the G1000 considers negative lateral acceleration.)

acceleration

Ideally, there is no lateral acceleration. (See Wikipedia article: coordinated flight.) I’m still learning. :)

As you can see, there are several outliers. So, why not look at them! Let’s call any point with more than 0.1 G of lateral acceleration an outlier. (I chose this value arbitrarily.)

> nrow(subset(data, abs(LatAc) > 0.1))
[1] 41
> nrow(subset(data, abs(LatAc) > 0.1 & AltB < 2000))
[1] 28

As far as lateral acceleration goes, there were only 41 points beyond 0.1 G, 28 of which were below 2000 feet. (KARB’s pattern altitude is 1800 feet, so 2000 should be enough to easily cover any deviation.) Both of these counts however include all the taxiing. A turn during a taxi will result in a lateral acceleration, so let’s ignore all the points where we’re going slower than 25 kts.

> nrow(subset(data, abs(LatAc) > 0.1 & GndSpd > 25))
[1] 26
> nrow(subset(data, abs(LatAc) > 0.1 & AltB < 2000 & GndSpd > 25))
[1] 13

Much better! Only 26 points total, 13 below 2000 feet. Where did these points happen? (Excuse the low-resolution of the map.) You can also see the path I flew — taking off from runway 6, making a left turn to fly west to the practice area.

acceleration

The moment I took off, I noticed that the Wikipedia article: thermals were not going to make this a nice smooth ride. I think that’s why there are at least three points right by the highway while I was still climbing out of KARB. The air did get smoother higher up, but it still wasn’t a nice calm flight like the ones I’ve gotten used to during the winter. Looking at the map, I wonder if some of these points were due to abrupt power changes.

Here’s a close-up on the airport. This time, the point color indicates the amount of acceleration.

acceleration

There are only 4 points displayed. Interestingly, three of the four points are negative. Let’s take a look.

                    Time LatAc  AltB  E1RPM
2594 2013-05-25 12:52:10 -0.11 879.6 2481.1
2846 2013-05-25 12:56:31 -0.13 831.6  895.8
2847 2013-05-25 12:56:32  0.18 831.6  927.4
2865 2013-05-25 12:56:50 -0.13 955.6 2541.5

The middle two are a second apart. Based on the altitude, it looks like the plane was on the ground. Based on the engine RPMs, it looks like it was within a second or two of touchdown. Chances are that it was just the nose not quite aligned with the direction of travel. The other two points are likely thermals tossing the plane about a bit — the first point is from about 50 feet above the ground, and the last is from about 120 feet. Ok, I’m curious…

> data[c(2835:2850),c("Time","LatAc","AltB","E1RPM","GndSpd")]
                    Time LatAc  AltB  E1RPM GndSpd
2835 2013-05-25 12:56:20 -0.02 876.6 1427.9  66.71
2836 2013-05-25 12:56:21  0.01 873.6 1077.1  65.71
2837 2013-05-25 12:56:22  0.01 864.6  982.4  64.21
2838 2013-05-25 12:56:23  0.04 861.6  994.1  62.77
2839 2013-05-25 12:56:24  0.01 858.6  982.6  61.54
2840 2013-05-25 12:56:25  0.01 852.6  988.2  60.18
2841 2013-05-25 12:56:26 -0.02 845.6  959.0  58.91
2842 2013-05-25 12:56:27  0.00 846.6  945.5  57.73
2843 2013-05-25 12:56:28  0.01 844.6  930.9  56.53
2844 2013-05-25 12:56:29  0.10 834.6  908.0  55.16
2845 2013-05-25 12:56:30 -0.01 827.6  886.6  54.16
2846 2013-05-25 12:56:31 -0.13 831.6  895.8  52.71
2847 2013-05-25 12:56:32  0.18 831.6  927.4  51.49
2848 2013-05-25 12:56:33 -0.06 831.6  982.0  50.21
2849 2013-05-25 12:56:34  0.05 840.6 1494.0  49.39
2850 2013-05-25 12:56:35 -0.07 833.6 2249.7  48.76

The altitudes look a little out of whack, but otherwise it makes sense. #2835 was probably the moment the throttle was pulled to idle. Between #2848 and #2849, the throttle went full in. The ground was most likely around 832 feet, and touchdown was likely at #2846, as I guessed earlier.

Let’s plot the engine related values. First up, engine RPMs.

RPM

It is pretty boring. You can see the ~800 during taxi; the 1800 during the runup; the 2500 during takeoff; 2200 during cruise; and after 12:50 you can see the go-around, touch-n-go, and full stop.

Next up, cylinder head temperature (in °F) and exhaust gas temperature (also in °F). Since the plane has a 4 cylinder engine, there are four lines on each graph. As I was maneuvering most of the time, I did not get a chance to try to lean the engine. On a cross country, it’d be pretty interesting to see the temperature go up as a result of leaning.

CHT

EGT

Moving on, let’s look at fuel consumption.

fuel quantity

This is really weird. For the longest time, I suspected that the plane used more fuel from the left tank, but this is the first time I have solid evidence. (Yes, the fuel selector was on “Both”.) The fuel flow graph is rather boring — it very closely resembles the RPM graph.

fuel flow

Ok, two more engine related plots.

oil temperature

oil pressure

It is mildly interesting that the temperature never really goes down while the pressure seems to be correlated with the RPMs.

There are two variables with the vertical speed — one is GPS based while the other is barometer based.

vertical speed

As you can see, the two appear to be very similar. Let’s take a look at the delta. In addition to just a plain old subtraction, you can see the 60-second moving average.

vertical speed: GPS vs. Barometer

Not very interesting. Even though the two sometimes are off by as much as 560 feet/minute, the differences are very short-lived. Furthermore, the differences are pretty well distributed, with half of them within about 50 feet/minute.

> summary(data$VSpd - data$VSpdG)
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
-559.8000  -49.2800    0.4950    0.8252   53.0600  563.4000 
> summary(SMA(data$VSpd - data$VSpdG),2)
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.      NA's 
-240.2000  -22.2200    0.6940    0.8226   25.4700  226.7000         9 

Ok, last but not least, the CSV contains the pitch and roll angles. I’ll have to think about what sort of creative analysis I can do. The only thing that jumps to mind is the mediocre S-turn around 12:40, where the roll changed from about 20 degrees to -25 degrees.

Roll

Pitch

I completely ignored the volts and amps variables (for each of the two busses), all the navigation related variables (waypoint identifier, bearing, and distance, Wikipedia article: HSI source, course, Wikipedia article: CDI/Wikipedia article: GS deflection), wind (direction and speed), as well as ground track, magnetic heading and Wikipedia article: variation, GPS fix (it was always 3D), GPS horizontal/vertical alert limit, and WAAS GPS horizontal/vertical protection level (I don’t think the avionics can handle WAAS — the columns were always empty). Additionally, since I wasn’t using the autopilot, a number of the fields are blank (Autopilot On/Off, mode, commands).

Ideas

A while ago I learned about CloudAhoy. Their iPhone/iPad app uses the GPS to record your flight. Then, they do some number crunching to figure out what kind of maneuvers you were doing. (I contacted them a while ago to see if one could upload a GPS trace instead of using their app, sadly it was not possible. I do not know if that has changed since.) I think it’d be kind of cool to write a (R?) script that’d take the G1000 recording and do similar analysis. The big difference is the ability to use the great number of other variables to evaluate the pilot’s control of the airplane — ranging from coordinated flight and dangerous maneuvers (banking too aggressively while slow), to “did you forget to lean?”.

by JeffPC at May 26, 2013 07:56 PM

May 19, 2013

Justin Dearing

Creating a minimally viable Centos instance for SSH X11 Forwarding

I recently needed to set up a CentOS 6.4 VM for Java development. I wanted to be able to run Eclipse STS on said VM and display the X11 windows remotely on my Windows 7 desktop via XMing. I saw no reason for the CentOS VM to have a local X11 server, and I’m quite comfortable with the Linux command line. So I decided to share briefly how to go from a CentOS minimal install to something actually useful for getting work done.

  • /usr/bin/man The minimal install installs man pages, but not the man command. This is an odd choice. yum install man will fix that.
  • vim There is a bare-bones install of vim included by default that is only accessible via vi. If you want a more robust version of vim, yum install vim.
  • X11 forwarding You need the xauth package and fonts. yum install xauth will allow X11 forwarding to work. yum groupinstall fonts will install a set of fonts.
  • A terminal For absolute minimal viability, yum install xterm will give you a terminal. I prefer terminator, which is available through rpmforge.
  • RpmForge (now repoforge) CentOS is based on Red Hat Enterprise Linux. Therefore it focuses on being a good production server, not a developer environment. You will probably need rpmforge to get some of the packages you want. The directions for adding RpmForge to your yum repositories are here.
  • terminator This is my terminal emulator of choice. One you added rpmforge, yum install rpmforge
  • gcc, glibc, etc Honestly, you can usually live without these if you stick to precompiled rpms, and you’re not using gcc for development. If you need to build a kernel module, yum install kernel-devel gcc make should get you what out need.
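Putting it all together, the whole bootstrap boils down to a few commands (a sketch: the rpmforge repository setup itself is omitted, and exact package names may vary between CentOS releases):

yum install -y man vim xauth xterm
yum groupinstall -y fonts
# after adding the rpmforge/repoforge repository:
yum install -y terminator
# only needed if you have to build kernel modules:
yum install -y kernel-devel gcc make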

From here, you can install the stuff you need for your development environment for your language, framework, and scm of choice.

by Justin at May 19, 2013 03:35 PM

May 11, 2013

Justin Dearing

When your PowerShell cmdlet doesn’t return anything, use -PassThru

The other day I was mounting an ISO in Windows 8 via the Mount-DiskImage command. Since I was mounting the disk image in a script, I needed to know the drive letter it was mounted to so the script could access the files contained within. However, Mount-DiskImage was not returning anything. I didn’t want to go through the hack of listing drives before and after I mounted the disk image, or explicitly assigning the drive letter. Both would leave me open to race conditions if another drive was mounted by another process while my script ran. I was at a loss for what to do.

Then, I remembered the -PassThru parameter, which I am quite fond of using with Add-Type. See, certain cmdlets, like Mount-DiskImage and Add-Type, don’t return pipeline output by default. For Add-Type, this makes sense. You rarely want to see a list of the types you just added, unless you’re exploring the classes in a DLL from the command line. However, for Mount-DiskImage, defaulting to no output was a questionable decision IMHO.

Now in the case of Mount-DiskImage, -PassThru doesn’t return the drive letter. However, it does return an object that you can pipe to Get-Volume which does return an object with a DriveLetter property. To figure that out, I had to ask on stackoverflow.
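So the script ended up doing something like this (a sketch; the ISO path is a placeholder):

$image = Mount-DiskImage -ImagePath 'C:\path\to\some.iso' -PassThru;
$driveLetter = ($image | Get-Volume).DriveLetter;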

tl;dr: If your PowerShell cmdlet doesn’t return any output, try -PassThru. If you need the drive letter of a disk image mounted with Mount-DiskImage, pipe the output through Get-Volume.

For a more in depth treatise of -PassThru, check out this Scripting Guy article by Ed Wilson (blog|twitter).

by Justin at May 11, 2013 12:50 AM

Getting the Drive Letter of a disk image mounted with WinCdEmu

In my last post, I talked about mounting disk images in Windows 8. Both Windows 8 and 2012 include native support for mounting ISO images as drives. However, in prior versions of Windows you needed a third party tool to do this. Since I have a preference for open source, my tool of choice before Windows 8 was WinCdEmu. Today, I decided to see if it was possible to determine the drive letter of an ISO mounted by WinCdEmu with PowerShell.

A quick search of the internet revealed that WinCdEmu contained a 32 bit command line tool called batchmnt.exe, and a 64 bit counterpart called batchmnt64.exe. These tools were meant for command line automation. While I knew there would be no .NET libraries in WinCdEmu, I did have hope there would be a COM object I could use with New-Object. Unfortunately, all the COM objects were for Windows Explorer integration and popped up GUIs, so they were inappropriate for automation.

Next I needed to figure out how to use batchmnt. For this I used batchmnt64 /?.

C:\Users\Justin>"C:\Program Files (x86)\WinCDEmu\batchmnt64.exe" /?
BATCHMNT.EXE - WinCDEmu batch mounter.
Usage:
batchmnt <image file> [<drive letter>] [/wait] - mount image file
batchmnt /unmount <image file>         - unmount image file
batchmnt /unmount <drive letter>:      - unmount image file
batchmnt /check   <image file>         - return drive letter as ERORLEVEL
batchmnt /unmountall                   - unmount all images
batchmnt /list                         - list mounted

C:\Users\Justin>

Mounting and unmounting are trivial. The /list switch produces some output that I could parse into a PSObject if I so desired. However, what I really found interesting was batchmnt /check. The process returned the drive letter as its ERRORLEVEL, i.e. the ExitCode of the batchmnt process. If you’ve ever programmed in a C like language, you know your main function can return an integer. Traditionally 0 means success and a non-zero number means failure. However, in this case 0 means the image is not mounted, and a non-zero number is the ASCII code of the drive letter. To get that code in PowerShell is simple:

$proc = Start-Process  -Wait `
    "C:\Program Files (x86)\WinCDEmu\batchmnt64.exe" `
    -ArgumentList '/check', '"C:\Users\Justin\SQL Server Media\2008R2\en_sql_server_2008_r2_developer_x86_x64_ia64_dvd_522665.iso"' `
    -PassThru;
[char] $proc.ExitCode

The Start-Process cmdlet normally returns immediately without output. The -PassThru switch makes it return information about the process it created, and -Wait makes the cmdlet wait for the process to exit, so that information includes the exit code. Finally, to turn that ASCII code into the drive letter, we cast with [char].
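Wrapped up as a reusable function, the whole trick looks something like this (a sketch; the function name is mine and the image path is a placeholder):

function Get-WinCdEmuDriveLetter([string] $ImagePath) {
    $proc = Start-Process -Wait -PassThru `
        'C:\Program Files (x86)\WinCDEmu\batchmnt64.exe' `
        -ArgumentList '/check', "`"$ImagePath`"";
    # 0 means "not mounted"; anything else is the ASCII code of the drive letter
    if ($proc.ExitCode -eq 0) { return $null; }
    return [char] $proc.ExitCode;
}

Get-WinCdEmuDriveLetter 'C:\path\to\some.iso'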

by Justin at May 11, 2013 12:47 AM

May 05, 2013

Josef "Jeff" Sipek

Instrument Flying

I was paging through a smart collection in Lightroom when I came across a batch of photos from early December that I had not shared yet. (A smart collection is a filter that will only show you photos satisfying a predicate.)

On December 2nd, one of the people I work with (the same person that told me exactly how easy it is to sign up for lessons) told me that he was going up to do a couple of practice instrument approaches to Jackson (KJXN) in the club’s Cessna 182. He then asked if I wanted to go along. I said yes. It was a warm, overcast day…you know, the kind when the weather seems to sap all the motivation out of you. I was going to sit in the back (the other front seat was occupied by another person I work with — also a pilot) and play with my camera. Below are some of the better shots; there are more in the gallery.

Getting ready to take off:

US-127 and W Berry Rd:

The pilot:

The co-pilot:

On the way back to Ann Arbor (KARB), we climbed to five thousand feet, which took us out of the clouds. Since I was sitting in the back, I was able to swivel around and enjoy the sunset on a completely overcast day. The experience totally made my day. After I get my private pilot certificate, I am definitely going to consider getting instrument rated.

The clouds were very fluffy.

by JeffPC at May 05, 2013 01:41 AM

May 03, 2013

Justin Dearing

Setting the Visual Studio TFS diff and merge tools with PowerShell

I recently wrote this script to let me quickly change the diff and merge tools TFS uses from PowerShell. I plan to make it a module and add it to the StudioShell Contrib package by Jim Christopher (blog|twitter). For now, I share it as a gist and place it on this blog.

The script supports Visual Studio 2008-2012 and the following diff tools:

Enjoy!

by Justin at May 03, 2013 02:09 AM

April 28, 2013

Eitan Adler

Pre-Interview NDAs Are Bad

I get quite a few emails from business folk asking me to interview with them or to forward their request to other coders I know. Given the volume, it isn't feasible to respond affirmatively to all these requests.

If you want to get a coder's attention there are a lot of things you could do, but there is one thing you shouldn't do: require them to sign an NDA before you interview them.

From the candidate's point of view:

  1. There are a lot more ideas than qualified candidates.
  2. It's unlikely your idea is original. That doesn't mean anyone else is working on it, just that someone else probably thought of it.
  3. Let's say the candidate was working on a similar, if not identical, project. If the candidate fails to continue with you, now they have to consult a lawyer to make sure you can't sue them for a project they were working on before.
  4. NDAs are dense legal documents and shouldn't be signed without consulting a lawyer. Does the candidate really want to find a lawyer before interviewing with you?
  5. An NDA puts the entire obligation on the candidate. What does the candidate get from you?
From a company founder's point of view:
  1. Everyone talks about the companies they interview with to someone. Do you want to be that strange company which made them sign an NDA? It can harm your reputation easily.
  2. NDAs do not stop leaks. They serve to create liability when a leak occurs. Do you want to be the company that sues people that interview with them?

There are some exceptions; for example, government and security jobs may require security clearance and an NDA. For most jobs it is possible to determine if a coder is qualified and a good fit without disclosing confidential company secrets.

by Eitan Adler (noreply@blogger.com) at April 28, 2013 10:37 PM

Josef "Jeff" Sipek

Change Ringing - The Changes

We have seen what a bell tower set up for change ringing looks like; we have looked at the mechanics of ringing a single bell and what it sounds like if you ring the bells in what is called rounds (all bells ring one after each other in order of pitch, starting with the treble and ending with the tenor).

Ringing rounds is good practice, but ringing would be really boring if that was all there was. Someone at some point decided that it’d be fun for one of the ringers to be a conductor, and direct the other ringers to do the most obvious thing — swap around. So, for example, suppose we have 6 bells, the treble is the first, and the tenor is the last. First, we get rounds by ringing all of them in numerical order:

123456

Then, the conductor makes a call telling two bells to change around. For example, say that the conductor says: 5 to 3. This tells the person ringing bell number 5 that the next hand stroke (I completely skipped over this part in the previous post, but bell strikes come in pairs: hand stroke, and back stroke) he should follow the bell number 3. In other words, the new order will be:

123546

You can see that in addition to the 5 changing place, the 4 had to move too! Now, it is following the 5.

Until the next call, the bells go in this order. Then the conductor may say something like: 3 to 1, or 3 to treble. Just as before, 2 bells move. This time, it is the 2 and the 3, yielding:

132546

Let’s have another call…5 to 3. Now, we have:

135246

This pattern (all odd bells in increasing order, followed by all even bells in increasing order) is called Queens. There are many such patterns.

Ringing traditionally starts in rounds, and ends in rounds. So, let’s have a few calls and return the bells to rounds.

3 to the lead (this means that 3 will be the first bell) 315246
4 to 5 315426
4 to the treble 314526
2 to 4 314256
treble lead 134256
2 to 3 132456
rounds next 123456

There we have it. We’re back in rounds. There was nothing special about the order of these changes. Well, there is one rule: the bells that are changing places must be adjacent. So, for example, if we start in rounds, we can’t do 4 to the treble. Why is that? These bells are heavy, and especially the heavier ones (>10 cwt) will not move that far easily. Remember, this is bell ringing, not wrestling.
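The call semantics are mechanical enough to simulate. Here is a quick sketch (Python, not from the post; "X to Y" is implemented as "move bell X so it strikes right after bell Y", and the adjacency rule is not enforced):

# "X to Y": bell x moves so that it strikes immediately after bell y
def call(row, x, y):
    row = row.copy()
    row.remove(x)
    row.insert(row.index(y) + 1, x)
    return row

row = [1, 2, 3, 4, 5, 6]        # rounds
row = call(row, 5, 3)           # 5 to 3 -> 123546
row = call(row, 3, 1)           # 3 to 1 -> 132546
row = call(row, 5, 3)           # 5 to 3 -> 135246 (Queens)
print("".join(map(str, row)))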

by JeffPC at April 28, 2013 01:00 PM

April 22, 2013

Josef "Jeff" Sipek

Plotting with ggmap

Recently, I came across the ggmap package for R. It supposedly makes for some very easy plotting on top of Google Maps or OpenStreetMap. I grabbed a GPS recording I had lying around and gave it a try.

You may recall my previous attempts at plotting GPS data. This time, the data file I was using was recorded with a USB GPS dongle. The data is much nicer than what a cheap smartphone GPS could produce.

> head(pts)
        time   ept      lat       lon   alt   epx    epy mode
1 1357826674 0.005 42.22712 -83.75227 297.7 9.436 12.755    3
2 1357826675 0.005 42.22712 -83.75227 297.9 9.436 12.755    3
3 1357826676 0.005 42.22712 -83.75227 298.1 9.436 12.755    3
4 1357826677 0.005 42.22712 -83.75227 298.4 9.436 12.755    3
5 1357826678 0.005 42.22712 -83.75227 298.6 9.436 12.755    3
6 1357826679 0.005 42.22712 -83.75227 298.8 9.436 12.755    3

For this test, I used only the latitude, longitude, and altitude columns. Since the altitude is in meters, I multiplied it by 3.2 to get a rough altitude in feet. Since the data file is long and goes all over, I truncated it to only the last 33 minutes.

The magical function is the get_map function. You feed it a location, a zoom level, and the type of map and it returns the image. Once you have the map data, you can use it with the ggmap function to make a plot. ggmap behaves a lot like ggplot2’s ggplot function and so I felt right at home.

Since the data I am trying to plot is a sequence of latitude and longitude observations, I’m going to use the geom_path function to plot them. Using geom_line would not produce a path, since it reorders the data points. I’m also plotting the altitude as the color.

Here are the resulting images:

If you are wondering why the line doesn’t follow any roads… Roads? Where we’re going, we don’t need roads. (Hint: flying)

Here’s the entire script to get the plots:

#!/usr/bin/env Rscript

library(ggmap)

pts <- read.csv("gps.csv")

# get the bounding box... left, bottom, right, top
loc <- c(min(pts$lon), min(pts$lat), max(pts$lon), max(pts$lat))

for (type in c("roadmap","hybrid","terrain")) {
	print(type)
	map <- get_map(location=loc, zoom=13, maptype=type)
	p <- ggmap(map) + geom_path(aes(x=lon, y=lat, color=alt*3.2), data=pts)

	jpeg(paste(type, "-preview.jpg", sep=""), width=600, height=600)
	print(p)
	dev.off()

	jpeg(paste(type, ".jpg", sep=""), width=1024, height=1024)
	print(p)
	dev.off()
}

P.S. If you are going to use any of the maps for anything, you better read the terms of service.

by JeffPC at April 22, 2013 12:46 AM

April 20, 2013

Josef "Jeff" Sipek

Matthaei Botanical Gardens

Back in early February, Holly and I went to the Matthaei Botanical Gardens. I took my camera with me. After over two months of doing nothing with the photos, I finally managed to post-process some of them. I have no idea what the various plants are called — I probably should have made note of the signs next to each plant. (photo gallery)

This one didn’t turn out as nicely as I hoped. Specifically, it is a little blurry. Maybe I’ll go back at some point to retake the photo.

This one is just cool.

I think this is some kind of Aloe.

by JeffPC at April 20, 2013 09:20 PM

April 19, 2013

Josef "Jeff" Sipek

IPS: The Manifest

In the past, I have mentioned that IPS is great. I think it is about time I gave you more information about it. This time, I’ll talk about the manifest and some core IPS ideals.

IPS, the Image Packaging System, has some really neat ideas. Each package contains a manifest. The manifest is a file which lists actions. Some very common actions are “install a file at path X,” “create a symlink from X to Y,” as well as “create user account X.” The great thing about this is that the manifest completely describes what needs to be done to the system to install a package. Uninstalling a package simply undoes the actions — delete files, symlinks, users. (This is where the “image” in IPS comes from — you can assemble the system image from the manifests.)

For example, here is the (slightly hand edited) manifest for OpenIndiana’s rsync package:

set name=pkg.fmri value=pkg://openindiana.org/network/rsync@3.0.9,5.11-0.151.1.7:20121003T221151Z
set name=org.opensolaris.consolidation value=sfw
set name=variant.opensolaris.zone value=global value=nonglobal
set name=description value="rsync - faster, flexible replacement for rcp"
set name=variant.arch value=i386
set name=pkg.summary value="rsync - faster, flexible replacement for rcp"
set name=pkg.description value="rsync - A utility that provides fast incremental file transfer and copy."
set name=info.classification value="org.opensolaris.category.2008:Applications/System Utilities"
dir group=sys mode=0755 owner=root path=usr
dir group=bin mode=0755 owner=root path=usr/bin
dir group=sys mode=0755 owner=root path=usr/share
dir group=bin mode=0755 owner=root path=usr/share/man
dir group=bin mode=0755 owner=root path=usr/share/man/man1
dir group=bin mode=0755 owner=root path=usr/share/man/man5
license 88142ae0b65e59112954efdf152bb2342e43f5e7
	chash=3b72b91c9315427c1994ebc5287dbe451c0731dc
	license=SUNWrsync.copyright pkg.csize=12402 pkg.size=35791
file 02f1be6412dd2c47776a62f6e765ad04d4eb328c
	chash=945deb12b17a9fd37461df4db7e2551ad814f88b
	elfarch=i386 elfbits=32
	elfhash=1d3feb5e8532868b099e8ec373dbe0bea4f218f1
	group=bin mode=0555 owner=root path=usr/bin/rsync
	pkg.csize=191690 pkg.size=395556
file 7bc01c64331c5937d2d552fd93268580d5dd7c66
	chash=328e86655be05511b2612c7b5504091756ef7e61
	group=bin mode=0444 owner=root
	path=usr/share/man/man1/rsync.1 pkg.csize=50628
	pkg.size=165934
file 006fa773e1be3fecf7bbfb6c708ba25ddcb0005e
	chash=9e403b4965ec233c5e734e6fcf829a034d22aba9
	group=bin mode=0444 owner=root
	path=usr/share/man/man5/rsyncd.conf.5
	pkg.csize=12610 pkg.size=37410
depend fmri=consolidation/sfw/sfw-incorporation type=require
depend fmri=system/library@0.5.11-0.151.1.7 type=require

The manifest is very easily readable. It is obvious that there are several sets of actions:

  • metadata: specifies the FMRI, description, and architecture among others
  • directories: lists all the directories that need to be created/deleted during installation/removal
  • license: specifies the file with the text of the license for the package
  • files: in general, most actions are file actions; each installs a file
  • dependencies: lastly, rsync depends on system/library and sfw-incorporation

The above example is missing symlinks, hardlinks, user accounts, services, and device driver related actions.
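If you want to poke at manifests yourself, IPS can print the manifest of an installed package. For example (assuming the rsync package from above is installed):

$ pkg contents -m rsync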

Many package management systems have the ability to execute arbitrary scripts after installation or prior to removal. IPS does not allow this since it would violate the idea that the manifest completely describes the package. This means (in theory), that one can tell IPS to install the base packages into a directory somewhere, and at the end one has a working system.

It all sounds good, doesn’t it? As always, the devil is in the details.

First of all, sometimes there’s just no clean way to perform all package setup at install time. One just needs a script to run to take care of the post-install configuration. Since IPS doesn’t support this, package developers often create a transient SMF manifest and let SMF run the script after the installation completes. This is just ugly, but not the end of the world.

Requests?

I’m going to try something new. Instead of posting a random thought every so often, I’m going to take requests. What do you want me to talk about next?

by JeffPC at April 19, 2013 02:28 AM

Math test

I decided to finally implement some math support. Here’s a test post.

$a^n + b^n = c^n$

$v(t) = \frac{at}{\sqrt{1 + (\frac{at}{c})^2}}$

I hope equation support will come in handy.

by JeffPC at April 19, 2013 01:41 AM

February 13, 2013

Josef "Jeff" Sipek

FAST 2013

Since FAST starts today, yesterday was dedicated to flying out to San Jose.

Once at KDTW, I spent most of my wait there watching planes at the gates as well as watching more planes take off on 22L. Somehow, it was fascinating to watch them land on 22L and see 22R in the background — the same 22R that I got to do touch and go’s on a couple of weeks ago. I think not having to aviate first let me enjoy the sights — planes large and small barrelling down the runway and then *poof* they gently lift off the runway. At about 500 feet the gear retracts. It’s magic!

At one point, I saw the plane at the adjacent gate being prepared for its next flight. I both enjoyed seeing and sympathized with one of the crew (I assume the first officer since I suspect the captain wanted to stay warm) walking around the plane visually inspecting it. I know how annoying it is to be outside when it is cold to make sure the plane is safe to fly, yet I find it comforting that the same rules apply not only to Cessna 172s but also to Airbus A320s.

The first leg of the trip took me to KSLC. I brought my copy of the FAR/AIM with me. I read a bunch. I looked out the window a bunch. After we got past Lake Michigan, the sky cleared up allowing me to watch the ground below instead of the layer of overcast. I was very surprised to discover that the snow covered landscape makes it very easy to spot airports. Well, it is easy to spot paved runways that have been plowed.

The approach to KSLC was pretty cool. I never thought about the landscape in Utah before, but it turns out that Salt Lake City is surrounded by some serious mountains. Now, throw in winter weather with overcast and you’ll end up with a sea of white except for a few places where the mountains are peeking through.

Learning to fly in southeastern Michigan doesn’t make you think about mountains — there just aren’t any. Seeing the mountains peeking through the clouds was a scary reminder that there are more things in the sky than just other airplanes and some towers. If one were flying VFR above the clouds (which is a bad idea), where would be a safe place to descend? Obviously not where the mountains peak through, but any other place might be just as bad. The best looking place could have a mountain or a ridge a few hundred feet below the cloud tops. Granted, sectional charts would depict all the mountains, but it is a dangerous game to play.

I knew we would end up descending through the overcast and so I played a little game I expected to lose. Once we were in the clouds, I tried to keep track of our attitude by just sensing the forces. I knew I would fail, but I thought it would be interesting to try my best. We spent maybe 90 to 120 seconds in the clouds. At the end, I definitely felt like we were in a right bank — spatial disorientation. I knew that we probably weren’t, but without visual information to fix up my perception there was no way for me to know.

We landed. I watched all the airport signs and markings, following our progress on an airport diagram. Once people started getting off the plane, I decided to ask to see the airworthiness certificate. The first officer (I think) found all the paperwork in the cockpit and showed me. It was really cool to see the same form I see every time I fly the 172 but filled out for an A320. (Theirs was laminated!) We chatted for a little bit about what I fly, and how it’s a good plane. It was fun.

It was time to get to my connecting flight. Nothing interesting happened. I spent about half the flight watching the outside and half reading my book.

After arriving to KSJC, I got up from my seat in the small but comfy plane (CRJ200). I grabbed my backpack from the overhead bin with one hand since the other hand not only had my hoodie draped over but was holding the FAR/AIM. I started filing out. All that was left to do was give the thank-you-for-landing-safely-and-not-killing-me nod to the crew as I exited the plane. The captain or FO happened to be standing in the cockpit door saying good bye to passengers. I nodded as planned. He responded: “good book.” I smiled.

by JeffPC at February 13, 2013 08:10 PM

Eitan Adler

Don't Use Timing Functions for Profiling

One common technique for profiling programs is to use the gettimeofday system call (with code that looks something like this):

Example (incorrect) code that uses gettimeofday:
#include <sys/time.h>
#include <stdio.h>

void codetoprofile(void); /* the code being measured */

void function(void)
{
	struct timeval before;
	struct timeval after;

	gettimeofday(&before, NULL);
	codetoprofile();
	gettimeofday(&after, NULL);

	time_t delta = after.tv_sec - before.tv_sec;
	printf("%ld\n", (long) delta);
}

However, using gettimeofday(2) or time(3) or any function designed to get a time of day to obtain profiling information is wrong for many reasons:

  1. Time can go backwards. In a virtualized environment this can happen quite often. In non-virtualized environments this can happen due to time zones. Even passing CLOCK_MONOTONIC to clock_gettime(2) doesn't help, as it can go backwards during a leap second expansion.
  2. Time can change drastically for no reason. Systems with NTP enabled periodically sync their time with a time source. This can cause the system time to change by minutes, hours, or even days!
  3. These functions measure wall clock time. Time spent on entirely unrelated processes is going to be included in the profiling data!
  4. Even if you have disabled everything else on the system[1], the delta computed above includes both user time and system time. If your algorithm is very fast but the kernel has a slow implementation of some system call, you won't learn much.
  5. gettimeofday relies on the CPU clock, which may differ across cores, resulting in time skew.

So what should be used instead?

There isn't a good, portable function to obtain profiling information. However, there are options for those not tied to a particular system (or those willing to maintain multiple implementations for different systems).

The getrusage(2) system call is one option for profiling data. This provides different fields for user time (ru_utime) and system time (ru_stime) at a relatively high level of precision and accuracy.
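For instance, here is a minimal sketch mirroring the gettimeofday example above, but using getrusage (RUSAGE_SELF and the ru_utime field are standard; codetoprofile is the same placeholder as before):

#include <sys/resource.h>
#include <stdio.h>

void codetoprofile(void); /* same placeholder as above */

void function(void)
{
	struct rusage before;
	struct rusage after;

	getrusage(RUSAGE_SELF, &before);
	codetoprofile();
	getrusage(RUSAGE_SELF, &after);

	/* user time actually spent in this process, in microseconds */
	long delta = (after.ru_utime.tv_sec - before.ru_utime.tv_sec) * 1000000L
	           + (after.ru_utime.tv_usec - before.ru_utime.tv_usec);
	printf("user time: %ld us\n", delta);
}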

Using DTrace's profile provider also seems to be a decent choice, although I have limited experience with it.

Finally, using APIs meant to access hardware specific features, such as FreeBSD's hwpmc, is likely to provide the best results at the cost of being the least portable. Linux has similar features such as oprofile and perf. Using dedicated profilers such as Intel's VTune[2] may also be worthwhile.

  1. Including networking, background process swapping, cron, etc.
  2. A FreeBSD version is available.
update 2012-11-26: Include note about clock skew across cores.
Update 2013-02-13: Update and fix a massive error I had w.r.t. clock(3)

by Eitan Adler (noreply@blogger.com) at February 13, 2013 12:26 AM

January 20, 2013

Nate Berry

Installing Cyanogenmod on ASUS Transformer TF101

Computer, Linux
editor’s note: I’ve updated this story many times since I first posted it. For the current status, scroll all the way to the end of the story as I’ve appended update notices to the end each time I upgraded or switched Roms. Back in December, 2011 when I first got my ASUS Transformer TF101 it […]

by Nate at January 20, 2013 07:10 PM

January 19, 2013

Josef "Jeff" Sipek

Useless reinterpret_cast in C++

A few months ago (for whatever reason, I didn’t publish this post earlier), I happened to stumble on some C++ code that I had to modify. While trying to make things work, I happened to get code that essentially was:

uintptr_t x = ...;
uintptr_t y = reinterpret_cast<uintptr_t>(x);

Yes, the cast is useless. The actual code I had was much more complicated and it wasn’t immediately obvious that ‘x’ was already a uintptr_t. Thinking about it now, I would expect GCC to give a warning about a useless cast. What I did not expect was what I got:

foo.cpp:189:3: error: invalid cast from type "uintptr_t {aka long unsigned int}"
    to type "uintptr_t {aka long unsigned int}"

Huh? To me it seems a bit silly that the compiler does not know how to convert from one type to the same type. (For what it’s worth, this is GCC 4.6.2.)

Can anyone who knows more about GCC and/or C++ shed some light on this?

by JeffPC at January 19, 2013 10:26 PM

Serial Console

Over the past couple of days, I’ve been testing my changes to the crashdump core in Illumos. (Here’s why.) I do most of my development on my laptop — either directly, or I use it to ssh into a dev box. For Illumos development, I use the ssh approach. Often, I end up using my ancient desktop (pre-HyperThreading era 2GHz Pentium 4) as a test machine. It gets pretty annoying to have a physical keyboard and monitor to deal with when the system crashes. The obvious solution is to use a serial console. Sadly, all the “Solaris serial console howtos” leave a lot to be desired. As a result, I am going to document the steps here. I’m connecting from Solaris to Solaris. If you use Linux on one of the boxes, you will have to do it a little differently.

Test Box

First, let’s change the console speed from the default 9600 to a more reasonable 115200. In /etc/ttydefs change the console line to:

console:115200 hupcl opost onlcr:115200::console

Second, we need to tell the kernel to use the serial port as a console. Here, I’m going to assume that you are using the first serial port (i.e., ttya). So, open up your Grub config (/rpool/boot/grub/menu.lst assuming your root pool is rpool) and find the currently active entry.

You’ll see something like this:

title openindiana-8
findroot (pool_rpool,0,a)
bootfs rpool/ROOT/openindiana-8
splashimage /boot/splashimage.xpm
foreground FF0000
background A8A8A8
kernel$ /platform/i86pc/kernel/$ISADIR/unix -B $ZFS-BOOTFS
module$ /platform/i86pc/$ISADIR/boot_archive

We need to add two options. One to tell the kernel to use the serial port as a console, and one to tell it the serial config (rate, parity, etc.).

You’ll want to change the kernel$ line to:

kernel$ /platform/i86pc/kernel/$ISADIR/unix -B $ZFS-BOOTFS,console=ttya,ttya-mode="115200,8,n,1,-" -k

Note that we appended the options with commas to the existing -B. If you do not already have a -B, just add it and the two new options. The -k will make the kernel drop into the debugger when bad things happen. You can omit it if you just want a serial console without the debugger getting loaded.

There’s one last thing left to do. Let’s tell grub to use the same serial port and not use a splash image. This can be done by adding these lines to the top of your menu.lst:

serial --unit=0 --speed=115200
terminal serial

and removing (commenting out) the splashimage line.

So, what happens if you make all these changes and then beadm creates a new BE? The right thing! beadm will copy over all the kernel options so your new BE will just work.

Dev Box

I use OpenIndiana on my dev box. I could have used minicom, but I find minicom to be a huge pain unless you have a modem you want to talk to. I’m told that screen can talk to serial ports as well. I decided to keep things super-simple and configured tip.

First, one edits /etc/remote. I just changed the definition for hardwire to point to the first serial port (/dev/term/a) and use the right speed (115200):

hardwire:\
	:dv=/dev/term/a:br#115200:el=^C^S^Q^U^D:ie=%$:oe=^D:

Then, I can just run a simple command to get the other system:

$ tip hardwire

by JeffPC at January 19, 2013 10:16 PM

January 11, 2013

Nate Berry

Youtube TV control from my android tablet

Linux
update 131202: I would disregard most of this post since I’ve picked up a chromecast which, at $35 has made queueing up Youtube, Netflix, or HBOgo videos to the TV stupid easy from the android tablet. I replaced the underpowered atom box with an Intel NUC for more serious gaming (minecraft mainly). ====================== Ive got […]

by Nate at January 11, 2013 02:48 AM

January 06, 2013

Justin Dearing

Announcing SevenZipCmdLine.MSBuild

This was a quick and dirty thing born out of necessity: I needed to make zip files of PoshRunner so I could make its chocolatey package.

I made MSBuild tasks for creating 7zip and zip files out of the $(TargetDir) of an MSBuild project. There is a nuget package for it. Simply include it in your project via nuget and build the project from the command line with the following command:

%windir%\microsoft.net\framework\v4.0.30319\msbuild __PROJECT_FOLDER__\__PROJECT_FILE__ /t:SevenZipBin,ZipBin

This will create project.zip and project.7z in __PROJECT_FOLDER__\bin\Target. To see how to override some of the defaults, look at this msbuild file in PoshRunner.

Source code is available via a github repo, and patches are welcome!

by Justin at January 06, 2013 01:25 PM

PoshRunner now on SourceForge and Chocolatey

I’ve been periodically hacking away at PoshRunner. I have lots of plans for it. Some of these are rewriting some of it in C++, allowing you to log output to MongoDB and total world domination! However, today’s news is not as grand.

The first piece of news is I made a PoshRunner sourceforge project to distribute the binaries. To download the latest version, click here. Secondly, there is now a PoshRunner chocolatey package, so you can install it via chocolatey. Finally, there is not a lot of documentation on PoshRunner.exe, so here is the output of poshrunner -help.

Usage: poshrunner.exe [OPTION] [...]

Options:
   --appdomainname=NAME                                     Name to give the AppDomain the PowerShell script executes in.
   --config=CONFIGFILE                                      The name of the app.config file for the script. Default is scriptName.config
   -f SCRIPT, --script=SCRIPT                               Name of the script to run.
   -h, --help                                               Show help and exit
   --log4netconfig=LOG4NETCONFIGFILE                        Override the default config file for log4net.
   --log4netconfigtype=LOG4NETCONFIGTYPE                    The type of Log4Net configuration.
   --shadowcopy                                             Enable Assembly ShadowCopying.
   -v, --version                                            Show version info and exit
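
For example, to run a script in its own named AppDomain with shadow copying enabled (a hypothetical invocation based on the options above):

poshrunner.exe -f MyScript.ps1 --appdomainname=MyScriptDomain --shadowcopy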

by Justin at January 06, 2013 02:31 AM

January 04, 2013

Eitan Adler

Correctly Verifying an Email Address

Some services that accept email addresses want to ensure that these email addresses are valid.

There are multiple aspects to an email being valid:
  1. The address is syntactically valid.
  2. An SMTP server accepts mail for the address.
  3. A human being reads mail at the address.
  4. The address belongs to the person submitting it.

How does one verify an email address? I'll start with the wrong solutions and build up the correct one.

Possibility #0 - The Regular Expression

Discussions on a correct regular expression to parse email addresses are endless. They are almost always wrong. Even really basic pattern matching such as *@*.* is wrong: it will reject the valid email address n@ai.[5]

Even a fully correct regular expression does not tell you if the mailbox is valid or reachable.

This scores 0/4 on the validity checking scale.

Possibility #1 - The VRFY Command

The oldest mechanism for verifying an email address is the VRFY mechanism in RFC821 section 4.1.1:

VERIFY (VRFY) This command asks the receiver to confirm that the argument identifies a user. If it is a user name, the full name of the user (if known) and the fully specified mailbox are returned.

However this isn't sufficient. Most SMTP servers disable this feature for security and anti-spam reasons. This feature could be used to enumerate every username on the server to perform more targeted password guessing attacks:

Both SMTP VRFY and EXPN provide means for a potential spammer to test whether the addresses on his list are valid (VRFY)... Therefore, the MTA SHOULD control who is allowed to issue these commands. This may be "on/off" or it may use access lists similar to those mentioned previously.

This feature wasn't guaranteed to be useful at the time the RFC was written:[1]

The VRFY and EXPN commands are not included in the minimum implementation (Section 4.5.1), and are not required to work across relays when they are implemented.

Finally, even if VRFY was fully implemented there is no guarantee that a human being reads the mail sent to that particular mailbox.

All of this makes VRFY useless as a validity checking mechanism so it scores 1/4 on the validity checking scale.

Possibility #2 - Sending a Probe Message

With this method you try to connect to a mail server and pretend to send a real mail message, but cut off before sending the message content. This is wrong for the following reasons:

A system administrator that disabled VRFY has a policy of not allowing the testing of email addresses. Therefore the ability to test an email address by sending a probe should be considered a bug, and must not be used.

The system might be set up to detect signs of a probe, such as cutting off early, and may rate limit or block the sender.

In addition, the SMTP server may be temporarily down or the mailbox temporarily unavailable, but this method provides no resilience against failure. This is especially true if this mechanism is attempting to provide real-time feedback to the user after submitting a form.

This scores 1/4 on the validity checking scale.

Possibility #3 - Sending a Confirmation Mail

If one cares about whether a human is reading the mailbox, the simplest way to check is to send a confirmation mail. In the email include a link to a website (or set a special reply address) with some indication of what is being confirmed. For example, to confirm "user@example.com" is valid the link might be http://example.com/verify?email=user@example.com or http://example.com/verify?account=12345[2].

This method is resilient against temporary failures and forwarders. Temporary failures could be retried like a normal SMTP conversation.

This way it is unlikely that a non-human will trigger the verification email[3]. While this approach solves some of the concerns, it suffers from a fatal flaw:

It isn't secure. It is usually trivial to guess the ID number, email account, or other identifier. An attacker could sign up with someone else's email account and then go to the verification page for that user's account. It might be tempting to use a random ID, but randomness implementations are usually not secure.

This scores 3/4 on the validity checking scale.

Possibility #4 - Sending a Confirmation Mail + HMAC

The correct solution is to send a confirmation mail, but include a MAC of the identifier in the verification mechanism (reply address, or url) as well. A MAC is a construction used to authenticate a message by combining a secret key and the message contents. One family of constructions, HMAC, is a particularly good choice. This way the url might become http://example.com/verify?email=user@example.com&mac=74e6f7298a9c2d168935f58c001bad88[4]

Remember that an HMAC is a specific construction, not a naive hash. It would be wise to use a framework-native function such as PHP's hash_hmac. Failing to include a secret in the construction would make the MAC trivially defeated by brute force.
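For illustration, here is a minimal sketch in Python (the secret, the address, and the URL shape are placeholders; Python's standard hmac and hashlib modules do the work, and hmac.compare_digest gives a timing-safe comparison):

import hmac
import hashlib

SECRET = b"server-side secret, never sent to users"  # placeholder

def make_token(email):
    return hmac.new(SECRET, email.encode(), hashlib.sha256).hexdigest()

def verify(email, token):
    # compare_digest avoids leaking information through timing
    return hmac.compare_digest(make_token(email), token)

link = "http://example.com/verify?email=user@example.com&mac=" + make_token("user@example.com")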

This scores 4/4 on the validity checking scale.

Closing Notes

Getting email validation right is doable, but not as trivial as many of the existing solutions make it seem.

  1. Note that RFC1123 more specifically spells out that VRFY MUST be implemented but MAY be disabled.
  2. This is not my luggage password.
  3. It is still possible for an auto-reply bot to trigger reply-based verification schemes. Bots that click every link in received email are uncommon.
  4. This is HMAC-MD5. It isn't insecure, as collisions aren't important for HMAC. I chose it because it is short.
  5. n@ai is an in-use email address owned by a person named Ian:

     %dig +short ai MX
     10 mail.offshore.ai.

Thank you to bd for proofreading and reviewing this blog post.

by Eitan Adler (noreply@blogger.com) at January 04, 2013 11:40 AM

December 27, 2012

Justin Dearing

“Forking” a long running command to a new tab with ConEmu. The magic of -new_console:c

Here’s a quick tip I thought I’d share after being quite rightly told to RTFM by the author of ConEmu.

Suppose you are running FarManager from ConEmu and want to update all your chocolatey packages. You can do so with the command cup all. However, that will block your FarManager session until the cup all completes. You have four options to fix this:

  1. You can start a new tab in ConEmu with the menu. This is undesirable because you’re obviously a command line guy.
  2. You can press Shift+Enter after the cup all command. This is undesirable because unless you configure ConEmu to intercept every new command window, a regular console window will appear. Also, the console will close automatically upon completion.
  3. You can type cup all & pause and hit Shift+Enter to allow the window to stay open. Or
  4. You can type cup all -new_console:c to open a new tab that will execute the command, and not close upon completion.

Obviously I recommend option 4.

by Justin at December 27, 2012 03:14 AM

December 24, 2012

Eitan Adler

#!/bin/bash considered harmful

When one writes a shell script there are a variety of shebang lines that could be used:

  • #!/bin/sh
  • #!/usr/bin/env bash
  • #!/bin/bash

or one of many other options.

Of these, only the first two are possibly correct.

Using #!/bin/bash is wrong because:

  • Sometimes bash isn't installed.
  • If it is installed, it may not be in /bin.
  • If it is in /bin, the user may have decided to set PATH to use a different installation of bash. Using an absolute path like this overrides the user's choices.
  • bash shouldn't be used for scripts intended to be portable.

If you have bash-specific code use #!/usr/bin/env bash. If you want more portable code, try using Debian's checkbashisms to find instances of non-POSIX compliant shell scripting.

by Eitan Adler (noreply@blogger.com) at December 24, 2012 06:17 PM

December 22, 2012

Justin Dearing

How to reference the registry in MSBuild 4.0 (Visual Studio 2010) and later on a 64 bit OS

In the past I’ve written about using the Windows Registry to reference assembly paths in Visual Studio. In it I made reference to the seminal article New Registry syntax in MSBuild v3.5, which is the dialect Visual Studio 2008 speaks. That syntax has served me well until recently.

See, fate led me to writing a small C++/CLI program. In it I had to refer to some .NET assemblies that were not installed in the GAC. They were however installed as part of a software package that wrote its install path to the registry. So I figured out which value it wrote the install directory to and referenced it in the .vcxproj file using $(Registry:HKEY_LOCAL_MACHINE\Software\Company\Product@TargetDir). Unfortunately, it didn’t work!

I did some troubleshooting and discovered it worked when I built the vcxproj from the command line with msbuild.exe. It seemed logical to blame it on the fact that I was using C++. Devenv.exe (the Visual Studio executable) must have been treating .vcxproj files differently than csproj and vbproj files. Then suddenly it dawned on me: the problem was I was running on a 64 bit version of Windows! This was a relief, because it meant that .vcxproj files were not special or subject to unique bugs.

To make a long story short, Visual Studio is a 32 bit application, and by default when a 32 bit process interacts with the registry on a 64 bit version of Windows, HKEY_LOCAL_MACHINE\Software gets redirected to HKEY_LOCAL_MACHINE\Software\Wow6432Node. This MSDN article explains the gory details.

At first it seemed the only workaround was a custom MSBuild task like the MSBuild Extension Pack. I complained on twitter to Scott Hanselman (blog|twitter). He replied with this article talking about how page faults, addressable memory space, etc. were not an issue. That article made some good points. However, it didn’t address my (at the time) very real and legitimate concern. Scott said he’d ask around internally if I filed a connect bug, and he got David Kean (blog|twitter) involved in the conversation. I filed a connect bug. Then David pointed out a link to the MSBuild 4.0 property function GetRegistryValueFromView.

Here is a comparison of the old and new syntax using MSBuild <Message/> tasks, the printf() of MSBuild.

  <Target Name="BeforeBuild">
    <!-- Read the registry using the native MSBUILD 3.5 method: http://blogs.msdn.com/b/msbuild/archive/2007/05/04/new-registry-syntax-in-msbuild-v3-5.aspx -->
    <PropertyGroup>
      <MsBuildNativeProductId>$(Registry:HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion@ProductId)</MsBuildNativeProductId>
      <MsBuildNativeProductName>$(Registry:HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion@ProductName)</MsBuildNativeProductName>
      <MsBuild4NativeProductId>$([MSBuild]::GetRegistryValueFromView('HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion', 'ProductId', null, RegistryView.Registry64))</MsBuild4NativeProductId>
      <MsBuild4NativeProductName>$([MSBuild]::GetRegistryValueFromView('HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion', 'ProductName', null, RegistryView.Registry64))</MsBuild4NativeProductName>
    </PropertyGroup>
    <!-- Lets use the MSBuild Extension Pack (still no joy) http://www.msbuildextensionpack.com/help/4.0.5.0/html/9c8ecf24-3d8d-2b2d-e986-3e026dda95fe.htm -->
    <MSBuild.ExtensionPack.Computer.Registry TaskAction="Get" RegistryHive="LocalMachine" Key="SOFTWARE\Microsoft\Windows NT\CurrentVersion" Value="ProductId">
      <Output PropertyName="MsBuildExtProductId" TaskParameter="Data" />
    </MSBuild.ExtensionPack.Computer.Registry>
    <MSBuild.ExtensionPack.Computer.Registry TaskAction="Get" RegistryHive="LocalMachine" Key="SOFTWARE\Microsoft\Windows NT\CurrentVersion" Value="ProductName">
      <Output PropertyName="MsBuildExtProductName" TaskParameter="Data" />
    </MSBuild.ExtensionPack.Computer.Registry>
    <!-- And now RegistryView: http://msdn.microsoft.com/en-us/library/microsoft.win32.registryview.aspx -->
    <MSBuild.ExtensionPack.Computer.Registry TaskAction="Get" RegistryHive="LocalMachine" Key="SOFTWARE\Microsoft\Windows NT\CurrentVersion" Value="ProductId" RegistryView="Registry64">
      <Output PropertyName="MsBuildExt64ProductId" TaskParameter="Data" />
    </MSBuild.ExtensionPack.Computer.Registry>
    <MSBuild.ExtensionPack.Computer.Registry TaskAction="Get" RegistryHive="LocalMachine" Key="SOFTWARE\Microsoft\Windows NT\CurrentVersion" Value="ProductName" RegistryView="Registry64">
      <Output PropertyName="MsBuildExt64ProductName" TaskParameter="Data" />
    </MSBuild.ExtensionPack.Computer.Registry>
    <!-- All messages are of high importance so Visual Studio will display them by default. See: http://stackoverflow.com/questions/7557562/how-do-i-get-the-message-msbuild-task-that-shows-up-in-the-visual-studio-proje -->
    <Message Importance="High" Text="Using Msbuild Native: ProductId: $(MsBuildNativeProductId) ProductName: $(MsBuildNativeProductName)" />
    <Message Importance="High" Text="Using Msbuild v4 Native: ProductId: $(MsBuild4NativeProductId) ProductName: $(MsBuild4NativeProductName)" />
    <Message Importance="High" Text="Using Msbuild Extension Pack: ProductId: $(MsBuildExtProductId) ProductName: $(MsBuildExtProductName)" />
    <Message Importance="High" Text="Using Msbuild Extension Pack: ProductId: $(MsBuildExt64ProductId) ProductName: $(MsBuildExt64ProductName)" />
  </Target>

That MSBuild code has been tested via this github project on two machines running Visual Studio 2010 SP1. One has Windows XP SP3 32 bit and the other runs Windows 8 64 bit. I’ve verified that $([MSBuild]::GetRegistryValueFromView('HKEY_LOCAL_MACHINE\SOFTWARE\whatever', 'value', null, RegistryView.Registry64)) will give you the same value as you see in regedit.exe.

Yes, MSBuild 4.0, and therefore Visual Studio 2010, solved this problem and I simply didn’t google hard enough for the answer. However, I googled pretty hard, and I’m pretty good at googling. I didn’t think I was particularly rash in “pulling the Hanselman card.” The best I can do is write this blog post, comment on other blogs and answer questions on StackOverflow to fill the internet with references to the MSBuild syntax.

by Justin at December 22, 2012 06:24 AM

December 21, 2012

Eitan Adler

Cormen on Algorithms: Blogging my way through [1/?]

Two of my good friends recently started reading Introduction to Algorithms by Thomas H. Cormen, et al. Being unable to resist peer pressure I decided to follow and read along.

I plan on blogging my way through the chapters, writing my answers to the questions as I go through the book. Like most of my plans they don't always work out, but one could try.

Here it goes!

1.1-1: Give a real-world example in which each of the following computational problems appears: (a) sorting, (b) determining the best order for multiplying matrices, (c) finding the convex hull of a set of points.
Sorting - Sorting comes up in virtually every algorithm one could think of. Everything from optimizing monetary investments to efficient compression algorithms has to sort data at some point or another. A harder question might be: name one non-trivial algorithm that doesn't require sorting.
Multiplying Matrices - Graphics and scientific problems frequently require matrix operations.
Convex Hull - Collision detection for use in games, modeling biological systems, or other related work could make use of this.
1.1-2: Other than speed, what other measures of efficiency might one use in a real-world setting?
It is possible to optimize for (and against) every limited resource. For example, minimizing memory usage is important for embedded applications (and desktop ones too). Reducing total disk I/O is important to increase the longevity of hard drives. On a less technical note, optimizing for monetary cost or man hours expended is important too.
1.1-3: Select a data structure you have seen previously and discuss its strengths and limitations.
One of the most interesting data structures I know is the Bloom Filter. It is a probabilistic data structure that can determine that an element is NOT in a set, but can't determine definitively that an element is in a set. It works by hashing each element of the set into a fixed size bit array, and ORing that hash into the filter (which starts as all zeros). One can test whether an element is in the set by generating its hash and checking whether every bit that is 1 in the hash is also 1 in the filter. If so, then you have some degree of confidence that the element is in the set. Any negative means that what you are querying for has not been added. (A minimal sketch follows the list below.)
While most probabilistic structures have certain properties in common, bloom filters have a number of interesting pros and cons:
1. A negative result is definitive - if a query returns that an element has not been added then one knows this to be 100% true.
2. Since hashes are fixed size, the amount of memory a Bloom Filter uses is known and bounded.
3. Bloom filters can quickly become useless with large amounts of data. It is possible that every bit will be set to 1, which effectively makes the query a NOP.
4. It is impossible to remove data from a bloom filter. One can't just set all the bits of the hash to zero because that might remove other elements as well.
5. Without a second set of data there is no way to deterministically list all elements (unlike other probabilistic data structures such as Skip Lists).
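Here is a minimal sketch of such a filter in Python (the filter size, the number of hashes, and the way bit positions are derived are arbitrary choices for illustration, not from the book):

import hashlib

class BloomFilter:
    def __init__(self, size=1024, hashes=3):
        self.size = size
        self.hashes = hashes
        self.bits = [0] * size

    def _positions(self, item):
        # derive several bit positions from one item
        for i in range(self.hashes):
            h = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(h, 16) % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False is definitive; True only means "probably"
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter()
bf.add("hello")
print(bf.might_contain("hello"))    # True (probably in the set)
print(bf.might_contain("goodbye"))  # False is guaranteed correct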
1.1-4: How are the shortest path and traveling salesman problems similar? How are they different?
The shortest path problem is:
Given a weighted (undirected) graph $G$, a start vertex $V_0$ and an end vertex $V_e$, find a path between $V_0$ and $V_e$ such that the sum of the weights is minimized. This could be expanded to: given a weighted graph $G$, find a path between every pair of vertices such that the sum of the weights for each path is minimized.
Traveling salesman is defined as:
Given a weighted, undirected graph $G$ and a start vertex $V_0$, find a path starting and ending at $V_0$ such that it passes through every other vertex exactly once and the sum of the weights is minimized.
The traveling salesman problem might make use of the shortest path problem repeatedly in order to come up with the correct solution.
1.1-5: Come up with a real-world problem in which only the best solution will do. Then come up with a problem in which a solution that is "approximately" the best will do.
There are very few problems where one needs the objectively optimal solution. Mathematical questions are the only problems I could think of that need that level of accuracy. Virtually every problem needs a good enough solution. Some examples include finding a fast route for packets on the internet or locating a piece of data in a database.
update 2011-06-30: modified text of answers 1.1-3 and 1.1-5 to be more clear.

by Eitan Adler (noreply@blogger.com) at December 21, 2012 09:14 AM

    December 14, 2012

    Justin Dearing

    Trouble purchasing an MSDN subscription

    Recently I’ve decided to purchase a Visual Studio 2012 Professional  MSDN subscription. There are several reasons for this. First of all, my Visual Studio 2012 30 day trial ran out and I absolutely need the non-express edition of it for a side project. Secondly, I’d like to be able to test poshrunner in older versions of Windows. Thirdly, Having access to checked builds of Windows would allow me to lean more in my Windows Internals study group.

    I started my journey to an MSDN subscription on Saturday December 8th 2012. I was able to access my benefits Thursday December 12th. The four day journey was not pleasant.

    On Saturday I sat down credit card in hand and placed my order. I didn’t save the receipt (stupid I know). I got no confirmation email, and I did not see an authorization on my credit card. I waited. On Sunday I got notification that my order was pending. Perhaps they wanted to verify I wasn’t a software pirate. It seemed annoying that this wasn’t an instant process, but I remained patient and understanding. Then Tuesday I woke up to an email stating that my order was canceled.

    MSDN customer support hours are from 5:30PST to 17:30PST. I am on EST so I had to wait until 8:30 to call. I was already in the office at that time. I was told the bank did not accept my charge, but that if I placed the order again in 48 hours, the security check would be overridden and I would be able to download the software instantaneously  I tried buying the MSDN license again. It failed, but instantaneously.  I called my bank. I was told both authorizations were successful on their end. So I called Microsoft again. They claimed a system glitch prevented them from accepting the payment. The specific phrase “system glitch” was used consistently by several MSDN customer support representatives over several phone calls to describe instances when my bank authorized a charge but Microsoft rejected it. I never uttered that phrase once. I’m suspicious this is a common enough occurrence that there are procedures and guidelines in place documenting the “system glitch”.

    At this point they asked if I placed the second order from a computer on the same network as the first. I said no. The first order was placed at home and the second order was placed in the office. I was told to try again from the same network. I don’t have remote access to my home computer (take away my geek card) so I had to wait till I got home. I asked what would happen if it didn’t work when I tried again. I was told the only other option was to place the order over the phone, and that phone orders take three business days to process. I didn’t get home until after midnight so I didn’t try Tuesday night.

    Wednesday

    Wednesday I awoke and attempted to place the order. It failed. I went into the office,  called customer support and attempted a phone order. It failed, because my bank decided three identical charges for $1,305.41 (Microsoft collects sales tax in NY on top of the $1199 base price) seemed suspicious. Luckily I am able to fix that by responding to a text message CitiBank sent me. A chat session and a call later and the purchase seems to have been resolved. I would have my subscription on Monday.

    Thursday

    Thursday I got a call saying my order was canceled. However, T-Mobile dropped the call before I could deal with it. When I had some free time I called CitiBank. The first operator gave me some free airline miles and transfered me to Ashley, the fraud department specialist. Ashley ensured me Microsoft could bang my credit card as often and as many times as they wanted to. I then called MSDN support and talked to Chris.

    I summarized the situation for Chris. I told him I didn’t want to wait another three days for a phone order. He said he had no power to deal with that. He determined my order from Wednesday was still going through. After putting me on hold a few times, he said he would get me a welcome email that would let me download my MSDN products in 30 minutes. I got his name and a case number and he did just that.  I got a call back to ensure I was able to access my download, and everything worked just fine. I’m a little curious as to why his tune changed and he was able to get me my subscription number in thirty minutes though.

    Conclusion

    First of all, I have to thank CitiBank for their actions. At no point did they do anything wrong or fail to do anything. Secondly, the customer service staff at MSDN were very professional and understanding, despite my growing irritation. However, the fact is they were never able to tell me why my order was canceled. If they had at some point explained that I was flagged as a potential pirate, or given some other reason, I’d be a bit more understanding. Thirdly, why does the process take so long? I was able to buy a new car in about an hour. It took a few days for delivery because the package I wanted wasn’t on the lot, but it still took less than four days for the car to be driven off the lot (by someone else, because it was the car I learned stick on).

    The MSDN subscription sales model seems to make sense for businesses purchasing volume licenses: they take checks, and you can talk to a real person. It’s not at all optimized for the person who wants to buy one MSDN license “right now”. People like me are on the lower end of the revenue scale for Microsoft, but we are also the really passionate hobbyists, the entrepreneurs, and the people on the fence. While I’m still going to develop on the Microsoft stack for years, this experience has left a bad taste in my mouth for their purchase process, compared with, for example, JetBrains or RedGate.

    In the end, the real issue was the lack of transparency. It’s generally safe to assume that when you buy software for online delivery, you will have it within an hour. If Microsoft made it clear that it’s not that simple for them, first-time subscribers like me would be a little more understanding.

    by Justin at December 14, 2012 03:31 AM

    December 10, 2012

    Justin Dearing

    Announcing ILRepack-BuildTasks

    ILMerge is a great tool for creating a single executable out of multiple .NET assemblies. However, it has two limitations. The first is that it’s not open source, and you’re not supposed to include a copy in your public source code repos. The second is that it’s an executable, and therefore has to be called from the MSBuild post-build event as opposed to a proper MSBuild task. Each problem had its own mutually exclusive solution.

    For the first problem, Francois Valdy (blog|twitter) wrote IL-Repack, an open source clone of ILMerge. So now I could have an exe that could be included in GitHub repos. This allowed my projects (specifically poshrunner.exe) to have a merge step in the post-build. Although this was still a clunky batch file embedded in the csproj, it just worked.

    For the second problem, Marcus Griep (blog|twitter) created ILMerge Tasks. Since the merging APIs in ILMerge are all exposed as public members, you can simply reference the exe as if it were a DLL and drive the merge from code; a sketch of the idea appears below. Marcus did this in an MSBuild task DLL. However, that DLL still requires ILMerge.exe.
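
    To make the trick concrete, here is a minimal sketch of what such a task could look like. This is not Marcus’s actual code: the task name, property names, and error handling are my own illustration, and it assumes you’ve added a reference to ILMerge.exe so the ILMerging.ILMerge class is available.

    using System;
    using Microsoft.Build.Framework;
    using Microsoft.Build.Utilities;

    // Illustrative MSBuild task that references ILMerge.exe as if it were a
    // DLL and calls its public merging API directly.
    public class MergeAssemblies : Task
    {
        // The assemblies to merge; the first becomes the primary assembly.
        [Required]
        public ITaskItem[] InputAssemblies { get; set; }

        // Path of the merged output assembly.
        [Required]
        public string OutputFile { get; set; }

        public override bool Execute()
        {
            try
            {
                var merger = new ILMerging.ILMerge();
                merger.SetInputAssemblies(
                    Array.ConvertAll(InputAssemblies, i => i.ItemSpec));
                merger.OutputFile = OutputFile;
                merger.Merge();
                return true;
            }
            catch (Exception ex)
            {
                // Surface the failure as an MSBuild error rather than crashing the build.
                Log.LogErrorFromException(ex);
                return false;
            }
        }
    }

    Once that’s compiled into a DLL, a UsingTask element in the csproj wires it into the build, and the clunky post-build batch file goes away.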

    These solutions are no longer mutually exclusive. I’ve forked ILMerge-Tasks and pointed it at ILRepack instead of ILMerge (and contacted Marcus to see if he wants to incorporate my changes). The new project is called ILRepack-BuildTasks on GitHub. Enjoy!

    by Justin at December 10, 2012 03:46 AM

    A misleading SQL Error Message: Error: 18456, Severity: 14, State: 38

    On Friday I had to help a client out with an error that kept appearing in their event logs:

    Login failed for user ‘domain\user’. Reason: Failed to open the explicitly specified database. [CLIENT: 192.168.0.25]

    It took me a while to troubleshoot the error. The client’s internal system administrator (who was quite sharp) only had to call me in in the first place because the error was a little misleading. See, the first thing I did when I saw it was audit login failures. In the trace, the database was listed as master, and the user had full access to master. However, I later learned that the user was switching from master to a non-existent database, which was triggering this error. I figured this out thanks to Sadequl Hussain‘s article, SQL Server Error 18456: Finding the Missing Databases.
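
    If you want to see the failure mode for yourself, here is a minimal repro sketch. The server and database names are hypothetical; it assumes a login that is valid but points at a database (“ReportingDb” here) that doesn’t exist on the server.

    using System.Data.SqlClient;

    class LoginFailureRepro
    {
        static void Main()
        {
            // "ReportingDb" does not exist on the server, but the login itself is valid.
            var connectionString =
                "Server=localhost;Initial Catalog=ReportingDb;Integrated Security=true";

            using (var conn = new SqlConnection(connectionString))
            {
                // Throws a SqlException ("Cannot open database...") on the client,
                // while the server logs error 18456, severity 14, state 38.
                conn.Open();
            }
        }
    }

    The misleading part is that the login-failure audit doesn’t name ReportingDb; as described below, only the User Error Message event does.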

    Sadequl explains the how and the why in detail. However, the take-home is that you need to trace for the User Error Message event to get the message that tells you what database you are actually connecting to.

    This took me about an hour to solve. Honestly, it was a bit of a humbling experience: it took me an hour to figure out something a full-time senior DBA would probably solve in 15 minutes. Then again, I’ll probably be able to solve this error in 15 minutes myself going forward. Finally, the fact that it took me a while to find the one blog article that explained what the issue actually was shows how dependent I’ve become on Google.

    by Justin at December 10, 2012 03:25 AM

    December 05, 2012

    Justin Dearing

    The #MongoHelp twitter manifesto

    What is #mongohelp?

    #mongohelp is a hashtag on Twitter that members of the MongoDB community use for support.

    What’s appropriate to tag #mongohelp?

    For a tweet to be an appropriate use of the #mongohelp hashtag, one of the following two criteria must be met:

    1. You are asking a question related to MongoDB
    2. You are @replying to a #mongohelp question with an answer or a request for clarification

    Those are the rules. You can reply with a recommendation for a commercial product or service, but please disclose if you work for, partner with, or own the product. You can’t make unsolicited promotions with this hashtag, and you can’t post a link to your latest blog article to the tag unless you are answering a question.

    Any other guidelines?

    Twitter is about instant gratification, so if you ask a question on #mongohelp, it’s expected that you’ll stick around for 10–15 minutes for an answer. Also, if you have a long question to ask on #mongohelp, you should ask it in one of these forums and link to the question.

    This seems awfully familiar

    You’re absolutely right! I borrowed the idea from Aaron Nelson’s (blog|twitter) proposal, documented by Brent Ozar (blog|twitter), to create the #sqlhelp hashtag. I’ve spent the last year speaking about MongoDB at SQL Saturdays and observing both communities. Both communities are very self-organized and provide a lot of free help. The one thing I saw missing from the MongoDB community was a grassroots support tag to connect with others.

    by Justin at December 05, 2012 01:03 AM