# Planet LILUG

## December 01, 2013

### Josef "Jeff" Sipek

A couple of months ago, I decided to update my almost two and a half year old laptop. Twice.

First, I got more RAM. This upped it to 12 GB. While still on the low side for a box which actually gets to see some heavy usage (compiling illumos takes a couple of hours and generates a couple of GB of binaries), it was better than the 4 GB I used for way too long.

Second, I decided to bite the bullet and replaced the 320 GB disk with a 256 GB SSD (Samsung 840 Pro). Sadly, in the process I had the pleasure of reinstalling the system — both Windows 7 and OpenIndiana. Overall, the installation was uneventful as my Windows partition has no user data and my OI storage is split into two pools (one for system and one for my data).

The nice thing about reinstalling OI was getting back to a stock OI setup. A while ago, I managed to play with software packaging a bit too much and before I knew it I was using a customized fork of OI that I had no intention of maintaining. Of course, I didn’t realize this until it was too late to roll back. Oops. (Specifically, I had a custom pkg build which was incompatible with all versions of OI ever released.)

One of the painful things about my messed-up-OI install was that I was running a debug build of illumos. This made some things pretty slow. One such thing was boot. The ZFS related pieces took about a minute alone to complete. The whole boot procedure took about 2.5 minutes. Currently, with a non-debug build and an SSD, my laptop goes from Grub prompt to gdm login in about 40 seconds. I realize that this is an apples to oranges comparison.

I knew SSDs were supposed to be blazing fast, but I resisted getting one for the longest time mostly due to reliability concerns. What changed my mind? I got to use a couple of SSDs in my workstation at work. I saw the performance and I figured that ZFS would take care of alerting me of any corruption. Since most of my work is version controlled, chances are that I wouldn’t lose anything. Lastly, SSDs got a fair amount of improvements over the past few years.

## November 29, 2013

### Josef "Jeff" Sipek

#### Biometrics

Last week I got to spend a bit of time in NYC with obiwan. He’s never been in New York, so he did the tourist thing. I got to tag along on Friday. We went to the Statue of Liberty, Ellis Island, and a pizza place.

You may have noticed that this post is titled “Biometrics,” so what’s NYC got to do with biometrics? Pretty simple. In order to get into the Statue of Liberty, you have to first surrender your bags to a locker and then go through a metal detector. (This is the second time you go through a metal detector — the first is in Battery Park before you get on the boat to Liberty Island.) Once on Liberty Island, you go into a tent before the entrance where you get to leave your bags and $2. Among the maybe 500–600 lockers, there are two or three touch screen interfaces. You use these to rent a locker. After selecting the language you wish to communicate in and feeding in the money, a strobe light goes off, blinding you — this is to indicate where you are supposed to place your finger to have your fingerprint scanned. Your desire to rent a locker aside, you want to put your finger on the scanner just to make the strobe go away. Anyway, once the system is happy, it pops a random (unused) locker open and tells you to use it. What could possibly go wrong?

After visiting the statue, we got back to the tent to liberate the bags. At the same touch screen interface, we entered the locker number and, when prompted, scanned the correct finger. The fingerprint did not get recognized. After repeating the process about a dozen times, it was time to talk to the people running the place about the malfunction. The person asked for the locker number, went to the same interface that we used, used what looked like a one-wire key fob near the top of the device to get an admin interface, and then unlocked the locker. That’s it. No verification that we actually owned the contents of the locker. I suppose this is no different from a (physical) key operated locker for which you lost the key. The person in charge of renting the lockers has no way to verify your claim to the contents of the locker.
Physical keys, however, are extremely durable compared to the rather finicky fingerprint scanners that won’t recognize you if you look at them the wrong way (or have oily or dirty fingers in a different way than they expect). My guess is that the reason the park service went with a fingerprint based solution instead of a more traditional physical key based solution is simple: people can’t lose the locker keys if you don’t use them. Now, are cheap fingerprint readers accurate enough to not malfunction like this often? Are the people supervising the locker system generally this apathetic about opening a locker without any questions? I do not know, but my observations so far are not very positive. I suspect more expensive fingerprint readers would perform better. It just doesn’t make sense for something as cheap as a locker to use the more expensive readers.

## November 26, 2013

### Nate Berry

#### Increase disk size of Ubuntu guest running in VMware

A while ago I created a virtual machine (VM) under VMware 5.1 with Ubuntu as the guest OS. I wasn’t giving the task my full attention and I made a couple choices without thinking when setting this one up. The problem I ended up with is that I only allocated about 10GB to the VM […]

## November 04, 2013

### Eitan Adler

#### Two Factor Authentication for SSH (with Google Authenticator)

Two factor authentication is a method of ensuring that a user has a physical device in addition to their password when logging in to some service. This works by using a time (or counter) based code which is generated by the device and checked by the host machine. Google provides a service which allows one to use their phone as the physical device using a simple app. This service can be easily configured and greatly increases the security of your host.

#### Installing Dependencies

1. There is only one: the Google-Authenticator software itself:
   # pkg install pam_google_authenticator
2. On older FreeBSD installations you may use:
   # pkg_add -r pam_google_authenticator
   On Debian derived systems use:
   # apt-get install libpam-google-authenticator

#### User configuration

Each user must run "google-authenticator" once prior to being able to login with ssh. This will be followed by a series of yes/no prompts which are fairly self-explanatory. Note that the alternative to time-based is to use a counter. It is easy to lose track of which number you are at, so most people prefer time-based.

1. $ google-authenticator
   Do you want authentication tokens to be time-based (y/n)...
Make sure to save the URL or secret key generated here as it will be required later.

#### Host Configuration

To enable use of Authenticator, sshd must be set up to use PAM, and PAM must be configured to prompt for the Authenticator code.
1. Edit the file /etc/pam.d/sshd and add the following in the "auth" section prior to pam_unix:
   auth required pam_google_authenticator.so
2. Edit /etc/ssh/sshd_config and uncomment
ChallengeResponseAuthentication yes

1. Finally, the ssh server needs to reload its configuration:
# service sshd reload

#### Configure the device

1. Follow the instructions provided by Google to install the authentication app and setup the phone.

That is it. Try logging into your machine from a remote machine now.

Thanks bcallah for proof-reading this post.

## October 21, 2013

### Josef "Jeff" Sipek

#### Private Pilot, Honeymooning, etc.

Early September was a pretty busy time for me. First, I got my private pilot certificate. Then, three days later, Holly and I got married. We used this as an excuse to take four weeks off and have a nice long honeymoon in Europe (mostly in Prague).

Our flight to Prague (LKPR) had a layover at KJFK. While waiting at the gate at KDTW, I decided to talk to the pilots. They said I should stop by and say hi after we land at JFK. So I did. Holly tagged along.

I am impressed with the types of displays they use. Even with direct sunlight you can easily read them.

After about a week in Prague, we rented a plane (a 1982 Cessna 172P) with an instructor and flew around the Czech Republic looking at castles.

I did all the flying, but I let the instructor do all the radio work, and since he was way more familiar with the area he ended up acting sort of like a tour guide. Holly sat behind me and had a blast with the cameras. The flight took us over Bezděz, Ještěd, Bohemian Paradise, and Jičín where we stopped for tea. Then we took off again, and headed south over Konopiště, Karlštejn, and Křivoklát. Overall, I logged 3.1 hours in European airspace.

## October 05, 2013

### dotCOMmie

#### Debian GNU / Linux on Samsung ATIV Book 9 Plus

Samsung just recently released a new piece of kit, the ATIV Book 9 Plus. It's their top-of-the-line Ultrabook. Being in the market for a new laptop, when I heard the specs, I was hooked. Sure, it doesn't have the best CPU available in a laptop or even an amazing amount of RAM; in that regard it's kind of run of the mill. But that was enough for me. The really amazing thing is the screen, with a 3200x1800 resolution at 275 DPI. If you were to get a standalone monitor with a similar resolution, you'd be forking over anywhere from 50-200% of the value of the ATIV Book 9 Plus. Anyway, this is not a marketing pitch. As a GNU / Linux user, buying bleeding edge hardware can be a bit intimidating. The problem is that it's not clear if the hardware will work without too much fuss. I couldn't find any reports of folks running GNU / Linux on it, but decided to order one anyway.

My distro of choice is Debian GNU / Linux. So when the machine arrived, the first thing I did was try Debian Live. It did take some tinkering in the BIOS (press F2 on boot to enter config) to get it to boot, mostly because the BIOS UI is horrendous. In the end, disabling secure boot was pretty much all it took. Out of the box, most things worked, the exceptions being Wi-Fi and brightness control. At this point I was more or less convinced that getting GNU / Linux running on it would not be too hard.

I proceeded to install Debian from the stable net-boot CD. At first, with UEFI enabled but secure boot disabled, the installation went over fine, but when it came time to boot the machine, it simply would not work. It looked like the boot loader wasn't starting properly. I didn't care too much about UEFI, so I disabled it completely and re-installed Debian. This time things worked and Debian Stable booted up. I tweaked /etc/apt/sources.list, switching from Stable to Testing. I rebooted the machine and noticed that on boot the screen went black. It was rather obvious that the problem was with KMS. Likely the root of the problem was the new kernel (linux-image-3.10-3-amd64) which got pulled in during the upgrade to testing. The short term workaround is simple: disable KMS (add nomodeset to the kernel boot line in grub).
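As a sketch, on Debian the persistent form of the nomodeset workaround is usually an edit to /etc/default/grub followed by regenerating the grub config (your existing GRUB_CMDLINE_LINUX_DEFAULT contents will vary; "quiet" here is just the Debian default):

```shell
# /etc/default/grub -- append nomodeset to the default kernel command line
GRUB_CMDLINE_LINUX_DEFAULT="quiet nomodeset"
```

Then run update-grub as root and reboot; editing the boot line at the grub menu does the same thing for a single boot.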

So now I had a booting base system, but there was still the problem of Wi-Fi and KMS. I installed the latest firmware-iwlwifi, which had the required firmware for the Intel Corporation Wireless 7260. However, Wi-Fi still did not work. Fortunately, I came across a post on the Arch Linux wiki which states that the Wi-Fi card is only supported in Linux kernel >=3.11.

After an hour or so of tinkering with kernel configs, I got the latest kernel (3.11.3) to boot with working KMS and Wi-Fi. Long story short, until Debian moves to a kernel >=3.11 you'll need to compile your own or install my custom compiled package. With the latest kernel, pretty much everything works on this machine, including the things that are often tricky, like suspend, backlight control, touchscreen, and obviously Wi-Fi. The only remaining things to figure out are the volume and keyboard backlight control keys. But for now I'm making do with a software sound mixer. And the keyboard backlight can be adjusted with (values: 0-4):

echo "4" > /sys/class/leds/samsung\:\:kbd_backlight/brightness

So if you are looking to get a Samsung ATIV Book 9 Plus and wondering if it'll play nice with GNU / Linux, the answer is yes.

## August 28, 2013

### Josef "Jeff" Sipek

#### Optimizing for Failure

For the past two years, I’ve been working at Barracuda Networks on a key-value storage system called Moebius. As with any other software project, the development was more focused on stability and basic functionality at first. Lately, however, we managed to get some spare cycles to consider tackling some of the big features we’ve been wishing for, as well as revisiting some of the initial decisions. This includes error handling — specifically, how and what kinds of hardware failures should be handled. During this brainstorming, I made an interesting (in my opinion) observation about optimizing systems.

If you take any computer architecture or organization course, you will hear about  Amdahl’s law. Even if you never took an architecture course or just never heard of Amdahl, eventually you came to the realization that one should optimize for the common case. (Technically, Amdahl’s law is about parallel speedup but the idea of an upper bound on performance improvement applies here as well.) A couple of years ago, when I used to spend more time around architecture people, a day wouldn’t go by when I didn’t hear them focus on making the common case fast, and the uncommon case correct — as well as always guaranteeing forward progress.

My realization is that straightforward optimization for the common case is not sufficient. I’m not claiming that my realization is novel in any way. Simply that it surprised me more than it should.

Suppose you are writing a storage system. The common case (all hardware and software operate correctly) has been optimized and the whole storage system is performing great. Now, suppose that a hardware failure (or even a bug in other software!) occurs. Since this is a rare occurrence, you did not optimize for it. The system is still operating, but you want to take some corrective action. Sadly, the failure has caused the system to no longer operate under the common case. So, you have a degraded system whose performance is hindering your corrective action! Ouch!

The answer is to optimize not just for the common case, but for some uncommon cases. Which uncommon cases? Well, the most common ones. :) The problem in the above scenario could have been (hopefully) avoided by not just optimizing for the common case, but also optimizing for the common failure! This is the weird bit… optimize for failures because you will see them.

In the case of a storage system, some failures to consider include:

• one or more disks failing
• random bit flips on one or more disks
• one or more disks responding slowly
• one or more disks temporarily disappearing and shortly after reappearing
• low memory conditions

This list is far from exhaustive. You may even decide that some of these failures are outside the scope of your storage system’s reliability guarantees. But no matter what you decide, you need to keep in mind that your system will see failures and it must still behave well enough to not be a hindrance.

None of what I have written here is ground breaking. I just found it sufficiently different from what one normally hears that I thought I would write it up. Sorry architecture friends, the uncommon case needs to be fast too :)

## August 03, 2013

### John Lutz

#### a theoretical p2p dynamic messaging and/or voting system which is open sourced for the people.

The standard model of internet activity is client/server: many clients connecting to one central server. Another paradigm which is much less often used is Peer to Peer (p2p).

Peer to Peer allows each node on the internet to serve as well as receive as a client. This allows for a self-administering and self-correcting design. But what is most useful and trusted is that control is decentralized. This is good for many reasons: uptime improves and the potential for abuse of power is *greatly* diminished.

Some common examples of p2p in action are Bitcoin, Tor and BitTorrent, and all of these systems are Open Source. Open Source provides the user with the optional ability to compile the software themselves, and is critical in distributing control to every user involved. It also allows those black boxes called 'Apps', 'Programs', 'Systems' or 'Applications' to undergo peer review so that nothing suspicious happens without you knowing. (For example, bluetooth and webcams automatically set to record and run as the default behaviour.) [I have band aids covering all my personal laptop cameras.]

There are many services provided with Apple systems and most notably Windows that prevent us from knowing what traffic gets transmitted from our personal technological devices. Open Source allows us to not only conserve our personal information but also to extend and share what we've done with those we see fit. Open Source is equated here with Power to The People. And services such as support, administration or development can also use the monetary model. It all depends on each individual situation. The possibilities of Open Source are indeed amazing.

There are many forms of Open Source software, but none so widely known worldwide as an operating system called Linux. There are even many community- and group-modified forms of Linux. I have, with mixed success, used and administered Debian, Ubuntu, Red Hat, CentOS and SuSE. Depending on which of these (or the many other publicly downloadable variations) you choose, it can take as little as 25 minutes or as long as 3 days to fully install and configure the software and typical internet apps. You can literally change the source code (if you had a little swagger) to make it change its default behaviour.

A p2p system in which a bulletin board system (a variation of a standard non-p2p model like phpBB or vBulletin) delivered messages with time stamps and backups to other nodes could theoretically be created along the lines of how famous p2p systems like Bitcoin operate. Except in the case of this theoretical framework, instead of virtual currency it would carry messages, votes, blogs — any kind of data. This framework would be a nonstop behemoth using potentially hundreds of thousands or even millions of clients, who, in themselves, also act as servers. Being free from a centralized control system, in which the system changes according to the whims of the few, is vital to producing, like all good policies, a check and balance system free from tyranny.

I would suggest that if ISPs started to ban protocol ports (for example port XXXX where X=1 to 65535) as they have with BitTorrent, a programmer could creatively reprogram this new theoretical p2p message system to alternate between different ports dynamically. That way each very powerful ISP could not ban the people's p2p messaging system as they have banned, in my case, BitTorrent.

I hope you have fully understood what I have written here. If you have any more questions or would like a more in-depth look at what I've presented here, please let me know at @john_t_lutz on twitter, or here in this blog. Thank you.

Worldly Yours,
John Lutz

## July 16, 2013

### Josef "Jeff" Sipek

#### nftw(3)

I just found out about nftw — a libc function to walk a file tree. I did not realize that libc had such a high-level function. I bet it’ll end up saving me time (and code) at some point in the future.

int nftw(const char *path,
         int (*fn)(const char *, const struct stat *, int, struct FTW *),
         int depth, int flags);


Given a path, it executes the callback for each file and directory in that tree. Very cool.

Ok, as I write this post, I am told that nftw is a great way to write dangerous code. Aside from making it easy to write something equivalent to rm -rf, not specifying FTW_PHYS is dangerous because symlinks will get followed without any notification.

I guess I’ll play around with it a little to see what the right way (if any?) to use it is.

## June 29, 2013

### Josef "Jeff" Sipek

#### Isis

After several years of having a desktop at home that’s been unplugged and unused, I decided that it was time to make a home server to do some of my development on and just to keep files stored safely and redundantly. This was in August 2011. A lot has happened since then. First of all, I rebuilt the OpenIndiana (an Illumos-based distribution) setup with SmartOS (another Illumos-based distribution). Since I wrote most of this a long time ago, some of the information below is obsolete. I am sharing it anyway since others may find it useful. Toward the end of the post, I’ll go over the SmartOS rebuild. As you may have guessed, the hostname for this box ended up being Isis.

First of all, I should list my goals.

storage box
The obvious mix of digital photos, source code repositories, assorted documents, and email backups is easy enough to store. It however becomes a nightmare to keep track of where they are (i.e., which of the two external disks, public server (Odin), laptop drives, or desktop drives they are on). Since none of them are explicitly public, it makes sense to keep them near home instead of on my public server that’s in a data-center with a fairly slow uplink (1 Mbit/s burstable to 10 Mbit/s, billed at 95th percentile).
dev box
I have a fast enough laptop (Thinkpad T520), but a beefier system that I can let compile large amounts of code is always nice. It will also let me run several virtual machines and zones comfortably — for development, system administration experiments, and other fun stuff.
router
I have an old Linksys WRT54G (rev. 3) that has served me well over the years. Sadly, it is getting a bit in my way — IPv6 tunneling over IPv4 is difficult, the 100 Mbit/s switch makes it harder to transfer files between computers, etc. If I am making a server that will always be on, it should effortlessly handle NAT’ing my Comcast internet connection. Having a full-fledged server doing the routing will also let me do better traffic shaping & filtering to make the connection feel better.

Now that you know what sort of goals I have, let’s take a closer look at the requirements for the hardware.

1. reliable
2. friendly to OpenIndiana and ZFS
3. low-power
4. fast
5. virtualization assists (to run virtual machines at reasonable speed)
6. cheap
7. quiet
8. spacious (storage-wise)

While each one of them is pretty easy to accomplish on its own, their combination is much harder to achieve. Also note that the list is ordered from most to least important. As you will see, reliability dictated many of my choices.

#### The Shopping List

CPU
Intel Xeon E3-1230 Sandy Bridge 3.2GHz LGA 1155 80W Quad-Core Server Processor BX80623E31230
RAM
Kingston ValueRAM 4GB 240-Pin DDR3 SDRAM DDR3 1333 ECC Unbuffered Server Memory Model KVR1333D3E9S/4G
Motherboard
SUPERMICRO MBD-X9SCL-O LGA 1155 Intel C202 Micro ATX Intel Xeon E3 Server Motherboard
Case
SUPERMICRO CSE-743T-500B Black Pedestal Server Case
Data Drives (3)
Seagate Barracuda Green ST2000DL003 2TB 5900 RPM SATA 6.0Gb/s 3.5"
System Drives (2)
Western Digital WD1600BEVT 160 GB 5400RPM SATA 8 MB 2.5-Inch Notebook Hard Drive
NIC
Intel EXPI9301CT 10/100/1000Mbps PCI-Express Desktop Adapter Gigabit CT

To measure the power utilization, I got a P3 International P4400 Kill A Watt Electricity Usage Monitor. All my power usage numbers are based on watching the digital display.

#### Intel vs. AMD

I’ve read Constantin’s OpenSolaris ZFS Home Server Reference Design and I couldn’t help but agree that ECC should be a standard feature on all processors. Constantin pointed out that many more AMD processors support ECC and that as long as you got a motherboard that supported it as well you are set. I started looking around at AMD processors but my search was derailed by Joyent’s announcement that they ported KVM to Illumos — the core of OpenIndiana including the kernel. Unfortunately for AMD, this port supports only Intel CPUs. I switched gears and started looking at Intel CPUs.

In a way I wish I had a better reason for choosing Intel over AMD but that’s the truth. I didn’t want to wait for AMD’s processors to be supported by the KVM port.

So, why did I get a 3.2GHz Xeon (E3-1230)? I actually started by looking for motherboards. At first, I looked at desktop (read: cheap) motherboards. Sadly, none of the Intel-based boards I’ve seen supported ECC memory. Looking at server-class boards made the search for ECC support trivial. I was surprised to find a Supermicro motherboard (MBD-X9SCL-O) for $160. It supports up to 32 GB of ECC RAM (4x 8 GB DIMMs). Rather cheap, ECC memory, dual gigabit LAN (even though one of the LAN ports uses the Intel 82579, which was unsupported by OpenIndiana at the time), 6 SATA II ports — a nice board by any standard.

This motherboard uses the LGA 1155 socket, which more or less means that I was “stuck” with getting a Sandy Bridge processor. :-D The E3-1230 is one of the slower E3 series processors, but it is still very fast compared to most of the other processors in the same price range. Additionally, it’s “only” an 80 Watt chip, compared to many 95 or even 130 Watt chips from the previous series. There you have it: the processor was more or less determined by the motherboard choice. Well, that’s being rather unfair. It just ended up being a good combination of processor and motherboard — a cheap server board and a near-bottom-of-the-line processor that happens to be really sweet.

Now that I had a processor and a motherboard picked out, it was time to get RAM. In the past, I’ve had good luck with Kingston, and since it happened to be the cheapest ECC 4 GB DIMM on NewEgg, I got 4 — for a grand total of 16 GB.

#### Case

I will let you in on a secret: I love hotswap drive bays. They just make your life easier — from being able to lift a case up high to put it on a shelf without having to lift all those heavy drives at the same time, to quickly replacing a dead drive without taking the whole system down. I like my public server’s case (Supermicro CSE-743T-645B), but the 645 Watt power supply is really overkill for my needs.
The four 5000 RPM fans on the midplane are pretty loud when they go full speed. I looked around, and I found a 500 Watt (80%+ efficiency) variant of the case (CSE-743-500B). Still a beefy power supply, but closer to what one sees in high-end desktops. With this case, I get eight 3.5" hot-swap bays and three 5.25" external (non-hotswap) bays. This case shouldn’t be a limiting factor in any way. I intended to move my DVD+RW drive from my desktop, but that didn’t work out as well as I hoped.

#### Storage

At the time I was constructing Isis, I was experimenting with ZFS on OpenIndiana. I was more than impressed, and I wanted it to manage the storage on my home server. ZFS is more than just a filesystem; it is also a volume manager. In other words, you can give it multiple disks and tell it to put your data on them in several different ways that closely resemble RAID levels. It can stripe, mirror, or calculate one to three parities. Wikipedia has a nice article outlining ZFS’s features. Anyway, I strongly support ZFS’s attitude toward losing data — do everything to prevent it in the first place.

Hard drives are very interesting devices. Their reliability varies with many variables (e.g., manufacturing defects, firmware bugs). In general, manufacturers give you fairly meaningless looking, yet impressive sounding numbers about their drives’ reliability. Richard Elling made a great blog post where he analyzed ZFS RAID space versus Mean-Time-To-Data-Loss, or MTTDL for short. (Later, he analyzed a different MTTDL model.) The short version of the story is nicely summed up by a scatter plot from Richard’s blog: while it is for a specific model of a high-end server, it applies to storage in general. I like how the various types of redundancy clump up.

Anyway, how much do I care about my files? Most of my code lives in distributed version control systems, so losing one machine wouldn’t be a problem for those. The other files would be a bigger problem.
While it wouldn’t be a complete end of the world if I lost all my photos, I’d rather not lose them. This goes back to the requirements list — I prefer reliable over spacious. That’s why I went with a 3-way mirror of 2 TB Seagate Barracuda Green drives. It gets me only 2 TB of usable space, but at the same time I should be able to keep my files forever. These are the data drives. I also got two 2.5" 160 GB Western Digital laptop drives to hold the system files — mirrored, of course. Around the same time I was discovering that the only sane way to keep your files was mirroring, I stumbled across Constantin’s RAID Greed post. He basically says the same thing — use a 3-way mirror and your files will be happy.

Now, you might be asking… 2 TB, that’s not a lot of space. What if you outgrow it? My answer is simple: ZFS handles that for me. I can easily buy three more drives, plug them in, and add them as a second 3-way mirror, and ZFS will happily stripe across the two mirrors. I considered buying 6 disks right away, but realized that it would probably be at least 6-9 months before I’d have more than 2 TB of data. So, by postponing the purchase of the 3 additional drives, I could save money. It turns out that a year and a half later, I’m still below 70% of the 2 TB.

#### Miscellaneous

I knew that one of the on-board LAN ports was not yet supported by Illumos, and so I threw a PCI-e Gigabit ethernet card into the shopping cart. I went with an Intel gigabit card. Illumos has since gained support for 82579-based NICs, but I’m lazy and so I’m still using the PCI-e NIC.

#### Base System

As the ordered components started showing up, I started assembling them. Thankfully, the CPU, RAM, motherboard, and case showed up at the same time, preventing me from going crazy. The CPU came with a stock Intel heatsink. The system started up fine.
I went into the BIOS and did the usual new-system tweaking — make sure SATA ports are in AHCI mode, stagger the disk spinup to prevent unnecessary load peaks at boot, change the boot order to skip PXE, etc. While roaming around the menu options, I discovered that the motherboard can boot from iSCSI. Pretty neat, but useless for me on this system. The BIOS has a menu screen that displays the fan speeds and the system and processor temperatures. With the fan on the heatsink and only one midplane fan connected, the system ran at about 1°C above room temperature and the CPU at about 7°C above room temperature.

#### OS Installation

Anyway, it was time to install OpenIndiana. I put my desktop’s DVD+RW drive in the case and then realized that the motherboard doesn’t have any IDE ports! Oh well, time to use a USB flash drive instead. At this point, I had only the 2 system drives. I connected one to the first SATA port and put a 151 development snapshot (text installer) on my only USB flash drive. The installer booted just fine. Installation was uneventful. The one potentially out-of-the-ordinary thing I did was to not configure any networking. Instead, I set it up manually after the first boot, but more about that later.

With OI installed on one disk, it was time to set up the rpool mirror. I used Constantin’s Mirroring Your ZFS Root Pool as the general guide, even though it is pretty straightforward — duplicate the partition (and slice) scheme on the second disk, add the new slice to the root pool, and then install grub on it. Everything worked out nicely.

    # zpool status rpool
      pool: rpool
     state: ONLINE
      scan: scrub repaired 0 in 0h5m with 0 errors on Sun Sep 18 14:15:24 2011
    config:

            NAME          STATE     READ WRITE CKSUM
            rpool         ONLINE       0     0     0
              mirror-0    ONLINE       0     0     0
                c2t0d0s0  ONLINE       0     0     0
                c2t1d0s0  ONLINE       0     0     0

    errors: No known data errors

#### Networking

Since I wanted this box to act as a router, the network setup was a bit more… complicated (and quite possibly over-engineered).
This is why I elected to do all the network setup by hand later, rather than having to “fix” whatever damage the installer did. :) I powered it off, put in the extra ethernet card I got, and powered it back on. To my surprise, the new device didn’t show up in dladm. Then I remembered that I needed to trigger device reconfiguration. A short touch /reconfigure && reboot later, dladm listed two physical NICs.

As you can see, I decided that the routing should be done in a zone. This way, all the routing settings are nicely contained in a single place that does nothing else. Setting up the virtual interfaces was pretty easy thanks to dladm. Setting the static IP on the global zone was equally trivial.

# dladm create-vlan -l e1000g0 -v 11 vlan11
# dladm create-vnic -l e1000g0 vlan0
# dladm create-vnic -l e1000g0 internal0
# dladm create-vnic -l e1000g1 isp0
# dladm create-etherstub zoneswitch0
# dladm create-vnic -l zoneswitch0 zone_router0
# ipadm create-if internal0
# ipadm create-addr -T static -a local=10.0.0.2/24 internal0/v4

You might be wondering about the vlan11 interface that’s on a separate VLAN. The idea was to have my WRT54G continue serving as a wifi access point, but have all the traffic end up on VLAN #11. The router zone would then get to decide whether the user is worthy of LAN or Internet access. I never finished poking around the WRT54G to figure out how to have it dump everything on VLAN #11 instead of the default #0.

##### Router zone

OpenSolaris (and therefore all Illumos derivatives) has a wonderful feature called zones. It is essentially a super-lightweight virtualization mechanism. While talking to a couple of people on IRC, I decided that I, like them, would use a dedicated zone as a router.

Just before I set up the router zone, the storage disks arrived. The router zone ended up being stored on this array. See the storage section below for details about this storage pool.
After installing the zone via zonecfg and zoneadm, it was time to set up the routing and firewalling. First, install the ipfilter package (pkg install pkg:/network/ipfilter). Then it is time to configure the NAT and filter rules. NAT is easy to set up. Just plop a couple of lines into /etc/ipf/ipnat.conf:

map isp0 10.0.0.0/24 -> 0/32 proxy port ftp ftp/tcp
map isp0 10.0.0.0/24 -> 0/32 portmap tcp/udp auto
map isp0 10.0.0.0/24 -> 0/32
map isp0 10.11.0.0/24 -> 0/32 proxy port ftp ftp/tcp
map isp0 10.11.0.0/24 -> 0/32 portmap tcp/udp auto
map isp0 10.11.0.0/24 -> 0/32
map isp0 10.1.0.0/24 -> 0/32 proxy port ftp ftp/tcp
map isp0 10.1.0.0/24 -> 0/32 portmap tcp/udp auto
map isp0 10.1.0.0/24 -> 0/32

IPFilter is a bit trickier to set up, since the rules need to handle more cases. In general, I tried to be a bit paranoid about the rules. For example, I drop all traffic for IP addresses that don’t belong on that interface (I should never see 10.0.0.0/24 addresses on my ISP interface). The only snag was in the defaults for the ipfilter SMF service. By default, it expects you to put your rules into SMF properties. I wanted to use the more old-school approach of using a config file. Thankfully, I quickly found a blog post which helped me with it.

#### Storage, part 2

As the list of components implies, I wanted to make two arrays. I already mentioned the rpool mirror. Once the three 2 TB disks arrived, I hooked them up and created a 3-way mirror (zpool create storage mirror c2t3d0 c2t4d0 c2t5d0).

# zpool status storage
  pool: storage
 state: ONLINE
  scan: scrub repaired 0 in 0h0m with 0 errors on Sun Sep 18 14:10:22 2011
config:

        NAME        STATE     READ WRITE CKSUM
        storage     ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c2t3d0  ONLINE       0     0     0
            c2t4d0  ONLINE       0     0     0
            c2t5d0  ONLINE       0     0     0

errors: No known data errors

##### Deduplication & Compression

I suspected that there would be enough files that would be stored several times — system binaries for zones, clones of source trees, etc.
ZFS has built-in online deduplication, which stores each unique block only once. It’s easy enough to turn on: zfs set dedup=on storage. Additionally, ZFS has transparent data (and metadata) compression featuring the LZJB and gzip algorithms. I enabled dedup and kept compression off. Dedup did take care of the duplicate binaries between all the zones. It even took care of duplicates in my photo stash. (At some point, I managed to end up with several diverged copies of my photo stash. One of the first things I did with Isis was to dump all of them in the same place and start sorting them. Adobe Lightroom helped here quite a bit.) After a while, I came to the realization that for most workloads I run, dedup was wasteful and I would be better off disabling dedup and enabling light compression (i.e., LZJB).

##### $HOME

The installer puts the non-privileged user’s home directory onto the root pool. I did not want to keep it there since I now had the storage pool. After a bit of thought, I decided to zfs create storage/home and then transfer over the current home directory. I could have used cp(1) or rsync(1), but I thought it would be more fun (and a learning experience) to use zfs send and zfs recv. It went something like this:

# zfs snapshot rpool/export/home/jeffpc@snap
# zfs send rpool/export/home/jeffpc@snap | zfs recv storage/home/jeffpc


#### CIFS

Local storage is great, but there is only so much you can do with it. Sooner or later, you will want to access it from a different computer. There are many different ways to “export” your data, but as one might expect, they all have their benefits and drawbacks. ZFS makes it really easy to export data via NFS and CIFS. After a lot of thought, I decided that CIFS would work a bit better. The major benefit of CIFS over NFS is that it Just Works™ on all the major operating systems. That’s not to say that NFS does not work, but rather that it needs a bit more…convincing at times. This is especially true on Windows.

I followed the documentation for enabling CIFS on Solaris 11. Yes, I know, OpenIndiana isn’t Solaris 11, but this aspect was the same. This ended with me enabling sharing of several datasets like this:

# zfs set sharesmb=name=photos storage/photos


##### ACLs

The home directory shares are all done. The photos share, however, needs a bit more work. Specifically, it should be fully accessible to the users that are supposed to have access (i.e., jeffpc & holly). The easiest way I could find was to use ZFS ACLs.

First, I set the aclmode to passthrough (zfs set aclmode=passthrough storage). This will prevent a chmod(1) on a file or directory from blowing away all the ACEs (Access Control Entries). Then, on the share directory, I added two ACL entries that allow everything.

# /usr/bin/ls -dV /share/photos
drwxr-xr-x   2 jeffpc   root           4 Sep 23 09:12 /share/photos
owner@:rwxp--aARWcCos:-------:allow
group@:r-x---a-R-c--s:-------:allow
everyone@:r-x---a-R-c--s:-------:allow
# /usr/bin/chmod A+user:jeffpc:rwxpdDaARWcCos:fd:allow /share/photos
# /usr/bin/chmod A+user:holly:rwxpdDaARWcCos:fd:allow /share/photos
# /usr/bin/chmod A2- /share/photos # get rid of user
# /usr/bin/chmod A2- /share/photos # get rid of group
# /usr/bin/chmod A2- /share/photos # get rid of everyone
# /usr/bin/ls -dV /share/photos
drwx------+  2 jeffpc   root           4 Sep 23 09:12 /share/photos
user:jeffpc:rwxpdDaARWcCos:fd-----:allow
user:holly:rwxpdDaARWcCos:fd-----:allow


The first two chmod commands prepend two ACEs. The next three remove ACE number 2 (the third entry). Since the directory started off with three ACEs (representing the standard Unix permissions), the second set of chmods removes those, leaving only the two user ACEs behind.
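The index arithmetic is easier to see with a toy model. Here is a sketch using a plain Python list, following the narration above (an illustration of the index bookkeeping, not real ZFS ACLs):

```python
# Toy model of the ACL surgery above, using a plain Python list.
# The directory starts with three ACEs mirroring the Unix mode bits.
acl = ["owner@", "group@", "everyone@"]

# The two `chmod A+user:...` commands add the user ACEs at the front.
acl.insert(0, "user:jeffpc")
acl.insert(0, "user:holly")

# Each `chmod A2-` deletes whatever currently sits at index 2; doing it
# three times consumes the three original ACEs one by one.
for _ in range(3):
    del acl[2]

print(sorted(acl))   # ['user:holly', 'user:jeffpc']
```

Only the two user ACEs survive, which matches the final ls -dV output.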

##### Clients

That was easy! In case you are wondering, the Solaris/Illumos CIFS service does not allow guest access. You must login to use any of the shares.

Anyway, here’s the end result:

Pretty neat, eh?

#### Zones

Aside from the router zone, there were a number of other zones. Most of them were for Illumos and OpenIndiana development.

I don’t remember much of the details since this predates the SmartOS conversion.

#### Power

When I first measured the system, it was drawing about 40-45 Watts while idle. Now, I have Isis along with the WRT54G and a gigabit switch on a UPS that tells me that I’m using about 60 Watts when idle. The load can spike up quite a bit if I put load on the 4 Xeon cores and give the disks something to do. (After all, it is an 80 Watt CPU!) While this is by no means super low-power, it is low enough, and at the same time I have the capability to actually get work done instead of waiting for hours for something to compile.

#### SmartOS

As I already mentioned, I ended up rebuilding the system with SmartOS. SmartOS is not a general purpose distro. Rather, it strives to be a hypervisor with utilities that make guest management trivial. Guests can either be zones, or KVM-powered virtual machines. Here are the major changes from the OpenIndiana setup.

##### Storage — pools

SmartOS is one of those distros you do not install. It always boots from the network, a USB stick, or a CD. As a result, you do not need a system drive. This immediately obsoleted the two laptop drives. Conveniently, around the same time, Holly’s laptop suffered a disk failure, so Isis got to donate one of the now-unused 2.5" system disks.

SmartOS calls its data pool “zones”, which took a little bit of getting used to. There’s a way to import other pools, but I wanted to keep the settings as vanilla as possible.

At some point, I threw in an Intel 160 GB SSD to use for the L2ARC and ZIL.

Here’s what the pool looks like:

# zpool status
  pool: zones
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
        still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not
        support the features. See zpool-features(5) for details.
  scan: scrub repaired 0 in 2h59m with 0 errors on Sun Jan 13 08:37:37 2013
config:

        NAME        STATE     READ WRITE CKSUM
        zones       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c1t5d0  ONLINE       0     0     0
            c1t4d0  ONLINE       0     0     0
            c1t3d0  ONLINE       0     0     0
        logs
          c1t1d0s0  ONLINE       0     0     0
        cache
          c1t1d0s1  ONLINE       0     0     0

errors: No known data errors


In case you are wondering about the feature-related status message: I created the zones pool way back when Illumos (and therefore SmartOS) had only two ZFS features. Since then, Illumos has added one, and Joyent added another to SmartOS.

# zpool get all zones | /usr/xpg4/bin/grep -E '(PROP|feature)'
NAME   PROPERTY                   VALUE                      SOURCE
zones  feature@async_destroy      enabled                    local
zones  feature@empty_bpobj        active                     local
zones  feature@lz4_compress       disabled                   local
zones  feature@filesystem_limits  disabled                   local


I haven’t experimented with either of them enough to enable them on a production system I rely on so much.

##### Storage — deduplication & compression

The rebuild gave me a chance to start with a clean slate. Specifically, it gave me a chance to get rid of the dedup table. (The dedup table, DDT, is built as writes happen to a filesystem with dedup enabled.) Data deduplication relies on some form of data structure (the most trivial one is a hash table) that maps the hash of the data to the data. In ZFS, the DDT maps the SHA-256 of a block to that block’s address.

The reason I stopped using dedup on my systems was pretty straightforward (and not specific to ZFS). Every entry in the DDT has an overhead. So, ideally, every entry in the DDT is referenced at least twice. If a block is referenced only once, then one would be better off without the block taking up an entry in the DDT. Additionally, every time a reference is taken or released, the DDT needs to be updated. This causes very nasty random I/O under which spinning disks want to weep. It turns out that a “normal” user will have mostly unique data, rendering deduplication impractical.
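The overhead argument is easy to put into rough numbers. A back-of-envelope sketch, assuming roughly 320 bytes of in-core overhead per DDT entry and 128 KiB blocks (both figures are assumptions for illustration, not measurements from this pool):

```python
# Back-of-envelope DDT cost.  Both constants are rough assumptions:
# ~320 bytes of in-core overhead per DDT entry and a 128 KiB record.
BYTES_PER_ENTRY = 320
BLOCK_SIZE = 128 * 1024

def ddt_overhead(unique_bytes):
    """Approximate in-core DDT size for a given amount of unique data.

    Mostly-unique data is the worst case described above: every block
    pays for a DDT entry that is referenced exactly once."""
    unique_blocks = unique_bytes / BLOCK_SIZE
    return unique_blocks * BYTES_PER_ENTRY

# Under these assumptions, 1 TiB of unique data needs roughly 2.5 GiB
# of DDT: memory and random I/O spent on entries that never save a byte.
print(ddt_overhead(1024 ** 4) / 1024 ** 3)   # → 2.5
```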

That’s why I stopped using dedup. Instead, I became convinced that most of the time light compression is the way to go. Lightly compressing the data will result in I/O bandwidth savings as well as capacity savings with little overhead given today’s processor speeds versus I/O latencies. Since I haven’t had time to experiment with the recently integrated LZ4, I still use LZJB.
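To get a feel for the light-compression tradeoff, here is a small illustration using zlib from the Python standard library. Note that ZFS uses LZJB/gzip (and now LZ4), not zlib levels; this only demonstrates the general idea that even a cheap compression setting buys a large size reduction on compressible data:

```python
# Rough illustration of "light" vs. "heavy" compression using zlib.
# ZFS uses LZJB/gzip/LZ4 rather than zlib levels; the point is only
# that low-effort compression already shrinks compressible data a lot.
import zlib

data = b"clone of a source tree: mostly text, quite compressible\n" * 5000

light = zlib.compress(data, 1)   # fast, low-effort compression
heavy = zlib.compress(data, 9)   # slower, tries much harder

print(len(data), len(light), len(heavy))
```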

## June 22, 2013

### Josef "Jeff" Sipek

#### First Solo Cross-Country

A week ago (June 15), I went on my first solo cross country flight. The plan was to fly KARB → KMBS → KAMN → KARB. In case you don’t happen to have the Detroit sectional chart in front of you, this might help you visualize the scope of the flight.

| leg | distance | time |
| --- | --- | --- |
| KARB → KMBS | 79 nm | 47 min |
| KMBS → KAMN | 29 nm | 20 min |
| KAMN → KARB | 79 nm | 46 min |
| Total | 187 nm | 113 min |

Here’s the ground track (as recorded by the G1000) along with red dots for each of my checkpoints and a pink line connecting them. (Sadly, there’s no convenient zoom level that covers the entire track without excessive waste.)

As you can see, I didn’t quite overfly all the checkpoints. In my defense, the forecast winds were about 40 degrees off from reality during the first half of the flight. :)

Let’s examine each leg separately.

#### KARB → KMBS

My checkpoint by I-69 (southwest of Flint) was supposed to be the intersection of I-69 and the Pontiac VORTAC (PSI) radial 311. However, when I called up the FSS briefer, I found out that it was out of service. Thankfully, the Salem VORTAC (SVM) is very close, so I just used its radial 339 instead. Next time I’m using a VOR for any part of my planning, I’m going to check for NOTAMs before I make it part of my plan — redoing portions of the plan is tedious and not fun.

On the way to Saginaw, I was planning to go at 3500. (Yes, I know, it is a westerly direction and the rule (FAR 91.159) says even thousand + 500, but the clouds were not high enough to fly at 4500 and the rule only applies 3000 AGL and above — the ground around these parts is 700-1000 feet MSL.)

Right when I entered the downwind for runway 23, the tower cleared me to land. My clearance was quickly followed by the tower instructing a commuter jet to hold short of 23 because of landing traffic — me! Somehow, it is very satisfying to see a real plane (CRJ-200) have to wait for little ol’ me to land. (FlightAware tells me that it was FLG3903 flight to KDTW.)

While I was on taxiway C, they got cleared to take off. I couldn’t help it but to snap a photo.

It was a pretty slow day for Saginaw. The whole time I was on the radio with Saginaw approach, I got to hear maybe 5 planes total. The tower was even less busy. There were no planes around except for me and the commuter jet.

#### KMBS → KAMN

This leg of the flight was the hardest. First of all, it was only 29 nm, which equated to about 25 minutes of flying. The first four-ish minutes and the last five-ish were spent climbing and descending, so really there were only about 15 minutes of cruising. Not much time to begin with. I flew this leg by following the MBS VOR radial 248. My one and only checkpoint on this leg was midway — the beginning of a wind turbine farm. It took about 2 minutes longer to get there than planned, but the wind turbines were easy to see from a distance, so no problems there.

Following the VOR wasn’t difficult, but you can see in the ground track that I was meandering across it. As expected, it got easier the farther away from the station I got. Here’s the plot of the CDI deflection for this leg. The CSV file says that the units are “fsd” — I have no idea what that means.

I can’t really draw any conclusions because…well, I don’t know what the graph is telling me. Sure, it seems to get closer and closer to zero (which I assume is a good thing), but I can’t honestly say that I understand what the graph is saying.

The most difficult part was trying to stay at 2500 feet. For whatever reason, it felt like I was flying in sizable thermals. Since there were no thunderstorms in the area, I flew on fighting the updrafts. That was the difficult part. I suspect the wind turbines were built there because the area is windy.

KAMN is a decent size airport. Two plenty long runways for a 172 even on a hot day (5004x75 feet and 3197x75 feet). I didn’t stop by the FBO, so I have no idea how they are. I did not notice anyone else around during the couple of minutes I spent on the ground taxiing and getting ready for the next leg. Maybe it was just the overcast that made people stay indoors. Oh well. It is a nice airport, and I wouldn’t mind stopping there in the future if the need arose.

#### KAMN → KARB

Flying back to Ann Arbor was the easy part of the trip. The air calmed down enough that once trimmed, the plane more or less stayed at 3500 feet.

It apparently was a slow day for Lansing approach as well, as I got to hear a controller chatting with a pilot of a skydiving plane about how fast the skydivers fell to the ground. Sadly, I didn’t get to hear the end of the conversation since the controller told me to contact Detroit approach.

As far as the ground track is concerned, you can see two places where I stopped flying the planned heading and instead flew toward the next checkpoint visually. The first instance is a few miles north of KOZW. I spotted the airport, and since I knew I was supposed to overfly it, I turned to it and flew right over it. The second instance is by Whitmore Lake — there I looked into the distance and saw Ann Arbor. Knowing that the airport is on the south side, I just headed right toward it, ignoring the planned heading. As I mentioned before, in both cases the planned course was slightly off because the winds weren’t quite what the forecast said they would be.

You can’t tell from the rather low resolution of the map, but I got to fly right over Michigan Stadium. Sadly, I was a bit too busy flying the plane to take a photo of the field below me.

#### Next

With one solo cross country out of the way, I’m still trying to figure out where I want to go next. Currently, I am considering one of these flights (in no particular order):

| path | distance | time |
| --- | --- | --- |
| KARB KGRR KMOP KARB | 239 nm | 2h19m |
| KARB KBIV KJXN KARB | 220 nm | 2h08m |
| KARB KFWA KTOL KARB | 210 nm | 2h03m |
| KARB CRUXX KFWA KTOL KARB | 210 nm | 2h06m |
| KARB LFD KFWA KTOL KARB | 225 nm | 2h12m |
| KARB KAZO KOEB KTOL KARB | 207 nm | 2h01m |
| KARB KMBS KGRR KARB | 243 nm | 2h21m |
| KARB KGRR KEKM KARB | 266 nm | 2h40m |

#### Benchmark Assumptions

Today I came across a blog post about Running PostgreSQL on Compression-enabled ZFS. I found the article because (1) I am a fan of ZFS, and (2) transparent storage compression interests me. (Maybe I’ll talk about the latter in the future.)

Whoever ran the benchmark decided to compare ZFS with lzjb and ZFS with gzip against ext3. Their analysis states that ZFS-gzip is faster than ZFS-lzjb, which is faster than ext3. They admit that the benchmark is I/O bound. Then they state that compression effectively speeds up the disk I/O by making every byte transferred contain more information. The analysis goes down the drain right after that.

> While doing background research for this blog post we also got a chance to investigate some of the other features besides compression that differentiate ZFS from older file system architectures like ext3. One of the biggest differences is ZFS’s approach to scheduling disk IOs which employs explicit IO priorities, IOP reordering, and deadline scheduling in order to avoid flooding the request queues of disk controllers with pending requests.

Anyone who’s benchmarked a system should have a red flag going off after reading those sentences. My reaction was something along the lines: “What?! You know that there are at least three major differences between ZFS and ext3 in addition to compression and you still try to draw conclusions about compression effectiveness by comparing ZFS with compression against ext3?!”

All they had to do to make their analysis so much more interesting and keep me quiet was to include another set of numbers — ZFS without compression. That way, one can compare ext3 with ZFS-uncompressed to see how much difference the radically different filesystem design makes. Then one could compare ZFS-uncompressed with the lzjb and gzip data to see if compression helps. Based on the data presented, we have no idea if compression helps — we just know that compression and ZFS outperform ext3. What if ZFS without compression is 5x faster than ext3? Then using gzip (~4x faster than ext3) is actually not the fastest.

To be fair, knowing how modern disk drives behave, chances are that compressed ZFS is faster than uncompressed ZFS. Since CPU cycles are so plentiful these days, all my systems have lzjb compression enabled everywhere. I do this mostly to conserve space, but also in hopes of transferring less data to disk. Yes, this is exactly what their benchmark attempts to show. (I haven’t had a chance to experiment with the new-ish lz4 compression algorithm in ZFS.) My point here is solely about benchmark analysis and unfounded (or at least unstated) assumptions found in just about every benchmark out there.

## June 09, 2013

### Josef "Jeff" Sipek

#### Plotting G1000 EGT

It would seem that my two recent posts are getting noticed. On one of them, someone asked for the EGT R code I used.

After I get the CSV file off the SD card, I first clean it up. Currently, I just do it manually using Vim, but in the future I will probably script it. It turns out that Garmin decided to put a header of sorts at the beginning of each CSV. The header includes version and part numbers. I delete it. The next line appears to have units for each of the columns. I delete it as well. The remainder of the file is an almost normal CSV. I say almost normal because there’s an inordinate number of spaces around the values and commas. I use the power of Vim to remove all the spaces in the whole file with :%s/ //g. Then I save and quit.
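The cleanup is simple enough to script. Here is a sketch of what such a script might look like; the sample lines are made up, and the number of header lines to skip may differ between G1000 software versions:

```python
# Strip the Garmin log header and all stray spaces from a G1000 CSV.
# The sample lines and the two-line header are assumptions based on the
# description above; adjust `skip` for other firmware versions.

def munge(lines, skip=2):
    """Drop the first `skip` header lines and delete every space."""
    return [line.replace(" ", "") for line in lines[skip:]]

raw = [
    "#airframe_info, log version, part numbers, ...",
    "  yyy-mm-dd, hh:mm:ss,    degrees,   degrees,  ft Baro",
    "2013-06-01, 14:24:54,  42.22,  -83.74, 1029.81",
]
print(munge(raw))   # ['2013-06-01,14:24:54,42.22,-83.74,1029.81']
```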

Now that I have a pretty standard looking CSV, I let R do its thing.

> data <- read.csv("munged.csv")
> names(data)
[1] "LclDate"   "LclTime"   "UTCOfst"   "AtvWpt"    "Latitude"  "Longitude"
[7] "AltB"      "BaroA"     "AltMSL"    "OAT"       "IAS"       "GndSpd"
[13] "VSpd"      "Pitch"     "Roll"      "LatAc"     "NormAc"    "HDG"
[19] "TRK"       "volt1"     "volt2"     "amp1"      "amp2"      "FQtyL"
[25] "FQtyR"     "E1FFlow"   "E1OilT"    "E1OilP"    "E1RPM"     "E1CHT1"
[31] "E1CHT2"    "E1CHT3"    "E1CHT4"    "E1EGT1"    "E1EGT2"    "E1EGT3"
[37] "E1EGT4"    "AltGPS"    "TAS"       "HSIS"      "CRS"       "NAV1"
[43] "NAV2"      "COM1"      "COM2"      "HCDI"      "VCDI"      "WndSpd"
[49] "WndDr"     "WptDst"    "WptBrg"    "MagVar"    "AfcsOn"    "RollM"
[55] "PitchM"    "RollC"     "PichC"     "VSpdG"     "GPSfix"    "HAL"
[61] "VAL"       "HPLwas"    "HPLfd"     "VPLwas"


As you can see, there are lots of columns. Before doing any plotting, I like to convert the LclDate, LclTime, and UTCOfst columns into a single Time column. I also get rid of the three individual columns.

> data$Time <- as.POSIXct(paste(data$LclDate, data$LclTime, data$UTCOfst))
> data$LclDate <- NULL
> data$LclTime <- NULL
> data$UTCOfst <- NULL


Now, let’s focus on the EGT values — E1EGT1 through E1EGT4. E1 refers to the first engine (the 172 has only one); I suspect that a G1000 on a twin would have E1 and E2 values. I use the ggplot2 R package to do my graphing. I could pick colors for each of the four EGT lines, but I’m way too lazy and the color selection would not look anywhere near as nice as it should. (Note: if you have only two values to plot, R will use a red-ish and a blue-ish/green-ish color for the lines. Not exactly the smartest selection if your audience may include someone color-blind.) So, instead I let R do the hard work for me. First, I make a new data.frame that contains the time and the EGT values.

> tmp <- data.frame(Time=data$Time, C1=data$E1EGT1, C2=data$E1EGT2,
C3=data$E1EGT3, C4=data$E1EGT4)
> head(tmp)
                 Time      C1      C2      C3      C4
1 2013-06-01 14:24:54 1029.81 1016.49 1019.08 1098.67
2 2013-06-01 14:24:54 1029.81 1016.49 1019.08 1098.67
3 2013-06-01 14:24:55 1030.94 1017.57 1019.88 1095.38
4 2013-06-01 14:24:56 1031.92 1019.05 1022.81 1095.84
5 2013-06-01 14:24:57 1033.16 1020.23 1022.82 1092.38
6 2013-06-01 14:24:58 1034.54 1022.33 1023.72 1085.82


Then I use the reshape2 package to reorganize the data.

> library(reshape2)
> tmp <- melt(tmp, "Time", variable.name="Cylinder")
> head(tmp)
                 Time Cylinder   value
1 2013-06-01 14:24:54       C1 1029.81
2 2013-06-01 14:24:54       C1 1029.81
3 2013-06-01 14:24:55       C1 1030.94
4 2013-06-01 14:24:56       C1 1031.92
5 2013-06-01 14:24:57       C1 1033.16
6 2013-06-01 14:24:58       C1 1034.54


The melt function takes a data.frame along with a name of a column (I specified “Time”), and reshapes the data.frame. For each row, in the original data.frame, it takes all the columns not specified (e.g., not Time), and produces a row for each with a variable name being the column name and the value being that column’s value in the original row. Here’s a small example:

> df <- data.frame(x=c(1,2,3),y=c(4,5,6),z=c(7,8,9))
> df
x y z
1 1 4 7
2 2 5 8
3 3 6 9
> melt(df, "x")
x variable value
1 1        y     4
2 2        y     5
3 3        y     6
4 1        z     7
5 2        z     8
6 3        z     9


As you can see, the x values got duplicated since there were two other columns. Anyway, the one difference in my call to melt is the variable.name argument. I don’t want my variable name column to be called “variable” — I want it to be called “Cylinder.”
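The same reshaping is easy to model outside of R. Here is a minimal pure-Python sketch of what melt does, per the description above (an illustration, not the reshape2 implementation):

```python
# Minimal pure-Python model of reshape2::melt, per the description
# above: every column other than the id column becomes (id, variable,
# value) rows, one variable at a time.

def melt(rows, id_col, var_name="variable"):
    out = []
    columns = [c for c in rows[0] if c != id_col]
    for col in columns:
        for row in rows:
            out.append({id_col: row[id_col],
                        var_name: col,
                        "value": row[col]})
    return out

df = [{"x": 1, "y": 4, "z": 7},
      {"x": 2, "y": 5, "z": 8},
      {"x": 3, "y": 6, "z": 9}]
melted = melt(df, "x")
print(melted[0])   # {'x': 1, 'variable': 'y', 'value': 4}
```

As in the R example, the x values get duplicated: three rows for y, then three for z.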

At this point, the data is ready to be plotted.

> library(ggplot2)
> p <- ggplot(tmp)
> p <- p + ggtitle("Exhaust Gas Temperature")
> p <- p + ylab(expression(Temperature~(degree*F)))
> p <- p + geom_line(aes(x=Time, y=value, color=Cylinder))
> print(p)


That’s all there is to it! There may be a better way to do it, but this works for me. I use the same approach to plot the different altitude numbers, the speeds (TAS, IAS, GS), CHT, and fuel quantity.

## June 02, 2013

### Josef "Jeff" Sipek

#### Garmin G1000 Data Logging: Cross-Country Edition

About a week ago, I talked about G1000 data logging. In that post, I mentioned that cross-country flying would be interesting to visualize. Well, on Friday I got to do a mock pre-solo cross country phase check. I had the G1000 logging the trip.

First of all, the plan was to fly from KARB to KFPK. It’s a 51nm trip. I had four checkpoints. For the purposes of plotting the flight, I had to convert the pencil marks on my sectional chart to latitude and longitude.

> xc_checkpoints
Name Latitude Longitude
1      Chelsea 42.31667 -84.01667
2       Munith 42.37500 -84.20833
3       Leslie 42.45000 -84.43333
4 Eaton Rapids 42.51667 -84.65833


First of all, let’s take a look at the ground track.

In addition to just the ground track, I plotted here the first three checkpoints in red, the location of the plane every 5 minutes in blue (excluding all the data points near the airport), and some other places of interest in green.

As you can see, I was always a bit north of where I was supposed to be. Right after passing Leslie, I was told to divert to 69G. I figured out the true course, and tried to take the wind into account, but as you can see it didn’t go all that well at first. When I found myself next to some oil tanks way north of where I wanted to be, I turned southeast…a little bit too much. Eventually, I made it to Richmond which was, much like all grass fields, way too hard to spot. (I’m pretty sure that I will avoid all grass fields while on my solo cross countries.)

So, how about the altitude? The plan was to fly at 4500 feet, but due to the clouds being at about 3500, pilotage being the purpose of this exercise, and not planning on going all the way to KFPK anyway, we decided to just stay at 3000. At one point, 3000 seemed a bit too close to the clouds, so I ended up at 2900. Below is the altitude graph. For your convenience, I plotted horizontal lines at 2800, 2900, 3000, and 3100 feet. (Near the end, you can see 4 touch-and-gos and a full stop at KARB.)

While approaching my second checkpoint, Munith, I realized that it would be pretty hard to find. It’s a tiny little town, but sadly it is the biggest “landmark” around. So, I tuned in the JXN VOR and estimated that the 50 degree radial would go through Munith. While that wouldn’t give me my location, it would tell me when I was abeam Munith. Shortly after, I changed my estimate to the 60 degree radial. (It looks like 65 is the right answer.)

> summary(factor(data$NAV1))
109.6 114.3
 3192  1406
> summary(factor(data$CRS))
  36   37   42   44   47   48   49   50   52   57   59   60
1444    1    1    1    1    1    1  135    1    1    1 3010
> head(subset(data, HSIS=="NAV1")$Time, 1)
[1] "2013-05-31 09:43:23 EDT"
> head(subset(data, NAV1==109.6)$Time, 1)
[1] "2013-05-31 09:43:42 EDT"
> head(subset(data, CRS==50)$Time, 1)
[1] "2013-05-31 09:44:26 EDT"
> head(subset(data, CRS==60)$Time, 1)
[1] "2013-05-31 09:46:48 EDT"


When I got the plane, the NAV1 radio was tuned to 114.3 (SVM) with the 36 degree radial set. At 9:43:23, I switched the input for the HSI from GPS to NAV1; at 9:43:42, I tuned into 109.6 (JXN). 44 seconds later, I had the 50 degree radial set. Over two minutes later, I changed my mind and set the 60 degree radial, which stayed there for the remainder of the flight.

In my previous post about the G1000 data logging abilities, I mentioned that the engine related variables would be more interesting on a cross-country. Let’s take a look.

As you can see, when reaching 3000 feet (cf. the altitude graph) I pulled the power back to a cruise setting. Then I started leaning the mixture.

Interestingly, just pulling the power back causes a large fuel saving. Leaning helped save about one gallon per hour. While that’s not bad (~11%), it is not as significant as I thought it would be.

Since there was nowhere near as much maneuvering as previously, the fuel quantity graphs look way more useful. Again, we can see that the left tank is being used more.

The cylinder head temperature and exhaust gas temperature graphs are mostly boring. Unlike the previous graphs of CHT and EGT these clearly show a nice 30 minute long period of cruising. To be honest, I thought these graphs would be more interesting. I’ll probably keep plotting them in the future but not share them unless they show something interesting.

Same goes for the oil pressure and temperature graphs. They are kind of dull.

Anyway, that’s it for today. Hopefully, next time I’ll try to look at how close the plan was to reality.

## May 26, 2013

### Josef "Jeff" Sipek

#### Garmin G1000 Data Logging

About a month ago I talked about using R for plotting GPS coordinates. Recently I found out that the  Cessna 172 I fly in has had its G1000 avionics updated. Garmin has added the ability to store various flight data to a CSV file on an SD card every second. Aside from the obvious things such as date, time and GPS latitude/longitude/altitude it stores a ton of other variables. Here is a subset: indicated airspeed, vertical speed, outside air temperature, pitch attitude angle, roll attitude angle, lateral and vertical G forces, the NAV and COM frequencies tuned, wind direction and speed, fuel quantity (for each tank), fuel flow, volts and amps for the two buses, engine RPM, cylinder head temperature, and exhaust gas temperature. Neat, eh? I went for a short flight that was pretty boring as far as a number of these variables are concerned. Logs for cross-country flights will be much more interesting to examine.

With that said, I’m going to have fun with the 1-hour recording I have. If you don’t find plotting time series data interesting, you might want to stop reading now. :)

First of all, let’s take a look at the COM1 and COM2 radio settings.

    > unique(data$COM1)
    [1] 120.3
    > unique(data$COM2)
    [1] 134.55 120.30 121.60


Looks like I had 3 unique frequencies tuned into COM2 and only one for COM1. I always try to get the ATIS on COM2 (134.55 at KARB), then I switch to the ground frequency (121.6 at KARB). This way, I know that COM2 both receives and transmits. Let's see how long I've been on the ATIS frequency…

    > summary(factor(data$COM2))
     120.3  121.6 134.55 
         1   3303     70 

It makes sense: between listening to the ATIS and tuning in the ground, I spent 70 seconds listening to 134.55. The tower frequency (120.3 at KARB) showed up for a second because I switched away from the ATIS only to realize that I didn't tune in the ground yet. Graphing these values doesn't make sense.

I didn't use the NAV radios, so they stayed tuned to 114.3 and 109.6. Those are the Salem and Jackson VORs, respectively. (Whoever used the NAV radios last left these tuned in.)

To keep track of one's altitude, one must set the altimeter to what a nearby weather station says. The setting is in inches of mercury. The ATIS said that 30.38 was the setting to use. The altimeter was set to 30.31 when I got it. You can see that it took me a couple of seconds to turn the knob far enough. Again, graphing this variable is pointless. It would be more interesting during a longer flight where the barometric pressure changed a bit.

    > summary(factor(data$BaroA))
    30.31 30.32 30.36 30.38 
      262     1     1  3110 
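Since the log holds one sample per second, `summary(factor(...))` is really just a frequency tally, and each count is directly the number of seconds spent on that value. A rough Python equivalent, as a sketch using the COM2 counts quoted above:

```python
from collections import Counter

# One-second samples of the tuned COM2 frequency, mirroring the counts
# above: 1 second on 120.3, 3303 seconds on 121.6, 70 seconds on 134.55.
com2 = [120.3] * 1 + [121.6] * 3303 + [134.55] * 70

tally = Counter(com2)
for freq in sorted(tally):
    print(freq, tally[freq])
```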


Ok, ok… time to make some graphs… First up, let’s take a look at the outside air temperature (in °C).

    > summary(data$OAT)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
        4.0     6.8    12.2    11.5    16.0    18.5 

In case you didn't know, the air temperature drops about 2°C every 1000 feet. Given that, you might already be guessing that after I took off, I climbed a couple of thousand feet. Here, I plotted both the altitude given by the GPS (MSL as well as WGS84) and the altitude given by the altimeter. You can see that around 12:12, I set the altimeter, which caused the indicated altitude to jump up a little bit.

Let's take a look at the difference between the two. Again, we can see the altimeter setting changing with the sharp ~60 foot jump at about 12:12. The discrepancy between the indicated altitude and the actual (GPS) altitude may be alarming at first, but keep in mind that even though the altimeter may be off from where you truly are, the whole air traffic system plays the same game. In other words, every aircraft and every controller uses the altimeter-based altitudes, so there is no confusion. In yet other words, if everyone is off by the same amount, no one gets hurt. :)

Ok! It's time to look at all the various speeds. The G1000 reports indicated airspeed (IAS), true airspeed (TAS), and ground speed (GS). We can see the taxiing to and from the runway — ground speed around 10 kts. (Note to self, taxi slower.) The ground speed is either more or less than the airspeed depending on the wind speed.

Moving along, let's examine the lateral and normal accelerations. The normal acceleration is the seat pushing "up", while the lateral acceleration is the side-to-side "sliding in the seat" acceleration. (Note: I am not actually sure which direction the G1000 considers negative lateral acceleration.) Ideally, there is no lateral acceleration. (See coordinated flight.) I'm still learning. :)

As you can see, there are several outliers. So, why not look at them! Let's consider an outlier any point with more than 0.1 G of lateral acceleration. (I chose this value arbitrarily.)
    > nrow(subset(data, abs(LatAc) > 0.1))
    [1] 41
    > nrow(subset(data, abs(LatAc) > 0.1 & AltB < 2000))
    [1] 28

As far as lateral acceleration goes, there were only 41 points beyond 0.1 Gs, 28 of which were below 2000 feet. (KARB's pattern altitude is 1800 feet, so 2000 should be enough to easily cover any deviation.) Both of these counts, however, include all the taxiing. A turn during a taxi will result in a lateral acceleration, so let's ignore all the points when we're going below 25 kts.

    > nrow(subset(data, abs(LatAc) > 0.1 & GndSpd > 25))
    [1] 26
    > nrow(subset(data, abs(LatAc) > 0.1 & AltB < 2000 & GndSpd > 25))
    [1] 13

Much better! Only 26 points total, 13 below 2000 feet. Where did these points happen? (Excuse the low resolution of the map.) You can also see the path I flew — taking off from runway 6, making a left turn to fly west to the practice area. The moment I took off, I noticed that the thermals were not going to make this a nice smooth ride. I think that's why there are at least three points right by the highway while I was still climbing out of KARB. The air did get smoother higher up, but it still wasn't a nice calm flight like the ones I've gotten used to during the winter. Looking at the map, I wonder if some of these points were due to abrupt power changes.

Here's a close-up on the airport. This time, the point color indicates the amount of acceleration. There are only 4 points displayed. Interestingly, three of the four points are negative. Let's take a look.

                        Time LatAc  AltB  E1RPM
    2594 2013-05-25 12:52:10 -0.11 879.6 2481.1
    2846 2013-05-25 12:56:31 -0.13 831.6  895.8
    2847 2013-05-25 12:56:32  0.18 831.6  927.4
    2865 2013-05-25 12:56:50 -0.13 955.6 2541.5

The middle two are a second apart. Based on the altitude, it looks like the plane was on the ground. Based on the engine RPMs, it looks like it was within a second or two of touchdown. Chances are that it was just the nose not quite aligned with the direction of travel.
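For readers who don't speak R, the same outlier filtering is easy to sketch in Python. The field names mirror the columns used in the subsets above; the sample values are invented:

```python
# Each sample carries lateral acceleration (G), indicated altitude (ft),
# and ground speed (kts) -- the same columns the R subsets use.
samples = [
    {"LatAc": 0.15, "AltB": 1500, "GndSpd": 10},   # taxi turn: too slow, ignored
    {"LatAc": 0.15, "AltB": 900,  "GndSpd": 60},   # low-altitude outlier
    {"LatAc": -0.12, "AltB": 3500, "GndSpd": 95},  # outlier up high
    {"LatAc": 0.01, "AltB": 3500, "GndSpd": 95},   # coordinated flight: fine
]

def outliers(rows, threshold=0.1, min_speed=25):
    """Samples with |lateral G| over the threshold, excluding slow taxiing."""
    return [r for r in rows
            if abs(r["LatAc"]) > threshold and r["GndSpd"] > min_speed]

flying = outliers(samples)
low = [r for r in flying if r["AltB"] < 2000]
print(len(flying), len(low))
```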
The other two points are likely thermals tossing the plane about a bit — the first point is from about 50 feet above ground, the last is from about 120 feet. Ok, I'm curious…

    > data[c(2835:2850),c("Time","LatAc","AltB","E1RPM","GndSpd")]
                        Time LatAc  AltB  E1RPM GndSpd
    2835 2013-05-25 12:56:20 -0.02 876.6 1427.9  66.71
    2836 2013-05-25 12:56:21  0.01 873.6 1077.1  65.71
    2837 2013-05-25 12:56:22  0.01 864.6  982.4  64.21
    2838 2013-05-25 12:56:23  0.04 861.6  994.1  62.77
    2839 2013-05-25 12:56:24  0.01 858.6  982.6  61.54
    2840 2013-05-25 12:56:25  0.01 852.6  988.2  60.18
    2841 2013-05-25 12:56:26 -0.02 845.6  959.0  58.91
    2842 2013-05-25 12:56:27  0.00 846.6  945.5  57.73
    2843 2013-05-25 12:56:28  0.01 844.6  930.9  56.53
    2844 2013-05-25 12:56:29  0.10 834.6  908.0  55.16
    2845 2013-05-25 12:56:30 -0.01 827.6  886.6  54.16
    2846 2013-05-25 12:56:31 -0.13 831.6  895.8  52.71
    2847 2013-05-25 12:56:32  0.18 831.6  927.4  51.49
    2848 2013-05-25 12:56:33 -0.06 831.6  982.0  50.21
    2849 2013-05-25 12:56:34  0.05 840.6 1494.0  49.39
    2850 2013-05-25 12:56:35 -0.07 833.6 2249.7  48.76

The altitudes look a little out of whack, but otherwise it makes sense. #2835 was probably the time the throttle was pulled to idle. Between #2848 and #2849 the throttle went full in. The ground was most likely around 832 feet, and touchdown was likely at #2846, as I guessed earlier.

Let's plot the engine related values. First up, engine RPMs. It is pretty boring. You can see the ~800 during taxi; the 1800 during the runup; the 2500 during takeoff; 2200 during cruise; and after 12:50 you can see the go-around, touch-n-go, and full stop.

Next up, cylinder head temperature (in °F) and exhaust gas temperature (also in °F). Since the plane has a 4 cylinder engine, there are four lines on each graph. As I was maneuvering most of the time, I did not get a chance to try to lean the engine. On a cross country, it would be pretty interesting to see the temperature go up as a result of leaning.

Moving on, let's look at fuel consumption. This is really weird.
For the longest time, I knew that the plane used more fuel from the left tank, but this is the first time I have solid evidence. (Yes, the fuel selector was on "Both".) The fuel flow graph is rather boring — it very closely resembles the RPM graph.

Ok, two more engine related plots. It is mildly interesting that the temperature never really goes down while the pressure seems to be correlated with the RPMs.

There are two variables with the vertical speed — one is GPS based while the other is barometer based. As you can see, the two appear to be very similar. Let's take a look at the delta. In addition to just a plain old subtraction, you can see the 60-second moving average. Not very interesting. Even though the two sometimes are off by as much as 560 feet/minute, the differences are very short-lived. Furthermore, the differences are pretty well distributed, with half of them being within about 50 feet/minute.

    > summary(data$VSpd - data$VSpdG)
         Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
    -559.8000  -49.2800    0.4950    0.8252   53.0600  563.4000 
    > summary(SMA(data$VSpd - data$VSpdG), 2)
         Min.   1st Qu.    Median      Mean   3rd Qu.      Max.      NA's 
    -240.2000  -22.2200    0.6940    0.8226   25.4700  226.7000         9 

Ok, last but not least, the CSV contains the pitch and roll angles. I'll have to think about what sort of creative analysis I can do. The only thing that jumps to mind is the mediocre S-turn around 12:40, where the roll changed from about 20 degrees to -25 degrees.

I completely ignored the volts and amps variables (for each of the two busses), all the navigation related variables (waypoint identifier, bearing, and distance, HSI source, course, CDI/GS deflection), wind (direction and speed), as well as ground track, magnetic heading and variation, GPS fix (it was always 3D), GPS horizontal/vertical alert limit, and WAAS GPS horizontal/vertical protection level (I don't think the avionics can handle WAAS — the columns were always empty).
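As an aside, the moving average used for the smoothed delta is nothing exotic. A Python sketch of a plain windowed mean (TTR's SMA in R behaves similarly, reporting NA until the window fills):

```python
def sma(values, window):
    """Simple moving average; positions before the window fills get None,
    analogous to the NA's SMA() reports."""
    out = [None] * (window - 1)
    for i in range(window - 1, len(values)):
        out.append(sum(values[i - window + 1:i + 1]) / window)
    return out

# A toy delta between two noisy vertical-speed series, smoothed over 3 samples.
delta = [100, -100, 0, 300, -300]
print(sma(delta, 3))
```

Note how short-lived spikes of opposite sign cancel out under the average, which is exactly why the smoothed delta above stays small.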
Additionally, since I wasn't using the autopilot, a number of the fields are blank (Autopilot On/Off, mode, commands).

#### Ideas

A while ago I learned about CloudAhoy. Their iPhone/iPad app uses the GPS to record your flight. Then, they do some number crunching to figure out what kind of maneuvers you were doing. (I contacted them a while ago to see if one could upload a GPS trace instead of using their app; sadly, it was not possible. I do not know if that has changed since.) I think it'd be kind of cool to write a (R?) script that'd take the G1000 recording and do similar analysis. The big difference is the ability to use the great number of other variables to evaluate the pilot's control of the airplane — ranging from coordinated flight and dangerous maneuvers (banking too aggressively while slow), to "did you forget to lean?".

## May 19, 2013

### Justin Dearing

#### Creating a minimally viable CentOS instance for SSH X11 Forwarding

I recently needed to set up a CentOS 6.4 VM for Java development. I wanted to be able to run Eclipse STS on said VM and display the X11 windows remotely on my Windows 7 desktop via XMing. I saw no reason for the CentOS VM to have a local X11 server. I'm quite comfortable with the Linux command line, so I decided to share briefly how to go from a CentOS minimal install to something actually useful for getting work done.

- /usr/bin/man: The minimal install installs man pages, but not the man command. This is an odd choice. `yum install man` will fix that.
- vim: There is a bare bones install of vim included by default that is only accessible via vi. If you want a more robust version of vim, `yum install vim`.
- X11 forwarding: You need the xauth package and fonts. `yum install xauth` will allow X11 forwarding to work. `yum groupinstall fonts` will install a set of fonts.
- A terminal for absolute minimal viability: `yum install xterm` will give you a terminal. I prefer terminator, which is available through rpmforge.
- RpmForge (now RepoForge): CentOS is based on Red Hat Enterprise Linux. Therefore it focuses on being a good production server, not a developer environment. You will probably need rpmforge to get some of the packages you want. The directions for adding RpmForge to your yum repositories are here.
- terminator: This is my terminal emulator of choice. Once you have added rpmforge, `yum install terminator`.
- gcc, glibc, etc: Honestly, you can usually live without these if you stick to precompiled RPMs and you're not using gcc for development. If you need to build a kernel module, `yum install kernel-devel gcc make` should get you what you need.

From here, you can install the stuff you need for your development environment for your language, framework, and SCM of choice.

## May 11, 2013

### Justin Dearing

#### When your PowerShell cmdlet doesn't return anything, use -PassThru

The other day I was mounting an ISO in Windows 8 via the Mount-DiskImage command. Since I was mounting the disk image in a script, I needed to know the drive letter it was mounted to so the script could access the files contained within. However, Mount-DiskImage was not returning anything. I didn't want to go through the hack of listing drives before and after I mounted the disk image, or explicitly assigning the drive letter. Both would leave me open to race conditions if another drive was mounted by another process while my script ran.

I was at a loss for what to do. Then, I remembered the -PassThru parameter, which I am quite fond of using with Add-Type. See, certain cmdlets, like Mount-DiskImage and Add-Type, don't return pipeline output by default. For Add-Type, this makes sense. You rarely want to see a list of the types you just added, unless you're exploring the classes in a DLL from the command line. However, for Mount-DiskImage, defaulting to no output was a questionable decision IMHO.

Now in the case of Mount-DiskImage, -PassThru doesn't return the drive letter.
However, it does return an object that you can pipe to Get-Volume, which does return an object with a DriveLetter property. To figure that out, I had to ask on Stack Overflow.

tl;dr: If your PowerShell cmdlet doesn't return any output, try -PassThru. If you need the drive letter of a disk image mounted with Mount-DiskImage, pipe the output through Get-Volume. For a more in depth treatise of -PassThru, check out this Scripting Guy article by Ed Wilson (blog|twitter).

#### Getting the Drive Letter of a disk image mounted with WinCdEmu

In my last post, I talked about mounting disk images in Windows 8. Both Windows 8 and 2012 include native support for mounting ISO images as drives. However, in prior versions of Windows you needed a third party tool to do this. Since I have a preference for open source, my tool of choice before Windows 8 was WinCdEmu. Today, I decided to see if it was possible to determine the drive letter of an ISO mounted by WinCdEmu with PowerShell.

A quick search of the internet revealed that WinCdEmu contains a 32 bit command line tool called batchmnt.exe, and a 64 bit counterpart called batchmnt64.exe. These tools were meant for command line automation. While I knew there would be no .NET libraries in WinCdEmu, I did have hope there would be a COM object I could use with New-Object. Unfortunately, all the COM objects were for Windows Explorer integration and popped up GUIs, so they were inappropriate for automation.

Next I needed to figure out how to use batchmnt. For this I used batchmnt64 /?.

    C:\Users\Justin>"C:\Program Files (x86)\WinCDEmu\batchmnt64.exe" /?
    BATCHMNT.EXE - WinCDEmu batch mounter.
    Usage:
        batchmnt <image file> [<drive letter>] [/wait] - mount image file
        batchmnt /unmount <image file>                 - unmount image file
        batchmnt /unmount <drive letter>:              - unmount image file
        batchmnt /check <image file>                   - return drive letter as ERORLEVEL
        batchmnt /unmountall                           - unmount all images
        batchmnt /list                                 - list mounted

    C:\Users\Justin>

Mounting and unmounting are trivial. The /list switch produces some output that I could parse into a PSObject if I so desired. However, what I really found interesting was batchmnt /check. The process returned the drive letter as ERRORLEVEL. That means the ExitCode of the batchmnt process. If you have ever programmed in a C-like language, you know your main function can return an integer. Traditionally 0 means success and a non-zero number means failure. However, in this case 0 means the image is not mounted, and a non-zero number is the ASCII code of the drive letter. To get that code in PowerShell is simple:

    $proc = Start-Process -Wait `
        "C:\Program Files (x86)\WinCDEmu\batchmnt64.exe" `
        -ArgumentList '/check', '"C:\Users\Justin\SQL Server Media\2008R2\en_sql_server_2008_r2_developer_x86_x64_ia64_dvd_522665.iso"' `
        -PassThru;
    [char] $proc.ExitCode

The Start-Process cmdlet normally returns immediately without output. The -PassThru switch makes it return information about the process it created, and -Wait makes the cmdlet wait for the process to exit, so that information includes the exit code. Finally, to turn that ASCII code into the drive letter, we cast with [char].

## May 05, 2013

### Josef "Jeff" Sipek

#### Instrument Flying

I was paging through a smart collection in Lightroom when I came across a batch of photos from early December that I had not shared yet. (A smart collection is a filter that will only show you photos satisfying a predicate.)

On December 2nd, one of the people I work with (the same person that told me exactly how easy it is to sign up for lessons) told me that he was going up to do a couple of practice instrument approaches to Jackson (KJXN) in the club's Cessna 182. He then asked if I wanted to go along. I said yes. It was a warm, overcast day… you know, the kind when the weather seems to sap all the motivation out of you. I was going to sit in the back (the other front seat was occupied by another person I work with — also a pilot) and play with my camera. Below are some of the better shots; there are more in the gallery.

Getting ready to take off:

US-127 and W Berry Rd:

The pilot:

The co-pilot:

On the way back to Ann Arbor (KARB), we climbed to five thousand feet, which took us out of the clouds. Since I was sitting in the back, I was able to swivel around and enjoy the sunset on a completely overcast day. The experience totally made my day. After I get my private pilot certificate, I am definitely going to consider getting instrument rated. The clouds were very fluffy.

## May 03, 2013

### Justin Dearing

#### Setting the Visual Studio TFS diff and merge tools with PowerShell

I recently wrote this script to let me quickly change the diff and merge tools TFS uses from PowerShell.
I plan to make it a module and add it to the StudioShell Contrib package by Jim Christopher (blog|twitter). For now, I share it as a gist and place it on this blog. The script supports Visual Studio 2008-2012 and the following diff tools:

Enjoy!

## April 28, 2013

### Eitan Adler

#### Pre-Interview NDAs Are Bad

I get quite a few emails from business folk asking me to interview with them or forward their request to other coders I know. Given the volume, it isn't feasible to respond affirmatively to all these requests. If you want to get a coder's attention, there are a lot of things you could do, but there is one thing you shouldn't do: require them to sign an NDA before you interview them.

From the candidate's point of view:

1. There are a lot more ideas than qualified candidates.
2. It's unlikely your idea is original. That doesn't mean anyone else is working on it, just that someone else probably thought of it.
3. Let's say the candidate was working on a similar, if not identical, project. If the candidate fails to continue with you, now they have to consult a lawyer to make sure you can't sue them over a project they were working on before.
4. NDAs are dense legal documents and shouldn't be signed without consulting a lawyer. Does the candidate really want to find a lawyer before interviewing with you?
5. An NDA puts the entire obligation on the candidate. What does the candidate get from you?

From a company founder's point of view:

1. Everyone talks about the companies they interview with to someone. Do you want to be that strange company which made them sign an NDA? It can harm your reputation easily.
2. NDAs do not stop leaks. They serve to create liability when a leak occurs. Do you want to be the company that sues people that interview with them?

There are some exceptions; for example, government and security jobs may require security clearance and an NDA.
For most jobs, it is possible to determine if a coder is qualified and a good fit without disclosing confidential company secrets.

### Josef "Jeff" Sipek

#### Change Ringing - The Changes

We have seen what a bell tower set up for change ringing looks like; we have looked at the mechanics of ringing a single bell and what it sounds like if you ring the bells in what is called rounds (all bells ring one after each other in order of pitch, starting with the treble and ending with the tenor). Ringing rounds is good practice, but ringing would be really boring if that was all there was. Someone at some point decided that it'd be fun for one of the ringers to be a conductor, and direct the other ringers to do the most obvious thing — swap around.

So, for example, suppose we have 6 bells, the treble is the first, and the tenor is the last. First, we get rounds by ringing all of them in numerical order:

    123456

Then, the conductor makes a call telling two bells to change around. For example, say that the conductor says: 5 to 3. This tells the person ringing bell number 5 that on the next hand stroke (I completely skipped over this part in the previous post, but bell strikes come in pairs: hand stroke and back stroke) he should follow bell number 3. In other words, the new order will be:

    123546

You can see that in addition to the 5 changing place, the 4 had to move too! Now, it is following the 5. Until the next call, the bells go in this order. Then the conductor may say something like: 3 to 1, or 3 to treble. Just as before, 2 bells move. This time, it is the 2 and the 3, yielding:

    132546

Let's have another call… 5 to 3. Now, we have:

    135246

This pattern (all odd bells in increasing order, followed by all even bells in increasing order) is called Queens. There are many such patterns. Ringing traditionally starts in rounds and ends in rounds. So, let's have a few calls and return the bells to rounds.
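Incidentally, the swap rule is mechanical enough to simulate. Here is a toy Python sketch of my own (not a ringing tool): the named bell moves to just after the bell it should follow, and any bells in between shift over, just like the 4 did above.

```python
def call(order, bell, follow=None):
    """Apply a conductor's call: `bell` moves to follow `follow`,
    or to the lead when `follow` is None."""
    order = [b for b in order if b != bell]
    pos = 0 if follow is None else order.index(follow) + 1
    order.insert(pos, bell)
    return order

rounds = [1, 2, 3, 4, 5, 6]
# "5 to 3", "3 to the treble", "5 to 3" again -> Queens
queens = call(call(call(rounds, 5, 3), 3, 1), 5, 3)
print(queens)
```

(A real conductor may only swap adjacent bells; this sketch doesn't enforce that rule.)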
    3 to the lead    315246    (this means that 3 will be the first bell)
    4 to 5           315426
    4 to the treble  314526
    2 to 4           314256
    treble lead      134256
    2 to 3           132456
    rounds next      123456

There we have it. We're back in rounds. There was nothing special about the order of these changes. Well, there is one rule: the bells that are changing places must be adjacent. So, for example, if we start in rounds, we can't do 4 to the treble. Why is that? These bells are heavy, and especially the heavier ones (>10 cwt) will not move that far easily. Remember, this is bell ringing, not wrestling.

## April 22, 2013

### Josef "Jeff" Sipek

#### Plotting with ggmap

Recently, I came across the ggmap package for R. It supposedly makes for some very easy plotting on top of Google Maps or OpenStreetMap. I grabbed a GPS recording I had laying around and gave it a try. You may recall my previous attempts at plotting GPS data. This time, the data file I was using was recorded with a USB GPS dongle. The data is much nicer than what a cheap smartphone GPS could produce.

    > head(pts)
            time   ept      lat       lon   alt   epx    epy mode
    1 1357826674 0.005 42.22712 -83.75227 297.7 9.436 12.755    3
    2 1357826675 0.005 42.22712 -83.75227 297.9 9.436 12.755    3
    3 1357826676 0.005 42.22712 -83.75227 298.1 9.436 12.755    3
    4 1357826677 0.005 42.22712 -83.75227 298.4 9.436 12.755    3
    5 1357826678 0.005 42.22712 -83.75227 298.6 9.436 12.755    3
    6 1357826679 0.005 42.22712 -83.75227 298.8 9.436 12.755    3

For this test, I used only the latitude, longitude, and altitude columns. Since the altitude is in meters, I multiplied it by 3.2 to get a rough altitude in feet. Since the data file is long and goes all over, I truncated it to only the last 33 minutes.

The magical function is the get_map function. You feed it a location, a zoom level, and the type of map, and it returns the image. Once you have the map data, you can use it with the ggmap function to make a plot. ggmap behaves a lot like ggplot2's ggplot function, and so I felt right at home.
Since the data I am trying to plot is a sequence of latitude and longitude observations, I'm going to use the geom_path function to plot them. Using geom_line would not produce a path since it reorders the data points. Second, I'm plotting the altitude as the color. Here are the resulting images:

If you are wondering why the line doesn't follow any roads… Roads? Where we're going, we don't need roads. (Hint: flying)

Here's the entire script to get the plots:

    #!/usr/bin/env Rscript

    library(ggmap)

    x <- read.csv("gps.csv")

    # get the bounding box... left, bottom, right, top
    loc <- c(min(x$lon), min(x$lat), max(x$lon), max(x$lat))

    for (type in c("roadmap", "hybrid", "terrain")) {
        print(type)

        map <- get_map(location=loc, zoom=13, maptype=type)
        p <- ggmap(map) + geom_path(aes(x=lon, y=lat, color=alt*3.2), data=x)

        jpeg(paste(type, "-preview.jpg", sep=""), width=600, height=600)
        print(p)
        dev.off()

        jpeg(paste(type, ".jpg", sep=""), width=1024, height=1024)
        print(p)
        dev.off()
    }

P.S. If you are going to use any of the maps for anything, you better read the terms of service.

## April 20, 2013

### Josef "Jeff" Sipek

#### Matthaei Botanical Gardens

Back in early February, Holly and I went to the Matthaei Botanical Gardens. I took my camera with me. After over two months of doing nothing with the photos, I finally managed to post-process some of them. I have no idea what the various plants are called — I probably should have made note of the signs next to each plant. (photo gallery)

This one didn't turn out as nicely as I hoped. Specifically, it is a little blurry. Maybe I'll go back at some point to retake the photo.

This one is just cool.

I think this is some kind of Aloe.

## April 19, 2013

### Josef "Jeff" Sipek

#### IPS: The Manifest

In the past, I have mentioned that IPS is great. I think it is about time I gave you more information about it. This time, I'll talk about the manifest and some core IPS ideals. IPS, Image Packaging System, has some really neat ideas.
Each package contains a manifest. The manifest is a file which lists actions. Some very common actions are "install a file at path X," "create a symlink from X to Y," as well as "create user account X." The great thing about this is that the manifest completely describes what needs to be done to the system to install a package. Uninstalling a package simply undoes the actions — delete files, symlinks, users. (This is where the "image" in IPS comes from — you can assemble the system image from the manifests.)

For example, here is the (slightly hand edited) manifest for OpenIndiana's rsync package:

    set name=pkg.fmri value=pkg://openindiana.org/network/rsync@3.0.9,5.11-0.151.1.7:20121003T221151Z
    set name=org.opensolaris.consolidation value=sfw
    set name=variant.opensolaris.zone value=global value=nonglobal
    set name=description value="rsync - faster, flexible replacement for rcp"
    set name=variant.arch value=i386
    set name=pkg.summary value="rsync - faster, flexible replacement for rcp"
    set name=pkg.description value="rsync - A utility that provides fast incremental file transfer and copy."
    set name=info.classification value="org.opensolaris.category.2008:Applications/System Utilities"
    dir group=sys mode=0755 owner=root path=usr
    dir group=bin mode=0755 owner=root path=usr/bin
    dir group=sys mode=0755 owner=root path=usr/share
    dir group=bin mode=0755 owner=root path=usr/share/man
    dir group=bin mode=0755 owner=root path=usr/share/man/man1
    dir group=bin mode=0755 owner=root path=usr/share/man/man5
    license 88142ae0b65e59112954efdf152bb2342e43f5e7 chash=3b72b91c9315427c1994ebc5287dbe451c0731dc license=SUNWrsync.copyright pkg.csize=12402 pkg.size=35791
    file 02f1be6412dd2c47776a62f6e765ad04d4eb328c chash=945deb12b17a9fd37461df4db7e2551ad814f88b elfarch=i386 elfbits=32 elfhash=1d3feb5e8532868b099e8ec373dbe0bea4f218f1 group=bin mode=0555 owner=root path=usr/bin/rsync pkg.csize=191690 pkg.size=395556
    file 7bc01c64331c5937d2d552fd93268580d5dd7c66 chash=328e86655be05511b2612c7b5504091756ef7e61 group=bin mode=0444 owner=root path=usr/share/man/man1/rsync.1 pkg.csize=50628 pkg.size=165934
    file 006fa773e1be3fecf7bbfb6c708ba25ddcb0005e chash=9e403b4965ec233c5e734e6fcf829a034d22aba9 group=bin mode=0444 owner=root path=usr/share/man/man5/rsyncd.conf.5 pkg.csize=12610 pkg.size=37410
    depend fmri=consolidation/sfw/sfw-incorporation type=require
    depend fmri=system/library@0.5.11-0.151.1.7 type=require

The manifest is very easily readable. It is obvious that there are several sets of actions:

- metadata (the `set` lines): specifies the FMRI, description, and architecture, among others
- directories (`dir`): lists all the directories that need to be created/deleted during installation/removal
- license: specifies the file with the text of the license for the package
- files (`file`): in general, most actions are file actions — each installs a file
- dependencies (`depend`): lastly, rsync depends on system/library and sfw-incorporation

The above example is missing symlinks, hardlinks, user accounts, services, and device driver related actions.
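Because every action is a single verb followed by key=value attributes, manifests lend themselves to mechanical processing. A minimal Python sketch of parsing one action line (it only handles the simple unquoted attributes shown above; real pkg(5) parsing also copes with quoted values and payload hashes):

```python
def parse_action(line):
    """Split one manifest line into (action verb, attribute dict).
    Handles only plain key=value pairs, enough for the dir/depend lines."""
    verb, *pairs = line.split()
    attrs = {}
    for pair in pairs:
        key, _, value = pair.partition("=")
        attrs[key] = value
    return verb, attrs

verb, attrs = parse_action("dir group=bin mode=0755 owner=root path=usr/bin")
print(verb, attrs["path"])
```

A real consumer would dispatch on the verb: create the directory for `dir`, place the payload for `file`, and so on, which is exactly how the "assemble the image from manifests" idea works.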
Many package management systems have the ability to execute arbitrary scripts after installation or prior to removal. IPS does not allow this, since it would violate the idea that the manifest completely describes the package. This means (in theory) that one can tell IPS to install the base packages into a directory somewhere, and at the end one has a working system.

It all sounds good, doesn't it? As always, the devil is in the details. First of all, sometimes there's just no clean way to perform all package setup at install time. One just needs a script to run to take care of the post-install configuration. Since IPS doesn't support this, package developers often create a transient SMF manifest and let SMF run the script after the installation completes. This is just ugly, but not the end of the world.

#### Requests?

I'm going to try something new. Instead of posting a random thought every so often, I'm going to take requests. What do you want me to talk about next?

#### Math test

I decided to finally implement some math support. Here's a test post. I hope equation support will come in handy.

## February 13, 2013

### Josef "Jeff" Sipek

#### FAST 2013

Since FAST starts today, yesterday was dedicated to flying out to San Jose. Once at KDTW, I spent most of my wait there watching planes at the gates as well as watching more planes take off on 22L. Somehow, it was fascinating to watch them land on 22L and see 22R in the background — the same 22R that I got to do touch and go's on a couple of weeks ago. I think not having to aviate first let me enjoy the sights — planes large and small barrelling down the runway and then *poof* they gently lift off the runway. At about 500 feet, the gear retracts. It's magic!

At one point, I saw the plane at the adjacent gate being prepared for its next flight.
I both enjoyed seeing and sympathized with one of the crew (I assume the first officer, since I suspect the captain wanted to stay warm) walking around the plane visually inspecting it. I know how annoying it is to be outside in the cold making sure the plane is safe to fly, yet I find it comforting that the same rules apply not only to Cessna 172s but also to Airbus A320s.

The first leg of the trip took me to KSLC. I brought my copy of the FAR/AIM with me. I read a bunch. I looked out the window a bunch. After we got past Lake Michigan, the sky cleared up, allowing me to watch the ground below instead of the layer of overcast. I was very surprised to discover that the snow covered landscape makes it very easy to spot airports. Well, it is easy to spot paved runways that have been plowed.

The approach to KSLC was pretty cool. I never thought about the landscape in Utah before, but it turns out that Salt Lake City is surrounded by some serious mountains. Now, throw in winter weather with overcast and you'll end up with a sea of white except for a few places where the mountains are peeking through. Learning to fly in southeastern Michigan doesn't make you think about mountains — there just aren't any. Seeing the mountains peeking through the clouds was a scary reminder that there are more things in the sky than just other airplanes and some towers. If one were flying VFR above the clouds (which is a bad idea), where would be a safe place to descend? Obviously not where the mountains peek through, but any other place might be just as bad. The best looking place could have a mountain or a ridge a few hundred feet below the cloud tops. Granted, sectional charts would depict all the mountains, but it is a dangerous game to play.

I knew we would end up descending through the overcast, and so I played a little game I expected to lose. Once we were in the clouds, I tried to keep track of our attitude by just sensing the forces.
I knew I would fail, but I thought it would be interesting to try my best. We spent maybe 90 to 120 seconds in the clouds. At the end, I definitely felt like we were in a right bank — spatial disorientation. I knew that we probably weren’t, but without visual information to fix up my perception there was no way for me to know.

We landed. I watched all the airport signs and markings, following our progress on an airport diagram. Once people started getting off the plane, I decided to ask to see the airworthiness certificate. The first officer (I think) found all the paperwork in the cockpit and showed me. It was really cool to see the same form I see every time I fly the 172, but filled out for an A320. (Theirs was laminated!) We chatted for a little bit about what I fly, and how it’s a good plane. It was fun.

It was time to get to my connecting flight. Nothing interesting happened. I spent about half the flight watching the outside and half reading my book. After arriving at KSJC, I got up from my seat in the small but comfy plane (CRJ200). I grabbed my backpack from the overhead bin with one hand, since the other hand not only had my hoodie draped over it but was also holding the FAR/AIM. I started filing out. All that was left to do was give the thank-you-for-landing-safely-and-not-killing-me nod to the crew as I exited the plane. The captain or FO happened to be standing in the cockpit door saying goodbye to passengers. I nodded as planned. He responded: “good book.” I smiled.
### Eitan Adler

#### Don't Use Timing Functions for Profiling

One common technique for profiling programs is to use the gettimeofday system call, with code that looks something like this:

Example (incorrect) code that uses gettimeofday:

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

void function(void)
{
	struct timeval before;
	struct timeval after;

	gettimeofday(&before, NULL);
	codetoprofile();
	gettimeofday(&after, NULL);

	time_t delta = after.tv_sec - before.tv_sec;
	printf("%ld\n", (long)delta);
}
```

However, using gettimeofday(2) or time(3) or any function designed to get the time of day to obtain profiling information is wrong for many reasons:

1. Time can go backwards. In a virtualized environment this can happen quite often. In non-virtualized environments this can happen due to time zones. Even passing CLOCK_MONOTONIC to clock_gettime(3) doesn't help, as it can go backwards during a leap second expansion.
2. Time can change drastically for no reason. Systems with NTP enabled periodically sync their time with a time source. This can cause the system time to change by minutes, hours, or even days!
3. These functions measure wall clock time. Time spent on entirely unrelated processes is going to be included in the profiling data!
4. Even if you have disabled everything else on the system[1], the delta computed above includes both user time and system time. If your algorithm is very fast but the kernel has a slow implementation of some system call, you won't learn much.
5. gettimeofday relies on the CPU clock, which may differ across cores, resulting in time skew.

So what should be used instead? There isn't a good, portable function to obtain profiling information. However, there are options for those not tied to a particular system (or those willing to maintain multiple implementations for different systems). The getrusage(2) system call is one option for profiling data.
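As an aside (my sketch, not from the original post): Python's standard resource module wraps this same getrusage(2) call, which makes the user/system split easy to demonstrate:

```python
import resource

def profile_rusage(fn):
    """Run fn() and report CPU time deltas as measured by getrusage(2)."""
    before = resource.getrusage(resource.RUSAGE_SELF)
    fn()
    after = resource.getrusage(resource.RUSAGE_SELF)
    user = after.ru_utime - before.ru_utime      # seconds spent in user code
    system = after.ru_stime - before.ru_stime    # seconds spent in the kernel
    return user, system

user, system = profile_rusage(lambda: sum(i * i for i in range(100_000)))
print(f"user: {user:.4f}s, system: {system:.4f}s")
```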
This provides different fields for user time (ru_utime) and system time (ru_stime) at a relatively high level of precision and accuracy. Using DTrace's profile provider also seems to be a decent choice, although I have limited experience with it. Finally, using APIs meant to access hardware-specific features, such as FreeBSD's hwpmc, is likely to provide the best results at the cost of being the least portable. Linux has similar features, such as OProfile and perf. Using dedicated profilers such as Intel's VTune[2] may also be worthwhile.

1. Including networking, background process swapping, cron, etc.
2. A FreeBSD version is available.

update 2012-11-26: Include note about clock skew across cores.
Update 2013-02-13: Update and fix a massive error I had w.r.t. clock(3).

## January 20, 2013

### Nate Berry

#### Installing Cyanogenmod on ASUS Transformer TF101

editor’s note: I’ve updated this story many times since I first posted it. For the current status, scroll all the way to the end of the story as I’ve appended update notices to the end each time I upgraded or switched ROMs. Back in December, 2011 when I first got my ASUS Transformer TF101 it […]

## January 19, 2013

### Josef "Jeff" Sipek

#### Useless reinterpret_cast in C++

A few months ago (for whatever reason, I didn’t publish this post earlier), I happened to stumble on some C++ code that I had to modify. While trying to make things work, I happened to get code that essentially was:

```cpp
uintptr_t x = ...;
uintptr_t y = reinterpret_cast<uintptr_t>(x);
```

Yes, the cast is useless. The actual code I had was much more complicated, and it wasn’t immediately obvious that ‘x’ was already a uintptr_t. Thinking about it now, I would expect GCC to give a warning about a useless cast. What I did not expect was what I got:

```
foo.cpp:189:3: error: invalid cast from type "uintptr_t {aka long unsigned int}" to type "uintptr_t {aka long unsigned int}"
```

Huh?
To me it seems a bit silly that the compiler does not know how to convert from one type to the same type. (For what it’s worth, this is GCC 4.6.2.) Can anyone who knows more about GCC and/or C++ shed some light on this?

#### Serial Console

Over the past couple of days, I’ve been testing my changes to the crashdump code in Illumos. (Here’s why.) I do most of my development on my laptop — either directly, or I use it to ssh into a dev box. For Illumos development, I use the ssh approach. Often, I end up using my ancient desktop (pre-HyperThreading era 2 GHz Pentium 4) as a test machine. It gets pretty annoying to have a physical keyboard and monitor to deal with when the system crashes. The obvious solution is to use a serial console. Sadly, all the “Solaris serial console howtos” leave a lot to be desired. As a result, I am going to document the steps here. I’m connecting from Solaris to Solaris. If you use Linux on one of the boxes, you will have to do it a little differently.

#### Test Box

First, let’s change the console speed from the default 9600 to a more reasonable 115200. In /etc/ttydefs, change the console line to:

```
console:115200 hupcl opost onlcr:115200::console
```

Second, we need to tell the kernel to use the serial port as a console. Here, I’m going to assume that you are using the first serial port (i.e., ttya). So, open up your Grub config (/rpool/boot/grub/menu.lst, assuming your root pool is rpool) and find the currently active entry. You’ll see something like this:

```
title openindiana-8
findroot (pool_rpool,0,a)
bootfs rpool/ROOT/openindiana-8
splashimage /boot/splashimage.xpm
foreground FF0000
background A8A8A8
kernel$ /platform/i86pc/kernel/$ISADIR/unix -B $ZFS-BOOTFS
module$ /platform/i86pc/$ISADIR/boot_archive
```
We need to add two options. One to tell the kernel to use the serial port as a console, and one to tell it the serial config (rate, parity, etc.).

You’ll want to change the kernel$ line to:

```
kernel$ /platform/i86pc/kernel/$ISADIR/unix -B $ZFS-BOOTFS,console=ttya,ttya-mode="115200,8,n,1,-" -k
```


Note that we appended the options with commas to the existing -B. If you do not already have a -B, just add it and the two new options. The -k will make the kernel drop into the debugger when bad things happen. You can omit it if you just want a serial console without the debugger getting loaded.

There’s one last thing left to do. Let’s tell grub to use the same serial port and not use a splash image. This can be done by adding these lines to the top of your menu.lst:

```
serial --unit=0 --speed=115200
terminal serial
```


and removing (commenting out) the splashimage line.

So, what happens if you make all these changes and then beadm creates a new BE? The right thing! beadm will copy over all the kernel options so your new BE will just work.

#### Dev Box

I use OpenIndiana on my dev box. I could have used minicom, but I find minicom to be a huge pain unless you have a modem you want to talk to. I’m told that screen can talk to serial ports as well. I decided to keep things super-simple and configured tip.

First, one edits /etc/remote. I just changed the definition for hardwire to point to the first serial port (/dev/term/a) and use the right speed (115200):

```
hardwire:\
	:dv=/dev/term/a:br#115200:el=^C^S^Q^U^D:ie=%$:oe=^D:
```

Then, I can just run a simple command to get to the other system:

```
$ tip hardwire
```


## January 11, 2013

### Nate Berry

update 131202: I would disregard most of this post since I’ve picked up a Chromecast which, at $35, has made queuing up YouTube, Netflix, or HBOGo videos to the TV stupid easy from the Android tablet. I replaced the underpowered Atom box with an Intel NUC for more serious gaming (Minecraft, mainly).

======================

I’ve got […]

## January 06, 2013

### Justin Dearing

#### Announcing SevenZipCmdLine.MSBuild

This was a quick and dirty thing born out of necessity, and the need to make zip files of PoshRunner so I could make its chocolatey package. I made MSBuild tasks for creating 7zip and zip files out of the $(TargetDir) of an MSBuild project. There is a nuget package for it. Simply include it in your project via nuget and build it from the command line with the following command:

```
%windir%\microsoft.net\framework\v4.0.30319\msbuild __PROJECT_FOLDER__\__PROJECT_FILE__ /t:SevenZipBin,ZipBin
```

This will create project.zip and project.7z in __PROJECT_FOLDER__\bin\Target. To see how to override some of the defaults, look at this msbuild file in PoshRunner.

Source code is available via a github repo, and patches are welcome!

#### PoshRunner now on SourceForge and Chocolatey

I’ve been periodically hacking away at PoshRunner. I have lots of plans for it. Some of these are rewriting some of it in C++, allowing you to log output to MongoDB and total world domination! However, today’s news is not as grand.

The first piece of news is I made a PoshRunner sourceforge project to distribute the binaries. To download the latest version, click here. Secondly, there is now a PoshRunner chocolatey package, so you can install it via chocolatey. Finally, there is not a lot of documentation on PoshRunner.exe, so here is the output of poshrunner -help.

```
Usage: poshrunner.exe [OPTION] [...]

Options:
--appdomainname=NAME                                     Name to give the AppDomain the PowerShell script executes in.
--config=CONFIGFILE                                      The name of the app.config file for the script. Default is scriptName.config
-f SCRIPT, --script=SCRIPT                               Name of the script to run.
-h, --help                                               Show help and exit
--log4netconfig=LOG4NETCONFIGFILE                        Override the default config file for log4net.
--log4netconfigtype=LOG4NETCONFIGTYPE                    The type of Log4Net configuration.
-v, --version                                            Show version info and exit
```

## January 04, 2013

#### Correctly Verifying an Email Address

Some services that accept email addresses want to ensure that these email addresses are valid.

There are multiple aspects to an email being valid:
1. The address is syntactically valid.
2. An SMTP server accepts mail for the address.
3. A human being reads mail sent to the address.
4. The address belongs to the person submitting it.

How does one verify an email address? I'll start with the wrong solutions and build up the correct one.

### Possibility #0 - The Regular Expression

Discussions on a correct regular expression to parse email addresses are endless. They are almost always wrong. Even really basic pattern matching such as *@*.* is wrong: it will reject the valid email address n@ai.[5]

Even a fully correct regular expression does not tell you if the mailbox is valid or reachable.

This scores 0/4 on the validity checking scale.
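As a quick illustration (mine, not the author's) of how a typical "x@y.z" pattern misbehaves:

```python
import re

# A common but wrong "something@something.something" style pattern.
naive = re.compile(r"^[^@]+@[^@]+\.[^@]+$")

assert naive.match("user@example.com") is not None  # accepted, as expected
assert naive.match("n@ai") is None                  # valid address, wrongly rejected
```

Even when the pattern accepts an address, it says nothing about whether the mailbox exists.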

### Possibility #1 - The VRFY Command

The oldest mechanism for verifying an email address is the VRFY mechanism in RFC821 section 4.1.1:

VERIFY (VRFY) This command asks the receiver to confirm that the argument identifies a user. If it is a user name, the full name of the user (if known) and the fully specified mailbox are returned.

However this isn't sufficient. Most SMTP servers disable this feature for security and anti-spam reasons. This feature could be used to enumerate every username on the server to perform more targeted password guessing attacks:

Both SMTP VRFY and EXPN provide means for a potential spammer to test whether the addresses on his list are valid (VRFY)... Therefore, the MTA SHOULD control who is allowed to issue these commands. This may be "on/off" or it may use access lists similar to those mentioned previously.

This feature wasn't guaranteed to be useful at the time the RFC was written:[1]

The VRFY and EXPN commands are not included in the minimum implementation (Section 4.5.1), and are not required to work across relays when they are implemented.

Finally, even if VRFY was fully implemented there is no guarantee that a human being reads the mail sent to that particular mailbox.

All of this makes VRFY useless as a validity checking mechanism so it scores 1/4 on the validity checking scale.

### Possibility #2 - Sending a Probe Message

With this method, one connects to a mail server and pretends to send a real mail message, but cuts off before sending the message content. This is wrong for the following reasons:

A system administrator who disabled VRFY has a policy of not allowing the testing of email addresses. Therefore, the ability to test an email address by sending a probe should be considered a bug and must not be used.

The system might be set up to detect signs of a probe, such as cutting off early, and may rate limit or block the sender.

In addition, the SMTP server may be temporarily down or the mailbox temporarily unavailable, but this method provides no resilience against failure. This is especially true if this mechanism is attempting to provide real-time feedback to the user after submitting a form.

This scores 1/4 on the validity checking scale.

### Possibility #3 - Sending a Confirmation Mail

If one cares whether a human is reading the mailbox, the simplest way to check is to send a confirmation mail. In the email, include a link to a website (or set a special reply address) with some indication of what is being confirmed. For example, to confirm "user@example.com" is valid, the link might be http://example.com/verify?email=user@example.com or http://example.com/verify?account=12345[2].

This method is resilient against temporary failures and forwarders. Temporary failures could be retried like a normal SMTP conversation.

This way it is unlikely that a non-human will trigger the verification email[3]. While this approach solves some of the concerns, it suffers from a fatal flaw:

It isn't secure. It is usually trivial to guess the ID number, email account, or other identifier. An attacker could sign up with someone else's email account and then go to the verification page for that user's account. It might be tempting to use a random ID, but randomness implementations are usually not secure.

This scores 3/4 on the validity checking scale.

### Possibility #4 - Sending a Confirmation Mail + HMAC

The correct solution is to send a confirmation mail, but include a MAC of the identifier in the verification mechanism (reply address, or URL) as well. A MAC is a construction used to authenticate a message by combining a secret key and the message contents. One family of constructions, HMAC, is a particularly good choice. This way the URL might become http://example.com/verify?email=user@example.com&mac=74e6f7298a9c2d168935f58c001bad88[4].

Remember that HMAC is a specific construction, not a naive hash. It would be wise to use a framework-native function such as PHP's hash_hmac. Failing to include a secret in the construction would make the MAC trivially defeated by brute force.
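As a sketch of the idea (my illustration, not the post's: the secret, the names, and the choice of HMAC-SHA256 rather than the post's shorter HMAC-MD5 are all assumptions), using Python's standard hmac module:

```python
import hmac
import hashlib

SECRET_KEY = b"server-side-secret"  # hypothetical; keep out of source control

def make_token(email: str) -> str:
    """MAC the identifier so users can't forge verification links."""
    return hmac.new(SECRET_KEY, email.encode(), hashlib.sha256).hexdigest()

def verify(email: str, token: str) -> bool:
    # compare_digest avoids leaking information through timing differences
    return hmac.compare_digest(make_token(email), token)

token = make_token("user@example.com")
print(f"http://example.com/verify?email=user@example.com&mac={token}")

assert verify("user@example.com", token)
assert not verify("attacker@example.com", token)
```

Note the constant-time comparison: comparing MACs with plain == can leak how many leading bytes matched.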

This scores 4/4 on the validity checking scale.

### Closing Notes

Getting email validation right is doable, but not as trivial as many of the existing solutions make it seem.

1. Note that RFC1123 more specifically spells out that VRFY MUST be implemented but MAY be disabled.
2. This is not my luggage password.
3. It is still possible for an auto-reply bot to trigger reply-based verification schemes. Bots that click every link in received email are uncommon.
4. This is HMAC-MD5. It isn't insecure, as collisions aren't important for HMAC. I chose it because it is short.
5. n@ai is an in-use email address belonging to a person named Ian:

   ```
   % dig +short ai MX
   10 mail.offshore.ai.
   ```

Thank you to bd for proofreading and reviewing this blog post.

## December 27, 2012

### Justin Dearing

#### “Forking” a long running command to a new tab with ConEmu. The magic of -new_console:c

Here’s a quick tip I thought I’d share after being quite rightly told to RTFM by the author of ConEmu.

Suppose you are running FarManager from ConEmu and want to update all your chocolatey packages. You can do so with the command cup all. However, that will block your FarManager session until the cup all completes. You have four options to fix this:

1. You can start a new tab in ConEmu with the menu. This is undesirable because you’re obviously a command line guy.
2. You press Shift+Enter after the cup all command. This is undesirable because unless you configure ConEmu to intercept every new command window, a regular console window will appear. Also, the console will close automatically upon completion.
3. You can type cup all & pause and hit Shift+Enter to allow the window to stay open. Or
4. You can type cup all -new_console:c to open a new tab that will execute the command, and not close upon completion.

Obviously I recommend option 4.

## December 24, 2012

#### #!/bin/bash considered harmful

When one writes a shell script there are a variety of shebang lines that could be used:

• #!/bin/sh
• #!/usr/bin/env bash
• #!/bin/bash

or one of many other options.

Of these only the first two are possibly correct.

Using #!/bin/bash is wrong because:

• Sometimes bash isn't installed.
• If it is installed, it may not be in /bin.
• If it is in /bin, the user may have decided to set PATH to use a different installation of bash. Using an absolute path like this overrides the user's choices.
• bash shouldn't be used for scripts intended to be portable.

If you have bash-specific code, use #!/usr/bin/env bash. If you want more portable code, try using Debian's checkbashisms to find instances of non-POSIX-compliant shell scripting.

## December 22, 2012

### Justin Dearing

#### How to reference the registry in MSBuild 4.0 (Visual Studio 2010) and later on a 64 bit OS

In the past I’ve written about using the Windows Registry to reference assembly paths in Visual Studio. In it I made reference to the seminal article New Registry syntax in MSBuild v3.5, which is the dialect Visual Studio 2008 speaks. That syntax has served me well until recently.

See, fate led me to writing a small C++/CLI program. In it I had to refer to some .NET assemblies that were not installed in the GAC. They were, however, installed as part of a software package that wrote its install path to the registry. So I figured out which value it wrote the install directory to and referenced it in the .vcxproj file using $(Registry:HKEY_LOCAL_MACHINE\Software\Company\Product@TargetDir). Unfortunately, it didn’t work! I did some troubleshooting and discovered it worked when I built the .vcxproj from the command line with msbuild.exe. It seemed logical to blame it on the fact that I was using C++. Devenv.exe (the Visual Studio executable) must have been treating .vcxproj files differently than .csproj and .vbproj files. Then suddenly it dawned on me: the problem was I was running on a 64 bit version of Windows!

This was a relief, because it meant that .vcxproj files were not special or subject to unique bugs. To make a long story short, Visual Studio is a 32 bit application, and by default, when a 32 bit process interacts with the registry on a 64 bit version of Windows, HKEY_LOCAL_MACHINE\Software gets redirected to HKEY_LOCAL_MACHINE\Software\Wow6432Node. This MSDN article explains the gory details. At first it seemed the only workaround was a custom MSBuild task like the MSBuild Extension Pack. I complained on twitter to Scott Hanselman (blog|twitter). He replied with this article talking about how page faults, addressable memory space, etc. were not an issue. That article made some good points. However, it didn’t address my (at the time) very real and legitimate concern. Scott said he’d ask around internally if I filed a connect bug, and got David Kean (blog|twitter) involved in the conversation. I filed a connect bug. Then David pointed out a link to the MSBuild 4.0 property function GetRegistryValueFromView. Here is a comparison of the old and new syntax using MSBuild <Message/> tasks, the printf() of MSBuild.
```xml
<Target Name="BeforeBuild">
  <!-- Read the registry using the native MSBuild 3.5 method:
       http://blogs.msdn.com/b/msbuild/archive/2007/05/04/new-registry-syntax-in-msbuild-v3-5.aspx -->
  <PropertyGroup>
    <MsBuildNativeProductId>$(Registry:HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion@ProductId)</MsBuildNativeProductId>
    <MsBuildNativeProductName>$(Registry:HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion@ProductName)</MsBuildNativeProductName>
    <MsBuild4NativeProductId>$([MSBuild]::GetRegistryValueFromView('HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion', 'ProductId', null, RegistryView.Registry64))</MsBuild4NativeProductId>
    <MsBuild4NativeProductName>$([MSBuild]::GetRegistryValueFromView('HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion', 'ProductName', null, RegistryView.Registry64))</MsBuild4NativeProductName>
  </PropertyGroup>
  <!-- Let's use the MSBuild Extension Pack (still no joy):
       http://www.msbuildextensionpack.com/help/4.0.5.0/html/9c8ecf24-3d8d-2b2d-e986-3e026dda95fe.htm -->
  <MSBuild.ExtensionPack.Computer.Registry TaskAction="Get" RegistryHive="LocalMachine" Key="SOFTWARE\Microsoft\Windows NT\CurrentVersion" Value="ProductId">
    <Output PropertyName="MsBuildExtProductId" TaskParameter="Data" />
  </MSBuild.ExtensionPack.Computer.Registry>
  <MSBuild.ExtensionPack.Computer.Registry TaskAction="Get" RegistryHive="LocalMachine" Key="SOFTWARE\Microsoft\Windows NT\CurrentVersion" Value="ProductName">
    <Output PropertyName="MsBuildExtProductName" TaskParameter="Data" />
  </MSBuild.ExtensionPack.Computer.Registry>
  <!-- And now RegistryView:
       http://msdn.microsoft.com/en-us/library/microsoft.win32.registryview.aspx -->
  <MSBuild.ExtensionPack.Computer.Registry TaskAction="Get" RegistryHive="LocalMachine" Key="SOFTWARE\Microsoft\Windows NT\CurrentVersion" Value="ProductId" RegistryView="Registry64">
    <Output PropertyName="MsBuildExt64ProductId" TaskParameter="Data" />
  </MSBuild.ExtensionPack.Computer.Registry>
  <MSBuild.ExtensionPack.Computer.Registry TaskAction="Get" RegistryHive="LocalMachine" Key="SOFTWARE\Microsoft\Windows NT\CurrentVersion" Value="ProductName" RegistryView="Registry64">
    <Output PropertyName="MsBuildExt64ProductName" TaskParameter="Data" />
  </MSBuild.ExtensionPack.Computer.Registry>
  <!-- All messages are of high importance so Visual Studio will display them by default. See:
       http://stackoverflow.com/questions/7557562/how-do-i-get-the-message-msbuild-task-that-shows-up-in-the-visual-studio-proje -->
  <Message Importance="High" Text="Using Msbuild Native: ProductId: $(MsBuildNativeProductId) ProductName: $(MsBuildNativeProductName)" />
  <Message Importance="High" Text="Using Msbuild v4 Native: ProductId: $(MsBuild4NativeProductId) ProductName: $(MsBuild4NativeProductName)" />
  <Message Importance="High" Text="Using Msbuild Extension Pack: ProductId: $(MsBuildExtProductId) ProductName: $(MsBuildExtProductName)" />
  <Message Importance="High" Text="Using Msbuild Extension Pack: ProductId: $(MsBuildExt64ProductId) ProductName: $(MsBuildExt64ProductName)" />
</Target>
```

That MSBuild code has been tested via this github project on two machines running Visual Studio 2010 SP1. One has Windows XP SP3 32 bit and the other runs Windows 8 64 bit. I’ve verified that $([MSBuild]::GetRegistryValueFromView('HKEY_LOCAL_MACHINE\SOFTWARE\whatever', 'value', null, RegistryView.Registry64)) will give you the same value as you see in regedit.exe.

Yes, MSBuild 4.0, and therefore Visual Studio 2010, solved this problem; I simply didn’t google hard enough for the answer. However, I googled pretty hard, and I’m pretty good at googling. I didn’t think I was particularly rash in “pulling the Hanselman card.” The best I can do is write this blog post, comment on other blogs, and answer questions on StackOverflow to fill the internet with references to the MSBuild syntax.

## December 21, 2012

#### Cormen on Algorithms: Blogging my way through [1/?]

Two of my good friends recently started reading Introduction to Algorithms by Thomas H. Cormen et al. Being unable to resist peer pressure, I decided to follow and read along.

I plan on blogging my way through the chapters writing my answers to the questions as I go through the book. Like most of my plans they don't always work out, but one could try.

Here it goes!

1.1-1: Give a real-world example in which each of the following computational problems appears: (a) sorting, (b) determining the best order for multiplying matrices, (c) finding the convex hull of a set of points.

Sorting - Sorting comes up in virtually every algorithm one could think of. Everything from optimizing monetary investments to efficient compression algorithms has to sort data at some point or another. A harder question might be: name one non-trivial algorithm that doesn't require sorting.

Multiplying matrices - Graphics and scientific problems frequently require matrix operations.

Convex hull - Collision detection for use in games, modeling biological systems, or other related work could make use of this.

1.1-2: Other than speed, what other measures of efficiency might one use in a real-world setting?

It is possible to optimize for (and against) every limited resource. For example, minimizing memory usage is important for embedded applications (and desktop ones too). Reducing total disk I/O is important to increase the longevity of hard drives. On a less technical note, optimizing for monetary cost or man-hours expended is important too.
1.1-3: Select a data structure you have seen previously and discuss its strengths and limitations.

One of the most interesting data structures I know is the Bloom filter. It is a probabilistic data structure that can determine that an element is NOT in a set, but can't determine definitively that an element IS in a set. It works by hashing each added element and mapping the hash onto a fixed size bit array. It then ORs that hash into the bit array (which starts at all zeros). One can test whether an element is in the set by generating its hash and checking whether every bit set to 1 in the queried element's hash is also set to 1 in the filter. If so, then you have some degree of confidence that the element is in the set. Any negative means that what you are querying for has not been added.
While most probabilistic structures have certain properties in common, bloom filters have a number of interesting pros and cons.
1. A negative result is definitive - if a query returns that an element has not been added then one knows this to be 100% true.
2. Since hashes are fixed size the amount of memory a Bloom Filter uses is known and bounded.
3. Bloom filters can quickly become useless with large amounts of data. It is possible that every bit will be set to 1 which effectively makes the query a NOP.
4. It is impossible to remove data from a bloom filter. One can't just set all the bits of the hash to a zero because that might be removing other elements as well.
5. Without a second set of data there is no way to deterministically list all elements (unlike other probabilistic data structures such as Skip Lists).
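The mechanics described above can be sketched as follows (my illustration; the bit-array size and hash count are arbitrary, not tuned):

```python
import hashlib

class BloomFilter:
    def __init__(self, m: int = 1024, k: int = 3):
        self.m = m       # number of bits in the filter
        self.k = k       # number of hash functions
        self.bits = 0    # bit array, stored as a Python integer

    def _positions(self, item: str):
        # Derive k bit positions by salting a single hash function.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h, "big") % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos    # OR the item's bits into the filter

    def query(self, item: str) -> bool:
        # True means "possibly present"; False is a definitive absence.
        return all(self.bits & (1 << pos) for pos in self._positions(item))

bf = BloomFilter()
for word in ("apple", "banana", "cherry"):
    bf.add(word)

assert bf.query("banana")    # added items are always reported as present
```

Note that a True result may be a false positive, but every added item is guaranteed to query True, matching point 1 above.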
1.1-4: How are the shortest path and traveling salesman problems similar? How are they different?

The shortest path problem is: given a weighted (undirected) graph $G$, a start vertex $V_0$, and an end vertex $V_e$, find a path between $V_0$ and $V_e$ such that the sum of the weights is minimized. This could be expanded to: given a weighted graph $G$, find a path between every pair of vertices such that the sum of the weights for each path is minimized.

The traveling salesman problem is defined as: given a weighted, undirected graph $G$ and a start vertex $V_0$, find a path starting and ending at $V_0$ such that it passes through every other vertex exactly once and the sum of the weights is minimized.

The traveling salesman problem might make use of the shortest path problem repeatedly in order to come up with the correct solution.

1.1-5: Come up with a real-world problem in which only the best solution will do. Then come up with a problem in which a solution that is "approximately" the best will do.

There are very few problems where one needs the objectively optimal solution. Mathematical questions are the only problems I could think of that need that level of accuracy. Virtually every problem needs a good enough solution. Some examples include finding a fast route for packets on the internet or locating a piece of data in a database.

update 2011-06-30: modified text of answers 1.1-3 and 1.1-5 to be more clear.

## December 14, 2012

### Justin Dearing

#### Trouble purchasing an MSDN subscription

Recently I decided to purchase a Visual Studio 2012 Professional MSDN subscription. There are several reasons for this. First of all, my Visual Studio 2012 30 day trial ran out and I absolutely need the non-express edition of it for a side project. Secondly, I’d like to be able to test poshrunner in older versions of Windows. Thirdly, having access to checked builds of Windows would allow me to learn more in my Windows Internals study group.

I started my journey to an MSDN subscription on Saturday, December 8th, 2012. I was able to access my benefits Thursday, December 12th. The four day journey was not pleasant.
On Saturday I sat down, credit card in hand, and placed my order. I didn’t save the receipt (stupid, I know). I got no confirmation email, and I did not see an authorization on my credit card. I waited.

On Sunday I got notification that my order was pending. Perhaps they wanted to verify I wasn’t a software pirate. It seemed annoying that this wasn’t an instant process, but I remained patient and understanding.

Then Tuesday I woke up to an email stating that my order was canceled. MSDN customer support hours are from 5:30 PST to 17:30 PST. I am on EST, so I had to wait until 8:30 to call. I was already in the office at that time. I was told the bank did not accept my charge, but that if I placed the order again in 48 hours, the security check would be overridden and I would be able to download the software instantaneously.

I tried buying the MSDN license again. It failed, but instantaneously. I called my bank. I was told both authorizations were successful on their end. So I called Microsoft again. They claimed a system glitch prevented them from accepting the payment. The specific phrase “system glitch” was used consistently by several MSDN customer support representatives over several phone calls to describe instances when my bank authorized a charge but Microsoft rejected it. I never uttered that phrase once. I’m suspicious this is a common enough occurrence that there are procedures and guidelines in place documenting the “system glitch”.

At this point they asked if I placed the second order from a computer on the same network as the first. I said no. The first order was placed at home and the second order was placed in the office. I was told to try again from the same network. I don’t have remote access to my home computer (take away my geek card), so I had to wait till I got home. I asked what would happen if it didn’t work when I tried again.
I was told the only other option was to place the order over the phone, and that phone orders take three business days to process. I didn’t get home until after midnight, so I didn’t try Tuesday night.

### Wednesday

Wednesday I awoke and attempted to place the order. It failed. I went into the office, called customer support and attempted a phone order. It failed, because my bank decided three identical charges for $1,305.41 (Microsoft collects sales tax in NY on top of the $1199 base price) seemed suspicious. Luckily I was able to fix that by responding to a text message CitiBank sent me. A chat session and a call later, the purchase seemed to have been resolved. I would have my subscription on Monday.

### Thursday

Thursday I got a call saying my order was canceled. However, T-Mobile dropped the call before I could deal with it. When I had some free time I called CitiBank. The first operator gave me some free airline miles and transferred me to Ashley, the fraud department specialist. Ashley assured me Microsoft could charge my credit card as often and as many times as they wanted to. I then called MSDN support and talked to Chris. I summarized the situation for Chris. I told him I didn’t want to wait another three days for a phone order. He said he had no power to deal with that. He determined my order from Wednesday was still going through. After putting me on hold a few times, he said he would get me a welcome email that would let me download my MSDN products in 30 minutes. I got his name and a case number, and he did just that. I got a call back to ensure I was able to access my download, and everything worked just fine. I’m a little curious as to why his tune changed and he was able to get me my subscription number in thirty minutes, though.

### Conclusion

First of all I have to thank CitiBank for their actions. At no point did they do anything wrong or fail to do anything.
Secondly, the customer service staff at MSDN were very professional and understanding, despite my growing irateness. However, the fact is they were never able to tell me why my order was canceled. If at some point they had explained that I was flagged as a pirate, or something else, I’d be a bit more understanding. Thirdly, why does the process take so long? I was able to buy a new car in about an hour. It took a few days for delivery because the package I wanted wasn’t on the lot. However, it took less than four days for the car to be driven off the lot (by someone else, because it was the car I learned stick on). The MSDN subscription sales model seems to make sense for businesses purchasing volume licenses. They take checks, and you can talk to a real person. It’s not at all optimized for the person that wants to buy one MSDN license “right now”. People like me are on the lower end of the income bracket for Microsoft, but we are also the ones that are either really passionate hobbyists, entrepreneurs, or the people on the fence. While I’m still going to develop on the Microsoft stack for years, this experience has left a bad taste in my mouth for their purchase process, compared with, for example, JetBrains or RedGate. In the end the real issue was the lack of transparency. It’s generally safe to assume that when you are buying software for online delivery, you will have it within an hour. If Microsoft made it clear it’s not as simple for them, first-time subscribers like me would be a little more understanding.

## December 10, 2012

### Justin Dearing

#### Announcing ILRepack-BuildTasks

ILMerge is a great tool for creating a single executable out of multiple .NET assemblies. However, it has two limitations. The first is that it’s not open source, and you’re not supposed to include a copy in your public source code repos. The second is that it’s an executable, and therefore needs to be called from the MSBuild post-build event as opposed to a proper MSBuild task.
Each problem had its own mutually exclusive solution. For the first problem, Francois Valdy (blog|twitter) wrote IL-Repack, an open source clone of ILMerge. So now I could have an exe that could be included in github repos. This allowed my projects (specifically poshrunner.exe) to have a merge step in the postbuild. Although this was still a clunky batch file embedded in the csproj, it just worked. For the second problem, Marcus Griep (blog|twitter) created ILMerge Tasks. Since the merging APIs in ILMerge are all exposed as public members, you can simply reference the exe as a dll. He did this in an MSBuild DLL. However, this dll still requires ILMerge.exe. These solutions are no longer mutually exclusive. I’ve forked ILMerge-tasks (and contacted Marcus to see if he wants to incorporate my changes) and had it reference ILRepack. The new project is called ILRepack-BuildTasks on github. Enjoy!

#### A misleading SQL Error Message

Error: 18456, Severity: 14, State: 38

On Friday I had to help a client out with an error that kept appearing in their event logs: Login failed for user ‘domain\user’. Reason: Failed to open the explicitly specified database. [CLIENT: 192.168.0.25] It took me a while to troubleshoot the error. The client’s internal system administrator (who was quite sharp) only had to call me in in the first place because the error was a little misleading. See, the first thing I did when I saw that was audit login failures. In the trace, the database was listed as master. The user had full access to master. However, I later learned that the user was switching from master to a non-existent database, which was triggering this error. I figured this out thanks to Sadequl Hussain’s article, SQL Server Error 18456: Finding the Missing Databases. Sadequl explains in detail the how and the why. However, the take-home is you need to trace for User Error Message to get the message that tells you what database you are connecting to. This took me about an hour to solve.
Honestly, it was a bit of a humbling experience. It took me an hour to figure out something a full-time senior DBA would probably be able to solve in 15 minutes. However, I’ll probably be able to solve this error in 15 minutes myself going forward. Finally, the fact that it took me a while to find this one blog article that explained what the issue actually was shows how dependent I’ve become upon google.

## December 05, 2012

### Justin Dearing

#### The #MongoHelp twitter manifesto

## What is #mongohelp?

#mongohelp is a hashtag on twitter that members of the mongo community use for support.

## What’s appropriate to tag #mongohelp?

In order for something to be appropriate for the #mongohelp hashtag, one of the following two criteria must be met:

1. You are asking a question related to MongoDB.
2. You are @replying to a question on #mongohelp with an answer or a request for clarification.

Those are the rules. You can reply with a recommendation for a commercial product or service, but please disclose if you work for, partner with, or own the product. You can’t make unsolicited promotions with this hashtag. You can’t post a link to your latest blog article to the tag, unless you are answering a question.

## Any other guidelines?

Twitter is about instant gratification, so if you ask a question on #mongohelp, it’s expected you will be sticking around for 10-15 minutes for an answer. Also, if you have a long question to ask on #mongohelp, you should ask it in one of these forums, and link to the question.

## This seems awfully familiar

You’re absolutely right! I borrowed the idea from Aaron Nelson’s (blog|twitter) proposal, documented by Brent Ozar (blog|twitter), to create the #sqlhelp hashtag. I’ve spent the last year speaking about MongoDB at SQL Saturdays, observing both communities. Both communities are very self-organized and provide a lot of free help.
The one thing I saw missing from the MongoDB community was a grassroots support tag to connect with others.

## November 22, 2012

### Justin Dearing

#### Announcing poshrunner.exe so MyScript.ps1 can use MyScript.ps1.config instead of powershell.exe.config

I have a tendency to do odd things with technology, so that things don’t just work. When I point out the obscure edge cases I find, most people tell me, “well, don’t do that.” I usually ignore them and dream of tilting windmills. Well, today a windmill has been tilted, and this is the epic tale. I’m a developer that fell in love with PowerShell. As such I often call .NET API functions from powershell scripts. This usually just works. However, it kind of falls apart when you have to use settings in the app.config file. This means it’s basically impossible to call functions from a DLL that uses NHibernate, Entity Framework or WCF service references. (However, WCF services can be called directly from PowerShell quite easily.) The solution is to run the PowerShell script in a new PowerShell Runspace in a second AppDomain that uses its own app.config. However, things quickly fall apart because you need to write three classes that inherit from PSHostRawUserInterface, PSHostUserInterface and PSHost respectively, or else Write-Host will throw an exception. Now all this is a lot scarier than it sounds. However, it stops two important groups of people from ever using PowerShell to call DLLs that absolutely require you to manipulate your app.config:

• People scared off by the word AppDomain
• People that realize they have better things to do than everything I described above

Lucky for these two groups of people, I wasted my time so they didn’t have to! The project is currently called AppDomainPoshRunner, and I ILMerge it (via IL-Repack) into poshrunner.exe. Right now poshrunner takes one command line argument, the path to a script. If the script exists, it will run it in an AppDomain whose config file is scriptname.config.
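For illustration, a per-script config might look like the sketch below. This is hypothetical (the only convention taken from the post is the scriptname.config naming); the actual contents are whatever the DLLs the script calls expect:

```xml
<!-- MyScript.ps1.config: a hypothetical app.config sitting next to
     MyScript.ps1, loaded into the script's private AppDomain -->
<configuration>
  <appSettings>
    <!-- ExampleSetting is a made-up key for illustration only -->
    <add key="ExampleSetting" value="42" />
  </appSettings>
</configuration>
```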
Log4net configuration is read from a file called ADPR.log4net.config in the same directory as poshrunner.config. The full background is too long and convoluted for this post. This was all born out of a problem with calling New-WebServiceProxy twice in the same PowerShell console. I use log4net to write the console messages, so this has the potential to be quite extendable. Then Stan needed to run PowerShell scripts from msbuild and was complaining to me about it over twitter. He didn’t like the hacky solution I had then. Eventually I realized this was the way to simplify my previous solution. So download the zip file. Try it out. Complain to me when you find bugs! TL;DR: Unlike powershell.exe -file foo.ps1, which uses the shared powershell.exe.config, poshrunner.exe foo.ps1 uses foo.ps1.config, for great justice. Download it now!

### Nate Berry

#### Warcraft III on Ubuntu 12.04

Installing Ubuntu on the MacBook recently, I knew there would be a bunch of OSX programs I would no longer be able to run, but I was pretty confident that I’d be able to get some Windows programs going with wine. Having had good luck with Temple of Elemental Evil on the Elitebook last December, […]

## November 21, 2012

### Eitan Adler

#### Finding the majority element in a stream of numbers

Some time ago I came across the following question. As input, a finite stream of numbers is provided. Define an algorithm to find the majority element of the input. The algorithm need not provide a sensible result if no majority element exists. You may assume a transdichotomous memory model. There are a few definitions which may not be immediately clear:

• Stream: A possibly infinite set of data which may not be reused in either the forward or backward direction without explicitly storing it.
• Majority element: An element in a set which occurs more than half the time.

Unfortunately this answer isn't of my own invention, but it is interesting and succinct.
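The answer described next is the classic Boyer-Moore majority vote. A minimal Python sketch of it (my own rendering, not code from the original post):

```python
def majority(stream):
    """Boyer-Moore majority vote: one pass, O(1) extra space.

    Returns the majority element when one exists; the result is
    meaningless if no element occurs more than half the time.
    """
    accumulator = 0
    guess = None
    for next_item in stream:
        if accumulator == 0:      # adopt the current element as the guess
            guess = next_item
            accumulator = 1
        elif guess == next_item:  # guess confirmed
            accumulator += 1
        else:                     # guess contradicted
            accumulator -= 1
    return guess
```

For example, `majority([3, 1, 3, 2, 3])` returns `3`. Each non-guess element can cancel at most one occurrence of the guess, and a true majority has more occurrences than everything else combined, which is why it survives.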
The algorithm uses 3 registers: the accumulator, the guess, and the current element (next).

1. Initialize accumulator to 0.
2. Accept the next element of the stream and place it into next. If there are no more elements, go to step 7.
3. If accumulator is 0, place next into guess and increment accumulator.
4. Else, if guess matches next, increment accumulator.
5. Else, decrement accumulator.
6. Go to step 2.
7. Return the value in guess as the result.

An interesting property of this algorithm is that it can be implemented in $O(n)$ time even on a single tape Turing Machine.

## November 08, 2012

### Justin Dearing

#### Visual Studio 2010 and VisualStudio.com TFS Hosting

Yesterday, Stan, the founder of this blog, gave me a link to a project hosted on the Team Foundation Service (visualstudio.com). I tried to connect to it with Visual Studio 2010. It simply refused to work. After much annoyance, he asked me to try adding the TFS server to Visual Studio 2012, and it worked (why didn’t I think of that?). Eventually I figured out that I needed to install the Visual Studio 2010 SP1 Team Foundation Server 2012 Compatibility GDR (KB2662296). Then I was able to add the solution to Visual Studio 2010. It seems there are several updates for Visual Studio 2010 SP1, some specifically dealing with Windows 8 compatibility. Unfortunately, Windows Update does not prompt me to install them. I will search for and install these tonight to prevent future issues.

#### Windows Internals Study Group – First meeting

Last night, two others and I had our first planning meeting via google huddle for our Windows Internals Study group. We will be meeting next Wednesday 2012-11-17 at 20:30 EST to discuss the first two chapters of Windows Internals 6th edition. If you still want to participate it’s not too late, just let me know. One thing we decided was to make all our notes public. Right now they are being stored in a google drive shared folder that is publicly accessible.
The information there will grow with time.

## November 04, 2012

### Justin Dearing

#### My ConEmu Tasks

Update: Thanks to the author of ConEmu for some constructive feedback in the comments! I’ve fallen in love with ConEmu after being introduced to it by a Scott Hanselman blog post. While my initial attraction to it was its integration with farmanager, that was only its initial selling point. The ability to have one resizable tabbed window to hold all my cmd.exe, powershell.exe and bash.exe consoles is what makes it a killer app. While 90% of my command line needs are met by a vanilla powershell.exe or far.exe command line, sometimes I need my environment set up in another way. For example, sometimes I need the vcvarsall.bat run that sets the environment up for a particular version of Visual Studio or the Windows SDK. Also, on occasion I will need to fire up the mingw bash instance that comes with git. (Generally, this is only to run ssh-keygen.exe.) Finally, while a 64 bit instance of PowerShell 3.0 captures 99% of my PowerShell needs, I want easy access to both 32 and 64 bit versions of the PowerShell 1.0, 2.0 and 3.0 environments. So I set all these up in ConEmu and exported the registry keys into a gist repository, and now I share them here as a pastebin (because github lacks syntax highlighting):

## Breakdown of settings

All the settings are stored in subkeys of HKEY_CURRENT_USER\Software\ConEmu\.Vanilla\Tasks with the name TaskN, where N is a base 10 integer. In addition, that key has a DWORD value named Count that contains the number of tasks. Each task contains the following key/value pairs:

• Name: This is the name of the task.
• GuiArgs: These are the ConEmu configuration options. In all my cases I use /single /Dir %userprofile%. The /single flag adds the task to the currently open ConEmu instance, if there is one. The /Dir %userprofile% switch sets the working directory to my home directory.
• CmdN: These are the commands executed.
Each one represents a command executed in a new tab by a task. All the tasks I have configured execute exactly one tab.

• Count: This is the number of tabs opened by this command. Like the TaskN keys, it should match up to the number of CmdN values in a Task.
• Active: I set this to zero, but if you have a task that opens multiple tabs, you can set this to the tab number you want activated after running that task. For example, if you have a task with a Cmd1 that opens mongod.exe, and a Cmd2 that opens up an instance of tail.exe tailing the mongo logs, setting Active to 2 will display the console running the tail.

ConEmu’s true power is your ability to customize it, so by showing you how I customized it, I hope you have found it to be more powerful.

## November 01, 2012

### Justin Dearing

#### Windows Internals Google Group Created

I’ve made a google group to talk about the Windows Internals Study Group I proposed. Let’s move the conversation there.

## October 31, 2012

### Eitan Adler

#### Finding the min and max in 1.5n comparisons

A friend of mine recently gave me the following problem: Given an unsorted set of numbers, find the minimum and maximum of the set in at most $1.5n$ comparisons. My answer involves splitting the list up pairwise and finding each result using only half of the set.

1. Go through the list and compare every even index to its immediate right (odd) index. Sort each pair numerically within itself. This step takes $\dfrac{1}{2}n$ comparisons.
2. Find the minimum of every even index and the maximum of every odd index using the typical algorithm. This step takes $n$ comparisons.

Note that this could be done in one pass by doing the pair comparison and the min/max comparison together. Is there a better way?

#### Blogging my way through CLRS Section 11.1 (edition 2)

I've taken a brief break from blogging about my Cormen readings, but I decided to write up the answers to chapter 11.
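Returning to the 1.5n min/max question above, the pairwise scheme can be sketched in Python (my sketch, not from the original post; it counts comparisons so the bound can be checked, and assumes a non-empty list):

```python
def min_max(a):
    """Find (min, max) of a non-empty list in ~1.5n comparisons.

    Order each adjacent pair with one comparison; then only the
    pair-minima can contain the min, and only the pair-maxima the max.
    """
    comparisons = 0
    lows, highs = [], []
    # Step 1: order each adjacent pair (n/2 comparisons).
    for i in range(0, len(a) - 1, 2):
        comparisons += 1
        if a[i] <= a[i + 1]:
            lows.append(a[i]); highs.append(a[i + 1])
        else:
            lows.append(a[i + 1]); highs.append(a[i])
    if len(a) % 2:  # an odd leftover element belongs to both halves
        lows.append(a[-1]); highs.append(a[-1])
    # Step 2: scan each half (about n/2 comparisons per half).
    lo, hi = lows[0], highs[0]
    for x in lows[1:]:
        comparisons += 1
        if x < lo:
            lo = x
    for x in highs[1:]:
        comparisons += 1
        if x > hi:
            hi = x
    return lo, hi, comparisons
```

For even $n$ this performs $n/2 + 2(n/2 - 1) = 1.5n - 2$ comparisons, within the required bound.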
Note that the chapters and question numbers may not match up because I'm using an older edition of the book.

Question 11.1-1: Suppose that a dynamic set $S$ is represented by a direct address table $T$ of length $m$. Describe a procedure that finds the maximum element of $S$. What is the worst case performance of your procedure?

If the addresses are sorted by key, start at the end of the direct address table and scan backward until a non-empty slot is found; this is the maximum. If they are not sorted:

1. Initialize $max$ to $-\infty$.
2. Scan forward from the current position until a used slot is found. If you reach the end, go to step 5.
3. Compare the key to $max$. If it is greater, assign it to $max$.
4. Go to step 2.
5. Return $max$.

The performance of this algorithm is $\Theta(m)$. A slightly smaller bound can be found in the first case of $\Theta(m - max)$.

Question 11.1-2: Describe how to use a bit vector to represent a dynamic set of distinct elements with no satellite data. Dictionary operations should run in $O(1)$ time.

Initialize a bit vector of length $|U|$ to all $0$s. When storing key $k$, set the $k$th bit to one; when deleting key $k$, set the $k$th bit to zero. This is $O(1)$ even in a non-transdichotomous model, though it may be slower.

Question 11.1-3: Suggest how to implement a direct address table in which the keys of stored elements do not need to be distinct and the elements can have satellite data. All three dictionary operations must take $O(1)$ time.

Each element in the table should be a pointer to the head of a linked list containing the satellite data. $nil$ can be used for non-existent items.

Question 11.1-4: We wish to implement a dictionary by using direct addressing on a large array. At the start the array entries may contain garbage, and initializing the entire array is impractical because of its size. Describe a scheme for implementing a direct address dictionary on the array. Dictionary operations should take $O(1)$ time.
Using an additional stack with size proportional to the number of stored keys is permitted.

On insert, the array address is pushed onto the stack, and the array element is initialized to point at that location in the stack. On search, the array element is checked to see if it points into the stack; if it does, the stack entry is checked to see if it points back to the array element.[1] On delete, the array element can be set to a value not pointing into the stack, but this isn't required. If the element points to the top of the stack, the top is simply popped off. If it points into the middle of the stack, the top element and the key's element are swapped and then the pop is performed. In addition, the array slot which the old top element pointed to must be modified to point to its new location in the stack.

Question 11.2-1: Suppose we use a hash function $h$ to hash $n$ distinct keys into an array $T$ of length $m$. Assuming simple uniform hashing, what is the expected number of collisions?

Each pair of distinct keys collides with probability $1/m$, so by linearity of expectation we expect $\binom{n}{2}/m = \frac{n(n-1)}{2m}$ collisions.

Question 11.2-2: Demonstrate the insertion of the keys $5, 28, 19, 15, 20, 33, 12, 17, 10$ into a hash table with 9 slots and $h(k) = k \bmod 9$.[2]

| hash | values |
|------|--------|
| 1 | 28 → 19 → 10 |
| 2 | 20 |
| 3 | 12 |
| 5 | 5 |
| 6 | 15 → 33 |
| 8 | 17 |

Question 11.2-3: If the keys were stored in sorted order, how is the running time for successful searches, unsuccessful searches, insertions, and deletions affected under the assumption of simple uniform hashing?

Successful and unsuccessful searches are largely unaffected, although small gains can be achieved if the search bails out early once it finds a key later in the sort order than the one being searched for. Insertions are the most affected operation: the time changes from $\Theta(1)$ to $O(n/m)$. Deletions are unaffected. If the list is doubly linked the time remains $O(1)$.
If it was singly linked the time remains $O(1 + \alpha)$.

Question 11.2-4: Suggest how storage for elements can be allocated and deallocated within the hash table by linking all unused slots into a free list. Assume one slot can store a flag and either one element or two pointers. All dictionary operations should run in $O(1)$ expected time.

Initialize all the slots to a singly linked free list (flag set to false) with a head and tail pointer. On insert, use the memory pointed to by the head pointer, set the flag to true for the new element, and advance the head pointer. On delete, set the flag to false and insert the newly freed memory at the tail of the linked list.

Question 11.2-5: Show that if $|U| > nm$, with $m$ the number of slots, there is a subset of $U$ of size $n$ consisting of keys that all hash to the same slot, so that the worst case searching time for hashing with chaining is $\Theta(n)$.

Assume the worst case of all $|U|$ keys in the hash table. In the optimal case of simple uniform hashing, all $m$ slots will have $|U|/m > n$ items. Removing the assumption of uniform hashing only allows some chains to become shorter at the expense of other chains becoming longer. There are more items than the slots can hold at $n$ apiece, so at least one slot must have at least $n$ items by the pigeonhole principle.

Question 11.3-1: Suppose we wish to search a linked list of length $n$, where every element contains a key $k$ along with a hash value $h(k)$. Each key is a long character string. How might we take advantage of the hash values when searching the list for an element of a given key?

You can use $h(k)$ to create a Bloom filter of strings in the linked list. This is a $\Theta(1)$ check to determine if it is possible that a string appears in the linked list. Additionally, you can create a hash table of pointers to elements in the linked list with that hash value. This way you only check a subset of the linked list.
Alternatively, one can keep the hash of each value stored in the linked list as well, compare the hash of the search value to the hash of each item, and only do the long comparison if the hashes match.

Question 11.3-2: Suppose that a string of length $r$ is hashed into $m$ slots by treating it as a radix-128 number and then using the division method. The number $m$ is easily represented as a 32 bit word, but the string of $r$ characters treated as a radix-128 number takes many words. How can we apply the division method to compute the hash of the character string without using more than a constant number of words outside of the string itself?

Instead of treating the word as a radix-128 number, some form of combination could be used. For example, you may add the values of each character together modulo 128.

Question 11.3-4: Consider a hash table of size $m = 1000$ and a corresponding hash function $h(k) = \lfloor m (k A \bmod 1) \rfloor$ for $A = \frac{\sqrt{5} - 1}{2}$. Compute the locations to which the keys 61, 62, 63, 64, 65 are mapped.

| key | hash |
|-----|------|
| 61 | 700 |
| 62 | 318 |
| 63 | 936 |
| 64 | 554 |
| 65 | 172 |

#### Blogging my way through CLRS section 3.1 [part 5]

Part 4 here. I wrote an entire blog post explaining the answers to 2.3, but Blogger decided to eat it. I don't want to redo those answers, so here is 3.1. From now on I will title my posts with the section number as well, to help Google.

Question 3.1-1: Let $f(n)$ and $g(n)$ be asymptotically non-negative functions. Using the basic definition of $\Theta$-notation, prove that $\max(f(n), g(n)) \in \Theta(f(n) + g(n))$.

CLRS defines $\Theta$ as $\Theta(g(n)) = \{ f(n) : $ there exist positive constants $c_1, c_2$, and $n_0$ such that $0 \leq c_1 g(n) \leq f(n) \leq c_2 g(n)$ for all $n \geq n_0 \}$. Essentially we must prove that there exist some $c_1$ and $c_2$ such that $c_1 (f(n) + g(n)) \leq \max(f(n), g(n)) \leq c_2 (f(n) + g(n))$. There are a variety of ways to do this, but I will choose the easiest way I could think of.
Based on the above equation we know that $\max(f(n), g(n)) \leq f(n) + g(n)$ (as $f(n)$ and $g(n)$ must both be non-negative), and we further know that $f(n) + g(n)$ can't be more than twice $\max(f(n), g(n))$. What we have then are the following inequalities: $$\max(f(n), g(n)) \leq f(n) + g(n)$$ and $$f(n) + g(n) \leq 2 \max(f(n), g(n))$$ Solving, we can take $c_1 = \frac{1}{2}$ and $c_2 = 1$.

Question 3.1-2: Show for any real constants $a$ and $b$, where $b \gt 0$, that $(n+a)^b \in \Theta(n^b)$.

Because $a$ is a constant and the definition of $\Theta$ only needs to hold after some $n_0$, adding $a$ to $n$ does not affect the definition, and we simplify to $n^b \in \Theta(n^b)$, which is trivially true.

Question 3.1-3: Explain why the statement "The running time of $A$ is at least $O(n^2)$" is meaningless.

I'm a little uncertain of this answer, but I think this is what CLRS is getting at: when we say a function $f(n)$ has a running time of $O(g(n))$, what we really mean is that $f(n)$ has an asymptotic upper bound of $g(n)$. This means that $f(n) \leq c g(n)$ after some $n_0$. Saying a running time is "at least" an upper bound conveys no information, since $O(n^2)$ places no lower bound on the function at all.

Question 3.1-4: Is $2^{n+1} = O(2^n)$? Is $2^{2n} = O(2^n)$?

$2^{n+1} = 2 \times 2^n$, which means that $2^{n+1} \leq c \times 2^n$ (with $c = 2$) for all $n$, so $2^{n+1} \in O(2^n)$. Alternatively, we could say that the two functions only differ by a constant coefficient, and therefore the answer is yes. There is no constant $c$ such that $2^{2n} \leq c \times 2^n$ for all large $n$, and therefore $2^{2n} \notin O(2^n)$.

Question 3.1-5: Prove that for any two functions $f(n)$ and $g(n)$, we have $f(n) \in \Theta(g(n)) \iff f(n) \in O(g(n)) \And f(n) \in \Omega(g(n))$.

This is an "if and only if" problem, so we must prove it in two parts. Firstly, if $f(n) \in O(g(n))$ then there exist some $c_1$ and $n_0$ such that $f(n) \leq c_1 \times g(n)$ after some $n_0$.
Further, if $f(n) \in \Omega(g(n))$ then there exist some $c_2$ and $n_0$ such that $f(n) \geq c_2 \times g(n)$ after some $n_0$. If we combine the above two statements (which come from the definitions of $\Omega$ and $O$), then we know that there exist some $c_1, c_2$, and $n_0$ such that $c_2 g(n) \leq f(n) \leq c_1 g(n)$ for all $n \geq n_0$. We could do the same thing backward for the other direction: if $f(n) \in \Theta(g(n))$ then we could split the above inequality and show that each of the individual statements is true.

Question 3.1-6: Prove that the running time of an algorithm is $\Theta(g(n)) \iff$ its worst-case running time is $O(g(n))$ and its best-case running time is $\Omega(g(n))$.

I'm going to try for an intuitive proof here instead of a mathematical one. The running time is asymptotically bounded above by $g(n)$ in the worst case and asymptotically bounded below by $g(n)$ in the best case, which means the running time is tightly bound on both sides: it never goes below some constant times $g(n)$ and never goes above some (other) constant times $g(n)$. This is exactly what we get from the above definition of $\Theta(g(n))$. A mathematical proof follows from question 3.1-5.

Question 3.1-7: Prove that $o(g(n)) \cap \omega(g(n)) = \varnothing$.

Little-o and little-omega are defined as follows: $o(g(n)) = \{ f(n) : \forall c > 0\ \exists n_0 \text{ such that } 0 \leq f(n) \lt c \times g(n)\ \forall n \gt n_0 \}$ and $\omega(g(n)) = \{ f(n) : \forall c > 0\ \exists n_0 \text{ such that } 0 \leq c \times g(n) \lt f(n)\ \forall n \gt n_0 \}$. In other words, $$f(n) \in o(g(n)) \iff \lim_{n \to \infty} \frac {f(n)} {g(n)} = 0$$ and $$f(n) \in \omega(g(n)) \iff \lim_{n \to \infty} \frac {f(n)} {g(n)} = \infty$$ It is obvious that these cannot both be true at the same time: that would require the same limit to be both $0$ and $\infty$.

#### Blogging my way through CLRS [part 4]

Part 3 here. This set is a bit easier than last time.
Question 2.2-1: Express the function $$\frac{n^3}{1000} - 100n^2 - 100n + 3$$ in terms of $\Theta$ notation.

A function $g(n)$ is said to be in the set of all functions $\Theta(f(n))$ if and only if $g(n)$ is also in the set of all functions $\Omega(f(n))$ and in the set of all functions $O(f(n))$. Symbolically: $$g(n) \in \Theta(f(n)) \iff g(n) \in O(f(n)) \And g(n) \in \Omega(f(n))$$ A function $g(n)$ is in the set of all functions $O(f(n))$ if and only if after some constant $n_0$ it is always true that for some constant $C$, $g(n) \lt C f(n)$. A function $g(n)$ is in the set of all functions $\Omega(f(n))$ if and only if after some constant $n_0$ it is always true that for some constant $C$, $g(n) \gt C f(n)$. With our function we could choose practically any function to satisfy either one of these conditions. However, we need to satisfy both of them. One thing that makes this easier is that the bound only has to hold after some constant. This allows us to throw away the "trivial" parts that are eventually overwhelmed by the faster growing terms. We are therefore left with only $n^3$, so the answer is $\Theta(n^3)$.

Question 2.2-2: Consider sorting $n$ numbers stored in an array A by first finding the smallest element and exchanging it with the element in A[1], then finding the second smallest element and exchanging it with A[2], and continuing this for the first $n-1$ elements of A. Write the pseudocode for this algorithm, which is known as Selection Sort. What loop invariant does this algorithm maintain? Why does it need to run only for the first $n-1$ elements and not for all $n$? Give the best case and worst case running times in $\Theta$ notation.

This question is asking us to analyze selection sort in a variety of ways.
I will start with writing out the pseudocode:

for $j \leftarrow 1$ to $n - 1$
&nbsp;&nbsp;$min \leftarrow j$
&nbsp;&nbsp;for $i \leftarrow j + 1$ to $n$
&nbsp;&nbsp;&nbsp;&nbsp;if $A[i] \lt A[min]$ then $min \leftarrow i$
&nbsp;&nbsp;if $min \neq j$ then swap $A[min]$ and $A[j]$

A loop invariant that this algorithm maintains is that at the start of each iteration the subarray A[1..j-1] is sorted, and every element in it is less than or equal to every element in the subarray A[j..n]. I do not believe a stronger loop invariant is provable. The algorithm only needs to run until $n-1$ because of the second part of the loop invariant: once the iteration with $j = n-1$ finishes, the only remaining element, A[n], is not less than any previous element, so no check has to be done. In the best case (an already sorted array) and in the worst case (a reverse sorted array) the running time is the same: $\Theta(n^2)$.

Question 2.2-3: Consider linear search again. How many elements of the input sequence need to be checked on average, assuming that the element being searched for is equally likely to be any element in the array? How about in the worst case? What are the average-case and worst-case running times of linear search in $\Theta$ notation?

The best case for a linear search algorithm is when the searched-for element is in the first location. In the worst case all $n$ locations must be searched. In the average case $\frac{n}{2}$ locations have to be searched. Both the average and worst cases are therefore $\Theta(n)$.

Question 2.2-4: How can we modify almost any algorithm to have a good best-case running time?

I have no idea what this question is asking for. I guess checking for the optimal case (as in a pre-sorted array for a sorting algorithm) and then skipping the rest of the procedure might work.

#### Blogging my way through CLRS [3/?]

Part 2 here. According to wikipedia, Introduction to Algorithms is also known as CLRS, which is shorter (and more fair to the other authors), so I'll use that name from now on. Question 2.1-1 asks me to redraw a previous diagram, but with different numbers. I am not going to post that here.
Question 2.1-2: Rewrite the insertion sort procedure to sort into nonincreasing instead of nondecreasing order.

Here is the pseudocode of the nonincreasing version of insertion sort:

    for j ← 2 to length[A]
        do key ← A[j]
            ▷ insert A[j] into the sorted sequence A[1..j-1]
            i ← j - 1
            while i > 0 and A[i] < key
                do A[i+1] ← A[i]
                   i ← i - 1
            A[i+1] ← key

Now we prove that this loop correctly terminates with a nonincreasing array, to about the same level of formality as the book proved the original.

Initialization: At the first iteration, when $j = 2$, the subarray A[1..j-1] is trivially sorted (as it has only one element).

Maintenance: In order to prove maintenance we would need to show that the inner loop correctly terminates with the array shifted so that there is "space" at the correct position for the key. As CLRS did not prove this property, I will also skip this proof.

Termination: The loop terminates when j > length[A], that is, when $j = \text{length}[A]+1$. Since we have "proven" (to some level) the maintenance of the loop invariant (that at each point during the loop the subarray A[1..j-1] is sorted), we can substitute length[A]+1 for $j$, which makes the sorted subarray A[1..length[A]], i.e. the entire array. This shows that the loop terminates with a correctly sorted array.

Question 2.1-3: Input: A sequence of $n$ numbers $A = \{a_1, a_2, \ldots, a_n\}$ and a value $v$. Output: An index $i$ such that $v = A[i]$, or the special value $\varnothing$ (NIL) if $v \notin A$. Write the pseudocode for linear search, which scans through the sequence looking for $v$. Using a loop invariant, prove that your algorithm is correct.

The first part, writing the pseudocode, seems fairly easy:

    r ← ∅
    for j ← 1 to length[A]
        if v = A[j]
            ▷ optionally check that r = ∅
            r ← j
    return r

The second part, proving that this is correct, is harder than before because we don't have a trivially true initialization of our loop invariant.

Initialization: $j = 1$ and $r = \varnothing$ at the start of our loop.
At this point there are no elements prior to A[j] and we have yet to find $v$ in $A$. As such our invariant (that $r$ contains the correct value) is true.

Maintenance: At every point in the loop, the subarray A[1..j] has either contained $v$, in which case its index has been assigned to $r$, or has not contained $v$, in which case $r$ remains $\varnothing$. This means that the loop invariant holds for every subarray A[1..j].

Termination: At the end of the loop $j = \text{length}[A]$. We know from the maintenance step that $r$ is correct for every subarray A[1..j], so at termination $r$ contains the correct value.

Question 2.1-4: Consider the problem of adding two $l$-bit binary integers, stored in two $l$-element arrays $A$ and $B$. The sum of the two integers should be stored in binary form in an $(l+1)$-element array $C$. State the problem formally and write pseudocode for adding the integers.

Stating the problem formally looks something like:

Input: Two $l$-bit integers $A$ and $B$, stored as arrays of length $l$ with the most significant bit stored last.

Output: An $(l+1)$-bit integer $C$, stored as an array of length $l+1$ with the most significant bit stored last.

Here is the pseudocode:

    ▷ X is an (l+1)-element array of bits, initialized to all zeros, which stores the carries
    for j ← 1 to l
        copyC ← A[j] ⊕ B[j]
        X[j+1] ← A[j] ∧ B[j]
        C[j] ← copyC ⊕ X[j]
        X[j+1] ← X[j+1] ∨ (copyC ∧ X[j])
    C[l+1] ← X[l+1]

#### Blogging my way through Cormen [2/?]

As I said in part 1, I am reading a book on algorithms and have decided to blog my way through. My primary goal in doing so is to improve my writing skills. A secondary goal is to force myself to actually answer the questions.

1.2-2: Suppose we are comparing implementations of insertion sort and merge sort on the same machine. For inputs of size $n$, insertion sort runs in $8n^2$ steps, while merge sort runs in $64 n \lg n$ steps. For which values of $n$ does insertion sort beat merge sort?
The question is essentially asking for which values of $n$ is $8n^2 \lt 64 n \lg n$. We can simplify by dividing both sides by $8n$, which leaves $n \lt 8 \lg n$. Unfortunately this inequality is not solvable using elementary operations. Luckily we are being asked for an integer solution (as computers operate in discrete steps), so we can use the underutilized guess-and-check method.

    n     8 lg n
    14    30.46
    41    42.86
    43    43.41
    44    43.67

So there we have it: given this data we would prefer insertion sort whenever we have at most 43 items, since $n = 44$ is the first point at which $8 \lg n$ drops below $n$.

1.2-3: What is the smallest value of $n$ such that an algorithm whose running time is $100n^2$ runs faster than an algorithm whose running time is $2^n$ on the same machine?

This question is asking us to find the smallest positive integer $n$ that satisfies $100n^2 \lt 2^n$. This could be solved by doing the math, by looking at a plot of the curves, or by using the above method again:

    $2^{14} = 16384$ while $100 \times 14^2 = 19600$
    $2^{15} = 32768$ while $100 \times 15^2 = 22500$
An exponential algorithm becomes worse than a polynomial algorithm when we have as few as 15 items to worry about. In other words: avoid exponential algorithms!
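Both crossover points from 1.2-2 and 1.2-3 are easy to confirm with a brute-force scan, the programmatic version of the guess-and-check tables above. A quick sketch (the helper name `first_n` is just illustrative):

```python
from math import log2

def first_n(pred, start=2, limit=10**6):
    """Return the first integer n >= start satisfying pred, or None."""
    for n in range(start, limit):
        if pred(n):
            return n
    return None

# 1.2-2: the first n where merge sort (64 n lg n) stops losing to
# insertion sort (8 n^2); insertion sort wins for all smaller n >= 2.
merge_wins = first_n(lambda n: 64 * n * log2(n) <= 8 * n * n)

# 1.2-3: the first n where the 100 n^2 algorithm beats the 2^n one.
poly_wins = first_n(lambda n: 100 * n * n < 2 ** n)

print(merge_wins, poly_wins)  # 44 15
```

So insertion sort is preferable for $2 \le n \le 43$, and the exponential algorithm loses from $n = 15$ onward, matching the tables.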
Thank you JT and JM for giving me the idea to go through the book, and for looking at my posts before I publish them.
Updated 2012-09-05: I had a brain lapse the day I originally published this and accidentally used the natural logarithm instead of the base-2 log for question 1.2-2. How I ever managed to do that I will never know, but I've fixed it.
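The ripple-carry addition from question 2.1-4 can be sketched the same way in Python. Here the bit arrays are plain lists with the least significant bit first, mirroring the pseudocode's orientation; the carry is kept in a single variable rather than the array X:

```python
def add_binary(a, b):
    """Add two equal-length bit lists (least significant bit first).

    Returns an (l+1)-element bit list holding the sum."""
    l = len(a)
    assert len(b) == l
    c = [0] * (l + 1)
    carry = 0
    for j in range(l):
        half = a[j] ^ b[j]                       # half-adder sum of the inputs
        c[j] = half ^ carry                      # full sum includes carry in
        carry = (a[j] & b[j]) | (half & carry)   # carry out of this position
    c[l] = carry                                 # the extra (l+1)-th bit
    return c

# 3 (binary 011) + 6 (binary 110), least significant bit first:
print(add_binary([1, 1, 0], [0, 1, 1]))  # [1, 0, 0, 1], i.e. 9
```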

#### Google translate proxy no longer available

One old trick to bypass simple domain based filters was to use Google translate on the domain and go from English to English (or the native language to the native language - whatever it might be).

I recently came across a link that happened to be using Google translate in that way (I'm not sure why) and I got an error from Google:

"Translation from English into English is not supported."

When I tried other same-language pairs I got similar errors. Translating between two different languages works as usual.

Luckily this trick is not really needed as there are thousands of available proxies or one could just make their own.

When you make a request to certain websites you may find an unusual header that looks a little strange:

    [8000 eitan@radar ~ ]% curl -I http://www.imdb.com/ 2>/dev/null | grep close
    Cneonction: close
    [8001 eitan@radar ~ ]% curl -I http://maps.apple.com/ 2>/dev/null | grep close
    Cneonction: close

This isn't a typo though. Some load balancers that sit between the web server and the end user want to implement HTTP keep-alive without modifying the back-end web server. The load balancer therefore has to add "Connection: Keep-Alive" to the HTTP header and also has to elide the "Connection: close" sent by the real web server. However, if it completely removed the line, the load balancer (acting as a TCP proxy) would have to stall before forwarding the complete text in order to recompute the TCP checksum. This increases latency on packet delivery.

Instead, the proxy uses a hack to keep the checksum unchanged. The TCP checksum of a packet is the 1s complement summation of all the 16-bit words (the final word might be right-padded with zeros).[1] By manipulating the ordering, but not the content, of the header, the proxy can avoid changing the TCP checksum except by the fixed amount that the "Connection: Keep-Alive" line adds (2061).

In particular:

    >>> sum(ord(i) for i in "Connection") - sum(ord(i) for i in "Cneonction")
    0

This reordering also keeps the packet size the same.
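A short Python sketch makes this concrete: the folded 16-bit 1s complement sum (the core of the TCP checksum) comes out identical for both header lines, because the anagram keeps the same multiset of bytes at even offsets and at odd offsets. This assumes both versions of the header start at the same word alignment within the packet:

```python
def ones_complement_sum16(data):
    """1s complement sum of 16-bit big-endian words, as in the TCP checksum."""
    if len(data) % 2:                 # right-pad the final word with zeros
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return total

print(ones_complement_sum16(b"Connection: close") ==
      ones_complement_sum16(b"Cneonction: close"))  # True
```

Since 1s complement addition is commutative and associative, the sum can be computed over the words in any order, which is why preserving the per-offset byte multisets is enough to leave the checksum untouched.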

1. RFC793
Edit 2012-10-31: Make the RFC a link and remove pointless "2>&1"
Thanks abbe for the inspiration! Thanks wxs for the proofreading.

## October 29, 2012

### Justin Dearing

#### Windows Internals Study Group Proposal

While most of the code I’ve written to put food on the table has been for application development, I’ve always had a true passion for system development. It’s a bit meta, sort of like being the mechanical engineer who just wants to make a better ratchet wrench. However, I think it’s important to understand, and if given the opportunity, help write the code that runs and supports the code that makes the end users productive.

As a result of this passion, I’ve been reading the 5th edition of Windows Internals by Windows hacker extraordinaires Mark E. Russinovich and David A. Solomon. This edition covers Windows Vista and Windows Server 2008. While it’s been enlightening, and the exercises have helped reinforce the concepts, the book is still somewhat academic. It’s not a book about programming or a book about system administration; it’s a book that teaches concepts. Some of these concepts mesh well with my previous knowledge and are directly applicable to things I’ve done or want to do. I retain those concepts well. Other concepts, such as the finer points of Windows objects, are not things I can relate to other concepts (yes, I see the Unix "everything is a file" parallel, but I don’t see the dd if=/dev/foo of=/tmp/foo.img parallel that makes it useful).

So I decided to form a study group. I’ve done this before for the ZCE in PHP5, with the Long Island PHP user group. Here is what I am proposing. All interested parties contact me via the comments or at @zippy1981 on Twitter. We will meet once a week, each week covering a different chapter. The expectations for participants would be as follows:

### Before the meeting

• Do the assigned homework for the previous chapter
• Read the chapter. Take notes of what you did not understand. Make a list of hyperlinks and deadtree references that you used to supplement the chapter, if any.
• Do all the exercises with the prescribed Windows System Internals tools and Windows Debugger (these are all free downloads).
• Do the exercises with equivalent third party tools including:

### During the Meeting

• Take a turn leading one of the meetings. I suggest picking the chapter you are most intimidated by, not the one you are most confident in. This is a reinforcement tool.
• Present your homework from the previous chapter. We will go around the room. Depending on the size of the group and scope of the assignment we might only have a subset of the group volunteer to present or break into small groups.
• Discuss the chapter of the week. Share what you didn’t understand, help others who didn’t understand things that you did understand. (the whole point of a study group)
• Last week’s discussion leader will demonstrate this week’s exercises using both the Sysinternals tools and the third party ones we agree upon.
• The group leader will assign homework for next week.
• Next week’s discussion leader will act as secretary and do the following:
• Collect everyone’s links to post to a wiki we will maintain.
• Take minutes of what we discussed in the meetings for the wiki.
• Where appropriate, file bugs and feature requests for third party tools that lack the functionality to do the exercises.

Now, this isn’t high school, and we’re not getting graded. No one will chastise you if you don’t go to all the sessions or do all the homework. We all have jobs, friends, families, and sometimes a study group isn’t the most important thing. I expect the class to consist of mostly adults, but welcome any teenagers that think they can handle the material. Therefore, I will treat you all like adults.

That being said, I do want to run this as a pass/fail course and give out certificates. Unlike the PHP ZCE study class I ran, there is no clear external goal. There is no certification on Windows Internals, except perhaps as part of instructor-led courses. Passing will consist of doing all the homework and actively participating in all the classes. If you miss a class, you can make it up by meeting later in the week with at least one other member of the study group. While a certificate seems kind of corny, especially coming from as unaccredited and unprestigious a body as the one we will be forming, I feel this small carrot will help with commitment.

### Location

There is no reason this can’t work remotely. I will not turn anyone down because they can only attend via Skype. However, I’d like at least some of us to meet in person.

I live in Jersey City and work in Hoboken. I’ll travel to Manhattan, Brooklyn, and Hudson, Bergen and Essex counties, or perhaps somewhere a little farther if I can find a really convenient train. We can certainly have multiple physical meeting locations (e.g. a group of people from Chicago meet there and I meet with some people in Hoboken).

Ideally, I’d like a meeting facility with a projector for the group location(s). We’d probably need to use WebEx if we are not all in the same room.

### What edition?

I’m reading the 5th edition (because I happen to own it). The 6th edition covers Windows 7 and Windows Server 2008 R2. It’s also a two-volume edition, so it’s more expensive. I need to research whether there will be an edition for Windows 8. At this point I’m proposing the 6th edition, because part 2 of the 6th edition was just released.

### Flexibility

My proposal is mostly based on what worked for the PHP cert. This will be quite different though. We might want to break some chapters in half, or dedicate two meetings to a chapter.

I hope to get enough interest to make this happen. I think this could work out really well and prove to be a great learning experience for everyone.

## October 23, 2012

### Nate Berry

#### Ubuntu breathes new life into old MacBook

Last year when I got an HP EliteBook for work I thought my days with the old MacBook were numbered. The MacBook isn’t that old; it’s a 2009 Core 2 Duo aluminum-body 13″ model, but the EliteBook’s Core i5 is faster. The Mac screen is better, but not by very much. Both processors […]