Wednesday, November 18, 2009

Unix Options: a lisp cli parser

I recently spun off a subproject from another project I was working on. Index is an effort at an application-level file tagger (written in Common Lisp) and, being largely CLI driven, it needed a decent CLI option parser to go with it. A quick search on Google, Cliki, and the CL-Directory didn't turn up the hits I'd like, so I wrote my own for Index. Eventually Unix Options, as I call it, grew large enough to be spun off into its own library, with unit tests and an ASDF file, so I did, and here it is.

Unix Options attempts to support reading in options in the most idiomatic fashion possible. It supports parsing options both in short option style (single dash: '-a') and long option style (double dash: '--alpha'). It recognizes grouped short options ('-abc') and recognizes file and argument parameters to options. Both the '--option=parameter' and '--option parameter' syntaxes are recognized, and '--' ends option parsing and treats all remaining tokens as free tokens, as is usually expected at a Unix CLI.

Unix Options was meant to have the flexibility of the Perl GetOptions command. That is, rather than act as a simple getopt replacement for sorting tokens and making them easy to parse by the rest of the program, Unix Options provides mechanisms for automatically binding options and handling details like unsupported options and invalid input. In addition, things like usage printouts (from a '-h' option) and document generation hooks are planned.

To this end, I split the 'back-end', or parser part of Unix Options, from the various 'front-end' parts which handle the values found. The function map-parsed-options is the back-end. It accepts a list of the tokens passed in on the CLI, as well as lists of valid options. It takes two callbacks, one to handle valid options and one to handle free tokens. It processes the list of tokens, splitting it into option-value pairs for each option found and splitting out free tokens, and runs the appropriate callback for each. For options which take parameters, the parameter is passed as the value; options that don't take parameters simply receive T for the value. map-parsed-options has the prototype:

(map-parsed-options cli-options bool-options param-options opt-val-func free-opt-func) 

bool-options lists all options that take no parameters ('bool' implies that they are either true or false, passed in or not) and param-options lists all that do.
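To make the back-end idea concrete, here is a rough sketch of a toy parser with the same argument shape. It is not the library's code: toy-map-options and demo-toy-map-options are hypothetical names, and the sketch only handles long options and '--', ignoring short options entirely.

```lisp
;; Hypothetical toy version of the back-end idea; NOT the Unix Options source.
;; Only '--long' options and the '--' terminator are handled here.
(defun toy-map-options (tokens bool-options param-options opt-val-func free-opt-func)
  "Walk TOKENS, calling OPT-VAL-FUNC with an option name and value for each
long option, and FREE-OPT-FUNC with each free token."
  (loop while tokens
        do (let ((token (pop tokens)))
             (cond
               ;; "--" ends option parsing; everything left is a free token.
               ((string= token "--")
                (mapc free-opt-func tokens)
                (setf tokens nil))
               ;; A long option: strip the dashes and split off any "=value".
               ((and (> (length token) 2) (string= "--" token :end2 2))
                (let* ((eq-pos (position #\= token))
                       (name (subseq token 2 eq-pos)))
                  (cond ((member name bool-options :test #'string=)
                         ;; Options without parameters receive T.
                         (funcall opt-val-func name t))
                        ((member name param-options :test #'string=)
                         ;; The parameter is either after "=" or the next token.
                         (funcall opt-val-func name
                                  (if eq-pos
                                      (subseq token (1+ eq-pos))
                                      (pop tokens))))
                        (t (warn "Unknown option: ~a" token)))))
               ;; Anything else is a free token.
               (t (funcall free-opt-func token))))))

(defun demo-toy-map-options ()
  "Collect the option-value pairs and free tokens from a sample command line."
  (let ((pairs '())
        (free '()))
    (toy-map-options (list "--verbose" "--output=a.txt" "--name" "bob"
                           "in.txt" "--" "--raw")
                     '("verbose")
                     '("output" "name")
                     (lambda (option value) (push (list option value) pairs))
                     (lambda (token) (push token free)))
    (values (nreverse pairs) (nreverse free))))
```

Both callbacks just collect what they are handed, which is about all a front-end needs from the back-end.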

While map-parsed-options is very flexible, it is also more complex than is convenient for normal use. To remedy this, there are two 'front-ends' available with Unix Options: getopt and with-cli-options. (In addition, the user can construct his own front-end using map-parsed-options as the back-end, but this is usually unnecessary.) The full usage for either of these is better explained in the README.

getopt is a simple replacement for the Unix 'getopt' command. Its syntax is similar to that of the Python 'getopt' command, and it simply converts a list of CLI tokens into one that is more easily parsed.

The with-cli-options macro attempts to be a more comprehensive solution for binding values passed in on the CLI. The macro is best explained with a simplistic example:

(with-cli-options ()
    (option &parameters file)
  (if option
      (print file)))

with-cli-options binds values passed in on the CLI to a list of names provided by the user. Each name generates a 'spec' for binding options to it: a long option of the same name, and a short option of the first letter of the name. If more than one name is listed with the same first letter, the second one listed uses the capital letter for its short option, and any further ones only get the long option. Names listed after a &parameters symbol take parameters. A list of free tokens is generated and is bound to free unless a single name is listed after a &free symbol, in which case the list is bound to that name instead. The prototype for with-cli-options is:
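The short-option naming rule is easier to see in code. This is only a sketch of my reading of the rule as described above, not the code inside with-cli-options; toy-short-options is a hypothetical helper, and it assumes every listed name competes for the same pool of letters.

```lisp
;; Hypothetical illustration of the naming rule; not taken from the library.
(defun toy-short-options (names)
  "Return an alist of (name . short-option-character-or-nil).
The first name with a given initial gets the lowercase letter, the second
gets the uppercase letter, and later ones get no short option at all."
  (let ((seen '()))                     ; alist of (initial . times-seen)
    (mapcar (lambda (name)
              (let* ((initial (char-downcase (char (string name) 0)))
                     (entry (or (assoc initial seen)
                                (first (push (cons initial 0) seen)))))
                (incf (cdr entry))
                (cons name
                      (case (cdr entry)
                        (1 initial)
                        (2 (char-upcase initial))
                        (t nil)))))
            names)))
```

So for the names file, force, format and verbose, file gets '-f', force gets '-F', format is left with only '--format', and verbose gets '-v'.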

(with-cli-options (&optional (cli-options *cli-options*)) option-variables &body body) 

(*cli-options* is bound to the list of CLI tokens provided by the Lisp implementation.)

As such, with-cli-options reduces the entire process of dealing with CLI options to about a single line of code, which I thought was neat.

Posted via email from astine's posterous

Am working right now

Watch me go.


Wednesday, October 21, 2009

Multi-lingual Programming Languages

A fellow recently raised the notion of multi-lingual programming languages.

I have a few thoughts on the matter:

It seems to me like Multi-Lingual Programming Languages (MLPLs for short) would be an unnecessary layer of complexity, and they could make communication between international teams much more difficult. Imagine if an English speaker opened the source file from a team in India in order to isolate a critical bug and found that it was all in Hindi. Perhaps it's inconvenient for the Indians that they have to read programs in English, but at least it's a single language, not several.

That is not to say that MLPLs would necessarily be a bad thing. One thing I can imagine working would be multi-lingualism being supported in an IDE, or at a 'meta symbolic' (for lack of a better term) level.

For example, if we simply implemented a strict transliteration system, so that there was a one-to-one correspondence between a program written in English and the same one written in Chinese, then an editor could convert between them on the fly. This would be useful for switching between alphabets and making code easier to read for different people but doesn't necessarily carry meaning through. (The English word "while," spelled with Chinese characters is still the English word "while.")

Working with an IDE that maintains a one-to-one mapping internally between the English keywords in a given language and their non-English equivalents, one could conceivably code in Russian and have the IDE internally translate it to English before passing the code to the compiler or interpreter, a sort of pre-compilation stage if you will. It could work in reverse as well, so that the canonical version is always in English (or whatever the native language of the programming language is) but coders of different nationalities see their own language.

With languages that use symbols, like Lisp, one could create a many-to-one mapping such that a single symbol has multiple printed representations depending on the (human) language in use. This would have to be enabled with a reader level switch. In Common Lisp, you could do something similar to this by creating a package for the language of your choice which maps symbols to their standard equivalents.
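As a rough sketch of the package approach, assuming a made-up package called toy-es holding a couple of Spanish aliases (the package, its symbols, and the signo function are all hypothetical, chosen only for illustration):

```lisp
;; Hypothetical package of Spanish aliases for two standard operators.
(defpackage :toy-es
  (:use :common-lisp)
  (:export #:si #:definir-funcion))

(in-package :toy-es)

(defmacro si (test then &optional else)
  "Alias for IF."
  `(if ,test ,then ,else))

(defmacro definir-funcion (name lambda-list &body body)
  "Alias for DEFUN."
  `(defun ,name ,lambda-list ,@body))

(in-package :cl-user)

;; Code written against the aliases expands into ordinary Common Lisp.
(toy-es:definir-funcion signo (n)
  (toy-es:si (minusp n) "negativo" "no negativo"))
```

Since the aliases are macros, the compiler only ever sees the standard operators; the Spanish names exist purely at the source level.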

None of these solutions is perfect, however. For example, none of them handles languages designed to imitate English grammar, such as SQL. And they introduce complexity that may be unnecessary.

Monday, October 19, 2009

Call by Common Lisp

It is a minor nuisance, in my opinion, that Common Lisp is a pass-by-value language. Or, rather, it is pass-by-reference. Or, rather, it is a rather confusing mixture of the two. For example, take the following bit of code:

(defvar *list* nil)

(defun add-to-list (item list)
  (push item list))

(add-to-list 'a *list*)

*list*

What do you think the outcome is? It's of course nil, because Lisp is pass-by-value. Now check this out:

(defvar *hash* (make-hash-table))

(defun add-to-hash (key item hash)
  (setf (gethash key hash) item))

(add-to-hash 'x 'a *hash*)

(gethash 'x *hash*)

What is the outcome now? It's 'a, which would suggest that Lisp is pass-by-reference. So what's happening? The change made to the hash inside the function shows up in the original, and that could only happen if the hash had been passed by reference.

The problem is actually rather simple. Parameter passing in Common Lisp *is* pass-by-value, but some of those values are references. It's a little bit like C:

int x = 2;

int double_it(int i, int *pi)
{
    *pi = i * 2;
    return i * 2;
}

double_it(x, &x);

Not only does double_it return 4, x is now set to 4. In "double_it(x, &x);" x is first passed to double_it by value, then by reference. Or rather, a reference (a C pointer) to x is passed by value to double_it. There is no mystery to this; in fact it is a rather handy feature that lets you specify which parameters to a function should be mutable and which should not. It's efficient, low-level, and works very well in C.

In Common Lisp, however, there are no pointers. References are not some explicit data type that can be set or derived from a variable. There is no "(ref x)". Instead, references are treated mostly implicitly. "(list 1 2 3)" returns an object joined together with references, and one can exploit this programmatically, but there is no way to explicitly pass a reference to an object. That is, there is no way to force a pass-by-reference on a Lisp object in the same way that "&x" works in C. While many complex data structures, such as hashes and arrays, can be treated implicitly as pass-by-reference, atomic types, such as numbers and characters, cannot. Lists are a special case where one can modify the entire structure so long as one doesn't attempt to rebind the initial cons. So:

(defun add-item-a (item list)
  (push item list))

(defun add-item-b (item list)
  (nconc list (cons item nil)))

The first definition will not work as expected, but the latter will, so long as it is not passed an empty list. Objects and structures are similar in that one can rebind slots, but not replace the entire object. A function can modify the object that a reference passed to it points at, but it cannot modify the caller's reference itself.
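The same distinction can be shown with a structure. This is a minimal sketch; the point structure and the three functions are made up for the example.

```lisp
;; Hypothetical structure used only to illustrate the point above.
(defstruct point x y)

(defun move-right (p)
  ;; Mutates the object the caller also holds: the change is visible outside.
  (incf (point-x p))
  p)

(defun replace-point (p)
  ;; Rebinds only the local variable P; the caller's variable is untouched.
  (setf p (make-point :x 100 :y 100))
  p)

(defun demo-point ()
  (let ((pt (make-point :x 1 :y 2)))
    (move-right pt)      ; pt's x slot becomes 2
    (replace-point pt)   ; pt still refers to the original object
    (list (point-x pt) (point-y pt))))
```

Calling (demo-point) gives (2 2): the slot update made by move-right survives, while the replacement attempted by replace-point never reaches the caller.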

So what does one do when one wants to do something like this?:

(let ((count 0))
  (defun add-next-count (list)
    (push (incf count) list)))

(defvar *counter-list* nil)

(add-next-count *counter-list*)

Well, one option would be to rewrite the code to something a little more idiomatic. Another would be to use a macro. However, sometimes the clearest way to write the code is to pass a value to be modified, and macros add a lot of unneeded complexity if one can find another way.

For example, passing the variable's symbol can emulate a reference of sorts:

(defvar *list* nil)

(defun add-to-list (item list)
  (push item (symbol-value list)))

(add-to-list 'a '*list*)

But this only works with dynamically bound, special variables. In order to rebind a lexical variable, one needs access to the scope in which it is bound, and this is lost in a function call. There is no way to pass a lexical variable by reference in Common Lisp. However, we can fake it.

Take this macro for example:

(defmacro add-to-list (item list)
  `(push ,item ,list))

This works, and is idiomatic in Lisp. It works because it expands to code within the same scope in which it is called. We can get the call-by-reference effect in Lisp simply by rewriting any function that wants to modify its parameters as a macro. Unfortunately, this is not a perfect solution, as macros are a potential source of bugs in ways that functions are not. If we turned a very complex function into a macro, it could be very difficult to debug or lead to complexity that could otherwise have been avoided (not long ago I ran into this very problem myself). In a way, there is no avoiding this: if we want the ability to consistently modify any parameter, we need to use a macro. However, there might be a way to encapsulate the code that handles the rebinding of variables. Take this function for example:

(defvar *temp-list*)

(defun add-to-list (item)
  (push item *temp-list*))

This function always operates on "*temp-list*". We can use "*temp-list*" as a means of passing values to and from "add-to-list". For example:

(defvar *list* nil)

(setf *temp-list* *list*)
(add-to-list 'item)
(setf *list* *temp-list*)

*list*

This code returns what we expect. With that in mind, we can certainly wrap this in a macro:

(defmacro with-bindings ((&rest bindings) &body body)
  `(let ,bindings
     ,@body
     ,@(mapcar (lambda (binding)
                 `(setf ,@(reverse binding)))
               bindings)))

such that:

(with-bindings ((*temp-list* *list*))
  (add-to-list 'item))

Does what the previous sample did. This gives me an idea: I can define a function and then define a macro around it that sets any variables that I want set. Furthermore, I can write a macro that abstracts the whole process and allows me to define a function with several parameters passed in as "references". Here is the code:

(defun maptree (function tree)
  (mapcar (lambda (branch)
            (if (atom branch)
                (funcall function branch)
                (maptree function branch)))
          tree))

(defmacro define-function-with-references (name (&rest references) (&rest parameters) &body body)
  (let ((_references nil)
        (fun-name (gensym)))
    (dotimes (count (list-length references))
      (push (gensym) _references))
    `(progn ,@(mapcar (lambda (_reference)
                        `(defvar ,_reference))
                      _references)
            (defun ,fun-name ,parameters
              ,@(maptree (lambda (atom)
                           (loop for reference in references
                                 for _reference in _references
                                 when (equal atom reference)
                                   return _reference
                                 finally (return atom)))
                         body))
            (defmacro ,name ,(append references parameters)
              `(let ,(mapcar (lambda (_reference reference)
                               `(,_reference ,reference))
                             ',_references (list ,@references))
                 (funcall ,#',fun-name ,,@parameters)
                 ,@(mapcar (lambda (reference _reference)
                             `(setf ,reference ,_reference))
                           (list ,@references) ',_references))))))

This macro "defines" a function with reference parameters. Actually, it defines a function that operates on global variables and wraps that function in a macro, which sets those global variables to the values of the passed-in variables and, upon completion of the function, sets the passed-in variables to the values of the globals, effectively allowing the function to modify the value of parameters passed to it. The globals are named with generated symbols to prevent namespace collisions. So, code like this finally works:

(let ((count 0))
  (define-function-with-references add-next-count (list) ()
    (push (incf count) list)))

(defvar *counter-list* nil)

(add-next-count *counter-list*)
(add-next-count *counter-list*)
(add-next-count *counter-list*)

*counter-list*

This returns "(3 2 1)" as expected. Of course, it's unusual that this kind of idiom is the correct way of doing things, and it's likely that my macro could be done better, but it works.
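For comparison, here is a sketch of a different way to fake a reference, swapping the global-variable trick for a pair of closures that read and write the place. None of these names (ref, make-reference, deref, push-via-ref) are standard; they are invented for this example, and a small macro is still needed to capture the place.

```lisp
;; Hypothetical closure-based "reference"; an alternative technique, not the
;; macro defined above.
(defstruct ref getter setter)

(defmacro make-reference (place)
  "Capture PLACE in two closures so it can be read and written elsewhere."
  (let ((new-value (gensym "NEW-VALUE")))
    `(make-ref :getter (lambda () ,place)
               :setter (lambda (,new-value) (setf ,place ,new-value)))))

(defun deref (ref)
  "Read the current value of the referenced place."
  (funcall (ref-getter ref)))

(defun (setf deref) (new-value ref)
  "Write NEW-VALUE into the referenced place."
  (funcall (ref-setter ref) new-value))

(defun push-via-ref (item ref)
  ;; An ordinary function that modifies its caller's variable through REF.
  (push item (deref ref)))

(defun demo-ref ()
  (let ((items '()))                    ; a lexical variable
    (push-via-ref 'a (make-reference items))
    (push-via-ref 'b (make-reference items))
    items))
```

Here (demo-ref) returns (B A), and the function doing the work stays an ordinary function. The cost is that the caller has to wrap the variable explicitly, much like writing "&x" in C.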

Friday, July 17, 2009

Dired mode and too many buffers.

Emacs has a very nice built-in file browser/directory editor called Dired. It's convenient when you're searching for files or just need a general directory browser. You open it with:
C-x d

Supply the directory that you want, and the directory listing appears in a buffer which you can navigate like any Emacs buffer. You can open files and directories by moving the cursor and selecting them, as well as delete, rename, move and apply other operations to the files and directories. All in all, it's pretty slick.

Dired's one major inconvenience for me is that every time I open something, it opens in a separate buffer. This includes new directories. So, if I use it to browse directories I end up with a large backlog of buffers that I have to delete and it gets very annoying.

Fortunately, this being Emacs, things are easy to fix. A Google search found me this code:

(defun dired-follow-file ()
  "In dired, visit the file or directory on this line.
If a directory is on the current line, replace the current
dired buffer with one containing the contents of the directory.
Otherwise, invoke `dired-find-file' on the file."
  (interactive)
  (let ((filename (dired-get-filename)))
    ;; if the file is a directory, replace the buffer with the
    ;; directory's contents
    (if (file-directory-p filename)
        (find-alternate-file filename)
      ;; otherwise simply perform a normal `dired-find-file'
      (dired-find-file))))

(add-hook
 'dired-mode-hook
 (lambda ()
   (local-set-key "\C-m" 'dired-follow-file)
   (local-set-key "e" 'dired-follow-file)
   (local-set-key "f" 'dired-follow-file)))


Which causes Dired to do exactly what I want, with one caveat: for some reason it bombs on the . and .. directories. A little bit of probing showed that it was dired-get-filename that was failing on these entries. I couldn't find either the definition or a description of dired-get-filename, but I did find usage examples through a simple web search, and these examples showed the usage:
(dired-get-filename nil t)

Instead of:
(dired-get-filename)

Which seemed to be worth a shot and, in fact, solved the problem.

I also added:

(defun dired-follow-up ()
  "In dired, visit the directory up in the
hierarchy from this one"
  (interactive)
  (find-alternate-file ".."))


So that ^ would also behave as I wanted. The final code in my .emacs is as follows:

;; dired stuff

(defun dired-follow-file ()
  "In dired, visit the file or directory on this line.
If a directory is on the current line, replace the current
dired buffer with one containing the contents of the directory.
Otherwise, invoke `dired-find-file' on the file."
  (interactive)
  (let ((filename (dired-get-filename nil t)))
    ;; if the file is a directory, replace the buffer with the
    ;; directory's contents
    (if (file-directory-p filename)
        (find-alternate-file filename)
      ;; otherwise simply perform a normal `dired-find-file'
      (dired-find-file))))

(defun dired-follow-up ()
  "In dired, visit the directory up in the
hierarchy from this one"
  (interactive)
  (find-alternate-file ".."))

(add-hook
 'dired-mode-hook
 (lambda ()
   (local-set-key "\C-m" 'dired-follow-file)
   (local-set-key "e" 'dired-follow-file)
   (local-set-key "f" 'dired-follow-file)
   (local-set-key "^" 'dired-follow-up)))

Saturday, July 4, 2009

PhpPgAdmin for Gentoo

I just spent an afternoon getting PhpPgAdmin working on my Gentoo server. There were a couple of pitfalls that tripped me up so I thought I'd relate the process.

To get PhpPgAdmin working you need three things: Postgres (of course), Php, and a webserver. Postgres I already had installed and working, following this tutorial. Rather than emerging the default Postgres package, which is version 8.2, I emerged the latest package, 8.3. So:

emerge -av virtual/postgresql-base #emerge the database
passwd postgres #set a password for the db user
New UNIX password:
Retype new UNIX password:
passwd: password updated successfully
emerge --config =postgresql-8.3.5 #(now 8.3.7), finish the install
/etc/init.d/postgresql start #start the database
rc-update add postgresql default #and, of course, add to the default runlevel


You can then configure the users to your heart's content using the createuser and dropuser commands. In addition, make sure to edit pg_hba.conf to allow all the connections you need. Make sure local (Unix socket) connections are available. The file should be at /var/lib/postgresql/8.3/data/pg_hba.conf. Also, make sure any users that will need to access the database locally are in the postgres group.

usermod -aG postgres username

Keep in mind that whatever user your webserver runs as will also need to be added.

Which brings us to the next step: the webserver. For reasons unrelated to PhpPgAdmin I already had an install of Lighttpd, and I wanted to use it rather than the standard Apache2. After fiddling for a while I realized that I would have to re-emerge it with the fastcgi and php flags enabled (they are not enabled by default). So:

echo "www-servers/lighttpd fastcgi php" >> /etc/portage/package.use #enable flags
emerge -av lighttpd #emerge lighttpd
/etc/init.d/lighttpd start #start the server
rc-update add lighttpd default #make sure it starts on restart


Php is a straightforward install:

echo "dev-lang/php cgi" >> /etc/portage/package.use
emerge -av php


You may want to set cgi.fix_pathinfo to 1 in php.ini:

vim /etc/php/cgi-php/php.ini

Before you install PhpPgAdmin, change vhost_server to whatever webserver you installed. It defaults to Apache, so if you used Lighttpd, change it to that. The setting lives under /etc/vhosts/webapp-config/. Then we can install PhpPgAdmin:

emerge -av phppgadmin

That should be it. You can access it through http://your.domain.com/phppgadmin using whatever login you made for Postgres.

Friday, June 5, 2009

Elephant and UFFI

Reminder to self: don't attempt to install Elephant so that it uses CFFI. CFFI-UFFI-Compat does not work. Use the actual UFFI. That means not attempting to install Elephant through clbuild.

Another note, Elephant seems to go really slowly when accessing a Postgres database over the Internet. I hope that keeping it in the same cluster will allow performance to be sufficient.