diff options
Diffstat (limited to 'src/modules/systemlib/uthash/doc')
-rw-r--r-- | src/modules/systemlib/uthash/doc/userguide.txt | 1682 | ||||
-rw-r--r-- | src/modules/systemlib/uthash/doc/utarray.txt | 376 | ||||
-rw-r--r-- | src/modules/systemlib/uthash/doc/utlist.txt | 219 | ||||
-rw-r--r-- | src/modules/systemlib/uthash/doc/utstring.txt | 178 |
4 files changed, 2455 insertions, 0 deletions
diff --git a/src/modules/systemlib/uthash/doc/userguide.txt b/src/modules/systemlib/uthash/doc/userguide.txt new file mode 100644 index 000000000..3e65a52fc --- /dev/null +++ b/src/modules/systemlib/uthash/doc/userguide.txt @@ -0,0 +1,1682 @@ +uthash User Guide +================= +Troy D. Hanson <thanson@users.sourceforge.net> +v1.9.6, April 2012 + +include::sflogo.txt[] +include::topnav.txt[] + +A hash in C +----------- +include::toc.txt[] + +This document is written for C programmers. Since you're reading this, chances +are that you know a hash is used for looking up items using a key. In scripting +languages like Perl, hashes are used all the time. In C, hashes don't exist in +the language itself. This software provides a hash table for C structures. + +What can it do? +~~~~~~~~~~~~~~~~~ +This software supports these operations on items in a hash table: + +1. add +2. find +3. delete +4. count +5. iterate +6. sort +7. select + +Is it fast? +~~~~~~~~~~~ +Add, find and delete are normally constant-time operations. This is influenced +by your key domain and the hash function. + +This hash aims to be minimalistic and efficient. It's around 900 lines of C. +It inlines automatically because it's implemented as macros. It's fast as long +as the hash function is suited to your keys. You can use the default hash +function, or easily compare performance and choose from among several other +<<hash_functions,built-in hash functions>>. + +Is it a library? +~~~~~~~~~~~~~~~~ +No, it's just a single header file: `uthash.h`. All you need to do is copy +the header file into your project, and: + + #include "uthash.h" + +Since uthash is a header file only, there is no library code to link against. + +C/C++ and platforms +~~~~~~~~~~~~~~~~~~~ +This software can be used in C and C++ programs. It has been tested on: + + * Linux + * Mac OS X + * Windows using Visual Studio 2008 and 2010 + * Solaris + * OpenBSD + * FreeBSD + +Test suite +^^^^^^^^^^ +To run the test suite, enter the `tests` directory. Then, + + * on Unix platforms, run `make` + * on Windows, run the "do_tests_win32.cmd" batch file. (You may edit the + batch file if your Visual Studio is installed in a non-standard location). + +BSD licensed +~~~~~~~~~~~~ +This software is made available under the +link:license.html[revised BSD license]. +It is free and open source. + +Obtaining uthash +~~~~~~~~~~~~~~~~ +Please follow the link to download on the +http://uthash.sourceforge.net[uthash website] at http://uthash.sourceforge.net. + +A number of platforms include uthash in their package repositories. For example, +Debian/Ubuntu users may simply `aptitude install uthash-dev`. + +Getting help +~~~~~~~~~~~~ +Feel free to mailto:tdh@tkhanson.net[email the author] at +tdh@tkhanson.net. + +Resources +~~~~~~~~~ +Users of uthash may wish to follow the news feed for information about new +releases. Also, there are some extra bonus headers included with uthash. + +News:: + The author has a news feed for http://troydhanson.wordpress.com/[software updates] image:img/rss.png[(RSS)]. +Extras included with uthash:: + uthash ships with these "extras"-- independent headers similar to uthash. + First link:utlist.html[utlist.h] provides linked list macros for C structures. + Second, link:utarray.html[utarray.h] implements dynamic arrays using macros. + Third, link:utstring.html[utstring.h] implements a basic dynamic string. +Other software:: + Other open-source software by the author is listed at http://tkhanson.net. + +Who's using it? +~~~~~~~~~~~~~~~ +Since releasing uthash in 2006, it has been downloaded thousands of times, +incorporated into commercial software, academic research, and into other +open-source software. + +Your structure +-------------- + +In uthash, a hash table is comprised of structures. Each structure represents a +key-value association. One or more of the structure fields constitute the key. +The structure pointer itself is the value. + +.Defining a structure that can be hashed +---------------------------------------------------------------------- +#include "uthash.h" + +struct my_struct { + int id; /* key */ + char name[10]; + UT_hash_handle hh; /* makes this structure hashable */ +}; +---------------------------------------------------------------------- + +Note that, in uthash, your structure will never be moved or copied into another +location when you add it into a hash table. This means that you can keep other +data structures that safely point to your structure-- regardless of whether you +add or delete it from a hash table during your program's lifetime. + +The key +~~~~~~~ +There are no restrictions on the data type or name of the key field. The key +can also comprise multiple contiguous fields, having any names and data types. + +.Any data type... really? +***************************************************************************** +Yes, your key and structure can have any data type. Unlike function calls with +fixed prototypes, uthash consists of macros-- whose arguments are untyped-- and +thus able to work with any type of structure or key. +***************************************************************************** + +Unique keys +^^^^^^^^^^^ +As with any hash, every item must have a unique key. Your application must +enforce key uniqueness. Before you add an item to the hash table, you must +first know (if in doubt, check!) that the key is not already in use. You +can check whether a key already exists in the hash table using `HASH_FIND`. + +The hash handle +~~~~~~~~~~~~~~~ +The `UT_hash_handle` field must be present in your structure. It is used for +the internal bookkeeping that makes the hash work. It does not require +initialization. It can be named anything, but you can simplify matters by +naming it `hh`. This allows you to use the easier "convenience" macros to add, +find and delete items. + +A word about memory +~~~~~~~~~~~~~~~~~~~ +Some have asked how uthash cleans up its internal memory. The answer is simple: +'when you delete the final item' from a hash table, uthash releases all the +internal memory associated with that hash table, and sets its pointer to NULL. + +Hash operations +--------------- + +This section introduces the uthash macros by example. For a more succinct +listing, see <<Macro_reference,Macro Reference>>. + +.Convenience vs. general macros: +***************************************************************************** +The uthash macros fall into two categories. The 'convenience' macros can be used +with integer, pointer or string keys (and require that you chose the conventional +name `hh` for the `UT_hash_handle` field). The convenience macros take fewer +arguments than the general macros, making their usage a bit simpler for these +common types of keys. + +The 'general' macros can be used for any types of keys, or for multi-field keys, +or when the `UT_hash_handle` has been named something other than `hh`. These +macros take more arguments and offer greater flexibility in return. But if the +convenience macros suit your needs, use them-- your code will be more readable. +***************************************************************************** + +Declare the hash +~~~~~~~~~~~~~~~~ +Your hash must be declared as a `NULL`-initialized pointer to your structure. + + struct my_struct *users = NULL; /* important! initialize to NULL */ + +Add item +~~~~~~~~ +Allocate and initialize your structure as you see fit. The only aspect +of this that matters to uthash is that your key must be initialized to +a unique value. Then call `HASH_ADD`. (Here we use the convenience macro +`HASH_ADD_INT`, which offers simplified usage for keys of type `int`). + +.Add an item to a hash +---------------------------------------------------------------------- +void add_user(int user_id, char *name) { + struct my_struct *s; + + s = malloc(sizeof(struct my_struct)); + s->id = user_id; + strcpy(s->name, name); + HASH_ADD_INT( users, id, s ); /* id: name of key field */ +} +---------------------------------------------------------------------- + +The first parameter to `HASH_ADD_INT` is the hash table, and the +second parameter is the 'name' of the key field. Here, this is `id`. The +last parameter is a pointer to the structure being added. + +[[validc]] +.Wait.. the field name is a parameter? +******************************************************************************* +If you find it strange that `id`, which is the 'name of a field' in the +structure, can be passed as a parameter, welcome to the world of macros. Don't +worry- the C preprocessor expands this to valid C code. +******************************************************************************* + +Key must not be modified while in-use +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +Once a structure has been added to the hash, do not change the value of its key. +Instead, delete the item from the hash, change the key, and then re-add it. + +Passing the hash pointer into functions +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +In the example above `users` is a global variable, but what if the caller wanted +to pass the hash pointer 'into' the `add_user` function? At first glance it would +appear that you could simply pass `users` as an argument, but that won't work +right. + + /* bad */ + void add_user(struct my_struct *users, int user_id, char *name) { + ... + HASH_ADD_INT( users, id, s ); + } + +You really need to pass 'a pointer' to the hash pointer: + + /* good */ + void add_user(struct my_struct **users, int user_id, char *name) { ... + ... + HASH_ADD_INT( *users, id, s ); + } + +Note that we dereferenced the pointer in the `HASH_ADD` also. + +The reason it's necessary to deal with a pointer to the hash pointer is simple: +the hash macros modify it (in other words, they modify the 'pointer itself' not +just what it points to). + +Find item +~~~~~~~~~ +To look up a structure in a hash, you need its key. Then call `HASH_FIND`. +(Here we use the convenience macro `HASH_FIND_INT` for keys of type `int`). + +.Find a structure using its key +---------------------------------------------------------------------- +struct my_struct *find_user(int user_id) { + struct my_struct *s; + + HASH_FIND_INT( users, &user_id, s ); /* s: output pointer */ + return s; +} +---------------------------------------------------------------------- + +Here, the hash table is `users`, and `&user_id` points to the key (an integer +in this case). Last, `s` is the 'output' variable of `HASH_FIND_INT`. The +final result is that `s` points to the structure with the given key, or +is `NULL` if the key wasn't found in the hash. + +[NOTE] +The middle argument is a 'pointer' to the key. You can't pass a literal key +value to `HASH_FIND`. Instead assign the literal value to a variable, and pass +a pointer to the variable. + + +Delete item +~~~~~~~~~~~ +To delete a structure from a hash, you must have a pointer to it. (If you only +have the key, first do a `HASH_FIND` to get the structure pointer). + +.Delete an item from a hash +---------------------------------------------------------------------- +void delete_user(struct my_struct *user) { + HASH_DEL( users, user); /* user: pointer to deletee */ + free(user); /* optional; it's up to you! */ +} +---------------------------------------------------------------------- + +Here again, `users` is the hash table, and `user` is a pointer to the +structure we want to remove from the hash. + +uthash never frees your structure +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +Deleting a structure just removes it from the hash table-- it doesn't `free` +it. The choice of when to free your structure is entirely up to you; uthash +will never free your structure. + +Delete can change the pointer +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +The hash table pointer (which initially points to the first item added to the +hash) can change in response to `HASH_DEL` (i.e. if you delete the first item +in the hash table). + +Iterative deletion +^^^^^^^^^^^^^^^^^^ +The `HASH_ITER` macro is a deletion-safe iteration construct which expands +to a simple 'for' loop. + +.Delete all items from a hash +---------------------------------------------------------------------- +void delete_all() { + struct my_struct *current_user, *tmp; + + HASH_ITER(hh, users, current_user, tmp) { + HASH_DEL(users,current_user); /* delete; users advances to next */ + free(current_user); /* optional- if you want to free */ + } +} +---------------------------------------------------------------------- + +All-at-once deletion +^^^^^^^^^^^^^^^^^^^^ +If you only want to delete all the items, but not free them or do any +per-element clean up, you can do this more efficiently in a single operation: + + HASH_CLEAR(hh,users); + +Afterward, the list head (here, `users`) will be set to `NULL`. + +Count items +~~~~~~~~~~~ + +The number of items in the hash table can be obtained using `HASH_COUNT`: + +.Count of items in the hash table +---------------------------------------------------------------------- +unsigned int num_users; +num_users = HASH_COUNT(users); +printf("there are %u users\n", num_users); +---------------------------------------------------------------------- + +Incidentally, this works even the list (`users`, here) is `NULL`, in +which case the count is 0. + +Iterating and sorting +~~~~~~~~~~~~~~~~~~~~~ + +You can loop over the items in the hash by starting from the beginning and +following the `hh.next` pointer. + +.Iterating over all the items in a hash +---------------------------------------------------------------------- +void print_users() { + struct my_struct *s; + + for(s=users; s != NULL; s=s->hh.next) { + printf("user id %d: name %s\n", s->id, s->name); + } +} +---------------------------------------------------------------------- + +There is also an `hh.prev` pointer you could use to iterate backwards through +the hash, starting from any known item. + +[[deletesafe]] +Deletion-safe iteration +^^^^^^^^^^^^^^^^^^^^^^^ +In the example above, it would not be safe to delete and free `s` in the body +of the 'for' loop, (because `s` is derefenced each time the loop iterates). +This is easy to rewrite correctly (by copying the `s->hh.next` pointer to a +temporary variable 'before' freeing `s`), but it comes up often enough that a +deletion-safe iteration macro, `HASH_ITER`, is included. It expands to a +`for`-loop header. Here is how it could be used to rewrite the last example: + + struct my_struct *s, *tmp; + + HASH_ITER(hh, users, s, tmp) { + printf("user id %d: name %s\n", s->id, s->name); + /* ... it is safe to delete and free s here */ + } + +.A hash is also a doubly-linked list. +******************************************************************************* +Iterating backward and forward through the items in the hash is possible +because of the `hh.prev` and `hh.next` fields. All the items in the hash can +be reached by repeatedly following these pointers, thus the hash is also a +doubly-linked list. +******************************************************************************* + +If you're using uthash in a C++ program, you need an extra cast on the `for` +iterator, e.g., `s=(struct my_struct*)s->hh.next`. + +Sorted iteration +^^^^^^^^^^^^^^^^ +The items in the hash are, by default, traversed in the order they were added +("insertion order") when you follow the `hh.next` pointer. But you can sort +the items into a new order using `HASH_SORT`. E.g., + + HASH_SORT( users, name_sort ); + +The second argument is a pointer to a comparison function. It must accept two +arguments which are pointers to two items to compare. Its return value should +be less than zero, zero, or greater than zero, if the first item sorts before, +equal to, or after the second item, respectively. (Just like `strcmp`). + +.Sorting the items in the hash +---------------------------------------------------------------------- +int name_sort(struct my_struct *a, struct my_struct *b) { + return strcmp(a->name,b->name); +} + +int id_sort(struct my_struct *a, struct my_struct *b) { + return (a->id - b->id); +} + +void sort_by_name() { + HASH_SORT(users, name_sort); +} + +void sort_by_id() { + HASH_SORT(users, id_sort); +} +---------------------------------------------------------------------- + +When the items in the hash are sorted, the first item may change position. In +the example above, `users` may point to a different structure after calling +`HASH_SORT`. + +A complete example +~~~~~~~~~~~~~~~~~~ + +We'll repeat all the code and embellish it with a `main()` function to form a +working example. + +If this code was placed in a file called `example.c` in the same directory as +`uthash.h`, it could be compiled and run like this: + + cc -o example example.c + ./example + +Follow the prompts to try the program. + +.A complete program +---------------------------------------------------------------------- +#include <stdio.h> /* gets */ +#include <stdlib.h> /* atoi, malloc */ +#include <string.h> /* strcpy */ +#include "uthash.h" + +struct my_struct { + int id; /* key */ + char name[10]; + UT_hash_handle hh; /* makes this structure hashable */ +}; + +struct my_struct *users = NULL; + +void add_user(int user_id, char *name) { + struct my_struct *s; + + s = (struct my_struct*)malloc(sizeof(struct my_struct)); + s->id = user_id; + strcpy(s->name, name); + HASH_ADD_INT( users, id, s ); /* id: name of key field */ +} + +struct my_struct *find_user(int user_id) { + struct my_struct *s; + + HASH_FIND_INT( users, &user_id, s ); /* s: output pointer */ + return s; +} + +void delete_user(struct my_struct *user) { + HASH_DEL( users, user); /* user: pointer to deletee */ + free(user); +} + +void delete_all() { + struct my_struct *current_user, *tmp; + + HASH_ITER(hh, users, current_user, tmp) { + HASH_DEL(users,current_user); /* delete it (users advances to next) */ + free(current_user); /* free it */ + } +} + +void print_users() { + struct my_struct *s; + + for(s=users; s != NULL; s=(struct my_struct*)(s->hh.next)) { + printf("user id %d: name %s\n", s->id, s->name); + } +} + +int name_sort(struct my_struct *a, struct my_struct *b) { + return strcmp(a->name,b->name); +} + +int id_sort(struct my_struct *a, struct my_struct *b) { + return (a->id - b->id); +} + +void sort_by_name() { + HASH_SORT(users, name_sort); +} + +void sort_by_id() { + HASH_SORT(users, id_sort); +} + +int main(int argc, char *argv[]) { + char in[10]; + int id=1, running=1; + struct my_struct *s; + unsigned num_users; + + while (running) { + printf("1. add user\n"); + printf("2. find user\n"); + printf("3. delete user\n"); + printf("4. delete all users\n"); + printf("5. sort items by name\n"); + printf("6. sort items by id\n"); + printf("7. print users\n"); + printf("8. count users\n"); + printf("9. quit\n"); + gets(in); + switch(atoi(in)) { + case 1: + printf("name?\n"); + add_user(id++, gets(in)); + break; + case 2: + printf("id?\n"); + s = find_user(atoi(gets(in))); + printf("user: %s\n", s ? s->name : "unknown"); + break; + case 3: + printf("id?\n"); + s = find_user(atoi(gets(in))); + if (s) delete_user(s); + else printf("id unknown\n"); + break; + case 4: + delete_all(); + break; + case 5: + sort_by_name(); + break; + case 6: + sort_by_id(); + break; + case 7: + print_users(); + break; + case 8: + num_users=HASH_COUNT(users); + printf("there are %u users\n", num_users); + break; + case 9: + running=0; + break; + } + } + + delete_all(); /* free any structures */ + return 0; +} +---------------------------------------------------------------------- + +This program is included in the distribution in `tests/example.c`. You can run +`make example` in that directory to compile it easily. + +Standard key types +------------------ +This section goes into specifics of how to work with different kinds of keys. +You can use nearly any type of key-- integers, strings, pointers, structures, etc. + +[NOTE] +.A note about float +================================================================================ +You can use floating point keys. This comes with the same caveats as with any +program that tests floating point equality. In other words, even the tiniest +difference in two floating point numbers makes them distinct keys. +================================================================================ + +Integer keys +~~~~~~~~~~~~ +The preceding examples demonstrated use of integer keys. To recap, use the +convenience macros `HASH_ADD_INT` and `HASH_FIND_INT` for structures with +integer keys. (The other operations such as `HASH_DELETE` and `HASH_SORT` are +the same for all types of keys). + +String keys +~~~~~~~~~~~ +If your structure has a string key, the operations to use depend on whether your +structure 'points to' the key (`char *`) or the string resides `within` the +structure (`char a[10]`). *This distinction is important*. As we'll see below, +you need to use `HASH_ADD_KEYPTR` when your structure 'points' to a key (that is, +the key itself is 'outside' of the structure); in contrast, use `HASH_ADD_STR` +for a string key that is contained *within* your structure. + +[NOTE] +.char[ ] vs. char* +================================================================================ +The string is 'within' the structure in the first example below-- `name` is a +`char[10]` field. In the second example, the key is 'outside' of the +structure-- `name` is a `char *`. So the first example uses `HASH_ADD_STR` but +the second example uses `HASH_ADD_KEYPTR`. For information on this macro, see +the <<Macro_reference,Macro reference>>. +================================================================================ + +String 'within' structure +^^^^^^^^^^^^^^^^^^^^^^^^^ + +.A string-keyed hash (string within structure) +---------------------------------------------------------------------- +#include <string.h> /* strcpy */ +#include <stdlib.h> /* malloc */ +#include <stdio.h> /* printf */ +#include "uthash.h" + +struct my_struct { + char name[10]; /* key (string is WITHIN the structure) */ + int id; + UT_hash_handle hh; /* makes this structure hashable */ +}; + + +int main(int argc, char *argv[]) { + const char **n, *names[] = { "joe", "bob", "betty", NULL }; + struct my_struct *s, *tmp, *users = NULL; + int i=0; + + for (n = names; *n != NULL; n++) { + s = (struct my_struct*)malloc(sizeof(struct my_struct)); + strncpy(s->name, *n,10); + s->id = i++; + HASH_ADD_STR( users, name, s ); + } + + HASH_FIND_STR( users, "betty", s); + if (s) printf("betty's id is %d\n", s->id); + + /* free the hash table contents */ + HASH_ITER(hh, users, s, tmp) { + HASH_DEL(users, s); + free(s); + } + return 0; +} +---------------------------------------------------------------------- + +This example is included in the distribution in `tests/test15.c`. It prints: + + betty's id is 2 + +String 'pointer' in structure +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Now, here is the same example but using a `char *` key instead of `char [ ]`: + +.A string-keyed hash (structure points to string) +---------------------------------------------------------------------- +#include <string.h> /* strcpy */ +#include <stdlib.h> /* malloc */ +#include <stdio.h> /* printf */ +#include "uthash.h" + +struct my_struct { + const char *name; /* key */ + int id; + UT_hash_handle hh; /* makes this structure hashable */ +}; + + +int main(int argc, char *argv[]) { + const char **n, *names[] = { "joe", "bob", "betty", NULL }; + struct my_struct *s, *tmp, *users = NULL; + int i=0; + + for (n = names; *n != NULL; n++) { + s = (struct my_struct*)malloc(sizeof(struct my_struct)); + s->name = *n; + s->id = i++; + HASH_ADD_KEYPTR( hh, users, s->name, strlen(s->name), s ); + } + + HASH_FIND_STR( users, "betty", s); + if (s) printf("betty's id is %d\n", s->id); + + /* free the hash table contents */ + HASH_ITER(hh, users, s, tmp) { + HASH_DEL(users, s); + free(s); + } + return 0; +} +---------------------------------------------------------------------- + +This example is included in `tests/test40.c`. + +Pointer keys +~~~~~~~~~~~~ +Your key can be a pointer. To be very clear, this means the 'pointer itself' +can be the key (in contrast, if the thing 'pointed to' is the key, this is a +different use case handled by `HASH_ADD_KEYPTR`). + +Here is a simple example where a structure has a pointer member, called `key`. + +.A pointer key +---------------------------------------------------------------------- +#include <stdio.h> +#include <stdlib.h> +#include "uthash.h" + +typedef struct { + void *key; + int i; + UT_hash_handle hh; +} el_t; + +el_t *hash = NULL; +char *someaddr = NULL; + +int main() { + el_t *d; + el_t *e = (el_t*)malloc(sizeof(el_t)); + if (!e) return -1; + e->key = (void*)someaddr; + e->i = 1; + HASH_ADD_PTR(hash,key,e); + HASH_FIND_PTR(hash, &someaddr, d); + if (d) printf("found\n"); + + /* release memory */ + HASH_DEL(hash,e); + free(e); + return 0; +} +---------------------------------------------------------------------- + +This example is included in `tests/test57.c`. Note that the end of the program +deletes the element out of the hash, (and since no more elements remain in the +hash), uthash releases its internal memory. + +Structure keys +~~~~~~~~~~~~~~ +Your key field can have any data type. To uthash, it is just a sequence of +bytes. Therefore, even a nested structure can be used as a key. We'll use the +general macros `HASH_ADD` and `HASH_FIND` to demonstrate. + +NOTE: Structures contain padding (wasted internal space used to fulfill +alignment requirements for the members of the structure). These padding bytes +'must be zeroed' before adding an item to the hash or looking up an item. +Therefore always zero the whole structure before setting the members of +interest. The example below does this-- see the two calls to `memset`. + +.A key which is a structure +---------------------------------------------------------------------- +#include <stdlib.h> +#include <stdio.h> +#include "uthash.h" + +typedef struct { + char a; + int b; +} record_key_t; + +typedef struct { + record_key_t key; + /* ... other data ... */ + UT_hash_handle hh; +} record_t; + +int main(int argc, char *argv[]) { + record_t l, *p, *r, *tmp, *records = NULL; + + r = (record_t*)malloc( sizeof(record_t) ); + memset(r, 0, sizeof(record_t)); + r->key.a = 'a'; + r->key.b = 1; + HASH_ADD(hh, records, key, sizeof(record_key_t), r); + + memset(&l, 0, sizeof(record_t)); + l.key.a = 'a'; + l.key.b = 1; + HASH_FIND(hh, records, &l.key, sizeof(record_key_t), p); + + if (p) printf("found %c %d\n", p->key.a, p->key.b); + + HASH_ITER(hh, records, p, tmp) { + HASH_DEL(records, p); + free(p); + } + return 0; +} + +---------------------------------------------------------------------- + +This usage is nearly the same as use of a compound key explained below. + +Note that the general macros require the name of the `UT_hash_handle` to be +passed as the first argument (here, this is `hh`). The general macros are +documented in <<Macro_reference,Macro Reference>>. + +Advanced Topics +--------------- + +Compound keys +~~~~~~~~~~~~~ +Your key can even comprise multiple contiguous fields. + +.A multi-field key +---------------------------------------------------------------------- +#include <stdlib.h> /* malloc */ +#include <stddef.h> /* offsetof */ +#include <stdio.h> /* printf */ +#include <string.h> /* memset */ +#include "uthash.h" + +#define UTF32 1 + +typedef struct { + UT_hash_handle hh; + int len; + char encoding; /* these two fields */ + int text[]; /* comprise the key */ +} msg_t; + +typedef struct { + char encoding; + int text[]; +} lookup_key_t; + +int main(int argc, char *argv[]) { + unsigned keylen; + msg_t *msg, *tmp, *msgs = NULL; + lookup_key_t *lookup_key; + + int beijing[] = {0x5317, 0x4eac}; /* UTF-32LE for 北京 */ + + /* allocate and initialize our structure */ + msg = (msg_t*)malloc( sizeof(msg_t) + sizeof(beijing) ); + memset(msg, 0, sizeof(msg_t)+sizeof(beijing)); /* zero fill */ + msg->len = sizeof(beijing); + msg->encoding = UTF32; + memcpy(msg->text, beijing, sizeof(beijing)); + + /* calculate the key length including padding, using formula */ + keylen = offsetof(msg_t, text) /* offset of last key field */ + + sizeof(beijing) /* size of last key field */ + - offsetof(msg_t, encoding); /* offset of first key field */ + + /* add our structure to the hash table */ + HASH_ADD( hh, msgs, encoding, keylen, msg); + + /* look it up to prove that it worked :-) */ + msg=NULL; + + lookup_key = (lookup_key_t*)malloc(sizeof(*lookup_key) + sizeof(beijing)); + memset(lookup_key, 0, sizeof(*lookup_key) + sizeof(beijing)); + lookup_key->encoding = UTF32; + memcpy(lookup_key->text, beijing, sizeof(beijing)); + HASH_FIND( hh, msgs, &lookup_key->encoding, keylen, msg ); + if (msg) printf("found \n"); + free(lookup_key); + + HASH_ITER(hh, msgs, msg, tmp) { + HASH_DEL(msgs, msg); + free(msg); + } + return 0; +} +---------------------------------------------------------------------- + +This example is included in the distribution in `tests/test22.c`. + +If you use multi-field keys, recognize that the compiler pads adjacent fields +(by inserting unused space between them) in order to fulfill the alignment +requirement of each field. For example a structure containing a `char` followed +by an `int` will normally have 3 "wasted" bytes of padding after the char, in +order to make the `int` field start on a multiple-of-4 address (4 is the length +of the int). + +.Calculating the length of a multi-field key: +******************************************************************************* +To determine the key length when using a multi-field key, you must include any +intervening structure padding the compiler adds for alignment purposes. + +An easy way to calculate the key length is to use the `offsetof` macro from +`<stddef.h>`. The formula is: + + key length = offsetof(last_key_field) + + sizeof(last_key_field) + - offsetof(first_key_field) + +In the example above, the `keylen` variable is set using this formula. +******************************************************************************* + +When dealing with a multi-field key, you must zero-fill your structure before +`HASH_ADD`'ing it to a hash table, or using its fields in a `HASH_FIND` key. + +In the previous example, `memset` is used to initialize the structure by +zero-filling it. This zeroes out any padding between the key fields. If we +didn't zero-fill the structure, this padding would contain random values. The +random values would lead to `HASH_FIND` failures; as two "identical" keys will +appear to mismatch if there are any differences within their padding. + +[[multilevel]] +Multi-level hash tables +~~~~~~~~~~~~~~~~~~~~~~~ +A multi-level hash table arises when each element of a hash table contains its +own secondary hash table. There can be any number of levels. In a scripting +language you might see: + + $items{bob}{age}=37 + +The C program below builds this example in uthash: the hash table is called +`items`. It contains one element (`bob`) whose own hash table contains one +element (`age`) with value 37. No special functions are necessary to build +a multi-level hash table. + +While this example represents both levels (`bob` and `age`) using the same +structure, it would also be fine to use two different structure definitions. +It would also be fine if there were three or more levels instead of two. + +.Multi-level hash table +---------------------------------------------------------------------- +#include <stdio.h> +#include <string.h> +#include <stdlib.h> +#include "uthash.h" + +/* hash of hashes */ +typedef struct item { + char name[10]; + struct item *sub; + int val; + UT_hash_handle hh; +} item_t; + +item_t *items=NULL; + +int main(int argc, char *argvp[]) { + item_t *item1, *item2, *tmp1, *tmp2; + + /* make initial element */ + item_t *i = malloc(sizeof(*i)); + strcpy(i->name, "bob"); + i->sub = NULL; + i->val = 0; + HASH_ADD_STR(items, name, i); + + /* add a sub hash table off this element */ + item_t *s = malloc(sizeof(*s)); + strcpy(s->name, "age"); + s->sub = NULL; + s->val = 37; + HASH_ADD_STR(i->sub, name, s); + + /* iterate over hash elements */ + HASH_ITER(hh, items, item1, tmp1) { + HASH_ITER(hh, item1->sub, item2, tmp2) { + printf("$items{%s}{%s} = %d\n", item1->name, item2->name, item2->val); + } + } + + /* clean up both hash tables */ + HASH_ITER(hh, items, item1, tmp1) { + HASH_ITER(hh, item1->sub, item2, tmp2) { + HASH_DEL(item1->sub, item2); + free(item2); + } + HASH_DEL(items, item1); + free(item1); + } + + return 0; +} +---------------------------------------------------------------------- +The example above is included in `tests/test59.c`. + +[[multihash]] +Items in several hash tables +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +A structure can be added to more than one hash table. A few reasons you might do +this include: + +- each hash table may use an alternative key; +- each hash table may have its own sort order; +- or you might simply use multiple hash tables for grouping purposes. E.g., + you could have users in an `admin_users` and a `users` hash table. + +Your structure needs to have a `UT_hash_handle` field for each hash table to +which it might be added. You can name them anything. E.g., + + UT_hash_handle hh1, hh2; + +Items with alternative keys +~~~~~~~~~~~~~~~~~~~~~~~~~~~ +You might create a hash table keyed on an ID field, and another hash table keyed +on username (if usernames are unique). You can add the same user structure to +both hash tables (without duplication of the structure), allowing lookup of a +user structure by their name or ID. The way to achieve this is to have a +separate `UT_hash_handle` for each hash to which the structure may be added. + +.A structure with two alternative keys +---------------------------------------------------------------------- +struct my_struct { + int id; /* usual key */ + char username[10]; /* alternative key */ + UT_hash_handle hh1; /* handle for first hash table */ + UT_hash_handle hh2; /* handle for second hash table */ +}; +---------------------------------------------------------------------- + +In the example above, the structure can now be added to two separate hash +tables. In one hash, `id` is its key, while in the other hash, `username` is +its key. (There is no requirement that the two hashes have different key +fields. They could both use the same key, such as `id`). + +Notice the structure has two hash handles (`hh1` and `hh2`). In the code +below, notice that each hash handle is used exclusively with a particular hash +table. (`hh1` is always used with the `users_by_id` hash, while `hh2` is +always used with the `users_by_name` hash table). + +.Two keys on a structure +---------------------------------------------------------------------- + struct my_struct *users_by_id = NULL, *users_by_name = NULL, *s; + int i; + char *name; + + s = malloc(sizeof(struct my_struct)); + s->id = 1; + strcpy(s->username, "thanson"); + + /* add the structure to both hash tables */ + HASH_ADD(hh1, users_by_id, id, sizeof(int), s); + HASH_ADD(hh2, users_by_name, username, strlen(s->username), s); + + /* lookup user by ID in the "users_by_id" hash table */ + i=1; + HASH_FIND(hh1, users_by_id, &i, sizeof(int), s); + if (s) printf("found id %d: %s\n", i, s->username); + + /* lookup user by username in the "users_by_name" hash table */ + name = "thanson"; + HASH_FIND(hh2, users_by_name, name, strlen(name), s); + if (s) printf("found user %s: %d\n", name, s->id); +---------------------------------------------------------------------- + + +Several sort orders +~~~~~~~~~~~~~~~~~~~ +It comes as no suprise that two hash tables can have different sort orders, but +this fact can also be used advantageously to sort the 'same items' in several +ways. This is based on the ability to store a structure in several hash tables. + +Extending the previous example, suppose we have many users. We have added each +user structure to the `users_by_id` hash table and the `users_by_name` hash table. +(To reiterate, this is done without the need to have two copies of each structure). +Now we can define two sort functions, then use `HASH_SRT`. + + int sort_by_id(struct my_struct *a, struct my_struct *b) { + if (a->id == b->id) return 0; + return (a->id < b->id) ? -1 : 1; + } + + int sort_by_name(struct my_struct *a, struct my_struct *b) { + return strcmp(a->username,b->username); + } + + HASH_SRT(hh1, users_by_id, sort_by_id); + HASH_SRT(hh2, users_by_name, sort_by_name); + +Now iterating over the items in `users_by_id` will traverse them in id-order +while, naturally, iterating over `users_by_name` will traverse them in +name-order. The items are fully forward-and-backward linked in each order. +So even for one set of users, we might store them in two hash tables to provide +easy iteration in two different sort orders. + +Bloom filter (faster misses) +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Programs that generate a fair miss rate (`HASH_FIND` that result in `NULL`) may +benefit from the built-in Bloom filter support. This is disabled by default, +because programs that generate only hits would incur a slight penalty from it. +Also, programs that do deletes should not use the Bloom filter. While the +program would operate correctly, deletes diminish the benefit of the filter. +To enable the Bloom filter, simply compile with `-DHASH_BLOOM=n` like: + + -DHASH_BLOOM=27 + +where the number can be any value up to 32 which determines the amount of memory +used by the filter, as shown below. Using more memory makes the filter more +accurate and has the potential to speed up your program by making misses bail +out faster. + +.Bloom filter sizes for selected values of n +[width="50%",cols="10m,30",grid="none",options="header"] +|===================================================================== +| n | Bloom filter size (per hash table) +| 16 | 8 kilobytes +| 20 | 128 kilobytes +| 24 | 2 megabytes +| 28 | 32 megabytes +| 32 | 512 megabytes +|===================================================================== + +Bloom filters are only a performance feature; they do not change the results of +hash operations in any way. The only way to gauge whether or not a Bloom filter +is right for your program is to test it. Reasonable values for the size of the +Bloom filter are 16-32 bits. + +Select +~~~~~~ +An experimental 'select' operation is provided that inserts those items from a +source hash that satisfy a given condition into a destination hash. This +insertion is done with somewhat more efficiency than if this were using +`HASH_ADD`, namely because the hash function is not recalculated for keys of the +selected items. This operation does not remove any items from the source hash. +Rather the selected items obtain dual presence in both hashes. The destination +hash may already have items in it; the selected items are added to it. In order +for a structure to be usable with `HASH_SELECT`, it must have two or more hash +handles. (As described <<multihash,here>>, a structure can exist in many +hash tables at the same time; it must have a separate hash handle for each one). + + user_t *users=NULL, *admins=NULL; /* two hash tables */ + + typedef struct { + int id; + UT_hash_handle hh; /* handle for users hash */ + UT_hash_handle ah; /* handle for admins hash */ + } user_t; + +Now suppose we have added some users, and want to select just the administrator +users who have id's less than 1024. + + #define is_admin(x) (((user_t*)x)->id < 1024) + HASH_SELECT(ah,admins,hh,users,is_admin); + +The first two parameters are the 'destination' hash handle and hash table, the +second two parameters are the 'source' hash handle and hash table, and the last +parameter is the 'select condition'. Here we used a macro `is_admin()` but we +could just as well have used a function. + + int is_admin(void *userv) { + user_t *user = (user_t*)userv; + return (user->id < 1024) ? 1 : 0; + } + +If the select condition always evaluates to true, this operation is +essentially a 'merge' of the source hash into the destination hash. Of course, +the source hash remains unchanged under any use of `HASH_SELECT`. It only adds +items to the destination hash selectively. + +The two hash handles must differ. An example of using `HASH_SELECT` is included +in `tests/test36.c`. + + +[[hash_functions]] +Built-in hash functions +~~~~~~~~~~~~~~~~~~~~~~~ +Internally, a hash function transforms a key into a bucket number. You don't +have to take any action to use the default hash function, currently Jenkin's. + +Some programs may benefit from using another of the built-in hash functions. +There is a simple analysis utility included with uthash to help you determine +if another hash function will give you better performance. + +You can use a different hash function by compiling your program with +`-DHASH_FUNCTION=HASH_xyz` where `xyz` is one of the symbolic names listed +below. E.g., + + cc -DHASH_FUNCTION=HASH_BER -o program program.c + +.Built-in hash functions +[width="50%",cols="^5m,20",grid="none",options="header"] +|=============================================================================== +|Symbol | Name +|JEN | Jenkins (default) +|BER | Bernstein +|SAX | Shift-Add-Xor +|OAT | One-at-a-time +|FNV | Fowler/Noll/Vo +|SFH | Paul Hsieh +|MUR | MurmurHash v3 (see note) +|=============================================================================== + +[NOTE] +.MurmurHash +================================================================================ +A special symbol must be defined if you intend to use MurmurHash. To use it, add +`-DHASH_USING_NO_STRICT_ALIASING` to your `CFLAGS`. And, if you are using +the gcc compiler with optimization, add `-fno-strict-aliasing` to your `CFLAGS`. +================================================================================ + +Which hash function is best? +^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +You can easily determine the best hash function for your key domain. To do so, +you'll need to run your program once in a data-collection pass, and then run +the collected data through an included analysis utility. + +First you must build the analysis utility. From the top-level directory, + + cd tests/ + make + +We'll use `test14.c` to demonstrate the data-collection and analysis steps +(here using `sh` syntax to redirect file descriptor 3 to a file): + +.Using keystats +-------------------------------------------------------------------------------- +% cc -DHASH_EMIT_KEYS=3 -I../src -o test14 test14.c +% ./test14 3>test14.keys +% ./keystats test14.keys +fcn ideal% #items #buckets dup% fl add_usec find_usec del-all usec +--- ------ ---------- ---------- ----- -- ---------- ---------- ------------ +SFH 91.6% 1219 256 0% ok 92 131 25 +FNV 90.3% 1219 512 0% ok 107 97 31 +SAX 88.7% 1219 512 0% ok 111 109 32 +OAT 87.2% 1219 256 0% ok 99 138 26 +JEN 86.7% 1219 256 0% ok 87 130 27 +BER 86.2% 1219 256 0% ok 121 129 27 +-------------------------------------------------------------------------------- + +[NOTE] +The number 3 in `-DHASH_EMIT_KEYS=3` is a file descriptor. Any file descriptor +that your program doesn't use for its own purposes can be used instead of 3. +The data-collection mode enabled by `-DHASH_EMIT_KEYS=x` should not be used in +production code. + +Usually, you should just pick the first hash function that is listed. Here, this +is `SFH`. This is the function that provides the most even distribution for +your keys. If several have the same `ideal%`, then choose the fastest one +according to the `find_usec` column. + +keystats column reference +^^^^^^^^^^^^^^^^^^^^^^^^^ +fcn:: + symbolic name of hash function +ideal%:: + The percentage of items in the hash table which can be looked up within an + ideal number of steps. (Further explained below). +#items:: + the number of keys that were read in from the emitted key file +#buckets:: + the number of buckets in the hash after all the keys were added +dup%:: + the percent of duplicate keys encountered in the emitted key file. + Duplicates keys are filtered out to maintain key uniqueness. (Duplicates + are normal. For example, if the application adds an item to a hash, + deletes it, then re-adds it, the key is written twice to the emitted file.) +flags:: + this is either `ok`, or `nx` (noexpand) if the expansion inhibited flag is + set, described in <<expansion,Expansion internals>>. It is not recommended + to use a hash function that has the `noexpand` flag set. +add_usec:: + the clock time in microseconds required to add all the keys to a hash +find_usec:: + the clock time in microseconds required to look up every key in the hash +del-all usec:: + the clock time in microseconds required to delete every item in the hash + +[[ideal]] +ideal% +^^^^^^ + +.What is ideal%? +***************************************************************************** +The 'n' items in a hash are distributed into 'k' buckets. Ideally each bucket +would contain an equal share '(n/k)' of the items. In other words, the maximum +linear position of any item in a bucket chain would be 'n/k' if every bucket is +equally used. If some buckets are overused and others are underused, the +overused buckets will contain items whose linear position surpasses 'n/k'. +Such items are considered non-ideal. + +As you might guess, `ideal%` is the percentage of ideal items in the hash. These +items have favorable linear positions in their bucket chains. As `ideal%` +approaches 100%, the hash table approaches constant-time lookup performance. +***************************************************************************** + +[[hashscan]] +hashscan +~~~~~~~~ +NOTE: This utility is only available on Linux, and on FreeBSD (8.1 and up). + +A utility called `hashscan` is included in the `tests/` directory. It +is built automatically when you run `make` in that directory. This tool +examines a running process and reports on the uthash tables that it finds in +that program's memory. It can also save the keys from each table in a format +that can be fed into `keystats`. + +Here is an example of using `hashscan`. First ensure that it is built: + + cd tests/ + make + +Since `hashscan` needs a running program to inspect, we'll start up a simple +program that makes a hash table and then sleeps as our test subject: + + ./test_sleep & + pid: 9711 + +Now that we have a test program, let's run `hashscan` on it: + + ./hashscan 9711 + Address ideal items buckets mc fl bloom/sat fcn keys saved to + ------------------ ----- -------- -------- -- -- --------- --- ------------- + 0x862e038 81% 10000 4096 11 ok 16 14% JEN + +If we wanted to copy out all its keys for external analysis using `keystats`, +add the `-k` flag: + + ./hashscan -k 9711 + Address ideal items buckets mc fl bloom/sat fcn keys saved to + ------------------ ----- -------- -------- -- -- --------- --- ------------- + 0x862e038 81% 10000 4096 11 ok 16 14% JEN /tmp/9711-0.key + +Now we could run `./keystats /tmp/9711-0.key` to analyze which hash function +has the best characteristics on this set of keys. + +hashscan column reference +^^^^^^^^^^^^^^^^^^^^^^^^^ +Address:: + virtual address of the hash table +ideal:: + The percentage of items in the table which can be looked up within an ideal + number of steps. See <<ideal>> in the `keystats` section. +items:: + number of items in the hash table +buckets:: + number of buckets in the hash table +mc:: + the maximum chain length found in the hash table (uthash usually tries to + keep fewer than 10 items in each bucket, or in some cases a multiple of 10) +fl:: + flags (either `ok`, or `NX` if the expansion-inhibited flag is set) +bloom/sat:: + if the hash table uses a Bloom filter, this is the size (as a power of two) + of the filter (e.g. 16 means the filter is 2^16 bits in size). The second + number is the "saturation" of the bits expressed as a percentage. The lower + the percentage, the more potential benefit to identify cache misses quickly. +fcn:: + symbolic name of hash function +keys saved to:: + file to which keys were saved, if any + +.How hashscan works +***************************************************************************** +When hashscan runs, it attaches itself to the target process, which suspends +the target process momentarily. During this brief suspension, it scans the +target's virtual memory for the signature of a uthash hash table. It then +checks if a valid hash table structure accompanies the signature and reports +what it finds. When it detaches, the target process resumes running normally. +The hashscan is performed "read-only"-- the target process is not modified. +Since hashscan is analyzing a momentary snapshot of a running process, it may +return different results from one run to another. +***************************************************************************** + +[[expansion]] +Expansion internals +~~~~~~~~~~~~~~~~~~~ +Internally this hash manages the number of buckets, with the goal of having +enough buckets so that each one contains only a small number of items. + +.Why does the number of buckets matter? +******************************************************************************** +When looking up an item by its key, this hash scans linearly through the items +in the appropriate bucket. In order for the linear scan to run in constant +time, the number of items in each bucket must be bounded. This is accomplished +by increasing the number of buckets as needed. +******************************************************************************** + +Normal expansion +^^^^^^^^^^^^^^^^ +This hash attempts to keep fewer than 10 items in each bucket. When an item is +added that would cause a bucket to exceed this number, the number of buckets in +the hash is doubled and the items are redistributed into the new buckets. In an +ideal world, each bucket will then contain half as many items as it did before. + +Bucket expansion occurs automatically and invisibly as needed. There is +no need for the application to know when it occurs. + +Per-bucket expansion threshold +++++++++++++++++++++++++++++++ +Normally all buckets share the same threshold (10 items) at which point bucket +expansion is triggered. During the process of bucket expansion, uthash can +adjust this expansion-trigger threshold on a per-bucket basis if it sees that +certain buckets are over-utilized. + +When this threshold is adjusted, it goes from 10 to a multiple of 10 (for that +particular bucket). The multiple is based on how many times greater the actual +chain length is than the ideal length. It is a practical measure to reduce +excess bucket expansion in the case where a hash function over-utilizes a few +buckets but has good overall distribution. However, if the overall distribution +gets too bad, uthash changes tactics. + +Inhibited expansion +^^^^^^^^^^^^^^^^^^^ +You usually don't need to know or worry about this, particularly if you used +the `keystats` utility during development to select a good hash for your keys. + +A hash function may yield an uneven distribution of items across the buckets. +In moderation this is not a problem. Normal bucket expansion takes place as +the chain lengths grow. But when significant imbalance occurs (because the hash +function is not well suited to the key domain), bucket expansion may be +ineffective at reducing the chain lengths. + +Imagine a very bad hash function which always puts every item in bucket 0. No +matter how many times the number of buckets is doubled, the chain length of +bucket 0 stays the same. In a situation like this, the best behavior is to +stop expanding, and accept O(n) lookup performance. This is what uthash +does. It degrades gracefully if the hash function is ill-suited to the keys. + +If two consecutive bucket expansions yield `ideal%` values below 50%, uthash +inhibits expansion for that hash table. Once set, the 'bucket expansion +inhibited' flag remains in effect as long as the hash has items in it. +Inhibited expansion may cause `HASH_FIND` to exhibit worse than constant-time +performance. + +Hooks +~~~~~ +You don't need to use these hooks- they are only here if you want to modify +the behavior of uthash. Hooks can be used to change how uthash allocates +memory, and to run code in response to certain internal events. + +malloc/free +^^^^^^^^^^^ +By default this hash implementation uses `malloc` and `free` to manage memory. +If your application uses its own custom allocator, this hash can use them too. + +.Specifying alternate memory management functions +---------------------------------------------------------------------------- +#include "uthash.h" + +/* undefine the defaults */ +#undef uthash_malloc +#undef uthash_free + +/* re-define, specifying alternate functions */ +#define uthash_malloc(sz) my_malloc(sz) +#define uthash_free(ptr,sz) my_free(ptr) + +... +---------------------------------------------------------------------------- + +Notice that `uthash_free` receives two parameters. The `sz` parameter is for +convenience on embedded platforms that manage their own memory. + +Out of memory +^^^^^^^^^^^^^ +If memory allocation fails (i.e., the malloc function returned `NULL`), the +default behavior is to terminate the process by calling `exit(-1)`. This can +be modified by re-defining the `uthash_fatal` macro. + + #undef uthash_fatal + #define uthash_fatal(msg) my_fatal_function(msg); + +The fatal function should terminate the process or `longjmp` back to a safe +place. Uthash does not support "returning a failure" if memory cannot be +allocated. + +Internal events +^^^^^^^^^^^^^^^ +There is no need for the application to set these hooks or take action in +response to these events. They are mainly for diagnostic purposes. + +These two hooks are "notification" hooks which get executed if uthash is +expanding buckets, or setting the 'bucket expansion inhibited' flag. Normally +both of these hooks are undefined and thus compile away to nothing. + +Expansion ++++++++++ +There is a hook for the bucket expansion event. + +.Bucket expansion hook +---------------------------------------------------------------------------- +#include "uthash.h" + +#undef uthash_expand_fyi +#define uthash_expand_fyi(tbl) printf("expanded to %d buckets\n", tbl->num_buckets) + +... +---------------------------------------------------------------------------- + +Expansion-inhibition +++++++++++++++++++++ +This hook can be defined to code to execute in the event that uthash decides to +set the 'bucket expansion inhibited' flag. + +.Bucket expansion inhibited hook +---------------------------------------------------------------------------- +#include "uthash.h" + +#undef uthash_noexpand_fyi +#define uthash_noexpand_fyi printf("warning: bucket expansion inhibited\n"); + +... +---------------------------------------------------------------------------- + + +Debug mode +~~~~~~~~~~ +If a program that uses this hash is compiled with `-DHASH_DEBUG=1`, a special +internal consistency-checking mode is activated. In this mode, the integrity +of the whole hash is checked following every add or delete operation. This is +for debugging the uthash software only, not for use in production code. + +In the `tests/` directory, running `make debug` will run all the tests in +this mode. + +In this mode, any internal errors in the hash data structure will cause a +message to be printed to `stderr` and the program to exit. + +The `UT_hash_handle` data structure includes `next`, `prev`, `hh_next` and +`hh_prev` fields. The former two fields determine the "application" ordering +(that is, insertion order-- the order the items were added). The latter two +fields determine the "bucket chain" order. These link the `UT_hash_handles` +together in a doubly-linked list that is a bucket chain. + +Checks performed in `-DHASH_DEBUG=1` mode: + +- the hash is walked in its entirety twice: once in 'bucket' order and a + second time in 'application' order +- the total number of items encountered in both walks is checked against the + stored number +- during the walk in 'bucket' order, each item's `hh_prev` pointer is compared + for equality with the last visited item +- during the walk in 'application' order, each item's `prev` pointer is compared + for equality with the last visited item + +.Macro debugging: +******************************************************************************** +Sometimes it's difficult to interpret a compiler warning on a line which +contains a macro call. In the case of uthash, one macro can expand to dozens of +lines. In this case, it is helpful to expand the macros and then recompile. +By doing so, the warning message will refer to the exact line within the macro. + +Here is an example of how to expand the macros and then recompile. This uses the +`test1.c` program in the `tests/` subdirectory. + + gcc -E -I../src test1.c > /tmp/a.c + egrep -v '^#' /tmp/a.c > /tmp/b.c + indent /tmp/b.c + gcc -o /tmp/b /tmp/b.c + +The last line compiles the original program (test1.c) with all macros expanded. +If there was a warning, the referenced line number can be checked in `/tmp/b.c`. +******************************************************************************** + +Thread safety +~~~~~~~~~~~~~ +You can use uthash in a threaded program. But you must do the locking. Use a +read-write lock to protect against concurrent writes. It is ok to have +concurrent readers (since uthash 1.5). + +For example using pthreads you can create an rwlock like this: + + pthread_rwlock_t lock; + if (pthread_rwlock_init(&lock,NULL) != 0) fatal("can't create rwlock"); + +Then, readers must acquire the read lock before doing any `HASH_FIND` calls or +before iterating over the hash elements: + + if (pthread_rwlock_rdlock(&lock) != 0) fatal("can't get rdlock"); + HASH_FIND_INT(elts, &i, e); + pthread_rwlock_unlock(&lock); + +Writers must acquire the exclusive write lock before doing any update. Add, +delete, and sort are all updates that must be locked. + + if (pthread_rwlock_wrlock(&lock) != 0) fatal("can't get wrlock"); + HASH_DEL(elts, e); + pthread_rwlock_unlock(&lock); + +If you prefer, you can use a mutex instead of a read-write lock, but this will +reduce reader concurrency to a single thread at a time. + +An example program using uthash with a read-write lock is included in +`tests/threads/test1.c`. + +[[Macro_reference]] +Macro reference +--------------- + +Convenience macros +~~~~~~~~~~~~~~~~~~ +The convenience macros do the same thing as the generalized macros, but +require fewer arguments. + +In order to use the convenience macros, + +1. the structure's `UT_hash_handle` field must be named `hh`, and +2. for add or find, the key field must be of type `int` or `char[]` or pointer + +.Convenience macros +[width="90%",cols="10m,30m",grid="none",options="header"] +|=============================================================================== +|macro | arguments +|HASH_ADD_INT | (head, keyfield_name, item_ptr) +|HASH_FIND_INT | (head, key_ptr, item_ptr) +|HASH_ADD_STR | (head, keyfield_name, item_ptr) +|HASH_FIND_STR | (head, key_ptr, item_ptr) +|HASH_ADD_PTR | (head, keyfield_name, item_ptr) +|HASH_FIND_PTR | (head, key_ptr, item_ptr) +|HASH_DEL | (head, item_ptr) +|HASH_SORT | (head, cmp) +|HASH_COUNT | (head) +|=============================================================================== + +General macros +~~~~~~~~~~~~~~ + +These macros add, find, delete and sort the items in a hash. You need to +use the general macros if your `UT_hash_handle` is named something other +than `hh`, or if your key's data type isn't `int` or `char[]`. + +.General macros +[width="90%",cols="10m,30m",grid="none",options="header"] +|=============================================================================== +|macro | arguments +|HASH_ADD | (hh_name, head, keyfield_name, key_len, item_ptr) +|HASH_ADD_KEYPTR| (hh_name, head, key_ptr, key_len, item_ptr) +|HASH_FIND | (hh_name, head, key_ptr, key_len, item_ptr) +|HASH_DELETE | (hh_name, head, item_ptr) +|HASH_SRT | (hh_name, head, cmp) +|HASH_CNT | (hh_name, head) +|HASH_CLEAR | (hh_name, head) +|HASH_SELECT | (dst_hh_name, dst_head, src_hh_name, src_head, condition) +|HASH_ITER | (hh_name, head, item_ptr, tmp_item_ptr) +|=============================================================================== + +[NOTE] +`HASH_ADD_KEYPTR` is used when the structure contains a pointer to the +key, rather than the key itself. + + +Argument descriptions +^^^^^^^^^^^^^^^^^^^^^ +hh_name:: + name of the `UT_hash_handle` field in the structure. Conventionally called + `hh`. +head:: + the structure pointer variable which acts as the "head" of the hash. So + named because it initially points to the first item that is added to the hash. +keyfield_name:: + the name of the key field in the structure. (In the case of a multi-field + key, this is the first field of the key). If you're new to macros, it + might seem strange to pass the name of a field as a parameter. See + <<validc,note>>. +key_len:: + the length of the key field in bytes. E.g. for an integer key, this is + `sizeof(int)`, while for a string key it's `strlen(key)`. (For a + multi-field key, see the notes in this guide on calculating key length). +key_ptr:: + for `HASH_FIND`, this is a pointer to the key to look up in the hash + (since it's a pointer, you can't directly pass a literal value here). For + `HASH_ADD_KEYPTR`, this is the address of the key of the item being added. +item_ptr:: + pointer to the structure being added, deleted, or looked up, or the current + pointer during iteration. This is an input parameter for `HASH_ADD` and + `HASH_DELETE` macros, and an output parameter for `HASH_FIND` and + `HASH_ITER`. (When using `HASH_ITER` to iterate, `tmp_item_ptr` + is another variable of the same type as `item_ptr`, used internally). +cmp:: + pointer to comparison function which accepts two arguments (pointers to + items to compare) and returns an int specifying whether the first item + should sort before, equal to, or after the second item (like `strcmp`). +condition:: + a function or macro which accepts a single argument-- a void pointer to a + structure, which needs to be cast to the appropriate structure type. The + function or macro should return (or evaluate to) a non-zero value if the + structure should be "selected" for addition to the destination hash. + +// vim: set tw=80 wm=2 syntax=asciidoc: + diff --git a/src/modules/systemlib/uthash/doc/utarray.txt b/src/modules/systemlib/uthash/doc/utarray.txt new file mode 100644 index 000000000..37830f124 --- /dev/null +++ b/src/modules/systemlib/uthash/doc/utarray.txt @@ -0,0 +1,376 @@ +utarray: dynamic array macros for C +=================================== +Troy D. Hanson <thanson@users.sourceforge.net> +v1.9.5, November 2011 + +include::sflogo.txt[] +include::topnav_utarray.txt[] + +Introduction +------------ +include::toc.txt[] + +A set of general-purpose dynamic array macros for C structures are included with +uthash in `utarray.h`. To use these macros in your own C program, just +copy `utarray.h` into your source directory and use it in your programs. + + #include "utarray.h" + +The dynamic array supports basic operations such as push, pop, and erase on the +array elements. These array elements can be any simple datatype or structure. +The array <<operations,operations>> are based loosely on the C++ STL vector methods. + +Internally the dynamic array contains a contiguous memory region into which +the elements are copied. This buffer is grown as needed using `realloc` to +accomodate all the data that is pushed into it. + +Download +~~~~~~~~ +To download the `utarray.h` header file, follow the link on the +http://uthash.sourceforge.net[uthash home page]. + +BSD licensed +~~~~~~~~~~~~ +This software is made available under the +link:license.html[revised BSD license]. +It is free and open source. + +Platforms +~~~~~~~~~ +The 'utarray' macros have been tested on: + + * Linux, + * Mac OS X, + * Windows, using Visual Studio 2008 and Visual Studio 2010 + +Usage +----- + +Declaration +~~~~~~~~~~~ + +The array itself has the data type `UT_array`, regardless of the type of +elements to be stored in it. It is declared like, + + UT_array *nums; + +New and free +~~~~~~~~~~~~ +The next step is to create the array using `utarray_new`. Later when you're +done with the array, `utarray_free` will free it and all its elements. + +Push, pop, etc +~~~~~~~~~~~~~~ +The central features of the utarray involve putting elements into it, taking +them out, and iterating over them. There are several <<operations,operations>> +to pick from that deal with either single elements or ranges of elements at a +time. In the examples below we will use only the push operation to insert +elements. + +Elements +-------- + +Support for dynamic arrays of integers or strings is especially easy. These are +best shown by example: + +Integers +~~~~~~~~ +This example makes a utarray of integers, pushes 0-9 into it, then prints it. +Lastly it frees it. + +.Integer elements +------------------------------------------------------------------------------- +#include <stdio.h> +#include "utarray.h" + +int main() { + UT_array *nums; + int i, *p; + + utarray_new(nums,&ut_int_icd); + for(i=0; i < 10; i++) utarray_push_back(nums,&i); + + for(p=(int*)utarray_front(nums); + p!=NULL; + p=(int*)utarray_next(nums,p)) { + printf("%d\n",*p); + } + + utarray_free(nums); + + return 0; +} +------------------------------------------------------------------------------- + +The second argument to `utarray_push_back` is always a 'pointer' to the type +(so a literal cannot be used). So for integers, it is an `int*`. + +Strings +~~~~~~~ +In this example we make a utarray of strings, push two strings into it, print +it and free it. + +.String elements +------------------------------------------------------------------------------- +#include <stdio.h> +#include "utarray.h" + +int main() { + UT_array *strs; + char *s, **p; + + utarray_new(strs,&ut_str_icd); + + s = "hello"; utarray_push_back(strs, &s); + s = "world"; utarray_push_back(strs, &s); + p = NULL; + while ( (p=(char**)utarray_next(strs,p))) { + printf("%s\n",*p); + } + + utarray_free(strs); + + return 0; +} +------------------------------------------------------------------------------- + +In this example, since the element is a `char*`, we pass a pointer to it +(`char**`) as the second argument to `utarray_push_back`. Note that "push" makes +a copy of the source string and pushes that copy into the array. + +About UT_icd +~~~~~~~~~~~~ + +Arrays be made of any type of element, not just integers and strings. The +elements can be basic types or structures. Unless you're dealing with integers +and strings (which use pre-defined `ut_int_icd` and `ut_str_icd`), you'll need +to define a `UT_icd` helper structure. This structure contains everything that +utarray needs to initialize, copy or destruct elements. + + typedef struct { + size_t sz; + init_f *init; + ctor_f *copy; + dtor_f *dtor; + } UT_icd; + +The three function pointers `init`, `copy`, and `dtor` have these prototypes: + + typedef void (ctor_f)(void *dst, const void *src); + typedef void (dtor_f)(void *elt); + typedef void (init_f)(void *elt); + +The `sz` is just the size of the element being stored in the array. + +The `init` function will be invoked whenever utarray needs to initialize an +empty element. This only happens as a byproduct of `utarray_resize` or +`utarray_extend_back`. If `init` is `NULL`, it defaults to zero filling the +new element using memset. + +The `copy` function is used whenever an element is copied into the array. +It is invoked during `utarray_push_back`, `utarray_insert`, `utarray_inserta`, +or `utarray_concat`. If `copy` is `NULL`, it defaults to a bitwise copy using +memcpy. + +The `dtor` function is used to clean up an element that is being removed from +the array. It may be invoked due to `utarray_resize`, `utarray_pop_back`, +`utarray_erase`, `utarray_clear`, `utarray_done` or `utarray_free`. If the +elements need no cleanup upon destruction, `dtor` may be `NULL`. + +Scalar types +~~~~~~~~~~~~ + +The next example uses `UT_icd` with all its defaults to make a utarray of +`long` elements. This example pushes two longs, prints them, and frees the +array. + +.long elements +------------------------------------------------------------------------------- +#include <stdio.h> +#include "utarray.h" + +UT_icd long_icd = {sizeof(long), NULL, NULL, NULL }; + +int main() { + UT_array *nums; + long l, *p; + utarray_new(nums, &long_icd); + + l=1; utarray_push_back(nums, &l); + l=2; utarray_push_back(nums, &l); + + p=NULL; + while( (p=(long*)utarray_next(nums,p))) printf("%ld\n", *p); + + utarray_free(nums); + return 0; +} +------------------------------------------------------------------------------- + +Structures +~~~~~~~~~~ + +Structures can be used as utarray elements. If the structure requires no +special effort to initialize, copy or destruct, we can use `UT_icd` with all +its defaults. This example shows a structure that consists of two integers. Here +we push two values, print them and free the array. + +.Structure (simple) +------------------------------------------------------------------------------- +#include <stdio.h> +#include "utarray.h" + +typedef struct { + int a; + int b; +} intpair_t; + +UT_icd intpair_icd = {sizeof(intpair_t), NULL, NULL, NULL}; + +int main() { + + UT_array *pairs; + intpair_t ip, *p; + utarray_new(pairs,&intpair_icd); + + ip.a=1; ip.b=2; utarray_push_back(pairs, &ip); + ip.a=10; ip.b=20; utarray_push_back(pairs, &ip); + + for(p=(intpair_t*)utarray_front(pairs); + p!=NULL; + p=(intpair_t*)utarray_next(pairs,p)) { + printf("%d %d\n", p->a, p->b); + } + + utarray_free(pairs); + return 0; +} +------------------------------------------------------------------------------- + +The real utility of `UT_icd` is apparent when the elements of the utarray are +structures that require special work to initialize, copy or destruct. + +For example, when a structure contains pointers to related memory areas that +need to be copied when the structure is copied (and freed when the structure is +freed), we can use custom `init`, `copy`, and `dtor` members in the `UT_icd`. + +Here we take an example of a structure that contains an integer and a string. +When this element is copied (such as when an element is pushed into the array), +we want to "deep copy" the `s` pointer (so the original element and the new +element point to their own copies of `s`). When an element is destructed, we +want to "deep free" its copy of `s`. Lastly, this example is written to work +even if `s` has the value `NULL`. + +.Structure (complex) +------------------------------------------------------------------------------- +#include <stdio.h> +#include <stdlib.h> +#include "utarray.h" + +typedef struct { + int a; + char *s; +} intchar_t; + +void intchar_copy(void *_dst, const void *_src) { + intchar_t *dst = (intchar_t*)_dst, *src = (intchar_t*)_src; + dst->a = src->a; + dst->s = src->s ? strdup(src->s) : NULL; +} + +void intchar_dtor(void *_elt) { + intchar_t *elt = (intchar_t*)_elt; + if (elt->s) free(elt->s); +} + +UT_icd intchar_icd = {sizeof(intchar_t), NULL, intchar_copy, intchar_dtor}; + +int main() { + UT_array *intchars; + intchar_t ic, *p; + utarray_new(intchars, &intchar_icd); + + ic.a=1; ic.s="hello"; utarray_push_back(intchars, &ic); + ic.a=2; ic.s="world"; utarray_push_back(intchars, &ic); + + p=NULL; + while( (p=(intchar_t*)utarray_next(intchars,p))) { + printf("%d %s\n", p->a, (p->s ? p->s : "null")); + } + + utarray_free(intchars); + return 0; +} + +------------------------------------------------------------------------------- + +[[operations]] +Reference +--------- +This table lists all the utarray operations. These are loosely based on the C++ +vector class. + +Operations +~~~~~~~~~~ + +[width="100%",cols="50<m,40<",grid="none",options="none"] +|=============================================================================== +| utarray_new(UT_array *a, UT_icd *icd)| allocate a new array +| utarray_free(UT_array *a) | free an allocated array +| utarray_init(UT_array *a,UT_icd *icd)| init an array (non-alloc) +| utarray_done(UT_array *a) | dispose of an array (non-allocd) +| utarray_reserve(UT_array *a,int n) | ensure space available for 'n' more elements +| utarray_push_back(UT_array *a,void *p) | push element p onto a +| utarray_pop_back(UT_array *a) | pop last element from a +| utarray_extend_back(UT_array *a) | push empty element onto a +| utarray_len(UT_array *a) | get length of a +| utarray_eltptr(UT_array *a,int j) | get pointer of element from index +| utarray_eltidx(UT_array *a,void *e) | get index of element from pointer +| utarray_insert(UT_array *a,void *p, int j) | insert element p to index j +| utarray_inserta(UT_array *a,UT_array *w, int j) | insert array w into array a at index j +| utarray_resize(UT_array *dst,int num) | extend or shrink array to num elements +| utarray_concat(UT_array *dst,UT_array *src) | copy src to end of dst array +| utarray_erase(UT_array *a,int pos,int len) | remove len elements from a[pos]..a[pos+len-1] +| utarray_clear(UT_array *a) | clear all elements from a, setting its length to zero +| utarray_sort(UT_array *a,cmpfcn *cmp) | sort elements of a using comparison function +| utarray_find(UT_array *a,void *v, cmpfcn *cmp) | find element v in utarray (must be sorted) +| utarray_front(UT_array *a) | get first element of a +| utarray_next(UT_array *a,void *e) | get element of a following e (front if e is NULL) +| utarray_prev(UT_array *a,void *e) | get element of a before e (back if e is NULL) +| utarray_back(UT_array *a) | get last element of a +|=============================================================================== + +Notes +~~~~~ + +1. `utarray_new` and `utarray_free` are used to allocate a new array and free it, + while `utarray_init` and `utarray_done` can be used if the UT_array is already + allocated and just needs to be initialized or have its internal resources + freed. +2. `utarray_reserve` takes the "delta" of elements to reserve (not the total + desired capacity of the array-- this differs from the C++ STL "reserve" notion) +3. `utarray_sort` expects a comparison function having the usual `strcmp` -like + convention where it accepts two elements (a and b) and returns a negative + value if a precedes b, 0 if a and b sort equally, and positive if b precedes a. + This is an example of a comparison function: + + int intsort(const void *a,const void*b) { + int _a = *(int*)a; + int _b = *(int*)b; + return _a - _b; + } + +4. `utarray_find` uses a binary search to locate an element having a certain value + according to the given comparison function. The utarray must be first sorted + using the same comparison function. An example of using `utarray_find` with + a utarray of strings is included in `tests/test61.c`. + +5. A 'pointer' to a particular element (obtained using `utarray_eltptr` or + `utarray_front`, `utarray_next`, `utarray_prev`, `utarray_back`) becomes invalid whenever + another element is inserted into the utarray. This is because the internal + memory management may need to `realloc` the element storage to a new address. + For this reason, it's usually better to refer to an element by its integer + 'index' in code whose duration may include element insertion. + +// vim: set nowrap syntax=asciidoc: + diff --git a/src/modules/systemlib/uthash/doc/utlist.txt b/src/modules/systemlib/uthash/doc/utlist.txt new file mode 100644 index 000000000..fbf961c27 --- /dev/null +++ b/src/modules/systemlib/uthash/doc/utlist.txt @@ -0,0 +1,219 @@ +utlist: linked list macros for C structures +=========================================== +Troy D. Hanson <thanson@users.sourceforge.net> +v1.9.5, November 2011 + +include::sflogo.txt[] +include::topnav_utlist.txt[] + +Introduction +------------ +include::toc.txt[] + +A set of general-purpose 'linked list' macros for C structures are included with +uthash in `utlist.h`. To use these macros in your own C program, just +copy `utlist.h` into your source directory and use it in your programs. + + #include "utlist.h" + +These macros support the basic linked list operations: adding and deleting +elements, sorting them and iterating over them. + +Download +~~~~~~~~ +To download the `utlist.h` header file, follow the link on the +http://uthash.sourceforge.net[uthash home page]. + +BSD licensed +~~~~~~~~~~~~ +This software is made available under the +link:license.html[revised BSD license]. +It is free and open source. + +Platforms +~~~~~~~~~ +The 'utlist' macros have been tested on: + + * Linux, + * Mac OS X, and + * Windows, using Visual Studio 2008, Visual Studio 2010, or Cygwin/MinGW. + +Using utlist +------------ + +Types of lists +~~~~~~~~~~~~~~ +Three types of linked lists are supported: + +- *singly-linked* lists, +- *doubly-linked* lists, and +- *circular, doubly-linked* lists + +Efficiency +^^^^^^^^^^ +For all types of lists, prepending elements and deleting elements are +constant-time operations. Appending to a singly-linked list is an 'O(n)' +operation but appending to a doubly-linked list is constant time using these +macros. (This is because, in the utlist implementation of the doubly-linked +list, the head element's `prev` member points back to the list tail, even when +the list is non-circular). Sorting is an 'O(n log(n))' operation. Iteration +and searching are `O(n)` for all list types. + +List elements +~~~~~~~~~~~~~ +You can use any structure with these macros, as long as the structure +contains a `next` pointer. If you want to make a doubly-linked list, +the element also needs to have a `prev` pointer. + + typedef struct element { + char *name; + struct element *prev; /* needed for a doubly-linked list only */ + struct element *next; /* needed for singly- or doubly-linked lists */ + } element; + +You can name your structure anything. In the example above it is called `element`. +Within a particular list, all elements must be of the same type. + +List head +~~~~~~~~~ +The list head is simply a pointer to your element structure. You can name it +anything. *It must be initialized to `NULL`*. + + element *head = NULL; + +List operations +~~~~~~~~~~~~~~~ +The lists support inserting or deleting elements, sorting the elements and +iterating over them. + +[width="100%",cols="10<m,10<m,10<m",grid="cols",options="header"] +|=============================================================================== +|Singly-linked | Doubly-linked | Circular, doubly-linked +|LL_PREPEND(head,add); | DL_PREPEND(head,add); | CDL_PREPEND(head,add; +|LL_APPEND(head,add); | DL_APPEND(head,add); | +|LL_CONCAT(head1,head2); | DL_CONCAT(head1,head2); | +|LL_DELETE(head,del); | DL_DELETE(head,del); | CDL_DELETE(head,del); +|LL_SORT(head,cmp); | DL_SORT(head,cmp); | CDL_SORT(head,cmp); +|LL_FOREACH(head,elt) {...}| DL_FOREACH(head,elt) {...} | CDL_FOREACH(head,elt) {...} +|LL_FOREACH_SAFE(head,elt,tmp) {...}| DL_FOREACH_SAFE(head,elt,tmp) {...} | CDL_FOREACH_SAFE(head,elt,tmp1,tmp2) {...} +|LL_SEARCH_SCALAR(head,elt,mbr,val);| DL_SEARCH_SCALAR(head,elt,mbr,val); | CDL_SEARCH_SCALAR(head,elt,mbr,val); +|LL_SEARCH(head,elt,like,cmp);| DL_SEARCH(head,elt,like,cmp); | CDL_SEARCH(head,elt,like,cmp); +|=============================================================================== + +'Prepend' means to insert an element in front of the existing list head (if any), +changing the list head to the new element. 'Append' means to add an element at the +end of the list, so it becomes the new tail element. 'Concatenate' takes two +properly constructed lists and appends the second list to the first. (Visual +Studio 2008 does not support `LL_CONCAT` and `DL_CONCAT`, but VS2010 is ok.) + +The 'sort' operation never moves the elements in memory; rather it only adjusts +the list order by altering the `prev` and `next` pointers in each element. Also +the sort operation can change the list head to point to a new element. + +The 'foreach' operation is for easy iteration over the list from the head to the +tail. A usage example is shown below. You can of course just use the `prev` and +`next` pointers directly instead of using the 'foreach' macros. +The 'foreach_safe' operation should be used if you plan to delete any of the list +elements while iterating. + +The 'search' operation is a shortcut for iteration in search of a particular +element. It is not any faster than manually iterating and testing each element. +There are two forms: the "scalar" version searches for an element using a +simple equality test on a given structure member, while the general version takes an +element to which all others in the list will be compared using a `cmp` function. + + +The parameters shown in the table above are explained here: + +head:: + The list head (a pointer to your list element structure). +add:: + A pointer to the list element structure you are adding to the list. +del:: + A pointer to the list element structure you are deleting from the list. +elt:: + A pointer that will be assigned to each list element in succession (see + example) in the case of iteration macros; also, the output pointer from + the search macros. +like:: + An element pointer, having the same type as `elt`, for which the search macro + seeks a match (if found, the match is stored in `elt`). A match is determined + by the given `cmp` function. +cmp:: + pointer to comparison function which accepts two arguments-- these are + pointers to two element structures to be compared. The comparison function + must return an `int` that is negative, zero, or positive, which specifies + whether the first item should sort before, equal to, or after the second item, + respectively. (In other words, the same convention that is used by `strcmp`). + Note that under Visual Studio 2008 you may need to declare the two arguments + as `void *` and then cast them back to their actual types. +tmp:: + A pointer of the same type as `elt`. Used internally. Need not be initialized. +mbr:: + In the scalar search macro, the name of a member within the `elt` structure which + will be tested (using `==`) for equality with the value `val`. +val:: + In the scalar search macro, specifies the value of (of structure member + `field`) of the element being sought. + +Example +~~~~~~~ +This example program reads names from a text file (one name per line), and +appends each name to a doubly-linked list. Then it sorts and prints them. + +.A doubly-linked list +-------------------------------------------------------------------------------- +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include "utlist.h" + +#define BUFLEN 20 + +typedef struct el { + char bname[BUFLEN]; + struct el *next, *prev; +} el; + +int namecmp(el *a, el *b) { + return strcmp(a->bname,b->bname); +} + +el *head = NULL; /* important- initialize to NULL! */ + +int main(int argc, char *argv[]) { + el *name, *elt, *tmp, etmp; + + char linebuf[BUFLEN]; + FILE *file; + + if ( (file = fopen( "test11.dat", "r" )) == NULL ) { + perror("can't open: "); + exit(-1); + } + + while (fgets(linebuf,BUFLEN,file) != NULL) { + if ( (name = (el*)malloc(sizeof(el))) == NULL) exit(-1); + strncpy(name->bname,linebuf,BUFLEN); + DL_APPEND(head, name); + } + DL_SORT(head, namecmp); + DL_FOREACH(head,elt) printf("%s", elt->bname); + + memcpy(&etmp.bname, "WES\n", 5); + DL_SEARCH(head,elt,&etmp,namecmp); + if (elt) printf("found %s\n", elt->bname); + + /* now delete each element, use the safe iterator */ + DL_FOREACH_SAFE(head,elt,tmp) { + DL_DELETE(head,elt); + } + + fclose(file); + + return 0; +} +-------------------------------------------------------------------------------- + +// vim: set nowrap syntax=asciidoc: + diff --git a/src/modules/systemlib/uthash/doc/utstring.txt b/src/modules/systemlib/uthash/doc/utstring.txt new file mode 100644 index 000000000..abfdcd107 --- /dev/null +++ b/src/modules/systemlib/uthash/doc/utstring.txt @@ -0,0 +1,178 @@ +utstring: dynamic string macros for C +===================================== +Troy D. Hanson <thanson@users.sourceforge.net> +v1.9.5, November 2011 + +include::sflogo.txt[] +include::topnav_utstring.txt[] + +Introduction +------------ +include::toc.txt[] + +A set of very basic dynamic string macros for C programs are included with +uthash in `utstring.h`. To use these macros in your own C program, just +copy `utstring.h` into your source directory and use it in your programs. + + #include "utstring.h" + +The dynamic string supports basic operations such as inserting data (including +binary data-- despite its name, utstring is not limited to string content), +concatenation, getting the length and content, and clearing it. The string +<<operations,operations>> are listed below. + +Download +~~~~~~~~ +To download the `utstring.h` header file, follow the link on the +http://uthash.sourceforge.net[uthash home page]. + +BSD licensed +~~~~~~~~~~~~ +This software is made available under the +link:license.html[revised BSD license]. +It is free and open source. + +Platforms +~~~~~~~~~ +The 'utstring' macros have been tested on: + + * Linux, + * Windows, using Visual Studio 2008 and Visual Studio 2010 + +Usage +----- + +Declaration +~~~~~~~~~~~ + +The dynamic string itself has the data type `UT_string`. It is declared like, + + UT_string *str; + +New and free +~~~~~~~~~~~~ +The next step is to create the string using `utstring_new`. Later when you're +done with it, `utstring_free` will free it and all its content. + +Manipulation +~~~~~~~~~~~~ +The `utstring_printf` or `utstring_bincpy` operations insert (copy) data into +the string. To concatenate one utstring to another, use `utstring_concat`. To +clear the content of the string, use `utstring_clear`. The length of the string +is available from `utstring_len`, and its content from `utstring_body`. This +evaluates to a `char*`. The buffer it points to is always null-terminated. +So, it can be used directly with external functions that expect a string. +This automatic null terminator is not counted in the length of the string. + +Samples +~~~~~~~ + +These examples show how to use utstring. + +.Sample 1 +------------------------------------------------------------------------------- +#include <stdio.h> +#include "utstring.h" + +int main() { + UT_string *s; + + utstring_new(s); + utstring_printf(s, "hello world!" ); + printf("%s\n", utstring_body(s)); + + utstring_free(s); + return 0; +} +------------------------------------------------------------------------------- + +The next example is meant to demonstrate that printf 'appends' to the string. +It also shows concatenation. + +.Sample 2 +------------------------------------------------------------------------------- +#include <stdio.h> +#include "utstring.h" + +int main() { + UT_string *s, *t; + + utstring_new(s); + utstring_new(t); + + utstring_printf(s, "hello " ); + utstring_printf(s, "world " ); + + utstring_printf(t, "hi " ); + utstring_printf(t, "there " ); + + utstring_concat(s, t); + printf("length: %u\n", utstring_len(s)); + printf("%s\n", utstring_body(s)); + + utstring_free(s); + utstring_free(t); + return 0; +} +------------------------------------------------------------------------------- + +The last example shows how binary data can be inserted into the string. It also +clears the string and prints new data into it. + +.Sample 3 +------------------------------------------------------------------------------- +#include <stdio.h> +#include "utstring.h" + +int main() { + UT_string *s; + char binary[] = "\xff\xff"; + + utstring_new(s); + utstring_bincpy(s, binary, sizeof(binary)); + printf("length is %u\n", utstring_len(s)); + + utstring_clear(s); + utstring_printf(s,"number %d", 10); + printf("%s\n", utstring_body(s)); + + utstring_free(s); + return 0; +} +------------------------------------------------------------------------------- + +[[operations]] +Reference +--------- +These are the utstring operations. + +Operations +~~~~~~~~~~ + +[width="100%",cols="50<m,40<",grid="none",options="none"] +|=============================================================================== +| utstring_new(s) | allocate a new utstring +| utstring_renew(s) | allocate a new utstring (if s is `NULL`) otherwise clears it +| utstring_free(s) | free an allocated utstring +| utstring_init(s) | init a utstring (non-alloc) +| utstring_done(s) | dispose of a utstring (non-allocd) +| utstring_printf(s,fmt,...) | printf into a utstring (appends) +| utstring_bincpy(s,bin,len) | insert binary data of length len (appends) +| utstring_concat(dst,src) | concatenate src utstring to end of dst utstring +| utstring_clear(s) | clear the content of s (setting its length to 0) +| utstring_len(s) | obtain the length of s as an unsigned integer +| utstring_body(s) | get `char*` to body of s (buffer is always null-terminated) +|=============================================================================== + +Notes +~~~~~ + +1. `utstring_new` and `utstring_free` are used to allocate a new string and free it, + while `utstring_init` and `utstring_done` can be used if the UT_string is already + allocated and just needs to be initialized or have its internal resources + freed. +2. `utstring_printf` is actually a function defined statically in `utstring.h` + rather than a macro. + +// vim: set nowrap syntax=asciidoc: + |