aboutsummaryrefslogtreecommitdiff
path: root/src/modules/systemlib/uthash/doc/userguide.txt
diff options
context:
space:
mode:
Diffstat (limited to 'src/modules/systemlib/uthash/doc/userguide.txt')
-rw-r--r--src/modules/systemlib/uthash/doc/userguide.txt1682
1 files changed, 1682 insertions, 0 deletions
diff --git a/src/modules/systemlib/uthash/doc/userguide.txt b/src/modules/systemlib/uthash/doc/userguide.txt
new file mode 100644
index 000000000..3e65a52fc
--- /dev/null
+++ b/src/modules/systemlib/uthash/doc/userguide.txt
@@ -0,0 +1,1682 @@
+uthash User Guide
+=================
+Troy D. Hanson <thanson@users.sourceforge.net>
+v1.9.6, April 2012
+
+include::sflogo.txt[]
+include::topnav.txt[]
+
+A hash in C
+-----------
+include::toc.txt[]
+
+This document is written for C programmers. Since you're reading this, chances
+are that you know a hash is used for looking up items using a key. In scripting
+languages like Perl, hashes are used all the time. In C, hashes don't exist in
+the language itself. This software provides a hash table for C structures.
+
+What can it do?
+~~~~~~~~~~~~~~~~~
+This software supports these operations on items in a hash table:
+
+1. add
+2. find
+3. delete
+4. count
+5. iterate
+6. sort
+7. select
+
+Is it fast?
+~~~~~~~~~~~
+Add, find and delete are normally constant-time operations. This is influenced
+by your key domain and the hash function.
+
+This hash aims to be minimalistic and efficient. It's around 900 lines of C.
+It inlines automatically because it's implemented as macros. It's fast as long
+as the hash function is suited to your keys. You can use the default hash
+function, or easily compare performance and choose from among several other
+<<hash_functions,built-in hash functions>>.
+
+Is it a library?
+~~~~~~~~~~~~~~~~
+No, it's just a single header file: `uthash.h`. All you need to do is copy
+the header file into your project, and:
+
+ #include "uthash.h"
+
+Since uthash is a header file only, there is no library code to link against.
+
+C/C++ and platforms
+~~~~~~~~~~~~~~~~~~~
+This software can be used in C and C++ programs. It has been tested on:
+
+ * Linux
+ * Mac OS X
+ * Windows using Visual Studio 2008 and 2010
+ * Solaris
+ * OpenBSD
+ * FreeBSD
+
+Test suite
+^^^^^^^^^^
+To run the test suite, enter the `tests` directory. Then,
+
+ * on Unix platforms, run `make`
+ * on Windows, run the "do_tests_win32.cmd" batch file. (You may edit the
+ batch file if your Visual Studio is installed in a non-standard location).
+
+BSD licensed
+~~~~~~~~~~~~
+This software is made available under the
+link:license.html[revised BSD license].
+It is free and open source.
+
+Obtaining uthash
+~~~~~~~~~~~~~~~~
+Please follow the link to download on the
+http://uthash.sourceforge.net[uthash website] at http://uthash.sourceforge.net.
+
+A number of platforms include uthash in their package repositories. For example,
+Debian/Ubuntu users may simply `aptitude install uthash-dev`.
+
+Getting help
+~~~~~~~~~~~~
+Feel free to mailto:tdh@tkhanson.net[email the author] at
+tdh@tkhanson.net.
+
+Resources
+~~~~~~~~~
+Users of uthash may wish to follow the news feed for information about new
+releases. Also, there are some extra bonus headers included with uthash.
+
+News::
+ The author has a news feed for http://troydhanson.wordpress.com/[software updates] image:img/rss.png[(RSS)].
+Extras included with uthash::
+ uthash ships with these "extras"-- independent headers similar to uthash.
+ First link:utlist.html[utlist.h] provides linked list macros for C structures.
+ Second, link:utarray.html[utarray.h] implements dynamic arrays using macros.
+ Third, link:utstring.html[utstring.h] implements a basic dynamic string.
+Other software::
+ Other open-source software by the author is listed at http://tkhanson.net.
+
+Who's using it?
+~~~~~~~~~~~~~~~
+Since releasing uthash in 2006, it has been downloaded thousands of times,
+incorporated into commercial software, academic research, and into other
+open-source software.
+
+Your structure
+--------------
+
+In uthash, a hash table is comprised of structures. Each structure represents a
+key-value association. One or more of the structure fields constitute the key.
+The structure pointer itself is the value.
+
+.Defining a structure that can be hashed
+----------------------------------------------------------------------
+#include "uthash.h"
+
+struct my_struct {
+ int id; /* key */
+ char name[10];
+ UT_hash_handle hh; /* makes this structure hashable */
+};
+----------------------------------------------------------------------
+
+Note that, in uthash, your structure will never be moved or copied into another
+location when you add it into a hash table. This means that you can keep other
+data structures that safely point to your structure-- regardless of whether you
+add or delete it from a hash table during your program's lifetime.
+
+The key
+~~~~~~~
+There are no restrictions on the data type or name of the key field. The key
+can also comprise multiple contiguous fields, having any names and data types.
+
+.Any data type... really?
+*****************************************************************************
+Yes, your key and structure can have any data type. Unlike function calls with
+fixed prototypes, uthash consists of macros-- whose arguments are untyped-- and
+thus able to work with any type of structure or key.
+*****************************************************************************
+
+Unique keys
+^^^^^^^^^^^
+As with any hash, every item must have a unique key. Your application must
+enforce key uniqueness. Before you add an item to the hash table, you must
+first know (if in doubt, check!) that the key is not already in use. You
+can check whether a key already exists in the hash table using `HASH_FIND`.
+
+The hash handle
+~~~~~~~~~~~~~~~
+The `UT_hash_handle` field must be present in your structure. It is used for
+the internal bookkeeping that makes the hash work. It does not require
+initialization. It can be named anything, but you can simplify matters by
+naming it `hh`. This allows you to use the easier "convenience" macros to add,
+find and delete items.
+
+A word about memory
+~~~~~~~~~~~~~~~~~~~
+Some have asked how uthash cleans up its internal memory. The answer is simple:
+'when you delete the final item' from a hash table, uthash releases all the
+internal memory associated with that hash table, and sets its pointer to NULL.
+
+Hash operations
+---------------
+
+This section introduces the uthash macros by example. For a more succinct
+listing, see <<Macro_reference,Macro Reference>>.
+
+.Convenience vs. general macros:
+*****************************************************************************
+The uthash macros fall into two categories. The 'convenience' macros can be used
+with integer, pointer or string keys (and require that you chose the conventional
+name `hh` for the `UT_hash_handle` field). The convenience macros take fewer
+arguments than the general macros, making their usage a bit simpler for these
+common types of keys.
+
+The 'general' macros can be used for any types of keys, or for multi-field keys,
+or when the `UT_hash_handle` has been named something other than `hh`. These
+macros take more arguments and offer greater flexibility in return. But if the
+convenience macros suit your needs, use them-- your code will be more readable.
+*****************************************************************************
+
+Declare the hash
+~~~~~~~~~~~~~~~~
+Your hash must be declared as a `NULL`-initialized pointer to your structure.
+
+ struct my_struct *users = NULL; /* important! initialize to NULL */
+
+Add item
+~~~~~~~~
+Allocate and initialize your structure as you see fit. The only aspect
+of this that matters to uthash is that your key must be initialized to
+a unique value. Then call `HASH_ADD`. (Here we use the convenience macro
+`HASH_ADD_INT`, which offers simplified usage for keys of type `int`).
+
+.Add an item to a hash
+----------------------------------------------------------------------
+void add_user(int user_id, char *name) {
+ struct my_struct *s;
+
+ s = malloc(sizeof(struct my_struct));
+ s->id = user_id;
+ strcpy(s->name, name);
+ HASH_ADD_INT( users, id, s ); /* id: name of key field */
+}
+----------------------------------------------------------------------
+
+The first parameter to `HASH_ADD_INT` is the hash table, and the
+second parameter is the 'name' of the key field. Here, this is `id`. The
+last parameter is a pointer to the structure being added.
+
+[[validc]]
+.Wait.. the field name is a parameter?
+*******************************************************************************
+If you find it strange that `id`, which is the 'name of a field' in the
+structure, can be passed as a parameter, welcome to the world of macros. Don't
+worry- the C preprocessor expands this to valid C code.
+*******************************************************************************
+
+Key must not be modified while in-use
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Once a structure has been added to the hash, do not change the value of its key.
+Instead, delete the item from the hash, change the key, and then re-add it.
+
+Passing the hash pointer into functions
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+In the example above `users` is a global variable, but what if the caller wanted
+to pass the hash pointer 'into' the `add_user` function? At first glance it would
+appear that you could simply pass `users` as an argument, but that won't work
+right.
+
+ /* bad */
+ void add_user(struct my_struct *users, int user_id, char *name) {
+ ...
+ HASH_ADD_INT( users, id, s );
+ }
+
+You really need to pass 'a pointer' to the hash pointer:
+
+ /* good */
+ void add_user(struct my_struct **users, int user_id, char *name) { ...
+ ...
+ HASH_ADD_INT( *users, id, s );
+ }
+
+Note that we dereferenced the pointer in the `HASH_ADD` also.
+
+The reason it's necessary to deal with a pointer to the hash pointer is simple:
+the hash macros modify it (in other words, they modify the 'pointer itself' not
+just what it points to).
+
+Find item
+~~~~~~~~~
+To look up a structure in a hash, you need its key. Then call `HASH_FIND`.
+(Here we use the convenience macro `HASH_FIND_INT` for keys of type `int`).
+
+.Find a structure using its key
+----------------------------------------------------------------------
+struct my_struct *find_user(int user_id) {
+ struct my_struct *s;
+
+ HASH_FIND_INT( users, &user_id, s ); /* s: output pointer */
+ return s;
+}
+----------------------------------------------------------------------
+
+Here, the hash table is `users`, and `&user_id` points to the key (an integer
+in this case). Last, `s` is the 'output' variable of `HASH_FIND_INT`. The
+final result is that `s` points to the structure with the given key, or
+is `NULL` if the key wasn't found in the hash.
+
+[NOTE]
+The middle argument is a 'pointer' to the key. You can't pass a literal key
+value to `HASH_FIND`. Instead assign the literal value to a variable, and pass
+a pointer to the variable.
+
+
+Delete item
+~~~~~~~~~~~
+To delete a structure from a hash, you must have a pointer to it. (If you only
+have the key, first do a `HASH_FIND` to get the structure pointer).
+
+.Delete an item from a hash
+----------------------------------------------------------------------
+void delete_user(struct my_struct *user) {
+ HASH_DEL( users, user); /* user: pointer to deletee */
+ free(user); /* optional; it's up to you! */
+}
+----------------------------------------------------------------------
+
+Here again, `users` is the hash table, and `user` is a pointer to the
+structure we want to remove from the hash.
+
+uthash never frees your structure
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Deleting a structure just removes it from the hash table-- it doesn't `free`
+it. The choice of when to free your structure is entirely up to you; uthash
+will never free your structure.
+
+Delete can change the pointer
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+The hash table pointer (which initially points to the first item added to the
+hash) can change in response to `HASH_DEL` (i.e. if you delete the first item
+in the hash table).
+
+Iterative deletion
+^^^^^^^^^^^^^^^^^^
+The `HASH_ITER` macro is a deletion-safe iteration construct which expands
+to a simple 'for' loop.
+
+.Delete all items from a hash
+----------------------------------------------------------------------
+void delete_all() {
+ struct my_struct *current_user, *tmp;
+
+ HASH_ITER(hh, users, current_user, tmp) {
+ HASH_DEL(users,current_user); /* delete; users advances to next */
+ free(current_user); /* optional- if you want to free */
+ }
+}
+----------------------------------------------------------------------
+
+All-at-once deletion
+^^^^^^^^^^^^^^^^^^^^
+If you only want to delete all the items, but not free them or do any
+per-element clean up, you can do this more efficiently in a single operation:
+
+ HASH_CLEAR(hh,users);
+
+Afterward, the list head (here, `users`) will be set to `NULL`.
+
+Count items
+~~~~~~~~~~~
+
+The number of items in the hash table can be obtained using `HASH_COUNT`:
+
+.Count of items in the hash table
+----------------------------------------------------------------------
+unsigned int num_users;
+num_users = HASH_COUNT(users);
+printf("there are %u users\n", num_users);
+----------------------------------------------------------------------
+
+Incidentally, this works even the list (`users`, here) is `NULL`, in
+which case the count is 0.
+
+Iterating and sorting
+~~~~~~~~~~~~~~~~~~~~~
+
+You can loop over the items in the hash by starting from the beginning and
+following the `hh.next` pointer.
+
+.Iterating over all the items in a hash
+----------------------------------------------------------------------
+void print_users() {
+ struct my_struct *s;
+
+ for(s=users; s != NULL; s=s->hh.next) {
+ printf("user id %d: name %s\n", s->id, s->name);
+ }
+}
+----------------------------------------------------------------------
+
+There is also an `hh.prev` pointer you could use to iterate backwards through
+the hash, starting from any known item.
+
+[[deletesafe]]
+Deletion-safe iteration
+^^^^^^^^^^^^^^^^^^^^^^^
+In the example above, it would not be safe to delete and free `s` in the body
+of the 'for' loop, (because `s` is derefenced each time the loop iterates).
+This is easy to rewrite correctly (by copying the `s->hh.next` pointer to a
+temporary variable 'before' freeing `s`), but it comes up often enough that a
+deletion-safe iteration macro, `HASH_ITER`, is included. It expands to a
+`for`-loop header. Here is how it could be used to rewrite the last example:
+
+ struct my_struct *s, *tmp;
+
+ HASH_ITER(hh, users, s, tmp) {
+ printf("user id %d: name %s\n", s->id, s->name);
+ /* ... it is safe to delete and free s here */
+ }
+
+.A hash is also a doubly-linked list.
+*******************************************************************************
+Iterating backward and forward through the items in the hash is possible
+because of the `hh.prev` and `hh.next` fields. All the items in the hash can
+be reached by repeatedly following these pointers, thus the hash is also a
+doubly-linked list.
+*******************************************************************************
+
+If you're using uthash in a C++ program, you need an extra cast on the `for`
+iterator, e.g., `s=(struct my_struct*)s->hh.next`.
+
+Sorted iteration
+^^^^^^^^^^^^^^^^
+The items in the hash are, by default, traversed in the order they were added
+("insertion order") when you follow the `hh.next` pointer. But you can sort
+the items into a new order using `HASH_SORT`. E.g.,
+
+ HASH_SORT( users, name_sort );
+
+The second argument is a pointer to a comparison function. It must accept two
+arguments which are pointers to two items to compare. Its return value should
+be less than zero, zero, or greater than zero, if the first item sorts before,
+equal to, or after the second item, respectively. (Just like `strcmp`).
+
+.Sorting the items in the hash
+----------------------------------------------------------------------
+int name_sort(struct my_struct *a, struct my_struct *b) {
+ return strcmp(a->name,b->name);
+}
+
+int id_sort(struct my_struct *a, struct my_struct *b) {
+ return (a->id - b->id);
+}
+
+void sort_by_name() {
+ HASH_SORT(users, name_sort);
+}
+
+void sort_by_id() {
+ HASH_SORT(users, id_sort);
+}
+----------------------------------------------------------------------
+
+When the items in the hash are sorted, the first item may change position. In
+the example above, `users` may point to a different structure after calling
+`HASH_SORT`.
+
+A complete example
+~~~~~~~~~~~~~~~~~~
+
+We'll repeat all the code and embellish it with a `main()` function to form a
+working example.
+
+If this code was placed in a file called `example.c` in the same directory as
+`uthash.h`, it could be compiled and run like this:
+
+ cc -o example example.c
+ ./example
+
+Follow the prompts to try the program.
+
+.A complete program
+----------------------------------------------------------------------
+#include <stdio.h> /* gets */
+#include <stdlib.h> /* atoi, malloc */
+#include <string.h> /* strcpy */
+#include "uthash.h"
+
+struct my_struct {
+ int id; /* key */
+ char name[10];
+ UT_hash_handle hh; /* makes this structure hashable */
+};
+
+struct my_struct *users = NULL;
+
+void add_user(int user_id, char *name) {
+ struct my_struct *s;
+
+ s = (struct my_struct*)malloc(sizeof(struct my_struct));
+ s->id = user_id;
+ strcpy(s->name, name);
+ HASH_ADD_INT( users, id, s ); /* id: name of key field */
+}
+
+struct my_struct *find_user(int user_id) {
+ struct my_struct *s;
+
+ HASH_FIND_INT( users, &user_id, s ); /* s: output pointer */
+ return s;
+}
+
+void delete_user(struct my_struct *user) {
+ HASH_DEL( users, user); /* user: pointer to deletee */
+ free(user);
+}
+
+void delete_all() {
+ struct my_struct *current_user, *tmp;
+
+ HASH_ITER(hh, users, current_user, tmp) {
+ HASH_DEL(users,current_user); /* delete it (users advances to next) */
+ free(current_user); /* free it */
+ }
+}
+
+void print_users() {
+ struct my_struct *s;
+
+ for(s=users; s != NULL; s=(struct my_struct*)(s->hh.next)) {
+ printf("user id %d: name %s\n", s->id, s->name);
+ }
+}
+
+int name_sort(struct my_struct *a, struct my_struct *b) {
+ return strcmp(a->name,b->name);
+}
+
+int id_sort(struct my_struct *a, struct my_struct *b) {
+ return (a->id - b->id);
+}
+
+void sort_by_name() {
+ HASH_SORT(users, name_sort);
+}
+
+void sort_by_id() {
+ HASH_SORT(users, id_sort);
+}
+
+int main(int argc, char *argv[]) {
+ char in[10];
+ int id=1, running=1;
+ struct my_struct *s;
+ unsigned num_users;
+
+ while (running) {
+ printf("1. add user\n");
+ printf("2. find user\n");
+ printf("3. delete user\n");
+ printf("4. delete all users\n");
+ printf("5. sort items by name\n");
+ printf("6. sort items by id\n");
+ printf("7. print users\n");
+ printf("8. count users\n");
+ printf("9. quit\n");
+ gets(in);
+ switch(atoi(in)) {
+ case 1:
+ printf("name?\n");
+ add_user(id++, gets(in));
+ break;
+ case 2:
+ printf("id?\n");
+ s = find_user(atoi(gets(in)));
+ printf("user: %s\n", s ? s->name : "unknown");
+ break;
+ case 3:
+ printf("id?\n");
+ s = find_user(atoi(gets(in)));
+ if (s) delete_user(s);
+ else printf("id unknown\n");
+ break;
+ case 4:
+ delete_all();
+ break;
+ case 5:
+ sort_by_name();
+ break;
+ case 6:
+ sort_by_id();
+ break;
+ case 7:
+ print_users();
+ break;
+ case 8:
+ num_users=HASH_COUNT(users);
+ printf("there are %u users\n", num_users);
+ break;
+ case 9:
+ running=0;
+ break;
+ }
+ }
+
+ delete_all(); /* free any structures */
+ return 0;
+}
+----------------------------------------------------------------------
+
+This program is included in the distribution in `tests/example.c`. You can run
+`make example` in that directory to compile it easily.
+
+Standard key types
+------------------
+This section goes into specifics of how to work with different kinds of keys.
+You can use nearly any type of key-- integers, strings, pointers, structures, etc.
+
+[NOTE]
+.A note about float
+================================================================================
+You can use floating point keys. This comes with the same caveats as with any
+program that tests floating point equality. In other words, even the tiniest
+difference in two floating point numbers makes them distinct keys.
+================================================================================
+
+Integer keys
+~~~~~~~~~~~~
+The preceding examples demonstrated use of integer keys. To recap, use the
+convenience macros `HASH_ADD_INT` and `HASH_FIND_INT` for structures with
+integer keys. (The other operations such as `HASH_DELETE` and `HASH_SORT` are
+the same for all types of keys).
+
+String keys
+~~~~~~~~~~~
+If your structure has a string key, the operations to use depend on whether your
+structure 'points to' the key (`char *`) or the string resides `within` the
+structure (`char a[10]`). *This distinction is important*. As we'll see below,
+you need to use `HASH_ADD_KEYPTR` when your structure 'points' to a key (that is,
+the key itself is 'outside' of the structure); in contrast, use `HASH_ADD_STR`
+for a string key that is contained *within* your structure.
+
+[NOTE]
+.char[ ] vs. char*
+================================================================================
+The string is 'within' the structure in the first example below-- `name` is a
+`char[10]` field. In the second example, the key is 'outside' of the
+structure-- `name` is a `char *`. So the first example uses `HASH_ADD_STR` but
+the second example uses `HASH_ADD_KEYPTR`. For information on this macro, see
+the <<Macro_reference,Macro reference>>.
+================================================================================
+
+String 'within' structure
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.A string-keyed hash (string within structure)
+----------------------------------------------------------------------
+#include <string.h> /* strcpy */
+#include <stdlib.h> /* malloc */
+#include <stdio.h> /* printf */
+#include "uthash.h"
+
+struct my_struct {
+ char name[10]; /* key (string is WITHIN the structure) */
+ int id;
+ UT_hash_handle hh; /* makes this structure hashable */
+};
+
+
+int main(int argc, char *argv[]) {
+ const char **n, *names[] = { "joe", "bob", "betty", NULL };
+ struct my_struct *s, *tmp, *users = NULL;
+ int i=0;
+
+ for (n = names; *n != NULL; n++) {
+ s = (struct my_struct*)malloc(sizeof(struct my_struct));
+ strncpy(s->name, *n,10);
+ s->id = i++;
+ HASH_ADD_STR( users, name, s );
+ }
+
+ HASH_FIND_STR( users, "betty", s);
+ if (s) printf("betty's id is %d\n", s->id);
+
+ /* free the hash table contents */
+ HASH_ITER(hh, users, s, tmp) {
+ HASH_DEL(users, s);
+ free(s);
+ }
+ return 0;
+}
+----------------------------------------------------------------------
+
+This example is included in the distribution in `tests/test15.c`. It prints:
+
+ betty's id is 2
+
+String 'pointer' in structure
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Now, here is the same example but using a `char *` key instead of `char [ ]`:
+
+.A string-keyed hash (structure points to string)
+----------------------------------------------------------------------
+#include <string.h> /* strcpy */
+#include <stdlib.h> /* malloc */
+#include <stdio.h> /* printf */
+#include "uthash.h"
+
+struct my_struct {
+ const char *name; /* key */
+ int id;
+ UT_hash_handle hh; /* makes this structure hashable */
+};
+
+
+int main(int argc, char *argv[]) {
+ const char **n, *names[] = { "joe", "bob", "betty", NULL };
+ struct my_struct *s, *tmp, *users = NULL;
+ int i=0;
+
+ for (n = names; *n != NULL; n++) {
+ s = (struct my_struct*)malloc(sizeof(struct my_struct));
+ s->name = *n;
+ s->id = i++;
+ HASH_ADD_KEYPTR( hh, users, s->name, strlen(s->name), s );
+ }
+
+ HASH_FIND_STR( users, "betty", s);
+ if (s) printf("betty's id is %d\n", s->id);
+
+ /* free the hash table contents */
+ HASH_ITER(hh, users, s, tmp) {
+ HASH_DEL(users, s);
+ free(s);
+ }
+ return 0;
+}
+----------------------------------------------------------------------
+
+This example is included in `tests/test40.c`.
+
+Pointer keys
+~~~~~~~~~~~~
+Your key can be a pointer. To be very clear, this means the 'pointer itself'
+can be the key (in contrast, if the thing 'pointed to' is the key, this is a
+different use case handled by `HASH_ADD_KEYPTR`).
+
+Here is a simple example where a structure has a pointer member, called `key`.
+
+.A pointer key
+----------------------------------------------------------------------
+#include <stdio.h>
+#include <stdlib.h>
+#include "uthash.h"
+
+typedef struct {
+ void *key;
+ int i;
+ UT_hash_handle hh;
+} el_t;
+
+el_t *hash = NULL;
+char *someaddr = NULL;
+
+int main() {
+ el_t *d;
+ el_t *e = (el_t*)malloc(sizeof(el_t));
+ if (!e) return -1;
+ e->key = (void*)someaddr;
+ e->i = 1;
+ HASH_ADD_PTR(hash,key,e);
+ HASH_FIND_PTR(hash, &someaddr, d);
+ if (d) printf("found\n");
+
+ /* release memory */
+ HASH_DEL(hash,e);
+ free(e);
+ return 0;
+}
+----------------------------------------------------------------------
+
+This example is included in `tests/test57.c`. Note that the end of the program
+deletes the element out of the hash, (and since no more elements remain in the
+hash), uthash releases its internal memory.
+
+Structure keys
+~~~~~~~~~~~~~~
+Your key field can have any data type. To uthash, it is just a sequence of
+bytes. Therefore, even a nested structure can be used as a key. We'll use the
+general macros `HASH_ADD` and `HASH_FIND` to demonstrate.
+
+NOTE: Structures contain padding (wasted internal space used to fulfill
+alignment requirements for the members of the structure). These padding bytes
+'must be zeroed' before adding an item to the hash or looking up an item.
+Therefore always zero the whole structure before setting the members of
+interest. The example below does this-- see the two calls to `memset`.
+
+.A key which is a structure
+----------------------------------------------------------------------
+#include <stdlib.h>
+#include <stdio.h>
+#include "uthash.h"
+
+typedef struct {
+ char a;
+ int b;
+} record_key_t;
+
+typedef struct {
+ record_key_t key;
+ /* ... other data ... */
+ UT_hash_handle hh;
+} record_t;
+
+int main(int argc, char *argv[]) {
+ record_t l, *p, *r, *tmp, *records = NULL;
+
+ r = (record_t*)malloc( sizeof(record_t) );
+ memset(r, 0, sizeof(record_t));
+ r->key.a = 'a';
+ r->key.b = 1;
+ HASH_ADD(hh, records, key, sizeof(record_key_t), r);
+
+ memset(&l, 0, sizeof(record_t));
+ l.key.a = 'a';
+ l.key.b = 1;
+ HASH_FIND(hh, records, &l.key, sizeof(record_key_t), p);
+
+ if (p) printf("found %c %d\n", p->key.a, p->key.b);
+
+ HASH_ITER(hh, records, p, tmp) {
+ HASH_DEL(records, p);
+ free(p);
+ }
+ return 0;
+}
+
+----------------------------------------------------------------------
+
+This usage is nearly the same as use of a compound key explained below.
+
+Note that the general macros require the name of the `UT_hash_handle` to be
+passed as the first argument (here, this is `hh`). The general macros are
+documented in <<Macro_reference,Macro Reference>>.
+
+Advanced Topics
+---------------
+
+Compound keys
+~~~~~~~~~~~~~
+Your key can even comprise multiple contiguous fields.
+
+.A multi-field key
+----------------------------------------------------------------------
+#include <stdlib.h> /* malloc */
+#include <stddef.h> /* offsetof */
+#include <stdio.h> /* printf */
+#include <string.h> /* memset */
+#include "uthash.h"
+
+#define UTF32 1
+
+typedef struct {
+ UT_hash_handle hh;
+ int len;
+ char encoding; /* these two fields */
+ int text[]; /* comprise the key */
+} msg_t;
+
+typedef struct {
+ char encoding;
+ int text[];
+} lookup_key_t;
+
+int main(int argc, char *argv[]) {
+ unsigned keylen;
+ msg_t *msg, *tmp, *msgs = NULL;
+ lookup_key_t *lookup_key;
+
+ int beijing[] = {0x5317, 0x4eac}; /* UTF-32LE for 北京 */
+
+ /* allocate and initialize our structure */
+ msg = (msg_t*)malloc( sizeof(msg_t) + sizeof(beijing) );
+ memset(msg, 0, sizeof(msg_t)+sizeof(beijing)); /* zero fill */
+ msg->len = sizeof(beijing);
+ msg->encoding = UTF32;
+ memcpy(msg->text, beijing, sizeof(beijing));
+
+ /* calculate the key length including padding, using formula */
+ keylen = offsetof(msg_t, text) /* offset of last key field */
+ + sizeof(beijing) /* size of last key field */
+ - offsetof(msg_t, encoding); /* offset of first key field */
+
+ /* add our structure to the hash table */
+ HASH_ADD( hh, msgs, encoding, keylen, msg);
+
+ /* look it up to prove that it worked :-) */
+ msg=NULL;
+
+ lookup_key = (lookup_key_t*)malloc(sizeof(*lookup_key) + sizeof(beijing));
+ memset(lookup_key, 0, sizeof(*lookup_key) + sizeof(beijing));
+ lookup_key->encoding = UTF32;
+ memcpy(lookup_key->text, beijing, sizeof(beijing));
+ HASH_FIND( hh, msgs, &lookup_key->encoding, keylen, msg );
+ if (msg) printf("found \n");
+ free(lookup_key);
+
+ HASH_ITER(hh, msgs, msg, tmp) {
+ HASH_DEL(msgs, msg);
+ free(msg);
+ }
+ return 0;
+}
+----------------------------------------------------------------------
+
+This example is included in the distribution in `tests/test22.c`.
+
+If you use multi-field keys, recognize that the compiler pads adjacent fields
+(by inserting unused space between them) in order to fulfill the alignment
+requirement of each field. For example a structure containing a `char` followed
+by an `int` will normally have 3 "wasted" bytes of padding after the char, in
+order to make the `int` field start on a multiple-of-4 address (4 is the length
+of the int).
+
+.Calculating the length of a multi-field key:
+*******************************************************************************
+To determine the key length when using a multi-field key, you must include any
+intervening structure padding the compiler adds for alignment purposes.
+
+An easy way to calculate the key length is to use the `offsetof` macro from
+`<stddef.h>`. The formula is:
+
+ key length = offsetof(last_key_field)
+ + sizeof(last_key_field)
+ - offsetof(first_key_field)
+
+In the example above, the `keylen` variable is set using this formula.
+*******************************************************************************
+
+When dealing with a multi-field key, you must zero-fill your structure before
+`HASH_ADD`'ing it to a hash table, or using its fields in a `HASH_FIND` key.
+
+In the previous example, `memset` is used to initialize the structure by
+zero-filling it. This zeroes out any padding between the key fields. If we
+didn't zero-fill the structure, this padding would contain random values. The
+random values would lead to `HASH_FIND` failures; as two "identical" keys will
+appear to mismatch if there are any differences within their padding.
+
+[[multilevel]]
+Multi-level hash tables
+~~~~~~~~~~~~~~~~~~~~~~~
+A multi-level hash table arises when each element of a hash table contains its
+own secondary hash table. There can be any number of levels. In a scripting
+language you might see:
+
+ $items{bob}{age}=37
+
+The C program below builds this example in uthash: the hash table is called
+`items`. It contains one element (`bob`) whose own hash table contains one
+element (`age`) with value 37. No special functions are necessary to build
+a multi-level hash table.
+
+While this example represents both levels (`bob` and `age`) using the same
+structure, it would also be fine to use two different structure definitions.
+It would also be fine if there were three or more levels instead of two.
+
+.Multi-level hash table
+----------------------------------------------------------------------
+#include <stdio.h>
+#include <string.h>
+#include <stdlib.h>
+#include "uthash.h"
+
+/* hash of hashes */
+typedef struct item {
+ char name[10];
+ struct item *sub;
+ int val;
+ UT_hash_handle hh;
+} item_t;
+
+item_t *items=NULL;
+
+int main(int argc, char *argvp[]) {
+ item_t *item1, *item2, *tmp1, *tmp2;
+
+ /* make initial element */
+ item_t *i = malloc(sizeof(*i));
+ strcpy(i->name, "bob");
+ i->sub = NULL;
+ i->val = 0;
+ HASH_ADD_STR(items, name, i);
+
+ /* add a sub hash table off this element */
+ item_t *s = malloc(sizeof(*s));
+ strcpy(s->name, "age");
+ s->sub = NULL;
+ s->val = 37;
+ HASH_ADD_STR(i->sub, name, s);
+
+ /* iterate over hash elements */
+ HASH_ITER(hh, items, item1, tmp1) {
+ HASH_ITER(hh, item1->sub, item2, tmp2) {
+ printf("$items{%s}{%s} = %d\n", item1->name, item2->name, item2->val);
+ }
+ }
+
+ /* clean up both hash tables */
+ HASH_ITER(hh, items, item1, tmp1) {
+ HASH_ITER(hh, item1->sub, item2, tmp2) {
+ HASH_DEL(item1->sub, item2);
+ free(item2);
+ }
+ HASH_DEL(items, item1);
+ free(item1);
+ }
+
+ return 0;
+}
+----------------------------------------------------------------------
+The example above is included in `tests/test59.c`.
+
+[[multihash]]
+Items in several hash tables
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+A structure can be added to more than one hash table. A few reasons you might do
+this include:
+
+- each hash table may use an alternative key;
+- each hash table may have its own sort order;
+- or you might simply use multiple hash tables for grouping purposes. E.g.,
+ you could have users in an `admin_users` and a `users` hash table.
+
+Your structure needs to have a `UT_hash_handle` field for each hash table to
+which it might be added. You can name them anything. E.g.,
+
+ UT_hash_handle hh1, hh2;
+
+Items with alternative keys
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+You might create a hash table keyed on an ID field, and another hash table keyed
+on username (if usernames are unique). You can add the same user structure to
+both hash tables (without duplication of the structure), allowing lookup of a
+user structure by their name or ID. The way to achieve this is to have a
+separate `UT_hash_handle` for each hash to which the structure may be added.
+
+.A structure with two alternative keys
+----------------------------------------------------------------------
+struct my_struct {
+ int id; /* usual key */
+ char username[10]; /* alternative key */
+ UT_hash_handle hh1; /* handle for first hash table */
+ UT_hash_handle hh2; /* handle for second hash table */
+};
+----------------------------------------------------------------------
+
+In the example above, the structure can now be added to two separate hash
+tables. In one hash, `id` is its key, while in the other hash, `username` is
+its key. (There is no requirement that the two hashes have different key
+fields. They could both use the same key, such as `id`).
+
+Notice the structure has two hash handles (`hh1` and `hh2`). In the code
+below, notice that each hash handle is used exclusively with a particular hash
+table. (`hh1` is always used with the `users_by_id` hash, while `hh2` is
+always used with the `users_by_name` hash table).
+
+.Two keys on a structure
+----------------------------------------------------------------------
+ struct my_struct *users_by_id = NULL, *users_by_name = NULL, *s;
+ int i;
+ char *name;
+
+ s = malloc(sizeof(struct my_struct));
+ s->id = 1;
+ strcpy(s->username, "thanson");
+
+ /* add the structure to both hash tables */
+ HASH_ADD(hh1, users_by_id, id, sizeof(int), s);
+ HASH_ADD(hh2, users_by_name, username, strlen(s->username), s);
+
+ /* lookup user by ID in the "users_by_id" hash table */
+ i=1;
+ HASH_FIND(hh1, users_by_id, &i, sizeof(int), s);
+ if (s) printf("found id %d: %s\n", i, s->username);
+
+ /* lookup user by username in the "users_by_name" hash table */
+ name = "thanson";
+ HASH_FIND(hh2, users_by_name, name, strlen(name), s);
+ if (s) printf("found user %s: %d\n", name, s->id);
+----------------------------------------------------------------------
+
+
+Several sort orders
+~~~~~~~~~~~~~~~~~~~
+It comes as no suprise that two hash tables can have different sort orders, but
+this fact can also be used advantageously to sort the 'same items' in several
+ways. This is based on the ability to store a structure in several hash tables.
+
+Extending the previous example, suppose we have many users. We have added each
+user structure to the `users_by_id` hash table and the `users_by_name` hash table.
+(To reiterate, this is done without the need to have two copies of each structure).
+Now we can define two sort functions, then use `HASH_SRT`.
+
+ int sort_by_id(struct my_struct *a, struct my_struct *b) {
+ if (a->id == b->id) return 0;
+ return (a->id < b->id) ? -1 : 1;
+ }
+
+ int sort_by_name(struct my_struct *a, struct my_struct *b) {
+ return strcmp(a->username,b->username);
+ }
+
+ HASH_SRT(hh1, users_by_id, sort_by_id);
+ HASH_SRT(hh2, users_by_name, sort_by_name);
+
+Now iterating over the items in `users_by_id` will traverse them in id-order
+while, naturally, iterating over `users_by_name` will traverse them in
+name-order. The items are fully forward-and-backward linked in each order.
+So even for one set of users, we might store them in two hash tables to provide
+easy iteration in two different sort orders.
+
+Bloom filter (faster misses)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Programs that generate a fair miss rate (`HASH_FIND` that result in `NULL`) may
+benefit from the built-in Bloom filter support. This is disabled by default,
+because programs that generate only hits would incur a slight penalty from it.
+Also, programs that do deletes should not use the Bloom filter. While the
+program would operate correctly, deletes diminish the benefit of the filter.
+To enable the Bloom filter, simply compile with `-DHASH_BLOOM=n` like:
+
+ -DHASH_BLOOM=27
+
+where the number can be any value up to 32 which determines the amount of memory
+used by the filter, as shown below. Using more memory makes the filter more
+accurate and has the potential to speed up your program by making misses bail
+out faster.
+
+.Bloom filter sizes for selected values of n
+[width="50%",cols="10m,30",grid="none",options="header"]
+|=====================================================================
+| n | Bloom filter size (per hash table)
+| 16 | 8 kilobytes
+| 20 | 128 kilobytes
+| 24 | 2 megabytes
+| 28 | 32 megabytes
+| 32 | 512 megabytes
+|=====================================================================
+
+Bloom filters are only a performance feature; they do not change the results of
+hash operations in any way. The only way to gauge whether or not a Bloom filter
+is right for your program is to test it. Reasonable values for the size of the
+Bloom filter are 16-32 bits.
+
+Select
+~~~~~~
+An experimental 'select' operation is provided that inserts those items from a
+source hash that satisfy a given condition into a destination hash. This
+insertion is done with somewhat more efficiency than if this were using
+`HASH_ADD`, namely because the hash function is not recalculated for keys of the
+selected items. This operation does not remove any items from the source hash.
+Rather the selected items obtain dual presence in both hashes. The destination
+hash may already have items in it; the selected items are added to it. In order
+for a structure to be usable with `HASH_SELECT`, it must have two or more hash
+handles. (As described <<multihash,here>>, a structure can exist in many
+hash tables at the same time; it must have a separate hash handle for each one).
+
+ user_t *users=NULL, *admins=NULL; /* two hash tables */
+
+ typedef struct {
+ int id;
+ UT_hash_handle hh; /* handle for users hash */
+ UT_hash_handle ah; /* handle for admins hash */
+ } user_t;
+
+Now suppose we have added some users, and want to select just the administrator
+users who have id's less than 1024.
+
+ #define is_admin(x) (((user_t*)x)->id < 1024)
+ HASH_SELECT(ah,admins,hh,users,is_admin);
+
+The first two parameters are the 'destination' hash handle and hash table, the
+second two parameters are the 'source' hash handle and hash table, and the last
+parameter is the 'select condition'. Here we used a macro `is_admin()` but we
+could just as well have used a function.
+
+ int is_admin(void *userv) {
+ user_t *user = (user_t*)userv;
+ return (user->id < 1024) ? 1 : 0;
+ }
+
+If the select condition always evaluates to true, this operation is
+essentially a 'merge' of the source hash into the destination hash. Of course,
+the source hash remains unchanged under any use of `HASH_SELECT`. It only adds
+items to the destination hash selectively.
+
+The two hash handles must differ. An example of using `HASH_SELECT` is included
+in `tests/test36.c`.
+
+
+[[hash_functions]]
+Built-in hash functions
+~~~~~~~~~~~~~~~~~~~~~~~
+Internally, a hash function transforms a key into a bucket number. You don't
+have to take any action to use the default hash function, currently Jenkin's.
+
+Some programs may benefit from using another of the built-in hash functions.
+There is a simple analysis utility included with uthash to help you determine
+if another hash function will give you better performance.
+
+You can use a different hash function by compiling your program with
+`-DHASH_FUNCTION=HASH_xyz` where `xyz` is one of the symbolic names listed
+below. E.g.,
+
+ cc -DHASH_FUNCTION=HASH_BER -o program program.c
+
+.Built-in hash functions
+[width="50%",cols="^5m,20",grid="none",options="header"]
+|===============================================================================
+|Symbol | Name
+|JEN | Jenkins (default)
+|BER | Bernstein
+|SAX | Shift-Add-Xor
+|OAT | One-at-a-time
+|FNV | Fowler/Noll/Vo
+|SFH | Paul Hsieh
+|MUR | MurmurHash v3 (see note)
+|===============================================================================
+
+[NOTE]
+.MurmurHash
+================================================================================
+A special symbol must be defined if you intend to use MurmurHash. To use it, add
+`-DHASH_USING_NO_STRICT_ALIASING` to your `CFLAGS`. And, if you are using
+the gcc compiler with optimization, add `-fno-strict-aliasing` to your `CFLAGS`.
+================================================================================
+
+Which hash function is best?
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+You can easily determine the best hash function for your key domain. To do so,
+you'll need to run your program once in a data-collection pass, and then run
+the collected data through an included analysis utility.
+
+First you must build the analysis utility. From the top-level directory,
+
+ cd tests/
+ make
+
+We'll use `test14.c` to demonstrate the data-collection and analysis steps
+(here using `sh` syntax to redirect file descriptor 3 to a file):
+
+.Using keystats
+--------------------------------------------------------------------------------
+% cc -DHASH_EMIT_KEYS=3 -I../src -o test14 test14.c
+% ./test14 3>test14.keys
+% ./keystats test14.keys
+fcn ideal% #items #buckets dup% fl add_usec find_usec del-all usec
+--- ------ ---------- ---------- ----- -- ---------- ---------- ------------
+SFH 91.6% 1219 256 0% ok 92 131 25
+FNV 90.3% 1219 512 0% ok 107 97 31
+SAX 88.7% 1219 512 0% ok 111 109 32
+OAT 87.2% 1219 256 0% ok 99 138 26
+JEN 86.7% 1219 256 0% ok 87 130 27
+BER 86.2% 1219 256 0% ok 121 129 27
+--------------------------------------------------------------------------------
+
+[NOTE]
+The number 3 in `-DHASH_EMIT_KEYS=3` is a file descriptor. Any file descriptor
+that your program doesn't use for its own purposes can be used instead of 3.
+The data-collection mode enabled by `-DHASH_EMIT_KEYS=x` should not be used in
+production code.
+
+Usually, you should just pick the first hash function that is listed. Here, this
+is `SFH`. This is the function that provides the most even distribution for
+your keys. If several have the same `ideal%`, then choose the fastest one
+according to the `find_usec` column.
+
+keystats column reference
+^^^^^^^^^^^^^^^^^^^^^^^^^
+fcn::
+ symbolic name of hash function
+ideal%::
+ The percentage of items in the hash table which can be looked up within an
+ ideal number of steps. (Further explained below).
+#items::
+ the number of keys that were read in from the emitted key file
+#buckets::
+ the number of buckets in the hash after all the keys were added
+dup%::
+ the percent of duplicate keys encountered in the emitted key file.
+ Duplicates keys are filtered out to maintain key uniqueness. (Duplicates
+ are normal. For example, if the application adds an item to a hash,
+ deletes it, then re-adds it, the key is written twice to the emitted file.)
+flags::
+ this is either `ok`, or `nx` (noexpand) if the expansion inhibited flag is
+ set, described in <<expansion,Expansion internals>>. It is not recommended
+ to use a hash function that has the `noexpand` flag set.
+add_usec::
+ the clock time in microseconds required to add all the keys to a hash
+find_usec::
+ the clock time in microseconds required to look up every key in the hash
+del-all usec::
+ the clock time in microseconds required to delete every item in the hash
+
+[[ideal]]
+ideal%
+^^^^^^
+
+.What is ideal%?
+*****************************************************************************
+The 'n' items in a hash are distributed into 'k' buckets. Ideally each bucket
+would contain an equal share '(n/k)' of the items. In other words, the maximum
+linear position of any item in a bucket chain would be 'n/k' if every bucket is
+equally used. If some buckets are overused and others are underused, the
+overused buckets will contain items whose linear position surpasses 'n/k'.
+Such items are considered non-ideal.
+
+As you might guess, `ideal%` is the percentage of ideal items in the hash. These
+items have favorable linear positions in their bucket chains. As `ideal%`
+approaches 100%, the hash table approaches constant-time lookup performance.
+*****************************************************************************
+
+[[hashscan]]
+hashscan
+~~~~~~~~
+NOTE: This utility is only available on Linux, and on FreeBSD (8.1 and up).
+
+A utility called `hashscan` is included in the `tests/` directory. It
+is built automatically when you run `make` in that directory. This tool
+examines a running process and reports on the uthash tables that it finds in
+that program's memory. It can also save the keys from each table in a format
+that can be fed into `keystats`.
+
+Here is an example of using `hashscan`. First ensure that it is built:
+
+ cd tests/
+ make
+
+Since `hashscan` needs a running program to inspect, we'll start up a simple
+program that makes a hash table and then sleeps as our test subject:
+
+ ./test_sleep &
+ pid: 9711
+
+Now that we have a test program, let's run `hashscan` on it:
+
+ ./hashscan 9711
+ Address ideal items buckets mc fl bloom/sat fcn keys saved to
+ ------------------ ----- -------- -------- -- -- --------- --- -------------
+ 0x862e038 81% 10000 4096 11 ok 16 14% JEN
+
+If we wanted to copy out all its keys for external analysis using `keystats`,
+add the `-k` flag:
+
+ ./hashscan -k 9711
+ Address ideal items buckets mc fl bloom/sat fcn keys saved to
+ ------------------ ----- -------- -------- -- -- --------- --- -------------
+ 0x862e038 81% 10000 4096 11 ok 16 14% JEN /tmp/9711-0.key
+
+Now we could run `./keystats /tmp/9711-0.key` to analyze which hash function
+has the best characteristics on this set of keys.
+
+hashscan column reference
+^^^^^^^^^^^^^^^^^^^^^^^^^
+Address::
+ virtual address of the hash table
+ideal::
+ The percentage of items in the table which can be looked up within an ideal
+ number of steps. See <<ideal>> in the `keystats` section.
+items::
+ number of items in the hash table
+buckets::
+ number of buckets in the hash table
+mc::
+ the maximum chain length found in the hash table (uthash usually tries to
+ keep fewer than 10 items in each bucket, or in some cases a multiple of 10)
+fl::
+ flags (either `ok`, or `NX` if the expansion-inhibited flag is set)
+bloom/sat::
+ if the hash table uses a Bloom filter, this is the size (as a power of two)
+ of the filter (e.g. 16 means the filter is 2^16 bits in size). The second
+ number is the "saturation" of the bits expressed as a percentage. The lower
+ the percentage, the more potential benefit to identify cache misses quickly.
+fcn::
+ symbolic name of hash function
+keys saved to::
+ file to which keys were saved, if any
+
+.How hashscan works
+*****************************************************************************
+When hashscan runs, it attaches itself to the target process, which suspends
+the target process momentarily. During this brief suspension, it scans the
+target's virtual memory for the signature of a uthash hash table. It then
+checks if a valid hash table structure accompanies the signature and reports
+what it finds. When it detaches, the target process resumes running normally.
+The hashscan is performed "read-only"-- the target process is not modified.
+Since hashscan is analyzing a momentary snapshot of a running process, it may
+return different results from one run to another.
+*****************************************************************************
+
+[[expansion]]
+Expansion internals
+~~~~~~~~~~~~~~~~~~~
+Internally this hash manages the number of buckets, with the goal of having
+enough buckets so that each one contains only a small number of items.
+
+.Why does the number of buckets matter?
+********************************************************************************
+When looking up an item by its key, this hash scans linearly through the items
+in the appropriate bucket. In order for the linear scan to run in constant
+time, the number of items in each bucket must be bounded. This is accomplished
+by increasing the number of buckets as needed.
+********************************************************************************
+
+Normal expansion
+^^^^^^^^^^^^^^^^
+This hash attempts to keep fewer than 10 items in each bucket. When an item is
+added that would cause a bucket to exceed this number, the number of buckets in
+the hash is doubled and the items are redistributed into the new buckets. In an
+ideal world, each bucket will then contain half as many items as it did before.
+
+Bucket expansion occurs automatically and invisibly as needed. There is
+no need for the application to know when it occurs.
+
+Per-bucket expansion threshold
+++++++++++++++++++++++++++++++
+Normally all buckets share the same threshold (10 items) at which point bucket
+expansion is triggered. During the process of bucket expansion, uthash can
+adjust this expansion-trigger threshold on a per-bucket basis if it sees that
+certain buckets are over-utilized.
+
+When this threshold is adjusted, it goes from 10 to a multiple of 10 (for that
+particular bucket). The multiple is based on how many times greater the actual
+chain length is than the ideal length. It is a practical measure to reduce
+excess bucket expansion in the case where a hash function over-utilizes a few
+buckets but has good overall distribution. However, if the overall distribution
+gets too bad, uthash changes tactics.
+
+Inhibited expansion
+^^^^^^^^^^^^^^^^^^^
+You usually don't need to know or worry about this, particularly if you used
+the `keystats` utility during development to select a good hash for your keys.
+
+A hash function may yield an uneven distribution of items across the buckets.
+In moderation this is not a problem. Normal bucket expansion takes place as
+the chain lengths grow. But when significant imbalance occurs (because the hash
+function is not well suited to the key domain), bucket expansion may be
+ineffective at reducing the chain lengths.
+
+Imagine a very bad hash function which always puts every item in bucket 0. No
+matter how many times the number of buckets is doubled, the chain length of
+bucket 0 stays the same. In a situation like this, the best behavior is to
+stop expanding, and accept O(n) lookup performance. This is what uthash
+does. It degrades gracefully if the hash function is ill-suited to the keys.
+
+If two consecutive bucket expansions yield `ideal%` values below 50%, uthash
+inhibits expansion for that hash table. Once set, the 'bucket expansion
+inhibited' flag remains in effect as long as the hash has items in it.
+Inhibited expansion may cause `HASH_FIND` to exhibit worse than constant-time
+performance.
+
+Hooks
+~~~~~
+You don't need to use these hooks- they are only here if you want to modify
+the behavior of uthash. Hooks can be used to change how uthash allocates
+memory, and to run code in response to certain internal events.
+
+malloc/free
+^^^^^^^^^^^
+By default this hash implementation uses `malloc` and `free` to manage memory.
+If your application uses its own custom allocator, this hash can use them too.
+
+.Specifying alternate memory management functions
+----------------------------------------------------------------------------
+#include "uthash.h"
+
+/* undefine the defaults */
+#undef uthash_malloc
+#undef uthash_free
+
+/* re-define, specifying alternate functions */
+#define uthash_malloc(sz) my_malloc(sz)
+#define uthash_free(ptr,sz) my_free(ptr)
+
+...
+----------------------------------------------------------------------------
+
+Notice that `uthash_free` receives two parameters. The `sz` parameter is for
+convenience on embedded platforms that manage their own memory.
+
+Out of memory
+^^^^^^^^^^^^^
+If memory allocation fails (i.e., the malloc function returned `NULL`), the
+default behavior is to terminate the process by calling `exit(-1)`. This can
+be modified by re-defining the `uthash_fatal` macro.
+
+ #undef uthash_fatal
+ #define uthash_fatal(msg) my_fatal_function(msg);
+
+The fatal function should terminate the process or `longjmp` back to a safe
+place. Uthash does not support "returning a failure" if memory cannot be
+allocated.
+
+Internal events
+^^^^^^^^^^^^^^^
+There is no need for the application to set these hooks or take action in
+response to these events. They are mainly for diagnostic purposes.
+
+These two hooks are "notification" hooks which get executed if uthash is
+expanding buckets, or setting the 'bucket expansion inhibited' flag. Normally
+both of these hooks are undefined and thus compile away to nothing.
+
+Expansion
++++++++++
+There is a hook for the bucket expansion event.
+
+.Bucket expansion hook
+----------------------------------------------------------------------------
+#include "uthash.h"
+
+#undef uthash_expand_fyi
+#define uthash_expand_fyi(tbl) printf("expanded to %d buckets\n", tbl->num_buckets)
+
+...
+----------------------------------------------------------------------------
+
+Expansion-inhibition
+++++++++++++++++++++
+This hook can be defined to code to execute in the event that uthash decides to
+set the 'bucket expansion inhibited' flag.
+
+.Bucket expansion inhibited hook
+----------------------------------------------------------------------------
+#include "uthash.h"
+
+#undef uthash_noexpand_fyi
+#define uthash_noexpand_fyi printf("warning: bucket expansion inhibited\n");
+
+...
+----------------------------------------------------------------------------
+
+
+Debug mode
+~~~~~~~~~~
+If a program that uses this hash is compiled with `-DHASH_DEBUG=1`, a special
+internal consistency-checking mode is activated. In this mode, the integrity
+of the whole hash is checked following every add or delete operation. This is
+for debugging the uthash software only, not for use in production code.
+
+In the `tests/` directory, running `make debug` will run all the tests in
+this mode.
+
+In this mode, any internal errors in the hash data structure will cause a
+message to be printed to `stderr` and the program to exit.
+
+The `UT_hash_handle` data structure includes `next`, `prev`, `hh_next` and
+`hh_prev` fields. The former two fields determine the "application" ordering
+(that is, insertion order-- the order the items were added). The latter two
+fields determine the "bucket chain" order. These link the `UT_hash_handles`
+together in a doubly-linked list that is a bucket chain.
+
+Checks performed in `-DHASH_DEBUG=1` mode:
+
+- the hash is walked in its entirety twice: once in 'bucket' order and a
+ second time in 'application' order
+- the total number of items encountered in both walks is checked against the
+ stored number
+- during the walk in 'bucket' order, each item's `hh_prev` pointer is compared
+ for equality with the last visited item
+- during the walk in 'application' order, each item's `prev` pointer is compared
+ for equality with the last visited item
+
+.Macro debugging:
+********************************************************************************
+Sometimes it's difficult to interpret a compiler warning on a line which
+contains a macro call. In the case of uthash, one macro can expand to dozens of
+lines. In this case, it is helpful to expand the macros and then recompile.
+By doing so, the warning message will refer to the exact line within the macro.
+
+Here is an example of how to expand the macros and then recompile. This uses the
+`test1.c` program in the `tests/` subdirectory.
+
+ gcc -E -I../src test1.c > /tmp/a.c
+ egrep -v '^#' /tmp/a.c > /tmp/b.c
+ indent /tmp/b.c
+ gcc -o /tmp/b /tmp/b.c
+
+The last line compiles the original program (test1.c) with all macros expanded.
+If there was a warning, the referenced line number can be checked in `/tmp/b.c`.
+********************************************************************************
+
+Thread safety
+~~~~~~~~~~~~~
+You can use uthash in a threaded program. But you must do the locking. Use a
+read-write lock to protect against concurrent writes. It is ok to have
+concurrent readers (since uthash 1.5).
+
+For example using pthreads you can create an rwlock like this:
+
+ pthread_rwlock_t lock;
+ if (pthread_rwlock_init(&lock,NULL) != 0) fatal("can't create rwlock");
+
+Then, readers must acquire the read lock before doing any `HASH_FIND` calls or
+before iterating over the hash elements:
+
+ if (pthread_rwlock_rdlock(&lock) != 0) fatal("can't get rdlock");
+ HASH_FIND_INT(elts, &i, e);
+ pthread_rwlock_unlock(&lock);
+
+Writers must acquire the exclusive write lock before doing any update. Add,
+delete, and sort are all updates that must be locked.
+
+ if (pthread_rwlock_wrlock(&lock) != 0) fatal("can't get wrlock");
+ HASH_DEL(elts, e);
+ pthread_rwlock_unlock(&lock);
+
+If you prefer, you can use a mutex instead of a read-write lock, but this will
+reduce reader concurrency to a single thread at a time.
+
+An example program using uthash with a read-write lock is included in
+`tests/threads/test1.c`.
+
+[[Macro_reference]]
+Macro reference
+---------------
+
+Convenience macros
+~~~~~~~~~~~~~~~~~~
+The convenience macros do the same thing as the generalized macros, but
+require fewer arguments.
+
+In order to use the convenience macros,
+
+1. the structure's `UT_hash_handle` field must be named `hh`, and
+2. for add or find, the key field must be of type `int` or `char[]` or pointer
+
+.Convenience macros
+[width="90%",cols="10m,30m",grid="none",options="header"]
+|===============================================================================
+|macro | arguments
+|HASH_ADD_INT | (head, keyfield_name, item_ptr)
+|HASH_FIND_INT | (head, key_ptr, item_ptr)
+|HASH_ADD_STR | (head, keyfield_name, item_ptr)
+|HASH_FIND_STR | (head, key_ptr, item_ptr)
+|HASH_ADD_PTR | (head, keyfield_name, item_ptr)
+|HASH_FIND_PTR | (head, key_ptr, item_ptr)
+|HASH_DEL | (head, item_ptr)
+|HASH_SORT | (head, cmp)
+|HASH_COUNT | (head)
+|===============================================================================
+
+General macros
+~~~~~~~~~~~~~~
+
+These macros add, find, delete and sort the items in a hash. You need to
+use the general macros if your `UT_hash_handle` is named something other
+than `hh`, or if your key's data type isn't `int` or `char[]`.
+
+.General macros
+[width="90%",cols="10m,30m",grid="none",options="header"]
+|===============================================================================
+|macro | arguments
+|HASH_ADD | (hh_name, head, keyfield_name, key_len, item_ptr)
+|HASH_ADD_KEYPTR| (hh_name, head, key_ptr, key_len, item_ptr)
+|HASH_FIND | (hh_name, head, key_ptr, key_len, item_ptr)
+|HASH_DELETE | (hh_name, head, item_ptr)
+|HASH_SRT | (hh_name, head, cmp)
+|HASH_CNT | (hh_name, head)
+|HASH_CLEAR | (hh_name, head)
+|HASH_SELECT | (dst_hh_name, dst_head, src_hh_name, src_head, condition)
+|HASH_ITER | (hh_name, head, item_ptr, tmp_item_ptr)
+|===============================================================================
+
+[NOTE]
+`HASH_ADD_KEYPTR` is used when the structure contains a pointer to the
+key, rather than the key itself.
+
+
+Argument descriptions
+^^^^^^^^^^^^^^^^^^^^^
+hh_name::
+ name of the `UT_hash_handle` field in the structure. Conventionally called
+ `hh`.
+head::
+ the structure pointer variable which acts as the "head" of the hash. So
+ named because it initially points to the first item that is added to the hash.
+keyfield_name::
+ the name of the key field in the structure. (In the case of a multi-field
+ key, this is the first field of the key). If you're new to macros, it
+ might seem strange to pass the name of a field as a parameter. See
+ <<validc,note>>.
+key_len::
+ the length of the key field in bytes. E.g. for an integer key, this is
+ `sizeof(int)`, while for a string key it's `strlen(key)`. (For a
+ multi-field key, see the notes in this guide on calculating key length).
+key_ptr::
+ for `HASH_FIND`, this is a pointer to the key to look up in the hash
+ (since it's a pointer, you can't directly pass a literal value here). For
+ `HASH_ADD_KEYPTR`, this is the address of the key of the item being added.
+item_ptr::
+ pointer to the structure being added, deleted, or looked up, or the current
+ pointer during iteration. This is an input parameter for `HASH_ADD` and
+ `HASH_DELETE` macros, and an output parameter for `HASH_FIND` and
+ `HASH_ITER`. (When using `HASH_ITER` to iterate, `tmp_item_ptr`
+ is another variable of the same type as `item_ptr`, used internally).
+cmp::
+ pointer to comparison function which accepts two arguments (pointers to
+ items to compare) and returns an int specifying whether the first item
+ should sort before, equal to, or after the second item (like `strcmp`).
+condition::
+ a function or macro which accepts a single argument-- a void pointer to a
+ structure, which needs to be cast to the appropriate structure type. The
+ function or macro should return (or evaluate to) a non-zero value if the
+ structure should be "selected" for addition to the destination hash.
+
+// vim: set tw=80 wm=2 syntax=asciidoc:
+