Commit ac2936d

committed
prep for public release
add conventional documentation files (README INSTALL COPYING etc)
apply Copyright notice and LGPL 2.1 license header to *.[ch] t/*.c
1 parent 500fc76 commit ac2936d

43 files changed

+1616
-63
lines changed

CHANGELOG

+6
@@ -0,0 +1,6 @@
1+
mcdb change log
2+
3+
mcdb v0.01 (2011.08.25)
4+
- WFM: works-for-me! (alpha)
5+
Initial release for feedback and to get over the hurdle of initial release.
6+
More work needs to be done testing.

COPYING

+458
Large diffs are not rendered by default.

INSTALL

+40
@@ -0,0 +1,40 @@
1+
mcdb installation
2+
3+
$ make
4+
$ make install
5+
$ make test
6+
7+
8+
Using mcdb for nsswitch databases: (optional)
9+
10+
$ mkdir /etc/mcdb
11+
$ nss_mcdbctl
12+
$ vi /etc/nsswitch.conf
13+
passwd: mcdb
14+
shadow: mcdb
15+
group: mcdb
16+
hosts: mcdb dns
17+
protocols: mcdb
18+
services: mcdb
19+
rpc: mcdb
20+
networks: mcdb
21+
# (save /etc/nsswitch.conf and test things out in another window)
22+
$ getent passwd root
23+
root:x:0:0:root:/root:/bin/bash
24+
$ getent group root
25+
root:x:0:root
26+
27+
Please note that changes to any databases require re-running nss_mcdbctl.
28+
While I have been running the above configuration on my laptop for > 1 year,
29+
nss_mcdbctl still needs to be run for changes made by other users to passwd,
30+
shadow, and group databases. Since I have not yet written pam code to be
31+
triggered when those files are changed, you might leave
32+
shadow: files
33+
if others might change passwords and you do not automate running nss_mcdbctl
34+
so that password changes take effect immediately.
35+
36+
You may choose to disable nscd and test if performance increases.
37+
38+
39+
See NOTES for more technical (and probably less readable) details and features.
40+
See Makefile for various overrides, such as alternate installation location.

NOTES

+148
@@ -0,0 +1,148 @@
1+
mcdb - fast, reliable, simple code to create and read constant databases
2+
3+
mcdb enhancements to cdb (on which mcdb is based; see References section below)
4+
- updated to C99 and POSIX.1-2001 (not available/portable when djb wrote cdb)
5+
- optimized for mmap access to constant db (and avoid double buffering)
6+
- redesigned for use in threaded programs (thread-safe interface available)
7+
- convenience routines to check for updated constant db and to refresh mmap
8+
- support cdb > 4 GB with 64-bit program (required to mmap() mcdb > 4 GB)
9+
- 64-bit safe (for use in 64-bit programs)
10+
11+
Advantages over external database
12+
- performance: better; avoids context switch to external database process
13+
Advantages over specialized hash map
14+
- generic, reusable
15+
- maintained (created and verified) externally from process (less overhead)
16+
- shared across processes (though shared-memory could be used for hash map)
17+
- read-only (though memory pages could also be marked read-only for hash map)
18+
Disadvantages to specialized hash map
19+
- performance: slightly slower than specialized hash map
20+
Disadvantages to djb cdb
21+
- mmap requires address space be available into which to mmap the const db
22+
(i.e. large const db might fail to mmap into 32-bit process)
23+
- mmap page alignment requirements and use of address space limits const db
24+
max size when created by 32-bit process. Sizes approaching 4 GB may fail.
25+
- arbitrary limit of each key or data set to (INT_MAX - 8 bytes; almost 2 GB)
26+
(djb cdb doc states there is no limit besides cdb fitting into 4 GB)
27+
(writev() on some platforms in 32-bit exe might also have 2 GB limit)
28+
29+
Incompatibilities with djb cdb
30+
- padding added at the end of key,value data to 16-byte align hash tables
31+
(incompatible with djb cdbdump)
32+
- initial table and hash tables have 8-byte values instead of 4-byte values
33+
in order to support cdb > 4 GB. cdb uses 24 bytes per record plus 2048,
34+
whereas mcdb uses 24 bytes per record plus 4096 when data section < 4 GB,
35+
and mcdb uses 40 bytes per record plus 4096 when data section >= 4 GB.
36+
- packing of integral lengths into char strings is done big-endian for
37+
performance in packing/unpacking integer data in 4-byte (or better)
38+
aligned addresses. (incompatible with all djb cdb* tools and cdb's)
39+
(djb cdb documents all 32-bit quantities stored in little-endian form)
40+
Memory load latency is limiting factor, not the x86 assembly instruction
41+
to convert uint32_t to and from big-endian (when data is 4-byte aligned).
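
As a rough illustration of the big-endian packing described above, the
following C sketch stores and loads a 32-bit length one byte at a time.
The names pack_u32_be and unpack_u32_be are hypothetical and are not taken
from the mcdb sources; mcdb itself may use aligned loads or compiler
builtins instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store v into buf[0..3], most significant byte first (big-endian).
 * Hypothetical helper for illustration; not the mcdb API. */
static void pack_u32_be(unsigned char *buf, uint32_t v)
{
    assert(buf != NULL);
    buf[0] = (unsigned char)(v >> 24);
    buf[1] = (unsigned char)(v >> 16);
    buf[2] = (unsigned char)(v >> 8);
    buf[3] = (unsigned char)(v);
}

/* Load a big-endian 32-bit value from buf[0..3]. */
static uint32_t unpack_u32_be(const unsigned char *buf)
{
    assert(buf != NULL);
    return ((uint32_t)buf[0] << 24)
         | ((uint32_t)buf[1] << 16)
         | ((uint32_t)buf[2] << 8)
         |  (uint32_t)buf[3];
}
```

Byte-wise access like this works at any alignment and on any host byte
order. On 4-byte aligned data a compiler can typically reduce it to a
single load plus a byte swap, which is the case the note above refers to.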
42+
43+
Limitations
44+
- 2 billion keys
45+
As long as djb hash is 32-bit, mcdb_make.c limits number of hash keys to
46+
2 billion. cdb handles hash collisions, but there is a small expense each
47+
collision. As the key space becomes denser within the 2 billion, there is
48+
greater chance of collisions. Input strings also affect this probability,
49+
as do the sizes of the hash tables.
50+
- process must mmap() entire mcdb
51+
Each mcdb is mmap()d in its entirety into the address space. For 32-bit
52+
programs that means there is a 4 GB limit on size of mcdb, minus address
53+
space used by the program (including stack, heap, shared libraries, shmat
54+
and other mmaps, etc). Compile and link 64-bit to remove this limitation.
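
For reference, the 32-bit hash mentioned in the key limit above is, in
djb's cdb documentation, a multiply-by-33 and XOR over the key bytes
starting from 5381. The sketch below follows that description; the name
djb_hash is hypothetical, and whether mcdb's own routine matches it line
for line is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* cdb-style hash: h = ((h << 5) + h) ^ c for each byte, starting at 5381.
 * Arithmetic is modulo 2^32 because h is uint32_t. */
static uint32_t djb_hash(const void *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    const unsigned char *p = (const unsigned char *)buf;
    uint32_t h = 5381u;
    while (len--)
        h = ((h << 5) + h) ^ *p++;
    return h;
}
```

Because the result has only 32 bits, distinct keys can share a hash value,
which is why collisions become more likely as the number of keys grows.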
55+
56+
57+
References
58+
----------
59+
Dan Bernstein's (djb) reference implementation of cdb (public domain)
60+
http://cr.yp.to/cdb.html
61+
62+
63+
64+
There is plenty more information which will eventually (hopefully) be here.
65+
In the meantime, here are some snippets, and more will come after the
66+
initial release, as time allows.
67+
68+
69+
70+
Technical Asides
71+
----------------
72+
73+
74+
mcdbctl creates databases with limited permissions (0400)
75+
---------------------------------------------------------
76+
mcdbctl creates new databases with limited permissions (0400), important for
77+
security when creating mcdb for sensitive data, such as /etc/shadow data.
78+
mcdbctl recreates existing databases and preserves the permission modes that
79+
were applied to the previous mcdb (replaced by the recreated mcdb).
80+
81+
mcdb performant design tidbits
82+
------------------------------
83+
Data locality is one of the primary keys to performance: djb cdb linearly
84+
probed open hash tables minimize arbitrary jumps around memory.
85+
Endian transformations are absorbed in the noise of memory load latency.
86+
87+
An immediate performance boost can be seen when using mcdb for nsswitch
88+
databases on machines that run Apache with suexec which was the reason behind
89+
using cdb, and now mcdb, for passwd and group databases. See INSTALL file.
90+
91+
smallest mcdb
92+
-------------
93+
Empty mcdb with no keys is 4K
94+
$ echo | ./mcdbctl make empty.mcdb -
95+
96+
mcdb supports empty keys and values
97+
-----------------------------------
98+
$ echo -e "+0,0:->\n" | ./mcdbctl make empty.mcdb -
99+
In practice, empty values can be useful when interested only in existence,
100+
or not, of keys. However, more than one empty key is not generically useful,
101+
(or a use case is not immediately apparent to me).
102+
103+
mcdb supports one-char tag prefix on keys
104+
-----------------------------------------
105+
One method of storing multiple types of constant data in a single mcdb is
106+
to prefix the key with a single character that indicates the type of data in
107+
the key. This feature is used by nss_mcdb for various nsswitch.conf databases.
108+
For example, passwd databases can be queried by name or by uid. Both are stored
109+
in the same mcdb database, with a different character prefixing the keys for
110+
names versus the keys for uids. This prefix tag character feature of mcdb
111+
allows for tagged-key queries without the extra work of having to create a
112+
temporary buffer to set the tag followed by a copy of the key.
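
One way to see why no temporary buffer is needed: a byte-at-a-time hash
can consume the tag first and then continue over the caller's key in
place. The sketch below illustrates that idea only. The names hash_update
and hash_tagged and the '=' tag are hypothetical, and this is not the
mcdb lookup interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Continue a cdb-style hash (h = h*33 ^ c) over len bytes at p. */
static uint32_t hash_update(uint32_t h, const unsigned char *p, size_t len)
{
    while (len--)
        h = ((h << 5) + h) ^ *p++;
    return h;
}

/* Hash (tag + key) without building the concatenation in a buffer. */
static uint32_t hash_tagged(unsigned char tag, const void *key, size_t klen)
{
    assert(key != NULL || klen == 0);
    uint32_t h = hash_update(5381u, &tag, 1);   /* tag byte first */
    return hash_update(h, (const unsigned char *)key, klen);
}
```

The result equals the hash of the tag byte followed by the key, so a
lookup for a tagged key can be started directly from the untagged key
the caller already holds.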
113+
114+
mcdb creation speed
115+
-------------------
116+
mcdb creation supports the same input format as cdb. There is an overhead to
117+
formatting the input as ASCII numbers, as well as parsing and translating the
118+
ASCII back to native numbers. (The cost adds up with lots of keys.) Another
119+
method to create mcdb which is even faster is for a program to link with and
120+
call mcdb_make_* routines directly, as is done in nss_mcdbctl.
121+
122+
mcdb in memory only
123+
-------------------
124+
Creating an mcdb in memory without filesystem backing is possible. This might
125+
be useful for testing, but in practice, a hash function customized to the
126+
specific purpose at hand would be faster. Setting the fd to -1 in the call to
127+
mcdb_make_start() is how to tell mcdb routines the mcdb is not backed by the
128+
filesystem. Then, caller must create mmap large enough for all keys and data
129+
(filling in nkeys (num keys), total_klen (key len), and total_dlen (data len)).
130+
struct mcdb_make m;
131+
size_t msz;
132+
mcdb_make_start(&m, -1, malloc, free);
133+
/* preallocate mcdb mmap to proper full size; (msz+15) & ~15 for alignment */
134+
msz = (MCDB_HEADER_SZ + nkeys*8 + total_klen + total_dlen + 15) & ~15;
135+
msz+= (msz < UINT_MAX ? nkeys*16 : nkeys*32);
136+
m.fsz = m.msz = (msz = (msz + ~m.pgalign) & m.pgalign); /*align to page size*/
137+
m.map = (char *)mmap(0, msz, PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
138+
if (m.map == MAP_FAILED) return false;
139+
/* ... loop and mcdb_make_add(), then mcdb_make_finish() */
140+
/* ... when finished using the mmap, then munmap(m.map, m.msz) */
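
The size arithmetic in the snippet above can be isolated as a small
helper for estimating how large a map to reserve. This sketch assumes a
header of 4096 bytes (the "plus 4096" figure given under Incompatibilities)
and omits the final page-size rounding; the name mcdb_size_estimate and
the HEADER_SZ constant are hypothetical stand-ins, not mcdb identifiers.

```c
#include <assert.h>
#include <stdint.h>

#define HEADER_SZ 4096u  /* assumed value standing in for MCDB_HEADER_SZ */

/* Estimate mcdb size in bytes before page alignment:
 * header + 8 bytes per record + keys + data, rounded up to 16 bytes,
 * plus 16 (or 32 for large databases) bytes of hash table per record. */
static uint64_t mcdb_size_estimate(uint64_t nkeys,
                                   uint64_t total_klen,
                                   uint64_t total_dlen)
{
    /* NOTES gives a limit of about 2 billion keys; exact bound assumed */
    assert(nkeys <= (uint64_t)INT32_MAX);
    uint64_t sz = (HEADER_SZ + nkeys * 8 + total_klen + total_dlen + 15)
                & ~(uint64_t)15;
    sz += (sz < UINT32_MAX ? nkeys * 16 : nkeys * 32);
    return sz;
}
```

With no keys the estimate is 4096 bytes, which agrees with the 4K empty
mcdb noted earlier. With 2 keys, 6 bytes of keys and 10 bytes of data it
is 4096 + 16 + 16 = 4128 (already a multiple of 16), plus 32 for the hash
tables, giving 4160 bytes.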
141+
142+
compiler intrinsics/builtins not (yet) tested on all platforms
143+
--------------------------------------------------------------
144+
The compiler intrinsics/builtins in code_attributes.h have been tested on i686
145+
and x86_64 platforms, but have not (yet) been tested on all other platforms.
146+
They were written from vendor documentation available on the internet, but I do
147+
not have private access to all those platforms in order to test. Please send
148+
me advice, suggestions, and corrections.

README

+15
@@ -0,0 +1,15 @@
1+
mcdb - fast, reliable, simple code to create and read constant databases
2+
3+
README - summary (this file)
4+
INSTALL - quick installation
5+
COPYING - copyright/license
6+
NOTES - technical details (if interested)
7+
8+
t/PERFORMANCE - performance notes
9+
10+
mcdb (mmap constant database) is based on the Public Domain cdb package, a:
11+
"fast, reliable, simple package for creating and reading constant databases."
12+
mcdb is almost 33% faster, provides support for use in threaded programs, and
13+
supports databases larger than 4 GB.
14+
15+
http://cr.yp.to/cdb.html provides information about cdb, on which mcdb is based.

code_attributes.h

+21
@@ -1,3 +1,24 @@
1+
/*
2+
* code_attributes - portability macros for compiler-specific code attributes
3+
*
4+
* Copyright (c) 2010, Glue Logic LLC. All rights reserved. code()gluelogic.com
5+
*
6+
* This file is part of mcdb.
7+
*
8+
* mcdb is free software: you can redistribute it and/or modify it under
9+
* the terms of the GNU Lesser General Public License as published by
10+
* the Free Software Foundation, either version 2.1 of the License, or
11+
* (at your option) any later version.
12+
*
13+
* mcdb is distributed in the hope that it will be useful,
14+
* but WITHOUT ANY WARRANTY; without even the implied warranty of
15+
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
16+
* GNU Lesser General Public License for more details.
17+
*
18+
* You should have received a copy of the GNU Lesser General Public License
19+
* along with mcdb. If not, see <http://www.gnu.org/licenses/>.
20+
*/
21+
122
#ifndef INCLUDED_CODE_ATTRIBUTES_H
223
#define INCLUDED_CODE_ATTRIBUTES_H
324

mcdb.c

+23-1
@@ -1,4 +1,26 @@
1-
/* License: GPLv3 */
1+
/*
2+
* mcdb - fast, reliable, simple code to create and read constant databases
3+
*
4+
* Copyright (c) 2010, Glue Logic LLC. All rights reserved. code()gluelogic.com
5+
*
6+
* This file is part of mcdb.
7+
*
8+
* mcdb is free software: you can redistribute it and/or modify it under
9+
* the terms of the GNU Lesser General Public License as published by
10+
* the Free Software Foundation, either version 2.1 of the License, or
11+
* (at your option) any later version.
12+
*
13+
* mcdb is distributed in the hope that it will be useful,
14+
* but WITHOUT ANY WARRANTY; without even the implied warranty of
15+
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
16+
* GNU Lesser General Public License for more details.
17+
*
18+
* You should have received a copy of the GNU Lesser General Public License
19+
* along with mcdb. If not, see <http://www.gnu.org/licenses/>.
20+
*
21+
*
22+
* mcdb is originally based upon the Public Domain cdb-0.75 by Dan Bernstein
23+
*/
224

325
#ifndef _XOPEN_SOURCE /* POSIX_MADV_RANDOM */
426
#define _XOPEN_SOURCE 600

mcdb.h

+19-54
@@ -1,64 +1,29 @@
1-
/* mmap constant database (mcdb)
1+
/*
2+
* mcdb - fast, reliable, simple code to create and read constant databases
23
*
3-
* Copyright 2010 Glue Logic LLC
4-
* License: GPLv3
5-
* Originally based upon the Public Domain 'cdb-0.75' by Dan Bernstein
4+
* Copyright (c) 2010, Glue Logic LLC. All rights reserved. code()gluelogic.com
65
*
7-
* - updated to C99 and POSIX.1-2001 (not available/portable when djb wrote cdb)
8-
* - optimized for mmap access to constant db (and avoid double buffering)
9-
* - redesigned for use in threaded programs (thread-safe interface available)
10-
* - convenience routines to check for updated constant db and to refresh mmap
11-
* - support cdb > 4 GB with 64-bit program (required to mmap() mcdb > 4 GB)
12-
* - 64-bit safe (for use in 64-bit programs)
6+
* This file is part of mcdb.
137
*
14-
* Advantages over external database
15-
* - performance: better; avoids context switch to external database process
16-
* Advantages over specialized hash map
17-
* - generic, reusable
18-
* - maintained (created and verified) externally from process (less overhead)
19-
* - shared across processes (though shared-memory could be used for hash map)
20-
* - read-only (though memory pages could also be marked read-only for hash map)
21-
* Disadvantages to specialized hash map
22-
* - performance: slightly slower than specialized hash map
23-
* Disadvantages to djb cdb
24-
* - mmap requires address space be available into which to mmap the const db
25-
* (i.e. large const db might fail to mmap into 32-bit process)
26-
* - mmap page alignment requirements and use of address space limits const db
27-
* max size when created by 32-bit process. Sizes approaching 4 GB may fail.
28-
* - arbitrary limit of each key or data set to (INT_MAX - 8 bytes; almost 2 GB)
29-
* (djb cdb doc states there is no limit besides cdb fitting into 4 GB)
30-
* (writev() on some platforms in 32-bit exe might also have 2 GB limit)
8+
* mcdb is free software: you can redistribute it and/or modify it under
9+
* the terms of the GNU Lesser General Public License as published by
10+
* the Free Software Foundation, either version 2.1 of the License, or
11+
* (at your option) any later version.
3112
*
32-
* Incompatibilities with djb cdb
33-
* - padding added at the end of key,value data to 16-byte align hash tables
34-
* (incompatible with djb cdbdump)
35-
* - initial table and hash tables have 8-byte values instead of 4-byte values
36-
* in order to support cdb > 4 GB. cdb uses 24 bytes per record plus 2048,
37-
* whereas mcdb uses 24 bytes per record plus 4096 when data section < 4 GB,
38-
* and mcdb uses 40 bytes per record plus 4096 when data section >= 4 GB.
39-
* - packing of integral lengths into char strings is done big-endian for
40-
* performance in packing/unpacking integer data in 4-byte (or better)
41-
* aligned addresses. (incompatible with all djb cdb* tools and cdb's)
42-
* (djb cdb documents all 32-bit quantities stored in little-endian form)
43-
* Memory load latency is limiting factor, not the x86 assembly instruction
44-
* to convert uint32_t to and from big-endian (when data is 4-byte aligned).
13+
* mcdb is distributed in the hope that it will be useful,
14+
* but WITHOUT ANY WARRANTY; without even the implied warranty of
15+
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
16+
* GNU Lesser General Public License for more details.
4517
*
46-
* Limitations
47-
* - 2 billion keys
48-
* As long as djb hash is 32-bit, mcdb_make.c limits number of hash keys to
49-
* 2 billion. cdb handles hash collisions, but there is a small expense each
50-
* collision. As the key space becomes denser within the 2 billion, there is
51-
* greater chance of collisions. Input strings also affect this probability,
52-
* as do the sizes of the hash tables.
53-
* - process must mmap() entire mcdb
54-
* Each mcdb is mmap()d in its entirety into the address space. For 32-bit
55-
* programs that means there is a 4 GB limit on size of mcdb, minus address
56-
* space used by the program (including stack, heap, shared libraries, shmat
57-
* and other mmaps, etc). Compile and link 64-bit to remove this limitation.
18+
* You should have received a copy of the GNU Lesser General Public License
19+
* along with mcdb. If not, see <http://www.gnu.org/licenses/>.
20+
*
21+
*
22+
* mcdb is originally based upon the Public Domain cdb-0.75 by Dan Bernstein
5823
*/
5924

60-
#ifndef MCDB_H
61-
#define MCDB_H
25+
#ifndef INCLUDED_MCDB_H
26+
#define INCLUDED_MCDB_H
6227

6328
#include <stdbool.h> /* bool */
6429
#include <stdint.h> /* uint32_t, uintptr_t */

mcdb_error.c

+24
@@ -1,3 +1,27 @@
1+
/*
2+
* mcdb_error - mcdb error codes and messages
3+
*
4+
* Copyright (c) 2010, Glue Logic LLC. All rights reserved. code()gluelogic.com
5+
*
6+
* This file is part of mcdb.
7+
*
8+
* mcdb is free software: you can redistribute it and/or modify it under
9+
* the terms of the GNU Lesser General Public License as published by
10+
* the Free Software Foundation, either version 2.1 of the License, or
11+
* (at your option) any later version.
12+
*
13+
* mcdb is distributed in the hope that it will be useful,
14+
* but WITHOUT ANY WARRANTY; without even the implied warranty of
15+
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
16+
* GNU Lesser General Public License for more details.
17+
*
18+
* You should have received a copy of the GNU Lesser General Public License
19+
* along with mcdb. If not, see <http://www.gnu.org/licenses/>.
20+
*
21+
*
22+
* mcdb is originally based upon the Public Domain cdb-0.75 by Dan Bernstein
23+
*/
24+
125
#ifndef _POSIX_C_SOURCE /* 200112L for XSI-compliant strerror_r() */
226
#define _POSIX_C_SOURCE 200112L
327
#endif
