Overview
Comment:     Simplifications and clarifications to lsmusr.wiki.
Downloads:   Tarball | ZIP archive
Timelines:   family | ancestors | descendants | both | trunk
Files:       files | file ages | folders
SHA1:        33eca2e1f4379d3030c278f53e6cc67c
User & Date: dan 2013-02-01 19:49:42.041
Context
2013-02-02
16:45  Fix LSM single-process mode so that it holds an exclusive lock on the database file - preventing connections from within external processes. (check-in: d6bd08ca0e user: dan tags: trunk)

2013-02-01
19:49  Simplifications and clarifications to lsmusr.wiki. (check-in: 33eca2e1f4 user: dan tags: trunk)

2013-01-31
05:58  Add the definition of sqlite4_stricmp() to sqlite.h.in. Avoid multiple declarations the u8 and similar typedefs in the amalgmation. (check-in: d966049dd6 user: drh tags: trunk)
Changes
Changes to www/lsmusr.wiki.
<title>LSM Users Guide</title>
<nowiki>

<h2>Table of Contents</h2>
<div id=start_of_toc></div>
<a href=#introduction_to_lsm style=text-decoration:none>1. Introduction to LSM</a><br>
<a href=#using_lsm_in_applications style=text-decoration:none>2. Using LSM in Applications</a><br>
<a href=#basic_usage style=text-decoration:none>3. Basic Usage</a><br>
<a href=#opening_and_closing_database_connections style=text-decoration:none>3.1. Opening and Closing Database Connections</a><br>
<a href=#writing_to_a_database style=text-decoration:none>3.2. Writing to a Database</a><br>
<a href=#reading_from_a_database style=text-decoration:none>3.3. Reading from a Database</a><br>
<a href=#database_transactions_and_mvcc style=text-decoration:none>3.4. Database Transactions and MVCC</a><br>
<a href=#data_durability style=text-decoration:none>4. Data Durability</a><br>
<a href=#compressed_and_encrypted_databases style=text-decoration:none>5. Compressed and Encrypted Databases</a><br>
<a href=#performance_tuning style=text-decoration:none>6. Performance Tuning</a><br>
<a href=#overview_of_lsm_architecture style=text-decoration:none>6.1. Overview of LSM Architecture</a><br>
<a href=#performance_related_configuration_options style=text-decoration:none>6.2. Performance Related Configuration Options</a><br>
<a href=#work_and_checkpoint_scheduling style=text-decoration:none>6.3. Work and Checkpoint Scheduling</a><br>
<a href=#automatic_scheduling style=text-decoration:none>6.3.1. Automatic Scheduling</a><br>
<a href=#explicit_scheduling style=text-decoration:none>6.3.2. Explicit Scheduling</a><br>
<a href=#compulsary_work_and_checkpoints style=text-decoration:none>6.3.3. Compulsory Work and Checkpoints</a><br>
<a href=#database_file_optimization style=text-decoration:none>6.4. Database File Optimization</a><br>
<div id=end_of_toc></div>

<h2>Overview</h2>

<p>This document describes the LSM embedded database library and its use. It is part user manual and part tutorial. It is intended to complement the <a href=lsmapi.wiki>LSM API reference manual</a>.
<p>The <a href=#introduction_to_lsm>first section</a> of this document contains a description of the LSM library and its features. <a href=#using_lsm_in_applications>Section 2</a> describes how to use LSM from within a C or C++ application (how to compile and link LSM, what to #include etc.). The <a href=#basic_usage>third section</a> describes the essential APIs that applications use to open and close database connections, and to read from and write to databases.

<p>The three sections described above contain all the information required to create applications that use LSM. The remaining sections discuss more specialized topics. <a href=#data_durability>Section 4</a> discusses the configuration parameter that influences transaction durability (the guarantees offered with respect to recently committed transactions if a power failure occurs). <a href=#compressed_and_encrypted_databases>Section 5</a> explains the interface provided by LSM that allows external data compression and/or encryption functions to be used to create compressed and/or encrypted databases. <a href=#performance_tuning>Section 6</a> deals with performance tuning.

<h1 id=introduction_to_lsm>1. Introduction to LSM</h1>

<p>LSM is an embedded database library for key-value data, roughly similar in scope to <a href="http://www.oracle.com/technetwork/products/berkeleydb/overview/index.html">Berkeley DB</a>, <a href="http://code.google.com/p/leveldb/">LevelDB</a> or
︙
<li> Iterating through a range of database keys (either forwards or backwards).
</ul>

<p>Other salient features are:

<ul>
<li><p>A <b>single-writer/multiple-reader MVCC</b> based transactional concurrency model. SQL-style nested sub-transactions are supported. Clients may concurrently access a single LSM database from within a single process or multiple application processes.

<li><p>An entire database is stored in a <b>single file on disk</b>.

<li><p>Data <b>durability in the face of application or power failure</b>. LSM may optionally use a write-ahead log file when writing to the database to ensure committed transactions are not lost if an application or power failure occurs.

<li><p>An API that <b>allows external data compression and/or encryption routines to be used</b> to create and access compressed and/or encrypted databases.
</ul>

<p>Many database systems that support range queries, including <a href=http://www.sqlite.org>SQLite 3</a>, Berkeley DB and Kyoto Cabinet, are based on one of many variants of the <a href="http://en.wikipedia.org/wiki/B-tree">b-tree data structure</a>. B-trees are attractive because a b-tree structure minimizes the number of disk sectors that must be read from disk when searching the database for a specific key. However, b-tree implementations usually suffer from poor write localization - updating the contents of a b-tree often involves modifying the contents of nodes scattered throughout the database file. If the database is stored on a spinning disk (HDD), then the disk heads must be moved between writing non-contiguous sectors, which is extremely slow. If the database is stored on solid state storage (SSD), a similar phenomenon is encountered due to the large erase-block sizes. In general, writing to a series of contiguous disk sectors is orders of magnitude faster than updating the same number of disk sectors scattered randomly throughout a large file. Additionally, b-tree structures are prone to fragmentation, reducing the speed of range queries.

<p><i>TODO: fix the link in the next paragraph to point to a description of the log-structured-merge tree within lsm.wiki (or its successor).</i>

<p>LSM uses a <a href=lsm.wiki>different data structure</a> that makes the following performance tradeoffs relative to a b-tree:

<ul>
<li> A very large percentage of the disk sectors modified when writing to the database are contiguous. Additionally, in many cases the total number of sectors written to disk is reduced. This makes writing to an LSM database faster than writing to the equivalent b-tree.

<li> LSM databases do not suffer from fragmentation to the same degree as b-trees. This means that the performance of large range queries does not degrade as the database is updated, as it may with a b-tree.

<li> Under some circumstances, searching an LSM database for a given key will involve examining more disk sectors than it would with a b-tree. In terms of disk sectors accessed when searching a database of size N, both b-trees and LSM provide O(log(N)) efficiency, but the base of the logarithm is generally larger for a b-tree than for LSM.
</ul>

<p>In other words, writing to an LSM database should be very fast and scanning through large ranges of keys should also perform well, but searching the database for specific keys may be slightly slower than when using a b-tree based system. Additionally, avoiding random writes in favour of largely contiguous updates (as LSM does) can significantly reduce the wear on SSD or
︙
a pointer to a <a href=lsmapi.wiki#lsm_env>database environment object</a> or NULL. Almost all applications should pass NULL. A database environment object allows the application to supply custom implementations of the various operating system calls that LSM uses to read and write files, allocate heap memory, and coordinate between multiple application threads and processes. This is normally only required if LSM is being used on a platform that is not supported by default. Passing NULL instructs the library to use the default implementations of all these things. The second argument to lsm_new() is an output variable. Assuming the call is successful, *pDb is set to point to the new database handle before returning.

<p>The first argument passed to lsm_open() must be an existing database handle. The second is the name of the database file to connect to. Once lsm_open() has been successfully called on a database handle, it cannot be called again on the same handle. Attempting to do so is an LSM_MISUSE error.

<p>For example, to create a new handle and connect it to database "test.db":
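<p>As an illustrative sketch only (not the guide's own example), the sequence described above might look like the following, with error handling abbreviated:

<verbatim>
  /* Illustrative sketch: allocate a new database handle and connect it
  ** to the file "test.db". */
  lsm_db *db = 0;
  int rc;

  rc = lsm_new(0, &db);            /* NULL - use the default environment */
  if( rc==LSM_OK ){
    rc = lsm_open(db, "test.db");  /* Connect the handle to the db file  */
  }
  if( rc!=LSM_OK ){
    /* An error has occurred. Report it and clean up here. */
  }
</verbatim>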
︙
<verbatim>
  rc = lsm_close(db);
</verbatim>

<p>It is important that lsm_close() is called to close all database handles created with lsm_new(), particularly if the connection has written to the database. If an application writes to the database and then exits without closing its database connection, then subsequent clients may have to run "database recovery" when they open the database, slowing down the lsm_open() call. Additionally, not matching each successful lsm_new() call with a call to lsm_close() is a resource leak.

<p>Counter-intuitively, an lsm_close() call may fail. In this case the database handle is not closed, so if the application exits it invites the "database recovery" performance problem mentioned above. The usual reason for an lsm_close() call failing is that the database handle has been used to create <a href=lsmapi.wiki#lsm_csr_open>database cursors</a> that have not been closed. Unless all database cursors are closed before lsm_close() is called,
︙
</i>

<h1 id=performance_tuning>6. Performance Tuning</h1>

<p>This section describes the various measures that can be taken to fine-tune LSM in order to improve performance in specific circumstances. Sub-section 6.1 contains a high-level overview of the <a href=#overview_of_lsm_architecture>system architecture</a> intended to help in understanding the various performance tradeoffs and optimizations available to LSM applications.
Sub-section 6.2 identifies the <a href=#performance_related_configuration_options>configuration parameters</a> that can be used to influence database performance. Sub-section 6.3 discusses options and methods for scheduling the time-consuming processes of actually <a href=#work_and_checkpoint_scheduling>writing and syncing the database file</a> to disk. Finally, 6.4 introduces "<a href=#database_file_optimization>database optimization</a>" - the process of reorganizing a database file internally so that it is as small as possible and optimized for search queries.

<h2 id=overview_of_lsm_architecture>6.1. Overview of LSM Architecture</h2>

<p>The following steps describe the journey taken by data written to the database from the application to the database file on disk:

<ol>
  <li><p>When an application writes to an LSM database, the new data is first written to a log file and stored in an in-memory tree structure. The log file is used for database recovery - if an application crash or power failure occurs and the contents of the in-memory tree are lost, data is recovered by reading the log file.

  <li><p>Once sufficient data has been accumulated in an in-memory tree (by default "sufficient data" means 1MB, including data structure overhead), it is marked as "old" and a new "live" in-memory tree is created. An old in-memory tree is immutable - new data is always inserted into the live tree. There may be at most one old tree in memory at any time.

  <li><p>The contents of an old in-memory tree may be written into the database file at any point. Once its contents have been written (or "flushed") to the database file, the in-memory tree may be discarded. Flushing an in-memory tree to the database file creates a new database "segment". A database segment is an immutable b-tree structure stored within the database file. A single database file may contain up to 64 segments.

  <li><p>At any point, two or more existing segments within the database may be merged together into a single segment. Once their contents have been merged into the new segment, the original segments may be discarded.

  <li><p>After the set of segments in a database file has been modified (either by flushing an in-memory tree to disk or by merging existing segments together), the changes may be made persistent by "checkpointing" the database. Checkpointing involves syncing the contents of the database file to disk and updating the database file header.
</ol>

<p>Steps 3 and 4 above are known as "working" on the database. Step 5 is referred to as "checkpointing". By default, database connections perform work and checkpoint operations periodically from within calls to the API functions <code>lsm_insert</code>, <code>lsm_delete</code>, <code>lsm_delete_range</code> and <code>lsm_commit</code> (i.e. the functions that write to the database). Alternatively, work and checkpoint operations may be performed on demand using the <code>lsm_work</code> and <code>lsm_checkpoint</code> APIs. By opening a second database connection, these operations may be moved to a background thread or process.
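<p>For example, the following sketch (illustrative only - the 256KB limit and the merge count of four are arbitrary values, and these interfaces are described in detail in section 6.3) performs a bounded amount of work on demand and then checkpoints the database:

<verbatim>
  /* Sketch: perform up to 256KB of work, merging sets of four or more
  ** segments, then checkpoint the database if anything was written. */
  int nWrite = 0;              /* KB written to the db file by lsm_work() */
  int nCkpt = 0;               /* KB of data checkpointed                 */

  rc = lsm_work(db, 4, 256, &nWrite);
  if( rc==LSM_OK && nWrite>0 ){
    rc = lsm_checkpoint(db, &nCkpt);
  }
</verbatim>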
<p>Optimizing database write throughput and responsiveness is done by configuring and scheduling work and checkpoint operations, and by configuring a few other parameters, as described in the following two sections.

<p>The speed of database read operations is largely determined by the number of segments in the database file. So optimizing read operations is also linked to the configuring and scheduling of database write operations, as these policies determine the number of segments that are present in the database file at any time.

<p>Any data written to the database file since the last checkpoint may be lost if a power or application failure occurs. As a result of this, regular database checkpoints are required to ensure that unused space within the log file and database file can be reused in a timely fashion. Specifically:

<ul>
<li><p>Space within the log file cannot be recycled until the corresponding data has been written into a database segment and a checkpoint performed.

<li><p>When two or more existing segments are merged into a new segment within the database file, the space occupied by the original segments may not be recycled until after a checkpoint has been performed.
</ul>

<p>In other words, without checkpoints the system will function, but both the log and database files will grow indefinitely as the database is modified (even if the size of the dataset remains constant). Additionally, if a crash or power failure occurs, the next client to open the database file has to process all data written to the log file since the most recent checkpoint. If checkpoints are performed infrequently, this can be a time-consuming exercise.

<p>If a connection attempts to open a write transaction on the database when another connection already has an open write transaction, the attempt fails and LSM_BUSY is returned to the caller. This is because, in order to write to the database, the connection must first obtain the WRITER lock - and at most one connection may hold the WRITER lock at a time. As well as the WRITER lock, there are two other exclusive locks that may be obtained on the database - the WORKER and CHECKPOINTER locks. These are used, not surprisingly, to ensure that at most one connection attempts to work on or checkpoint the database at a time. More specifically, the roles of the three locks are:

<table valign=top>
<tr><td valign=top>WRITER<td style="width:3ex"><td>
  The WRITER lock is required to modify the contents of the in-memory tree, including marking an in-memory tree as "old" and starting a new live tree. It is also required to write to the log file.
<tr><td valign=top>WORKER<td><td>
  The WORKER lock is required to write to segments within the database file, either when merging two or more existing segments within the database or when flushing an in-memory tree to disk to create a new segment.
<tr><td valign=top>CHECKPOINTER<td><td>
  The CHECKPOINTER lock is required to write to the database file header.
</table>

<p>The three locks are independent. It is possible to simultaneously have one client writing to the database, another working on the database file and a third performing a checkpoint operation.

<h2 id=performance_related_configuration_options>6.2. Performance Related Configuration Options</h2>

<p>The options in this section are all set to integer values. They may be set and queried using the <a href=lsmapi.wiki#lsm_config>lsm_config()</a> function. To set an option to a value, lsm_config() is used as follows:

<verbatim>
  /* Set the LSM_CONFIG_AUTOFLUSH option to 1MB (1024 KB) */
  int iVal = 1024;
  rc = lsm_config(db, LSM_CONFIG_AUTOFLUSH, &iVal);
</verbatim>

<p>In order to query the current value of an option, the initial value of the parameter (iVal in the example code above) should be set to a negative value,
or any other value that is out of range for the parameter (negative values happen to be out of range for all integer lsm_config() parameters):

<verbatim>
  /* Set iVal to the current value of LSM_CONFIG_AUTOFLUSH */
  int iVal = -1;
  rc = lsm_config(db, LSM_CONFIG_AUTOFLUSH, &iVal);
</verbatim>

<dl>
<dt><a href=lsmapi.wiki#LSM_CONFIG_AUTOCHECKPOINT>LSM_CONFIG_AUTOCHECKPOINT</a>
<dd><p style=margin-top:0>
  This option determines how often the database is checkpointed (synced to disk). A checkpoint is performed after N KB (approximately) have been written to the database file, where N is the value of this option. Increasing this value (say to 4MB or even 8MB) may improve overall write throughput.
  <p>The default value is 2048 (2MB).

<dt><a href=lsmapi.wiki#LSM_CONFIG_AUTOFLUSH>LSM_CONFIG_AUTOFLUSH</a>
<dd><p style=margin-top:0>
  This option determines how much data (in KB) is allowed to accumulate in a live in-memory tree before it is marked as "old" (and made eligible to be flushed through to the database file). Increasing this value may improve overall write throughput. Decreasing it reduces memory usage.
  <p>The default value is 1024 (1MB).

<dt><a href=lsmapi.wiki#LSM_CONFIG_AUTOMERGE>LSM_CONFIG_AUTOMERGE</a>
<dd><p style=margin-top:0>
  If auto-work (the LSM_CONFIG_AUTOWORK option below) is enabled, then this option is set to the number of segments that the library attempts to merge simultaneously. Increasing this value may reduce the total amount of data written to the database file. Decreasing it increases the amount of data written to the file, but also decreases the average number of segments present in the file, which can improve the performance of database read operations.
  <p><span style=color:red>If auto-work is not enabled...</span>
  <p>The default value is 4. This option must be set to a value between 2 and 8, inclusive.

<dt><a href=lsmapi.wiki#LSM_CONFIG_AUTOWORK>LSM_CONFIG_AUTOWORK</a>
<dd><p style=margin-top:0>
  This option may be set to either 1 (true) or 0 (false). If it is set to true, then work and checkpoint operations are automatically scheduled within calls to lsm_insert(), lsm_delete(), lsm_delete_range() and lsm_commit(). Otherwise, if it is set to false, these operations must be explicitly invoked by the application. See <span style=color:red>some link here</span> for details.
  <p>The default value is 1.

<dt><a href=lsmapi.wiki#LSM_CONFIG_MMAP>LSM_CONFIG_MMAP</a>
<dd><p style=margin-top:0>
  If LSM is running on a system with a 64-bit address space, this option may be set to either 1 (true) or 0 (false). On a 32-bit platform, it is always set to 0.
  <p>If it is set to true, the entire database file is memory mapped. If it is false, data is accessed using ordinary OS file read and write primitives. Memory mapping the database file can significantly improve the performance of read operations, as database pages do not have to be copied from operating system buffers into user space buffers before they can be examined.
  <p>This option can only be set before lsm_open() is called on the database connection.
  <p>The default value is 1 (true) on a 64-bit platform, and 0 otherwise.

<dt><a href=lsmapi.wiki#LSM_CONFIG_MULTIPLE_PROCESSES>LSM_CONFIG_MULTIPLE_PROCESSES</a>
<dd><p style=margin-top:0>
  This option may also be set to either 1 (true) or 0 (false). The default value is 1 (true). If it is set to false, then the library assumes that all database clients are located within the same process (i.e. have access to the same memory space).
This assumption means that the library can avoid using OS file locking primitives to lock the database file, which speeds up opening and closing read and write transactions.
  <p>This option can only be set before lsm_open() is called on the database connection.
  <p>If this option is set to false and there is already a connection to the database from another process when lsm_open() is called, the lsm_open() call fails with error code LSM_BUSY. <span style=color:red>todo: It doesn't actually do this yet. But it should...</span>

<dt><a href=lsmapi.wiki#LSM_CONFIG_SAFETY>LSM_CONFIG_SAFETY</a>
<dd><p style=margin-top:0>
  The effect of this option on <a href=#data_durability>data durability</a> is described above.
  <p>From a performance point of view, this option determines how often the library pauses to wait for data written to the file-system to be stored on the persistent media (e.g. hard disk or solid-state memory). This is also known as "syncing" data to disk. Since this is orders of magnitude slower than simply copying data into operating system buffers, the value of this option has a large effect on write performance.
  <p>If LSM_CONFIG_SAFETY is set to 2 (FULL), then the library syncs the data written to the log file to disk whenever a transaction is committed. If LSM_CONFIG_SAFETY is set to 1 (NORMAL), then data is only synced to disk when a checkpoint is performed (see above). Finally, if it is set to 0 (OFF), then no data is ever synced to disk.
  <p>The default value is 1 (NORMAL).

<dt><a href=lsmapi.wiki#LSM_CONFIG_USE_LOG>LSM_CONFIG_USE_LOG</a>
<dd><p style=margin-top:0>
  This is another option that may be set to either 1 (true) or 0 (false). The default value is 1 (true). If it is set to false, then the library does not write data into the database log file. This makes writing faster, but also means that if an application crash or power failure occurs, it is very likely that any recently committed transactions will be lost.
  <p>If this option is set to true, then an application crash cannot cause data loss. Whether or not data loss may occur in the event of a power failure depends on the value of the <a href=#data_durability>LSM_CONFIG_SAFETY</a> parameter.
  <p>This option can only be set if the connection does not currently have an open write transaction.
</dl>
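<p>To make the options above concrete, the following sketch (illustrative values only, not recommendations) configures a freshly allocated connection for a bulk write workload before lsm_open() is called - a larger in-memory tree, less frequent checkpoints, no syncing to disk and single-process access:

<verbatim>
  /* Illustrative sketch only: configure connection db before lsm_open(). */
  int iVal;

  iVal = 4096;               /* Flush in-memory trees at 4MB             */
  rc = lsm_config(db, LSM_CONFIG_AUTOFLUSH, &iVal);

  iVal = 8192;               /* Checkpoint after roughly 8MB of writes   */
  if( rc==LSM_OK ) rc = lsm_config(db, LSM_CONFIG_AUTOCHECKPOINT, &iVal);

  iVal = 0;                  /* OFF - data may be lost on power failure  */
  if( rc==LSM_OK ) rc = lsm_config(db, LSM_CONFIG_SAFETY, &iVal);

  iVal = 0;                  /* Database is used by this process only    */
  if( rc==LSM_OK ) rc = lsm_config(db, LSM_CONFIG_MULTIPLE_PROCESSES, &iVal);

  if( rc==LSM_OK ) rc = lsm_open(db, "test.db");
</verbatim>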
<h2 id=work_and_checkpoint_scheduling>6.3. Work and Checkpoint Scheduling</h2>

<h3 id=automatic_scheduling>6.3.1. Automatic Scheduling</h3>

<p>This section describes how work and checkpoint operations are scheduled if the boolean LSM_CONFIG_AUTOWORK parameter is set to true. Automatic work operations may occur within calls to any of the following API functions:

<ul>
<li>lsm_insert()
<li>lsm_delete()
<li>lsm_delete_range()
<li>lsm_commit()
</ul>

<p>Each time a transaction is committed in auto-work mode, the library checks to see if there exists an "old" in-memory tree (see the LSM_CONFIG_AUTOFLUSH option above). If so, it attempts to flush it to disk immediately. Unlike merges of existing segments, the entire in-memory tree must be flushed to disk before control is returned to the user. It is not possible to incrementally flush an in-memory tree in the same way as it is possible to incrementally merge existing database segments together.

<p>Each segment in the database file is assigned an "age" - an integer zero or greater indicating how many times the data in the segment has been merged. A segment created by flushing the in-memory tree to disk is assigned an age of 1. When two or more segments with age=1 are merged together to create a larger segment, the new segment is assigned an age of 2. And so on.

<p>Assuming auto-work is enabled, the library periodically checks the state of the database file to see if there exist N or more segments with the same age value A, where N is the value assigned to the LSM_CONFIG_AUTOMERGE parameter. If so, work is done to merge all such segments with age=A into a new, larger segment assigned age=A+1. At present, "periodically" as used above means roughly once for every 32KB of data (including overhead) written to the in-memory tree. The merge operation is not necessarily completed within a single call to a write API (this would result in blocking the writer thread for too long in many cases - in large databases segments may grow to be many GB in size). Currently, the amount of data written by a single auto-work operation is roughly 32KB multiplied by the number of segments in the database file. This formula may change - the point is that the library attempts to limit the amount of data written in order to avoid blocking the writer thread for too long within a single API call.

<p>Checkpoint operations are scheduled based on the value assigned to the LSM_CONFIG_AUTOCHECKPOINT configuration parameter.

<p>In order to automatically perform work and checkpoint operations, the client must obtain the WORKER and CHECKPOINTER locks, respectively. If an attempt to obtain either of these locks fails (because some other client is already holding it), it is not an error; the scheduled work or checkpoint is simply not performed.

<h3 id=explicit_scheduling>6.3.2. Explicit Scheduling</h3>

<p>The alternative to automatic scheduling of work and checkpoint operations is to explicitly schedule them - possibly in a background thread or dedicated application process. In order to disable automatic work, a client must set the LSM_CONFIG_AUTOWORK parameter to zero. This parameter is a property of a database connection, not of a database itself, so it must be cleared separately by all processes that may write to the database. Otherwise, they may attempt automatic database work or checkpoints.

<verbatim>
  /* Disable auto-work on connection db */
  int iVal = 0;
  lsm_config(db, LSM_CONFIG_AUTOWORK, &iVal);
</verbatim>

<p>The lsm_work() function is used to explicitly perform work on the database:

<verbatim>
  int lsm_work(lsm_db *db, int nMerge, int nKB, int *pnWrite);
</verbatim>

<p>Parameter nKB is passed a limit on the number of KB of data that should be written to the database file before the call returns. It is a hint only; the library does not honor this limit strictly.

<p>If the database has an old in-memory tree when lsm_work() is called, it is flushed to disk. If this means that more than nKB KB of data is written to the database file, no further work is performed. Otherwise, the number of KB written is subtracted from nKB before proceeding.

<p>If parameter nMerge is greater than 1, then the library searches for nMerge or more segments of the same age within the database file and performs up to nKB KB of work to merge them together. If the merge is completed before the nKB limit is exceeded, the library searches for another set of nMerge or more segments to work on, and so on. If at any point no such set of nMerge segments can be found, the call returns without performing any further work.

<p>Calling lsm_work() with the nMerge argument set to 1 is used to "optimize" the database (see below).
Passing a value of zero or less for the nMerge parameter is an error.

<p>In any case, before returning, the value of *pnWrite is set to the actual number of KB written to the database file.

<p>The example code below might be executed in a background thread or process in order to perform database work and checkpointing. In this case all other clients should set the LSM_CONFIG_AUTOWORK parameter to zero.

<verbatim>
  int rc;
︙
    ** time, things may have changed (the other process may have relinquished
    ** the WORKER lock, or an in-memory tree may have been marked as old). */
    if( nWrite==0 ) sleep(1);
  }
</verbatim>

<p>The mechanism associated with the LSM_CONFIG_AUTOCHECKPOINT configuration parameter applies to data written by both automatically scheduled work and work performed by calls to the lsm_work() function. The amount of uncheckpointed data that has been written into the database file is a property of the database file, not of a single connection, so checkpoints occur at the configured interval even if multiple connections are used to work on the database.

<p>Alternatively, checkpoint operations may be scheduled separately. If the LSM_CONFIG_AUTOCHECKPOINT parameter is set to zero, then a connection never performs a database checkpoint, regardless of how much data it or any other connection writes into the database file. As with LSM_CONFIG_AUTOWORK, this parameter must be zeroed for all connections that may perform work on the database. Otherwise, they may perform a checkpoint operation.

<p>The <a href=lsmapi.wiki#lsm_checkpoint>lsm_checkpoint()</a> API is used to explicitly request a checkpoint:

<verbatim>
  int lsm_checkpoint(lsm_db *db, int *pnKB);
</verbatim>

<p>If no work has been performed on the database since the previous checkpoint, lsm_checkpoint() sets *pnKB to zero and returns immediately. Otherwise, it checkpoints the database and sets *pnKB to the number of KB of data written to the database file since the previous checkpoint.

<p>A database may be queried for the number of KB written to the database since the most recent checkpoint using the <a href=lsmapi.wiki#lsm_info>lsm_info()</a> API function, as follows:

<verbatim>
  int nCkpt;
  rc = lsm_info(db, LSM_INFO_CHECKPOINT_SIZE, &nCkpt);
</verbatim>

<p>It may also be queried for the size of the in-memory tree or trees. The following block of code sets variable nLive to the size of the current live in-memory tree in KB, and nOld to the size of the old in-memory tree in KB (or 0 if there is no old in-memory tree):

<verbatim>
  int nOld, nLive;
  rc = lsm_info(db, LSM_INFO_TREE_SIZE, &nOld, &nLive);
</verbatim>
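<p>As an illustration only (the 4MB threshold below is an arbitrary value), a connection dedicated to checkpointing might combine these two interfaces as follows:

<verbatim>
  /* Sketch: checkpoint the database once more than 4MB (4096KB) of
  ** uncheckpointed data has accumulated. Assumes LSM_CONFIG_AUTOCHECKPOINT
  ** has been set to zero on all connections that work on the database. */
  int nCkpt = 0;
  rc = lsm_info(db, LSM_INFO_CHECKPOINT_SIZE, &nCkpt);
  if( rc==LSM_OK && nCkpt>4096 ){
    rc = lsm_checkpoint(db, &nCkpt);
  }
</verbatim>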
<h3 id=compulsary_work_and_checkpoints>6.3.3. Compulsory Work and Checkpoints</h3>

<p>There are three scenarios where database work or checkpointing may be performed automatically, regardless of the value of the LSM_CONFIG_AUTOWORK parameter:

<ul>
<li> When closing a database connection,
<li> When the number of segments with a common age in the database file grows unacceptably high, and
<li> When the total number of segments in the database file grows unacceptably high.
</ul>

<p>Whenever an lsm_close() call would mean that the total number of connections to a database drops to zero, the connection checks if the in-memory tree is empty. If not, it is flushed to disk. Both the live and old in-memory trees are flushed to disk in this case. It also checks if the database file has been modified since the most recent checkpoint was performed. If so, it also performs a checkpoint. Finally, assuming no error has occurred, it deletes the log file.

<p>Additionally, whenever a worker wishes to flush an in-memory tree to a new age=1 segment, it must first ensure that there are fewer than (N+1) existing age=1 segments, where N is the value that the LSM_CONFIG_AUTOMERGE parameter is set to. If there are already (N+1) or more age=1 segments, they must be merged into an age=2 segment before a new age=1 segment can be created within the database file. Similar rules apply to segments of other ages - it is not possible to create a new age=I segment if there are already (N+1) segments with age=I in the database file. This has two implications:

<p>This scenario should never come about if all connections that write to the database have auto-work enabled. It only occurs if auto-work is disabled and the lsm_work() function is called too infrequently. In this case it is possible that flushing an in-memory tree may require writing a tremendous amount of data to disk (possibly even rewriting the entire database file).

<p>Finally, regardless of age, a database is limited to a maximum of 64 segments in total. If an attempt is made to flush an in-memory tree to disk when the database already contains 64 segments, two or more existing segments must be merged together before the new segment can be created.

<h2 id=database_file_optimization>6.4. Database File Optimization</h2>

<p>Database optimization transforms the contents of the database file so that the following are true:

<ul>
<li><p>All database content is stored in a single <a href=#overview_of_lsm_architecture>segment</a>. This makes the database effectively equivalent to an optimally packed b-tree structure for search operations - minimizing the number of disk sectors that need to be visited when searching the database.

<li><p>The database file contains no (or as little as possible) free space. In other words, it is no larger than required to contain the single segment.
</ul>

<p>In order to optimize the database, lsm_work() should be called with the nMerge argument set to 1 and the third parameter set to a negative value (interpreted as "keep working until there is no more work to do"). For example:

<verbatim>
  rc = lsm_work(db, 1, -1, 0);
</verbatim>

<p>When optimizing the database as above, either the LSM_CONFIG_AUTOCHECKPOINT parameter should be set to a non-zero value or lsm_checkpoint() should be called periodically. Otherwise, no checkpoints will be performed, preventing the library from reusing any space occupied by old segments even after their content has been merged into the new segment. The result - a database file that is optimized, except that it is up to twice as large as it otherwise would be.
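<p>As a closing sketch (illustrative only, and assuming LSM_CONFIG_AUTOCHECKPOINT has been set to zero on the connection), an optimization pass might therefore interleave lsm_work() and lsm_checkpoint() calls until no further work remains:

<verbatim>
  /* Sketch: optimize the database in bounded steps, checkpointing after
  ** each step so that space occupied by old segments can be reused. The
  ** 2048KB (2MB) step size is an arbitrary value. */
  int nWrite;
  do{
    nWrite = 0;
    rc = lsm_work(db, 1, 2048, &nWrite);
    if( rc==LSM_OK && nWrite>0 ){
      int nCkpt = 0;
      rc = lsm_checkpoint(db, &nCkpt);
    }
  }while( rc==LSM_OK && nWrite>0 );
</verbatim>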