SQLite4
Check-in [1ea9187820]
Not logged in

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
SHA1 Hash:1ea91878201f0998b35a6843da7c9e4c51bceb3d
Date: 2012-11-14 20:09:24
User: dan
Comment:Updates to lsmusr.wiki.
Tags And Properties
Changes
hide diffs unified diffs patch

Changes to www/lsmusr.wiki

2 <title>LSM Users Guide</title> 2 <title>LSM Users Guide</title> 3 <nowiki> 3 <nowiki> 4 4 5 <h2>Table of Contents</h2> 5 <h2>Table of Contents</h2> 6 6 7 7 8 8 > 9 9 10 10 <div id=start_of_toc></div> 11 <div id=start_of_toc></div> 11 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href=#introduction_to_lsm style=text-deco 12 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href=#introduction_to_lsm style=text-deco 12 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href=#using_lsm_in_applications style=tex 13 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href=#using_lsm_in_applications style=tex 13 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href=#basic_usage style=text-decoration:n 14 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href=#basic_usage style=text-decoration:n 14 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href= 15 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href= 15 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href= 16 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href= 16 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href= 17 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href= 17 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href= 18 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href= 18 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href=#data_durability style=text-decorati 19 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href=#data_durability style=text-decorati 19 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href=#compressed_and_encrypted_databases 20 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href=#compressed_and_encrypted_databases 20 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href=#performance_tuning style=text-decor 21 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href=#performance_tuning style=text-decor 21 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href= | 22 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href= 22 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href= | 23 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href= > 24 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&n 23 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&n | 25 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&n 24 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&n | 26 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&n 25 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&n | 27 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&n 26 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href= | 28 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href= 27 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href= < 28 29 29 <div id=end_of_toc></div> 30 <div id=end_of_toc></div> 30 31 31 <h2>Overview</h2> 32 <h2>Overview</h2> 32 33 33 <p>This document describes the LSM embedded database library and use thereof. 34 <p>This document describes the LSM embedded database library and use thereof. 34 It is intended to be part user-manual and part tutorial. It is intended to 35 It is intended to be part user-manual and part tutorial. It is intended to 35 to complement the <a href=lsmapi.wiki>LSM API reference manual</a>. | 36 complement the <a href=lsmapi.wiki>LSM API reference manual</a>. 36 37 37 <p>The <a href=#introduction_to_lsm>first section</a> of this document contains 38 <p>The <a href=#introduction_to_lsm>first section</a> of this document contains 38 a description of the LSM library and its features. 39 a description of the LSM library and its features. 39 <a href=#using_lsm_in_applications>Section 2</a> describes how to use LSM from 40 <a href=#using_lsm_in_applications>Section 2</a> describes how to use LSM from 40 within a C or C++ application (how to compile and link LSM, what to #include 41 within a C or C++ application (how to compile and link LSM, what to #include 41 etc.). The <a href=#basic_usage>third section</a> describes the essential APIs 42 etc.). The <a href=#basic_usage>third section</a> describes the essential APIs 42 that applications use to open and close database connections, and to read from 43 that applications use to open and close database connections, and to read from ................................................................................................................................................................................ 766 <p><i>Maybe there should be a way to register a mismatch-handler callback. 767 <p><i>Maybe there should be a way to register a mismatch-handler callback. 767 Otherwise, applications have to handle LSM_MISMATCH everywhere... 768 Otherwise, applications have to handle LSM_MISMATCH everywhere... 768 </i> 769 </i> 769 770 770 771 771 <h1 id=performance_tuning>6. Performance Tuning</h1> 772 <h1 id=performance_tuning>6. Performance Tuning</h1> 772 773 > 774 <p> This section describes the various measures that can be taken in order to > 775 fine-tune LSM in order to improve performance in specific circumstances. > 776 Sub-section 6.1 identifies the > 777 <a href=#performance_related_configuration_options> configuration > 778 parameters</a> that can be used to influence database performance. > 779 Sub-section 6.2 discusses methods for shifting the time-consuming processes of > 780 actually writing and syncing the database file to > 781 <a href=#using_worker_threads_or_processes>background threads or processes</a> > 782 in order to make writing to the database more responsive. Finally, 6. > 783 3 introduces "<a href=#database_file_optimization>database optimization</a>" > 784 - the process of reorganizing a database file internally so that it is as small > 785 as possible and optimized for search queries. > 786 > 787 <h2 id=performance_related_configuration_options>6.1. Performance Related Config > 788 > 789 <p>The options in this section all take integer values. They may be both > 790 set and queried using the <a href=lsmapi.wiki#lsm_config>lsm_config()</a> > 791 function. To set an option to a value, lsm_config() is used as follows: > 792 > 793 <verbatim> > 794 /* Set the LSM_CONFIG_AUTOFLUSH option to 1MB */ > 795 int iVal = 1 * 1024 * 1024; > 796 rc = lsm_config(db, LSM_CONFIG_AUTOFLUSH, &iVal); > 797 </verbatim> > 798 > 799 <p>In order to query the current value of an option, the initial value of > 800 the parameter (iVal in the example code above) should be set to a negative > 801 value. Or any other value that happens to be out of range for the parameter - > 802 negative values just happen to be out of range for all integer lsm_config() > 803 parameters. > 804 > 805 <verbatim> > 806 /* Set iVal to the current value of LSM_CONFIG_AUTOFLUSH */ > 807 int iVal = -1; > 808 rc = lsm_config(db, LSM_CONFIG_AUTOFLUSH, &iVal); > 809 </verbatim> > 810 > 811 <dl> > 812 <dt> <a href=lsmapi.wiki#LSM_CONFIG_MMAP>LSM_CONFIG_MMAP</a> > 813 <dd> <p style=margin-top:0> > 814 This option may be set to either 1 (true) or 0 (false). If it is set to > 815 true and LSM is running on a system with a 64-bit address space, the > 816 entire database file is memory mapped. Or, if it is false or LSM is > 817 running in a 32-bit address space, data is accessed using ordinary > 818 OS file read and write primitives. Memory mapping the database file > 819 can significantly improve the performance of read operations, as database > 820 pages do not have to be copied from operating system buffers into user > 821 space buffers before they can be examined. > 822 > 823 <p>This option can only be set before lsm_open() is called on the database > 824 connection. > 825 > 826 <p>The default value is 1 (true). > 827 > 828 <dt> <a href=lsmapi.wiki#LSM_CONFIG_MULTIPLE_PROCESSES>LSM_CONFIG_MULTIPLE_PRO > 829 <dd> <p style=margin-top:0> > 830 This option may also be set to either 1 (true) or 0 (false). If it is > 831 set to 0, then the library assumes that all database clients are located > 832 within the same process (have access to the same memory space). Assuming > 833 this means the library can avoid using OS file locking primitives to lock > 834 the database file, which speeds up opening and closing read and write > 835 transactions. > 836 > 837 <p>This option can only be set before lsm_open() is called on the database > 838 connection. > 839 > 840 <p>The default value is 1 (true). > 841 > 842 <dt> <a href=lsmapi.wiki#LSM_CONFIG_USE_LOG>LSM_CONFIG_USE_LOG</a> > 843 <dd> <p style=margin-top:0> > 844 This is another option may also be set to either 1 (true) or 0 (false). > 845 If it is set to false, then the library does not write data into the > 846 database log file. This makes writing faster, but also means that if > 847 an application crash or power failure occurs, it is very likely that > 848 any recently committed transactions will be lost. > 849 > 850 <p>If this option is set to true, then an application crash cannot cause > 851 data loss. Whether or not data loss may occur in the event of a power > 852 failure depends on the value of the <a href=#data_durability> > 853 LSM_CONFIG_SAFETY</a> parameter. > 854 > 855 <p>This option can only be set if the connection does not currently have > 856 an open write transaction. > 857 > 858 <p>The default value is 1 (true). > 859 > 860 <dt> <a href=lsmapi.wiki#LSM_CONFIG_AUTOFLUSH>LSM_CONFIG_AUTOFLUSH</a> > 861 <dd> <p style=margin-top:0> > 862 > 863 <dt> <a href=lsmapi.wiki#LSM_CONFIG_AUTOCHECKPOINT>LSM_CONFIG_AUTOCHECKPOINT</ > 864 <dd> <p style=margin-top:0> > 865 > 866 </dl> > 867 > 868 <h2 id=using_worker_threads_or_processes>6.2. Using Worker Threads or Processes > 869 > 870 <p><i>Todo: Fix the following </p> > 871 > 872 <p>The section above describes the three stages of transfering data written > 873 to the database from the application to persistent storage. A "writer" > 874 client writes the data into the in-memory tree and log file. Later on a > 875 "worker" client flushes the data from the in-memory tree to a new segment > 876 in the the database file. Additionally, a worker client must periodically > 877 merge existing database segments together to prevent them from growing too > 878 numerous. > 879 773 <h2 id=architectural_overview>6.1. Architectural Overview </h2> | 880 <h3 id=architectural_overview>6.2.1. Architectural Overview </h3> 774 881 775 <p> The LSM library implements two separate data structures that are used 882 <p> The LSM library implements two separate data structures that are used 776 together to store user data. When the database is queried, the library 883 together to store user data. When the database is queried, the library 777 actually runs parallel queries on both of these data stores and merges the 884 actually runs parallel queries on both of these data stores and merges the 778 results together to return to the user. The data structures are: 885 results together to return to the user. The data structures are: 779 886 780 <ul> 887 <ul> ................................................................................................................................................................................ 903 database file header (to checkpoint the database). 1010 database file header (to checkpoint the database). 904 </table> 1011 </table> 905 1012 906 <p>The tasks associated with each of the locks above may be performed 1013 <p>The tasks associated with each of the locks above may be performed 907 concurrently by multiple database connections, located either in the same 1014 concurrently by multiple database connections, located either in the same 908 application process or different processes. 1015 application process or different processes. 909 1016 910 <h2 id=work_and_checkpoint_scheduling>6.2. Work and Checkpoint Scheduling </h2> < 911 < 912 <p>The section above describes the three stages of transfering data written < 913 to the database from the application to persistent storage. A "writer" < 914 client writes the data into the in-memory tree and log file. Later on a < 915 "worker" client flushes the data from the in-memory tree to a new segment < 916 in the the database file. Additionally, a worker client must periodically < 917 merge existing database segments together to prevent them from growing too < 918 numerous. < 919 < 920 <h3 id=automatic_work_and_checkpoint_scheduling>6.2.1. Automatic Work and Checkp | 1017 <h3 id=automatic_work_and_checkpoint_scheduling>6.2.2. Automatic Work and Checkp 921 1018 922 <p>By default, database "work" (the flushing and merging of segments, performed 1019 <p>By default, database "work" (the flushing and merging of segments, performed 923 by clients holding the WORKER lock) and checkpointing are scheduled and 1020 by clients holding the WORKER lock) and checkpointing are scheduled and 924 performed automatically from within calls to "write" API functions. The 1021 performed automatically from within calls to "write" API functions. The 925 "write" functions are: 1022 "write" functions are: 926 1023 927 <ul> 1024 <ul> ................................................................................................................................................................................ 1014 than zero, after performing database work, the library automatically checks 1111 than zero, after performing database work, the library automatically checks 1015 how many bytes of raw data have been written to the database file since the 1112 how many bytes of raw data have been written to the database file since the 1016 last checkpoint (by any client, not just by the current client). If this 1113 last checkpoint (by any client, not just by the current client). If this 1017 value is greater than the value of the LSM_CONFIG_AUTOCHECKPOINT parameter, 1114 value is greater than the value of the LSM_CONFIG_AUTOCHECKPOINT parameter, 1018 a checkpoint is attempted. It is not an error if the attempt fails because the 1115 a checkpoint is attempted. It is not an error if the attempt fails because the 1019 CHECKPOINTER lock cannot be obtained. 1116 CHECKPOINTER lock cannot be obtained. 1020 1117 1021 <h3 id=explicit_work_and_checkpoint_scheduling>6.2.2. Explicit Work and Checkpoi | 1118 <h3 id=explicit_work_and_checkpoint_scheduling>6.2.3. Explicit Work and Checkpoi 1022 1119 1023 <p>The alternative to automatic scheduling of work and checkpoint operations 1120 <p>The alternative to automatic scheduling of work and checkpoint operations 1024 is to explicitly schedule them. Possibly in a background thread or dedicated 1121 is to explicitly schedule them. Possibly in a background thread or dedicated 1025 application process. In order to disable automatic work, a client must set 1122 application process. In order to disable automatic work, a client must set 1026 the LSM_CONFIG_AUTOWORK parameter to zero. This parameter is a property of 1123 the LSM_CONFIG_AUTOWORK parameter to zero. This parameter is a property of 1027 a database connection, not of a database itself, so it must be cleared 1124 a database connection, not of a database itself, so it must be cleared 1028 separately by all processes that may write to the database. Otherwise, they 1125 separately by all processes that may write to the database. Otherwise, they ................................................................................................................................................................................ 1135 rc = lsm_info(db, LSM_INFO_TREE_SIZE, &nOld, &nLive); 1232 rc = lsm_info(db, LSM_INFO_TREE_SIZE, &nOld, &nLive); 1136 </verbatim> 1233 </verbatim> 1137 1234 1138 <verbatim> 1235 <verbatim> 1139 int lsm_flush(lsm_db *db); 1236 int lsm_flush(lsm_db *db); 1140 </verbatim> 1237 </verbatim> 1141 1238 1142 <h3 id=compulsary_work_and_checkpoint_scheduling>6.2.3. Compulsary Work and Chec | 1239 <h3 id=compulsary_work_and_checkpoint_scheduling>6.2.4. Compulsary Work and Chec 1143 1240 1144 <p>Apart from the scenarios described above, there are two there are two 1241 <p>Apart from the scenarios described above, there are two there are two 1145 scenarios where database work or checkpointing may be performed automatically, 1242 scenarios where database work or checkpointing may be performed automatically, 1146 regardless of the value of the LSM_CONFIG_AUTOWORK parameter. 1243 regardless of the value of the LSM_CONFIG_AUTOWORK parameter. 1147 1244 1148 <ul> 1245 <ul> 1149 <li> When closing a database connection, and 1246 <li> When closing a database connection, and ................................................................................................................................................................................ 1180 </ul> 1277 </ul> 1181 1278 1182 <p>Finally, regardless of age, a database is limited to a maximum of 64 1279 <p>Finally, regardless of age, a database is limited to a maximum of 64 1183 segments in total. If an attempt is made to flush an in-memory tree to disk 1280 segments in total. If an attempt is made to flush an in-memory tree to disk 1184 when the database already contains 64 segments, two or more existing segments 1281 when the database already contains 64 segments, two or more existing segments 1185 must be merged together before the new segment can be created. 1282 must be merged together before the new segment can be created. 1186 1283 1187 <h2 id=database_optimization>6.3. Database Optimization</h2> | 1284 <h2 id=database_file_optimization>6.3. Database File Optimization</h2> 1188 1285 1189 <p>Database optimization transforms the contents of database file so that 1286 <p>Database optimization transforms the contents of database file so that 1190 the following are true: 1287 the following are true: 1191 1288 1192 <ul> 1289 <ul> 1193 <li> All database content is stored in a single segment. | 1290 <li> <p>All database content is stored in a single > 1291 <a href=#architectural_overview>segment</a>. This makes the > 1292 database effectively equivalent to an optimally packed b-tree stucture > 1293 for search operations - minimizing the number of disk sectors that need > 1294 to be visted when searching the database. > 1295 1194 <li> The database file contains no (or as little as possible) free space. | 1296 <li> <p>The database file contains no (or as little as possible) free space. 1195 In other words, it is no larger than required to contain the single 1297 In other words, it is no larger than required to contain the single 1196 segment. 1298 segment. 1197 </ul> 1299 </ul> > 1300 > 1301 <p><i> Should we add a convenience function lsm_optimize() that does not > 1302 return until the database is completely optimized? One that more or less does > 1303 the same as the example code below and deals with the AUTOCHECKPOINT issue? > 1304 This would help with this user manual if nothing else, as it means a method > 1305 for database optimization can be presented without depending on the previous > 1306 section. > 1307 > 1308 </i> 1198 1309 1199 <p>In order to optimize the database, lsm_work() should be called repeatedly 1310 <p>In order to optimize the database, lsm_work() should be called repeatedly 1200 with the nMerge argument set to 1 until it returns without writing any data 1311 with the nMerge argument set to 1 until it returns without writing any data 1201 to the database file. For example: 1312 to the database file. For example: 1202 1313 1203 <verbatim> 1314 <verbatim> 1204 int nWrite; 1315 int nWrite; 1205 int rc; 1316 int rc; 1206 do { 1317 do { 1207 rc = lsm_work(db, 1, 2*1024*1024, &nWrite); 1318 rc = lsm_work(db, 1, 2*1024*1024, &nWrite); 1208 }while( rc==LSM_OK && nWrite>0 ); 1319 }while( rc==LSM_OK && nWrite>0 ); 1209 </verbatim> 1320 </verbatim> 1210 1321 1211 <p>When optimizing the database as above, the LSM_CONFIG_AUTOCHECKPOINT | 1322 <p>When optimizing the database as above, either the LSM_CONFIG_AUTOCHECKPOINT 1212 parameter should be set to a non-zero value, or otherwise lsm_checkpoint() | 1323 parameter should be set to a non-zero value or lsm_checkpoint() should be 1213 should be called periodically. Otherwise, no checkpoints will be performed, | 1324 called periodically. Otherwise, no checkpoints will be performed, preventing 1214 preventing the library from reusing any space occupied by old segments even | 1325 the library from reusing any space occupied by old segments even after their 1215 after their content has been merged into the new segment. The result - a | 1326 content has been merged into the new segment. The result - a database file that 1216 database file that is optimized, except that it is up to twice as large as | 1327 is optimized, except that it is up to twice as large as it otherwise would be. 1217 it otherwise would be. < 1218 1328 1219 <h2 id=other_parameters>6.4. Other Parameters </h2> < 1220 1329 1221 <i> < 1222 <p>Mention other configuration options that can be used to tune performance < 1223 here. < 1224 < 1225 <ul> < 1226 <li> LSM_CONFIG_MMAP < 1227 <li> LSM_CONFIG_MULTIPLE_PROCESSES < 1228 <li> LSM_CONFIG_USE_LOG < 1229 </ul> < 1230 < 1231 </i> < 1232 < 1233 < 1234 < 1235 < 1236 <