Overview
Comment: | Add a documentation page that overviews Lemon, its history, and its importance to SQLite. |
---|---|
SHA3-256: | 106ae9b8dfcb58082e3fa402a7dd12a1 |
User & Date: | drh 2018-01-04 16:33:54.898 |
Context
2018-01-04
16:37 | Fix typo in the new lemon document. (check-in: ca3748636f user: drh tags: trunk)
16:33 | Add a documentation page that overviews Lemon, its history, and its importance to SQLite. (check-in: 106ae9b8df user: drh tags: trunk)
03:33 | Update the change log for the 3.22.0 release. (check-in: a897222d15 user: drh tags: trunk)
Changes
Changes to pages/amalgamation.in.
31 32 33 34 35 36 37 | in the [https://www.sqlite.org/src | SQLite version control system] and are edited manually in an ordinary text editor. But some of the C-language files are generated using scripts or auxiliary programs. For example, the [https://www.sqlite.org/src/artifact?ci=trunk&filename=src/parse.y|parse.y] file contains an LALR(1) grammar of the SQL language which is compiled down into a parser in files "parse.c" and "parse.h" by the | | | 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 | in the [https://www.sqlite.org/src | SQLite version control system] and are edited manually in an ordinary text editor. But some of the C-language files are generated using scripts or auxiliary programs. For example, the [https://www.sqlite.org/src/artifact?ci=trunk&filename=src/parse.y|parse.y] file contains an LALR(1) grammar of the SQL language which is compiled down into a parser in files "parse.c" and "parse.h" by the [Lemon parser generator]. </p> <p>The makefiles for SQLite have an "sqlite3.c" target for building the file we call "the amalgamation". The amalgamation is a single C code file, named "sqlite3.c", that contains all C code for the core SQLite library and the [FTS3], [FTS5], [RTREE], |
Changes to pages/arch.in.
75 76 77 78 79 80 81 | the tokenizer call the parser is better, though, because it can be made threadsafe and it runs faster.</p> <h1>Parser</h1> <p>The parser assigns meaning to tokens based on their context. The parser for SQLite is generated using the | < | | 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 | the tokenizer call the parser is better, though, because it can be made threadsafe and it runs faster.</p> <h1>Parser</h1> <p>The parser assigns meaning to tokens based on their context. The parser for SQLite is generated using the [Lemon parser generator]. Lemon does the same job as YACC/BISON, but it uses a different input syntax which is less error-prone. Lemon also generates a parser which is reentrant and thread-safe. And Lemon defines the concept of a non-terminal destructor so that it does not leak memory when syntax errors are encountered. The grammar file that drives Lemon and that defines the SQL language that SQLite understands is found in <file>parse.y</file>. |
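The parser description in arch.in says the tokenizer calls the parser and that this makes the parser reentrant. As a minimal sketch of that calling convention, assuming a toy grammar of integers joined by `+`, the C code below keeps all parser state in a caller-owned struct and has a tokenizer loop push one token at a time into the parser, finishing with token code 0 for end of input. Every name here (`SumParser`, `sum_parser_init()`, `sum_parse()`, `sum_text()`, the `TK_*` codes) is hypothetical. This is a hand-written state machine, not Lemon output; it only mimics the general shape of a Lemon parser's interface.

```c
/* Hypothetical token codes.  Code 0 marks end of input. */
enum { TK_EOF = 0, TK_NUM = 1, TK_PLUS = 2 };

/* All parser state lives in this struct.  There are no globals, so any
** number of parsers can be active at once, even interleaved. */
typedef struct SumParser {
  int expectNum;   /* 1 if the next token must be a NUM */
  long total;      /* semantic value accumulated so far */
  int error;       /* set once a syntax error is seen */
  int done;        /* set after end of input is accepted */
} SumParser;

/* Initialize a parser object that the caller owns (stack or heap). */
void sum_parser_init(SumParser *p){
  p->expectNum = 1;
  p->total = 0;
  p->error = 0;
  p->done = 0;
}

/* Push one token into the parser.  The tokenizer calls this function;
** the parser never calls back into the tokenizer. */
void sum_parse(SumParser *p, int tok, long val){
  if( p->error || p->done ) return;
  if( p->expectNum && tok==TK_NUM ){
    p->total += val;
    p->expectNum = 0;
  }else if( !p->expectNum && tok==TK_PLUS ){
    p->expectNum = 1;
  }else if( !p->expectNum && tok==TK_EOF ){
    p->done = 1;
  }else{
    p->error = 1;  /* token is not valid in the current state */
  }
}

/* Tokenizer: scans digits, '+', and spaces, pushing each token into a
** parser object that lives on the stack.  Returns 1 and stores the sum in
** *pResult on success, or 0 on a syntax error. */
int sum_text(const char *z, long *pResult){
  SumParser p;
  sum_parser_init(&p);
  while( *z && !p.error ){
    if( *z==' ' ){
      z++;
    }else if( *z>='0' && *z<='9' ){
      long v = 0;
      while( *z>='0' && *z<='9' ){ v = v*10 + (*z - '0'); z++; }
      sum_parse(&p, TK_NUM, v);
    }else if( *z=='+' ){
      sum_parse(&p, TK_PLUS, 0);
      z++;
    }else{
      p.error = 1;  /* character the tokenizer does not recognize */
    }
  }
  sum_parse(&p, TK_EOF, 0);
  if( !p.done ) return 0;
  *pResult = p.total;
  return 1;
}
```

Calling `sum_text("1 + 22 + 300", &r)` returns 1 and stores 323 in `r`, while `"1 +"` returns 0. Because each `SumParser` holds all of its own state, two instances can be fed tokens alternately without interfering, which is the reentrancy property the text describes. A real Lemon-generated parser is table-driven and also runs non-terminal destructors when it discards values after a syntax error; this sketch has no values that need freeing, so it omits both.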
Changes to pages/changes.in.
18 19 20 21 22 23 24 | global nChng aChng xrefChng set aChng($nChng) [list $date $desc $options] set xrefChng($date) $nChng incr nChng } chng {2018-02-00 (3.22.0)} { | | | 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 | global nChng aChng xrefChng set aChng($nChng) [list $date $desc $options] set xrefChng($date) $nChng incr nChng } chng {2018-02-00 (3.22.0)} { <li> The output of [sqlite3_trace_v2()] now shows each individual SQL statement run within a trigger. <li> Add the ability to read from [WAL mode] databases even if the application lacks write permission on the database and its containing directory, as long as the -shm and -wal files exist in that directory. <li> Added the [rtreecheck()] scalar SQL function to the [R-Tree extension]. <li> Query planner enhancements: <ol type='a'> |
43 44 45 46 47 48 49 | <li> Omit unused LEFT JOINs even if they are not the right-most joins of a query. </ol> <li> Other performance optimizations: <ol type='a'> <li> A smaller and faster implementation of text to floating-point conversion subroutine: sqlite3AtoF(). | | | 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 | <li> Omit unused LEFT JOINs even if they are not the right-most joins of a query. </ol> <li> Other performance optimizations: <ol type='a'> <li> A smaller and faster implementation of text to floating-point conversion subroutine: sqlite3AtoF(). <li> The [Lemon parser generator] creates a faster parser. <li> Use the strcspn() C-library routine to speed up the LIKE and GLOB operators. </ol> <li> Improvements to the [command-line shell]: <ol type='a'> <li> The ".schema" command shows the structure of virtual tables inside of a comment. |
443 444 445 446 447 448 449 | extension. <li>In the [command-line shell], enhance the ".mode" command so that it restores the default column and row separators for modes "line", "list", "column", and "tcl". <li>Enhance the [SQLITE_DIRECT_OVERFLOW_READ] option so that it works in [WAL mode] as long as the pages being read are not in the WAL file. <li>Enhance the | | | 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 | extension. <li>In the [command-line shell], enhance the ".mode" command so that it restores the default column and row separators for modes "line", "list", "column", and "tcl". <li>Enhance the [SQLITE_DIRECT_OVERFLOW_READ] option so that it works in [WAL mode] as long as the pages being read are not in the WAL file. <li>Enhance the [Lemon parser generator] so that it can store the parser object as a stack variable rather than allocating space from the heap and make use of that enhancement in the [amalgamation]. <li>Other performance improvements. Uses about [CPU cycles used|6.5% fewer CPU cycles]. <p><b>Bug Fixes:</b> <li>Throw an error if the ON clause of a LEFT JOIN references tables to the right of the ON clause. This is the same behavior as |
659 660 661 662 663 664 665 | <li>Added the [SQLITE_DBSTATUS_CACHE_USED_SHARED] option to [sqlite3_db_status()]. <li>Add the [https://www.sqlite.org/src/artifact?ci=trunk&filename=ext/misc/vfsstat.c|vfsstat.c] loadable extension - a VFS shim that measures I/O together with an [eponymous virtual table] that provides access to the measurements. <li>Improved algorithm for running queries with both an ORDER BY and a LIMIT where only the inner-most loop naturally generates rows in the correct order. | | | 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 | <li>Added the [SQLITE_DBSTATUS_CACHE_USED_SHARED] option to [sqlite3_db_status()]. <li>Add the [https://www.sqlite.org/src/artifact?ci=trunk&filename=ext/misc/vfsstat.c|vfsstat.c] loadable extension - a VFS shim that measures I/O together with an [eponymous virtual table] that provides access to the measurements. <li>Improved algorithm for running queries with both an ORDER BY and a LIMIT where only the inner-most loop naturally generates rows in the correct order. <li>Enhancements to [Lemon parser generator], so that it generates a faster parser. <li>The [PRAGMA compile_options] command now attempts to show the version number of the compiler that generated the library. <li>Enhance [PRAGMA table_info] so that it provides information about [eponymous virtual tables]. <li>Added the "win32-none" VFS, analogous to the "unix-none" VFS, that works like the default "win32" VFS except that it ignores all file locks. |
782 783 784 785 786 787 788 | <p><b>Potentially Disruptive Change:</b> <li>The [SQLITE_DEFAULT_PAGE_SIZE] is increased from 1024 to 4096. The [SQLITE_DEFAULT_CACHE_SIZE] is changed from 2000 to -2000 so the same amount of cache memory is used by default. See the application note on the [version 3.12.0 page size change] for further information. <p><b>Performance enhancements:</b> | | | | 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 | <p><b>Potentially Disruptive Change:</b> <li>The [SQLITE_DEFAULT_PAGE_SIZE] is increased from 1024 to 4096. The [SQLITE_DEFAULT_CACHE_SIZE] is changed from 2000 to -2000 so the same amount of cache memory is used by default. See the application note on the [version 3.12.0 page size change] for further information. <p><b>Performance enhancements:</b> <li>Enhancements to the [Lemon parser generator] so that it creates a smaller and faster SQL parser. <li>Only create [master journal] files if two or more attached databases are all modified, do not have [PRAGMA synchronous] set to OFF, and do not have the [journal_mode] set to OFF, MEMORY, or WAL. <li>Only create [statement journal] files when their size exceeds a threshold. Otherwise the journal is held in memory and no I/O occurs. The threshold can be configured at compile-time using [SQLITE_STMTJRNL_SPILL] or at start-time using [sqlite3_config]([SQLITE_CONFIG_STMTJRNL_SPILL]). |
1598 1599 1600 1601 1602 1603 1604 | the filename argument to [ATTACH]. <li>Allow a [VALUES clause] to be used anywhere a [SELECT] statement is valid. <li>Reseed the PRNG used by [sqlite3_randomness(N,P)] when invoked with N==0. Automatically reseed after a fork() on unix. <li>Enhance the [spellfix1] virtual table so that it can search efficiently by rowid. <li>Performance enhancements. <li>Improvements to the comments in the VDBE byte-code display when running [EXPLAIN]. | | | 1598 1599 1600 1601 1602 1603 1604 1605 1606 1607 1608 1609 1610 1611 1612 1613 1614 | the filename argument to [ATTACH]. <li>Allow a [VALUES clause] to be used anywhere a [SELECT] statement is valid. <li>Reseed the PRNG used by [sqlite3_randomness(N,P)] when invoked with N==0. Automatically reseed after a fork() on unix. <li>Enhance the [spellfix1] virtual table so that it can search efficiently by rowid. <li>Performance enhancements. <li>Improvements to the comments in the VDBE byte-code display when running [EXPLAIN]. <li>Add the "%token_class" directive to [Lemon parser generator] and use it to simplify the grammar. <li>Change the [Lemon] source code to avoid calling C-library functions that OpenBSD considers dangerous. (Ex: sprintf). <li>Bug fix: In the [command-line shell] CSV import feature, do not end a field when an escaped double-quote occurs at the end of a CRLF line. <li>SQLITE_SOURCE_ID: "2014-02-03 13:52:03 e816dd924619db5f766de6df74ea2194f3e3b538" <li>SHA1 for sqlite3.c: 98a07da78f71b0275e8d9c510486877adc31dbee } |
2512 2513 2514 2515 2516 2517 2518 | [SQLITE_CONFIG_LOG] verb to [sqlite3_config()]. The ".log" command is added to the [Command Line Interface]. <li> Improvements to [FTS3]. <li> Improvements and bug-fixes in support for [SQLITE_OMIT_FLOATING_POINT]. <li> The [integrity_check pragma] is enhanced to detect out-of-order rowids. <li> The ".genfkey" operator has been removed from the [Command Line Interface]. | | | 2512 2513 2514 2515 2516 2517 2518 2519 2520 2521 2522 2523 2524 2525 2526 | [SQLITE_CONFIG_LOG] verb to [sqlite3_config()]. The ".log" command is added to the [Command Line Interface]. <li> Improvements to [FTS3]. <li> Improvements and bug-fixes in support for [SQLITE_OMIT_FLOATING_POINT]. <li> The [integrity_check pragma] is enhanced to detect out-of-order rowids. <li> The ".genfkey" operator has been removed from the [Command Line Interface]. <li> Updates to the co-hosted [Lemon LALR(1) parser generator]. (These updates did not affect SQLite.) <li> Various minor bug fixes and performance enhancements. } chng {2010-01-06 (3.6.22)} { <li>Fix bugs that can (rarely) lead to incorrect query results when the CAST or OR operators are used in the WHERE clause of a query. |
3827 3828 3829 3830 3831 3832 3833 | } chng {2003-12-04 (2.8.7)} { <li>Added experimental sqlite_bind() and sqlite_reset() APIs.</li> <li>If the name of the database is an empty string, open a new database in a temporary file that is automatically deleted when the database is closed.</li> | | | 3827 3828 3829 3830 3831 3832 3833 3834 3835 3836 3837 3838 3839 3840 3841 | } chng {2003-12-04 (2.8.7)} { <li>Added experimental sqlite_bind() and sqlite_reset() APIs.</li> <li>If the name of the database is an empty string, open a new database in a temporary file that is automatically deleted when the database is closed.</li> <li>Performance enhancements in the [Lemon]-generated parser</li> <li>Experimental date/time functions revised.</li> <li>Disallow temporary indices on permanent tables.</li> <li>Documentation updates and typo fixes</li> <li>Added experimental sqlite_progress_handler() callback API</li> <li>Removed support for the Oracle8 outer join syntax.</li> <li>Allow GLOB and LIKE operators to work as functions.</li> <li>Other minor documentation and makefile changes and bug fixes.</li> |
4180 4181 4182 4183 4184 4185 4186 | <li>Change the name of the sanity_check PRAGMA to <b>integrity_check</b> and make it available in all compiles.</li> <li>SELECT min() or max() of an indexed column with no WHERE or GROUP BY clause is handled as a special case which avoids a complete table scan.</li> <li>Automatically generated ROWIDs are now sequential.</li> <li>Do not allow dot-commands of the command-line shell to occur in the middle of a real SQL command.</li> | | | 4180 4181 4182 4183 4184 4185 4186 4187 4188 4189 4190 4191 4192 4193 4194 | <li>Change the name of the sanity_check PRAGMA to <b>integrity_check</b> and make it available in all compiles.</li> <li>SELECT min() or max() of an indexed column with no WHERE or GROUP BY clause is handled as a special case which avoids a complete table scan.</li> <li>Automatically generated ROWIDs are now sequential.</li> <li>Do not allow dot-commands of the command-line shell to occur in the middle of a real SQL command.</li> <li>Modifications to the [Lemon parser generator] so that the parser tables are 4 times smaller.</li> <li>Added support for user-defined functions implemented in C.</li> <li>Added support for new functions: <b>coalesce()</b>, <b>lower()</b>, <b>upper()</b>, and <b>random()</b> <li>Added support for VIEWs.</li> <li>Added the subquery flattening optimizer.</li> <li>Modified the B-Tree and Pager modules so that disk pages that do not |
4507 4508 4509 4510 4511 4512 4513 | <li>Added limited support for transactions. At this point, transactions will do table locking on the GDBM backend. There is no support (yet) for rollback or atomic commit.</li> <li>Added special column names ROWID, OID, and _ROWID_ that refer to the unique random integer key associated with every row of every table.</li> <li>Additional tests added to the regression suite to cover the new ROWID feature and the TCL interface bugs mentioned below.</li> | | | 4507 4508 4509 4510 4511 4512 4513 4514 4515 4516 4517 4518 4519 4520 4521 | <li>Added limited support for transactions. At this point, transactions will do table locking on the GDBM backend. There is no support (yet) for rollback or atomic commit.</li> <li>Added special column names ROWID, OID, and _ROWID_ that refer to the unique random integer key associated with every row of every table.</li> <li>Additional tests added to the regression suite to cover the new ROWID feature and the TCL interface bugs mentioned below.</li> <li>Changes to the [Lemon parser generator] to help it work better when compiled using MSVC.</li> <li>Bug fixes in the TCL interface identified by Oleg Oleinick.</li> } chng {2001-03-20 (1.0.27)} { <li>When doing DELETE and UPDATE, the library used to write the record numbers of records to be deleted or updated into a temporary file. |
Changes to pages/compile.in.
1083 1084 1085 1086 1087 1088 1089 | } COMPILE_OPTION {SQLITE_ENABLE_UPDATE_DELETE_LIMIT} { This option enables an optional ORDER BY and LIMIT clause on [UPDATE] and [DELETE] statements. <p>If this option is defined, then it must also be | | | 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 | } COMPILE_OPTION {SQLITE_ENABLE_UPDATE_DELETE_LIMIT} { This option enables an optional ORDER BY and LIMIT clause on [UPDATE] and [DELETE] statements. <p>If this option is defined, then it must also be defined when using the [Lemon parser generator] tool to generate a parse.c file. Because of this, this option may only be used when the library is built from source, not from the [amalgamation] or from the collection of pre-packaged C files provided for non-Unix like platforms on the website. </p> } COMPILE_OPTION {SQLITE_ENABLE_UNKNOWN_SQL_FUNCTION} { |
1202 1203 1204 1205 1206 1207 1208 | compilation switches all have the same effect:<br> -DSQLITE_OMIT_ALTERTABLE<br> -DSQLITE_OMIT_ALTERTABLE=1<br> -DSQLITE_OMIT_ALTERTABLE=0 </p> <p>If any of these options are defined, then the same set of SQLITE_OMIT_* | | > | 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 | compilation switches all have the same effect:<br> -DSQLITE_OMIT_ALTERTABLE<br> -DSQLITE_OMIT_ALTERTABLE=1<br> -DSQLITE_OMIT_ALTERTABLE=0 </p> <p>If any of these options are defined, then the same set of SQLITE_OMIT_* options must also be defined when using the [Lemon parser generator] tool to generate the parse.c file and when compiling the 'mkkeywordhash' tool which generates the keywordhash.h file. Because of this, these options may only be used when the library is built from canonical source, not from the [amalgamation]. Some SQLITE_OMIT_* options might work, or appear to work, when used with the [amalgamation]. But this is not guaranteed. In general, always compile from canonical sources in order to take advantage of SQLITE_OMIT_* options. |
Added pages/lemon.in.
<title>The Lemon LALR(1) Parser Generator</title>
<tcl>hd_keywords {Lemon parser generator} {Lemon} \
      {Lemon LALR(1) parser generator}</tcl>
<table_of_contents>

<h1>Overview</h1>

<p>The SQL language parser for SQLite is generated using a code-generator
program called "Lemon". The Lemon program reads a grammar of the input
language and emits C-code to implement a parser for that language.

<h2>Lemon Source Files And Documentation</h2>

<p>Lemon does not have its own source repository. Rather, Lemon consists
of a few files in the SQLite source tree:

<ul>
<li><p>
[https://sqlite.org/src/doc/trunk/doc/lemon.html|lemon.html] →
The original detailed usage documentation and programmer's reference for Lemon.
<li><p>
[https://sqlite.org/src/file/tool/lemon.c|lemon.c] →
The source code for the utility program that reads a grammar file and
generates corresponding parser C-code.
<li><p>
[https://sqlite.org/src/file/tool/lempar.c|lempar.c] →
A template for the generated parser C-code. The "lemon" utility program
reads this template and inserts additional code in order to generate a parser.
</ul>

<h1>Advantages of Lemon</h1>

<p>Lemon generates an LALR(1) parser. Its operation is similar to the more
familiar tools [https://en.wikipedia.org/wiki/Yacc|Yacc] and
[https://en.wikipedia.org/wiki/GNU_bison|Bison], but Lemon includes
important improvements, including:

<ul>
<li><p>
The grammar syntax is less error-prone, using symbol names for semantic
values rather than the "$1"-style positional notation of Yacc.
<li><p>
In Lemon, the tokenizer calls the parser. Yacc operates the other way
around, with the parser calling the tokenizer. The Lemon approach is
reentrant and threadsafe, whereas Yacc uses global variables and is
therefore neither. Reentrancy is especially important for SQLite since
some SQL statements make recursive calls to the parser. For example,
when parsing a CREATE TABLE statement, SQLite invokes the parser
recursively to generate an INSERT statement to make a new entry in the
[sqlite_master] table.
<li><p>
Lemon has the concept of a non-terminal destructor that can be used to
reclaim memory or other resources following a syntax error or other
aborted parse.
</ul>

<h2>Use of Lemon Within SQLite</h2>

<p>Lemon is used in two places in SQLite.

<p>The primary use of Lemon is to create the SQL language parser.
A grammar file ([https://sqlite.org/src/file/src/parse.y|parse.y]) is
compiled by Lemon into parse.c and parse.h. The parse.c file is
incorporated into the [amalgamation] without further modification.
The parse.h file is post-processed by the
[https://sqlite.org/src/file/tool/addopcodes.tcl|addopcodes.tcl] script
before being incorporated into the [amalgamation].

<p>Lemon is also used to generate the parser for the query pattern
expressions in the [FTS5] extension. In this case, the input grammar file
is [https://sqlite.org/src/file/ext/fts5/fts5parse.y|fts5parse.y].

<h2>Lemon Customizations Especially For SQLite</h2>

<p>One of the advantages of hosting the code-generator tool as part of
the project is that the tool can be optimized to serve the specific needs
of the overall project. Lemon has benefited from this effect. Over the
years, the Lemon parser generator has been extended and enhanced to
provide new capabilities and improved performance to SQLite. A few of the
enhancements to Lemon that are specifically designed for use by SQLite
include:

<ul>
<li><p>
Lemon has the concept of "fallback" tokens. The SQL language contains a
large number of keywords, and these keywords have the potential to
collide with identifier names. Lemon has the ability to designate some
keywords as being able to "fall back" to an identifier. If the keyword
appears in the input token stream in a context that would otherwise be a
syntax error, the token is automatically transformed into its fallback
before the syntax error is raised. This feature allows the parser to be
very forgiving of reserved words used as identifiers, which is a problem
that comes up frequently in the SQL language.
<li><p>
In support of the [MC/DC|100% MC/DC testing] goal for SQLite, the parser
code generated by Lemon has no unreachable branches, and contains extra
(compile-time selected) instrumentation useful for measuring test coverage.
<li><p>
Lemon supports conditional compilation of grammar file rules, so that a
different parser can be generated depending on compile-time options.
<li><p>
As a performance optimization, reduce actions in the Lemon input grammar
are allowed to contain comments of the form "/*A-overwrites-Z*/" to
indicate that the semantic value "A" on the left-hand side of the rule is
allowed to directly overwrite the semantic value "Z" of the left-most
symbol on the right-hand side. This simple optimization reduces the number
of stack operations in the push-down automaton used to parse the input,
and thus improves the performance of the parser. It also makes the
generated code a little smaller.
</ul>

<p>The parsing of SQL statements is a significant consumer of CPU cycles
in any SQL database engine. Ongoing efforts to optimize SQLite have caused
the developers to spend a lot of time tweaking Lemon to generate faster
parsers. These efforts have benefited all users of the Lemon parser
generator, not just SQLite. But if Lemon had been a separately maintained
tool, it would have been more difficult to make coordinated changes to
both SQLite and Lemon, and as a result not as much optimization would
have been accomplished. Hence, the fact that the parser generator tool is
included in the source tree for SQLite has turned out to be a net benefit
for both the tool itself and for SQLite.

<h1>History Of Lemon</h1>

<p>Lemon was originally written by D. Richard Hipp (also the creator of
SQLite) while he was in graduate school at Duke University between 1987
and 1992. The original creation date of Lemon has been lost, but was
probably sometime around 1990. Lemon generates an LALR(1) parser. There
was a companion LL(1) parser generator tool named "Lime", but the source
code for Lime has been lost.

<p>The Lemon source code was originally written as separate source files,
and only later merged into a single "lemon.c" source file.

<p>The author of Lemon and SQLite (Hipp) reports that his C programming
skills were greatly enhanced by studying John Ousterhout's original source
code to Tcl. Hipp discovered and studied Tcl in 1993. Lemon was written
before then, and SQLite afterwards. There is a clear difference in the
coding styles of these two products, with SQLite seeming to be cleaner,
more readable, and easier to maintain.
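The "fallback" token mechanism described in the new lemon.in page can be illustrated with a small C sketch. It assumes a toy grammar `stmt ::= DROP TABLE ID.` plus a keyword token `KEY` that is allowed to fall back to `ID`, so that a table named "key" still parses. The token codes, `fallback[]`, `accepts()`, `resolve_token()`, and `parse_tokens()` are all hypothetical. In a real grammar the fallbacks are declared with Lemon's `%fallback` directive and the generated parser consults a generated table; here the parser states are a hand-coded switch rather than LALR(1) tables.

```c
#include <stddef.h>

/* Hypothetical token codes for the toy grammar "stmt ::= DROP TABLE ID." */
enum { TK_ID = 1, TK_DROP, TK_TABLE, TK_KEY, NUM_TOKENS };

/* Fallback table: 0 means "no fallback".  Only KEY may degrade to ID,
** in the spirit of a "%fallback ID KEY." declaration. */
static const int fallback[NUM_TOKENS] = {
  [TK_KEY] = TK_ID
};

/* Return 1 if token tok can be shifted in parser state `state`. */
static int accepts(int state, int tok){
  switch( state ){
    case 0:  return tok==TK_DROP;
    case 1:  return tok==TK_TABLE;
    case 2:  return tok==TK_ID;
    default: return 0;  /* statement complete: nothing more is accepted */
  }
}

/* Map a lookahead token to the token the parser will actually shift,
** following fallbacks before giving up.  Returns 0 on a syntax error. */
int resolve_token(int state, int tok){
  while( tok>0 && tok<NUM_TOKENS ){
    if( accepts(state, tok) ) return tok;
    tok = fallback[tok];  /* 0 when no fallback exists, ending the loop */
  }
  return 0;
}

/* Parse a whole token sequence; returns 1 if it matches DROP TABLE ID. */
int parse_tokens(const int *aTok, size_t n){
  int state = 0;
  for(size_t i=0; i<n; i++){
    if( resolve_token(state, aTok[i])==0 ) return 0;  /* syntax error */
    state++;  /* in this toy grammar each shift advances one state */
  }
  return state==3;  /* all of DROP, TABLE, and ID were seen */
}
```

With this table, the sequence DROP TABLE KEY is accepted because `KEY` is rewritten to `ID` only in the state where an identifier is expected, while KEY TABLE ID is still rejected because `ID` is not valid at the start either. `DROP` has no fallback entry, so DROP TABLE DROP remains a syntax error: only keywords listed for fallback are forgiven, and only where the fallback token would itself be legal.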