SQLite Forum

Timeline
Login

50 most recent forum posts by user drh

2021-12-01
17:51 Edit reply: SQLIte 3.37.0 breaks SQLITE_OPEN_NOMUTEX - TSAN failure (artifact: 7e9a9e3d4b user: drh)

Your forum doesn't appear to support attaching files, so I've saved the attachment on my own bug tracking system (which has wide open public read access).

https://stroika.atlassian.net/browse/STK-753 (*note this valgrind output contains misleading warnings about the stdc++ atomics library - just ignore these).

But in case you have trouble accessing that - I will paste a small subset with the highlights here..

==661746==  Lock at 0x5C7C278 was first observed
==661746==    at 0x4C39F2A: pthread_mutex_init (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==661746==    by 0x9BDC36: pthreadMutexAlloc (sqlite3.c:27382)
==661746==    by 0x9BD5E0: sqlite3MutexAlloc (sqlite3.c:26902)
==661746==    by 0x9F1C75: sqlite3BtreeOpen (sqlite3.c:68144)
==661746==    by 0xAD13E4: openDatabase (sqlite3.c:169926)
==661746==    by 0xAD1745: sqlite3_open_v2 (sqlite3.c:170049)
==661746==    by 0x454A22: Stroika::Foundation::Database::SQL::SQLite::Connection::Rep_::Rep_(Stroika::Foundation::Database::SQL::SQLite::Options const&) (SQLite.cpp:203)


==661746==  Lock at 0x5C4EB88 was first observed
==661746==    at 0x4C39F2A: pthread_mutex_init (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==661746==    by 0x9BDC36: pthreadMutexAlloc (sqlite3.c:27382)
==661746==    by 0x9BD5E0: sqlite3MutexAlloc (sqlite3.c:26902)
==661746==    by 0x9F1C75: sqlite3BtreeOpen (sqlite3.c:68144)
==661746==    by 0xAD13E4: openDatabase (sqlite3.c:169926)
==661746==    by 0xAD1745: sqlite3_open_v2 (sqlite3.c:170049)
==661746==    by 0x454B5C: Stroika::Foundation::Database::SQL::SQLite::Connection::Rep_::Rep_(Stroika::Foundation::Database::SQL::SQLite::Options const&) (SQLite.cpp:209)
==661746==    by 0x460746: void __gnu_cxx::new_allocator<Stroika::Foundation::Database::SQL::SQLite::Connection::Rep_>::construct<Stroika::Foundation::Database::SQL::SQLite::Connection::Rep_, Stroika::Foundation::Database::SQL::SQLite::Options const&>(Stroika::Foundation::Database::SQL::SQLite::Connection::Rep_*, Stroika::Foundation::Database::SQL::SQLite::Options const&) (new_allocator.h:136)
==661746==    by 0x45FB07: void std::allocator_traits<std::allocator<Stroika::Foundation::Database::SQL::SQLite::Connection::Rep_> >::construct<Stroika::Foundation::Database::SQL::SQLite::Connection::Rep_, Stroika::Foundation::Database::SQL::SQLite::Options const&>(std::allocator<Stroika::Foundation::Database::SQL::SQLite::Connection::Rep_>&, Stroika::Foundation::Database::SQL::SQLite::Connection::Rep_*, Stroika::Foundation::Database::SQL::SQLite::Options const&) (alloc_traits.h:475)
==661746==    by 0x45ED3F: std::_Sp_counted_ptr_inplace<Stroika::Foundation::Database::SQL::SQLite::Connection::Rep_, std::allocator<Stroika::Foundation::Database::SQL::SQLite::Connection::Rep_>, (__gnu_cxx::_Lock_policy)2>::_Sp_counted_ptr_inplace<Stroika::Foundation::Database::SQL::SQLite::Options const&>(std::allocator<Stroika::Foundation::Database::SQL::SQLite::Connection::Rep_>, Stroika::Foundation::Database::SQL::SQLite::Options const&) (shared_ptr_base.h:545)


==661746== Possible data race during read of size 4 at 0x149FE78 by thread #3
==661746== Locks held: 1, at address 0x5C7C278
==661746==    at 0xAC9E08: keywordCode (sqlite3.c:165359)
==661746==    by 0xACB253: sqlite3GetToken (sqlite3.c:165813)
==661746==    by 0xACB6B4: sqlite3RunParser (sqlite3.c:165914)
==661746==    by 0xA82D0D: sqlite3Prepare (sqlite3.c:132931)
==661746==    by 0xA83021: sqlite3LockAndPrepare (sqlite3.c:133006)
==661746==    by 0xA833A7: sqlite3_prepare_v2 (sqlite3.c:133091)
==661746==    by 0x455E54: Stroika::Foundation::Database::SQL::SQLite::Statement::MyRep_::MyRep_(Stroika::Foundation::Database::SQL::SQLite::Connection::Ptr const&, Stroika::Foundation::Characters::String const&) (SQLite.cpp:367)

==661746== This conflicts with a previous write of size 4 by thread #2
==661746== Locks held: 1, at address 0x5C4EB88
==661746==    at 0xA26AC8: sqlite3VdbeExec (sqlite3.c:90711)
==661746==    by 0xA1957E: sqlite3Step (sqlite3.c:85145)
==661746==    by 0xA19896: sqlite3_step (sqlite3.c:85202)
==661746==    by 0x457150: Stroika::Foundation::Database::SQL::SQLite::Statement::MyRep_::GetNextRow() (SQLite.cpp:500)

Please let me know if you need more information to fix this. I'm happy to help. Lewis.

2021-11-30
14:46 Reply: Faster way to insert into WITHOUT ROWID table? (artifact: d024d66154 user: drh)

is it worthwhile to add such optimizations for without-rowid tables too?

Such an "optimization" actually slows things down in the common case. If we had a way to determine when the optimization would benefit, then it would be worth adding. But I don't yet have an algorithm to do that.

2021-11-29
18:56 Reply: Proposed new date/time function to return a unix timestamp (artifact: 77f8abc354 user: drh)

The actual implementation on trunk is similar: https://sqlite.org/src/artifact/d0f09f7924a27e0d?ln=968-984

Preliminary documentation of the new enhancements can be seen at https://sqlite.org/draft/lang_datefunc.html.

18:00 Reply: Faster way to insert into WITHOUT ROWID table? (artifact: 73061e9c2f user: drh)

I think you must have made your modifications on an older version of SQLite. The btree "moveto" routines have now been split up into separate procedures for table-btrees and index-btrees (for better performance) and the index-btree version does not have an append-bias flag. See check-in 3b0d34e5e5f9a16c for the details of the change.

15:51 Reply: Proposed new date/time function to return a unix timestamp (artifact: 822eeaf4ed user: drh)

And after some experimentation, I think the new unixepoch() function should return an integer - discarding the milliseconds if any.

So for now, that's what I'm going with. But the decision is not set in stone so if you want to argue for an alternative, please do so.

15:31 Reply: Proposed new date/time function to return a unix timestamp (artifact: e52a02bf85 user: drh)

Another name option:

  • unixepoch()
14:35 Post: Proposed new date/time function to return a unix timestamp (artifact: 4963fc51e3 user: drh)

I'd like to add a new date/time function that converts its argument into a unix timestamp (the number of seconds since 1970-01-01 00:00:00 UTC). So the result would be the same as "strftime('%s',ARGS...)" except that it would be an integer result rather than a string result.

Questions:

  1. What should the new function be called:

    • unixtime()
    • unixtimestamp()
    • secondssince1970()
    • Something else?

  2. Should the value returned always be an integer, or can it have a fractional part? So, for example, should the return value from "unixtime('2021-11-29 14:31:14.25')" be 1638196274 or 1638196274.25?

2021-11-26
12:39 Reply: sqlar and the sqlite3 tcl interface (artifact: 051dc91077 user: drh)

You will need to load the sqlar.c extension and then use it to access or create an SQLAR file. The only documentation is in comments in the source file, or some vague hints at https://www.sqlite.org/cli.html#_sql_used_to_implement_sqlite_archive_operations_

11:26 Reply: Faster way to insert into WITHOUT ROWID table? (artifact: 7609baddcb user: drh)

How are you executing the SQL that does the inserting? Are you preparing a new SQL statement for each row? Or do you prepare a single INSERT statement using parameters, then bind data to the parameters and rerun that one statement for each row?

2021-11-23
17:51 Reply: How to free max heap size on execution of sqlite query using sqlite3_exec? (artifact: a0b578cf9c user: drh)

I hope you get understand the example

I understand from the example that you probably have an SQL Injection Vulnerability that is way more important to fix than your alleged memory leak.

I think you need to fix your SQL injections first. These will be the most important class of bugs in your application. Once you have cleaned up your injections, then and only then should you start looking into memory usage concerns.

12:31 Reply: 1.0.115.5 and SEE License (artifact: 41ec8032df user: drh)

Please use the SEE forum to ask questions about SEE. You can log in using the credentials you were provided when you purchased your license.

12:30 Edit: 1.0.115.5 and SEE License (artifact: c2df132125 user: drh)

We include the System.Data.SQLite.Core 1.0.114 and Stub.System.Data.SQLite.Core.NetFramework 1.0.114 in our software, and it works perfectly fine.

We recently released our product to the field, and so we took the opportunity to update both NuGet packages to 1.0.115.5 (latest) as part of the spin up for the new dev cycle.

We discovered a problem, but I don't know if it's really a problem.

We are getting the following when the SQLite dll is used for the first time:

Exception thrown: 'System.IO.FileNotFoundException' in mscorlib.dll Could not load file or assembly 'System.Data.SQLite.SEE.License, Version=1.0.115.5, Culture=neutral, PublicKeyToken=************' or one of its dependencies. The system cannot find the file specified.

In a scratch project, it looks like this exception occurs, but then is ignored?

What is causing the exception and is there a way around it, perhaps through a configuration setting?

2021-11-21
18:55 Reply: Proposed slightly-incompatible change to date-time functions (artifact: 540b085f21 user: drh)

Current algorithm:

  1. If the timevalue is TEXT → ISO8601 ("YYYY-MM-DD HH:MM:SS.SSS")
  2. If the timevalue is numeric and followed by 'unixepoch' → Seconds since 1970-01-01
  3. If the timevalue is numeric → Days since -4713-11-24 12:00.

Proposed New Algorithm (2nd Edition):

  1. If the timevalue is TEXT → ISO8601 as before
  2. If the timevalue is numeric and followed by 'unixepoch' → Seconds since 1970-01-01
  3. If the timevalue is numeric and greater than or equal to 5373484.5 → Seconds since 1970-01-01
  4. If the timevalue is numeric → Days since -4713-11-24 12:00.

In prose: if the timevalue is numeric, but would not make sense as a Julian day number because it codes for a day beyond 9999-12-31, then interpret the time value as a unix timestamp even if the 'unixepoch' modifier is omitted. Otherwise, everything is as before.

12:44 Reply: Proposed slightly-incompatible change to date-time functions (artifact: 6a3ef923c7 user: drh)

For that matter, do you know of any application (other than Fossil) that uses Julian dates? Must apps that I have seen use Unix time. They all already use the ‘unixepoch’ modifier and require no changes.

I’m interested to hear about any exceptions…

11:37 Reply: Proposed slightly-incompatible change to date-time functions (artifact: 1ab8f87722 user: drh)

Numeric arguments are no longer interpreted as Julian Date numbers.

No. REAL values are interpreted as Julian Dates and INT values are interpreted as unix timestamps. Do you know of an application that stores an unix timestamp as a REAL or a Julian date as an INT? I do not. So, I'm wondering if the change will have any impact at all.

all pre-existing code that depends on the current behaviour [must be] modified

The only applications that would need to be modified are those that store Julian Dates as INT, or that store Unix timestamps from the first three months of 1970 as REAL. Do you know of any such applications? I'm guessing that the number of such applications is zero, at least to a good approximation. Do you know of any counter-examples?

2021-11-20
13:23 Reply: IS NULL optimization on NOT NULL constraints breaks on corrupted databases (artifact: c34550d653 user: drh)

The documentation has been updated to make it clearer and to state more directly that constraints are checked during changes to the database but not during queries from the database.

11:35 Reply: IS NULL optimization on NOT NULL constraints breaks on corrupted databases (artifact: 0038ebd08e user: drh)

No, sqlite3_get_table() will not invoke the callback after hitting an OOM. So that particular failure mode will not happen with sqlite3_get_table(). If one of the pazResult[] items passed into sqlite3_get_table() is NULL, that means the value really is NULL. Or, at least that is what it appears to me, assuming I didn't overlook something.

Even so, applications should not rely on a NOT NULL constraint in the schema to ensure that a pazResult[] value is never NULL. An attacker might change the schema of the database, without the application knowing it, to remove the NOT NULL constraint, for example. It does not require a corrupt database to cause the problems in GDAL - just a database with a maliciously modified schema.

02:58 Reply: Proposed slightly-incompatible change to date-time functions (artifact: 54f40d17c9 user: drh)

The point I am making is that because it does this, you can't tell if the original was inserted as an integer or as a real if the affinity of the column allows this transform.

If the affinity is INTEGER, then you can insert 22 or 22.0 and you will always get back 22. If the affinity is REAL, you can insert 22 or 22.0 and you will always get back 22.0. But if the affinity is NUMERIC or BLOB then you always get back what you inserted. If you insert 22 you get back 22, and if you insert 22.0 you get back 22.0.

02:48 Reply: IS NULL optimization on NOT NULL constraints breaks on corrupted databases (artifact: dd4d62e69a user: drh)

The issue here was that GDAL segfaulted when opening that database because of the null dereference of a corrupted record.

Did SQLite segfault? If so that would be a bug that we need to look into.

Or, did GDAL do an sqlite3_column_text() and get back a NULL pointer, then dereference the NULL pointer, thus causing a segfault. That would be a bug in GDAL. The sqlite3_column_text() might return NULL, even for a NOT NULL column, for example following an OOM error. That's part of the specification of how sqlite3_column_text() works. If you were not checking for a NULL return from sqlite3_column_text() and it always worked before, that is because you were lucky. It would be to same if you failed to check for a NULL return from malloc().

2021-11-19
21:42 Reply: IS NULL optimization on NOT NULL constraints breaks on corrupted databases (artifact: 5dbb5ad543 user: drh)

This causes a regression in the GDAL open source project

Please help me to understand why the GDAL project is concerned that you are getting an incorrect answer from a query against an admittedly corrupt database file? Why is this not just a case of GIGO?

15:49 Reply: Proposed slightly-incompatible change to date-time functions (artifact: 6df0d89eaa user: drh)

SQLite could take times that just happen to be exactly at noon, and thus equal to an integer, and store it in the database as an integer as a form of compression

SQLite does indeed do this. But it also converts the integer back into a floating point number as it reads it back off the disk, so by the time it gets to the date/time SQL functions, it is a REAL again.

Maybe adding an 'Auto' modifier

That's a good idea. I'll consider that.

Recognize that all of this discussion is for 3.38.0 or later. This is not a change that is being considered for the upcoming 3.37.0 release.

13:16 Reply: Is it possible to restrict VACUUM INTO? (artifact: a2e181c2a9 user: drh)

You can add an sqlite3_set_authorizer() restriction on the ATTACH command that causes it to fail.

VACUUM INTO is really something of a "meta" command. It works by using SQLite APIs recursively. It first runs "ATTACH" to create the new output database. Then it does a bunch of "CREATE" and "INSERT" commands on that attached database to perform the transfer, followed by a "DETACH" at the end. So if you block the initial ATTACH, you effectively prevent VACUUM INTO from running.

The sqlite3_set_authorizer() has the full name of the file being attached, so you can selectively block ATTACH (and hence VACUUM INTO). In this way, you could make it so that VACUUM INTO only works for output files in some limited subset of the filesystem namespace.

13:07 Post: Proposed slightly-incompatible change to date-time functions (artifact: 92d9dbc028 user: drh)

As you know, SQLite currently supports three different representations for date/time values:

  1. TEXTYYYY-MM-DD HH:MM:SS.SSS

  2. REAL → The Julian day number, the number of days since noon (in Greenwich) on -4714-11-24, according to the proleptic Gregorian calendar.

  3. INT → Unix-time or the number of seconds since 1970-01-01 00:00:00.

In all cases, SQLite supports date ranges from 0000-01-01 00:00:00 through 9999-12-31 23:59:59. For the various date/time functions (date(), datetime(), time(), julianday(), strftime()) the date/time argument can be either TEXT or REAL and everything just works. The date/time argument can also be INT, but such INT values will still be interpreted as a Julian day number unless followed by the 'unixepoch' modifier.

Proposed change

If the date/time argument is an INT value that is not within the range of 1721060..5373483, then it is automatically interpreted as a Unix timestamp, without needing the 'unixepoch' modifier. The 'unixepoch' modifier can still be used, but it would be redundant in this case.

Benefits

With this enhancement, queries could invoke SQL functions like datetime() on a column of type DATETIME without having to worry about the underlying storage format. The conversion would be completely automatic. Indeed, columns that are intended to store date/time values could hold a mixture of TEXT, REAL, and INT values, and queries using the date/time functions would always be able to interpret the values correctly.

Possible Breakage

SQLite only defines date/time values for 0000-01-01 through 9999-12-31. So for that date range, I don't see any way to mess this up. There is an ambiguity for numeric inputs - do we interpret them as Julian day or Unix timestamps - but this ambiguity only occurs for a limited range of values between 1970-01-20 22:04:19 and 1970-03-04 04:38:03. If you have unix timestamp value in that range, and you pass them into a datetime function as REAL numbers instead of as integers, they will get reinterpreted as Julian day numbers unless you use the 'unixepoch' modifier.

Perhaps a bigger problem would be Julian day numbers that are passed into into date/time functions as integers. That is only possible for dates that correspond to noon UTC. Such integer Julian day numbers would be interpreted as unix timestamps and transformed into date/times in vicinity of 1970-01-29. We could mitigate this by adding a new 'julianday' modifier that forces the date/time argument to be interpreted as a Julian day number even if it is an integer.

What do you think?

Is the benefit of this change worth the small chance of breaking legacy applications?

2021-11-17
16:01 Reply: SQLite + TCL on-line conference starts in less than 2 days (artifact: 8c22e1f3c1 user: drh)

Do STRICT tables enforce lengths of TEXT columns?

It does not. A question back at you → Why do you think that limiting the length of TEXT columns would be useful? What kinds of problems would that solve?

[T]he ANY data type ... appears to undo all that STRICT promises to deliver.

I do not understand that perspective. Can you explain more about how ANY is detrimental?

2021-11-15
16:39 Reply: version 3.37.0 has 2 OOM bugs in sqlite3StrAccumEnlarge and sqlite3BtreeIndexMoveto, could cause sqlite3 DOS. (artifact: d9ea17e953 user: drh)

Here is a link to the OP's tweet about this topic:

https://twitter.com/den80369019/status/1460277629668323334

14:30 Post: SQLite + TCL on-line conference starts in less than 2 days (artifact: 54324e566b user: drh)

The SQLite + TCL conference begins on Wednesday morning (2021-11-17). The conference is on-line (Zoom) and free. Sign-up at https://conf.tcl-lang.org/

I will be leading with a review of the enhancements to SQLite that have occurred over the previous 12 months, followed by a discussion of where we are planning to take SQLite in the future. I'll strive to be brief so as to to reserve plenty of time for Q&A. Looking forward to your feedback.

2021-11-12
17:33 Reply: version 3.37.0 has 2 OOM bugs in sqlite3StrAccumEnlarge and sqlite3BtreeIndexMoveto, could cause sqlite3 DOS. (artifact: 4867042314 user: drh)

The error appears to be inside of ASAN, probably at https://github.com/gcc-mirror/gcc/blob/master/libsanitizer/sanitizer_common/sanitizer_common.cpp#L53, not something in SQLite. It appears that you are running out of memory on your system, and ASAN is choking because of that.

What does the command "prlimit" show on your system? In particular, what is your "address space limit"? Mine shows "unlimited", and when I run your scripts I get:

Error: near line 5: stepping, string or blob too big (18)

I modified your first scripts by adding ".stats on" and ".echo" giving the following:

.echo on
CREATE TABLE v0 ( v1 INTEGER PRIMARY KEY DESC ON CONFLICT REPLACE CHECK( max ( 'Al' , v1 ) + julianday ( ( v1 IN ( v1 , v1 , v1 ) ) ) <= 'x' ) ) ; INSERT INTO v0 ( v1 ) VALUES ( 0 ) ,( 42 ) ,( 10 ) ;
INSERT INTO v0 SELECT randomblob ( 1 ) FROM v0 , v0 AS t JOIN v0 ; SELECT DISTINCT v1 AS REVENUE FROM v0 AS r WHERE v1 = v1 AND v1 = 10 AND v1 = 0 ORDER BY v1 ;
.stats on
UPDATE v0 SET v1 = ( SELECT group_concat ( substr ( v0 . v1 , 'GERMANY' ) , v1 ) FROM v0 AS p ) ;

I compile thusly:

CC=clang CFLAGS='-fsanitize=address -fno-omit-frame-pointer -Os -g' ./configure --enable-all --enable-debug
make sqlite3

Then running your script, I get the error above, and the ".stat" output shows that the process used a bit of 9GB of RAM at its peak, before giving up and reporting the error. My desktop has 32GB of RAM. How much RAM is on your test system?

It seems to me that SQLite is doing exactly the right thing here, and that the problem is in your test procedure or in your copy of libasan.so. Why do you think this is a problem with SQLite? What do you think it should be doing differently?

2021-11-09
13:03 Reply: Odd behaviour of UNION (artifact: 8450d6b3c2 user: drh)

The ORDER BY, LIMIT, and OFFSET apply to the UNION, not to the second SELECT. You want this, I think:

SELECT max(c2) FROM A
UNION
SELECT * FROM (SELECT DISTINCT c2 FROM A ORDER BY c2 DESC LIMIT 1 OFFSET 1);
10:35 Reply: Wasted file space estimate (artifact: 7cb6a4e088 user: drh)

You really shouldn't open() and close() SQLite database files using your own code. Use SQLite APIs only. The reason for this is that if another thread in your application has the database open using SQLite, the close() call you make will clear the posix advisor locks. The SQLite connection has no way to prevent or detect this. Clearing locks out from under an SQLite database connection could lead to database corruption.

A better solution would be to open the database using sqlite3_open() or similar, and then run "PRAGMA page_size; PRAGMA freelist_count;" to get the values you are looking for.

2021-11-08
23:27 Reply: A segmentation fault in SQLite latest release build (artifact: ac825c1563 user: drh)

Plans did change.

The end solution (seen at check-in 74aec5dd1df95b56) was to refactor the logic that figures out which query in a cascade of nested and correlated queries a particular aggregate function belongs too. The new logic does not depend on subquery cursor numbers being greater than outer query cursor numbers. That constraint is no longer on the parse tree, and so no new assert() logic to check is was required.

Should be working now.

14:52 Reply: A segmentation fault in SQLite latest release build (artifact: 82e3696053 user: drh)

Thanks for the bug report.

The crash is due to a NULL pointer dereference in the byte-code engine caused by incorrect byte-code. The incorrect byte-code results for a fault in the code generator.

Each table or subquery in a complex SELECT statement is assigned a cursor number. The name resolution logic for aggregate functions depends on the fact that cursor numbers for subqueries are always greater than cursor numbers in outer queries. But that assumption was violated by a new UNION ALL optimization that as added on 2020-12-19. The query in question invokes that optimization, causing cursor numbers to be misordered, resulting in incorrect byte-code, and ultimately the NULL pointer dereference.

A simplified query is this:

SELECT * FROM (
  SELECT 1 FROM rt0 AS q3
  UNION ALL
  SELECT 2 FROM rt0 AS q4
) LEFT JOIN (
  SELECT * 
    FROM (
           SELECT (SELECT sum(q2.c1) + rank() OVER () FROM rt0 AS q1) AS ca1
             FROM rt0 AS q2
         ) AS q5
   WHERE q5.ca1=0
);

The UNION ALL optimization transforms this query into the following:

SELECT 1, q7.* 
  FROM rt0 AS q3
  LEFT JOIN (
    SELECT * 
      FROM (
             SELECT (SELECT sum(q2.c1) + rank() OVER () FROM rt0 AS q1a) AS ca1
               FROM rt0 AS q2a
           ) AS q5a
     WHERE q5a.ca1=0
  ) AS q7
UNION ALL
SELECT 2, q8.*
  FROM rt0 AS q4
  LEFT JOIN (
    SELECT * 
      FROM (
             SELECT (SELECT sum(q2.c1) + rank() OVER () FROM rt0 AS q1b) AS ca1
               FROM rt0 AS q2b
           ) AS q5b
     WHERE q5b.ca1=0
  ) q8;

If you enter the second optimized SQL directly, it works (because all of the cursor numbers are well ordered). But when the query optimizer makes that translation, it ends up with the cursor number for q2b being less than the cursor number for q5b.

Multiple things need to be fixed here:

  1. The UNION ALL optimization needs to be fixed so that it yields a parse tree where the cursor numbers of all subquerys greater than than the cursor number of any outer query.

  2. New assert() statements need to be added to check every parse tree transformation and assure that the cursor numbers are always well-ordered.

I'm working on these changes now.

Everything in this post is the result of a preliminary analysis and is subject to change or correction in follow-up posts.

2021-11-06
11:13 Reply: SQLITE_OPEN_SHARED_SCHEMA == SQLITE_OPEN_NOFOLLOW (artifact: 3b6f80143f user: drh)

No. Last night I mistakenly merged the reuse-schema branch into trunk when I had intended to merge trunk into the reuse-schema branch. That mistake has now been corrected.

We had not noticed the collision between the NOFOLLOW and SHARED_SCHEMA open flags. That will need to be addressed, but we'll do that on the branch.

2021-11-05
17:19 Reply: App file format, large image texture blobs? (artifact: 48608cd49c user: drh)

The document that best answers your question is probably:

https://www.sqlite.org/fasterthanfs.html

That study was done to figure out if it is more performant to put 10KB thumbnails in an SQLite database or as separate files on disk. (The answer: an SQLite database.)

Your situation is different in that you are using 2MB blobs instead of 10KB blobs. I'm not sure where the cross-over point is. You should run experiments to see. The source code for the performance measurement software is linked in the article above.

Is raw performance your only consideration? Do power-safe transactions, a simple single-file storage format, and a powerful query language figure into your decision at all? If raw performance is the only factor, then I'm guessing direct-to-disk will be the best choice for 2MB blobs (but that is only a guess - you should verify). But if other factors come into play, then you will have an engineering tradeoff to consider. Only you can make that decision.

12:35 Reply: Error when updating sqlite3.c using trunk (artifact: d6a99c93e4 user: drh)
12:03 Reply: An assertion failure in function isSimpleCount() (artifact: 284602513a user: drh)

Thanks for the report. Now fixed on trunk.

Wang Ke probably already knows this, but for the benefit of other readers, the NEVER() macro in SQLite basically means "we don't have a test case for this condition but we should probably check it anyhow." On debug builds, the NEVER() raises an assertion fault if it is ever true, but for release builds it is a no-op. In that way, NEVER() macros alert us to new untested branches. Release builds are unaffected.

Hence, this is not a problem in release builds.

The simplest fix here would be to simply delete the NEVER(). But I took the opportunity to also clean up the isSimpleCount() function a little. Test cases were added to TH3 and dbsqlfuzz.

2021-11-04
15:47 Reply: Question about memory management (artifact: 7b851367fd user: drh)

As I import csv files into databases - one for each subject I am importing - my computer memory fills right up.

Can you be more specific about what you are doing to accomplish this import? Are you using the CLI? If so, what commands are you using? Or did you write your own program/script to do the import? Can we see it?

2021-11-03
23:56 Reply: Question regarding dangling commas in CREATE VIRTUAL TABLE queries (artifact: fa65232c87 user: drh)

It is not a dangling comma. The examples you show all merely have the last parameter equal to an empty string. For both FTS4, FTS5, and RTREE, this results in a column named by an empty string ("").

If you are writing your own virtual tables and want to disallow this, just check the parameters and raise an error if any of them are empty. Perhaps we should have done this when we created FTS4, FTS5, and RTREE. But the time for that decision has passed. Those virtual tables have accepted and allowed column names that are the empty string for many years, and so we cannot change it now without breaking compatibility.

SQLite also allows you to create tables and columns named by an empty string. For example:

CREATE TABLE ""(a INT, "" INT, c INT);
INSERT INTO "" VALUES(1,2,3);
SELECT "" FROM "" WHERE a=1;

You are allowed to do this, but you probably ought not.

2021-11-02
15:43 Reply: gcc 11 compiler warning in sqlite3Fts5IndexQuery (artifact: 1b863fcba6 user: drh)

Didn't use a VM. I ended up wiping the disk and doing a fresh install of Ubuntu 20.04.

2021-10-29
14:20 Reply: License File where to find? (artifact: 62845ebcc1 user: drh)

Can you do us the favor of posting this question on the SEE Forum?

10:03 Reply: gcc 11 compiler warning in sqlite3Fts5IndexQuery (artifact: 746635d5ba user: drh)

Ugh. When I ran "apt install gcc-11" on my Ubuntu 20.04 machine in order to try to reproduce this problem, it broke the clang installation so that now I can not longer run "clang -fsanitize=fuzzer".

Yet another example of compiler warnings causing harm rather than preventing it...

2021-10-28
16:10 Reply: gcc 11 compiler warning in sqlite3Fts5IndexQuery (artifact: 61173f136d user: drh)

Unable to repro. The warnings do not occur on Ubuntu 20.04 with gcc-11 (Ubuntu 11-20210417-1ubuntu1) 11.0.1 20210417 using compiler options -O0 -Wall.

2021-10-26
10:21 Reply: No ".open" option in version 3.7.17 2013-05-20 (artifact: 6733f1d20a user: drh)

noting that 3.7 is now more than 8 years old and will be missing all sorts of more current features.

Yeah. A simpler solution would be to download and use the latest 3.36.0 CLI.

2021-10-21
17:28 Reply: Truncated Words bug with fts5vocab (artifact: fb1e6b3c25 user: drh)

The Porter Stemmer algorithm does that. It is a feature, not a bug. The idea of a "stemmer" is to map words that share a common root into the same base form, so that they appear to the search algorithm as the same word.

The Porter Stemmer algorithm (named for its inventor, Martin Porter), only works for English. But it has been widely used for over 4 decades and works fairly well within its domain.

If you don't want to use the Porter Stemmer, leave off the "porter" keyword when you declare the FTS5 table.

2021-10-20
13:48 Reply: BUG carray.c memcpy() buffer overflow (artifact: 191ad0797f user: drh)

Now fixed on trunk. Thanks for the bug report.

11:26 Reply: How do i submit a bug report (artifact: d04f5871e3 user: drh)

Lemon is a command-line tool. Nobody runs Lemon who does not already have full shell access on the target machine. So memory safety issues in Lemon don't really matter that much.

Lemon generates code that is used by SQLite. But Lemon is not itself part of SQLite. If you find faults in the code that Lemon generates, that is an issue. But faults in Lemon itself are scarcely a concern.

2021-10-14
14:04 Reply: Pragma_Function_List() (artifact: 48b468b7e4 user: drh)

The flags column is an internal implementation detail and is subject to change. But a few of the bits are fixed. From sqlite3.h:

#define SQLITE_DETERMINISTIC    0x000000800
#define SQLITE_DIRECTONLY       0x000080000
#define SQLITE_SUBTYPE          0x000100000
#define SQLITE_INNOCUOUS        0x000200000

Thus, for example, to see a list of all functions that are not allowed to be used inside of triggers and views (SQLITE_DIRECTONLY functions) you could write:

SELECT DISTINCT name
  FROM pragma_function_list
 WHERE flags & 0x80000
 ORDER BY name;

And this gives you:

┌────────────────────┐
│        name        │
├────────────────────┤
│ fts3_tokenizer     │
│ geopoly_debug      │
│ icu_load_collation │
│ load_extension     │
│ readfile           │
│ sha3_query         │
│ writefile          │
└────────────────────┘
2021-10-13
14:48 Reply: Opening a DB with SQLITE_OPEN_EXCLUSIVE (artifact: cebd13c65b user: drh)

That check-in has been backed out.

  1. The change only worked for unix, not Windows. Such asymmetry, while occasionally necessary, is unwelcomed.

  2. There are corner cases where it does not work on unix either. And those corner cases would be difficult to fix.

  3. We don't see a compelling need for this feature (there are better ways to accomplish the same thing) and so we do not want to risk the stability of the project in order to implement it.

12:10 Reply: binary .dump format (artifact: f96d8d6001 user: drh)

Is there a way to get a file similar to what .dump outputs in a binary format?

Yes. The https://www.sqlite.org/src/file/ext/misc/dbdump.c file contains a stand-alone implementation of ".dump". You can make a copy of that file and modify it to generate whatever binary output format you desire.

How do you propose to get your binary-dump format back into an SQLite database?

11:59 Reply: Build fails with SQLITE_OMIT_WAL and SQLITE_MAX_MMAP_SIZE=0 (artifact: e1f858f5bb user: drh)

I'd rather not remove SQLITE_OMIT_WAL from the configuration.

Why not? What harm is WAL mode causing you?

2021-10-12
14:03 Reply: sqlite3_serialize / deserialize 2GB limit on 64 bit (artifact: 31a4fa7c69 user: drh)

We have discussed making it work during the next release cycle. No promises, though.

More ↓