SQLite

Ticket Hash: 5eaa61ea1881040b17449ca043b6f8fd9ca55dc3
Title: SIGBUS on disk full in WAL mode
Status: Fixed
Type: Portability
Severity: Minor
Priority: Low
Subsystem: Unknown
Resolution: Fixed
Last Modified: 2013-05-06 18:49:42
Version Found In: 2993ca20207f8dac02f58d01e31d68c84328356a
Description:
Attempting to prepare a query on a WAL database when disk space is critically low results in the process being killed with SIGBUS. The crash happens in walIndexWriteHdr, invoked from walIndexRecover:
 static void walIndexWriteHdr(Wal *pWal){
   [...]
   walChecksumBytes(1, (u8*)&pWal->hdr, nCksum, 0, pWal->hdr.aCksum);
   memcpy((void *)&aHdr[1], (void *)&pWal->hdr, sizeof(WalIndexHdr));  /* <-- SIGBUS here */

Since I don't know whether I'll be able to attach a PoC after submitting this ticket, I'm adding it here in base64.

base64 -d > mountme.bz2 << EOF
QlpoOTFBWSZTWdl2FE4AAEd////////v7/5e/////L9v/+xv38ZSbAdOQV4FRCTR3/3s0ARNY3on
aVG5ydncNEgTQE09IyngieU8gh6TINNPUAGmjQ2o0HpDQNNHqHqAAADTQ9RoeU0GR5Q8kBoZNqeo
NEJpkRojCniamZNJo02oAzUNNGmyg0DIAAAAABoAAAAAANAaAAAxFE0Jij0J6hp5NR6Rk0ANAZBo
GgAAAADQAAAAAAAAABoDQAgNDINMmmgYTQGhpk0aYEZGTQZDJkGRpkaNMg0aAGCNNBiGgAZNDJgI
0GjQJFJRNNNDIGjTIbU0HqABoaaZDINAAAADQBiAGIGjQyNNGgAZMI0ZNDJoU3MeFxkkc3Z41hbW
8jDkSjtOhgCTZvZDyAuQe9MlnyUJRsnBSKCqYknJpalrzfoZNLCdV70TdiIXxMRuU2IRkQJqG8QZ
OQb+iYlGMFPlV0OmpxTALbroIwQMEAHxal5pbiPEEKHH0kN2QtWKoGkgJsW5O0ayrdAnCOCFYLWw
lLfOMYqAyp0kAifKYYjJCJmGkTTtwu6OsOY0RUoOiYDGtSQctLi3zirl93O1IcPEMJd2yfKOo5VC
tWwG8FR62hLdnPEziKCWNRAajF4aK+BaiAUlAZ2KQdmyNlQkNBaRNnI7pJPgEa24whyQSEgCKdFC
/aaQiZaWUYc9gZLodGIx76jQRmC3DBQYgKCBI5zBjgkBPQAMNSLg5jVDAwp0lCg6hCZ3cBnTAKCr
cB3YAkh1CQjQRAjhACk7EMgIYG/qJECI59jLeOZbEgNUsCyBI1VUxJGKqGBEkyVYgcIAm0kNpA2k
Ek0adiIMAxDQmwQlHkcPEajEqCMs0KPm/J9/zgpwGR8bJkqz2cyCKk0efK1+mDyI2XAXOKDBngnh
ufJHQ/UIOjK+VQ2Co9F59shqwue3JMARlIR+wwrBKYWf7aqVbNpnE2YHnSYYO78LCCQlXsSBlmIB
gq+Mhwc5o4DY3koqRgDooKXZs6xZl9Pu99c/JOtbNB8swjv0DLMzBv1CaRMihk7i65cnPsysYte/
g2qofXNK7kUHLlkQDswL2sD2Qxg18u1zVCzMcRKmYKwk+dyYVqIGgU9AWSAGFQncaikWEUsdh1cy
DMTB1MUJJmoPKklNQgJliXgmCc48b5DvnmlACYCBG40dGAieggokSkAKv+aEPAwFF0hGpNk0JRQM
hQpUWBdCAbFtYhjq4Qnk2IF16/kvnNfAx5nFlgNSp3At7mGEVQwmTWM2Pah6bczg7MK84eAyvC0S
qZiDAVigjMRQUGiL+XmKkF1Sm6e/kUVALAMBBOMBKbkBCQEbxxkumARAAyffwCvK4lyRZC1cImG/
O6CB4G0G2dmooW+OATX2TIF8lYiCl4hSdrvCgNAxQgtgsr8iCzICwdQCJPjc8hDYS67AulWQxxmw
1WMNwqrviGcCT9CYWMaONSepUu5bLC9KfhepOKSBVBmQ8AZBSyB1ntos5HzRZMwLjGu6tPVUQKOd
ozKDukqH+aYanaMEJAE+T6suVBNc/fNGBlP1+ZqyEaJCdpvPRVi/VJMHdoKy2QHNOxo3k8kUx7rg
w8E49w6q6osXLYPmkQ2hELLO5TWp7D8RhybGAo6WXzt66MIRIkZ1kgghKRKYMAgB7YcebZbV7ada
B/Asgf4nIUJ8iKZ1qbCYDBkg7H8sTf6oDqczF/X79LiIqxb1boNG662tp/Vz7JfwSCL6w4+2fqk3
ym9/ynQJUzxwAZf7EwajzMXWBxQgiIjYJJ0YJATYEIr+KoYlEwFWSnnjkI3piIP1ycNM6496w1SD
V/Xf8tkMAB4Ffbf/i7kinChIbLsKJwA=
EOF

Then:
bunzip2 mountme.bz2
mount -o loop mountme /mnt

Finally:
# sqlite3 /mnt/db.sqlite3 
SQLite version 3.7.14.1 2012-10-04 19:37:12
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> .tables
Bus error

drh added on 2012-11-13 01:28:57:
Unable to recreate the malfunction here. In fact, gdb says that the walIndexWriteHdr() function never gets called when following the steps outlined in the original bug report (which is exactly what you would expect if you are only doing a query.)


anonymous added on 2012-11-13 09:02:21:
Are you running sqlite3/gdb as root?


drh added on 2012-11-13 09:28:13:
Yes, the sqlite3 shell was run as root. I also repeated the experiment multiple times with valgrind, for what it is worth. No issues observed.

Furthermore, I tried creating write-ahead log files that needed to be recovered (since the original problem statement said that the segfault occurred during recovery) and recover them, as root, with zero space left on the device. Still no problems.

The segfault occurs on a line that is attempting to write into a newly allocated mmap-ed file (the db.sqlite-shm file, specifically). It appears that Linux will allow disk space for mmap-ed files to be overcommitted. That is to say, based on my experiments, you can mmap more space than you have on disk and the mmap() call still works. Perhaps your system is configured in some way different from mine (perhaps it is also under memory pressure) so that the mmap-ed region is becoming unmapped somehow?
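
To illustrate the overcommit behaviour described above: a file can be extended and mapped without any disk blocks being reserved, so the failure only surfaces later, as a signal, when a page is first written. The sketch below shows one way to surface the error up front by reserving the space before mapping. It assumes posix_fallocate() is available, and map_with_reserved_space() is a hypothetical helper name, not a function from the SQLite source.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* expose posix_fallocate() and mmap() */
#endif
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
** Hypothetical helper (not SQLite code): make sure the first nByte bytes
** of fd are backed by real disk blocks, then map them read/write.
** Returns the mapping, or NULL with errno set if the space could not be
** reserved or the mapping failed.
*/
static void *map_with_reserved_space(int fd, size_t nByte){
  void *p;
  /* posix_fallocate() returns the error number directly and does not
  ** set errno, so copy it over for the caller. */
  int rc = posix_fallocate(fd, 0, (off_t)nByte);
  if( rc!=0 ){
    errno = rc;               /* for example ENOSPC on a full disk */
    return NULL;
  }
  p = mmap(NULL, nByte, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
  return p==MAP_FAILED ? NULL : p;
}
```

With this ordering, a full disk is reported as an ordinary error return that the caller can handle, rather than as a SIGBUS raised by a later memcpy() into the mapping.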


drh added on 2012-11-13 10:34:17:
Able to reproduce the problem now....


anonymous added on 2012-11-13 12:14:38:
Confirmed resolved. Thanks a lot for the very quick fix!

User Comments:
anonymous added on 2012-12-15 11:09:50:
I'm really sorry, but I have to reopen the ticket.

The following commit 597333f1024092b94bcd8772541e19a0f707bd40 (http://www.sqlite.org/src/info/597333f102) breaks compilation on systems with no posix_fallocate available (e.g. systems using uClibc as the libc implementation, i.e. a lot of embedded systems).

Inverting the result of the configure test is incorrect, i.e. the value of HAVE_POSIX_FALLOCATE should be left as is.

I didn't test it myself, but from reading the code I would say the problem reported in this ticket is only solved for systems with posix_fallocate available. posix_fallocate is optional (see http://pubs.opengroup.org/onlinepubs/009695399/functions/posix_fallocate.html), therefore the ticket is only partially solved. Thus reopened.

Regards,
Gene
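
The point about systems without posix_fallocate can be sketched as follows. This is only an illustration of one possible fallback, not the change that was committed to SQLite: when HAVE_POSIX_FALLOCATE is not set, the code writes a single zero byte into each chunk past the current end of the file, so that the filesystem has to allocate blocks (or report an error) before the file is mapped. reserve_file_space() is a hypothetical name, and the 4096-byte chunk size is an assumption that must not exceed the filesystem block size for every block to be touched.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* expose pwrite() and posix_fallocate() */
#endif
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
** Hypothetical helper: ensure the file behind fd is at least nByte bytes
** long and that the newly added range is backed by allocated blocks.
** Returns 0 on success or an errno value (such as ENOSPC) on failure.
*/
static int reserve_file_space(int fd, off_t nByte){
#if defined(HAVE_POSIX_FALLOCATE) && HAVE_POSIX_FALLOCATE
  return posix_fallocate(fd, 0, nByte);
#else
  const off_t szChunk = 4096;   /* ASSUMPTION: not larger than a block */
  struct stat sb;
  off_t iOfst;
  char zero = 0;

  if( fstat(fd, &sb)!=0 ) return errno;
  if( sb.st_size>=nByte ) return 0;   /* already long enough */

  /* Start at the last byte of the chunk containing the current end of
  ** file. That offset is always >= st_size, so existing data is never
  ** overwritten. Holes inside the existing file are not filled. */
  iOfst = (sb.st_size/szChunk)*szChunk + szChunk - 1;
  for(;;){
    ssize_t n;
    if( iOfst>=nByte ) iOfst = nByte-1;   /* final, possibly short, chunk */
    n = pwrite(fd, &zero, 1, iOfst);
    if( n<0 ) return errno;
    if( n!=1 ) return EIO;
    if( iOfst==nByte-1 ) break;
    iOfst += szChunk;
  }
  return 0;
#endif
}
```

Either branch leaves the decision to the value of HAVE_POSIX_FALLOCATE exactly as the configure test reported it, which is the behaviour requested in the comment above.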