Wiki page [checkin/393a3d19ae2e7b56d1909d4225cc098c7825556473a5df9704a54c6925c1e42b] by drh 2019-03-01 13:37:39.
JSON parsing performance was measured by this script:

<blockquote><verbatim>
.param init
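-- Load the test JSON file into the $json shell parameter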
INSERT INTO temp.[$Parameters](key,value) 
VALUES('$json',readfile('/home/drh/tmp/gsoc-2018.json'));
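-- Show the size in bytes of one formatted JSON string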
SELECT length(printf('{"a":%d,"b":%s}',50,$json));
.timer on
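-- Query 1: parse 1000 slightly different copies of the JSON text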
WITH RECURSIVE c(x) AS (VALUES(1) UNION ALL SELECT x+1 FROM c WHERE x<1000)
SELECT DISTINCT json_valid(printf('{"a":%d,"b":%s}',x,$json)) FROM c;

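-- Query 2: same loop and printf(), but no JSON parsing; this
-- measures everything in query 1 except the parser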
WITH RECURSIVE c(x) AS (VALUES(1) UNION ALL SELECT x+1 FROM c WHERE x<1000)
SELECT DISTINCT substr(printf('{"a":%d,"b":%s}',x,$json),1,5) FROM c;
</verbatim></blockquote>

It is necessary to feed slightly different JSON strings into the parser
on each cycle in order to defeat SQLite's internal JSON cache.  The
first query (after starting the timer) measures the parser speed.  The
second query measures all of the extraneous non-parsing overhead of the
first query.  The idea is that the time used by the parser is the time
of the first query minus the overhead time of the second query.
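
To see why the varying %d field matters, consider a hypothetical variant
of the first query that formats the same string on every cycle.  (This
sketch is for illustration only; it is not part of the original
measurement.)  Because the text would be byte-for-byte identical on each
cycle, the JSON cache would satisfy every cycle after the first, and the
loop would no longer measure the parser at all:

<blockquote><verbatim>
-- Constant first field: every cycle formats the same JSON text, so
-- cycles after the first hit the cache instead of the parser.
WITH RECURSIVE c(x) AS (VALUES(1) UNION ALL SELECT x+1 FROM c WHERE x<1000)
SELECT DISTINCT json_valid(printf('{"a":%d,"b":%s}',1,$json)) FROM c;
</verbatim></blockquote>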
Running the script above on an optimized version of SQLite on a
4-year-old Ubuntu desktop gives:

<blockquote><verbatim>
3327844
1
Run Time: real 3.218 user 3.176000 sys 0.040000
{"a":
Run Time: real 0.275 user 0.268000 sys 0.008000
</verbatim></blockquote>

So roughly 2.94 seconds (3.218 - 0.275) were spent parsing 3,327,844,000
bytes of JSON, which gives a parsing speed in excess of 1.1 GB/s.  Round
that down to an even 1 GB/s to be conservative.
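
The same arithmetic can be checked directly in the shell; the constants
below are the byte count and the timings from the run shown above:

<blockquote><verbatim>
SELECT (1000 * 3327844.0)  -- total bytes of JSON parsed over 1000 cycles
     / (3.218 - 0.275)     -- net parse time: query 1 minus query 2, in seconds
     / 1e9                 -- convert to GB/s
     AS GB_per_sec;        -- result: approximately 1.13
</verbatim></blockquote>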