SQLite

Check-in [6c77c2d5e1]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Just don't run tolower() on hi-bit characters. This shouldn't cause us to break any UTF-8 code points, unless they were already broken in the input. (CVS 3376)
Downloads: Tarball | ZIP archive
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA1: 6c77c2d5e15e9d3efed3e274bc93cd5a4868f574
User & Date: shess 2006-08-30 21:40:30.000
Context
2006-08-31
15:07
Refactor the FTS1 module so that its name is "fts1" instead of "fulltext", so that all symbols with external linkage begin with "sqlite3Fts1", and so that all filenames begin with "fts1". (CVS 3377) (check-in: e1891f0dc5 user: drh tags: trunk)
2006-08-30
21:40
Just don't run tolower() on hi-bit characters. This shouldn't cause us to break any UTF-8 code points, unless they were already broken in the input. (CVS 3376) (check-in: 6c77c2d5e1 user: shess tags: trunk)
2006-08-29
18:46
Bug fix: Get INSERT INTO ... SELECT working when the target is a virtual table. (CVS 3375) (check-in: 7cdc41e748 user: drh tags: trunk)
Changes
Unified Diff Ignore Whitespace Patch
Changes to ext/fts1/simple_tokenizer.c.
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
  ** track such information in the database, then we'd only want this
  ** information on the initial create.
  */
  if( argc>1 ){
    t->zDelim = string_dup(argv[1]);
  } else {
    /* Build a string excluding alphanumeric ASCII characters */
    char zDelim[256];               /* nul-terminated, so nul not a member */
    int i, j;
    for(i=1, j=0; i<0x100; i++){
      if( i>=0x80 || !isalnum(i) ){
        zDelim[j++] = i;
      }
    }
    zDelim[j++] = '\0';
    assert( j<=sizeof(zDelim) );
    t->zDelim = string_dup(zDelim);
  }







|

|
|







58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
  ** track such information in the database, then we'd only want this
  ** information on the initial create.
  */
  if( argc>1 ){
    t->zDelim = string_dup(argv[1]);
  } else {
    /* Build a string excluding alphanumeric ASCII characters */
    char zDelim[0x80];               /* nul-terminated, so nul not a member */
    int i, j;
    for(i=1, j=0; i<0x80; i++){
      if( !isalnum(i) ){
        zDelim[j++] = i;
      }
    }
    zDelim[j++] = '\0';
    assert( j<=sizeof(zDelim) );
    t->zDelim = string_dup(zDelim);
  }
130
131
132
133
134
135
136



137

138
139
140
141
142
143
144
  while( c->pCurrent-c->pInput<c->nBytes ){
    int n = (int) strcspn(c->pCurrent, t->zDelim);
    if( n>0 ){
      if( n+1>c->nTokenAllocated ){
        c->zToken = realloc(c->zToken, n+1);
      }
      for(ii=0; ii<n; ii++){



        c->zToken[ii] = tolower(c->pCurrent[ii]);

      }
      c->zToken[n] = '\0';
      *ppToken = c->zToken;
      *pnBytes = n;
      *piStartOffset = (int) (c->pCurrent-c->pInput);
      *piEndOffset = *piStartOffset+n;
      *piPosition = c->iToken++;







>
>
>
|
>







130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
  while( c->pCurrent-c->pInput<c->nBytes ){
    int n = (int) strcspn(c->pCurrent, t->zDelim);
    if( n>0 ){
      if( n+1>c->nTokenAllocated ){
        c->zToken = realloc(c->zToken, n+1);
      }
      for(ii=0; ii<n; ii++){
        /* TODO(shess) This needs expansion to handle UTF-8
        ** case-insensitivity.
        */
        char ch = c->pCurrent[ii];
        c->zToken[ii] = (unsigned char)ch<0x80 ? tolower(ch) : ch;
      }
      c->zToken[n] = '\0';
      *ppToken = c->zToken;
      *pnBytes = n;
      *piStartOffset = (int) (c->pCurrent-c->pInput);
      *piEndOffset = *piStartOffset+n;
      *piPosition = c->iToken++;