Check-in [7b6de5c3]

Overview
Comment: Minor tweaks to the Hebrew transliteration tables.
SHA1: 7b6de5c35d1c2e141b1eb666c8dd5ef6201ab579
User & Date: drh 2012-05-04 13:22:42
Context
2012-05-04
13:22 Minor tweaks to the Hebrew transliteration tables. Leaf check-in: 7b6de5c3 user: drh tags: translit-tokenizer
02:58 Add an experimental tokenizer to FTS3/4: one that transliterates Latin, Greek, Cyrillic, and Hebrew characters into pure ASCII. check-in: 93011569 user: drh tags: translit-tokenizer
Changes

Changes to ext/fts3/fts3_tokenizer2.c.

   952    952     0,                                       /* 891 */
   953    953     0,                                       /* 892 */
   954    954     0,                                       /* 893 */
   955    955     0,                                       /* 894 */
   956    956     0,                                       /* 895 */
   957    957     0,                                       /* 896 */
   958    958     0,                                       /* 897 */
   959         -  (52*4 + 1),    /* u05D0 (א)  ->  '    */ /* 898 */
          959  +  ( 1*4 + 0),    /* u05D0 (א)  ->       */ /* 898 */
   960    960     (53*4 + 1),    /* u05D1 (ב)  ->  b    */ /* 899 */
   961    961     (32*4 + 1),    /* u05D2 (ג)  ->  g    */ /* 900 */
   962    962     (20*4 + 1),    /* u05D3 (ד)  ->  d    */ /* 901 */
   963    963     ( 3*4 + 1),    /* u05D4 (ה)  ->  h    */ /* 902 */
   964    964     ( 7*4 + 1),    /* u05D5 (ו)  ->  v    */ /* 903 */
   965    965     (21*4 + 1),    /* u05D6 (ז)  ->  z    */ /* 904 */
   966         -  ( 4*4 + 2),    /* u05D7 (ח)  ->  ch   */ /* 905 */
          966  +  ( 3*4 + 1),    /* u05D7 (ח)  ->  h    */ /* 905 */
   967    967     (13*4 + 1),    /* u05D8 (ט)  ->  t    */ /* 906 */
   968    968     ( 9*4 + 1),    /* u05D9 (י)  ->  y    */ /* 907 */
   969    969     (54*4 + 1),    /* u05DA (ך)  ->  k    */ /* 908 */
   970    970     (54*4 + 1),    /* u05DB (כ)  ->  k    */ /* 909 */
   971    971     (11*4 + 1),    /* u05DC (ל)  ->  l    */ /* 910 */
   972    972     (55*4 + 1),    /* u05DD (ם)  ->  m    */ /* 911 */
   973    973     (55*4 + 1),    /* u05DE (מ)  ->  m    */ /* 912 */
   974    974     (31*4 + 1),    /* u05DF (ן)  ->  n    */ /* 913 */
   975    975     (31*4 + 1),    /* u05E0 (נ)  ->  n    */ /* 914 */
   976    976     ( 1*4 + 1),    /* u05E1 (ס)  ->  s    */ /* 915 */
   977         -  (52*4 + 1),    /* u05E2 (ע)  ->  '    */ /* 916 */
          977  +  ( 1*4 + 0),    /* u05E2 (ע)  ->       */ /* 916 */
   978    978     ( 0*4 + 1),    /* u05E3 (ף)  ->  p    */ /* 917 */
   979    979     ( 0*4 + 1),    /* u05E4 (פ)  ->  p    */ /* 918 */
   980    980     (42*4 + 2),    /* u05E5 (ץ)  ->  ts   */ /* 919 */
   981    981     (42*4 + 2),    /* u05E6 (צ)  ->  ts   */ /* 920 */
   982    982     (56*4 + 1),    /* u05E7 (ק)  ->  q    */ /* 921 */
   983    983     (57*4 + 1),    /* u05E8 (ר)  ->  r    */ /* 922 */
   984    984     ( 2*4 + 2),    /* u05E9 (ש)  ->  sh   */ /* 923 */
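
Reading note: each table entry above has the form (N*4 + L), and in every visible row L equals the length of the transliteration in the comment (0 for the now-empty alef and ayin, 1 for "h", 2 for "ch"). That suggests N is an offset into a shared string pool and L a 2-bit length. The sketch below decodes entries under that assumption. The names zDemoPool, DEMO_PACK and demoDecode are hypothetical, and the pool contents are invented so that offsets 0 through 4 agree with the comments in this diff; the real pool in fts3_tokenizer2.c is not shown here and may differ.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical string pool. It is chosen only so that the low offsets seen
** in the diff decode to the strings in the diff comments:
**   offset 0 -> "p", 1 -> "s", 2 -> "sh", 3 -> "h", 4 -> "ch".
** Higher offsets from the diff (for example 52 or 53) are not covered. */
static const char zDemoPool[] = "psshch";

/* Pack a pool offset and a length (0..3) the way the table rows are written. */
#define DEMO_PACK(off, len)  ((off)*4 + (len))

/* Decode one packed entry into zOut, which must hold at least 4 bytes.
** Returns the number of characters written, not counting the terminator. */
static int demoDecode(unsigned int packed, char *zOut){
  unsigned int off = packed >> 2;   /* upper bits: offset into the pool */
  unsigned int len = packed & 3;    /* low two bits: output length */
  assert( off + len <= sizeof(zDemoPool) - 1 );
  memcpy(zOut, &zDemoPool[off], len);
  zOut[len] = '\0';
  return (int)len;
}
```

Under this reading, the het row changes from DEMO_PACK(4,2), which decodes to "ch", to DEMO_PACK(3,1), which decodes to "h". The alef and ayin rows change to DEMO_PACK(1,0), a zero-length entry, so the offset no longer matters and the letter produces no output.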

Changes to ext/fts3/translit01.tcl.

  1013   1013     05BE 0000 {} {HEBREW PUNCTUATION MAQAF}
  1014   1014     05BF 0000 e  {HEBREW POINT RAFE}
  1015   1015     05C0 0000 *  {HEBREW PUNCTUATION PASEQ}
  1016   1016     05C1 0000 sh {HEBREW POINT SHIN DOT}
  1017   1017     05C2 0000 s  {HEBREW POINT SIN DOT}
  1018   1018     05C3 0000 *  {HEBREW PUNCTUATION SOF PASUQ}
  1019   1019     05C4 0000 {} {HEBREW MARK UPPER DOT}
  1020         -  05D0 0000 '  {HEBREW LETTER ALEF}
         1020  +  05D0 0000 {} {HEBREW LETTER ALEF}
  1021   1021     05D1 0000 b  {HEBREW LETTER BET}
  1022   1022     05D2 0000 g  {HEBREW LETTER GIMEL}
  1023   1023     05D3 0000 d  {HEBREW LETTER DALET}
  1024   1024     05D4 0000 h  {HEBREW LETTER HE}
  1025   1025     05D5 0000 v  {HEBREW LETTER VAV}
  1026   1026     05D6 0000 z  {HEBREW LETTER ZAYIN}
  1027         -  05D7 0000 ch {HEBREW LETTER HET}
         1027  +  05D7 0000 h  {HEBREW LETTER HET}
  1028   1028     05D8 0000 t  {HEBREW LETTER TET}
  1029   1029     05D9 0000 y  {HEBREW LETTER YOD}
  1030   1030     05DA 0000 k  {HEBREW LETTER FINAL KAF}
  1031   1031     05DB 0000 k  {HEBREW LETTER KAF}
  1032   1032     05DC 0000 l  {HEBREW LETTER LAMED}
  1033   1033     05DD 0000 m  {HEBREW LETTER FINAL MEM}
  1034   1034     05DE 0000 m  {HEBREW LETTER MEM}
  1035   1035     05DF 0000 n  {HEBREW LETTER FINAL NUN}
  1036   1036     05E0 0000 n  {HEBREW LETTER NUN}
  1037   1037     05E1 0000 s  {HEBREW LETTER SAMEKH}
  1038         -  05E2 0000 '  {HEBREW LETTER AYIN}
         1038  +  05E2 0000 {} {HEBREW LETTER AYIN}
  1039   1039     05E3 0000 p  {HEBREW LETTER FINAL PE}
  1040   1040     05E4 0000 p  {HEBREW LETTER PE}
  1041   1041     05E5 0000 ts {HEBREW LETTER FINAL TSADI}
  1042   1042     05E6 0000 ts {HEBREW LETTER TSADI}
  1043   1043     05E7 0000 q  {HEBREW LETTER QOF}
  1044   1044     05E8 0000 r  {HEBREW LETTER RESH}
  1045   1045     05E9 0000 sh {HEBREW LETTER SHIN}
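
Reading note: the rows above map a code point to an ASCII string, with {} meaning "emit nothing". As a rough illustration of the effect of this check-in, the sketch below puts the post-change rows for U+05D0 through U+05E9 into a sorted array and looks them up by binary search. This is not how fts3_tokenizer2.c stores the data (it uses the packed table shown earlier); DemoTranslit, aDemoHebrew, demoHebrewLookup and demoTranslit are hypothetical names, and skipping code points that are not in the table is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct DemoTranslit {
  unsigned int cp;      /* Unicode code point */
  const char *zOut;     /* ASCII transliteration; "" means no output */
} DemoTranslit;

/* Rows as they stand after this check-in, sorted by code point. */
static const DemoTranslit aDemoHebrew[] = {
  {0x05D0, ""},   {0x05D1, "b"},  {0x05D2, "g"},  {0x05D3, "d"},
  {0x05D4, "h"},  {0x05D5, "v"},  {0x05D6, "z"},  {0x05D7, "h"},
  {0x05D8, "t"},  {0x05D9, "y"},  {0x05DA, "k"},  {0x05DB, "k"},
  {0x05DC, "l"},  {0x05DD, "m"},  {0x05DE, "m"},  {0x05DF, "n"},
  {0x05E0, "n"},  {0x05E1, "s"},  {0x05E2, ""},   {0x05E3, "p"},
  {0x05E4, "p"},  {0x05E5, "ts"}, {0x05E6, "ts"}, {0x05E7, "q"},
  {0x05E8, "r"},  {0x05E9, "sh"},
};

/* Binary search; returns NULL for code points outside the demo table. */
static const char *demoHebrewLookup(unsigned int cp){
  size_t lo = 0;
  size_t hi = sizeof(aDemoHebrew)/sizeof(aDemoHebrew[0]);
  while( lo<hi ){
    size_t mid = lo + (hi-lo)/2;
    if( aDemoHebrew[mid].cp==cp ) return aDemoHebrew[mid].zOut;
    if( aDemoHebrew[mid].cp<cp ){
      lo = mid+1;
    }else{
      hi = mid;
    }
  }
  return NULL;
}

/* Transliterate n code points into zOut (capacity nOut bytes, including
** the terminator). Code points missing from the table are skipped, and
** output stops early if the buffer would overflow. Returns the length. */
static size_t demoTranslit(const unsigned int *aCp, size_t n,
                           char *zOut, size_t nOut){
  size_t used = 0;
  size_t i;
  assert( nOut>0 );
  for(i=0; i<n; i++){
    const char *z = demoHebrewLookup(aCp[i]);
    size_t len;
    if( z==NULL ) continue;
    len = strlen(z);
    if( used+len+1>nOut ) break;
    memcpy(&zOut[used], z, len);
    used += len;
  }
  zOut[used] = '\0';
  return used;
}
```

With these rows, the consonants lamed, het, final mem (U+05DC U+05D7 U+05DD) come out as "lhm"; with the previous het mapping they would have come out as "lchm".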

Changes to test/fts3translit01.test.

    37     37     \u0427\u0430\u0439\u043a\u043e\u0301\u0432\u0441\u043a\u0438\u0439
    38     38                      chaikovskii
    39     39     \u0391\u1f30\u03c3\u03c7\u03cd\u03bb\u03bf\u03c2
    40     40                      aschylos
    41     41     \u03a3\u03c9\u03ba\u03c1\u03ac\u03c4\u03b7\u03c2
    42     42                      sokratis
    43     43     \u05d1\u05b5\u05bc\u05d9\u05ea\u05dc\u05b6\u05d7\u05b6\u05dd
    44         -                   beaytlechem
           44  +                   beaytlehem
    45     45     \u05d9\u05b0\u05e8\u05d5\u05bc\u05e9\u05b8\u05c1\u05dc\u05b7\u05d9\u05b4\u05dd
    46     46                      yervashashlayim
    47     47   }                         
    48     48   
    49     49   # Create a full-text index to use for testing the stemmer.
    50     50   #
    51     51   db close