Check-in [7b6de5c3]

Overview
Comment: Minor tweaks to the Hebrew transliteration tables.
SHA1: 7b6de5c35d1c2e141b1eb666c8dd5ef6201ab579
User & Date: drh 2012-05-04 13:22:42
Context
2012-05-04
13:22  Minor tweaks to the Hebrew transliteration tables. Leaf check-in: 7b6de5c3 user: drh tags: translit-tokenizer
02:58  Add an experimental tokenizer to FTS3/4: one that transliterates Latin, Greek, Cyrillic, and Hebrew characters into pure ASCII. check-in: 93011569 user: drh tags: translit-tokenizer
Changes

Changes to ext/fts3/fts3_tokenizer2.c.

@@ -952,33 +952,33 @@
   0,                                       /* 891 */
   0,                                       /* 892 */
   0,                                       /* 893 */
   0,                                       /* 894 */
   0,                                       /* 895 */
   0,                                       /* 896 */
   0,                                       /* 897 */
-  (52*4 + 1),    /* u05D0 (א)  ->  '    */ /* 898 */
+  ( 1*4 + 0),    /* u05D0 (א)  ->       */ /* 898 */
   (53*4 + 1),    /* u05D1 (ב)  ->  b    */ /* 899 */
   (32*4 + 1),    /* u05D2 (ג)  ->  g    */ /* 900 */
   (20*4 + 1),    /* u05D3 (ד)  ->  d    */ /* 901 */
   ( 3*4 + 1),    /* u05D4 (ה)  ->  h    */ /* 902 */
   ( 7*4 + 1),    /* u05D5 (ו)  ->  v    */ /* 903 */
   (21*4 + 1),    /* u05D6 (ז)  ->  z    */ /* 904 */
-  ( 4*4 + 2),    /* u05D7 (ח)  ->  ch   */ /* 905 */
+  ( 3*4 + 1),    /* u05D7 (ח)  ->  h    */ /* 905 */
   (13*4 + 1),    /* u05D8 (ט)  ->  t    */ /* 906 */
   ( 9*4 + 1),    /* u05D9 (י)  ->  y    */ /* 907 */
   (54*4 + 1),    /* u05DA (ך)  ->  k    */ /* 908 */
   (54*4 + 1),    /* u05DB (כ)  ->  k    */ /* 909 */
   (11*4 + 1),    /* u05DC (ל)  ->  l    */ /* 910 */
   (55*4 + 1),    /* u05DD (ם)  ->  m    */ /* 911 */
   (55*4 + 1),    /* u05DE (מ)  ->  m    */ /* 912 */
   (31*4 + 1),    /* u05DF (ן)  ->  n    */ /* 913 */
   (31*4 + 1),    /* u05E0 (נ)  ->  n    */ /* 914 */
   ( 1*4 + 1),    /* u05E1 (ס)  ->  s    */ /* 915 */
-  (52*4 + 1),    /* u05E2 (ע)  ->  '    */ /* 916 */
+  ( 1*4 + 0),    /* u05E2 (ע)  ->       */ /* 916 */
   ( 0*4 + 1),    /* u05E3 (ף)  ->  p    */ /* 917 */
   ( 0*4 + 1),    /* u05E4 (פ)  ->  p    */ /* 918 */
   (42*4 + 2),    /* u05E5 (ץ)  ->  ts   */ /* 919 */
   (42*4 + 2),    /* u05E6 (צ)  ->  ts   */ /* 920 */
   (56*4 + 1),    /* u05E7 (ק)  ->  q    */ /* 921 */
   (57*4 + 1),    /* u05E8 (ר)  ->  r    */ /* 922 */
   ( 2*4 + 2),    /* u05E9 (ש)  ->  sh   */ /* 923 */
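
The table entries above appear to pack two fields into one integer as (index*4 + length). This reading is inferred only from the comments: entries whose comment shows one output character end in "+ 1", the two-character outputs (ts, sh, and the old ch) end in "+ 2", and the emptied alef and ayin entries end in "+ 0". The excerpt does not show what the index refers to, so the sketch below treats it as an opaque number. The function names are hypothetical and do not come from fts3_tokenizer2.c.

```c
/* Sketch only: assumes each entry is packed as (index*4 + length).
** translitIndex() and translitLength() are hypothetical helper names. */

/* Upper bits: the index field, whatever structure it points into. */
static unsigned translitIndex(unsigned entry){
  return entry >> 2;
}

/* Low two bits: number of ASCII characters emitted (0 to 3). */
static unsigned translitLength(unsigned entry){
  return entry & 3u;
}
```

Under that assumption, the old alef entry (52*4 + 1) decodes to index 52 with length 1, while the new entry (1*4 + 0) has length 0, which matches the comment changing from ' to an empty transliteration.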

Changes to ext/fts3/translit01.tcl.

@@ -1013,33 +1013,33 @@
   05BE 0000 {} {HEBREW PUNCTUATION MAQAF}
   05BF 0000 e  {HEBREW POINT RAFE}
   05C0 0000 *  {HEBREW PUNCTUATION PASEQ}
   05C1 0000 sh {HEBREW POINT SHIN DOT}
   05C2 0000 s  {HEBREW POINT SIN DOT}
   05C3 0000 *  {HEBREW PUNCTUATION SOF PASUQ}
   05C4 0000 {} {HEBREW MARK UPPER DOT}
-  05D0 0000 '  {HEBREW LETTER ALEF}
+  05D0 0000 {} {HEBREW LETTER ALEF}
   05D1 0000 b  {HEBREW LETTER BET}
   05D2 0000 g  {HEBREW LETTER GIMEL}
   05D3 0000 d  {HEBREW LETTER DALET}
   05D4 0000 h  {HEBREW LETTER HE}
   05D5 0000 v  {HEBREW LETTER VAV}
   05D6 0000 z  {HEBREW LETTER ZAYIN}
-  05D7 0000 ch {HEBREW LETTER HET}
+  05D7 0000 h  {HEBREW LETTER HET}
   05D8 0000 t  {HEBREW LETTER TET}
   05D9 0000 y  {HEBREW LETTER YOD}
   05DA 0000 k  {HEBREW LETTER FINAL KAF}
   05DB 0000 k  {HEBREW LETTER KAF}
   05DC 0000 l  {HEBREW LETTER LAMED}
   05DD 0000 m  {HEBREW LETTER FINAL MEM}
   05DE 0000 m  {HEBREW LETTER MEM}
   05DF 0000 n  {HEBREW LETTER FINAL NUN}
   05E0 0000 n  {HEBREW LETTER NUN}
   05E1 0000 s  {HEBREW LETTER SAMEKH}
-  05E2 0000 '  {HEBREW LETTER AYIN}
+  05E2 0000 {} {HEBREW LETTER AYIN}
   05E3 0000 p  {HEBREW LETTER FINAL PE}
   05E4 0000 p  {HEBREW LETTER PE}
   05E5 0000 ts {HEBREW LETTER FINAL TSADI}
   05E6 0000 ts {HEBREW LETTER TSADI}
   05E7 0000 q  {HEBREW LETTER QOF}
   05E8 0000 r  {HEBREW LETTER RESH}
   05E9 0000 sh {HEBREW LETTER SHIN}
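
To illustrate the effect of the new letter mappings, here is a minimal C sketch that looks up U+05D0 through U+05E9 in a plain string table copied from the new rows above. It is not how the tokenizer stores or applies the data. The names azHebrew and hebrewToAscii are hypothetical, and code points outside that range (including the points and punctuation at 05BE to 05C4) are simply skipped here.

```c
#include <stddef.h>
#include <string.h>

/* New transliterations for U+05D0..U+05E9, copied from the table above.
** Alef (05D0) and ayin (05E2) now map to the empty string. */
static const char *const azHebrew[] = {
  "",  "b",  "g",  "d", "h", "v", "z", "h", "t", "y",   /* 05D0..05D9 */
  "k", "k",  "l",  "m", "m", "n", "n", "s", "",  "p",   /* 05DA..05E3 */
  "p", "ts", "ts", "q", "r", "sh"                       /* 05E4..05E9 */
};

/* Write the transliteration of aCp[0..nCp-1] into zOut, which holds
** nOut bytes including the terminator. Code points outside
** U+05D0..U+05E9 are skipped in this sketch.
** Returns 0 on success, -1 if zOut is too small. */
static int hebrewToAscii(const unsigned *aCp, size_t nCp,
                         char *zOut, size_t nOut){
  size_t n = 0;   /* bytes written so far, excluding the terminator */
  size_t i;
  if( nOut==0 ) return -1;
  zOut[0] = 0;
  for(i=0; i<nCp; i++){
    const char *z;
    size_t len;
    if( aCp[i]<0x05D0 || aCp[i]>0x05E9 ) continue;
    z = azHebrew[aCp[i]-0x05D0];
    len = strlen(z);
    if( n+len+1>nOut ) return -1;
    memcpy(zOut+n, z, len+1);   /* copies the terminator as well */
    n += len;
  }
  return 0;
}
```

With these mappings, the sequence U+05E9 U+05DC U+05D5 U+05DD comes out as shlvm, and a het (U+05D7) contributes h where the previous table gave ch.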

Changes to test/fts3translit01.test.

@@ -37,15 +37,15 @@
   \u0427\u0430\u0439\u043a\u043e\u0301\u0432\u0441\u043a\u0438\u0439
                    chaikovskii
   \u0391\u1f30\u03c3\u03c7\u03cd\u03bb\u03bf\u03c2
                    aschylos
   \u03a3\u03c9\u03ba\u03c1\u03ac\u03c4\u03b7\u03c2
                    sokratis
   \u05d1\u05b5\u05bc\u05d9\u05ea\u05dc\u05b6\u05d7\u05b6\u05dd
-                   beaytlechem
+                   beaytlehem
   \u05d9\u05b0\u05e8\u05d5\u05bc\u05e9\u05b8\u05c1\u05dc\u05b7\u05d9\u05b4\u05dd
                    yervashashlayim
 }
 
 # Create a full-text index to use for testing the stemmer.
 #
 db close