Documentation Source Text

Check-in [d9cb224c9f]

Overview
Comment: Add documentation on althttpd.
SHA3-256: d9cb224c9f58b749ea07a0d43e13b2882cfcb0621dba908e67f8eefbce9ca163
User & Date: drh 2018-02-27 21:29:46
Context
2018-02-27 22:16  Updates to the althttpd documentation. check-in: 5a87c618a8 user: drh tags: trunk
2018-02-27 21:29  Add documentation on althttpd. check-in: d9cb224c9f user: drh tags: trunk
2018-02-27 16:36  Create a change log entry for 3.23.0. check-in: 45bd3cdd39 user: drh tags: trunk
Changes

Added misc/althttpd.md.

The Althttpd Webserver
======================

Althttpd is a simple webserver that has run the <https://sqlite.org/> website
since 2004.  Althttpd strives for simplicity, security, and low resource
usage.

As of 2018, the althttpd instance for sqlite.org answers
about 500,000 HTTP requests per day (about 5 or 6 per second)
delivering about 50GB of content per day (about 4.6 megabits/second) 
on a $40/month [Linode](https://www.linode.com/pricing).  The load 
average on this machine normally stays around 0.1 or 0.2.  About 19%
of the HTTP requests are CGI to various [Fossil](https://fossil-scm.org/)
source-code repositories.
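
The per-second figures follow from the daily totals by simple division.
A quick check, assuming decimal units (1 GB = 10^9 bytes) and an
86,400-second day:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

requests_per_day = 500_000
bytes_per_day = 50 * 10**9  # 50 GB, assuming decimal gigabytes

requests_per_second = requests_per_day / SECONDS_PER_DAY
megabits_per_second = bytes_per_day * 8 / SECONDS_PER_DAY / 10**6

print(f"{requests_per_second:.1f} requests/second")  # 5.8
print(f"{megabits_per_second:.1f} megabits/second")  # 4.6
```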

Design Philosophy
-----------------

Althttpd was originally designed to be launched from 
[xinetd](https://en.wikipedia.org/wiki/Xinetd) or
[stunnel4](https://www.stunnel.org/).  A separate process
is started for each incoming connection, and that process is
wholly focused on serving that one connection.  A single althttpd
process will handle one or more HTTP requests over the same connection.
When the connection closes, the althttpd process exits.

Newer versions of althttpd can operate stand-alone.  Althttpd
itself listens on port 80 for incoming HTTP requests, then forks
a copy of itself to handle each inbound connection.  Each connection
is still handled using a separate process.  The only difference is
that the connection-handler process is now started by a master
althttpd instance rather than by xinetd or stunnel4.
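
The process-per-connection model can be sketched in a few lines.  This is
an illustration in Python, not a transcription of althttpd's C code; the
names `echo_handler`, `spawn_handler`, and `serve_forever` are hypothetical,
and the echo handler merely stands in for real HTTP request handling.

```python
import os
import signal
import socket


def echo_handler(conn):
    """Placeholder for real request handling: echo one chunk back."""
    data = conn.recv(1024)
    conn.sendall(b"echo: " + data)


def spawn_handler(conn, handler=echo_handler):
    """Fork a child that serves `conn` and then exits; return the child's PID."""
    pid = os.fork()
    if pid == 0:
        # Child: wholly focused on this one connection, then exits.
        status = 1
        try:
            handler(conn)
            conn.close()
            status = 0
        finally:
            os._exit(status)
    # Parent: the child owns the connection now.
    conn.close()
    return pid


def serve_forever(port=8080):
    """Master loop: accept connections and hand each one to a new process."""
    # Assumption: ignoring SIGCHLD lets the kernel reap exited children.
    signal.signal(signal.SIGCHLD, signal.SIG_IGN)
    with socket.create_server(("", port)) as listener:
        while True:
            conn, _addr = listener.accept()
            spawn_handler(conn)
```

Calling `serve_forever()` would play the role of the master instance.
Under xinetd or stunnel4, only the work done by the child is needed, because
the launcher has already accepted the connection.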

Althttpd has no configuration files.  All configuration is handled
using command-line arguments.

Althttpd does not itself handle TLS connections.  For HTTPS, althttpd
relies on stunnel4 to handle TLS protocol negotiation, decryption, and
encryption.

Because each althttpd process only needs to service a single
connection, althttpd is single threaded.  Furthermore, each process
only lives for the duration of a single connection, which means that
althttpd does not need to worry too much about memory leaks.


Source Code
-----------

The complete source code for althttpd is contained within a single
C-code file with no dependencies outside of the standard C library.
The source code file is named "[althttpd.c](/file/misc/althttpd.c)".
To build and install althttpd, run the following command:

>
     gcc -Os -o /usr/bin/althttpd althttpd.c

The althttpd source code is heavily commented and accessible.
It should be relatively easy to customize for specialized needs.

Setup Using Xinetd
------------------

Shown below is the complete text of the /etc/xinetd.d/http file on
sqlite.org that configures althttpd to serve unencrypted
HTTP requests on both IPv4 and IPv6.
You can use this as a template to create your own installations.

>
    service http
    {
      port = 80
      flags = IPv4
      socket_type = stream
      wait = no
      user = root
      server = /usr/bin/althttpd
      server_args = -logfile /logs/http.log -root /home/www -user www-data
      bind = 45.33.6.223
    }
>    
    service http
    {
      port = 80
      flags = REUSE IPv6
      bind = 2600:3c00::f03c:91ff:fe96:b959
      socket_type = stream
      wait = no
      user = root
      server = /usr/bin/althttpd
      server_args = -logfile /logs/http.log -root /home/www -user www-data
    }
    

The key observation here is that each incoming TCP/IP connection on 
port 80 launches a copy of /usr/bin/althttpd with some additional
arguments that amount to the configuration for the webserver.

Notice that althttpd is run as the superuser. This is not required, but if it
is done, then althttpd will move itself into a chroot jail at the root of
the web document hierarchy (/home/www in the example) and then drop
all superuser privileges prior to reading any content off of the wire.
The -user option tells althttpd to become user www-data after entering
the chroot jail.

The -root option tells althttpd where to find the document hierarchy.
In the case of sqlite.org, all content is served from /home/www.
At the top level of this document hierarchy is a bunch of directories
whose names end with ".website".  Each such directory is a separate
website.  The directory is chosen based on the Host: parameter of the
incoming HTTP request.  A _partial_ list of the directories on sqlite.org
is this:

>
    3dcanvas_tcl_lang_org.website
    3dcanvas_tcl_tk.website
    androwish_org.website
    canvas3d_tcl_lang_org.website
    canvas3d_tcl_tk.website
    cvstrac_org.website
    default.website
    fossil_scm_com.website
    fossil_scm_hwaci_com.website
    fossil_scm_org.website
    system_data_sqlite_org.website
    wapp_tcl_lang_org.website
    wapp_tcl_tk.website
    www2_alt_mail_net.website
    www_androwish_org.website
    www_cvstrac_org.website
    www_fossil_scm_com.website
    www_fossil_scm_org.website
    www_sqlite_org.website
    
For each incoming HTTP request, althttpd takes the text of the Host:
parameter in the request header, converts it to lowercase, and changes
all characters other than ASCII alphanumerics into "_".  The result
determines which subdirectory to use for content.  If nothing matches,
the "default.website" directory is used as a fallback.

For example, if the Host parameter is "www.SQLite.org" then the name is
translated into "www\_sqlite\_org.website" and that is the directory
used to serve content.  If the Host parameter is "fossil-scm.org" then
the "fossil\_scm\_org.website" directory is used.  Oftentimes, two or
more names refer to the same website.  For example, fossil-scm.org,
www.fossil-scm.org, fossil-scm.com, and www.fossil-scm.com are all the
same website.  In that case, typically only one of the directories is
a real directory and the others are symbolic links.
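
The Host-to-directory translation described above can be sketched as
follows.  The function names are hypothetical, and the set of existing
directories is passed in explicitly so that the example stays
self-contained.  The text does not say how a port suffix in the Host
value is treated, so this sketch does nothing special for one.

```python
def host_to_dirname(host):
    """Map a Host: header value to a "*.website" directory name."""
    cleaned = "".join(
        ch.lower() if (ch.isascii() and ch.isalnum()) else "_"
        for ch in host
    )
    return cleaned + ".website"


def choose_site(host, existing_dirs):
    """Pick the directory for `host`, falling back to "default.website"."""
    name = host_to_dirname(host)
    return name if name in existing_dirs else "default.website"


print(host_to_dirname("www.SQLite.org"))  # www_sqlite_org.website
print(host_to_dirname("fossil-scm.org"))  # fossil_scm_org.website
```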

On a minimal installation that only hosts a single website, it suffices
to have a single subdirectory named "default.website".

Within the *.website directory, the file to be served is selected by
the HTTP request URI.  Files that are marked as executable are run
as CGI.  Non-executable files are delivered as-is.

If the request URI specifies the name of a directory within *.website,
then althttpd appends "/index.html" and "/index.cgi", in that order,
looking for a match.

If a prefix of a URI matches the name of an executable file then that
file is run as CGI.  For as-is content, the request URI must exactly
match the name of the file.
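
The selection rules above can be sketched roughly as follows.  The function
`resolve` and its return shape are assumptions made for illustration, not
althttpd's actual code, and the sketch presumes the URI has already passed
the filename restrictions that althttpd enforces.

```python
import os


def resolve(site_dir, uri_path):
    """Return (kind, file_path, extra_path) for a request, or None for 404.

    kind is "cgi" for executable files and "static" for as-is content.
    """
    parts = [p for p in uri_path.split("/") if p]
    current = site_dir
    for i, part in enumerate(parts):
        current = os.path.join(current, part)
        if os.path.isfile(current):
            rest = parts[i + 1:]
            if os.access(current, os.X_OK):
                # An executable file matching a prefix of the URI runs as CGI.
                extra = "/" + "/".join(rest) if rest else ""
                return ("cgi", current, extra)
            # As-is content requires an exact match of the whole URI.
            return ("static", current, "") if not rest else None
        if not os.path.isdir(current):
            return None
    # The URI named a directory: try the index files in order.
    for index in ("index.html", "index.cgi"):
        candidate = os.path.join(current, index)
        if os.path.isfile(candidate):
            kind = "cgi" if os.access(candidate, os.X_OK) else "static"
            return (kind, candidate, "")
    return None
```

For example, with an executable file named `app` at the top of the site,
a request for `/app/a/b` would resolve to that file, with `/a/b` left over
as extra path information.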

For content delivered as-is, the MIME-type is deduced from the filename
extension using a table that is compiled into althttpd.
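
A lookup of this kind might look like the sketch below.  The table entries
are a small hypothetical excerpt rather than althttpd's real table, and the
fallback type and the case-insensitive match are assumptions.

```python
import os

# Hypothetical excerpt; the real table is compiled into althttpd.
MIME_TYPES = {
    "css": "text/css",
    "gif": "image/gif",
    "html": "text/html",
    "jpg": "image/jpeg",
    "js": "application/javascript",
    "png": "image/png",
    "txt": "text/plain",
}


def mime_type_for(filename, default="application/octet-stream"):
    """Look up the MIME type by filename extension (case-insensitive here)."""
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    return MIME_TYPES.get(ext, default)
```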

Security Features
-----------------

To defend against mischief, there are restrictions on names of files that
althttpd will serve.  Within the request URI, all characters other than
alphanumerics and ",-./:~" are converted into a single "_".  Furthermore,
if any path element of the request URI begins with "." or "-" then
althttpd always returns a 404 Not Found error.  Thus it is safe to put
auxiliary files (databases or other content used by CGI, for example)
in the document hierarchy as long as the filenames begin with "." or "-".

An exception:  Though althttpd normally returns 404 Not Found for any
request with a path element beginning with ".", it does allow requests
where the URI begins with "/.well-known/".  This exception is necessary
to allow LetsEncrypt to validate ownership of the website.
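
The filename restrictions can be sketched as two small checks.  The names
are hypothetical, "alphanumerics" is assumed to mean ASCII letters and
digits, and the sketch assumes that the "/.well-known/" exception exempts
only that leading path element while later elements are still checked.

```python
import string

SAFE_CHARS = set(string.ascii_letters + string.digits + ",-./:~")


def sanitize_uri(path):
    """Replace every character outside the allowed set with "_"."""
    return "".join(ch if ch in SAFE_CHARS else "_" for ch in path)


def is_forbidden(path):
    """True if some path element begins with "." or "-" (404 Not Found)."""
    parts = [p for p in path.split("/") if p]
    if path.startswith("/.well-known/"):
        # Assumption: only the leading ".well-known" element is exempt.
        parts = parts[1:]
    return any(p[0] in ".-" for p in parts)


print(sanitize_uri("/a b/c?d=1"))                         # /a_b/c_d_1
print(is_forbidden("/data/.users.db"))                    # True
print(is_forbidden("/.well-known/acme-challenge/token"))  # False
```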

Log File
--------

If the -logfile option is given on the althttpd command-line, then a single
line is appended to the named file for each HTTP request.
The log file is in the Comma-Separated Value or CSV format specified
by [RFC4180](https://tools.ietf.org/html/rfc4180).
There is a comment in the source code that explains what each of the fields
in this output line means.

The filename on the -logfile option may contain time-based characters 
that are expanded by [strftime()](https://linux.die.net/man/3/strftime).
Thus, to cause a new logfile to be used for each day, you might use
something like:

>
     -logfile /var/logs/althttpd/httplog-%Y%m%d.csv
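
To see how such a pattern expands, the same conversion characters can be
tried with Python's `strftime`, which accepts the same `%Y`, `%m`, and `%d`
codes.  The helper name `logfile_for` is hypothetical, and the text does not
say whether althttpd expands the pattern in local time or UTC.

```python
from datetime import datetime

LOGFILE_PATTERN = "/var/logs/althttpd/httplog-%Y%m%d.csv"


def logfile_for(moment, pattern=LOGFILE_PATTERN):
    """Expand the time-based characters in `pattern` for the given time."""
    return moment.strftime(pattern)


print(logfile_for(datetime(2018, 2, 27, 21, 29)))
# /var/logs/althttpd/httplog-20180227.csv
```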

Setup For HTTPS Using Stunnel4
------------------------------

Althttpd itself does not do any encryption.
To set up an encrypted website using althttpd, the recommended technique
is to use [stunnel4](https://www.stunnel.org/).

On the sqlite.org website, the relevant lines of the
/etc/stunnel/stunnel.conf file are:

>
    cert = /etc/letsencrypt/live/sqlite.org/fullchain.pem
    key = /etc/letsencrypt/live/sqlite.org/privkey.pem
    [https]
    accept       = :::443
    TIMEOUTclose = 0
    exec         = /usr/bin/althttpd
    execargs     = /usr/bin/althttpd -logfile /logs/http.log -root /home/www -user www-data -https 1

This setup is very similar to the xinetd setup.  One key difference is
that the "-https 1" option is used to tell althttpd that the connection is
encrypted.  This is important so that althttpd will know to set the
HTTPS environment variable for CGI programs.
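
A CGI program might use that variable along these lines.  This is a sketch
with a hypothetical helper name.  Since the text does not state what value
althttpd assigns, the check only tests whether the variable is set to a
non-empty value.

```python
import os


def request_scheme(environ=os.environ):
    """Return "https" if HTTPS is set to a non-empty value, else "http"."""
    return "https" if environ.get("HTTPS") else "http"
```

A CGI script could call `request_scheme()` when building absolute links back
to itself, so that links stay on the encrypted site when the request came in
through stunnel4.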

It is OK to have both xinetd and stunnel4 configured to
run althttpd at the same time.  In fact, that is the way that the
SQLite.org website works.  Requests to <http://sqlite.org/> go through
xinetd and requests to <https://sqlite.org/> go through stunnel4.

Stand-alone Operation
---------------------

On the author's desktop workstation, his home directory contains a subdirectory
named ~/www/default.website.  That subdirectory contains a collection of
files and CGI scripts.  Althttpd can serve the content there by running
the following command:

>
    althttpd -root ~/www -port 8080

The "-port 8080" option is what tells althttpd to run in stand-alone
mode, listening on port 8080.
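
For experimenting with stand-alone mode, a minimal document hierarchy can be
created with a short script such as the one below.  The file names and their
contents are made up for illustration; only the "default.website" directory
name and the executable-bit rule come from the description above.

```python
import os


def make_minimal_site(root):
    """Create root/default.website with one static page and one CGI script."""
    site = os.path.join(root, "default.website")
    os.makedirs(site, exist_ok=True)

    with open(os.path.join(site, "index.html"), "w") as f:
        f.write("<h1>It works</h1>\n")

    cgi_path = os.path.join(site, "hello")
    with open(cgi_path, "w") as f:
        f.write("#!/bin/sh\n"
                "echo 'Content-Type: text/plain'\n"
                "echo\n"
                "echo 'Hello from CGI'\n")
    os.chmod(cgi_path, 0o755)  # the executable bit is what marks it as CGI
    return site
```

After calling `make_minimal_site(os.path.expanduser("~/www"))`, the althttpd
command shown above should find `index.html` for requests to `/` and run
`hello` as CGI for requests to `/hello`.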

The author of althttpd has only ever used stand-alone mode for testing.
Since there is no provision to do TLS encryption within althttpd, the
stunnel4 setup is preferred for production websites.