Althttpd

Minor documentation comment

(1) By sodface on 2024-01-17 02:30:14 [source]

I re-read most of the documentation today because of the latest updates, and the use of "client" in this section confused me a little (though it's from an older update).

Rather than add a dependency on a compression library to althttpd, it relies on the client to provide content in both compressed and uncompressed forms.

Maybe "client" should be "server admin", "sysadmin", "webmaster" or some such?

(2) By drh on 2024-01-17 11:56:51 in reply to 1 [link] [source]

An Idea For A New "Audit" Utility Program For Althttpd

The doc problem that started this thread has been fixed.

But the issue brings to mind an idea for a new accessory command-line program that does an audit of the website content tree, looking for anomalies. For example, one could run:

althttpd-audit --root /home/www

And the tool would scan all *.website subdirectories under /home/www looking for things like the following:

  • Files X where the corresponding file X.gz does not have the same content.
  • Identify and list all CGI files - files that are executable.
  • Find all -auth files. Perhaps show what each -auth file does. (Example: There is an -auth file at the top-level in SQLite that forces a redirect from HTTP to HTTPS.)
  • Find and list all files that cannot be served as content due to filename restrictions. For example, show all files whose names begin with "-" or ".".
  • Find all symlinks that point to out-of-tree files or folders, and symlinks that will break in the chroot environment in which althttpd normally runs.

Are there any other elements of the content tree that need to be checked?

Note that this audit program need not be written in C. This is the kind of thing that might be better implemented as a TCL script or similar.

(3) By sodface on 2024-01-18 01:01:54 in reply to 2 [link] [source]

This seems like it would be a nice addition. Speaking for myself, I'm never certain about anything I've configured so a tool like this would provide some reassurance.

Are there any other elements of the content tree that need to be checked?

I'm guessing the tool wouldn't know some context, like the althttpd command-line arguments used, unless informed of them at invocation? One check that came to mind, which I mentioned in this post, is:

  • check for the existence of .website directories and warn if a default.website is absent

The presence of .website directories would seem to imply that all content is contained therein and anything above that should not be accessible, especially if the server is public facing.

If running in standalone mode you can expose content at the document root (above the .website directories) if the request reaches the server but doesn't match a .website entry, like using the server IP address for example. I think it requires knowledge of the directory structure and filenames to actually retrieve the content, though.

(4) By spindrift on 2024-01-18 10:19:32 in reply to 3 [link] [source]

If running in standalone mode you can expose content at the document root (above the .website directories) if the request reaches the server but doesn't match a .website entry

Is that true even if a default.website directory exists?

If so, I believe that's undocumented behaviour.

If not, then that is "working as described", I think.

I'm curious because I have such a setup which might be vulnerable but can't check on it today! (I do have the catch-all default.website directory for just this purpose).

(5) By sodface on 2024-01-18 12:18:46 in reply to 4 [link] [source]

Is that true even if a default.website directory exists?

No, if you have a default.website directory you should be fine.

I used to have a default.website directory, and then realized that some goofy domain name still resolved to the IP of my hosted VM, and requesting that domain returned my website content out of the default.website directory. So I just deleted the default.website directory, not realizing at the time that, since I was in standalone mode, a negative .website match would then look in ".".

(6) By Stephan Beal (stephan) on 2024-01-18 12:40:24 in reply to 3 [link] [source]

If running in standalone mode you can expose content at the document root (above the .website directories) if the request reaches the server but doesn't match a .website entry

Just to clarify, "above the .website" means the parent dir containing the .website, not arbitrarily far above that.

althttpd does not permit names starting with "..", so it could not reach up out of whatever the top-most web dir is. Additionally, by default it chroot()s into the top-most dir so cannot even serve content symlinked-to outside of that.

i've been trying to get althttpd to fail for, e.g., /some/path/../index.html, but both my browser and wget "helpfully" translate that before sending it to althttpd, so the .. part is never reaching althttpd.

(7) By sodface on 2024-01-18 12:55:00 in reply to 6 [link] [source]

Just to clarify, "above the .website" means the parent dir containing the .website, not arbitrarily far above that.

Yes, correct, sorry to be imprecise. The directory passed to "--root" on the command line.

i've been trying to get althttpd to fail for

In the situation I describe it would have to be a direct link to some content in the --root dir. If you had a log directory, for example, at the document root next to your .website directories, and the log file was named althttpd.log, you could get that log with /log/althttpd.log.

So it requires knowledge of specific paths.

(8) By Stephan Beal (stephan) on 2024-01-18 13:13:39 in reply to 7 [link] [source]

In the situation I describe it would have to be a direct link to some content in the --root dir.

Perhaps althttpd should immediately fail if no default.website is found unless some new flag which forcibly permits that is used?

(10.1) By spindrift on 2024-01-18 14:20:27 edited from 10.0 in reply to 8 [link] [source]

Perhaps if -page or -popup are not used?

My use case for local development is usually just to type *

althttpd .

Like a lazy Englishman. This exposes the local directory contents (as one would expect), but I quite like not being chastised for lacking default.website in this situation, nor having to arrange the content into it.

* I think. As I said, I'm away from my computer at present.

(12) By Stephan Beal (stephan) on 2024-01-18 13:57:01 in reply to 10.0 [link] [source]

althttpd .

Huh. That actually works. i've been going through the more verbose:

althttpd -page index.html -max-age 1

(-max-age helps avoid being fed cached results)

Perhaps if -page or -popup are not used?

Definitely - those are cases where the default.website check would just be a hindrance.

In any case, this default.website check is hypothetical. Richard would need to approve such a change and it may have side-effects i've not yet considered.

(13.1) By spindrift on 2024-01-18 21:50:21 edited from 13.0 in reply to 12 [link] [source]

Awaiting Moderator Approval

(14) By sodface on 2024-01-18 14:50:30 in reply to 12 [link] [source]

I'd prefer it if althttpd would only serve content directly from the --root dir when there are no .website directories at all (including default.website). Right now, in standalone mode, it's a bit of both: if there's a .website match then content is served from there, or from default.website if it exists and no more specific match was found, or from "." as a last resort.

In other words, as long as at least one .website directory exists, then content should never be served from "."
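The current behaviour described above could be restated in code roughly as follows (a simplified sketch for illustration; the function name and shape are assumptions, not althttpd's actual code):

```c
/* Simplified sketch of standalone-mode document-root selection as
** described in the discussion.  Illustration only; not althttpd's
** actual code. */
#include <stdio.h>
#include <sys/stat.h>

/* Return the directory to serve from, or 0 for NotFound. */
static const char *pickWebsiteDir(const char *zRoot, const char *zHost,
                                  int standalone){
  static char zLine[1000];
  struct stat statbuf;
  /* 1: exact host match, e.g. example.com.website */
  snprintf(zLine, sizeof(zLine), "%s/%s.website", zRoot, zHost);
  if( stat(zLine,&statbuf)==0 && S_ISDIR(statbuf.st_mode) ) return zLine;
  /* 2: fall back to default.website */
  snprintf(zLine, sizeof(zLine), "%s/default.website", zRoot);
  if( stat(zLine,&statbuf)==0 && S_ISDIR(statbuf.st_mode) ) return zLine;
  /* 3: in standalone mode, fall back to the root itself ("."); this
  ** is the step being discussed above */
  if( standalone ){
    snprintf(zLine, sizeof(zLine), "%s", zRoot);
    return zLine;
  }
  return 0;  /* otherwise: NotFound */
}
```

The suggestion amounts to making step 3 conditional on no *.website directory existing at all.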

(15) By Stephan Beal (stephan) on 2024-01-18 14:58:53 in reply to 14 [link] [source]

I'd prefer it if althttpd would only serve content directly from --root dir when there are no .website directories at all

The problem with that is that to do that, althttpd would have to iterate through every filesystem entry in that dir looking for a .website dir. That's not difficult to do but it can be what's known as "computationally arbitrarily expensive," depending on how many files/dirs are in that directory and whether or not there are actually any .website dirs.

If it were doing this just once at startup it would not be a significant cost, but it would potentially need to do so on arbitrary requests. i hesitate to say that it could be used to facilitate a DoS attack, because i lack the "criminal creativity" for that type of thing, but perhaps it could.

(16.1) By sodface on 2024-01-18 23:46:39 edited from 16.0 in reply to 15 [link] [source]

I see. Well, maybe then what I suggested in the other thread, and what I ended up patching in: something like a "--strict-match" flag which requires a specific domain.website match, or else immediately shuns (requires --ipshun also) or drops the connection, or something.

I don't really like a 404 in this case because that's like saying "yes, that domain is hosted on this server but the resource requested can't be found", which doesn't really fit the scenario.

(18) By sodface on 2024-01-19 06:32:31 in reply to 15 [link] [source]

I guess this is getting off topic for this thread but I tested this patch and I think it does what I described (though it returns not found rather than ipshunning). Borrowed heavily from this SO post.

The logic is: if --port is passed as a command line argument, then in addition to setting standalone to true, also glob for *.website and set dotwebsite to true if any matches are found (not currently checking that a match is also a directory). Then, when searching for content dirs, only return content from the --root if standalone is true and dotwebsite is false; otherwise NotFound.

I believe this avoids the "computationally arbitrarily expensive" situation you warned about.

--- althttpd.c.orig
+++ althttpd.c
@@ -345,6 +345,7 @@
 #include <sys/sendfile.h>
 #endif
 #include <assert.h>
+#include <glob.h>
 
 /*
 ** Configure the server by setting the following macros and recompiling.
@@ -452,6 +453,7 @@
 static int useTimeout = 1;       /* True to use times */
 static int nTimeoutLine = 0;     /* Line number where timeout was set */
 static int standalone = 0;       /* Run as a standalone server (no inetd) */
+static int dotwebsite = 0;       /* True if *.website exists */
 static int ipv6Only = 0;         /* Use IPv6 only */
 static int ipv4Only = 0;         /* Use IPv4 only */
 static struct rusage priorSelf;  /* Previously report SELF time */
@@ -3353,7 +3355,7 @@
   if( stat(zLine,&statbuf) || !S_ISDIR(statbuf.st_mode) ){
     sprintf(zLine, "%s/default.website", zRoot);
     if( stat(zLine,&statbuf) || !S_ISDIR(statbuf.st_mode) ){
-      if( standalone ){
+      if( standalone && !dotwebsite ){
         sprintf(zLine, "%s", zRoot);
       }else{
         NotFound(350);  /* LOG: *.website permissions */
@@ -3853,6 +3855,13 @@
         }
       }
       standalone = 1 + (useHttps==2);
+      int dws(char const * epath, int eerrno) { return 0; }
+      glob_t globbuf = {0};
+      glob("*.website", GLOB_DOOFFS, dws, &globbuf);
+      if( globbuf.gl_pathc>0 ){
+        dotwebsite = 1;
+      }
+      globfree(&globbuf);
     }else
     if( strcmp(z, "-family")==0 ){
       if( strcmp(zArg, "ipv4")==0 ){

(19) By Stephan Beal (stephan) on 2024-01-19 14:19:10 in reply to 18 [link] [source]

glob("*.website", GLOB_DOOFFS, dws, &globbuf);

Today i learned that glob() is POSIX. i thought it was specific to glibc.

I believe this avoids the "computationally arbitrarily expensive" situation you warned about.

glob() is actually more expensive: unlike a "manual" check for a .website dir, glob() will unconditionally scan all directory entries, whereas a targeted check directly in althttpd could stop scanning at the first match.

So, wildly hypothetically, if /the-chroot-jail has 2000 files and two .website dirs, glob() will always examine 2002 entries, whereas a hand-written check might (depending on the ordering from the filesystem) check only 1 or it might go as far as 2001, stopping at the first match. Obviously, that would be a pathological case, and any argument of mine about computational costs wouldn't hold up in normal use cases.
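For illustration, the hand-written early-exit check might look something like this (a sketch, not proposed patch code):

```c
/* Sketch of the targeted alternative to glob(): scan the root
** directory with readdir() and stop at the first *.website entry.
** Illustration only. */
#include <dirent.h>
#include <string.h>

/* Return 1 as soon as any entry named *.website is found in zRoot. */
static int has_dotwebsite(const char *zRoot){
  DIR *d = opendir(zRoot);
  struct dirent *pEnt;
  int found = 0;
  if( d==0 ) return 0;
  while( !found && (pEnt = readdir(d))!=0 ){
    size_t n = strlen(pEnt->d_name);
    if( n>8 && strcmp(&pEnt->d_name[n-8], ".website")==0 ) found = 1;
  }
  closedir(d);
  return found;
}
```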

(20) By sodface on 2024-01-19 14:51:17 in reply to 19 [link] [source]

glob() is actually more expensive

Hmmm, well I thought that by putting it in the command line argument evaluation for --port and then setting the boolean, it would only run once on startup? So even if expensive, it would be a one-time cost. Admittedly I get a bit confused about what happens when althttpd forks, so I could be mistaken on that.

(21) By Stephan Beal (stephan) on 2024-01-19 14:54:01 in reply to 20 [link] [source]

... well I thought by putting it in the command line argument evaluation for --port...

Indeed you did - i misread it as part of the block above that, inside the per-request handling. My previous objection is null and void.

(22.1) By sodface on 2024-01-19 15:04:25 edited from 22.0 in reply to 21 [link] [source]

Now I'm wondering what path is being searched with the way I have it. I was testing with the althttpd binary in the same directory as the .website directories; I didn't test it in, e.g., /usr/bin.

Rookies on parade. //edit, meaning me not you!
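On that question: a relative glob() pattern is matched against the process's current working directory, so the result depends on where althttpd was started, not on the binary's location. Anchoring the pattern to the root avoids the ambiguity; a sketch (assuming zRoot holds the --root value):

```c
/* Sketch: make the *.website probe independent of the CWD by building
** an absolute pattern from the --root directory.  Illustration only. */
#include <glob.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if zRoot contains at least one *.website entry. */
static int rootHasWebsite(const char *zRoot){
  char zPattern[1000];
  glob_t globbuf;
  int found;
  memset(&globbuf, 0, sizeof(globbuf));
  snprintf(zPattern, sizeof(zPattern), "%s/*.website", zRoot);
  found = glob(zPattern, 0, 0, &globbuf)==0 && globbuf.gl_pathc>0;
  globfree(&globbuf);
  return found;
}
```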

(11) By spindrift on 2024-01-18 13:55:11 in reply to 7 [link] [source]

Hmmm. Clearly not best practice (logs in this directory) but quite commonly done, I expect. Possibly due to the (deliberately) terse instructions, and confusion about where the log directory should live given the chroot behaviour.

I also keep my logs in this potentially visible directory, but in -logs/ rather than logs/.

It might be more useful for althttpd to point out that the logging location is servable, as well as the ip-ban directory. It would know both of these at startup.

(9) By spindrift on 2024-01-18 13:45:10 in reply to 6 [link] [source]

Thanks Stephan, that's really useful info 👍

(17) By sodface on 2024-01-18 23:42:29 in reply to 1 [link] [source]

Another minor documentation comment:

When a directory is requested and althttpd is looking for a default file to append, the current code uses:

"/home", "/index", "/index.html", "/index.cgi"

"/index" is omitted in the code comment and in the documentation.

(23) By sodface on 2024-02-10 20:52:40 in reply to 1 [link] [source]

Line numbering isn't lining up with the lines of text for me in Firefox so rather than provide line links I'll just give the filename.

althttpd.md:

thing s/b think

standalone-mode.md:

fails s/b files
they s/b the
server s/b serve (?)
comprise website s/b comprise a website

xinetd.md:

server s/b serve (?)

(24.1) By sodface on 2024-02-10 22:24:38 edited from 24.0 in reply to 1 [link] [source]

stunnel4.md:

users s/b years (?)

And:

Older versions of althttpd did not support encryption. The recommended way of encrypting website using althttpd was to use stunnel4. This advice has now changed. We now recommend that you update your althttpd to version 2.0 or later and use the xinetd technique described in the previous section.

encrypting website s/b encrypting website traffic (?)

Alternative suggestions (though they both still sound clunky to me):

Stunnel4 used to be the recommended way to encrypt website traffic with althttpd versions prior to 2.0, which lacked built-in encryption support. We now recommend using althttpd version 2.0 or later and the xinetd technique described in the previous section.

or

Due to the lack of built-in encryption support in althttpd prior to version 2.0, stunnel4 was recommended to encrypt website traffic. We now recommend using althttpd version 2.0 or later and the xinetd technique described in the previous section.