Web Bot Auth#1609

Open
mookums wants to merge 10 commits into main from web-bot-auth

@mookums
Contributor

@mookums mookums commented Feb 19, 2026

This adds support for validating as a Web Bot using the Web Bot Auth protocol.

@mookums mookums marked this pull request as ready for review February 27, 2026 15:34
@mookums
Contributor Author

mookums commented Feb 27, 2026

This works now, tested against https://crawltest.com/cdn-cgi/web-bot-auth.

@mookums
Contributor Author

mookums commented Mar 2, 2026

Just a note: In order to test this, you need to stand up some infrastructure for validating your WebBotAuth keys.

Here is a Python script that generates your private key and the JWKS JSON file. The file must be served from your website at yourdomain.com/.well-known/http-message-signatures-directory with the content type application/http-message-signatures-directory+json. The response must also carry two headers, Signature-Input and Signature. Here is a Cloudflare worker I used to serve the JWKS JSON file properly.
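
For readers without access to the linked script, the directory document and key ID can be sketched with the Python standard library alone. This is a minimal sketch, not the script referenced above: it assumes you already have the raw 32-byte Ed25519 public key (key generation itself needs a crypto library and is left out), that the directory is a plain JWKS with one OKP key, and that the key ID is the RFC 7638 JWK thumbprint. The function names are hypothetical.

```python
import base64
import hashlib
import json


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWK members require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def ed25519_jwk(public_key: bytes) -> dict:
    """Build the public JWK for a raw 32-byte Ed25519 public key."""
    if len(public_key) != 32:
        raise ValueError("Ed25519 public keys are exactly 32 bytes")
    return {"kty": "OKP", "crv": "Ed25519", "x": b64url(public_key)}


def jwk_thumbprint(jwk: dict) -> str:
    """RFC 7638 thumbprint: SHA-256 over the required members,
    sorted by name and serialized without whitespace."""
    canonical = json.dumps(
        {"crv": jwk["crv"], "kty": jwk["kty"], "x": jwk["x"]},
        separators=(",", ":"),
        sort_keys=True,
    )
    return b64url(hashlib.sha256(canonical.encode("utf-8")).digest())


def build_directory(public_key: bytes) -> str:
    """Serialize a one-key directory document (assumed to be a JWKS)."""
    return json.dumps({"keys": [ed25519_jwk(public_key)]}, indent=2)
```

Under these assumptions, `jwk_thumbprint(ed25519_jwk(public_key))` would be the value to pass as `--web_bot_auth_keyid`, and the output of `build_directory` is the body to serve from the well-known path.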

An easy way to test is to hit https://crawltest.com/cdn-cgi/web-bot-auth, which is what Cloudflare recommends. You should expect a 401 response, since WebBotAuth is properly configured but your key is not registered with Cloudflare.

zig build run -- fetch https://crawltest.com/cdn-cgi/web-bot-auth \
  --web_bot_auth_key_file "./private_key.pem" \
  --web_bot_auth_keyid "$KEYID" \
  --web_bot_auth_domain "$DOMAIN"
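
As background on what the three flags feed into, here is a rough sketch of the string that gets signed. It is based on my reading of RFC 9421 (HTTP Message Signatures) and the Web Bot Auth draft, not on the code in this PR: the covered components, parameter order, and the `web-bot-auth` tag are assumptions, and the function names are hypothetical. Signing the resulting string with the Ed25519 private key is omitted because it needs a crypto library.

```python
def signature_params(keyid: str, created: int, expires: int, nonce: str) -> str:
    """Inner list plus parameters. The same text is used in the
    Signature-Input header and as the last line of the signature base."""
    return (
        '("@authority" "signature-agent")'
        f";created={created};expires={expires}"
        f';keyid="{keyid}";alg="ed25519";nonce="{nonce}";tag="web-bot-auth"'
    )


def signature_base(authority: str, agent_domain: str, params: str) -> str:
    """RFC 9421 signature base: one line per covered component,
    followed by the @signature-params line, joined with newlines."""
    return "\n".join(
        [
            f'"@authority": {authority}',
            f'"signature-agent": "https://{agent_domain}"',
            f'"@signature-params": {params}',
        ]
    )


def signature_input_header(params: str, label: str = "sig1") -> str:
    """Value of the Signature-Input request header for one signature."""
    return f"{label}={params}"
```

In this sketch `keyid` corresponds to `--web_bot_auth_keyid`, `agent_domain` to `--web_bot_auth_domain`, and the private key from `--web_bot_auth_key_file` would sign the bytes of `signature_base(...)`.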

  transfer._redirecting = false;

- if (status == 401 or status == 407) {
+ if ((status == 401 or status == 407) and transfer.client.use_proxy) {
Collaborator

Is this added condition correct? I think it breaks authentication interception?

Contributor Author

It fixed an issue I was running into, where we would misinterpret the 401 returned by the crawltest page. I can recheck this. Do we have any specific authentication interception tests to run against?

Collaborator

There are some auth tests in demo, but I don't know if they test this specific path, @krichprollsch ?
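
To make the concern in this thread concrete, here is a small Python model of the two guards from the diff. It compares only the boolean conditions; whatever the client does inside the branch is not modeled, and the function names are hypothetical.

```python
def old_condition(status: int) -> bool:
    """Guard before the change: any 401 or 407 enters the auth branch."""
    return status == 401 or status == 407


def new_condition(status: int, use_proxy: bool) -> bool:
    """Guard after the change: 401/407 only enters the branch behind a proxy."""
    return (status == 401 or status == 407) and use_proxy


def differing_cases() -> list:
    """List every (status, use_proxy) pair where the two guards disagree."""
    cases = []
    for status in (401, 407, 200):
        for use_proxy in (False, True):
            if old_condition(status) != new_condition(status, use_proxy):
                cases.append((status, use_proxy))
    return cases


print(differing_cases())  # [(401, False), (407, False)]
```

The two guards disagree exactly when no proxy is configured, so a 401 from an origin server without a proxy no longer reaches the branch. That is the case the crawltest page hits, and also the case the reviewer is asking about for authentication interception.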

for (b64) |ch| {
    if (ch != '\n' and ch != '\r') {
        clean[clean_len] = ch;
        clean_len += 1;
Collaborator

Are we sure that it can never be > 4096?

}

var der: [128]u8 = undefined;
const decoded_len = try std.base64.standard.Decoder.calcSizeForSlice(clean[0..clean_len]);
Collaborator

Can't decoded_len be > 128?
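
Both size questions above come down to checking lengths before writing into a fixed buffer. The Python sketch below models that check; it is not the Zig code under review. The 4096 and 128 capacities are taken from the comments and the diff, and the function names are hypothetical. For scale, an unencrypted PKCS#8 Ed25519 private key is 48 bytes of DER (64 base64 characters), so a well-formed key fits, but a different key type or a malformed file might not.

```python
def strip_newlines(pem_body: str, capacity: int = 4096) -> str:
    """Drop CR/LF and refuse input that would overflow a buffer of `capacity` bytes."""
    clean = "".join(ch for ch in pem_body if ch not in "\r\n")
    if len(clean) > capacity:
        raise ValueError(f"base64 body is {len(clean)} bytes, buffer holds {capacity}")
    return clean


def decoded_size(b64: str) -> int:
    """Decoded length of padded standard base64 (up to two '=' characters)."""
    if len(b64) % 4 != 0:
        raise ValueError("padded base64 length must be a multiple of 4")
    padding = len(b64) - len(b64.rstrip("="))
    return len(b64) // 4 * 3 - padding


def check_fits(pem_body: str, der_capacity: int = 128) -> int:
    """Return the decoded size, or raise if it exceeds the DER buffer."""
    size = decoded_size(strip_newlines(pem_body))
    if size > der_capacity:
        raise ValueError(f"decoded key is {size} bytes, buffer holds {der_capacity}")
    return size
```

Returning an error at either check, instead of indexing past the end of the buffer, is the behavior the two review comments appear to be asking for.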

app.robots = RobotStore.init(allocator);

app.http = try Http.init(allocator, &app.robots, config);
if (config.webBotAuth()) |wba_cfg| {
Collaborator

Is there a reason for the app to own this and not the Http? (I think I have the same question about robots).

Contributor Author

I put WebBotAuth in App just because that's where Robots was. From what I remember, Pierre preferred having Robots in the App. @krichprollsch

return true;
}

if (std.mem.eql(u8, "--web_bot_auth_domain", opt)) {
Collaborator

Maybe these have a well-known meaning to users, but should we validate these? They're used very specifically: the domain shouldn't include the protocol or a trailing slash. I realize that's what a "domain" is... but...

Also, keyid has to be base64 encoded. We'll generate invalid JSON if this isn't well formed. Will that be an issue? Or will things just fail gracefully?
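
One possible shape for the validation suggested above, sketched in Python. It assumes the key ID is an unpadded base64url SHA-256 thumbprint (43 characters, which matches the key ID used elsewhere in this thread) and that the domain is a bare host name with no scheme, port, or path. Both rules are assumptions and may be stricter than the protocol requires; the function names are hypothetical.

```python
import re

# 43 base64url characters: an unpadded 32-byte SHA-256 digest (assumed format).
_KEYID_RE = re.compile(r"[A-Za-z0-9_-]{43}")

# Dot-separated labels of letters, digits, and inner hyphens; at least two labels.
_LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
_DOMAIN_RE = re.compile(rf"{_LABEL}(?:\.{_LABEL})+")


def valid_keyid(keyid: str) -> bool:
    """True if keyid looks like an unpadded base64url SHA-256 thumbprint."""
    return _KEYID_RE.fullmatch(keyid) is not None


def valid_domain(domain: str) -> bool:
    """True for a bare host name: no scheme, no port, no path, no trailing slash."""
    return len(domain) <= 253 and _DOMAIN_RE.fullmatch(domain) is not None
```

Rejecting bad values when the options are parsed would turn a malformed header at request time into an immediate command-line error.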

@mookums mookums force-pushed the web-bot-auth branch 2 times, most recently from ea056be to 7c2df23 Compare March 4, 2026 14:10
@karlseguin
Collaborator

PR has gotten messed up... I think I've seen this before and it's just a GitHub issue? But it makes it hard to review :/

@mookums
Contributor Author

mookums commented Mar 6, 2026

The issue seems to be fixed? I just rebased again against main and I guess that was fine?

@krichprollsch
Member

krichprollsch commented Mar 8, 2026

I deployed a validation server on a temp URL for now: https://lightpanda-wbauth.fly.dev/.well-known/http-message-signatures-directory

$ http-signature-directory https://lightpanda-wbauth.fly.dev/.well-known/http-message-signatures-directory
{
  "success": true,
  "message": "HTTP signature directory is valid!",
  "details": {
    "url": "https://lightpanda-wbauth.fly.dev/.well-known/http-message-signatures-directory",
    "keys_count": 1,
    "validated_keys": [
      {
        "thumbprint": "_f1ywmr7gZ-T2JBMHEAV8FRE7ntLjHFanj9kDccmacU",
        "valid": true,
        "signature_verified": true,
        "raw_key_data": {
          "kty": "OKP",
          "crv": "Ed25519",
          "x": "o7omQIOBiRF7qGjLbAA2BI2-hgxq5-AWFOgN8VcZf3w"
        },
        "error": null
      }
    ],
    "errors": [],
    "warnings": []
  }
}

Can we keep this fly.io subdir? Or should I create a https://test.wbauth.lightpanda.io? I have the argument 🤦

$ zig build run -- fetch https://crawltest.com/cdn-cgi/web-bot-auth --log_level debug   \
--web_bot_auth_key_file "private_key.pem" \
--web_bot_auth_keyid "_f1ywmr7gZ-T2JBMHEAV8FRE7ntLjHFanj9kDccmacU" \
--web_bot_auth_domain "lightpanda-wbauth.fly.dev"

DEBUG app : startup . . . . . . . . . . . . . . . . . . . . . [+0ms]
      mode = fetch
      dump_mode = null
      url = https://crawltest.com/cdn-cgi/web-bot-auth
      snapshot = false

DEBUG page : page.init . . . . . . . . . . . . . . . . . . .  [+17ms]

DEBUG scheduler : scheduler.add . . . . . . . . . . . . . . . [+19ms]
      name = page.runIdleTasks
      run_in_ms = 200
      low_priority = true

DEBUG browser : create page . . . . . . . . . . . . . . . . . [+19ms]

INFO  page : navigate . . . . . . . . . . . . . . . . . . . . [+19ms]
      url = https://crawltest.com/cdn-cgi/web-bot-auth
      method = GET
      reason = address_bar
      body = false
      req_id = 1
      type = root

DEBUG page : navigate header . . . . . . . . . . . . . . . .  [+83ms]
      url = https://crawltest.com/cdn-cgi/web-bot-auth
      status = 401
      content_type = text/plain
      type = root

Here is the private repository: https://github.com/lightpanda-io/wbauth
