Mock goose-themed audio CAPTCHA: Next.js app in apps/web, FastAPI in backend, shared UI in packages/ui.
requack.biz [Vercel - https://re-quackcha-octccpjty-rat626s-projects.vercel.app/]
Our inspiration came from registering on Devpost for this hackathon, during which we encountered the all-too-familiar CAPTCHA. Answering “Are you a human?” checks, which keep bots from signing in to websites, consumes a cumulative 500 years of human time every single day worldwide, and the process is often frustrating and monotonous for users. We set out to make it more engaging and enriching, and reQUACKCHA was born. reQUACKCHA is an audio-based CAPTCHA that still validates that a human is logging in, but does so by playing audio of a specific goose species and asking the user to “quack” back. After validation, a fun fact about the goose is displayed. The task becomes a little more engaging and teaches users something new about a goose species they may never have encountered before.
- Shows a user logging in to a website, after which the CAPTCHA pops up.
- Displays a picture of a goose species, along with playable reference audio of its call.
  - The picture is pulled from a research-grade iNaturalist posting.
  - The audio is pulled from the xeno-canto wildlife sounds database.
- The user records themselves imitating the call; the recording is then checked for whether it is human using RMS bursts and volume modulation within a 5-second window.
- Once the audio is validated (it does not need to be an exact match of the bird), a fun fact about the bird is shown and the user can continue to the desired website.
- Used Cursor as the main IDE.
- Created a monorepo with Turbo, with Tailwind CSS for HTML styling.
- Next.js: pulls the user's audio recording via the MediaRecorder/Web Audio APIs.
- Stores bird recordings in backend data.
- Stores iNaturalist pictures, fun fact, bird name, and species ID in metadata.
- Frontend: Tailwind CSS, shadcn/Radix UI for components.
- Backend: FastAPI with REST endpoints for taking in species data and recordings.
  - Processes the user recording in a 5-second normalized window, evaluates RMS bursts and volume modulation, and runs it through the Silero deep learning model to differentiate speech from rhythmic taps/noise.
- QA: Playwright for end-to-end checks and pytest to check FastAPI.
- Ran Next.js and Uvicorn simultaneously in one terminal (npm run dev:all) and tested via localhost.
- Still to do: add Figma MCP integration with Cursor.
- Multi-agent task split: one agent for frontend, one for backend, and one to handle separation/handoff.
- Used Vercel to deploy the web app and Render for the FastAPI backend web service, whose URL is saved as the environment variable NEXT_PUBLIC_API_BASE_URL.
We split up tasks between three agents, with two being for frontend and backend, and another used to ensure that the tasks were segregated and that handoff between them was smooth.
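The scoring heuristic mentioned in the build notes above (RMS bursts and volume modulation over a 5-second window) could be sketched roughly as follows. This is a minimal illustration and not the project's actual scoring code: the frame length, the thresholds, and the names `frame_rms`, `count_bursts`, `modulation_depth`, `looks_voiced`, and `tone_burst` are all assumptions chosen for the example.

```python
import math

def frame_rms(samples, frame_len=400):
    """RMS energy per non-overlapping frame (400 samples = 25 ms at 16 kHz)."""
    out = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        out.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return out

def count_bursts(rms, threshold=0.05):
    """Count runs of consecutive frames whose RMS exceeds the threshold."""
    bursts, active = 0, False
    for value in rms:
        if value > threshold and not active:
            bursts += 1
        active = value > threshold
    return bursts

def modulation_depth(rms):
    """Coefficient of variation of frame RMS; near 0 for a steady tone."""
    if not rms:
        return 0.0
    mean = sum(rms) / len(rms)
    if mean == 0:
        return 0.0
    variance = sum((v - mean) ** 2 for v in rms) / len(rms)
    return math.sqrt(variance) / mean

def looks_voiced(samples, min_bursts=1, max_bursts=12, min_depth=0.3):
    """Hypothetical pass/fail rule combining both features (assumed thresholds)."""
    rms = frame_rms(samples)
    bursts = count_bursts(rms)
    return min_bursts <= bursts <= max_bursts and modulation_depth(rms) >= min_depth

def tone_burst(seconds, rate=16000, amp=0.5, freq=300.0):
    """Synthetic stand-in for a vocalization: a plain sine tone."""
    return [amp * math.sin(2 * math.pi * freq * n / rate)
            for n in range(int(seconds * rate))]

# Synthetic 5-second clip: three 0.5 s "quacks" separated by silence.
RATE = 16000
clip = []
for _ in range(3):
    clip += tone_burst(0.5) + [0.0] * int(0.5 * RATE)
clip += [0.0] * (5 * RATE - len(clip))

print(count_bursts(frame_rms(clip)), looks_voiced(clip))
```

Energy-only features like these respond to any sound with the right loudness pattern, so a rule of this kind cannot tell a voice from other rhythmic sounds on its own.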
Throughout the project we ran into issues both major and minor. The biggest was getting the audio pipeline to work consistently: the captcha often approved a recording even when we simply drummed on the table. To address this, we experimented with a deep-learning model, Silero, in the backend to differentiate speech from noise after the initial RMS burst/volume modulation processing. We also tried additional features, such as zero-crossing rate (ZCR), to better discriminate voiced from unvoiced stimuli, but found that this yielded diminishing returns and opted for a simpler approach instead.
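For reference, zero-crossing rate is the fraction of adjacent sample pairs whose signs differ; voiced sounds tend to have a low ZCR, while noisy or broadband sounds tend to have a higher one. A minimal sketch follows. The function name `zero_crossing_rate` and the synthetic test signals are illustrative assumptions, not code from the repository.

```python
import math
import random

def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs that change sign."""
    if len(samples) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(samples) - 1)

RATE = 16000

# A 200 Hz tone as a stand-in for a voiced sound.
tone = [math.sin(2 * math.pi * 200 * n / RATE) for n in range(RATE)]

# White noise as a stand-in for a broadband hiss (seeded for repeatability).
rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(RATE)]

print(f"tone ZCR:  {zero_crossing_rate(tone):.4f}")
print(f"noise ZCR: {zero_crossing_rate(noise):.4f}")
```

In this toy comparison the tone crosses zero far less often than the noise, which is the separation a ZCR feature relies on.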
We also found it difficult at first to communicate the UI changes we wanted to the agent, so we had to make our prompts detailed and write “make no mistakes” at the end of each prompt to encourage correct and thorough output.
Finally, we had to make sure early on that the agent did not make its own audio files, and that the .mp3 reference files were pulled from xeno-canto and saved as static files in FastAPI, to be processed alongside the user’s audio, which the Next.js app captures in the browser via the MediaRecorder/Web Audio APIs. Then, to get readily accessible images, we searched iNaturalist for research-grade sightings that matched the goose species of each audio recording, and we also integrated fun facts/conservation stats that are presented after a captcha is completed.
One accomplishment that we are particularly proud of is bringing the human voice into the rather stale and tasteless system of traditional captchas in use today. Using audio makes captchas more fun and interesting rather than a source of rage and annoyance. Additionally, using different species of geese as the main theme of our captcha system promotes awareness of wildlife preservation and recognition. With many geese populations (especially those classified as midcontinent light geese) showing a sharp decline over the years, our geese-themed project encourages public appreciation of these species and contributes to greater awareness among the general population. We are also proud that our application pulls real species recordings and research-grade images from openly accessible databases, which supports the validity of our data and lets the app not only provide a fun alternative to captcha, but also educate users about species in a reliable manner.
What we learned
We learned how a web application is developed end to end: choosing tools, orchestrating the frontend and backend, testing output on a localhost server, and writing a thorough AGENTS.md file to specify project requirements to the agent. We also learned how to think through every aspect of user design, from the inconveniences a user may face to simplifying the user experience as much as possible, along with other skills that we hope to build upon in the future.
What’s next for reQUACKCHA
In the future, we hope to implement audio classification pipelines using machine learning algorithms/neural networks capable of discerning human-produced speech from synthetic or deepfake voices, such as Siri or other chatbots. Our system currently discerns voice from background noise and rhythmic tapping at a similar frequency to the stimulus, but it occasionally makes errors and could benefit from a more robust model that runs in the browser. Additionally, just as reCAPTCHA was used to digitize large volumes of news content, we hope that the human voice recordings from completed reQUACKCHA challenges can be used to train voice generation models like ElevenLabs to generate more human-realistic content for media such as podcasts and audiobooks. The same data could help highlight the distinction between human and AI-generated voice, which will become increasingly relevant as improvements in generative AI bring computer-generated text, audio, and video closer to human creation. At a time when creativity is of utmost importance to maintaining human originality and authenticity, we see AI as a tool for enhancing productivity and efficiency beyond what was originally possible.
| Area | Path | Role |
|---|---|---|
| Frontend | apps/web | Next.js App Router, reQUACKCHA screens, waveform and mic capture, Web Audio decode, WAV upload to FastAPI for scoring |
| Shared UI | packages/ui | Tailwind + shadcn/Radix components used by the web app |
| Backend (API + scoring) | backend/app | FastAPI: species list, POST /api/score-audio/upload (WAV + heuristics + optional Silero VAD via vad_silero.py / onnxruntime), legacy JSON POST /api/score-audio, static /static audio |
| User voice processing | apps/web + backend/app/scoring.py | Browser encodes mono 16-bit WAV; server decodes PCM, runs heuristics and Silero when silero_vad.onnx is present (see scripts/fetch_silero_model.py) |
| Reference clips (offline) | backend/scripts/fetch_xeno_canto_audio.py, backend/static/audio/ | Xeno-Canto downloads trimmed to demo MP3s |
| Next.js API routes | apps/web/app/api | iNaturalist photo lookup, /api/inaturalist-photo-batch, and same-origin image proxy |
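
The "browser encodes mono 16-bit WAV; server decodes PCM" step in the table can be illustrated with Python's standard `wave` module. This is a sketch under assumptions: the helper names `decode_wav_mono16`, `fit_window`, and `encode_wav_mono16`, the 16 kHz sample rate, and the peak-normalization step are illustrative and are not taken from backend/app/scoring.py.

```python
import io
import struct
import wave

def decode_wav_mono16(data: bytes):
    """Decode mono 16-bit PCM WAV bytes into (sample_rate, floats in [-1, 1))."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM")
        rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    ints = struct.unpack(f"<{len(raw) // 2}h", raw)
    return rate, [v / 32768.0 for v in ints]

def fit_window(samples, rate, seconds=5.0):
    """Trim or zero-pad to a fixed-length window, then peak-normalize."""
    target = int(rate * seconds)
    window = list(samples[:target]) + [0.0] * max(0, target - len(samples))
    peak = max((abs(x) for x in window), default=0.0)
    return [x / peak for x in window] if peak > 0 else window

def encode_wav_mono16(samples, rate):
    """Demo helper: build WAV bytes the way a browser upload might look."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        ints = [int(s * 32767) for s in samples]
        wav.writeframes(struct.pack(f"<{len(ints)}h", *ints))
    return buf.getvalue()

# Round trip: encode a tiny clip, decode it, and fit it to a 5-second window.
payload = encode_wav_mono16([0.0, 0.25, -0.5, 0.25], 16000)
rate, pcm = decode_wav_mono16(payload)
window = fit_window(pcm, rate)
print(rate, len(pcm), len(window))
```

Fixing the window length and scale before feature extraction keeps thresholds comparable across microphones with different gain.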
In the challenge modal, “Can’t use a microphone? Match a bird photo instead.” loads a separate puzzle: a blurred iNaturalist photo of one species and three clear choices (drag or tap to match). Success uses the same Inside Swoop (/verified) flow for the correct bird. This is an accessibility / demo alternate, not a stronger anti-bot guarantee than the audio heuristics (blurred photos do not reliably block vision models).
From the repository root:

    npm install
    cd backend && python3 -m venv .venv && source .venv/bin/activate && pip install -r requirements.txt && python scripts/fetch_xeno_canto_audio.py && python scripts/fetch_silero_model.py && cd ..

The second Python script downloads Silero VAD (silero_vad.onnx into static/models/, gitignored). Scoring works without it but uses heuristics only. See AGENTS.md for product and architecture detail.
Reference calls are real recordings from Xeno-Canto (trimmed for the demo). Respect each file’s license on the catalog page linked from the in-app attribution. If you cannot reach Xeno-Canto, you can generate placeholder synthetic WAVs with python scripts/build_demo_audio.py instead, then point species_data.py back at those .wav filenames (not the default MP3 layout).
Optional: copy apps/web/.env.example to apps/web/.env.local. Set NEXT_PUBLIC_API_URL if the API is not on port 8000. Species photos are loaded only from research-grade iNaturalist observations via the Next.js route /api/inaturalist-photo (curated taxon id per species, same taxon as the Xeno-Canto reference; no API key).
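
As an illustration of what a research-grade photo lookup can look like, the sketch below builds a query against the public iNaturalist API v1 observations endpoint and picks the first photo URL out of a response. It is written in Python for consistency with the backend, whereas the actual route lives in Next.js. The helper names, the placeholder taxon id, and the sample response are assumptions and are not the repository's implementation.

```python
from urllib.parse import urlencode

INAT_OBSERVATIONS = "https://api.inaturalist.org/v1/observations"

def build_photo_query(taxon_id: int, per_page: int = 1) -> str:
    """Build a URL that asks only for research-grade observations with photos."""
    params = {
        "taxon_id": taxon_id,
        "quality_grade": "research",
        "photos": "true",
        "per_page": per_page,
    }
    return f"{INAT_OBSERVATIONS}?{urlencode(params)}"

def first_photo_url(response: dict):
    """Return the first photo URL found in a response, or None."""
    for observation in response.get("results", []):
        for photo in observation.get("photos", []):
            if photo.get("url"):
                return photo["url"]
    return None

# 12345 is a placeholder taxon id, not a real curated species id.
url = build_photo_query(12345)
print(url)

# Hand-written sample shaped like an observations response (assumed fields).
sample = {"results": [{"photos": [{"url": "https://example.org/square.jpg"}]}]}
print(first_photo_url(sample))
```

No API key is involved in this request, which matches the note above.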
From the repository root (after setup above):

    npm run dev:all

This starts Next.js at http://localhost:3000 and FastAPI at http://localhost:8000. Press Ctrl+C once to stop both.
If the API cannot start because port 8000 is already in use, another process (often a previous uvicorn) is still bound to it. On macOS, see what it is:

    lsof -nP -iTCP:8000 -sTCP:LISTEN

Stop it with kill PID (replace PID with the number in the second column), or stop all listeners on 8000:

    kill $(lsof -tiTCP:8000 -sTCP:LISTEN)

Then run npm run dev:all again.
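
Where lsof is not available, a small socket probe can confirm whether something is listening on the dev ports before starting the servers. The sketch below is only an illustration; the helper name `port_in_use` is an assumption and the script is not part of the repository.

```python
import socket

def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0

if __name__ == "__main__":
    # 3000 is the Next.js dev port and 8000 the FastAPI port used above.
    for port in (3000, 8000):
        state = "in use" if port_in_use(port) else "free"
        print(f"port {port}: {state}")
```

A successful connection only shows that some process is listening; it does not identify which one, so lsof remains the tool for finding the PID.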
To use a different API port without killing anything, set REQUACKCHA_API_PORT and point the web app at it (note that the echo line overwrites any existing apps/web/.env.local):

    export REQUACKCHA_API_PORT=8001
    echo "NEXT_PUBLIC_API_URL=http://127.0.0.1:8001" > apps/web/.env.local
    npm run dev:all

If Next.js cannot start, usually another next dev is still running (often on port 3000), or a stale lock was left after a crash.
- See what is using 3000 (and stop it from the terminal where it is running with Ctrl+C, or kill it):

      lsof -nP -iTCP:3000 -sTCP:LISTEN
      kill $(lsof -tiTCP:3000 -sTCP:LISTEN)

- If you are sure no dev server is running, remove the lock and retry:

      rm -f apps/web/.next/dev/lock
      # if it still fails:
      rm -rf apps/web/.next/dev

- Start again:

      npm run dev:all
Shell alternative:

    chmod +x scripts/dev-all.sh
    ./scripts/dev-all.sh

Terminal A (API):

    cd backend && source .venv/bin/activate && uvicorn app.main:app --reload --host 127.0.0.1 --port 8000

Terminal B (web):

    npm run dev:web

Open http://localhost:3000. The browser calls http://localhost:8000 by default.
See HANDOFF.md for the API contract and env vars.
- Backend:

      cd backend && pytest

- E2E smoke (requires npx playwright install chromium once):

      cd apps/web && npx playwright test