Replies: 14 comments 26 replies
- Nice initiative. Will try to take part.
- Hi! Can we get some background context on the datasets being used? I'm curious what the data represents and whether there is a data dictionary.
- Does the 6-hour time limit include training the generator model from scratch, or is it just the synthetic data generation time?
- Hi @mplatzer, is Apple M architecture not supported with the
- Where do we submit the file?
- Hi, I made a submission more than 10 hours ago, and it is still showing this.
- When I am submitting to the sequential data challenge, it says that the output CSV file should have only 20,000 records, but I see that the original file has ~154k records. Does this mean that when generating the synthetic data we just need to generate 20k records from the generator we trained, or do we need to train on 20k randomly selected records from the original dataset?
- Hi @mplatzer! I'm working on the Flat Track challenge and have a quick question about the NNDR values coming out of the mostlyai-qa toolkit. When I run qa.report(), I'm seeing two different NNDR figures: in the JSON output (from metrics.model_dump_json()), the distances.nndr_trn_hol field gives one value (e.g., around 0.063 for a recent run of mine), while the HTML summary displays another. Could you clarify which of these NNDR values is used for the official leaderboard ranking, so we can make sure we're meeting the > 0.5 privacy target for NNDR? Is it the nndr_trn_hol from the JSON, or the one displayed in the HTML summary? Just want to make sure I'm tracking the right number! Thanks a bunch.
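For readers unfamiliar with the metric: NNDR (nearest-neighbor distance ratio) compares, for each synthetic record, the distance to its closest training record against the distance to its second-closest one. Below is a minimal illustrative sketch of that concept using scikit-learn — it is NOT the mostlyai-qa implementation (which handles encoding, holdout comparison, etc.), just the core ratio:

```python
# Illustrative sketch of the NNDR (nearest-neighbor distance ratio) concept.
# NOT the mostlyai-qa implementation -- just the core idea: a ratio near 0
# means a synthetic record almost exactly copies one training record.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def nndr(train: np.ndarray, synthetic: np.ndarray) -> np.ndarray:
    # For each synthetic record, find distances to the 2 closest training records.
    nn = NearestNeighbors(n_neighbors=2).fit(train)
    dists, _ = nn.kneighbors(synthetic)  # shape: (n_synthetic, 2)
    # Ratio of nearest to second-nearest distance; guard against division by zero.
    return dists[:, 0] / np.maximum(dists[:, 1], 1e-12)

rng = np.random.default_rng(0)
train = rng.normal(size=(1000, 4))
synth = rng.normal(size=(200, 4))
ratios = nndr(train, synth)
# Ratios near 1.0 suggest synthetic records are not unusually close
# to any single training record.
print(ratios.mean())
```

Averaging (or taking a low quantile of) these per-record ratios yields a single NNDR figure of the kind the toolkit reports.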
- 📣 To provide full transparency, we are now keeping an up-to-date list of all submissions, available to everyone. For example, you can analyze these submissions via the Assistant by following that link.
- I want to try out the MOSTLY AI web app! Is there a way to buy credits? I don't see an option.
- Dear participants, it's exciting to see the final days of Stage 1 of The MOSTLY AI Prize coming up! Here are a couple of details on the further process: note that all corresponding Stage 2 synthetic datasets, code submissions, and evaluation scores will be recorded and shared publicly, for maximum transparency. On behalf of the jury, wishing you all the best for the final days. Michael
- Stage 1 concluded 🎉 Big kudos to everyone who participated! You are fantastic and helped push the limits further. We highly encourage all of you to share your approaches and experiences with the community, so that everyone can learn from each other. And of course, big congratulations to the Top 5 of the final leaderboards! You will now advance to Stage 2 and have until July 6, 23:59 UTC to share your code submission for final evaluation. Please grant access to your (initially private) repositories to the members of the Jury Board: @mplatzer @scriminaci @psitronic @suhaskowshik @adivekar-utexas @shree-gade. All the best for Stage 2! 🤞
- The competition has concluded, and we have a single winner for both the FLAT and the SEQUENTIAL challenge: @Gandagorn! 🎉 Once again, big congratulations and a big THANK YOU to all participants for pushing the boundaries of synthetic data further and achieving a new state-of-the-art accuracy for large-scale synthetic datasets! ⭐ Open Source for the Win! 💪 All datasets, all results, as well as links to all open-sourced code submissions can be found at mostly-ai/the-prize-eval! 🙌
- Also sharing here for wider visibility: the challenge datasets were derived from these sources:
We had to heavily obfuscate column names and values to prevent re-identification of those data sources and ensure a fair competition. The datasets were picked to push the boundaries in terms of scale and rich real-world patterns. We would have loved to scale even further, but the privacy evaluations became the bottleneck: for DCRs, every training record has to be compared to every synthetic record, which scales non-linearly in compute.
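To see why the DCR evaluation becomes the bottleneck at scale: computing the distance to the closest record requires comparing all pairs, i.e. O(n_train × n_synthetic) distance computations. A minimal illustrative sketch (NOT the official evaluation code, which also handles categorical encodings), chunked to bound memory:

```python
# Illustrative sketch of the DCR (distance to closest record) cost.
# NOT the official evaluation code: it only shows the O(n_train * n_synth)
# pairwise comparison that makes privacy evaluation expensive at scale.
import numpy as np

def dcr(train: np.ndarray, synthetic: np.ndarray, chunk: int = 512) -> np.ndarray:
    """Distance from each synthetic record to its closest training record,
    processed in chunks of synthetic rows to keep memory bounded."""
    out = np.empty(len(synthetic))
    for start in range(0, len(synthetic), chunk):
        block = synthetic[start:start + chunk]
        # Pairwise Euclidean distances, shape: (len(block), n_train).
        d = np.linalg.norm(block[:, None, :] - train[None, :, :], axis=-1)
        out[start:start + chunk] = d.min(axis=1)
    return out

train = np.array([[0.0, 0.0], [1.0, 1.0]])
synth = np.array([[0.0, 0.0], [0.5, 0.5]])
print(dcr(train, synth))  # first record copies train[0] -> DCR 0.0; second is ~0.707 away
```

Doubling both dataset sizes quadruples the number of comparisons, which is the non-linear scaling mentioned above.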
- 👉 mostlyaiprize.com: Join the synthetic data challenge and win up to $100k. Feel free to use this discussion thread for questions and/or feedback of any kind.