data/external/

EnzyExtract 1

See scripts/alarms/alarm_hallucination.py

  • scientific notation
    • rows: 20890, PMIDs: 3802
    • The kcat/Km method: parse kcat, Km, and kcat/Km. If the reported kcat/Km does not match the ratio of the parsed kcat and Km, the scientific notation was probably misread.
      • rows: 8692, PMIDs: 1511
  • hallucination > 0.5
    • rows: 5720, PMIDs: 806
  • repetition
    • rows: 9630, PMIDs: 784
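
The kcat/Km check above can be sketched as follows. This is a minimal illustration, not the code in scripts/alarms: the record keys (`pmid`, `kcat`, `km`, `kcat_km`), the function names, and the 10-fold tolerance are assumptions, and the values are assumed to be already converted to consistent units.

```python
import math

def kcat_km_mismatch(kcat, km, kcat_km, max_fold=10.0):
    """True when the reported kcat/Km is more than max_fold away from kcat / Km.

    max_fold is a hypothetical tolerance; values must be positive and in
    consistent units.
    """
    gap = abs(math.log10(kcat / km) - math.log10(kcat_km))
    return gap > math.log10(max_fold)

def summarize_alarm(rows, max_fold=10.0):
    """Return (flagged row count, distinct PMID count) for a list of records."""
    flagged = [
        r for r in rows
        if kcat_km_mismatch(r["kcat"], r["km"], r["kcat_km"], max_fold)
    ]
    return len(flagged), len({r["pmid"] for r in flagged})

# Hypothetical records: the last two have a sign-flipped Km exponent.
rows = [
    {"pmid": "A", "kcat": 4.0, "km": 2e-5, "kcat_km": 2e5},
    {"pmid": "B", "kcat": 4.0, "km": 2e5, "kcat_km": 2e5},
    {"pmid": "B", "kcat": 1.0, "km": 1e3, "kcat_km": 1e3},
]
n_rows, n_pmids = summarize_alarm(rows)
```

Comparing on a log scale keeps the tolerance symmetric, so a value that is too large by some factor is treated the same as one that is too small by that factor.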

EnzyExtract 2

  • too many sigfigs (see scripts/alarms/step2_alarm_sigfig.py)
    • 111 PMIDs, most of which look correct on manual inspection
  • out of distribution: use the intersection of BRENDA+EnzyExtract as "super-reliable". Consider points that are out of distribution.
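
A significant-figure alarm of this kind could look roughly like the sketch below. It is not taken from step2_alarm_sigfig.py: the function names and the `max_sigfigs=5` threshold are assumptions, and trailing zeros in integers are treated as not significant, which is a convention chosen for the sketch.

```python
import re

def count_sigfigs(text):
    """Count significant figures in a numeric string such as '0.00120' or '1.5e-3'."""
    # Keep only the mantissa: drop anything after an exponent marker.
    mantissa = re.split(r"[eE×x]", text.strip(), maxsplit=1)[0].strip()
    mantissa = mantissa.lstrip("+-")
    if "." in mantissa:
        # With a decimal point, trailing zeros count; leading zeros never do.
        digits = mantissa.replace(".", "").lstrip("0")
    else:
        # Without a decimal point, trailing zeros are ambiguous; ignore them here.
        digits = mantissa.lstrip("0").rstrip("0")
    return len(digits)

def has_too_many_sigfigs(text, max_sigfigs=5):
    """Flag values reported with more than max_sigfigs (hypothetical threshold)."""
    return count_sigfigs(text) > max_sigfigs
```

Since most flagged PMIDs looked correct on inspection, a flag like this is better read as a prompt for review than as evidence of an error.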

BRENDA

  • redo the BRENDA/EnzyExtract correlation plot, coloring points by the alarms listed above
  • BRENDA/EnzyExtract correlation plot (see scripts/alarms/alarm_correlation.py)
    • at kcat_diff > 1.1: rows: 4510, PMIDs: 1592
  • out of distribution: use the intersection of BRENDA+EnzyExtract as "super-reliable". Consider points that are out of distribution.
  1. Use grand_biblio to add DOIs to BRENDA

EnzyExtract

  • scientific notation, 3802 PMIDs
  • scientific notation with kcat, km, and kcat/km: 1511 PMIDs
    • can give the LLM a "calculation" tool, with these instructions:
    1. Verify that the kcat and Km values match what is provided in the image.
    2. Use the calculation tool to check that the purported kcat/Km indeed equals kcat divided by Km.
    3. If not, try flipping the signs of the exponents (for instance, 4 x 10^5 to 4 x 10^-5) and try again.
  • hallucination threshold > 0.5: 806 PMIDs
  • repetition threshold > 0.5: 784 PMIDs
  • too many sigfigs: 111 PMIDs, of which most (via manual inspection) look like they are actually correct
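
The exponent-flip retry in step 3 can be sketched as below. `Sci`, `consistent`, `flip_candidates`, and the 10-fold tolerance are hypothetical names and values, not part of the repository. One caveat shapes the sketch: negating all three exponents at once only changes the sign of the exponent imbalance between kcat / Km and kcat/Km, not its size, so the sketch tries every subset of flips and reports the ones that restore consistency.

```python
import math
from itertools import product
from typing import NamedTuple

class Sci(NamedTuple):
    """A value in scientific notation: mantissa x 10^exponent."""
    mantissa: float
    exponent: int

    def value(self):
        return self.mantissa * 10.0 ** self.exponent

    def flipped(self, flip):
        # Negate the exponent only when flip is True.
        return Sci(self.mantissa, -self.exponent) if flip else self

def consistent(kcat, km, kcat_km, max_fold=10.0):
    """True when kcat / Km is within max_fold of the reported kcat/Km."""
    gap = abs(math.log10(kcat.value() / km.value()) - math.log10(kcat_km.value()))
    return gap <= math.log10(max_fold)

def flip_candidates(kcat, km, kcat_km):
    """Return every (flip_kcat, flip_km, flip_kcat_km) pattern that is consistent.

    (False, False, False) in the result means the values agree as parsed.
    """
    return [
        flips
        for flips in product((False, True), repeat=3)
        if consistent(kcat.flipped(flips[0]), km.flipped(flips[1]),
                      kcat_km.flipped(flips[2]))
    ]

# Hypothetical example: Km was parsed as 2 x 10^5 but should be 2 x 10^-5.
candidates = flip_candidates(Sci(4.0, 0), Sci(2.0, 5), Sci(2.0, 5))
```

Several patterns can fit the same triple, so the result narrows down which exponent to re-check against the source image instead of choosing a fix automatically.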

BRENDA

  • BRENDA/EnzyExtract correlation plot: 1592 PMIDs where kcat differs by more than 1.1-fold
  • out of distribution: use the intersection of BRENDA+EnzyExtract as "super-reliable". Consider points that are out of distribution.
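
The fold-difference flag can be illustrated with the sketch below. It assumes that "differs by more than 1.1-fold" means the larger kcat divided by the smaller one exceeds 1.1; that definition, the function names, and the `(pmid, brenda_kcat, extracted_kcat)` tuple layout are assumptions, not the contents of alarm_correlation.py.

```python
def fold_difference(a, b):
    """Symmetric fold difference between two positive values (always >= 1)."""
    if a <= 0 or b <= 0:
        raise ValueError("kcat values must be positive")
    return max(a, b) / min(a, b)

def flag_kcat_disagreement(pairs, threshold=1.1):
    """Return the sorted PMIDs whose BRENDA and EnzyExtract kcat disagree.

    pairs: iterable of (pmid, brenda_kcat, extracted_kcat) tuples (hypothetical layout).
    """
    flagged = {
        pmid
        for pmid, brenda_kcat, extracted_kcat in pairs
        if fold_difference(brenda_kcat, extracted_kcat) > threshold
    }
    return sorted(flagged)

# Hypothetical matched values for two papers.
pairs = [("111", 10.0, 10.5), ("222", 10.0, 100.0), ("222", 3.0, 3.0)]
flagged_pmids = flag_kcat_disagreement(pairs)
```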

Flags

  • abbreviated substrate