COCO and VOC use different protocols for assigning TP / FP labels to predicted boxes.
VOC uses a "greedy" strategy: it finds the best match (by IoU) for the current predicted box, and if that ground-truth box is already matched, it marks the current predicted box as a false positive: https://github.com/weiliu89/VOCdevkit/blob/master/VOCcode/VOCevaldet.m#L94
In COCO, by contrast, the search continues among the remaining gt boxes if the current best match is already matched: https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocotools/cocoeval.py#L280
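To make the difference concrete, here is a minimal sketch of the two matching rules for a single image and a single class. The function names (`match_voc_style`, `match_coco_style`) and the box format `(x1, y1, x2, y2)` are my own assumptions for illustration, not the API of either toolkit, and the sketch deliberately leaves out "difficult", "crowd" and "ignore" handling.

```python
from typing import List, Sequence

Box = Sequence[float]  # assumed format: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_voc_style(preds: List[Box], gts: List[Box], thr: float = 0.5) -> List[bool]:
    """VOC-like rule: take the single best-IoU gt; if it is taken, the pred is FP.

    `preds` must already be sorted by descending confidence.
    Simplification: "difficult" gt boxes are not modelled here.
    """
    matched = [False] * len(gts)
    labels = []
    for p in preds:
        ious = [iou(p, g) for g in gts]
        best = max(range(len(gts)), key=lambda j: ious[j], default=None)
        if best is not None and ious[best] >= thr and not matched[best]:
            matched[best] = True
            labels.append(True)   # true positive
        else:
            labels.append(False)  # low IoU, or best gt already claimed
    return labels


def match_coco_style(preds: List[Box], gts: List[Box], thr: float = 0.5) -> List[bool]:
    """COCO-like rule: take the best-IoU gt among those still unmatched.

    `preds` must already be sorted by descending confidence.
    Simplification: "crowd" / "ignore" gt boxes are not modelled here.
    """
    matched = [False] * len(gts)
    labels = []
    for p in preds:
        best, best_iou = None, thr
        for j, g in enumerate(gts):
            if matched[j]:
                continue          # skip claimed gt and keep searching
            v = iou(p, g)
            if v >= best_iou:
                best, best_iou = j, v
        if best is not None:
            matched[best] = True
        labels.append(best is not None)
    return labels


# Two overlapping gt boxes and two preds, sorted by confidence.
gts = [(0, 0, 10, 10), (5, 0, 15, 10)]
preds = [(1, 0, 11, 10), (2, 0, 12, 10)]
print(match_voc_style(preds, gts))   # [True, False]
print(match_coco_style(preds, gts))  # [True, True]
```

In this example the second pred overlaps the first gt most (IoU about 0.67), but that gt is already claimed. The VOC-like rule marks the pred as a false positive, while the COCO-like rule falls back to the second gt (IoU about 0.54) and counts a true positive.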
Also, COCO has other categories for gt boxes: "crowd" (which looks like VOC's "difficult") and "ignore". There might be other differences as well.
Your code implements only VOC-style evaluation, while the README suggests using it for both flavors.