
Train an object detection model to detect price tags from shelf images #352

Open
raphael0202 opened this issue Oct 31, 2024 · 6 comments

@raphael0202
Contributor

Open Prices is a database of food prices from around the world. For data collection, we rely on volunteers who take pictures of individual price tags or shop shelves and enter the price in the web or mobile interface.

To speed up contribution, we would like to automatically crop the individual price tags from shelf images, so that the price and the barcode can be extracted automatically in a second step (or completed by other volunteers).
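To make the cropping step concrete: a detector typically outputs one box per price tag, and each box must be converted to pixel coordinates before cropping. The sketch below assumes YOLO-style output (normalized center x, center y, width, height); the function name yolo_to_crop_box is hypothetical and not part of any Open Prices tool.

```python
from typing import Tuple


def yolo_to_crop_box(
    box: Tuple[float, float, float, float],
    image_width: int,
    image_height: int,
) -> Tuple[int, int, int, int]:
    """Convert a normalized (x_center, y_center, width, height) box into a
    pixel (left, upper, right, lower) box, clamped to the image bounds.

    Hypothetical helper: assumes YOLO-style normalized detector output.
    """
    x_c, y_c, w, h = box
    left = round((x_c - w / 2) * image_width)
    upper = round((y_c - h / 2) * image_height)
    right = round((x_c + w / 2) * image_width)
    lower = round((y_c + h / 2) * image_height)
    # Clamp so that boxes touching the image border remain valid crops.
    left, right = max(0, left), min(image_width, right)
    upper, lower = max(0, upper), min(image_height, lower)
    return left, upper, right, lower
```

The returned tuple follows the (left, upper, right, lower) order that Pillow's Image.crop expects, so a crop could look like image.crop(yolo_to_crop_box(box, *image.size)).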

To illustrate, here is a proof image of a shop shelf:

[Image: shop shelf]

The idea is to train an object detection model that detects every individual price tag (the tag on which the barcode and the price are printed).

We don't have a training dataset yet, so we first need to annotate some data (with the help of the OFF community).

We have an instance of Label Studio running, that was already used to successfully annotate data for object detection models.

We also have a CLI that wraps Label Studio and makes it easy to upload, convert, and pre-annotate data for object detection models. We usually use YOLOv8 models for object detection, but other models can be explored as long as they can be converted to ONNX.
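As an illustration of the kind of conversion involved (this is not the CLI's actual implementation), here is a minimal sketch that turns one Label Studio rectangle annotation into a YOLO label line. It assumes Label Studio's rectangle convention (x, y, width, height given as percentages of the image size, with x and y at the top-left corner), ignores rotation, and uses a hypothetical "price-tag" label mapped to class 0.

```python
from typing import Dict


def ls_rectangle_to_yolo(value: Dict, label_to_id: Dict[str, int]) -> str:
    """Convert one Label Studio rectangle result value (percent coordinates,
    top-left origin) into a YOLO label line (normalized center coordinates).

    Sketch only: assumes an unrotated box with a single label.
    """
    # Percentages (0-100) -> fractions (0-1).
    x, y = value["x"] / 100, value["y"] / 100
    w, h = value["width"] / 100, value["height"] / 100
    class_id = label_to_id[value["rectanglelabels"][0]]
    # YOLO expects "class x_center y_center width height", all normalized.
    return f"{class_id} {x + w / 2:.6f} {y + h / 2:.6f} {w:.6f} {h:.6f}"
```

In the YOLO training format, lines like this go into one .txt file per image, one line per box.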

@baslia
Collaborator

baslia commented Nov 1, 2024

I have some time to look at it. Is there a way to get an account for Label Studio?
Additionally, some details on where the data is stored and how to access it would help me assess how many more data points are needed to fit a model.
Thanks!

@raphael0202
Contributor Author

@baslia our Label Studio instance is running at https://annotate.openfoodfacts.org/

Account creation is open to everyone. We publish a daily dump of the proof table, containing all proof information, at https://prices.openfoodfacts.org/data/proofs.jsonl.gz.

Images are stored at https://prices.openfoodfacts.org/img/{path} with {path} being the path of the image file indicated in the dump.
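A minimal sketch for reading the dump and building image URLs with only the Python standard library. The field name file_path for the image path is an assumption about the dump's schema; check it against an actual line of proofs.jsonl.gz before relying on it.

```python
import gzip
import json
from typing import Dict, Iterator

IMAGE_BASE_URL = "https://prices.openfoodfacts.org/img/"


def iter_proofs(dump_path: str) -> Iterator[Dict]:
    """Yield one proof dict per non-empty line of a gzip-compressed JSONL dump."""
    with gzip.open(dump_path, "rt", encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)


def image_url(proof: Dict) -> str:
    """Build the public image URL for a proof.

    Assumption: the image path is stored in a "file_path" field.
    """
    return IMAGE_BASE_URL + proof["file_path"]
```

After downloading the dump, [image_url(p) for p in iter_proofs("proofs.jsonl.gz")] would list every image URL.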

@baslia
Collaborator

baslia commented Nov 3, 2024

Thank you. Would you mind checking the configuration of the project I just created, "price-tag-shelf"?

If everything looks good, I will use the dump to download and load some images for the labelling task.

@baslia baslia self-assigned this Nov 3, 2024
@raphael0202
Contributor Author

Great! I just renamed image to image_url (the variable name we usually use), and also renamed the label so that it is lowercase and separated by '-'.
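The label convention above can be expressed as a small helper. This is a hypothetical sketch; the thread does not state the final label name, so "price-tag" below is only an example.

```python
import re


def normalize_label(label: str) -> str:
    """Lowercase a label and join its words with '-'.

    Hypothetical helper illustrating the naming convention: whitespace,
    underscores and existing hyphens are all treated as word separators.
    """
    words = re.split(r"[\s_\-]+", label.strip())
    return "-".join(w.lower() for w in words if w)
```

For example, normalize_label("Price Tag") and normalize_label("price_tag") both give "price-tag".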

For proof images, we should select proofs with the PRICE_TAG type (these proofs contain images of store shelves and images of individual price tags).
We should probably also include a fraction of images that show individual price tags, so that the model recognizes them correctly, but the bulk should be shelf images.
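A minimal sketch of the selection step, assuming each proof in the dump stores its type in a "type" field (an assumption about the schema). The thread does not mention any field that separates shelf photos from single-tag photos, so the balance between the two would still need to be checked by eye during annotation.

```python
import random
from typing import Dict, Iterable, List


def sample_price_tag_proofs(
    proofs: Iterable[Dict], n: int, seed: int = 42
) -> List[Dict]:
    """Keep only PRICE_TAG proofs and draw a reproducible random sample of up to n.

    Assumption: the proof type is stored under the "type" key.
    """
    candidates = [p for p in proofs if p.get("type") == "PRICE_TAG"]
    rng = random.Random(seed)  # fixed seed so the annotation batch is reproducible
    return rng.sample(candidates, min(n, len(candidates)))
```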

@baslia
Collaborator

baslia commented Nov 3, 2024

Thank you.
I am trying to load a small JSON file with the URLs, but I am getting an error. Do you know what I am doing wrong?
Here is the JSON file:

[
  {
    "data": {
      "image_url": "https://prices.openfoodfacts.org/img/0008/xGpDk1mHT8.webp"
    }
  },
  {
    "data": {
      "image_url": "https://prices.openfoodfacts.org/img/0008/A9ooDKZL1M.webp"
    }
  },
  {
    "data": {
      "image_url": "https://prices.openfoodfacts.org/img/0008/yj22mIR9Nu.webp"
    }
  }
]
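The file above follows Label Studio's JSON task import format: a list of tasks, each with a "data" object whose keys match the variables used in the labeling config (here image_url). A sketch for generating such a file from a list of URLs, with a hypothetical function name:

```python
import json
from typing import Iterable


def write_label_studio_tasks(image_urls: Iterable[str], out_path: str) -> int:
    """Write a Label Studio import file, i.e. a JSON list of
    {"data": {"image_url": ...}} tasks, and return the number of tasks.

    Hypothetical helper; the "image_url" key must match the labeling config.
    """
    tasks = [{"data": {"image_url": url}} for url in image_urls]
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(tasks, f, indent=2)
    return len(tasks)
```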

@raphael0202
Contributor Author

I've discussed it directly with @baslia: it was due to a CORS issue, and it should work now!
