Data Augmentation for Object Detection: Rethinking image transforms for bounding boxes



When it comes to getting good performance from deep learning tasks, the more data the merrier. However, we may only have limited data with us. Data augmentation is one way to battle this shortage, by artificially augmenting our dataset. In fact, the technique has proven so successful that it has become a staple of deep learning systems.

This is a companion discussion topic for the original entry at


Thanks a lot for an amazing tool.
The annotation tool that I usually use saves the bounding box coordinates and class in an xml file. It would be really helpful if you could recommend an annotation tool that works best with this.



Here are a few things you can try!

  1. Look for an option in your tool that can save the data in a .txt format. If you are stuck with the .xml format, can you read it? You could write a little script that converts the xml format into the format that we have specified in this post.

  2. If you are having trouble writing the script, why don’t you upload one sample .xml file and paste the link here, and I’ll take a shot at writing the conversion script. By the way, mind sharing which annotation tool you use?
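In case it helps, here is a rough sketch of such a conversion script. It assumes a Pascal VOC-style XML layout (`<object>` elements with a `<name>` and a `<bndbox>`); the tag names, the inline sample, and the class-id mapping are all assumptions, so adapt them to whatever your tool actually writes.

```python
import xml.etree.ElementTree as ET

# A made-up VOC-style sample, standing in for one of your .xml files.
SAMPLE_XML = """<annotation>
  <object>
    <name>dog</name>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
  <object>
    <name>person</name>
    <bndbox><xmin>8</xmin><ymin>12</ymin><xmax>352</xmax><ymax>498</ymax></bndbox>
  </object>
</annotation>"""

# Hypothetical class-name -> integer-id mapping; use your own classes.
CLASS_IDS = {"dog": 0, "person": 1}

def xml_to_rows(xml_string):
    """Return one [x1, y1, x2, y2, class_id] row per object in the XML."""
    root = ET.fromstring(xml_string)
    rows = []
    for obj in root.iter("object"):
        name = obj.find("name").text
        box = obj.find("bndbox")
        coords = [float(box.find(t).text)
                  for t in ("xmin", "ymin", "xmax", "ymax")]
        rows.append(coords + [CLASS_IDS[name]])
    return rows

# Write the rows out as space-separated lines, one box per line.
for row in xml_to_rows(SAMPLE_XML):
    print(" ".join(str(v) for v in row))
```

To batch-convert a folder, loop over the files with `ET.parse(path).getroot()` instead of `ET.fromstring` and write each result to a matching .txt file.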

Again, please go through the annotation format to understand what format the annotations should be in for the augmentations to work. If they aren’t in that format and you don’t want to convert your dataset on disk, convert them just after loading your data. Part 4 shows how to modify COCO-style annotations to fit with our library; you can do something similar with your own annotations.
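For the "convert just after loading" route, a minimal sketch looks like this. It assumes COCO-style boxes stored as `[x, y, width, height]` and converts them to the corner format `[x1, y1, x2, y2, class]` used in the post; the arrays here are placeholder data, not real annotations.

```python
import numpy as np

# Placeholder COCO-style boxes: [x, y, width, height] per row.
coco_boxes = np.array([[10.0, 20.0, 30.0, 40.0],
                       [5.0,  5.0,  50.0, 60.0]])
# Placeholder class ids, one per box.
class_ids = np.array([[0.0], [1.0]])

# Convert width/height to the opposite corner: x2 = x + w, y2 = y + h.
corner_boxes = coco_boxes.copy()
corner_boxes[:, 2] = coco_boxes[:, 0] + coco_boxes[:, 2]
corner_boxes[:, 3] = coco_boxes[:, 1] + coco_boxes[:, 3]

# Append the class column to get [x1, y1, x2, y2, class].
bboxes = np.hstack([corner_boxes, class_ids])
print(bboxes)
```

Doing this right after loading keeps your files on disk untouched while still feeding the augmentations the format they expect.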


Hi! I have a small question: can the library output only 4 parameters (just the x, y coordinates of the 2 corners) instead of 5? If not, could you kindly direct me to the part of the code that should be modified to achieve this? Thanks
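One workaround sketch, in case it is useful: since the class label is just the fifth column of the annotation array, you can keep it through the augmentation (the transforms need it to stay paired with each box) and slice it off afterwards. This is an assumption about how you post-process the output, not a built-in option of the library.

```python
import numpy as np

# Placeholder augmented output in the library's [x1, y1, x2, y2, class] format.
augmented = np.array([[48.0, 240.0, 195.0, 371.0, 0.0],
                      [8.0,  12.0,  352.0, 498.0, 1.0]])

# Keep only the first four columns: the two corner coordinates per box.
corners_only = augmented[:, :4]
print(corners_only)
```

That way nothing in the library itself needs to be modified.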