Object Detection Models

My models are TensorFlow models, as the framework is quite standard and very well-supported across many devices. There is a lot of good documentation, and an excellent selection of pre-trained models to facilitate transfer learning with good results.

The object detection models will be running on somewhat computationally limited devices (although the Nano and CM4 are quite impressive for their size), so I am sticking with TensorFlow Lite. I did experiment with non-Lite models on the Nano, which has a GPU optimized for them. However, for my images (I want 640x640 inputs to run through the models so that my 'patterns' work well), the fully-fledged TensorFlow models were too much for the Nano and basically froze it.

SSD MobileNet V2

To train my models, I started with SSD MobileNet V2 as a base model. This model is pre-trained on the COCO dataset, a large collection of images of common everyday objects.

Through that pre-training, the model has already learned general visual features, so with just a little further customization/training, it can learn to find whatever household objects you like.

I then further trained the model on the 2,000 or so images I captured and labeled myself (ugh) using Label Studio, which is a great tool, by the way. The labeling took days, and at some points my clicking finger became unresponsive, but somehow I persevered, and we now have a couple of trained models that seem to work pretty well.

Basement Model

This model is trained to find objects that happen to be sitting in my basement (most of which I acquired from Hobby Lobby). I selected random objects that seemed like they would be easy to distinguish and somewhat symmetrical.

Windmill

The model seems to love this thing. I mean, if just a fraction of it is visible, the model finds it every time.

I think the pattern of the blades makes it obvious. This was a great choice.


Cat Tree

The model almost always finds this as well, and its symmetry makes it easily identifiable from all sides.


House

This is a small cabin, and the model does a good job of finding it. However, the bounding boxes sometimes get a little weird with this one.


Cone

I thought this would be great, but so far I'm less than impressed with results. We'll see.


Pineapple

This one needs some more experimentation, but the model does seem to find it pretty well.


Gazing Ball

The model finds this everywhere, even when it's not there. I have not figured out what it resembles, but this may not be a great choice.


Speaker

This one doesn't seem to be very recognizable, but time will tell.


Doll House

This is my wildcard. I haven't actually tried it on the map yet. I don't even know where it came from (not Hobby Lobby).

Lights model

The second object detection model I am working with is trained to locate a certain model of IR lights (although it does find other lights as well). I chose a camera-mounted rechargeable IR light that is easy to acquire from Amazon.

This is the IR light the object detection model is trained to find. That is the only thing this model knows how to find.

In the case of IR light objects, a secondary algorithm runs against the findings from the light model. The located lights and their positions are searched for patterns that match any of the following, which can be marked as landmarks on the map.

Identifiable IR Light Patterns

Vertical Line

Grouping of 3 or more lights in a vertical pattern. Different landmarks can have different numbers of lights.


Square

Grouping of 4 lights in a square pattern.


Triangle

Grouping of 3 lights in a triangle pattern (oriented sideways).
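As a rough sketch of how the vertical-line case of this pattern search could work (the function name, tolerances, and grouping strategy here are my illustration, not the actual implementation): cluster the detected light centers by horizontal proximity, and keep any column with enough lights in it.

```python
# Illustrative sketch of the vertical-line pattern search over detected
# light centers. Names and tolerances are assumptions, not the real code.

def find_vertical_lines(centers, x_tol=15, min_lights=3):
    """Group light centers (x, y) into near-vertical columns.

    centers:    list of (x, y) pixel coordinates of detected lights.
    x_tol:      max horizontal spread (pixels) within one column.
    min_lights: minimum lights required to count as a line landmark.
    Returns a list of columns, each sorted top to bottom.
    """
    remaining = sorted(centers)  # tuples sort by x first, then y
    columns = []
    while remaining:
        seed = remaining.pop(0)
        # Pull in every remaining light close to the seed's x position.
        column = [seed] + [c for c in remaining if abs(c[0] - seed[0]) <= x_tol]
        remaining = [c for c in remaining if abs(c[0] - seed[0]) > x_tol]
        if len(column) >= min_lights:
            columns.append(sorted(column, key=lambda c: c[1]))
    return columns

# Three roughly stacked lights plus one stray light off to the side.
lights = [(100, 50), (103, 120), (98, 190), (400, 80)]
print(find_vertical_lines(lights))  # one column of three; the stray is ignored
```

The square and triangle cases would follow the same shape: cluster the centers, then check the cluster's geometry against the expected arrangement.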

Looking for patterns of lights, rather than individual objects (as in the basement model), is very effective at countering one of the shortcomings I've been finding with my detection models. The bounding boxes identified by the TensorFlow models are almost always a little too large or too small, meaning the object will be reported as taller, shorter, wider, or narrower than it actually appears in the image.

As long as the center of the object is good, the angle relative to the camera base is going to be a good reading. However, since the object's apparent size is taken directly from the bounding box height, that measurement is almost certainly off.
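To show why the center is enough for the angle reading, here is a minimal sketch, assuming a known horizontal field of view and a simple linear mapping from pixel offset to angle (the true mapping uses the focal length and an arctangent, but this approximation is close near the image center; the FOV value and function name are my assumptions):

```python
# Illustrative only: bearing of an object from its bounding-box center,
# assuming a linear pixel-offset-to-angle mapping and an assumed horizontal
# field of view (hfov_degrees). Box format is (xmin, ymin, xmax, ymax).

def bearing_degrees(box, image_width=640, hfov_degrees=62.2):
    """Angle (degrees) of the box center left/right of the optical axis.

    Negative means left of center, positive means right. Note that only
    the box *center* matters here; an oversized box gives the same answer.
    """
    xmin, _, xmax, _ = box
    cx = (xmin + xmax) / 2
    # Offset from image center as a fraction of half the image width.
    frac = (cx - image_width / 2) / (image_width / 2)
    return frac * (hfov_degrees / 2)

print(bearing_degrees((300, 100, 340, 260)))  # center at x=320 -> 0.0
```

A box that is too wide but correctly centered, e.g. `(280, 100, 360, 260)`, still yields a bearing of 0.0, which is exactly the robustness described above.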

When combining objects into a pattern, the size of each one doesn't matter, so much accurately estimating the center of each object. If the light pattern is a series of vertical lights, and the height of the landmark is considered to be the distance between the center of the top light and bottom light, the height (in pixels) becomes quite accurate, even with the bounding boxes being too large or small. For that reason, light-based landmarks receive a higher priority within this set of calculations.