Evaluation-reset loop on AWS DeepRacer (Beginners beware)

Posted by Scott Phillips on 2nd Jan 2023

At Cloud Astronauts, we love exploring AWS services that help users learn something new and powerful about Cloud or AI. With AWS DeepRacer, anyone can learn about both and experiment with creating a simple version of a self-driving car using Reinforcement Learning (RL), a type of Machine Learning.

AWS has made it incredibly easy and user-friendly to learn about RL and watch a self-driving car learn how to navigate a track. You can then go deeper by changing the reward function to help the model learn faster and by changing the hyperparameters that control the learning process (number of cycles, etc.).

Our ‘Mission 6: Mars Rover AI Training’ challenges users with a scenario on a future Mars Colony outpost: the Mars Regolith Mining Rovers have been corrupted and need to be retrained to haul ore from the regolith mines back to the colony for processing. We show users how to build that first autonomous rover model on AWS using AWS DeepRacer. It’s a great way to jump in as a complete beginner, and we document every step you need to take. AWS makes it easy for new users to learn by giving every new user 10 hours of free DeepRacer training and evaluation time, valid for 30 days from the first time they kick off a DeepRacer model. Ten hours is enough to build 10-15 models, depending on how much training time you use.
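To make the 10-15 model estimate concrete, here is a rough budgeting sketch in Python. The 10 free hours come from the paragraph above; the per-model training and evaluation minutes are our own illustrative assumptions, and the function name is hypothetical rather than part of any AWS tool.

```python
# Rough free-time budgeting sketch. Only the 10-hour figure comes from the
# post; the per-model minutes below are illustrative assumptions.
FREE_MINUTES = 10 * 60  # 10 hours of free training + evaluation time


def models_within_budget(train_minutes, eval_minutes, budget_minutes=FREE_MINUTES):
    """Return how many complete train-then-evaluate cycles fit in the budget."""
    per_model = train_minutes + eval_minutes
    if per_model <= 0:
        raise ValueError("each model must use some time")
    return budget_minutes // per_model


# Assumed: 30 min training + 10 min evaluation per model.
print(models_within_budget(30, 10))  # 15
# Assumed: 50 min training + 10 min evaluation per model.
print(models_within_budget(50, 10))  # 10
```

The same arithmetic shows why unattended evaluation time matters: every minute an evaluation runs comes out of the same 600-minute pool.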

However, we have found one issue. An Object Avoidance race is one where fixed objects (bricks) sit on the track and the rover has to learn to see and avoid them. In this type of race, we have found an error condition during evaluation that could use up your free DeepRacer time if you are not careful. It shows up when you evaluate a new model with very little training time (e.g., 30 minutes), which is exactly the scenario a beginner is likely to run when getting started. The rover hits the object, resets to start over a few spaces back from the object, and then hits the object again. It then resets, follows the same path, hits the object once more, and continues resetting in what appears to be an ‘Evaluation-Reset Loop’.

We have seen this multiple times. It is not an issue in training because when you train a model on DeepRacer, you set a stop condition on the clock for your training time, so you won’t ever use more training time than you have budgeted. But the evaluation process does NOT have a stop limit. It will attempt to run the full set of evaluation trials (laps) that you, the user, have defined, and it has no way to stop if it gets into a loop.

We have seen this resetting go on for nearly 10 minutes and over 100 resets with no indication that it would ever stop. See the example picture below (in this case, the rover hit the block in the road, reset, and hit the block again, 46 times in a row, and did not stop until we selected the ‘Stop evaluation’ button):

We have reported this issue to AWS technical support, but we do not know if or when they will fix it. In the meantime, all beginners should be aware of it: until you have created a very strong model that recovers well in an object avoidance trial, be extra diligent and manually monitor your model during the evaluation period. If you see this happening to your racer, select the ‘Stop evaluation’ button yourself.
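The manual check described above boils down to one question: do the most recent resets keep happening at the same spot on the track? Below is a minimal Python sketch of that logic. It does not call any DeepRacer API; the `ResetEvent` record, the `is_reset_loop` function, and both thresholds are hypothetical names and values we chose for illustration.

```python
from dataclasses import dataclass


@dataclass
class ResetEvent:
    """Hypothetical record of one reset seen while watching an evaluation."""
    elapsed_s: float     # seconds since the evaluation started
    progress_pct: float  # track progress (0-100) where the reset happened


def is_reset_loop(events, max_repeats=5, tolerance_pct=1.0):
    """Return True if the last `max_repeats` resets all happened at
    roughly the same track position (within `tolerance_pct`)."""
    if len(events) < max_repeats:
        return False
    recent = [e.progress_pct for e in events[-max_repeats:]]
    return max(recent) - min(recent) <= tolerance_pct


# A rover that keeps hitting the same brick: 46 resets at about 23% progress.
stuck = [ResetEvent(elapsed_s=12.0 * i, progress_pct=23.0) for i in range(1, 47)]
# A rover that crashes in different places is struggling, but not looping.
wandering = [ResetEvent(elapsed_s=15.0 * i, progress_pct=10.0 * i) for i in range(1, 8)]

print(is_reset_loop(stuck))      # True
print(is_reset_loop(wandering))  # False
```

Requiring several resets in a row at the same position is a deliberate choice: a weak model that crashes all over the track is still making progress through its evaluation, while one that resets at the same spot again and again is the case worth stopping by hand.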

Ironically, this reset loop illustrates, in a very simple way, one of the great fears of many who worry that advanced AI models could wreak havoc in the real world without proper safeguards.

For all that, we really love the AWS DeepRacer service and the chance to learn and experiment with Reinforcement Learning.  We think everyone should try this!  But we do want to call out this issue.  It is most likely to impact new participants just starting out and we don’t want any new users to get discouraged or lose all of that precious Free Tier time on DeepRacer because of an infinite loop.

So, when you are just getting started, watch that model during evaluation!  Don't walk away until the evaluation is done!

That’s all for now. Good luck with AWS DeepRacer.

Cloud Astronauts