<div dir="ltr"><div>All, Robots are learning how to learn -- should we be very afraid or just afraid?<br></div><div><br></div>The following video is a brief summary of the paper:<div><b><i>Practice Makes Perfect: Planning to Learn Skill Parameter Policies</i></b><br>Tom Silver<br>Feb 15, 2024<br>Nishanth Kumar∗, Tom Silver∗, Willie McClinton, Linfeng Zhao, Stephen Proulx, Tomás Lozano-Pérez, Leslie Pack Kaelbling and Jennifer Barry<br>The AI Institute, MIT CSAIL, Northeastern University<br>RSS 2024<br></div><div><a href="https://youtu.be/123DXatw1V8?si=yaS8UKoVZvXi_Nle">https://youtu.be/123DXatw1V8?si=yaS8UKoVZvXi_Nle</a><br></div><div><br></div><div><br></div><div>The technical paper</div><div>  Practice Makes Perfect: Planning to

Learn Skill Parameter Policies<br>
Nishanth Kumar∗†‡, Tom Silver∗†‡, Willie McClinton†‡, Linfeng Zhao†§, Stephen Proulx†, Tomás Lozano-Pérez‡, Leslie Pack Kaelbling‡
and Jennifer Barry† </div><div>†The AI Institute, </div><div>‡MIT CSAIL, </div><div>§Northeastern University  </div><div>*Indicates equal contribution.</div><div><br></div><div><div class="gmail-mx-auto gmail-w-full gmail-max-w-[90%] gmail-format gmail-format-md gmail-md:format-base gmail-lg:max-w-5xl gmail-lg:format-lg gmail-format-blue gmail-dark:format-invert" style="border:0px solid rgb(229,231,235);box-sizing:border-box;max-width:64rem;font-size:1.125rem;line-height:1.77778;margin-left:auto;margin-right:auto;width:1024px;font-family:ui-sans-serif,system-ui,-apple-system,BlinkMacSystemFont,"Segoe UI",Roboto,"Helvetica Neue",Arial,"Noto Sans",sans-serif,"Apple Color Emoji","Segoe UI Emoji","Segoe UI Symbol","Noto Color Emoji""><div class="gmail-flex gmail-justify-center" style="border:0px solid rgb(229,231,235);box-sizing:border-box;margin-bottom:0px;display:flex"><p class="gmail-text-center gmail-text-xl gmail-!mt-0 gmail-!mb-2 gmail-font-medium gmail-max-w-[100%] gmail-md:max-w-[75%]" style="border:0px solid rgb(229,231,235);box-sizing:border-box;margin:1.33333em 0px;max-width:75%;text-align:center;font-size:1.25rem;line-height:1.75rem">We enable a robot to rapidly and autonomously <em style="border:0px solid rgb(229,231,235);box-sizing:border-box">specialize</em> parameterized skills by <em style="border:0px solid rgb(229,231,235);box-sizing:border-box">planning to practice</em> them. The robot decides <em style="border:0px solid rgb(229,231,235);box-sizing:border-box">what</em> skills to practice and <em style="border:0px solid rgb(229,231,235);box-sizing:border-box">how</em> to practice them. 
The robot is left alone for hours, repeatedly practicing and improving.</p></div></div><div class="gmail-mx-auto gmail-w-full gmail-max-w-[90%] gmail-format gmail-format-md gmail-md:format-base gmail-lg:max-w-5xl gmail-lg:format-lg gmail-format-blue gmail-dark:format-invert" style="border:0px solid rgb(229,231,235);box-sizing:border-box;max-width:64rem;font-size:1.125rem;line-height:1.77778;margin-left:auto;margin-right:auto;width:1024px;font-family:ui-sans-serif,system-ui,-apple-system,BlinkMacSystemFont,"Segoe UI",Roboto,"Helvetica Neue",Arial,"Noto Sans",sans-serif,"Apple Color Emoji","Segoe UI Emoji","Segoe UI Symbol","Noto Color Emoji""><br style="border:0px solid rgb(229,231,235);box-sizing:border-box;margin-top:0px"><br style="border:0px solid rgb(229,231,235);box-sizing:border-box"><div style="border:0px solid rgb(229,231,235);box-sizing:border-box"><div class="gmail-flex gmail-justify-center gmail-content-center" style="border:0px solid rgb(229,231,235);box-sizing:border-box;display:flex"><p class="gmail-font-semibold gmail-text-2xl gmail-sm:text-3xl gmail-m-1 gmail-sm:m-2" style="border:0px solid rgb(229,231,235);box-sizing:border-box;margin:0.5rem;font-size:1.875rem;line-height:2.25rem;font-weight:600">Abstract</p></div><div class="gmail-flex gmail-justify-center gmail-content-center" style="border:0px solid rgb(229,231,235);box-sizing:border-box;display:flex"><p class="gmail-text-justify gmail-font-light gmail-text-base gmail-sm:text-lg gmail-m-1 gmail-sm:m-1 gmail-max-w-[100%] gmail-sm:max-w-[640px]" style="border:0px solid rgb(229,231,235);box-sizing:border-box;margin:0.25rem;max-width:640px;text-align:justify;font-size:1.125rem;line-height:1.75rem">One promising approach towards effective robot decision making in complex, long-horizon tasks is to sequence together <em style="border:0px solid rgb(229,231,235);box-sizing:border-box">parameterized skills</em>. 
We consider a setting where a robot is initially equipped with (1) a library of parameterized skills, (2) an AI planner for sequencing together the skills given a goal, and (3) a very general prior distribution for selecting skill parameters. Once deployed, the robot should rapidly and autonomously learn to improve its performance by specializing its skill parameter selection policy to the particular objects, goals, and constraints in its environment. In this work, we focus on the active learning problem of choosing which skills to <em style="border:0px solid rgb(229,231,235);box-sizing:border-box">practice</em> to maximize expected future task success. We propose that the robot should <em style="border:0px solid rgb(229,231,235);box-sizing:border-box">estimate</em> the competence of each skill, <em style="border:0px solid rgb(229,231,235);box-sizing:border-box">extrapolate</em> the competence (asking: "how much would the competence improve through practice?"), and <em style="border:0px solid rgb(229,231,235);box-sizing:border-box">situate</em> the skill in the task distribution through competence-aware planning. This approach is implemented within a fully autonomous system where the robot repeatedly plans, practices, and learns without any environment resets. Through experiments in simulation, we find that our approach learns effective parameter policies more sample-efficiently than several baselines. 
Experiments in the real world demonstrate our approach's ability to handle noise from perception and control and improve the robot's ability to solve two long-horizon mobile-manipulation tasks after a few hours of autonomous practice.</p></div></div></div></div><div><br></div><div><a href="https://arxiv.org/pdf/2402.15025">https://arxiv.org/pdf/2402.15025</a></div><div><div><br></div><div><br></div><div>After you've watched the video a couple of times ...</div><div><br></div><div>Should we plan to be very afraid or just afraid?</div><div><br></div><div><br></div><div><br></div><div>Ted</div><div><br></div></div></div>