<html><head></head><body><div class="yahoo-style-wrap" style="font-family:Helvetica Neue, Helvetica, Arial, sans-serif;font-size:13px;"><div dir="ltr" data-setdir="false">Follow-up to a question in today's Zoom Chat: (Chat-chat?)</div><div dir="ltr" data-setdir="false"><br></div><div dir="ltr" data-setdir="false"><div><div>In statistics, overfitting is "the production of an analysis that corresponds too closely or exactly to a particular set of data, and may therefore fail to fit additional data or predict future observations reliably". An overfitted model is a statistical model that contains more parameters than can be justified by the data. The essence of overfitting is to have unknowingly extracted some of the residual variation (i.e. the noise) as if that variation represented underlying model structure.</div><div><br></div><div>Underfitting occurs when a statistical model cannot adequately capture the underlying structure of the data. An underfitted model is a model where some parameters or terms that would appear in a correctly specified model are missing.</div></div><div><br></div><div dir="ltr" data-setdir="false">(Quoted from Wikipedia)</div><div dir="ltr" data-setdir="false"><br></div><div dir="ltr" data-setdir="false"><a href="https://www.kaggle.com/getting-started/166897" rel="nofollow" target="_blank" class="">https://www.kaggle.com/getting-started/166897</a><br></div><div><br></div><div dir="ltr" data-setdir="false">(Graphical examples of overfitting and underfitting in statistical analysis.)</div><div dir="ltr" data-setdir="false"><br></div><div dir="ltr" data-setdir="false"><span>Underfitting is when you have high bias and low variance in your model. The model learns little from the training data (low training score, aka high bias) and makes consistently poor predictions on the test data (low variance, because its predictions barely change from one training set to another). 
You get underfitting when your model is too simple for the data or the data is too complex for your model to capture.</span><br></div><div dir="ltr" data-setdir="false"><span><br></span></div><div dir="ltr" data-setdir="false">Overfitting is when you have low bias and high variance. The model learns the training dataset almost perfectly (high train score, aka low bias) but does not perform well on the test set (low test score, aka high variance). You get overfitting when your model is too complex for the data or your data is too simple for the model. <br></div><div dir="ltr" data-setdir="false"><br></div><div dir="ltr" data-setdir="false">(Adapted from portions of this thread:</div><div dir="ltr" data-setdir="false"><a href="https://datascience.stackexchange.com/questions/100089/what-do-under-fitting-and-over-fitting-really-mean-they-have-never-been-cle" rel="nofollow" target="_blank">https://datascience.stackexchange.com/questions/100089/what-do-under-fitting-and-over-fitting-really-mean-they-have-never-been-cle</a> ) <br></div><div dir="ltr" data-setdir="false"><br></div><div dir="ltr" data-setdir="false">-- Bob Primak </div><div dir="ltr" data-setdir="false"><br></div></div></div></body></html>
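P.S. The train-score/test-score framing above can be seen numerically with a small Python sketch. This is my own illustration, not taken from the thread: I fit noisy samples of an assumed "true" curve, sin(pi*x), with polynomials of increasing degree, where degree 1 is too simple (underfits) and degree 12 is flexible enough to chase the noise (overfits).

```python
# A minimal, self-contained sketch (my own illustration, not from the quoted
# thread): fit noisy samples of sin(pi * x) with polynomials of increasing
# degree and compare training error against test error.
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so runs are repeatable

def make_data(n, noise=0.3):
    """Noisy samples of the assumed 'true' curve y = sin(pi * x) on [-1, 1]."""
    x = rng.uniform(-1.0, 1.0, n)
    y = np.sin(np.pi * x) + rng.normal(0.0, noise, n)
    return x, y

x_train, y_train = make_data(25)
x_test, y_test = make_data(200)

def mse(y, y_hat):
    return float(np.mean((y - y_hat) ** 2))

# degree 1: too simple (underfit); degree 4: about right;
# degree 12: too flexible for 25 noisy points (overfit)
results = {}
for degree in (1, 4, 12):
    poly = np.polynomial.Polynomial.fit(x_train, y_train, degree)
    results[degree] = (mse(y_train, poly(x_train)), mse(y_test, poly(x_test)))

for degree, (train_err, test_err) in results.items():
    print(f"degree {degree:2d}: train MSE = {train_err:.3f}, test MSE = {test_err:.3f}")
```

The degree-1 line scores poorly on both sets (high bias, low variance), while the degree-12 polynomial scores much better on the training set than on the test set (low bias, high variance), which is exactly the underfitting/overfitting split described above.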