Evaluate the Model

Log Loss function

Log loss is one way to evaluate a model. It captures the model's uncertainty about a given prediction by measuring how strongly the model believes that its prediction is accurate. Confident predictions that turn out to be wrong are penalized most heavily, so lower values are better.
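To make the idea concrete, here is a minimal sketch of binary log loss in plain Python. The function name `binary_log_loss`, the `eps` clipping value, and the sample labels and probabilities are illustrative assumptions, not part of any particular library.

```python
import math

def binary_log_loss(y_true, y_prob, eps=1e-15):
    """Average binary log loss for labels in {0, 1} and predicted P(label == 1)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        # Clip probabilities so that log(0) never occurs.
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Hypothetical labels and three sets of predicted probabilities.
labels = [1, 0, 1]
confident_right = binary_log_loss(labels, [0.9, 0.1, 0.9])
unsure = binary_log_loss(labels, [0.6, 0.4, 0.6])
confident_wrong = binary_log_loss(labels, [0.1, 0.9, 0.1])

print(confident_right, unsure, confident_wrong)
```

The loss grows as the predicted probabilities move away from the true labels, and it grows fastest when the model is confident and wrong.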

Inference: Using your model to solve real problems.

  • When you perform inference using supervised learning, you generate predictions.

  • When you perform inference using unsupervised learning, you find patterns in your data.

  • Models are made specific by the data used to train them, and therefore you need a trained model before you can start generating predictions.
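The train-then-infer order described above can be sketched with a tiny supervised example. This assumes a simple least-squares line as the "model"; the names `fit_line` and `predict` and the training data are hypothetical and chosen only for illustration.

```python
def fit_line(xs, ys):
    """Train: estimate slope and intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, x):
    """Infer: apply the trained parameters to a new input."""
    slope, intercept = model
    return slope * x + intercept

# Training must happen first; the model is specific to this data.
model = fit_line([1, 2, 3, 4], [2, 4, 6, 8])

# Inference: generate a prediction for an input the model has not seen.
prediction = predict(model, 5)
print(prediction)
```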

Data vectorization: A process that converts non-numeric data into a numerical format so that it can be used by a machine learning model.
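One common way to vectorize text is a bag-of-words count, sketched below in plain Python. This is only one possible vectorization scheme; the helper names `build_vocabulary` and `vectorize` and the sample sentences are assumptions made for the example.

```python
def build_vocabulary(documents):
    """Map each distinct lowercase word to a column index (alphabetical order)."""
    words = sorted({word for doc in documents for word in doc.lower().split()})
    return {word: index for index, word in enumerate(words)}

def vectorize(document, vocabulary):
    """Turn one document into a list of word counts, one slot per vocabulary word."""
    vector = [0] * len(vocabulary)
    for word in document.lower().split():
        if word in vocabulary:
            vector[vocabulary[word]] += 1
    return vector

# Hypothetical two-document corpus.
docs = ["the cat sat", "the dog sat down"]
vocabulary = build_vocabulary(docs)
vectors = [vectorize(doc, vocabulary) for doc in docs]

print(vocabulary)
print(vectors)
```

Each document becomes a fixed-length list of numbers, which is the form a machine learning model can consume.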

Silhouette coefficient: A score from -1 to 1 describing the clusters found during modeling. A score near zero indicates overlapping clusters, and scores less than zero indicate data points assigned to incorrect clusters. A score approaching 1 indicates successful identification of discrete non-overlapping clusters.
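As a rough illustration, the sketch below computes a mean silhouette coefficient in plain Python using Euclidean distance. For each point, a is the mean distance to the other points in its own cluster and b is the lowest mean distance to the points of any other cluster; the point's score is (b - a) / max(a, b). The function name, the sample points, and the convention of scoring single-point clusters as 0 are assumptions of this example, and it expects at least two clusters.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (requires at least two clusters)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            # Assumed convention: a single-point cluster scores 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the same cluster
        # (the point's zero distance to itself does not change the sum).
        a = sum(math.dist(point, other) for other in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(point, other) for other in members) / len(members)
            for other_label, members in clusters.items()
            if other_label != label
        )
        denominator = max(a, b)
        scores.append((b - a) / denominator if denominator > 0 else 0.0)
    return sum(scores) / len(scores)

# Hypothetical data: two tight groups that are far apart.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good_score = silhouette_score(points, [0, 0, 1, 1])  # groups match the clusters
bad_score = silhouette_score(points, [0, 1, 0, 1])   # points assigned across groups

print(good_score, bad_score)
```

The well-separated assignment scores close to 1, while the mixed-up assignment scores below zero, matching the interpretation given above.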

Stop words: A list of words removed by natural language processing tools when building your dataset. There is no single universal list of stop words used by all natural language processing tools.
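A minimal sketch of stop-word removal follows. Because there is no universal list, the `STOP_WORDS` set here is a small hypothetical one chosen for the example, and `remove_stop_words` is an illustrative helper rather than any library's function.

```python
# Hypothetical stop-word list; real tools each ship their own.
STOP_WORDS = {"a", "an", "the", "is", "of", "and", "to"}

def remove_stop_words(text, stop_words=STOP_WORDS):
    """Lowercase the text, split on whitespace, and drop any stop words."""
    return [word for word in text.lower().split() if word not in stop_words]

tokens = remove_stop_words("The cat is on the edge of a mat")
print(tokens)
```

Changing the contents of `STOP_WORDS` changes which tokens survive, which is why results can differ between tools.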