Evaluate the Model

Log Loss function

Log loss is one way to evaluate a model. It captures the model's uncertainty about a given prediction by measuring how strongly the model believes that its prediction is accurate. Confident predictions that turn out to be wrong are penalized most heavily, so a lower log loss is better.
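
As a rough illustration of the idea above, here is a minimal sketch of binary log loss in plain Python. The function name log_loss, the eps clipping value, and the sample labels and probabilities are assumptions made up for this example; they are not taken from any particular library or dataset.

```python
import math

def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary log loss. y_true holds 0/1 labels, y_prob the predicted
    probability of class 1. eps is an assumed clipping value to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # keep p strictly inside (0, 1)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Hypothetical labels and three hypothetical sets of predictions.
labels = [1, 0, 1, 1]
confident_right = [0.9, 0.1, 0.8, 0.95]
unsure = [0.5, 0.5, 0.5, 0.5]
confident_wrong = [0.1, 0.9, 0.2, 0.05]

loss_right = log_loss(labels, confident_right)
loss_unsure = log_loss(labels, unsure)    # equals ln(2), about 0.693
loss_wrong = log_loss(labels, confident_wrong)
print(loss_right, loss_unsure, loss_wrong)
```

The three losses come out in increasing order: being confident and right costs the least, and being confident and wrong costs the most.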

Inference: Using your model to solve real problems.

When you perform inference using supervised learning, you generate predictions.
When you perform inference using unsupervised learning, you find patterns in your data.
Models are made specific by the data used to train them, and therefore you need a trained model before you can start generating predictions.
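
The train-then-predict order described above can be sketched with a toy supervised model. This is only an illustration under stated assumptions: the train and predict helpers, the single numeric feature, and the "small"/"large" labels are all hypothetical, and the nearest-class-mean rule stands in for whatever algorithm a real project would use.

```python
def train(features, labels):
    """Build the 'trained model': the mean feature value of each class."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict(model, x):
    """Inference: return the class whose mean is closest to x."""
    return min(model, key=lambda y: abs(model[y] - x))

# Hypothetical training data: one numeric feature per example.
model = train([1.0, 1.2, 3.8, 4.1], ["small", "small", "large", "large"])

# Predictions are only possible once the model has been trained.
prediction = predict(model, 3.5)
print(prediction)
```

Training the same code on different data would produce a different model, which is the sense in which a model is made specific by its training data.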

Data vectorization: A process that converts non-numeric data into a numerical format so that it can be used by a machine learning model.
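
One common way to vectorize text is a bag-of-words count, sketched below in plain Python. The helper names build_vocabulary and vectorize, the whitespace tokenization, and the two sample sentences are assumptions for illustration; real tools typically handle punctuation and tokenization more carefully.

```python
def build_vocabulary(documents):
    """Assign each distinct lowercase token an index, in order of first appearance."""
    vocab = {}
    for doc in documents:
        for token in doc.lower().split():
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab

def vectorize(doc, vocab):
    """Convert one document into a list of token counts aligned with vocab."""
    vec = [0] * len(vocab)
    for token in doc.lower().split():
        idx = vocab.get(token)
        if idx is not None:  # tokens outside the vocabulary are ignored
            vec[idx] += 1
    return vec

# Hypothetical sample documents.
docs = ["the cat sat", "the dog sat on the cat"]
vocab = build_vocabulary(docs)
vectors = [vectorize(d, vocab) for d in docs]
print(vocab)    # {'the': 0, 'cat': 1, 'sat': 2, 'dog': 3, 'on': 4}
print(vectors)  # [[1, 1, 1, 0, 0], [2, 1, 1, 1, 1]]
```

Each document becomes a fixed-length list of numbers, which is the form a machine learning model can consume.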

Silhouette coefficient: A score from -1 to 1 describing the clusters found during modeling. A score near zero indicates overlapping clusters, and scores less than zero indicate data points assigned to incorrect clusters. A score approaching 1 indicates successful identification of discrete non-overlapping clusters.
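
To make the score concrete, here is a minimal sketch of the mean silhouette coefficient in plain Python, assuming Euclidean distance and at least two clusters. For each point, a is its mean distance to the other points in its own cluster, b is the smallest mean distance to the points of any other cluster, and the point's score is (b - a) / max(a, b). The function name silhouette_score and the sample points are assumptions for this example, and math.dist requires Python 3.8 or later.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (needs >= 2 clusters)."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)

    scores = []
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # convention: a singleton cluster scores 0
            continue
        # Distance from p to itself is 0, so divide by the number of *other* points.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

# Hypothetical data: two tight, well-separated groups of points.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
good = silhouette_score(points, [0, 0, 1, 1])  # labels match the groups
bad = silhouette_score(points, [0, 1, 0, 1])   # labels split each group
print(good, bad)
```

With labels that match the two groups the score is close to 1, while the deliberately mismatched labels give a negative score.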

Stop words: A list of words removed by natural language processing tools when building your dataset. There is no single universal list of stop words used by all natural language processing tools.
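
Stop-word removal can be sketched as a simple filter. The STOP_WORDS set below is a small, made-up list for illustration only, in keeping with the point that no universal list exists; the remove_stop_words helper and the whitespace tokenization are likewise assumptions.

```python
# Illustrative stop-word list; real tools each ship their own, larger lists.
STOP_WORDS = {"a", "an", "the", "is", "of", "on", "and"}

def remove_stop_words(text, stop_words=STOP_WORDS):
    """Lowercase the text, split on whitespace, and drop any stop words."""
    return [token for token in text.lower().split() if token not in stop_words]

tokens = remove_stop_words("The cat is on the mat")
print(tokens)  # ['cat', 'mat']
```

Passing a different stop_words set changes which tokens survive, which is why two tools can produce different datasets from the same text.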