What are model cards?

Machine learning models today have a lot of potential, so knowing how a model is meant to be used and where it falls short is crucial. Model cards aim to provide that information in one place. They are short documents that record the key aspects of an ML model: what it is, what it was built for, how it was evaluated, and what its limitations are.

Why do we need model cards?

Lack of documentation: Without documentation, users might not know which uses of a model are valid. This can have a serious impact in areas like healthcare, law enforcement, and employment.

For example, if a healthcare model is trained on data from a specific geography (say, patients in India), it might not be applicable to the American population, so it should not be used in the treatment of American patients. Without documentation, important information such as the kind of data used to train the model is often missing.

Lack of transparency: Machine learning models often ship without a clear statement of their ethical considerations. As a result, systematic biases in commercial ML models can go unnoticed.

Model card use cases for different stakeholders:

Model Developers: Developers can compare their models to those of others and make informed decisions about how to train their own systems.
Policymakers: They can understand how a machine learning system may fail or succeed in ways that impact people.
ML and AI practitioners: They can better understand how well the model might work for the intended use cases and track its performance over time.
Impacted individuals: Those who may experience effects from a model can better understand how it works or use the information present in the model card to pursue remedies.

Sections of a model card:

1. Model Details: This section covers basic information such as the model date, model version, model type, and the name of the person or organization that developed the model.

Model date: The date on which the model was developed.
Model version: The current version and how it differs from older versions.
Model type: The type of model, for example a neural network or a Naïve Bayes classifier.
Paper or other resources: Pointers to a paper or other resources where readers can find more information.

2. Intended Use: This section states what the model is meant to be used for and who its intended users are. It also describes which uses are out of scope.

Primary intended uses: This section states whether the model was developed with general or specific tasks in mind.
Primary intended users: This section states who the model was built for, which helps readers gain insight into how robust the model may be to different kinds of inputs. For example, the model might have been developed only for entertainment purposes and not for business purposes.
Out-of-scope uses: This section highlights use cases the model is not meant to be applied to, as well as technologies that the model might easily be confused with.

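3. Factors: Model performance can vary with a number of factors, such as the groups of people in the data or the conditions under which the inputs were captured. The card distinguishes between relevant factors and evaluation factors.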

Relevant factors: The foreseeable salient factors for which model performance may vary.
Evaluation factors: The factors that are actually reported in the evaluation.

4. Metrics: Metrics are the measures used to report model performance, such as error rate or accuracy. If a metric depends on a decision threshold, the card should state the threshold that was used. It is recommended to list all metric values and provide context about which were prioritized during development and why.
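As a rough illustration of why the decision threshold belongs next to the metric values, the sketch below computes a few common metrics for a binary classifier at a stated threshold. The function names, the scores, and the labels are all made up for this example, and it assumes that a score at or above the threshold counts as a positive prediction.

```python
def confusion_counts(scores, labels, threshold):
    """Count TP, FP, TN, FN, treating score >= threshold as a positive prediction."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if predicted_positive and label:
            tp += 1
        elif predicted_positive and not label:
            fp += 1
        elif not predicted_positive and label:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def metrics_at_threshold(scores, labels, threshold):
    """Return the metric values together with the threshold they depend on."""
    tp, fp, tn, fn = confusion_counts(scores, labels, threshold)
    total = tp + fp + tn + fn  # assumes at least one example
    accuracy = (tp + tn) / total
    return {
        "threshold": threshold,
        "accuracy": accuracy,
        "error_rate": 1 - accuracy,
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else None,
        "false_negative_rate": fn / (fn + tp) if (fn + tp) else None,
    }


# Made-up scores and ground-truth labels, for illustration only.
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]

report = metrics_at_threshold(scores, labels, threshold=0.5)
print(report)
```

Changing the threshold changes every value in the report, which is why a metric reported without its threshold is hard to interpret.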

5. Evaluation Data: This section describes the datasets used to evaluate model performance. It can also explain why they were chosen, for instance because they represent typical use cases, anticipated test cases, or challenging cases. It should also describe how the data was preprocessed for evaluation.

6. Training Data: This section describes the dataset used to train the model. At a minimum, it should give basic details about the distribution over groups in the data, as well as any other details that can inform stakeholders about the kinds of biases the model may have encoded.
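A distribution over groups can be as simple as the share of training records that falls into each group. The snippet below is a minimal sketch of that idea; the records and the `region` and `age_band` field names are hypothetical and stand in for whatever attributes a real dataset has.

```python
from collections import Counter


def group_distribution(records, key):
    """Return the fraction of records that falls into each value of `key`."""
    counts = Counter(record[key] for record in records)
    total = sum(counts.values())
    return {group: count / total for group, count in counts.items()}


# Hypothetical training records, for illustration only.
training_records = [
    {"region": "A", "age_band": "18-40"},
    {"region": "A", "age_band": "41-65"},
    {"region": "A", "age_band": "18-40"},
    {"region": "B", "age_band": "65+"},
]

print(group_distribution(training_records, "region"))
print(group_distribution(training_records, "age_band"))
```

A skewed distribution such as the one for `region` here is exactly the kind of detail that tells a stakeholder the model may perform worse on the under-represented group.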

7. Quantitative Analysis: This section provides the results of evaluating the model according to the chosen metrics, broken down by the chosen factors, with confidence intervals where possible.
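One possible way to report such results is accuracy per group together with a confidence interval. The sketch below uses the Wilson score interval, which is one common choice for a proportion and not something the model card format prescribes. The function names, predictions, labels, and group assignments are made up for the example.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return center - half_width, center + half_width


def accuracy_by_group(predictions, labels, groups):
    """Accuracy and a 95% interval for each group, computed separately."""
    tallies = {}
    for pred, label, group in zip(predictions, labels, groups):
        correct, total = tallies.get(group, (0, 0))
        tallies[group] = (correct + int(pred == label), total + 1)

    results = {}
    for group, (correct, total) in tallies.items():
        results[group] = {
            "n": total,
            "accuracy": correct / total,
            "ci95": wilson_interval(correct, total),
        }
    return results


# Made-up predictions, labels, and group memberships, for illustration only.
predictions = [1, 1, 0, 0, 1, 0]
labels = [1, 0, 0, 0, 1, 0]
groups = ["A", "A", "A", "B", "B", "B"]

for group, row in accuracy_by_group(predictions, labels, groups).items():
    print(group, row)
```

With only three examples per group the intervals are very wide, which is the point of reporting them: they show how much weight a per-group number can bear.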

8. Ethical Considerations: This section records the ethical considerations that apply to the model. These usually arise when sensitive data is used to train the model, or when the model has implications for matters such as health or safety. The card should clearly state what risks and harms may be present in model usage.

9. Caveats and Recommendations: This section should list additional concerns that were not covered in the previous sections.
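To make the nine sections concrete, here is a minimal sketch of a model card as a plain data structure that can be saved next to the model. The class and field names simply mirror the section names above; they are assumptions made for this example and not a standard schema, and every value shown is a placeholder.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ModelDetails:
    developed_by: str
    model_date: str
    model_version: str
    model_type: str
    resources: list = field(default_factory=list)


@dataclass
class IntendedUse:
    primary_uses: list
    primary_users: list
    out_of_scope_uses: list


@dataclass
class ModelCard:
    model_details: ModelDetails
    intended_use: IntendedUse
    factors: dict
    metrics: dict
    evaluation_data: str
    training_data: str
    quantitative_analysis: dict
    ethical_considerations: str
    caveats_and_recommendations: str

    def to_json(self):
        """Serialize the whole card, including nested sections, to JSON."""
        return json.dumps(asdict(self), indent=2)


# Placeholder values only; a real card would be filled in by the model's authors.
card = ModelCard(
    model_details=ModelDetails(
        developed_by="Example Team",
        model_date="2024-01-01",
        model_version="1.0",
        model_type="Naive Bayes classifier",
    ),
    intended_use=IntendedUse(
        primary_uses=["example task"],
        primary_users=["example users"],
        out_of_scope_uses=["anything not evaluated in this card"],
    ),
    factors={"relevant": ["region"], "evaluation": ["region"]},
    metrics={"accuracy": None, "decision_threshold": 0.5},
    evaluation_data="Description of the evaluation datasets and preprocessing.",
    training_data="Description of the training data and its group distribution.",
    quantitative_analysis={},
    ethical_considerations="Known risks and potential harms.",
    caveats_and_recommendations="Concerns not covered elsewhere.",
)

print(card.to_json())
```

Keeping the card as structured data rather than free text makes it easy to check that no section was left out and to compare cards across model versions.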