FEAT add DecodingTrust Machine Ethics (Jiminy CSV) dataset loader (subtask of #291)

### Subtask of #291

Following the Toxicity perspective (#1798 → #1821), this is the next remaining DecodingTrust sub-task. DecodingTrust [1] ships a human-annotated subset of the Jiminy Cricket benchmark [2] at
[`/data/machine_ethics/jiminy_subset.csv`](https://github.com/AI-secure/DecodingTrust/blob/main/data/machine_ethics/jiminy_subset.csv) — 2091 actions extracted from text-adventure games, each labelled by humans with a structured `Morality` field.

#### Data
- `jiminy_subset.csv` — 2091 rows. Columns of interest: `Description` (clear English summary of the action), `Neighboring text` (Zork ZIL source snippet), `Morality` (label), plus traceability fields `File` / `Line`.
- `Morality` follows the pattern `{good|bad}, {self|others}, {1|2|3}` (intensity 1-3); 50+ rows carry multiple labels joined by `\n`; 245 rows are unlabelled (neutral baseline).
- The folder also contains `jiminy_train.json` (1000) and `jiminy_test.json` (4000), but the embedded label-vector schema (`[0, 1, 0, 0]`) is **not documented** in this repo. I'm leaving those for a follow-up sub-issue to avoid mis-mapping.
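
A minimal sketch of how the `Morality` cell could be parsed, based on the pattern described above. `MoralityLabel` and `parse_morality` are hypothetical names for illustration only, and the sketch assumes multi-label cells contain real newline characters and that an empty cell means "unlabelled":

```python
from typing import List, NamedTuple, Optional


class MoralityLabel(NamedTuple):
    valence: str    # "good" or "bad"
    target: str     # "self" or "others"
    intensity: int  # 1, 2 or 3


def parse_morality(raw: Optional[str]) -> List[MoralityLabel]:
    """Parse one `Morality` cell into zero or more labels.

    An empty or missing cell yields an empty list (the neutral baseline).
    Anything that does not match `{good|bad}, {self|others}, {1|2|3}`
    raises ValueError rather than being silently mis-mapped.
    """
    labels: List[MoralityLabel] = []
    for line in (raw or "").splitlines():
        line = line.strip()
        if not line:
            continue
        parts = [part.strip().lower() for part in line.split(",")]
        if len(parts) != 3:
            raise ValueError(f"Unexpected Morality entry: {line!r}")
        valence, target, intensity = parts
        if (
            valence not in {"good", "bad"}
            or target not in {"self", "others"}
            or intensity not in {"1", "2", "3"}
        ):
            raise ValueError(f"Unexpected Morality entry: {line!r}")
        labels.append(MoralityLabel(valence, target, int(intensity)))
    return labels
```

Failing loudly on unknown patterns is a deliberate choice here: it would surface any rows that deviate from the documented format instead of dropping them.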

#### Proposed loader
- `_DecodingTrustMachineEthicsDataset(_RemoteDatasetLoader)` mirroring the just-shipped Toxicity loader's structure.
- Parameters:
  - `morality: Literal["bad", "good", "neutral", "all"] = "bad"` — default matches the red-teaming convention agreed for Toxicity on #1798.
  - `min_intensity: int = 1` — keeps rows whose max label intensity is at or above the threshold (1-3).
- Per-row `harm_categories` derived from `Morality`: `bad_to_self`, `bad_to_others`, `good_to_self`, `good_to_others` (source's terminology).
- `value = Description` (plain English). `Neighboring text` and intensity are preserved in `SeedPrompt.metadata` for reproducibility.
- Source URL pinned to commit `161ae8321ced62f45fcd9ceb412e05b47c603cd4` (same pin as the Toxicity loader).
- Unit tests mock `_fetch_from_url`, mirroring `tests/unit/datasets/test_decoding_trust_toxicity_dataset.py`.
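
To make the proposed filtering semantics concrete, here is a rough sketch of the row filter and the `harm_categories` derivation. It does not use any PyRIT classes; `keep_row` and `harm_categories` are hypothetical helpers, and labels are plain `(valence, target, intensity)` tuples. Two behaviours are assumptions that the issue does not settle: the intensity threshold is applied only to labels matching the requested `morality`, and unlabelled rows bypass the threshold.

```python
from typing import List, Literal, Sequence, Tuple

Label = Tuple[str, str, int]  # (valence, target, intensity)
Morality = Literal["bad", "good", "neutral", "all"]


def harm_categories(labels: Sequence[Label]) -> List[str]:
    """Derive categories such as `bad_to_others`, de-duplicated in order."""
    categories: List[str] = []
    for valence, target, _intensity in labels:
        category = f"{valence}_to_{target}"
        if category not in categories:
            categories.append(category)
    return categories


def keep_row(
    labels: Sequence[Label],
    morality: Morality = "bad",
    min_intensity: int = 1,
) -> bool:
    """Decide whether a row survives the `morality` / `min_intensity` filter."""
    if not 1 <= min_intensity <= 3:
        raise ValueError("min_intensity must be between 1 and 3")
    if not labels:
        # Assumption: unlabelled rows are "neutral" and have no intensity,
        # so the threshold is not applied to them.
        return morality in ("neutral", "all")
    if morality == "neutral":
        return False
    if morality == "all":
        matching = list(labels)
    else:
        matching = [label for label in labels if label[0] == morality]
    if not matching:
        return False
    # Assumption: "max label intensity" is taken over the matching labels.
    return max(label[2] for label in matching) >= min_intensity
```

For example, with the defaults a row labelled `bad, others, 2` is kept, while the same row is dropped once `min_intensity=3`.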

#### License & attribution
Same approach as Toxicity (confirmed by @romanlutz on #1798): runtime fetch from `raw.githubusercontent.com` (no redistribution) + full attribution to both DecodingTrust and Jiminy Cricket authors in the class docstring and per-`SeedPrompt` `authors` / `groups`.

#### References
1. Wang et al., 2023. *DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models.* https://arxiv.org/abs/2306.11698
2. Hendrycks et al., 2021. *What Would Jiminy Cricket Do? Towards Agents That Behave Morally.* https://arxiv.org/abs/2110.13136

> ⚠️ **Content warning:** the dataset describes harmful actions (self-harm, violence, theft, deception) extracted from text-adventure games — standard for safety/ethics evaluation but worth flagging.

