A public view of source intelligence depth. This is not a safety guarantee; it shows what Nipmod can inspect, where the limits are and how search quality is checked.
Benchmark: 28/28 pass
MRR: 1
Recall at 3: 1
Blocked recommended: 0
Depth
Source intelligence depth
npm
latest manifest, tarball integrity, registry signatures, lifecycle scripts, packument version intelligence, OSV advisory context, dependency count and download signal
Current: 98. Target: 98. Coverage: strong.
PyPI
project JSON, latest release files, file hashes, yanked flags, OSV advisory context, release velocity, Simple API metadata and provenance links
Current: 96. Target: 96. Coverage: strong.
GitHub
repository metadata plus selected manifest, security, workflow, Dockerfile, release asset, commit freshness and lockfile probes on the default branch
Current: 95. Target: 95. Coverage: strong.
Hugging Face models
model API metadata, cardData, tags, siblings, downloads, likes, gated/private flags, commit SHA, file-shape counts, eval labels and remote-code indicators
Current: 95. Target: 95. Coverage: strong.
Hugging Face datasets
dataset API metadata, dataset_info, features, splits, tags, siblings, data file shape, compressed archive/script warnings, downloads, likes, gated/private flags and commit SHA when returned
Current: 93. Target: 93. Coverage: strong.
MCP
MCP registry server metadata, schema URL, remote endpoint security, repository link, status, package references and credential-scope summary when returned
Current: 90. Target: 90. Coverage: moderate.
Benchmark
Search quality gates
Command
Run locally
pnpm search:benchmark
Snapshot
28/28 passing
Mean reciprocal rank: 1. Recall at 1: 1. Recall at 3: 1.
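The snapshot metrics can be sketched as follows. This is a minimal illustration, assuming each benchmark case has a ranked list of result names and exactly one expected result. The function names and the sample cases are hypothetical and are not taken from Nipmod's benchmark harness.

```python
from typing import Sequence


def reciprocal_rank(ranked: Sequence[str], expected: str) -> float:
    """Return 1/rank of the expected result, or 0.0 if it is absent."""
    for position, candidate in enumerate(ranked, start=1):
        if candidate == expected:
            return 1.0 / position
    return 0.0


def recall_at_k(ranked: Sequence[str], expected: str, k: int) -> float:
    """Return 1.0 if the expected result appears in the top k, else 0.0."""
    return 1.0 if expected in ranked[:k] else 0.0


# Hypothetical cases: (ranked results, expected result).
cases = [
    (["good-pkg", "decoy-pkg"], "good-pkg"),
    (["good-model", "decoy-model", "other-model"], "good-model"),
]

mrr = sum(reciprocal_rank(r, e) for r, e in cases) / len(cases)
recall_1 = sum(recall_at_k(r, e, 1) for r, e in cases) / len(cases)
recall_3 = sum(recall_at_k(r, e, 3) for r, e in cases) / len(cases)
print(mrr, recall_1, recall_3)
```

Under these definitions, a mean reciprocal rank of 1 means the expected result was ranked first in every case, which in turn implies recall at 1 and recall at 3 are also 1.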
Safety
No blocked recommendation
Benchmark cases include unsafe decoys and partial source outage behavior.
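One way to read the "blocked recommended" gate is as a count of cases whose top recommendation is flagged unsafe, which must stay at zero. The sketch below assumes a hypothetical `Result` record with a `blocked` flag; the real benchmark's data model may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Result:
    name: str
    blocked: bool  # hypothetical flag: source checks marked this result unsafe


def blocked_recommended(top_results: List[Result]) -> int:
    """Count cases whose top-ranked recommendation is flagged as blocked."""
    return sum(1 for result in top_results if result.blocked)


# Hypothetical top recommendation for each benchmark case.
top_per_case = [
    Result("trusted-pkg", blocked=False),
    Result("trusted-model", blocked=False),
]

count = blocked_recommended(top_per_case)
gate_passes = count == 0  # the gate holds only when nothing blocked is recommended
print(count, gate_passes)
```

An unsafe decoy may still appear lower in the results in this sketch; the gate only fails when a blocked item is the one recommended.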
Scope
What the benchmark covers
Question
Can Nipmod choose a useful package, model, repo, dataset or MCP server before an agent moves toward external code execution?
Unit
search result and pre-install source selection
Counting
Source coverage counts benchmark cases where the source was requested; multi-source cases count toward each requested source.
Scenarios
Scenario groups overlap by design; one benchmark case can exercise more than one risk class.
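The per-source counting rule can be sketched as a simple tally. The cases below are hypothetical and only illustrate how a multi-source case contributes to every source it requested.

```python
from collections import Counter

# Hypothetical cases: (requested sources, whether the case passed).
cases = [
    ({"npm"}, True),
    ({"npm", "PyPI"}, True),  # multi-source: counts toward npm and PyPI
    ({"MCP"}, True),
]

requested = Counter()
passed = Counter()
for sources, ok in cases:
    for source in sources:
        requested[source] += 1
        if ok:
            passed[source] += 1

for source in sorted(requested):
    print(f"{source}: {passed[source]}/{requested[source]} pass")
```

Because a multi-source case is counted once per requested source, the per-source totals can add up to more than the number of benchmark cases. The same holds for scenario groups, since one case can belong to several of them.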
npm: 16/16 pass
PyPI: 12/12 pass
GitHub: 1/1 pass
Hugging Face models: 2/2 pass
Hugging Face datasets: 1/1 pass
MCP: 2/2 pass
baseline package, model, repo or MCP selection: 8 cases
partial or multi-source outage behavior: 2 cases
typo, namespace, dependency confusion or source impersonation: 6 cases
install, lifecycle, wallet, dataset script or credential-scope risk: 5 cases
package metadata, README, long-description or model-card instruction risk: 5 cases
deprecation, publisher continuity or takeover timeline risk