ClawHub Skills Analyzed: 48% Show Security Issues

We analyzed 16,797 ClawHub skills with a static scanner. 48.4% show security findings. All results in detail.

Finn Hillebrandt
April 6, 2026
Links marked with * are affiliate links. If a purchase is made through such links, we receive a commission.

OpenClaw is the fastest-growing open-source project in history. Alongside the tool, ClawHub has gone viral too. It's the companion marketplace for skills (extensions that expand OpenClaw with new capabilities). There are now more than 44,000 skills. Anyone can upload one. There's no mandatory security review.

That made me curious.

How safe are these skills really? I downloaded 16,797 ClawHub skills from the public GitHub mirror and analyzed them with a static security scanner. The result is clear. Almost half of all skills show at least one security finding.

Important note upfront.

These are technical signals from a rule-based scanner, not a malware diagnosis. Every finding needs a manual review to know whether it's real or a false positive. Even so, the numbers paint a clear picture.

Installing skills blindly comes with risk. And that risk is higher than most people think.

TL;DR: Key Takeaways
  • 16,797 ClawHub skills scanned, 48.4% with at least one security finding
  • 182,258 total hit instances: 14.7% critical, 76.6% high, 8.7% medium
  • Supply chain risks (21.4%) and data exfiltration (20.1%) are the most widespread

1. What is the ClawHub?

The ClawHub is the official marketplace for OpenClaw. Skills are to the ClawHub what apps are to the App Store. Small packages that add new capabilities to OpenClaw. That might be a PDF parser, a database connector, a wrapper around an external API, or a full workflow.

Installation is dead simple. One command in the terminal, and the skill is ready to go. That's exactly what makes the ClawHub so attractive. And at the same time so risky.

A few numbers for context:

  • Over 44,000 skills in the marketplace
  • 12,400 active skill developers
  • Only 6.8% of developers are verified
  • Over 2.3 million skill installations so far
  • 127 downloads per skill on average

The problem with that:

OpenClaw is growing faster than the security review can keep up. To this day, there's no mandatory security review for new skills. Anyone can upload a skill, and anyone can download one. That's open source in its purest form. But it's also an attack surface.

2. The Methodology Behind the Analysis

For this analysis, I didn't download the skills directly from the ClawHub marketplace. Instead, I used the public GitHub mirror (clawhub-skills-repo). The repo mirrors all skills as a folder structure, which means you can rebuild the entire database locally with a single clone. I also tried downloading around 168 skills straight from the marketplace. Most of them came back with HTTP 429 (rate limit) errors. That approach wasn't practical for 16,000+ skills.

In the end, I had 16,797 skills from the GitHub mirror that could be scanned successfully.
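Enumerating the scannable skills from a local clone is a short walk over the folder structure. The sketch below assumes one directory per skill, each holding a SKILL.md manifest — the exact layout of the mirror is an assumption here, so adjust the glob if the real repo differs:

```python
from pathlib import Path

def list_skills(mirror_root: str) -> list[Path]:
    """Return every directory in the clone that holds a SKILL.md manifest.

    Assumes one folder per skill; the real clawhub-skills-repo layout
    may nest things differently.
    """
    return sorted(p.parent for p in Path(mirror_root).rglob("SKILL.md"))
```

Each returned path is one skill folder — the unit the scanner then iterates over.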

There's one caveat though.

The mirror is a snapshot. Skills that have been removed from the official marketplace in the meantime can still be in the mirror. And new skills uploaded after the last mirror sync are missing.

The scanner itself is a deterministic tool. That means it works with fixed rules (regex and heuristics), no LLM, no probabilities. Every line of code and every file gets checked against a list of known risk patterns. When the scanner finds a match, it logs a finding, categorizes it, and assigns a severity level.

The categories are:

  • Supply Chain: unpinned pip/npm installs, curl-to-bash, package tampering
  • Data Exfiltration: SSH key accesses, hardcoded endpoints, browser data reads
  • Privilege Escalation: sudo, capability abuse, root execution
  • Persistence: cron jobs, profile modifications, CLAUDE.md writes
  • Destructive Patterns: rm -rf, shutil.rmtree, mass deletions
  • Code Execution: eval, shell pipes, reverse shells, unsafe deserialization
  • Hardcoded Secrets: embedded API keys, passwords, tokens
  • Prompt Injection: attempts to override roles or safety instructions
  • Obfuscation: homoglyphs, unicode tricks, unusual encodings
  • SSRF, Sandbox Escape, Suspicious Files, Resource Exhaustion as additional categories
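To make "fixed rules, no LLM" concrete, here is a minimal sketch of what such a rule table looks like. The three patterns are illustrative stand-ins, not the scanner's actual rule set:

```python
import re
from dataclasses import dataclass

@dataclass
class Finding:
    category: str
    severity: str
    line_no: int
    snippet: str

# Toy subset of a rule table; a real scanner carries far more patterns.
RULES = [
    ("Supply Chain", "high", re.compile(r"curl\s+[^|]+\|\s*(ba)?sh")),
    ("Code Execution", "critical", re.compile(r"\beval\s*\(")),
    ("Data Exfiltration", "critical", re.compile(r"~/\.ssh/")),
]

def scan(text: str) -> list[Finding]:
    """Check every line against every rule and log each match as a finding."""
    findings = []
    for i, line in enumerate(text.splitlines(), 1):
        for category, severity, pattern in RULES:
            if pattern.search(line):
                findings.append(Finding(category, severity, i, line.strip()))
    return findings
```

Deterministic by construction: the same input always produces the same findings, which is what makes the aggregate numbers reproducible.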

On top of that, I used a refinement filter that removes known false positives. Hits inside Markdown code blocks (typical tutorial examples like curl commands), homoglyphs in multilingual documentation, and HTTP examples in .md files get filtered out. In total, the refinement filter removed 261,451 instances. Without it, the numbers would be much higher but also much less meaningful.

Note
These numbers are indicators, not proof. A scanner hit isn't a verdict. For binding security decisions, you need manual code reviews and runtime tests in a sandbox. The scanner can miss real risks, and it can flag harmless documentation as risky.

3. The Results at a Glance

The most important number from the analysis shows how the 16,797 scanned skills split between “with finding” and “no findings”:

No findings: 8,668 skills
With security finding: 8,129 skills
(Source: own analysis, static scanner with refinement filter | CC BY 4.0, gradually.ai)

Of the 16,797 scanned skills, 8,129 have at least one remaining security finding. That's 48.4%. Nearly every second skill on the ClawHub shows at least one technical warning signal.

The remaining 51.6% have no findings left after the refinement filter.

But here's the catch.

That doesn't mean these skills are automatically safe. It only means the scanner, with its fixed rules, doesn't find anything anymore. An attacker who hides their patterns cleverly slips right through.

The total hit count is equally striking. 182,258 individual hit instances in total. One skill can have multiple findings at once. That's why some publishers later in the analysis show tens of thousands of hits, even though they only published a handful of skills.

26,807 findings were classified as critical. That's 14.7% of all instances. Those include clearly dangerous patterns like embedded credentials, remote code execution, and hardcoded endpoints that look like data exfiltration.

4. Severity Breakdown

The distribution by severity shows how critical the findings are overall:

(Source: own analysis, static scanner with refinement filter | CC BY 4.0, gradually.ai)

High dominates by a wide margin. 76.6% of all findings fall into this category. A good chunk of that comes from obfuscation patterns like homoglyphs (Cyrillic or Greek characters that look like Latin ones) and unpinned installs. Neither of these is proof of malicious intent on its own, but they belong to the typical warning signs you should look at more closely.

Critical accounts for 14.7%. That's 26,807 individual findings spread across all skills. We're talking about things like hardcoded API keys, embedded passwords, shell pipes with eval, or direct execution of third-party scripts via curl and bash.

Medium sits at 8.7%. Those are findings that are noteworthy but not directly critical. Certain file accesses, configuration changes, or smaller privilege requests. The scanner caught them, but a manual review might classify them as harmless.

5. The Threat Categories

Now it gets interesting. The scanner hits can be split into 13 categories. And there are two completely different angles worth looking at.

5.1. By Share of Affected Skills

The chart below shows what percentage of the 16,797 skills have at least one hit in each category:

(Source: own analysis, static scanner with refinement filter | CC BY 4.0, gradually.ai)

Supply Chain is on top with 21.4%. About one in five skills contains patterns that point to unpinned installs, curl-to-bash, or package tampering. That isn't surprising. Many legitimate skills expect software to get installed on the fly. That's normal for automation, but it increases the supply chain risk when users execute commands blindly.

Data Exfiltration follows closely at 20.1%. About one in five skills contains patterns that look like data being sent out. That could be an access to sensitive directories, a call to a hardcoded endpoint, or a network operation that isn't immediately explainable. Important to note, these are patterns, not proof.

Privilege Escalation (12.1%), Persistence (9.7%), and Destructive Patterns (9.0%) follow. These three categories describe how a skill might reach beyond its intended purpose and touch the system.

5.2. By Share of All Hit Instances

This second angle shows which category produces the most hits. A single skill can have thousands of hits here, so the distribution looks very different from the first chart:

(Source: own analysis, static scanner with refinement filter | CC BY 4.0, gradually.ai)

Obfuscation dominates with 55.6%. That's an interesting finding. Only 6.8% of skills are affected at all. But those few skills contain thousands of individual hits per file. Homoglyphs, unicode tricks, and unusual encodings get used in bulk. Whether that's intentional (to hide functionality) or just multilingual documentation, you can only tell with a manual review.
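How does a deterministic scanner spot homoglyphs at all? One common approach — shown here as a rough sketch, not the scanner's actual logic — is to flag tokens that mix Latin letters with Cyrillic or Greek look-alikes:

```python
import unicodedata

def mixed_script_tokens(text: str) -> list[str]:
    """Flag tokens mixing Latin letters with Cyrillic/Greek ones.

    A classic obfuscation signal -- and exactly the check that also
    misfires on multilingual documentation.
    """
    flagged = []
    for token in text.split():
        scripts = set()
        for ch in token:
            if ch.isalpha():
                name = unicodedata.name(ch, "")
                if name.startswith("LATIN"):
                    scripts.add("latin")
                elif name.startswith("CYRILLIC"):
                    scripts.add("cyrillic")
                elif name.startswith("GREEK"):
                    scripts.add("greek")
        if "latin" in scripts and len(scripts) > 1:
            flagged.append(token)
    return flagged
```

Note that a purely Cyrillic word (ordinary Russian documentation) passes cleanly; only the *mix* within one token gets flagged.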

Data Exfiltration is in second place with 17.3% of findings. Supply Chain is third with 7.0%. Then Privilege Escalation (5.9%), Persistence (3.9%), and Destructive Patterns (3.4%) follow.

Note
The category percentages in both charts don't add up to 100%. A skill can show up in multiple categories at once. And the two perspectives (“share of skills” and “share of instances”) answer different questions. The skill view shows how widespread a problem is. The instance view shows where the most individual hits are.

6. What the Most Important Categories Actually Mean

The abstract category names only go so far. Here's a short explanation of the six most common categories, with concrete examples. So you get a rough idea of what the scanner is actually reacting to.

6.1. Supply Chain (21.4%)

The scanner flags install instructions that skip version pinning. Commands like pip install requests instead of pip install requests==2.31.0. Or shell pipes like curl https://example.com/install.sh | bash. These commands are normal in the open source world, but they have a catch. If the package or script gets compromised later, anyone installing it fresh gets compromised too.
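A version-pinning check like the one behind this category can be as simple as a negative lookahead. This is an illustrative heuristic, not the scanner's real rule:

```python
import re

# Flags `pip install <pkg>` without an `==<version>` pin. A rough
# heuristic, not a full requirements parser (it ignores flags, extras,
# and hash pinning).
UNPINNED = re.compile(r"pip3?\s+install\s+(?![^\n]*==)[A-Za-z0-9_.-]+")

def is_unpinned(command: str) -> bool:
    return bool(UNPINNED.search(command))
```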

6.2. Data Exfiltration (20.1%)

This is about patterns that look like data being sent out. Accesses to ~/.ssh/, reading .env files, hardcoded IP addresses like http://127.0.0.1:8765, or suspicious HTTP requests to endpoints outside the official providers. After the refinement filter, what's left is mostly hits outside Markdown code blocks.
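A sketch of what such a check might look like — the path list is illustrative, and a mention is a signal to review, not proof of exfiltration:

```python
# Hypothetical watchlist; a real rule set would be much longer.
SENSITIVE_PATHS = ("~/.ssh", ".env", "~/.aws/credentials", "Cookies")

def sensitive_references(source: str) -> list[str]:
    """Return which sensitive locations a script mentions."""
    return [p for p in SENSITIVE_PATHS if p in source]
```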

6.3. Privilege Escalation (12.1%)

Every time a script uses sudo, requests root privileges, or changes system configurations, it lands in this category. For system tools, that's completely normal. A skill that installs packages often needs sudo. A skill that just parses PDFs doesn't. The scanner can't tell the difference. You can, if you look at the code.

6.4. Persistence (9.7%)

Persistence means a skill embeds itself in the system long-term. Classic patterns are cron jobs, modifications to shell profiles (.bashrc, .zshrc), or new entries in autostart folders. Writes to CLAUDE.md files also fall into this category. Persistence isn't inherently bad, but it should always be transparent.

6.5. Destructive Patterns (9.0%)

Mass deletions with rm -rf, shutil.rmtree, or similar commands. Sometimes that's legitimate (a cleanup skill has to be able to delete things). Sometimes it's an accident waiting to happen. And in the worst case, it's intentional.

6.6. Code Execution (8.8%)

Unsafe execution of third-party code. eval(), exec(), unsafe deserialization with pickle, reverse shells, shell pipes with user input. The scanner knows the typical patterns and flags them. When in doubt, a hit here means you should stay away until you've actually read the code.
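Regex works across languages, but for Python files there is a more precise alternative: walking the AST finds eval/exec/pickle.loads calls without matching those words inside comments or strings. A sketch of that alternative approach (the scanner described here is regex-based):

```python
import ast

def risky_calls(source: str) -> list[tuple[int, str]]:
    """Locate eval/exec/pickle.loads calls in Python source via the AST."""
    risky = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Name) and func.id in {"eval", "exec"}:
                risky.append((node.lineno, func.id))
            elif (isinstance(func, ast.Attribute) and func.attr == "loads"
                  and isinstance(func.value, ast.Name)
                  and func.value.id == "pickle"):
                risky.append((node.lineno, "pickle.loads"))
    return risky
```

The trade-off: AST analysis only works on code that parses, while regex rules also cover shell snippets, Markdown, and broken files.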

7. The Most Flagged Skill Publishers

Aggregating findings per publisher paints an interesting picture. A single publisher (pepe276) is responsible for 37,978 findings. That's 20.8% of all findings across the entire ClawHub. The table below shows the top 20 most flagged publishers, sorted by total findings:

| Publisher | Total findings | Critical | High | Medium | Main risk |
|---|---|---|---|---|---|
| pepe276 | 37,978 | 19 | 37,948 | 11 | Flagged |
| cooperun | 5,201 | 4,982 | 192 | 27 | Critical |
| yuangu260 | 4,582 | 26 | 4,550 | 6 | Flagged |
| jimliu | 4,575 | 6 | 4,555 | 14 | Flagged |
| ciklopentan | 4,235 | 2 | 4,233 | 0 | Flagged |
| keenone | 2,681 | 3 | 2,677 | 1 | Flagged |
| horosheff | 2,441 | 0 | 2,440 | 1 | Mostly High |
| qiumr | 2,421 | 6 | 2,412 | 3 | Flagged |
| snail3d | 2,149 | 363 | 1,650 | 136 | Critical |
| yoborlon-alpha | 2,059 | 1 | 2,058 | 0 | Flagged |
| deerleo | 1,687 | 8 | 1,675 | 4 | Flagged |
| keeper1978 | 1,591 | 0 | 1,591 | 0 | Mostly High |
| mirra87654321 | 1,319 | 1 | 1,318 | 0 | Flagged |
| satoshistackalotto | 1,250 | 0 | 1,250 | 0 | Mostly High |
| rsvbitrix | 1,230 | 1 | 1,229 | 0 | Flagged |
| h8kxrfp68z-lgtm | 1,161 | 6 | 1,154 | 1 | Flagged |
| mixx85 | 1,041 | 25 | 1,012 | 4 | Flagged |
| chorus12 | 1,008 | 0 | 1,008 | 0 | Mostly High |
| s7cret | 950 | 0 | 938 | 12 | Mostly High |
| offflinerpsy | 941 | 1 | 938 | 2 | Flagged |

The full list covers 50 publishers. I only showed the top 20 here for space reasons. Two patterns in the data stand out.

The publisher pepe276 stands out the most. With nearly 38,000 findings, this publisher is far ahead of everyone else. Almost all of them are High findings, which points to lots of obfuscation or supply chain patterns. A single account behind a number like that is unusual: either the publisher has released a great many skills, or a few skills with particularly extensive code.

Even more concerning is cooperun. This publisher stands out with 4,982 critical findings. That's significantly more than anyone else and justifies a manual review of their skills. “Critical” means the scanner found clear patterns for high-risk operations.

Note
High finding counts can also come from extensive documentation. A skill with many SKILL.md pages, many examples, and many translated versions can legitimately produce hundreds or thousands of hits without being malicious. The table isn't a verdict but a priority list for manual reviews.

8. How Many False Positives Got Removed?

Before you compare these numbers with other analyses, you should know how heavily the raw hits were cleaned up. The refinement filter removes known false positives from three sources:

(Source: own analysis, static scanner with refinement filter | CC BY 4.0, gradually.ai)

Together, 261,451 instances were removed from the raw analysis. That's more than the final hit count of 182,258. Without the refinement filter, the numbers wouldn't just be unwieldy but also misleading.

Homoglyphs in Markdown (160,889 removed hits) are the most common false positive. Many skills have documentation in Chinese, Russian, or Arabic. The scanner flags the characters of those languages as “suspicious” because they look like obfuscated code. In reality, it's just a translation.

HTTP examples in documentation (54,034 hits) are the second major false positive. The SC-004 pattern ID covers API examples in .md files. In source code they'd be a signal. In documentation they're harmless.

Code in Markdown fences (46,528 hits) is the third block. Most SKILL.md files contain examples like curl -X POST ... or pip install tensorflow. The scanner would count those examples as findings. The refinement filter removes them.
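Stripping fenced blocks before scanning a .md file comes down to a non-greedy regex. A minimal sketch of the idea (not the filter's actual implementation):

```python
import re

# Matches a fenced code block: three backticks ... three backticks,
# across newlines thanks to DOTALL.
TICKS = "`" * 3
FENCE = re.compile(re.escape(TICKS) + r".*?" + re.escape(TICKS), re.DOTALL)

def strip_fences(markdown: str) -> str:
    """Drop fenced code blocks so tutorial snippets (curl pipes,
    pip installs) don't count as findings."""
    return FENCE.sub("", markdown)
```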

What the refinement filter can't do:

It can miss real risks inside code blocks. If an attacker deliberately puts malicious code inside a Markdown block to hide it, the refinement filter strips the hit. That's the price for less noise.

9. What This Means for You

No need to panic. But no blind trust either. The numbers are a wake-up call, not the end of the world. Here are my concrete recommendations for working with ClawHub skills:

  • Check the publisher. Verified publishers (the 6.8%) aren't automatically safe, but they've at least gone through some kind of identity check. If the publisher is unknown and has little reputation, don't install the skill blindly.
  • Read the SKILL.md. If you really need a skill, open the SKILL.md and the scripts it references. Look for sudo commands, curl installs, accesses to .ssh or .env. If you don't understand some of the code, ask a colleague or have Claude or ChatGPT explain it.
  • Use a sandbox. OpenClaw supports restricted permissions. Give a skill only the rights it actually needs. No root rights for a PDF parser. No network access for a file renamer.
  • Keep OpenClaw up to date. New releases often include security fixes. CVE-2026-25253 (one-click code execution) was patched within 48 hours. Skipping updates leaves you exposed.
  • Avoid sensitive data. Don't use ClawHub skills with credentials for production systems, API keys for payment providers, or personal health data. As long as the marketplace has no mandatory security review, every skill is a potential risk.

If security matters more to you than feature breadth, take a look at the secure OpenClaw alternatives. Projects like OpenFang and IronClaw are built with security-first principles from day one. The selection of skills is much smaller, but the quality control is stricter.

The OpenClaw Foundation is working on a mandatory security review process for the ClawHub. Until then, the responsibility stays with the users. So keep your eyes open when picking skills.


