Building a Claude Skill for Automated Battery Data Extraction: From Chaos to Standardized Data

How I used Claude Skills to transform messy supplier datasheets into clean, consistent JSON—and why it matters for battery engineers.

The Problem: Every Supplier Speaks a Different Language

It's 4 PM on a Friday. You're looking forward to having dinner with your friends. Then your boss pings: "Hey, can you whip up a quick comparison of those 5 new cells asap?"

Sure, why not? Five cells. Twenty minutes, tops. Quick and painless.

You open the first PDF—looks fine. The second uses completely different terms. And the third? Data buried in dense tables with merged cells that look like abstract art.

Figure: Example of a simpler cell characteristics table
Figure: Example of a multi-cell comparison table with complex formatting

So you start putting those numbers into an Excel sheet, just like the table below. You spend quite a long time diligently copying all the data. And then it hits you: "Damn! The suppliers are speaking completely different languages!" One says "Nominal Capacity," another says "Typical Capacity." One measures impedance at 10 seconds, another at 2 seconds. How on earth are you supposed to compare them on a single parameter?

| Specification | Supplier A | Supplier B | Supplier C |
|---|---|---|---|
| DCIR | "DCIR @ 40°C, 10s pause" | "AC impedance at 1 kHz" | "Internal resistance ≤18mΩ" |
| Cycle Life | "500 cycles at 80% DOD" | "2000 times (0.5C/0.5C)" | "≥800 cycles to 80% capacity" |
| Capacity | "3000mAh (typ.)" | "Min: 2900mAh, Nom: 3000mAh" | "3.0Ah nominal" |

That "quick 20-minute task" at 4 PM? It's now 8 PM. You missed dinner with friends. And you're sitting there thinking: "I did NOT get a PhD to become an expert at copy-paste."

For years, this was just... how the industry worked. No standards. Every manufacturer doing their own thing. The only solution? Hire someone with deep battery expertise—and have them do the world's most soul-crushing data entry. Nothing kills passion faster than copy-paste-double-check-repeat on loop.

I kept thinking: There HAS to be a better way. With the rapid evolution of large language models, I've found that Claude Skills are perfect for this. Honestly, it saved my sanity—and my Friday nights. Let me show you how to build one so you can get yours back, too.


Enter Claude Skills: Procedural Knowledge for AI

Recently, I kept seeing people on X talking about how awesome Claude Skills are. After digging through YouTube videos and documentation, I realized that extracting data from chaotic PDFs and standardizing the output was the perfect use case.

What Are Claude Skills?

Claude Skills are specialized folders containing instructions, scripts, and resources. They allow Claude to perform repeatable, complex workflows without you having to re-explain the rules in every new chat.

Think of them like your favorite recipe that you've perfected over the years. You write it down once—every step, every technique, every little trick you've learned. And from then on, anyone can follow it and get the same great result.

That's a Claude Skill. It's a set of instructions, scripts, and reference files that teach Claude how to do a specific task. You set it up once, and Claude follows those instructions every time.

Additionally, Claude Skills use Progressive Disclosure to stay efficient:

  1. Matching phase: Claude scans the name and description of each Skill to find relevant ones (~100 tokens per Skill).
  2. Loading phase: Only when a Skill matches your request does Claude load its full instructions.
  3. Execution phase: Claude follows the Skill's instructions and runs any scripts.

This means inactive Skills barely consume tokens. You can have dozens without bloating your context window.

Figure: Flowchart illustrating the token-efficient progressive disclosure mechanism

Why Battery Data Extraction Is Perfect for Skills

Battery datasheet extraction is fundamentally a procedural knowledge problem. The properties are consistent across all suppliers: capacity, voltage, impedance, current limits, cycle life, dimensions, and weight.

What varies is how each supplier presents this data. That's exactly what Skills handle well.

Compared to using a general-purpose chat (like dropping a PDF into ChatGPT), a Skill applies the same rules on every run without you having to re-explain them.

For example, battery suppliers use different units for the same measurements. A Skill can automatically check and convert units every time you extract data.

This is exactly what Skills excel at.


Demo: See It in Action

You can find my skill and the complete vault on GitHub: battery-cell-extraction-claude-skills

Quick Start:

  1. Create a zip file of the battery-cell-extraction/ folder
  2. Open Claude settings → Import the zip file as a custom skill
  3. Drop a PDF into Claude and say: "Extract cell data sheets for me."
  4. Claude loads the skill and extracts structured JSON data
  5. Copy the output to your local output/ folder
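
Step 1 can also be scripted. Here is a minimal sketch, assuming the zip should contain the battery-cell-extraction/ folder itself at its top level (worth checking against Claude's current upload requirements). The helper name zip_skill is hypothetical and not part of the repository:

```python
import shutil
from pathlib import Path


def zip_skill(skill_dir: str) -> Path:
    """Zip a skill folder so the archive holds the folder itself at its root."""
    folder = Path(skill_dir).resolve()
    # A skill folder is expected to carry its instructions in SKILL.md
    if not (folder / "SKILL.md").is_file():
        raise FileNotFoundError(f"No SKILL.md found in {folder}")
    archive = shutil.make_archive(
        base_name=str(folder),        # writes <folder>.zip next to the folder
        format="zip",
        root_dir=str(folder.parent),  # paths in the zip are relative to the parent
        base_dir=folder.name,         # so entries start with "<folder>/"
    )
    return Path(archive)
```

Calling zip_skill("battery-cell-extraction") from the repository root should leave a battery-cell-extraction.zip beside the folder, ready for step 2.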

The Skill Architecture For Cell Tech-Specs Extraction

The skill has four lean components:

battery-cell-extraction/
├── SKILL.md                 # Instructions (~6KB, token-optimized)
├── scripts/extractor.py     # Parsing functions
├── normalization_rules.yaml # Supplier field mappings
└── validation_schema.json   # Output structure

How SKILL.md Works

The instructions follow a simple flow: Find → Extract → Normalize → Validate → Output.

| Section | Purpose |
|---|---|
| A: Data Identification | Locate tables, identify layout pattern |
| B: Extraction | Parse values, handle unit conversions |
| C: Normalization | Map supplier terms to standard fields |
| D: Validation | Check ranges, flag uncertainties |
| E: Output | Generate JSON with confidence scores |

Pattern Recognition: Taming Table Chaos

Different suppliers format tables differently. The skill identifies the pattern before extracting:

| Pattern | What It Looks Like |
|---|---|
| Key-Value | Simple 2-column (Parameter, Value) |
| Multi-Condition | Merged cells with nested conditions |
| Visual Grouping | Indentation-based, no borders |
| Comparison | Multiple products in columns |

[!TIP] When new suppliers appear, just update normalization_rules.yaml—no code changes needed.

Field Normalization

The YAML file maps 100+ variations to standard names:

# Example: "Nominal Capacity", "Rated Capacity", "Typical Capacity" 
#          all map to → capacity.nominal_ah
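
To make the idea concrete, here is a minimal Python sketch of such a lookup. It assumes the mappings have already been loaded into a dictionary; the real layout of normalization_rules.yaml may differ, and FIELD_ALIASES, LOOKUP, and normalize_field are hypothetical names used only for illustration:

```python
import re
from typing import Optional

# Hypothetical subset of the mappings, using only the example from this article.
FIELD_ALIASES = {
    "capacity.nominal_ah": ["nominal capacity", "rated capacity", "typical capacity"],
}


def _canon(label: str) -> str:
    """Lowercase, drop parenthetical notes, collapse punctuation and whitespace."""
    label = re.sub(r"\(.*?\)", " ", label.lower())
    label = re.sub(r"[^a-z0-9]+", " ", label)
    return label.strip()


# Invert the alias table once: canonical supplier label -> standard field name
LOOKUP = {
    _canon(alias): field
    for field, aliases in FIELD_ALIASES.items()
    for alias in aliases
}


def normalize_field(label: str) -> Optional[str]:
    """Return the standard field name for a supplier label, or None if unmapped."""
    return LOOKUP.get(_canon(label))
```

Returning None for unmapped labels keeps unknown supplier terms visible, so they can be added to the mapping file instead of being silently guessed.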

Output Structure

Every extraction produces consistent JSON:

cell_info      → manufacturer, model, chemistry, format
mechanical     → dimensions, weight
electrical     → capacity, voltage, impedance, current
derived        → energy, power density (calculated)
metadata       → confidence scores, missing fields, warnings

Full schema definition lives in validation_schema.json.
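
As a rough illustration of that shape, the sketch below assembles one record with the five top-level sections and computes a derived energy value as nominal capacity times nominal voltage. The inner field names, the build_record helper, and the example values (including the 3.6 V nominal voltage) are assumptions for illustration, not the contents of validation_schema.json:

```python
import json


def build_record(manufacturer, model, capacity_ah, nominal_voltage_v, weight_g=None):
    """Assemble one cell record with the five top-level sections."""
    # Track fields the datasheet did not provide instead of inventing values
    missing = [] if weight_g is not None else ["mechanical.weight_g"]
    return {
        "cell_info": {"manufacturer": manufacturer, "model": model},
        "mechanical": {"weight_g": weight_g},
        "electrical": {
            "capacity_ah": capacity_ah,
            "nominal_voltage_v": nominal_voltage_v,
        },
        # Derived values are calculated, never read from the datasheet
        "derived": {"energy_wh": round(capacity_ah * nominal_voltage_v, 3)},
        "metadata": {"missing_fields": missing, "warnings": []},
    }


# Made-up example cell: 3.0 Ah at an assumed 3.6 V nominal voltage
record = build_record("ExampleCo", "EX-3000", capacity_ah=3.0, nominal_voltage_v=3.6)
print(json.dumps(record, indent=2))
```

Keeping calculated values in their own section makes it obvious which numbers came from the supplier and which came from arithmetic.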


The Time Savings Are Real

What used to take 15-20 minutes of tedious copy-paste work per datasheet now takes less than 2 minutes. That's a time reduction of roughly 90%, and more importantly, you get consistent, validated data every time.

| | Before (Manual) | After (Claude Skill) |
|---|---|---|
| Time per datasheet | 15-20 minutes | < 2 minutes |
| Error checking | None (hope for the best) | Automatic validation + confidence scores |
| New supplier format | Hours of reformatting | ~30 minutes to update YAML mappings |

Try it yourself: Start with a single datasheet. Once you see how clean the output is, you'll never want to go back to manual extraction.


Key Learnings

1. Skills vs. MCP vs. Projects: Right Tool for the Job

When I first started, I was honestly confused about the differences between Claude's tools: Projects, MCP, and Skills. Projects give Claude fixed context, which is great for keeping documents or code available across chats but not for encoding repeatable procedures. MCP (Model Context Protocol) is about connectivity: fetching live data from databases, filesystems, or APIs. Powerful, but overkill when you just need to process local PDFs.

That's when I realized Skills were the perfect fit. They're designed for procedural knowledge—teaching Claude how to do a specific task the same way every time. For extracting and normalizing battery data from PDFs, Skills are simpler, more efficient, and exactly what the job requires.

2. Combine LLM Flexibility with Python Precision

The magic formula: Use Claude's natural language understanding for format variations ("3000mAh" vs "3.0Ah" vs "3000 mAh (typical)"), but delegate deterministic operations (unit conversions, calculations) to Python.
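
On the Python side, the deterministic part can be quite small. The sketch below handles the three capacity strings above; it is an illustration, and parse_capacity_ah is a hypothetical name rather than a function from the repository's extractor.py:

```python
import re
from typing import Optional

# A number followed by an optional space and a capacity unit (mAh or Ah)
_CAPACITY_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(mAh|Ah)", re.IGNORECASE)


def parse_capacity_ah(text: str) -> Optional[float]:
    """Return the first capacity found in text, converted to Ah, or None."""
    match = _CAPACITY_RE.search(text)
    if match is None:
        return None
    value = float(match.group(1))
    unit = match.group(2).lower()
    # Convert mAh to Ah; Ah values pass through unchanged
    return value / 1000.0 if unit == "mah" else value


for raw in ["3000mAh", "3.0Ah", "3000 mAh (typical)"]:
    print(raw, "->", parse_capacity_ah(raw), "Ah")
```

Note that this only takes the first match, so a string like "Min: 2900mAh, Nom: 3000mAh" still needs Claude to decide which value is the nominal one before the conversion runs.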

3. Validation is Non-Negotiable

LLMs can hallucinate. I built comprehensive validation, including range checks, flags for uncertain or missing values, and confidence scores.
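
A range check is the simplest of these to sketch. The bounds and field names below are illustrative assumptions for single cells, not the values in the author's validation_schema.json, and validate is a hypothetical helper:

```python
# Illustrative plausibility bounds, NOT taken from validation_schema.json
PLAUSIBLE_RANGES = {
    "capacity_ah": (0.01, 1000.0),
    "nominal_voltage_v": (1.0, 5.0),
    "impedance_mohm": (0.01, 500.0),
}


def validate(values: dict) -> dict:
    """Flag missing fields and values outside their plausible range."""
    missing, warnings = [], []
    for field, (low, high) in PLAUSIBLE_RANGES.items():
        value = values.get(field)
        if value is None:
            missing.append(field)
        elif not (low <= value <= high):
            warnings.append(
                f"{field}={value} outside plausible range [{low}, {high}]"
            )
    return {"missing_fields": missing, "warnings": warnings, "ok": not warnings}


# A capacity of 3000 "Ah" is almost certainly a mAh value that was never converted
report = validate({"capacity_ah": 3000, "nominal_voltage_v": 3.6})
print(report)
```

Checks like this will not catch every hallucination, but they do catch the common unit slips cheaply and deterministically.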

4. Iteration is Built-In

When the skill misses something or a new supplier format appears, I update the markdown instructions or YAML mappings—no coding required. This makes the solution maintainable by domain experts, not just developers.


For Non-Battery Engineers

If you're dealing with similar data normalization challenges in your domain—medical records, financial statements, technical specifications—consider whether Claude Skills could help. The key question: Is your problem about encoding procedural knowledge?

If yes, Claude Skills might be your game-changer.


Have questions or want to see a live demo? Connect with me on LinkedIn or check out my other articles on AI-assisted workflows.


Tags: #ClaudeSkills #AI #BatteryEngineering #DataExtraction #Automation #ProductivityTools
