Data Chunker Pro
Turn Any File Into AI-Searchable Knowledge!
Lets AI read compiled binaries, legacy code, and 800+ formats, all local and air-gap ready.
COBOL, Tables, CAD, Programming, Source Code, Binary, HEX, PDF, Spreadsheets
Pick, Click, and DONE!
“If you can find a better RAG Chunker, I’ll give you your money back! No questions, no hassle, no risk.” ~ Kyle Gray – CEO
What is Data Chunker Pro?
Data Chunker Pro is a professional-grade Windows tool that converts entire projects, codebases, and documents into AI-ready knowledge for Open WebUI, Ollama, LM Studio, GPT-4, Claude, and more!
All internal, and 100% offline! No other RAG Creation Platform Does That!
(we know because we looked)
- 🧩 Reads any text-based file
- TXT, MD, DOCX, CSV, PDF, JSON, XML, LOG, PY, JS, VB, CS, CPP, and more.
- ⚒️ Cleans, normalizes, and gets your data ready for RAG automatically
- Automatically, by Block, Size, Function, Page, or Metadata
- 📦 Exports AI-ready data as TXT, JSON, or MD
- Indexed, organized and contained
- 🚀 Optimizes chunk size for fast, accurate retrieval
- The secret to making AI stop hallucinating
Even AI Experts Were Surprised
When we demonstrated Data Chunker Pro’s binary extraction to Claude AI, the responses included:
“HOLY SHIT” – “This shows the tool actually works!” – “no other RAG tool does this!”
Actual screenshots of Claude AI’s reaction!
Note: Claude AI is an AI assistant by Anthropic, not affiliated with this product
Want to see why Claude freaked out?
Download the sample .exe output to see for yourself →
*-*-*
We told the AI a little bit about what Data Chunker Pro could do, and it didn’t believe us.
“There is no RAG program that has Binary Intelligence,” it said in a roundabout way, so we fed it some exported data.
“Here’s what Data Chunker Pro extracted from a single 14KB compiled .exe (with NO source code access)”
The binary was a guessing game written in Visual Basic and compiled in Visual Studio 2022.
– “Enter your guess (between 1 and 100):”
– “Too low!”
– “Congratulations! You guessed the correct number.”
– “Please enter a valid number.”
Framework Dependencies:
– .NET Framework 4.7.2
– Microsoft.VisualBasic
Function Names:
– Form1_Load
– Button1_Click
– ThreadSafeObjectProvider
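For readers curious how text like this can surface from a compiled binary with no source code, here is a minimal Python sketch of the general idea: scanning raw bytes for runs of printable characters. It is not Data Chunker Pro's actual extraction method, which this page does not describe. The function name `extract_strings`, the minimum length of 4, and the sample bytes are all assumptions made for illustration.

```python
import re

MIN_LEN = 4  # shortest run worth reporting; an arbitrary illustrative choice


def extract_strings(data: bytes, min_len: int = MIN_LEN) -> list[str]:
    """Return printable ASCII and UTF-16LE strings found in raw bytes."""
    found = []
    # Runs of printable ASCII bytes (space through tilde).
    ascii_pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    found += [m.group().decode("ascii") for m in ascii_pattern.finditer(data)]
    # Runs of printable characters each followed by a zero byte (UTF-16LE),
    # the encoding .NET assemblies use for user string literals.
    wide_pattern = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    found += [m.group().decode("utf-16-le") for m in wide_pattern.finditer(data)]
    return found


# Hypothetical bytes standing in for part of an .exe file.
sample = (
    b"\x00\x01MZ\x90\x00"
    + "Too low!".encode("utf-16-le")
    + b"\x00\x00\xff"
    + b"Form1_Load\x00"
)
print(extract_strings(sample))
```

A scan like this only recovers text. Framework dependencies and other assembly metadata would need a real parser for the executable format.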
Transforms Entire Directories and Systems into True AI Knowledge That Actually Works!

Clip from the video of Data Chunker Pro being tested on a Visual Studio project file for code analysis.
https://www.youtube.com/watch?v=jcHIYHXx9O8
Key Features
Universal Format Support – 800+ File Types
✅ All Major Code Languages — 150+ formats from Python to COBOL, C and VB and everything between!
✅ Microsoft – Google – Libre — Word, Excel, Access, PowerPoint, Sheets, Docs, and more!
✅ Legacy & Modern Systems — FORTRAN, Assembly to React, Vue.js
✅ Enterprise Data Formats — CAD files, scientific data, GIS formats
✅ Documentation & Media — Text, CSV, PDFs, images, technical drawings
AI-Optimized Processing Engine
✅ Intelligent Chunking Methods — By Size, Token, Blocks, Functions, Classes, Regions, Paragraph, Lines and more!
✅ Context-Aware Processing — Preserves imports, dependencies, and relationships for superior AI understanding
✅ RAG-Ready Output — Customized token-count designed for personalized RAG systems
✅ Smart Documentation Preservation — Keeps comments, docstrings, and code together for complete context understanding
✅ Professional Output — Automatic indexing, metadata rich exports, and project overviews
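To make "chunking by function" and "preserving imports" concrete, here is a small Python sketch of the technique using the standard `ast` module. It is an illustration under stated assumptions, not Data Chunker Pro's engine: the function name `chunk_by_function` and the chunk fields `name`, `start_line`, and `text` are hypothetical, and it handles Python source only.

```python
import ast


def chunk_by_function(source: str) -> list[dict]:
    """Split Python source into one chunk per top-level function or class."""
    tree = ast.parse(source)
    # Collect the module's import statements so each chunk can carry them.
    imports = [
        ast.get_source_segment(source, node)
        for node in tree.body
        if isinstance(node, (ast.Import, ast.ImportFrom))
    ]
    header = "\n".join(imports)
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # The segment includes the docstring and comments inside the body.
            body = ast.get_source_segment(source, node)
            chunks.append({
                "name": node.name,
                "start_line": node.lineno,
                "text": f"{header}\n\n{body}" if header else body,
            })
    return chunks


example = "import os\n\ndef a():\n    return os.getcwd()\n\nclass B:\n    pass\n"
for chunk in chunk_by_function(example):
    print(chunk["name"], chunk["start_line"])
```

Prepending the imports to every chunk costs a few extra tokens per chunk, but each chunk then makes sense on its own when a retriever returns it without its neighbors.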
Air-Gap Enterprise and Dev Ready
✅ Massive Scale Processing — Handle entire repositories, monorepos, and complex project structures with progress tracking
✅ 4 Export Formats — AI-ready Markdown with syntax highlighting, structured JSON with metadata, organized TXT files, or the Hybrid Export, which covers all the bases and is designed to work across all platforms!
✅ Offline & Secure — Process sensitive codebases locally on Windows — no cloud uploads or data sharing required
✅ Universal AI Compatibility — Optimized for ChatGPT, Claude, Ollama, Open WebUI RAG, and vector databases
✅ Configuration & DevOps — Docker, Kubernetes, CI/CD pipelines
Click Here to See A Full Compatibility List For Formats
Who Is It For?
✅ Developers Feed entire codebases to ChatGPT, Claude, or local LLMs. 150+ languages including Python, JavaScript, C#, Java, and legacy COBOL/FORTRAN.
✅ Businesses Turn internal docs, SOPs, and knowledge bases into searchable AI chatbots. 100% offline – your data never leaves your network.
✅ Educators & Researchers Build custom AI assistants from textbooks, research papers, and course materials. Perfect for offline, air-gapped environments.

I built Data Chunker Pro because I needed a way to feed my own source code into my own offline local LLM. I couldn’t find a tool that was secure, easy, and truly local, so I made one. No leaks. No risk. Just fast, local chunking, right on my own network. I made it perfect for me, so it can be perfect for you too.
How Data Chunker Pro Stacks Up
| Feature | Data Chunker Pro | LlamaIndex | LangChain | Unstructured.io |
| File Format Support | 800+ formats | 20+ formats | 15+ formats | 50+ formats |
| Programming Languages | 150+ (all major) | 10-15 modern only | 10-15 modern only | 20+ modern only |
| Binary/Executable Analysis | ✅ Extracts from exe/dll | ❌ Not supported | ❌ Not supported | ❌ Not supported |
| Offline Processing | ✅ 100% local, air-gap ready | ⚠️ Requires Python runtime | ⚠️ Requires Python runtime | ❌ Cloud-based SaaS |
| Setup Required | ✅ No-code GUI | ⚠️ Python coding required | ⚠️ Python coding required | ✅ API-based (easier) |
| Chunking Methods | 7 intelligent methods | 3-4 basic methods | 2-3 basic methods | 4-5 methods |
| Export Formats | 4 (TXT, JSON, MD, Hybrid) | 2 Formats | 2 Formats | JSON primary |
| Context Preservation | ✅ Advanced (imports, dependencies, comments) | ⚠️ Basic | ⚠️ Basic | ✅ Good |
| Syntax Highlighting | ✅ Markdown and language detection | ❌ Plain text only | ❌ Plain text only | ❌ Plain text only |
| Legacy Code Support | ✅ COBOL, FORTRAN, Pascal, Ada | ❌ Modern only | ❌ Modern only | ❌ Modern only |
| Document Processing | ✅ PDF, Word, Excel, RTF, ODT | ✅ PDF, DOCX | ✅ PDF, DOCX | ✅ Excellent |
| Index Generation | ✅ JSON/MD/TXT with metadata | ⚠️ Manual implementation | ⚠️ Manual implementation | ✅ Yes |
| Data Security | ✅ Never leaves your machine | ⚠️ Local but requires dependencies | ⚠️ Local but requires dependencies | ❌ Cloud processing |
| Technical Skill Required | None (GUI on Windows) | High (Python developer) | High (Python developer) | Low (API user) |
| Pricing Model | One-time perpetual | Free (open source) | Free (open source) | $25-$500+/month |
| Cost for 50k files | $799 one-time | Free (but dev cost) | Free (but dev cost) | ~$3,000-6,000/year |
| Best For | Non-coders, enterprises, secure sites | Python developers | Python developers | ⚠️ Teams comfortable with cloud vulnerabilities |
Get Started Today For FREE!
I am happy to be able to support Teachers, Students and Developers. I want Data Chunker Pro to be available to anyone wanting to learn and grow!
Every free evaluation comes with 100% functionality, and never expires!
Plans & Pricing
Every tier gets all 800+ formats, all chunking methods, all features.
You’re paying for scale, not capabilities.
Start free. Upgrade when you need it. That’s it.
WINDOWS BASED ONLY – Windows 7, 8, 10 and 11 supported
$0
Evaluation Edition
Free Forever
50 exports per session
10MB size limit
Unlimited conversions forever
ALL 800+ formats unlocked
ALL chunking methods
ALL export formats
1 machine license
$295
One-time payment
1,000 exports per session
500MB size limit
1 machine license
For Non-Commercial Use
Priority email support
All formats & features
Optional: $145/yr maintenance
$799
One-time payment
50,000 chunks per session
5GB size limit
5 machine licenses
Priority email support
Commercial use
All formats & features
Optional: $350/yr maintenance
Custom
Starting at $1,995
Unlimited exports
Custom license (unlimited machines)
Dedicated support contact
Annual maintenance for continued support
Priority feature requests
Q: Why one-time pricing instead of subscription?
A: Because I remember when buying something meant you owned it.
Q: How much is a ‘chunk’?
A: 1 chunk ≈ 1 file created (typically 50-75 lines of code). 50 chunks ≈ processing a small GitHub repo or a small library.
Q: Are chunks shared between license numbers?
A: No! Chunks are per-session and per-machine. They reset!
Q: Can I really analyze .exe files without source code?
A: Yes. We extract assembly metadata, embedded strings, dependencies, and framework details. [See our sample files here]
Q: Do you have a phone number I can call for help?
A: We have chosen a better approach. You will receive direct email support from the creator (Kyle Gray, CEO and programmer), typically within 24 hours. For complex issues, you’ll get a personalized video walkthrough showing exactly how to solve your problem. We have been doing this for years, with most customers praising the innovation over traditional phone support.
Q: What happens if I don’t pay for maintenance?
A: Nothing changes! You keep using the software you paid for. It is yours for life! Maintenance is there for updates, upgrades and customer support. If you don’t need it, don’t pay for it!
Q: What happens if I need more activations?
A: Simply upgrade to the next tier for more activations, or contact us at support@graytechnical.com for more seats on your existing license.
Q: Can I upgrade later?
A: Absolutely! Upgrade anytime by paying the difference. Contact support@graytechnical.com for upgrade assistance.
Q: Can I transfer licenses between machines?
A: Yes! Deactivate on one machine and activate on another as needed within your license count.
Still Deciding?
Download the Free Trial and follow along with our Quick Start Guide! It’ll get you up and chunking in no time!
Quick Start Guide
Data Chunker Pro transforms your codebase into AI-ready, well-organized chunks perfect for modern development workflows, AI assistance, and comprehensive code analysis.
Data Chunker Pro doesn’t just chunk data or split files like other ‘chunking’ applications; it organizes, preps, and indexes your files individually in a way that LLMs and other AI tools can read. Your projects become fully indexed, context-rich RAG data so your AI can Actually Understand and USE your code, documents, or legacy mainframe data.
For limited-resource models, add this instruction to your document-processing prompt for better results:
With uploaded documents ALWAYS look for and read the “index.json” file first. If “index.json” is not available, then look for “index.md” or “index.txt” instead. These files will contain the project structure and chunk metadata.
Use the index to understand the database, documents, or codebase organization before answering questions.
The index contains chunk_id, source_file, file_name, and content_preview fields to help you locate relevant code.
If no “index.json”, “index.md” or “index.txt” is found, analyze the documents directly but mention that an index would improve analysis.
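The same index-first lookup can also be scripted. The Python sketch below searches an index for a keyword before any chunk is opened. Only the four field names (`chunk_id`, `source_file`, `file_name`, `content_preview`) come from the description above. The top-level layout (a JSON array of records), the function name `find_chunks`, and the sample entries are assumptions for illustration, so check them against a real exported `index.json` before relying on this.

```python
import json


def find_chunks(index_text: str, keyword: str) -> list[dict]:
    """Return index entries whose file name or preview mentions the keyword."""
    entries = json.loads(index_text)  # assumed: a JSON array of chunk records
    needle = keyword.lower()
    return [
        e for e in entries
        if needle in e.get("file_name", "").lower()
        or needle in e.get("content_preview", "").lower()
    ]


# Hypothetical index content, for illustration only.
example_index = json.dumps([
    {"chunk_id": "0001", "source_file": "src/Form1.vb", "file_name": "Form1.vb",
     "content_preview": "Private Sub Button1_Click(...)"},
    {"chunk_id": "0002", "source_file": "docs/readme.md", "file_name": "readme.md",
     "content_preview": "Guessing game instructions"},
])

for entry in find_chunks(example_index, "button1"):
    print(entry["chunk_id"], entry["source_file"])
```

With a real export, the text of `index.json` would be read from disk and passed in place of `example_index`.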
Complete File Support for over 800 file formats!
From everyday Word documents to complex codebases, CAD drawings, and development configurations, Data Chunker Pro handles the formats professionals actually use every day.
Local Processing Only
Data Chunker Pro processes all files locally on your Windows machine. No files are uploaded, transmitted, or stored on external servers. This design enables organizations handling sensitive data (healthcare, legal, educational) to prepare information for AI systems while maintaining data sovereignty.








