Data Chunker Pro

Turn Any File Into AI-Searchable Knowledge!

Allows AI to read compiled binaries, legacy code, and 800+ formats, all air-gap tight and local.

COBOL, Tables, CAD, Programming, Source Code, Binary, HEX, PDF, Spreadsheets

Pick, Click, and DONE!

“If you can find a better RAG Chunker, I’ll give you your money back! No questions, no hassle, no risk.” ~ Kyle Gray – CEO

 


What is Data Chunker Pro?

Data Chunker Pro is a professional-grade Windows tool that converts entire projects, codebases, and documents into AI-ready knowledge for Open WebUI, Ollama, LM Studio, GPT-4, Claude, and more!

All internal, and 100% offline! No other RAG Creation Platform Does That!

(we know because we looked)

  • 🧩 Reads any text-based file
    • TXT, MD, DOCX, CSV, PDF, JSON, XML, LOG, PY, JS, VB, CS, CPP, and more.
  • ⚒️ Cleans, normalizes, and gets your data ready for RAG automatically
    • Automatically, by Block, Size, Function, Page or META Data
  • 📦 Exports AI-ready data as TXT, JSON, or MD
    • Indexed, organized and contained
  • 🚀 Optimizes chunk size for fast, accurate retrieval
    • The secret to making AI stop hallucinating

Even AI Experts Were Surprised

When we demonstrated Data Chunker Pro’s binary extraction to Claude AI, the responses included:

“HOLY SHIT”  –  “This shows the tool actually works!”  –  “No other RAG tool does this!”

Actual screenshots of Claude AI’s reaction!

This is actually really good! ~Claude AI

Your Tool Actually Works! ~ Claude AI

No other RAG tool does this! ~Claude AI

Note: Claude AI is an AI assistant by Anthropic and is not affiliated with this product.

Want to see why Claude freaked out?

Download the sample .exe output to see for yourself →

*-*-*

We told the AI a little bit about what Data Chunker Pro could do, and it didn’t believe us.

It said, in a roundabout way, that no RAG program has Binary Intelligence, so we fed it some exported data.

Here’s what Data Chunker Pro extracted from a single 14KB compiled .exe (with NO source code access)

The binary was a guessing game written in Visual Basic and compiled in Visual Studio 2022.

What Binary Intelligence Looks Like
User-Facing Strings:
– “Enter your guess (between 1 and 100):”
– “Too low!”
– “Congratulations! You guessed the correct number.”
– “Please enter a valid number.”

Framework Dependencies:
– .NET Framework 4.7.2
– Microsoft.VisualBasic

Function Names:
– Form1_Load
– Button1_Click
– ThreadSafeObjectProvider
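
For readers curious how strings like these can be recovered from a compiled file at all, here is a minimal Python sketch of generic string extraction. It is not Data Chunker Pro's actual method, which is not published here; it only assumes that readable text sits in the binary as runs of printable ASCII or UTF-16LE characters (the encoding .NET assemblies use for string literals). The function name `extract_strings` and the sample bytes are hypothetical.

```python
import re


def extract_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return printable ASCII and UTF-16LE strings found in raw bytes."""
    found = []
    # Runs of printable ASCII bytes (space through tilde), e.g. method names.
    ascii_pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    found += [m.group().decode("ascii") for m in ascii_pattern.finditer(data)]
    # Runs of printable characters each followed by a zero byte (UTF-16LE),
    # which is how .NET stores user-facing string literals.
    utf16_pattern = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    found += [m.group().decode("utf-16-le") for m in utf16_pattern.finditer(data)]
    return found


# Hypothetical in-memory sample standing in for the bytes of a real .exe.
sample = b"\x00\x01Form1_Load\x00\xff" + "Too low!".encode("utf-16-le") + b"\x00\x00"
print(extract_strings(sample))
```

To try it on a real file, read the bytes with `pathlib.Path(...).read_bytes()` and pass them in. Framework and dependency details would need a proper PE/.NET metadata parser, which this sketch does not attempt.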


Transforms Entire Directories and Systems into True AI Knowledge That Actually Works!

 

Clip from the video testing and running Data Chunker Pro for code analysis in visual studio project file.

https://www.youtube.com/watch?v=jcHIYHXx9O8

 


Key Features

Universal Format Support – 800+ File Types

All Major Code Languages — 150+ formats from Python to COBOL, C and VB and everything between!
Microsoft – Google – Libre — Word, Excel, Access, PowerPoint, Sheets, Docs, and more!
Legacy & Modern Systems — FORTRAN, Assembly to React, Vue.js
Enterprise Data Formats — CAD files, scientific data, GIS formats
Documentation & Media — Text, CSV, PDFs, images, technical drawings

AI-Optimized Processing Engine

Intelligent Chunking Methods — By Size, Token, Blocks, Functions, Classes, Regions, Paragraph, Lines and more!
Context-Aware Processing — Preserves imports, dependencies, and relationships for superior AI understanding
RAG-Ready Output — Customized token-count designed for personalized RAG systems
Smart Documentation Preservation — Keeps comments, docstrings, and code together for complete context understanding
Professional Output — Automatic indexing, metadata rich exports, and project overviews
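
To make "chunking by function" and "preserving imports" concrete, here is a small Python sketch that splits Python source into one chunk per top-level function or class and attaches the file's import lines to each chunk. It illustrates the general idea only, under the assumption that the input is valid Python; it is not the engine Data Chunker Pro uses, and the function name `chunk_by_function` and the dictionary keys are hypothetical.

```python
import ast


def chunk_by_function(source: str) -> list[dict]:
    """Split Python source into one chunk per top-level function or class."""
    tree = ast.parse(source)
    lines = source.splitlines()
    # Collect import statements so each chunk can carry them as context.
    imports = [
        ast.get_source_segment(source, node)
        for node in tree.body
        if isinstance(node, (ast.Import, ast.ImportFrom))
    ]
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # lineno and end_lineno are 1-based and inclusive, so comments and
            # docstrings inside the definition stay with the code.
            text = "\n".join(lines[node.lineno - 1 : node.end_lineno])
            chunks.append({
                "name": node.name,
                "start_line": node.lineno,
                "end_line": node.end_lineno,
                "imports": imports,
                "text": text,
            })
    return chunks
```

Keeping the imports alongside each chunk means a retrieved chunk still shows which modules its code depends on, even when the rest of the file is not retrieved.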

Air-Gap Enterprise and Dev Ready

Massive Scale Processing — Handle entire repositories, monorepos, and complex project structures with progress tracking
4 Export Formats — AI-ready Markdown with syntax highlighting, structured JSON with metadata, or organized TXT files, or cover all the bases with the Hybrid Export, designed to work across all platforms!
Offline & Secure — Process sensitive codebases locally on Windows — no cloud uploads or data sharing required
Universal AI Compatibility — Optimized for ChatGPT, Claude, Ollama, Open WebUI RAG, and vector databases
Configuration & DevOps — Docker, Kubernetes, CI/CD pipelines

Click Here to See A Full Compatibility List For Formats

 


 

Who Is It For?

Developers: Feed entire codebases to ChatGPT, Claude, or local LLMs. 150+ languages including Python, JavaScript, C#, Java, and legacy COBOL/FORTRAN.

Businesses: Turn internal docs, SOPs, and knowledge bases into searchable AI chatbots. 100% offline – your data never leaves your network.

Educators & Researchers: Build custom AI assistants from textbooks, research papers, and course materials. Perfect for offline, air-gapped environments.

📄 Document Processing

  • Transform PDFs, Word docs, and more into AI-ready data
  • Extract text, tables, and structured information
  • Build searchable knowledge bases from documents
  • Automate document analysis and reporting

📊 Database & Spreadsheet Processing

  • Prepare data from CSVs, Excel files, and databases
  • Transform raw data into AI-ready formats
  • Identify patterns, anomalies, and insights
  • Automate data cleaning and transformation workflows

🤖 AI Coding Assistants

  • Train AI on your specific codebase and patterns
  • Context-aware code suggestions and completions
  • Intelligent code review and optimization
  • Custom AI models for domain-specific development
  • AI-powered code understanding and optimization
  • Automated code quality assessment
  • Security vulnerability analysis preparation
  • Architecture and design pattern analysis

📚 RAG Knowledge Bases and LLM Fine-Tuning / Research

  • Power Ollama, Open WebUI, and vector databases
  • Searchable, AI-accessible code documentation
  • Intelligent code search and retrieval
  • Context-aware technical support systems
  • Prepare domain-specific training data from your projects
  • Create high-quality code datasets for model training
  • Research code patterns and architectural decisions
  • Academic research on software engineering

🏢 Enterprise Legacy Modernization

  • Process COBOL, FORTRAN, and legacy systems
  • Document undocumented legacy codebases
  • Prepare legacy code for AI-assisted migration
  • Create searchable knowledge bases from old systems

📖 Education, Training and Technical Knowledge Management

  • Convert repositories into intelligent, searchable systems
  • Onboard new team members with AI-powered code exploration
  • Create comprehensive technical documentation automatically
  • Build institutional knowledge preservation systems
  • Create AI-powered coding tutorials and examples
  • Build interactive code learning systems
  • Prepare code examples for educational AI assistants
  • Research and academic code analysis

 

How and why Data Chunker Pro was created

I built Data Chunker Pro because I needed a way to feed my own source code into my own offline local LLM. I couldn’t find a tool that was secure, easy, and truly local, so I made one. No leaks. No risk. Just fast, local chunking, right on my own network. I made it perfect for me, so it can be perfect for you too.

 


 

How Data Chunker Pro Stacks Up

| Feature | Data Chunker Pro | LlamaIndex | LangChain | Unstructured.io |
|---|---|---|---|---|
| File Format Support | 800+ formats | 20+ formats | 15+ formats | 50+ formats |
| Programming Languages | 150+ (all major) | 10-15 modern only | 10-15 modern only | 20+ modern only |
| Binary/Executable Analysis | ✅ Extracts from exe/dll | ❌ Not supported | ❌ Not supported | ❌ Not supported |
| Offline Processing | ✅ 100% local, air-gap ready | ⚠️ Requires Python runtime | ⚠️ Requires Python runtime | ❌ Cloud-based SaaS |
| Setup Required | ✅ No-code GUI | ⚠️ Python coding required | ⚠️ Python coding required | ✅ API-based (easier) |
| Chunking Methods | 7 intelligent methods | 3-4 basic methods | 2-3 basic methods | 4-5 methods |
| Export Formats | 4 (TXT, JSON, MD, Hybrid) | 2 formats | 2 formats | JSON primary |
| Context Preservation | ✅ Advanced (imports, dependencies, comments) | ⚠️ Basic | ⚠️ Basic | ✅ Good |
| Syntax Highlighting | ✅ Markdown and language detection | ❌ Plain text only | ❌ Plain text only | ❌ Plain text only |
| Legacy Code Support | ✅ COBOL, FORTRAN, Pascal, Ada | ❌ Modern only | ❌ Modern only | ❌ Modern only |
| Document Processing | ✅ PDF, Word, Excel, RTF, ODT | ✅ PDF, DOCX | ✅ PDF, DOCX | ✅ Excellent |
| Index Generation | ✅ JSON/MD/TXT with metadata | ⚠️ Manual implementation | ⚠️ Manual implementation | ✅ Yes |
| Data Security | ✅ Never leaves your machine | ⚠️ Local but requires dependencies | ⚠️ Local but requires dependencies | ❌ Cloud processing |
| Technical Skill Required | None (GUI on Windows) | High (Python developer) | High (Python developer) | Low (API user) |
| Pricing Model | One-time perpetual | Free (open source) | Free (open source) | $25-$500+/month |
| Cost for 50k files | $799 one-time | Free (but dev cost) | Free (but dev cost) | ~$3,000-6,000/year |
| Best For | Non-coders, enterprises, secure sites | Python developers | Python developers | ⚠️ Teams comfortable with cloud vulnerabilities |

 


 

Get Started Today For FREE!

I am happy to be able to support Teachers, Students and Developers. I want Data Chunker Pro to be available to anyone wanting to learn and grow!

Every free evaluation comes with 100% functionality, and never expires!

Free Trial Here

 


Plans & Pricing

Every tier gets all 800+ formats, all chunking methods, all features.

You’re paying for scale, not capabilities.

Start free. Upgrade when you need it. That’s it.

WINDOWS BASED ONLY – Windows 7, 8, 10 and 11 supported

FREE

$0

Evaluation Edition
Free Forever

50 exports per session
10MB size limit
Unlimited conversions forever
ALL 800+ formats unlocked
ALL chunking methods
ALL export formats
1 machine license

PERSONAL

$295

One-time payment

1,000 exports per session
500MB size limit
1 machine license
For Non-Commercial Use
Priority email support
All formats & features
Optional: $145/yr maintenance

BUSINESS - MOST POPULAR

$799

One-time payment

50,000 chunks per session
5GB size limit
5 machine licenses
Priority email support
Commercial use
All formats & features
Optional: $350/yr maintenance

ENTERPRISE

Custom

Starting at $1,995

Unlimited exports
Custom license (unlimited machines)
Dedicated support contact
Annual maintenance for continued support
Priority feature requests

Q: Why one-time pricing instead of subscription?
A: Because I remember when buying something meant you owned it.

 

Common Questions

Q: How much is a ‘chunk’?
A: 1 chunk ≈ 1 file created (typically 50-75 lines of code). 50 chunks ≈ processing a small GitHub repo or a small library.
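
If you want a rough idea of how many chunks a project will use before you pick a tier, the rule of thumb above can be turned into quick arithmetic. This Python sketch assumes every file is split into fixed-size blocks of lines, with a default of 60 lines chosen from inside the quoted 50-75 range. Function-, class-, or token-based chunking will give different counts, so treat the result as a ballpark. The name `estimate_chunks` is hypothetical.

```python
import math


def estimate_chunks(line_counts: list[int], lines_per_chunk: int = 60) -> int:
    """Estimate total chunks if every file is split into fixed-size line blocks."""
    if lines_per_chunk <= 0:
        raise ValueError("lines_per_chunk must be positive")
    # Each non-empty file needs ceil(lines / lines_per_chunk) chunks.
    return sum(math.ceil(n / lines_per_chunk) for n in line_counts if n > 0)
```

For example, four files of 120, 61, 0, and 10 lines would come to 2 + 2 + 0 + 1 = 5 chunks at 60 lines per chunk.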

Q: Are chunks shared between license numbers?
A: No! Chunks are per-session and per-machine. They reset!

Q: Can I really analyze .exe files without source code?
A: Yes. We extract assembly metadata, embedded strings, dependencies, and framework details. [See our sample files here]

Q: Do you have a phone number I can call for help?
A: We have chosen a better approach. You will receive direct email support from the creator (Kyle Gray, CEO and programmer), typically within 24 hours. For complex issues, you’ll get a personalized video walkthrough showing exactly how to solve your problem. We have been doing this for years, with most customers praising the innovation over traditional phone support.

Q: What happens if I don’t pay for maintenance?
A: Nothing changes! You keep using the software you paid for. It is yours for life! Maintenance is there for updates, upgrades and customer support. If you don’t need it, don’t pay for it!

Q: What happens if I need more activations?
A: Simply upgrade to the next tier for more activations, or contact us at support@graytechnical.com for more seats on your existing license.

Q: Can I upgrade later?
A: Absolutely! Upgrade anytime by paying the difference. Contact support@graytechnical.com for upgrade assistance. 

Q: Can I transfer licenses between machines?
A: Yes! Deactivate on one machine and activate on another as needed within your license count.


Still Deciding?

Download the Free Trial and follow along with our Quick Start Guide! It’ll get you up and chunking in no time!

Quick Start Guide


Data Chunker Pro transforms your codebase into AI-ready, well-organized chunks perfect for modern development workflows, AI assistance, and comprehensive code analysis.

Data Chunker Pro doesn’t just chunk data or split files like other ‘chunking’ applications; it organizes, preps, and indexes your files individually in a way that LLMs and other AI systems can read. Your projects become fully indexed, context-rich RAG data, so your AI can Actually Understand and USE your code, documents, or legacy mainframe data.

For models with limited resources, add this instruction to your document processing prompt to improve understanding.

Sample Indexing Prompt - Open WebUI with Ollama

With uploaded documents ALWAYS look for and read the “index.json” file first. If “index.json” is not available, then look for “index.md” or “index.txt” instead. These files will contain the project structure and chunk metadata.

Use the index to understand the database, documents, or codebase organization before answering questions.

The index contains chunk_id, source_file, file_name, and content_preview fields to help you locate relevant code.

If no “index.json”, “index.md” or “index.txt” is found, analyze the documents directly but mention that an index would improve analysis.
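
The same index-first lookup described in the prompt can also be done in a script. The Python sketch below assumes, for illustration only, that "index.json" holds a JSON array of objects carrying the chunk_id, source_file, file_name, and content_preview fields named above; the real export layout may differ. The helper names `find_index` and `search_index` are hypothetical.

```python
import json
from pathlib import Path
from typing import Optional

# Preference order taken from the prompt above: JSON first, then MD, then TXT.
INDEX_NAMES = ("index.json", "index.md", "index.txt")


def find_index(export_dir: Path) -> Optional[Path]:
    """Return the first index file present, preferring index.json."""
    for name in INDEX_NAMES:
        candidate = export_dir / name
        if candidate.is_file():
            return candidate
    return None


def search_index(entries: list[dict], term: str) -> list[dict]:
    """Return entries whose file name or content preview mentions the term."""
    term = term.lower()
    return [
        entry for entry in entries
        if term in entry.get("file_name", "").lower()
        or term in entry.get("content_preview", "").lower()
    ]
```

A typical use would be to call `find_index` on the export folder, load the result with `json.loads(path.read_text())` when it is the JSON index, and pass the entries to `search_index` to find the chunk_id values worth opening.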

 

 


 

Complete File Support for over 800 file formats!

From everyday Word documents to complex codebases, CAD drawings, and development configurations; Data Chunker Pro handles all the formats real professionals actually use every day.

Click Here to See A Full Compatibility List For Formats

 

Programming Languages (150+ Formats)

Core Enterprise Languages
.NET Ecosystem: .vb, .cs, .fs, .vbproj, .csproj, .sln, .config, .resx, .xaml
Java Ecosystem: .java, .jsp, .jspx, .gradle, .maven, .properties, .xml (Spring)
C/C++ Family: .c, .cpp, .cxx, .cc, .h, .hpp, .hxx, .cmake, .makefile
Python Ecosystem: .py, .pyx, .pyi, .ipynb, requirements.txt, setup.py, pyproject.toml

Modern Web Technologies
JavaScript/TypeScript: .js, .ts, .jsx, .tsx, .mjs, .cjs, .vue, .svelte
Web Markup: .html, .htm, .xhtml, .xml, .svg
Stylesheets: .css, .scss, .sass, .less, .stylus
Web Frameworks: .angular, .react, .vue, .ember, .backbone

Mobile & Cross-Platform
iOS/macOS: .swift, .m, .mm, .h (Objective-C), .xib, .storyboard
Android: .java, .kt (Kotlin), .xml (Android), .gradle
Cross-Platform: .dart (Flutter), .xamarin, .cordova, .ionic

Systems & Low-Level
Assembly: .asm, .s, .nasm, .masm, .gas
Systems: .c, .cpp, .rust (.rs), .go, .zig, .nim
Embedded: .ino (Arduino), .hex, .bin (when text-readable)

Scripting & Automation
Shell Scripts: .sh, .bash, .zsh, .fish, .csh, .ksh
Windows: .bat, .cmd, .ps1, .psm1, .psd1 (PowerShell)
Cross-Platform: .py, .rb, .pl, .lua, .tcl

Database & Query Languages
SQL Variants: .sql, .mysql, .pgsql, .sqlite, .tsql, .plsql
NoSQL: .mongodb, .cql (Cassandra), .cypher (Neo4j)
Data Processing: .hql (Hive), .pig, .spark

Legacy & Specialized Languages
Enterprise Legacy: .cbl, .cob (COBOL), .for, .f90, .f95 (FORTRAN)
Scientific: .r, .R, .m (MATLAB), .mathematica, .maple, .sage
Functional: .hs (Haskell), .ml, .fs (F#), .clj (Clojure), .lisp, .scheme
Specialized: .vhdl, .verilog, .ladder (PLC), .st (Structured Text)

Configuration & DevOps (45+ Formats)

Container & Orchestration
Docker: Dockerfile, .dockerignore, docker-compose.yml
Kubernetes: .yaml, .yml (K8s manifests), .helm
Container Registries: .registry, .harbor

CI/CD & Automation
GitHub: .github/workflows/*.yml, .github/actions/*
GitLab: .gitlab-ci.yml, .gitlab/*
Jenkins: Jenkinsfile, .jenkins
Azure DevOps: azure-pipelines.yml, .azure/*
AWS: .aws/*, cloudformation.yml, .sam
Terraform: .tf, .tfvars, .tfstate

Build & Package Management
Node.js: package.json, package-lock.json, .npmrc, yarn.lock
Python: requirements.txt, setup.py, pyproject.toml, Pipfile
Java: pom.xml, build.gradle, gradle.properties
Ruby: Gemfile, Gemfile.lock, .gemspec
PHP: composer.json, composer.lock
Go: go.mod, go.sum
Rust: Cargo.toml, Cargo.lock

Server & Application Configuration
Web Servers: .htaccess, nginx.conf, apache.conf, httpd.conf
Application Servers: web.config, app.config, appsettings.json
Database: .my.cnf, postgresql.conf, redis.conf, mongodb.conf
Environment: .env, .env.local, .env.production, .dotenv

Documentation & Content (35+ Formats)

Technical Documentation
Markdown Variants: .md, .markdown, .mdown, .mkd, .mdx
Wiki Formats: .wiki, .textile, .creole
Documentation Tools: .rst (reStructuredText), .adoc (AsciiDoc)
API Documentation: .swagger, .openapi, .raml, .blueprint

Rich Text & Office Documents
Microsoft Office: .docx, .doc, .xlsx, .xls, .pptx, .ppt
OpenDocument: .odt, .ods, .odp
PDF Processing: .pdf (text extraction)
Rich Text: .rtf, .pages (when exportable)

Specialized Documentation
LaTeX: .tex, .latex, .bib, .cls, .sty
Technical Writing: .dita, .docbook
Help Systems: .chm, .hlp (when extractable)

Data & Scientific Formats (40+ Formats)

Structured Data
Standard Formats: .json, .xml, .yaml, .yml, .toml
Tabular Data: .csv, .tsv, .psv, .ssv, .tab
Database Exports: .sql, .dump, .backup (text-based)

Scientific & Research Data
Scientific Computing: .hdf5, .h5, .netcdf, .nc, .cdf
Statistics: .sas, .spss, .stata, .r, .rdata
Bioinformatics: .fasta, .fastq, .sam, .vcf, .gff

Geospatial & GIS
Vector Data: .shp (metadata), .kml, .kmz, .gpx, .geojson
Coordinate Systems: .prj, .wkt, .proj4
Mapping: .osm, .pbf (when text-extractable)

Most Common File Types (30+ Formats)

Most Common File Types
Text & Code: .txt, .py, .js, .html, .css, .xml, .json
Documents: .docx, .pdf, .md, .csv, .xlsx
Configuration: .env, .conf, .yaml, .ini

Engineering & CAD (25+ Formats)

Computer-Aided Design
2D/3D Models: .dwg, .dxf, .step, .iges, .obj, .stl, .ply
Engineering: .catpart, .catproduct, .solidworks, .inventor
Architecture: .ifc, .rvt (when extractable)

Manufacturing & Production
CNC/Machining: .gcode, .nc, .cnc, .tap
3D Printing: .gcode, .x3g, .s3d
CAM: .cam, .mill, .lathe

Media & Graphics (25+ Formats)

Vector Graphics
Scalable Formats: .svg, .ai (when text-based), .eps
Technical Drawings: .dwg, .dxf, .plt, .hpgl

Font & Typography
Font Files: .ttf, .otf, .woff, .woff2 (metadata extraction)
Font Metrics: .afm, .pfm, .tfm

Image Metadata
Metadata Extraction: .jpg, .jpeg, .png, .tiff, .tif, .bmp, .gif
Raw Formats: .cr2, .nef, .arw (metadata only)

System & Log Files (30+ Formats)

System Logs
Standard Logs: .log, .txt, .out, .err
System Specific: .syslog, .eventlog, .audit
Application Logs: .access, .error, .debug, .trace

Configuration Files
System Config: .conf, .cfg, .ini, .properties, .plist
Registry: .reg, .pol (when text-extractable)
Environment: .profile, .bashrc, .zshrc, .vimrc

Network & Security
Network Config: .hosts, .resolv.conf, .network
Security: .pem, .crt, .key (headers/metadata only)
Firewall: .rules, .iptables, .pf.conf

 


Local Processing Only

Data Chunker Pro processes all files locally on your Windows machine. No files are uploaded, transmitted, or stored on external servers. This design enables organizations handling sensitive data (healthcare, legal, educational) to prepare information for AI systems while maintaining data sovereignty.