We’re excited to announce two new additions to the GreenPT API: our Scraper API and OCR API. These tools help developers extract, process, and structure content at scale while keeping the privacy-first, sustainable approach that defines GreenPT.

Whether you’re building RAG pipelines, automating document workflows, or extracting web data for AI applications, these APIs provide the foundation you need.

🌐 Scraper API: Turn Any Webpage into Structured Data

The web holds vast amounts of valuable information, but extracting it reliably has always been a challenge. Our new Scraper API changes that.

What You Can Do

Multiple Output Formats: Get your data exactly how you need it. Extract content as clean Markdown, structured HTML, JSON objects, full-page screenshots, or raw text. One API, multiple possibilities.
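As a rough illustration of asking for several formats in one call, here is a minimal sketch that only builds a request body. The field names "url" and "formats" and the format identifiers are assumptions made for this example, not the documented schema; the reference at docs.greenpt.ai is authoritative.

```python
import json

# The five output formats named in this post. The identifier strings
# themselves are assumptions made for this sketch.
SUPPORTED_FORMATS = {"markdown", "html", "json", "screenshot", "text"}


def build_scrape_request(url: str, formats: list[str]) -> str:
    """Return a JSON body for a hypothetical scrape request."""
    unknown = set(formats) - SUPPORTED_FORMATS
    if unknown:
        raise ValueError(f"unsupported formats: {sorted(unknown)}")
    # "url" and "formats" are illustrative field names, not the real schema.
    return json.dumps({"url": url, "formats": formats})


body = build_scrape_request("https://example.com", ["markdown", "screenshot"])
print(body)
```

Validating the format list on the client catches typos before a request is ever sent.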

Smart Caching: Speed matters. Results are cached, with a default lifetime of 2 days, so repeated requests for the same page return quickly while the data stays reasonably fresh.
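To make the freshness trade-off concrete, here is a minimal sketch of a time-based cache with a 2-day default. It illustrates the general technique only and is not GreenPT's implementation; the class name, method names, and the injectable clock are all hypothetical.

```python
import time
from typing import Callable, Optional

TWO_DAYS_SECONDS = 2 * 24 * 60 * 60  # mirrors the 2-day default mentioned above


class TTLCache:
    """Keeps scraped content per URL and drops entries older than max_age."""

    def __init__(
        self,
        max_age: float = TWO_DAYS_SECONDS,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self.max_age = max_age
        self.clock = clock  # injectable so the expiry logic can be tested
        self._entries: dict[str, tuple[float, str]] = {}

    def put(self, url: str, content: str) -> None:
        self._entries[url] = (self.clock(), content)

    def get(self, url: str) -> Optional[str]:
        entry = self._entries.get(url)
        if entry is None:
            return None
        stored_at, content = entry
        if self.clock() - stored_at > self.max_age:
            # Stale entry: evict it so the caller fetches a fresh copy.
            del self._entries[url]
            return None
        return content


# Usage with a fake clock, so no real waiting is needed.
now = [0.0]
cache = TTLCache(clock=lambda: now[0])
cache.put("https://example.com", "# Example Domain")
fresh = cache.get("https://example.com")   # still within 2 days
now[0] = TWO_DAYS_SECONDS + 1
stale = cache.get("https://example.com")   # older than 2 days
```

A shorter max_age trades speed for freshness; a longer one does the opposite.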

Full Browser Automation: Some data only appears after interaction. The Scraper API can click buttons, scroll through infinite feeds, fill out forms, and execute JavaScript, so it can reach dynamic content that static HTML scrapers miss.
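Browser automation of this kind is usually described as an ordered list of steps. The sketch below shows one way such a list could be modelled and serialized. The action names, the "selector" and "value" fields, and the payload shape are assumptions for illustration; check docs.greenpt.ai for the real format.

```python
from dataclasses import asdict, dataclass

# Hypothetical action vocabulary based on the capabilities listed above.
ALLOWED_ACTIONS = {"click", "scroll", "fill", "execute_js"}


@dataclass
class Action:
    type: str
    selector: str = ""  # CSS selector, used by click and fill
    value: str = ""     # text to type, or JavaScript source to run

    def __post_init__(self) -> None:
        if self.type not in ALLOWED_ACTIONS:
            raise ValueError(f"unknown action type: {self.type!r}")


def to_payload(actions: list[Action]) -> list[dict[str, str]]:
    """Serialize actions in order, omitting fields that were left empty."""
    return [{k: v for k, v in asdict(a).items() if v} for a in actions]


steps = [
    Action("fill", selector="#search", value="solar panels"),
    Action("click", selector="button[type=submit]"),
    Action("scroll"),
    Action("execute_js", value="return document.title"),
]
payload = to_payload(steps)
```

Order matters here: the form is filled before the button is clicked, and scrolling happens only after the results have loaded.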

LLM-Powered JSON Extraction: Define a custom schema, and let our AI extract exactly the structured data you need. No more regex nightmares or brittle parsing logic.
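A schema for this kind of extraction might look like the sketch below, which assumes a JSON Schema style definition (the post does not specify the schema language). Because LLM output can drift from what was requested, the example also includes a small check of a returned object against the schema. The checker covers only required fields and three primitive types; it is not a full JSON Schema validator.

```python
# Hypothetical schema for a product page, written in JSON Schema style.
PRODUCT_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price": {"type": "number"},
        "in_stock": {"type": "boolean"},
    },
    "required": ["name", "price"],
}

_PYTHON_TYPES = {"string": str, "number": (int, float), "boolean": bool}


def validation_errors(data: dict, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the object passed."""
    errors = []
    for key in schema.get("required", []):
        if key not in data:
            errors.append(f"missing required field: {key}")
    for key, spec in schema.get("properties", {}).items():
        if key not in data:
            continue
        value = data[key]
        expected = _PYTHON_TYPES[spec["type"]]
        # bool is a subclass of int in Python, so reject it for "number".
        bool_as_number = spec["type"] == "number" and isinstance(value, bool)
        if bool_as_number or not isinstance(value, expected):
            errors.append(f"{key}: expected {spec['type']}")
    return errors


ok = validation_errors({"name": "Mug", "price": 12.5}, PRODUCT_SCHEMA)
bad = validation_errors({"name": "Mug", "price": "12.5"}, PRODUCT_SCHEMA)
```

Checking the result before it enters a pipeline keeps a malformed extraction from propagating silently.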

Brand Identity Extraction: Building competitive analysis tools or design systems? Extract brand colors, fonts, and typography automatically from any website.

📄 OCR API: Extract Text from Any Document

Documents come in countless formats. Our OCR API handles them all, turning images, PDFs, and office files into structured, searchable text.

Supported Formats

Process over 20 file types including PDF, DOCX, PPTX, XLSX, PNG, JPG, TIFF, and more. If your business deals with documents, we can read them.

Multiple OCR Engines

Choose the right tool for your use case. Select from EasyOCR, Tesseract, or RapidOCR depending on your language requirements, accuracy needs, and processing speed preferences.

Advanced Extraction Features

Table Structure Recognition: Documents aren’t just text; they also contain structured data in tables. Our API preserves table layouts and offers two processing modes, one fast and one accurate.
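Once table layout is preserved, turning a table into records is straightforward. The sketch below assumes the table arrives as a Markdown pipe table, which is an assumption about the output rather than documented behavior, and it does not handle escaped pipe characters inside cells.

```python
def parse_markdown_table(markdown: str) -> list[dict[str, str]]:
    """Convert a simple Markdown pipe table into a list of row dictionaries."""
    lines = [ln.strip() for ln in markdown.strip().splitlines() if ln.strip()]

    def cells(line: str) -> list[str]:
        # Drop the outer pipes, then split on the inner ones.
        return [cell.strip() for cell in line.strip("|").split("|")]

    header = cells(lines[0])
    # lines[1] is the separator row (|---|---|), so data starts at lines[2].
    return [dict(zip(header, cells(line))) for line in lines[2:]]


sample_table = """
| Item | Qty |
|------|-----|
| Bolt | 40  |
| Nut  | 25  |
"""
rows = parse_markdown_table(sample_table)
```

All values come back as strings, so numeric columns still need an explicit conversion step.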

Formula Recognition: Working with scientific or technical documents? Extract mathematical formulas with LaTeX output, ready for rendering or further processing.
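For further processing, the formulas usually need to be separated from the surrounding text. This sketch assumes formulas are embedded with the common dollar-sign delimiters ($...$ inline, $$...$$ display); the actual delimiters used in the OCR output are not stated in this post.

```python
import re

# Display math: $$ ... $$, which may span several lines.
DISPLAY_MATH = re.compile(r"\$\$(.+?)\$\$", re.DOTALL)
# Inline math: a single $ on each side, never adjacent to another $.
INLINE_MATH = re.compile(r"(?<!\$)\$(?!\$)(.+?)(?<!\$)\$(?!\$)")


def extract_formulas(text: str) -> dict[str, list[str]]:
    """Collect the LaTeX source of display and inline formulas in text."""
    return {
        "display": [m.strip() for m in DISPLAY_MATH.findall(text)],
        "inline": [m.strip() for m in INLINE_MATH.findall(text)],
    }


sample_text = r"""Mass-energy equivalence is $E = mc^2$.

$$\int_0^1 x\,dx = \frac{1}{2}$$
"""
formulas = extract_formulas(sample_text)
```

Text that uses dollar signs for currency would produce false matches, so a real pipeline would need a stricter rule or a proper Markdown parser.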

Flexible Output: Get your extracted content as Markdown (perfect for LLM consumption), JSON (for structured processing), HTML (for web display), or plain text.
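The OCR options described above (engine, table mode, formula recognition, output format) can be gathered into one validated settings object. The sketch below is a hypothetical client-side model: the option names, the identifier strings, and the defaults are assumptions, not the documented request parameters.

```python
from dataclasses import asdict, dataclass

# Choices taken from this post; the identifier strings are assumed.
ENGINES = {"easyocr", "tesseract", "rapidocr"}
TABLE_MODES = {"fast", "accurate"}
OUTPUT_FORMATS = {"markdown", "json", "html", "text"}


@dataclass(frozen=True)
class OcrOptions:
    engine: str = "tesseract"        # default chosen arbitrarily for the sketch
    table_mode: str = "fast"
    formulas: bool = False           # whether to request LaTeX formula output
    output_format: str = "markdown"

    def __post_init__(self) -> None:
        checks = (
            ("engine", ENGINES),
            ("table_mode", TABLE_MODES),
            ("output_format", OUTPUT_FORMATS),
        )
        for name, allowed in checks:
            value = getattr(self, name)
            if value not in allowed:
                raise ValueError(
                    f"{name} must be one of {sorted(allowed)}, got {value!r}"
                )


options = OcrOptions(engine="easyocr", table_mode="accurate", formulas=True)
options_dict = asdict(options)
```

Keeping the options in one immutable object makes it easy to reuse the same settings across a batch of documents.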

Sustainable Infrastructure, Built for Europe

Like all GreenPT services, both APIs run on 100% renewable energy infrastructure hosted in the EU. This means:

  • Full GDPR compliance — your data stays in Europe
  • Transparent data processing — no hidden data retention or training on your content
  • Carbon-conscious computing — extract data without expanding your carbon footprint

Get Started Today

Both APIs are available now through the GreenPT platform. Check out our developer documentation at docs.greenpt.ai for complete API references, code examples, and integration guides.

Building the next generation of AI-powered applications? Do it sustainably with GreenPT.