# PDFParser

> PDFParser is a PDF parsing service that turns PDF documents into structured
> data. It performs text extraction, image extraction, PDF-to-Markdown
> (PDF to MD) conversion, and data structuring, and is available as a web app
> and a simple HTTP API.

PDFParser accepts a PDF upload and returns a JSON array. Each element
represents a page with the following fields:

- `pageNumber` — zero-based index of the page.
- `rawText` — plain text extracted from the page.
- `textWithImage` — text with inline image references (e.g. `[image:1]`).
- `images` — array of objects `{ imageId, imageBase64 }` for every image on
  the page.
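
As an illustration of the response shape described above, the following Python sketch parses such a JSON array into typed objects and decodes each image from base64. The class names, the `parse_pages` helper, and the sample payload are hypothetical and written by hand from the field list; they are not part of PDFParser. The sketch also assumes `imageId` can be treated as a string, which the field list does not specify.

```python
import base64
import json
from dataclasses import dataclass


@dataclass
class PageImage:
    image_id: str   # assumption: imageId is coerced to a string
    data: bytes     # raw bytes decoded from imageBase64


@dataclass
class Page:
    page_number: int        # zero-based, per the field list
    raw_text: str
    text_with_image: str
    images: list


def parse_pages(response_body: str) -> list:
    """Parse a JSON array of page objects into Page instances."""
    pages = []
    for item in json.loads(response_body):
        images = [
            PageImage(
                image_id=str(img["imageId"]),
                data=base64.b64decode(img["imageBase64"]),
            )
            for img in item.get("images", [])
        ]
        pages.append(
            Page(
                page_number=item["pageNumber"],
                raw_text=item["rawText"],
                text_with_image=item["textWithImage"],
                images=images,
            )
        )
    return pages


# Hand-written sample payload; the image bytes are a placeholder, not a real PNG.
sample = json.dumps([
    {
        "pageNumber": 0,
        "rawText": "Quarterly report",
        "textWithImage": "Quarterly report [image:1]",
        "images": [
            {
                "imageId": 1,
                "imageBase64": base64.b64encode(b"fake-png-bytes").decode("ascii"),
            }
        ],
    }
])

pages = parse_pages(sample)
print(pages[0].page_number, pages[0].images[0].image_id)
```

Decoding the images up front keeps later steps, such as writing them to disk or passing them to an image library, free of base64 handling.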

## Core capabilities

- **Text extraction** — clean, ordered text from any PDF.
- **Image extraction** — every image on the page returned as base64 PNG.
- **PDF to MD** — convert a PDF to Markdown with headings, lists, and inline
  image references preserved.
- **Data structuring** — turn unstructured PDFs into structured JSON ready
  for indexing, search, RAG pipelines, or analytics.
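
To make the indexing and RAG use case concrete, here is a minimal Python sketch that flattens a parsed page array into one record per page. The `pages_to_records` helper, the record layout, and the `source#page=N` identifier scheme are assumptions chosen for illustration; PDFParser does not prescribe any of them.

```python
def pages_to_records(pages: list, source: str) -> list:
    """Turn page dicts (pageNumber, rawText, images) into flat index records.

    Pages whose rawText is empty or whitespace-only are skipped, since they
    carry nothing to index.
    """
    records = []
    for page in pages:
        text = page.get("rawText", "").strip()
        if not text:
            continue
        records.append(
            {
                # Hypothetical ID scheme: source name plus zero-based page number.
                "id": f"{source}#page={page['pageNumber']}",
                "text": text,
                "image_count": len(page.get("images", [])),
            }
        )
    return records


# Hand-written sample pages in the documented shape.
sample_pages = [
    {"pageNumber": 0, "rawText": "Introduction", "textWithImage": "Introduction", "images": []},
    {"pageNumber": 1, "rawText": "   ", "textWithImage": "[image:1]",
     "images": [{"imageId": 1, "imageBase64": ""}]},
]

for record in pages_to_records(sample_pages, "report.pdf"):
    print(record)
```

Records in this shape can be passed to a search index or an embedding step; keeping the page number in the identifier lets a retrieved result be traced back to its page.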

## Pricing

- **Free** — 100 pages / month. Available now.
- **Pro** — Coming soon.
- **Enterprise** — Coming soon.

## Pages

- [Home](/): Upload a PDF and parse it in the browser.
- [Pricing](/pricing): Plans and limits.
- [API](/api): Programmatic access (coming soon).
- [About](/about): Background on PDFParser (coming soon).
- [Sign in](/login): Existing user login.
- [Sign up](/register): Create a free account.

## Resources

- [Full LLM context](/llms-full.txt)
- [Sitemap](/sitemap.xml)