# PDFParser — Full LLM Context

PDFParser is a PDF parsing service for developers and teams. It extracts
text and images, converts PDF documents to Markdown (PDF to MD), and
returns clean structured data as JSON.

## What PDFParser does

PDFParser turns any PDF into structured data. It is built around four
capabilities:

1. **Text extraction** — Plain text from every page, in reading order, ready
   for indexing or RAG (retrieval-augmented generation) pipelines.
2. **Image extraction** — Every image embedded in a PDF, returned as
   base64-encoded PNG. Useful for "image from PDF" workflows.
3. **PDF to MD (Markdown)** — Convert PDF to Markdown with headings, lists,
   and inline image references preserved.
4. **Data structuring** — Output is a JSON array, one element per page, with
   raw text, text with image markers, and images.

## API contract

`POST /v1/parse` (mocked in the current frontend) accepts a PDF file and
returns the following JSON shape:

```json
[
  {
    "pageNumber": 0,
    "rawText": "string",
    "textWithImage": "string with [image:1] markers",
    "images": [
      { "imageId": 1, "imageBase64": "base64-png-data" }
    ]
  }
]
```
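As a rough illustration of consuming this shape, the sketch below types the response and checks it field by field. The names `PageImage`, `Page`, and `load_pages` are hypothetical helpers for this example, not part of PDFParser; only the JSON field names come from the contract above.

```python
import json
from typing import Any, TypedDict


class PageImage(TypedDict):
    imageId: int
    imageBase64: str


class Page(TypedDict):
    pageNumber: int
    rawText: str
    textWithImage: str
    images: list[PageImage]


# Field names and types taken from the documented JSON shape.
_PAGE_FIELDS = (
    ("pageNumber", int),
    ("rawText", str),
    ("textWithImage", str),
    ("images", list),
)


def load_pages(body: str) -> list[Page]:
    """Parse a response body and check it against the documented shape."""
    data: Any = json.loads(body)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array with one element per page")
    for page in data:
        if not isinstance(page, dict):
            raise ValueError("each page must be a JSON object")
        for key, kind in _PAGE_FIELDS:
            if not isinstance(page.get(key), kind):
                raise ValueError(f"page field {key!r} is missing or not {kind.__name__}")
        for image in page["images"]:
            if (
                not isinstance(image, dict)
                or not isinstance(image.get("imageId"), int)
                or not isinstance(image.get("imageBase64"), str)
            ):
                raise ValueError("malformed entry in 'images'")
    return data
```

Validating up front keeps downstream code free of per-field checks; a stricter client could also reject unknown keys.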

## How it works (3 steps)

1. **Upload PDF** — Drop a PDF file or use the API.
2. **We parse it** — The engine extracts text, images, and structured fields.
3. **Get structured data** — Receive clean JSON ready to use in your app.
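The upload step could look roughly like the sketch below, which builds (but does not send) a multipart `POST /v1/parse` request using only the standard library. Several details are assumptions, since this document does not specify them: the base URL `https://api.example.com`, the form field name `file`, and the absence of authentication headers. `build_parse_request` is a hypothetical helper name.

```python
import urllib.request
import uuid


def build_parse_request(
    pdf_bytes: bytes,
    filename: str,
    base_url: str = "https://api.example.com",  # assumption: placeholder host
) -> urllib.request.Request:
    """Build a multipart/form-data POST to /v1/parse without sending it."""
    boundary = uuid.uuid4().hex
    body = b"".join(
        [
            f"--{boundary}\r\n".encode(),
            # Assumption: the form field is named "file".
            f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
            b"Content-Type: application/pdf\r\n\r\n",
            pdf_bytes,
            f"\r\n--{boundary}--\r\n".encode(),
        ]
    )
    return urllib.request.Request(
        f"{base_url}/v1/parse",
        data=body,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
```

Passing the result to `urllib.request.urlopen` would send it. Because the endpoint is described as mocked in the current frontend, treat this as a shape for a future client rather than a working integration.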

## Result views in the web app

After parsing, three view modes are available:

- `raw_text` — Plain extracted text.
- `text_with_image` — Text with inline image references.
- `json` — Full JSON payload, formatted for copy/paste.
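The three view modes map naturally onto the response fields. The sketch below shows one way a client might reproduce them from a parsed response; `render_view` is a hypothetical helper, and joining pages with a blank line is an assumption about presentation, not documented behavior of the web app.

```python
import json


def render_view(pages: list[dict], mode: str) -> str:
    """Render a parsed response in one of the three documented view modes."""
    if mode == "raw_text":
        # Assumption: pages are separated by a blank line.
        return "\n\n".join(page["rawText"] for page in pages)
    if mode == "text_with_image":
        return "\n\n".join(page["textWithImage"] for page in pages)
    if mode == "json":
        return json.dumps(pages, indent=2)
    raise ValueError(f"unknown view mode: {mode!r}")
```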

## Plans

- **Free** — $0 / month. 100 pages / month. Raw text, text + image
  extraction, and JSON output. Available now.
- **Pro** — $29 / month. Up to 10,000 pages / month, tables & forms
  detection, higher API limits, priority support. Coming soon.
- **Enterprise** — Custom pricing. Unlimited pages, SLA, dedicated support,
  SSO, audit logs, on-premise deployment. Coming soon.

## Common questions

**Can PDFParser extract images from a PDF?** Yes. Every image on every page
is returned as a base64 PNG in the `images` array of the response.
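A minimal sketch of working with the `images` array follows. It decodes each entry to PNG bytes and lists the image ids referenced in `textWithImage`. The helper names are hypothetical, and the sketch assumes the number in an `[image:N]` marker corresponds to an `imageId` on the same page, which the contract suggests but does not state outright.

```python
import base64
import re

# Marker format taken from the example response: "[image:1]".
IMAGE_MARKER = re.compile(r"\[image:(\d+)\]")


def decode_images(page: dict) -> dict[int, bytes]:
    """Map each imageId on a page to its decoded PNG bytes."""
    return {
        image["imageId"]: base64.b64decode(image["imageBase64"])
        for image in page["images"]
    }


def referenced_image_ids(page: dict) -> list[int]:
    """Return image ids in the order their markers appear in the text."""
    return [int(number) for number in IMAGE_MARKER.findall(page["textWithImage"])]
```

The decoded bytes can be written straight to a `.png` file, for example with `pathlib.Path("image-1.png").write_bytes(data)`.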

**Can PDFParser convert PDF to Markdown?** Yes. PDF to MD conversion
preserves headings, lists, and inline image references.

**Does PDFParser return structured data?** Yes. Output is a JSON array with
one entry per page. The shape is documented under "API contract" above.

**Is there a free tier?** Yes. The Free plan allows 100 pages per month at
no cost.

**What file formats are supported?** PDF only at this time.

## Keywords

PDF parsing, PDF parser, PDF to MD, PDF to Markdown, text extract, text
extraction, image extract, image extraction, image from PDF, images from
PDF, data structuring, structured data, PDF API, parse PDF online, extract
data from PDF, RAG, document AI.