
Extract structured metadata, text, and citations from academic PDFs using machine learning. GROBID converts unstructured PDFs into TEI XML that encodes the document's recognized logical structure. Use when: (1) extracting metadata (title, authors, affiliations) from research papers, (2) identifying document sections and structure, (3) parsing reference lists and citations, (4) building indexes or knowledge bases from academic PDFs, or (5) batch processing directories of papers. Not suitable for: scanned, image-only PDFs that have not been through OCR, non-academic documents with unusual layouts, extracting the content of tables and figures, or simple text extraction tasks.
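
As a rough illustration of working with the TEI XML described above, the sketch below pulls the title, author names, and reference titles out of a TEI document using only the Python standard library. The `SAMPLE_TEI` string is hand-written for this example and is far simpler than real GROBID output; the `parse_tei` helper and the element paths it relies on are assumptions about a typical TEI layout, so adjust them to the files you actually get back.

```python
import xml.etree.ElementTree as ET

# TEI elements live in this XML namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hand-written sample for illustration only (not actual GROBID output).
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title level="a" type="main">An Example Paper</title>
      </titleStmt>
      <sourceDesc>
        <biblStruct>
          <analytic>
            <author>
              <persName><forename type="first">Ada</forename><surname>Lovelace</surname></persName>
            </author>
            <author>
              <persName><forename type="first">Alan</forename><surname>Turing</surname></persName>
            </author>
          </analytic>
        </biblStruct>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <back>
      <div type="references">
        <listBibl>
          <biblStruct xml:id="b0">
            <analytic><title level="a" type="main">A Cited Work</title></analytic>
          </biblStruct>
        </listBibl>
      </div>
    </back>
  </text>
</TEI>
"""


def _text(element):
    """Return the stripped text of an element, or '' if it is missing."""
    if element is None or element.text is None:
        return ""
    return element.text.strip()


def parse_tei(tei_xml: str) -> dict:
    """Hypothetical helper: extract title, authors, and reference titles."""
    root = ET.fromstring(tei_xml)

    title = _text(root.find(".//tei:titleStmt/tei:title", TEI_NS))

    # Restrict to the header's sourceDesc so cited authors are not included.
    authors = []
    for pers in root.findall(".//tei:sourceDesc//tei:author/tei:persName", TEI_NS):
        parts = [_text(pers.find("tei:forename", TEI_NS)),
                 _text(pers.find("tei:surname", TEI_NS))]
        name = " ".join(p for p in parts if p)
        if name:
            authors.append(name)

    # Each bibliography entry is a biblStruct inside listBibl.
    references = []
    for bibl in root.findall(".//tei:listBibl/tei:biblStruct", TEI_NS):
        ref_title = _text(bibl.find(".//tei:title", TEI_NS))
        if ref_title:
            references.append(ref_title)

    return {"title": title, "authors": authors, "references": references}


if __name__ == "__main__":
    print(parse_tei(SAMPLE_TEI))
```

For batch processing, the same helper could be applied to each TEI file in a directory and the resulting dictionaries collected into an index. Namespaced lookups are used throughout because un-prefixed paths would not match elements in a TEI document.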