AI Document Summaries Look Reliable, Until You Check the Details

Our testing found that the AI tools professionals trust most still miss errors, invent citations, and deliver inconsistent results.

Stéphane Turquay
Published: June 16, 2026

AI-powered document summaries promise to save time, and many professionals now trust them to analyze contracts, reports, and policies with minimal oversight. To test how well that trust holds up, Smallpdf asked four leading AI tools to summarize 40 business documents, each containing a deliberately planted contradiction, and surveyed more than 1,000 professionals about their experiences. The results revealed a significant gap between confidence in AI summaries and the accuracy professionals actually receive.

Key takeaways

AI document summaries can appear trustworthy even when they miss important details: 88% of professionals consider them accurate, though we found AI tools failed to catch a document's planted contradiction 54% of the time on average.
Across all four AI tools tested, 1 in 6 document summaries contained at least one citation to a section or page that doesn't exist.
66% of professionals favor ChatGPT, yet it hallucinated more than any other tool (16% of its citations pointed to something that didn't exist).
71% of professionals act on AI document summaries without checking the original file.
43% have caught AI inventing a number, date, or clause that was never in the document, and of those, 13% discovered it only after they had already acted on it.

When AI Reads the Document but Misses the Problem

AI tools can process large documents in seconds, but speed doesn't always translate into accuracy. Important details are often hidden in footnotes, conflicting clauses, or references that require careful review.

Bar chart showing ChatGPT, Gemini, Copilot, and ChatPDF letter grades for AI document accuracy.

Bar chart showing ChatGPT caught 85% of planted contradictions, the highest of four AI tools tested.

Certain types of errors repeatedly caused problems across nearly every platform. Categorization errors, where the same item appeared in multiple locations, caused issues in 75% of tests. References to sections or appendices that didn't exist appeared in 67% of tests, while footnotes that changed a body figure and eligibility or deadline contradictions each affected 63% of tests. Wrong percentages and rate mismatches created inaccuracies in 60% of tests.

Stacked bar chart showing how often four AI tools caught, partially noted, or missed document contradictions.

Among the tools tested, ChatGPT delivered the strongest performance when identifying contradictions, catching 85% of the planted issues. Gemini identified 62%, while Copilot and ChatPDF missed most contradictions altogether. The results suggest that some AI tools are much better than others at recognizing conflicts within the same document.

Bar chart showing ChatGPT had the highest hallucinated citation rate at 16% across four AI tools.

The findings also revealed that AI tools sometimes invented information entirely. Across the four tools tested, 1 in 6 document summaries (16%) contained at least one citation pointing to a section or page that did not exist. ChatGPT produced the most fabricated references; 79 of its 507 citations (16%) pointed to nonexistent sections. Copilot (5%), Gemini (2%), and ChatPDF (1%) had lower citation-error rates but still produced similar mistakes.
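
As a quick check of the arithmetic, the ChatGPT figure can be reproduced from the counts reported above. This is a minimal sketch; the function name is hypothetical and is not part of the study's tooling.

```python
def hallucination_rate(fabricated: int, total: int) -> float:
    """Percentage of citations that point to something not in the source."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100 * fabricated / total


# Counts reported in the article for ChatGPT: 79 of 507 citations.
chatgpt_rate = hallucination_rate(79, 507)
print(f"{chatgpt_rate:.1f}%")  # 15.6%, which rounds to the reported 16%
```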

Bar chart showing ChatGPT, Gemini, and Copilot were consistent 67% of the time; ChatPDF only 33%.

Consistency presented another challenge. ChatGPT, Gemini, and Copilot returned the same answer only about two-thirds of the time when identical tests were rerun. ChatPDF fared worse, producing matching results in only one-third of repeat tests, which made it the least consistent tool across our 12 reruns.
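
One way to express a consistency rate like the ones above is the share of reruns whose outcome matched the original run. The sketch below assumes each test is reduced to a single outcome label; the labels and the example pairs are hypothetical and are not the study's data.

```python
def consistency_rate(pairs: list[tuple[str, str]]) -> float:
    """Percentage of reruns whose outcome matched the original run."""
    if not pairs:
        raise ValueError("need at least one rerun")
    matches = sum(1 for original, rerun in pairs if original == rerun)
    return 100 * matches / len(pairs)


# Hypothetical (original run, rerun) outcomes for one tool.
example_pairs = [
    ("caught", "caught"),
    ("missed", "partial"),
    ("caught", "caught"),
]
print(f"{consistency_rate(example_pairs):.0f}%")  # 67%
```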

How Often Professionals Use AI for Document Work

AI has quickly become a routine part of workplace productivity. Across industries, professionals are relying on these tools to summarize, review, and analyze documents every day.

Callout box showing 48% of professionals use AI for document work daily.

ChatGPT was the most widely used platform for document-related tasks, with 66% of professionals reporting regular use. Gemini followed at 45%, while Copilot reached 34%. Despite its popularity, ChatGPT also generated the highest rate of hallucinated citations in our testing, a reminder that widespread adoption does not guarantee accuracy.

Daily AI use was especially common in several industries. Retail led the way at 64%, followed by information technology at 60%. Manufacturing workers reported daily use at 57%, while finance and consulting or professional services each reached 51%.

Stats showing 46% of professionals find AI summaries somewhat accurate and 42% very or extremely accurate.

Confidence in AI remains remarkably high. Nearly nine in ten professionals (88%) considered AI-generated document summaries accurate. However, the testing told a different story. Across all platforms, AI tools failed to identify planted contradictions 54% of the time on average. That disconnect suggests many professionals may place more trust in AI-generated outputs than current performance levels warrant.

Acting on AI Without Checking the Source

Many professionals use AI summaries to save time, especially when reviewing lengthy documents. However, relying on summaries without verifying the source can create significant risks.

Callout box showing 71% of professionals acted on AI document summaries without checking the original source.

More than two-thirds of professionals (71%) reported acting on AI-generated document summaries without checking the original file. While that approach can speed up decision-making, it leaves users vulnerable when AI misses key details or introduces inaccurate information.

Bar chart showing 63% of professionals caught AI fabrications before acting on them, 13% only after.

The consequences are already affecting workplaces. Forty-three percent of professionals said they had caught AI inventing a number, date, or clause that was never present in the document. Among those who encountered fabricated information, 13% discovered the error only after they had already acted on it.

These mistakes sometimes led to measurable business consequences. Fourteen percent experienced a missed deadline or project delay after relying on inaccurate AI output. Another 12% had to correct or retract work with a client or external party, while 10% reported a financial error or loss connected to fabricated or inaccurate information.

Trust Requires Verification

AI tools have become valuable workplace assistants, helping professionals process documents faster than ever before. Yet the findings suggest that speed and convenience can create a false sense of certainty when important details are overlooked or fabricated.

While some platforms performed better than others, every tool tested missed contradictions, produced hallucinated citations, or delivered inconsistent results under certain conditions. At the same time, most professionals continue to trust AI summaries and frequently act on them without reviewing the original document. As AI becomes more deeply integrated into document workflows, the most productive approach may not be choosing between AI and human review, but combining both to ensure critical decisions are built on accurate information.
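
For readers who want to automate part of that review, one simple check is whether every section a summary cites actually exists in the source. The following is a minimal sketch, assuming citations take the form "Section N" or "Section N.M" and that the source's section numbers have already been collected; the function name, the regular expression, and the sample data are all hypothetical.

```python
import re

# Matches references such as "Section 4" or "Section 2.1" (assumed citation style).
SECTION_REF = re.compile(r"\bSection\s+(\d+(?:\.\d+)*)", re.IGNORECASE)


def missing_section_refs(summary: str, source_sections: set[str]) -> list[str]:
    """Return section numbers cited in the summary but absent from the source."""
    cited = SECTION_REF.findall(summary)
    # Drop duplicates while keeping first-seen order.
    unique = list(dict.fromkeys(cited))
    return [ref for ref in unique if ref not in source_sections]


# Hypothetical source outline and AI summary.
sections = {"1", "2", "2.1", "3"}
summary = "Termination is covered in Section 2.1, with penalties in Section 7."
print(missing_section_refs(summary, sections))  # ['7']
```

A check like this only flags citations to sections that do not exist. It cannot confirm that a real section says what the summary claims, so it complements reading the original rather than replacing it.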

Methodology

We created 40 business documents, including 10 contracts, 10 financial reports, 10 HR policies, and 10 research summaries. Each document contained one deliberately planted error: a clause that contradicted an earlier clause, a math error in a subtotal, a footnote that modified a body figure, or a reference to a section that doesn't exist. We logged every fact in every document in advance (1,082 facts in total) as our answer key.

We then uploaded each PDF to four leading AI tools on their paid tiers: ChatGPT (GPT-5), Gemini 3.1 Pro, Microsoft Copilot (Smart mode), and ChatPDF Plus. Each tool received the same standardized prompt, which asked it to summarize the document and list every fact. That produced 160 baseline tests (4 tools × 40 documents). Each test used a fresh chat session with no follow-up questions. We reran 12 of the tests in separate sessions to measure whether the same tool gives the same answer twice.
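
The test matrix described above can be sketched as a simple cross product of tools and documents. The document identifiers below are hypothetical placeholders; only the counts (4 tools, 4 document types, 10 documents per type) come from the methodology.

```python
from itertools import product

TOOLS = ["ChatGPT", "Gemini", "Copilot", "ChatPDF"]
DOC_TYPES = ["contract", "financial_report", "hr_policy", "research_summary"]
DOCS_PER_TYPE = 10

# Hypothetical document IDs such as "contract_01".
documents = [
    f"{doc_type}_{i:02d}"
    for doc_type in DOC_TYPES
    for i in range(1, DOCS_PER_TYPE + 1)
]

# One baseline test per (tool, document) pair.
baseline_tests = list(product(TOOLS, documents))
print(len(documents), len(baseline_tests))  # 40 160
```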

The score: Every output was compared against the ground-truth fact log. We measured whether each AI caught the planted contradiction, how completely it covered the document's facts, and whether its citations referenced things that actually exist in the source document. The composite grade weighted these three measures at 40%, 40%, and 20%, respectively.
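
The weighting can be illustrated with a short sketch. It assumes each measure is expressed as a fraction between 0 and 1, which the methodology does not specify, and it does not attempt the mapping from composite score to letter grade. The function name and the example inputs are hypothetical.

```python
# Weights taken from the methodology: 40% / 40% / 20%.
WEIGHTS = {"contradiction": 0.40, "coverage": 0.40, "citations": 0.20}


def composite_score(contradiction: float, coverage: float, citations: float) -> float:
    """Weighted average of the three measures, each assumed to lie in [0, 1]."""
    scores = {
        "contradiction": contradiction,
        "coverage": coverage,
        "citations": citations,
    }
    for name, value in scores.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical inputs, not results from the study.
example = composite_score(contradiction=1.0, coverage=0.5, citations=0.75)
print(f"{example:.2f}")  # 0.75
```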

We also surveyed 1,005 working professionals about how they use AI to summarize, draft, review, and analyze documents and how often that AI output turns out to be inaccurate. Participants spanned a range of job levels, from individual contributors (56%) and managers (29%) to senior leaders, including directors, vice presidents, C-suite executives, and company owners (14%). The sample spanned generations, including Gen Z (16%), millennials (55%), Gen X (25%), and baby boomers (4%). Percentages that do not total 100% reflect rounding. The data was collected in May 2026.

About Smallpdf

Smallpdf helps professionals, students, and businesses work more efficiently with digital documents. From converting and compressing PDFs to editing, organizing, and eSigning files, Smallpdf provides simple tools that streamline document workflows and reduce friction. As AI becomes more common in document management, Smallpdf helps users maintain accuracy, security, and control throughout the process.

Fair Use Statement

The information and findings presented in this article may be used for noncommercial purposes only. If you share or reference this content, please provide proper attribution and include a link back to Smallpdf.

Stéphane Turquay

Principal Product Manager at Smallpdf
