Ever tried doing a "site:xyz.com" search through Google? Unless the site is one like idkfa (which forbids all search engine indexing), Google usually does a pretty good job.
Also, text indexing for the purpose of a search database is non-trivial.
Also, PDF scraping for textual content is non-trivial.
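To give a rough idea of what the text indexing part involves, here's a minimal sketch of an inverted index in Python. The names (`tokenize`, `build_index`, `search`) and the sample documents are made up for illustration, and this is only the naive starting point, not how Google or any real search backend does it.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Lowercase the text and split it into runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    # Map each token to the set of document ids that contain it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    # Return ids of documents containing every token in the query (AND search).
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical sample data, purely for illustration.
docs = {
    1: "PDF scraping is hard",
    2: "Text indexing is hard too",
}
index = build_index(docs)
matches = search(index, "pdf hard")
```

Even this toy version skips stemming, ranking, phrase queries, and keeping the index up to date as pages change, which is where most of the real work is.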