My question is a mash-up of two existing questions. The first, over at TeX.SE, asks about creating PDFs with crawler-resistant email addresses. It has some good answers describing creative methods that go well beyond the boring `name<at>domain<dot>com`, but do these methods actually cut down on spam? The second, "Is using 'dot' and 'at' in email addresses in public text still useful?", provides an answer for web pages, but not for documents such as `.pdf` and `.doc` files. That answer highlights a set of methods based on "dynamic" CSS and PHP, which are not available for static documents. Further, `.doc` files, and to some extent `.pdf` files, are not plain text the way HTML is, so it is not clear what the crawlers are working on. Crawling through a binary `.doc` file to find an email address is presumably much harder than crawling through plain text. This makes me think that protection in files is potentially different from protection in web pages. Is obfuscation useful for files posted on web pages?
- How can the question be different? After all, you should consider that HTML is also a document. What is the difference between HTML, DOC, or PDF? As long as you have the plain text, the same response will apply. – kiBytes Feb 14 '14 at 09:59
- @kiBytes I thought that doc and pdf are not plain text. I edited the question to try to clarify what I see as different. – StrongBad Feb 14 '14 at 11:09
1 Answer
It really depends on how sophisticated the crawler is. I doubt that the vast majority would go to the trouble of downloading Word and Excel documents, mainly because of their potential size.
There are libraries available, though, that make reading doc, xls(x), pdf, etc. relatively easy, so it is not out of the realm of possibility that some crawlers do download files. It's not feasible to protect against every possible threat, so let's weigh how difficult it would be to obscure email addresses in documents against the security benefits.
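To give a sense of how little effort this takes for one format, here is a minimal sketch using only the Python standard library. It assumes the newer `.docx` format, which is a ZIP archive whose main text lives in `word/document.xml`; the legacy binary `.doc` format would need a dedicated parser. The function names, the demo file, and the deliberately simple email regex are my own illustration, not what any particular crawler does.

```python
import io
import re
import zipfile

# Deliberately simple pattern; real harvesters may use something more permissive.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def emails_in_docx(data: bytes) -> list[str]:
    """Return email-like strings found in the main body of a .docx file."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        xml = archive.read("word/document.xml").decode("utf-8")
    # Keep paragraphs apart, then crudely strip all remaining XML tags.
    text = re.sub(r"</w:p>", " ", xml)
    text = re.sub(r"<[^>]+>", "", text)
    return EMAIL_RE.findall(text)


def make_demo_docx(body: str) -> bytes:
    """Build a stripped-down, hypothetical .docx-like archive for the demo."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(
            "word/document.xml",
            "<w:document><w:body><w:p><w:r><w:t>"
            + body
            + "</w:t></w:r></w:p></w:body></w:document>",
        )
    return buffer.getvalue()


plain = make_demo_docx("Contact jane@example.com for help")
obscured = make_demo_docx("Contact jane at example dot com for help")

print(emails_in_docx(plain))
print(emails_in_docx(obscured))
```

The plain address is picked up, while the spelled-out one slips past this naive pattern. A crawler that also rewrites " at " and " dot " before matching would catch it too, so the obfuscation only helps against the least sophisticated harvesters.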
You would likely need a different strategy for each document format, and this may lead to the address rendering poorly, costing you usability (the ability to receive legitimate emails). You could include specific instructions like "In the subject mention 'assistance'.", or something to that effect.