HTML Email ought to be considered harmful

Some people make fun of my plain-text emails, but really, I think it’s time we re-consider our desire for colours, hyperlinks and inline images in email messages, especially for those who use web-based email clients as their primary email interface.

The problem basically boils down to this: HTML gives too much opportunity for mischief by a malicious party. In most cases, HTML isn’t even necessary to convey the information required. Tables are about the only real “feature” that is hard to replicate in plain text, for everything else there’s reasonable de-facto standards already in existence.

Misleading hyperlinks

HTML has a feature where a link to a remote page can take on any descriptive text the author desires, including images and other valid URIs. For example, the following piece of HTML code is perfectly valid:

<a href="http://www.google.com.malicious.website.example.com/">
   http://www.google.com
</a>

There are many cases where this feature is “useful”, however in an email, it can be used to disguise phishing attempts. In the above example, the link is claiming to be to Google’s search website, however would otherwise re-direct that user to some other, likely malicious, website.

Granted, not every user can read a URI to determine if it is safe. There are adults who “grew up with the Internet”, that have never typed a URI in an address bar ever, instead relying on tools like search engines to locate websites of interest.

However, it would seem disingenuous to say that because a proportion of the community cannot read a URI, we should hide any and all links from everybody. For that small portion, showing the links won’t make a difference, but it will at least make it easier to avoid such traps.

Media exploits

Media decoders are written by humans, and humans are imperfect, thus it is fair to say there are media decoders that contain bugs, some of which could be disastrous for computer security.

Microsoft had such a problem in their GDI+ JPEG decoder back in 2004. More recently, there was a kernel-level security vulnerability in their TrueType font parser.

Modern HTML allows embedding of all this, and more. Most email clients will also allow you to “preview” an email without opening it. If an email embeds inline media which exploits vulnerabilities such as the one above, just previewing it will be sufficient to gain access.

Details are scarce, but it would appear it was a vulnerability along these lines that allowed unauthorised access into the Australian National University back in 2018.

Scripting

Modern web standards allow all kinds of means for embedding scripts, that is, small pieces of interpreted code which runs client-side in the HTML renderer. ECMAScript (JavaScript) can be embedded:

  • in <script> tags (the traditional way)
  • inside a hyperlink using a javascript: URI
  • HTC and XBL features in Internet Explorer and Mozilla Firefox, respectively.

Probably lots more ways I haven’t thought about.

Web-based email clients

Now, a stand-alone email client such as Microsoft Outlook, Eudora or Mozilla Thunderbird can simply not implement the scripting features, however the problem is highly acute where web-based email clients are used.

Here, you’re viewing an email in a HTML engine that has complete media and scripting capabilities. There’s dozens of ways to embed both forms of content into a blob of HTML, and you are entirely at the mercy of your web-based email client’s ability to sanitise the HTML before it dumps it inside the DOM tree that represents your email client.

As far as the web browser is concerned, the “email” is just another web page, and will not hesitate to execute embedded scripts or render inline media, whether the user wishes it to or not.

It’s not known what ANU uses for their email infrastructure, but many universities are big fans of web-based email since it means they don’t have to explain to end users how to configure their email clients, and provides portability for their users.

Putting users at risk

Despite the above, it would appear there are lots of organisations that are completely oblivious to this problem, and insist on forcing people to render their emails as HTML, putting their customers/users at risk of security breach.

The purpose of multiple formats in the same email is to provide alternate formats of the same content. Not to provide totally different emails because you can’t be stuffed!

For example, my workplace’s hosting provider, recently sent us an email, which when viewed as plain text, read as follows:

Hello Client,
 
Unfortunately your email client is outdated and does not support HTML emails, our system uses HTML emails as standard. You will NOT be able to read this email.
 
HOW DO I READ THIS EMAIL?
 
To read this email please login to your domain manager https://hostingprovider.example.com/login/ and click on Notifications to see a list of all sent emails.
 
Thank You
 
Customer Support

The suggestion that an email client configured to read emails as plain text, counts as it being “outdated” is naïve in the extreme, and I’d expect a hosting provider to know better. I’m thankful I personally don’t purchase services from them!

Then there’s financial service providers. One share registry’s handling of the situation is downright abusive:

Link Market Services sent numerous emails that looked exactly like this.

Yeah, rather than just omitting the text/plain component and letting the email client at this end try to render the HTML as plain text (which works reasonably well in many cases), in this case, they just sent an empty text/plain body:

From: …redacted… <comms@linkmarketservices.com.au>
To: …redacted…
Reply-To: donotreply@linkmarketservices.com.au
Date: Wed, 29 Jul 2020 04:42:30 +0000
Subject: …redacted… Funds Attribution Managed Investment Trust Member Annual
 Statement
Content-Type: multipart/alternative;
 boundary=--boundary_55327_8fbc43bd-48f3-4aa1-9ab7-9046df02b853
ZMID: 9f485235-9848-49cf-9a66-62c215ea86ba-1
Message-ID: <0100017398e10334-d45a0a83-ff4d-4b83-8d19-146b100017f6-000000@us-east-1.amazonses.com>
X-SES-Outgoing: 2020.07.29-54.240.9.110
Feedback-ID: 1.us-east-1.oB/l4dCmGdzC38jjMLirCbscajeK9vK6xBjWQPPJrkA=:AmazonSES


----boundary_55327_8fbc43bd-48f3-4aa1-9ab7-9046df02b853
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"


----boundary_55327_8fbc43bd-48f3-4aa1-9ab7-9046df02b853
Content-Transfer-Encoding: base64
MIME-Version: 1.0
Content-Type: text/html; charset="utf-8"

PCFET0NUWVBFIGh0bWw+Cgo8aHRtbCBsYW5nPSJlbiI+CjxoZWFkPgo8bWV0YSBjaGFyc2V0PSJ1
dGYtOCI+CjxtZXRhIGNvbnRlbnQ9IndpZHRoPWRldmljZS13aWR0aCwgaW5pdGlhbC1zY2FsZT0x
IiBuYW1lPSJ2aWV3cG9ydCI+CjxtZXRhIGNvbnRlbnQ9IklFPWVkZ2UiIGh0dHAtZXF1aXY9Ilgt

Ohh yeah, I’m so fluent in BASE64! I’ve since told them to send it via snail mail. Ensuring they don’t burn down forests posting blank sheets of A4 paper will be their problem.

The alternative: wiki-style mark-up

So, our biggest problem is that HTML does “too much”, so much that it becomes a liability from a security perspective. HTML wasn’t the first attempt at rich text in email… some older email clients used enriched text.

Before enriched text and HTML, we made do with various formatting marks, for instance, *bold text might be surrounded by asterisks*, and /italics indicated with forwardslashes/. _Underlining_ could be done too.

No there was no colour or font size options, but then again this was from a time when teletype terminals were not uncommon. The terminals of that time only understood one fixed-size font, and most did not do colour.

More recently, Wikis have built on this to allow for some basic mark-up features, whilst still allowing the plain text to be human readable. Modern takes of this include reStructured Text and Markdown, the latter being the native format for Wikis on Github.

Both these formats allow for embedding images inline and for tables (Markdown itself doesn’t do tables, but extensions of it do).

In email clients such images should be replaced with place-holders until the user clicks a button to reveal the images (which they should only click if they trust the sender).

Likewise, hyperlinks should be rendered in full, e.g. in a web-based client, the link [a cool website](http://example.com/) might be rendered as <a href="http://example.com">[a cool website](<code>http://www.example.com</code>)</a> — thus allowing for malicious links to be more easily detected. (It also makes them printable.) Only plain text should be permitted as a “label” for a hyperlink.

Use of such a mark-up format would have a number of benefits:

  • the format is minimal, meaning a much reduced attack surface for security vulnerabilities
  • whilst minimal, it would cover the vast majority of peoples’ use cases for HTML email
  • the mark-up is light-weight, reducing bandwidth for those on low-speed links or using lower-power devices

The downside might be for businesses, which rely on more advanced features in HTML to basically make an email look like their letter head. The business community might wish to consider the differences between a printed letter sent via the post, and an email sent electronically. They are different beasts, and trying to treat one as a substitute for the other will end in tears.

In any case, a simple letter head could be embedded as an inline image quite safely if such a feature was indeed required.

It is in our interests to curtail the features used in email communications if we intend to ensure communications remain safe and reliable.