$> man42.net Blog written by a human

I've just run into a bad surprise with HTML (HTML5, evergreen browsers).

# The end-tag open (ETAGO) delimiter problem

# Context: JavaScript

Until now, when I wanted to embed user data inside a <script> tag in an HTML file, I escaped it like this:

let userComment = "hello!";

let htmlScript = `
<script>
var userComment = ${JSON.stringify(userComment)}; /* WRONG! */
</script>
`;

It would get inserted in my html file like this:

<!doctype html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Website!</title>
</head>
<body>
<script>
var userComment = "hello!";
</script>
</body>
</html>

Well, I've just discovered that if userComment contains "hello!</script><script>window.alert('POWNED!');</script>", something interesting happens...

Let's have a look at the generated html:

<!doctype html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Website!</title>
</head>
<body>
<script>
var userComment = "hello!</script><script>window.alert('POWNED!');</script>";
</script>
</body>
</html>

OK, the value is correctly escaped as a JavaScript string literal, so it should be fine... but no!

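The HTML parser knows nothing about JavaScript syntax, so it does not see a string literal inside the <script></script> tags. It naively takes whatever sits between <script> and the first </script> it encounters and hands that to the JavaScript engine. Everything after that first </script> is parsed as HTML again, including the attacker's own <script> element. This is called the end-tag open (ETAGO) delimiter problem.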
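To make the cut visible, here is a small sketch that imitates that rule with plain string operations. It is only a rough stand-in for a real HTML parser, and the names html, firstScriptBody and leftover are mine, not part of any API:

```javascript
// The malicious comment from the example above.
const userComment = "hello!</script><script>window.alert('POWNED!');</script>";

// The markup produced by the naive JSON.stringify() approach.
const html = `<script>\nvar userComment = ${JSON.stringify(userComment)};\n</script>`;

// Rough imitation of the parser rule: the script body ends at the FIRST "</script".
const bodyStart = html.indexOf("<script>") + "<script>".length;
const bodyEnd = html.toLowerCase().indexOf("</script", bodyStart);

const firstScriptBody = html.slice(bodyStart, bodyEnd);
const leftover = html.slice(bodyEnd + "</script>".length);

// What the JavaScript engine receives: an unterminated string literal.
console.log(firstScriptBody.trim()); // var userComment = "hello!

// What goes back to the HTML parser: it starts with the injected <script> element.
console.log(leftover);
```

The first script fails with a syntax error, and the injected one runs.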

So now your safe-looking string triggers an alert popup for your visitors... not cool.

It can also happen in other contexts:

# Context: CSS style

<style type="text/css">
  p {
    content: "</style><script>window.alert('POWNED!');</script>";
    background: green;
  }
</style>

# Context: JSON-LD (SEO)

<script type="application/ld+json">
{
  "@context" : "http://schema.org",
  "@type" : "BlogPosting",
  "description": "Do you know the html ETAGO problem? </script><script>window.alert('POWNED!');</script>"
}
</script>

# Solution

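Basically, the solution is to escape </ as <\/ and <!-- as <\!--. Note that \/ is a valid escape in both JavaScript and JSON, while \! is only accepted in JavaScript string literals; in strict JSON, write \u003c!-- instead.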
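If you want to see the idea without any dependency, here is a minimal hand-rolled sketch. The function name serializeForScript is hypothetical, and I am assuming the \u003c form for <!-- so that the output stays valid JSON as well as valid JavaScript:

```javascript
// Hypothetical helper: JSON-serialize a value so that the result can be
// embedded inside a <script> element.
// Caveat: JSON.stringify(undefined) returns undefined, so pass real values only.
function serializeForScript(value) {
  return JSON.stringify(value)
    // "</" becomes "<\/", so no end tag can appear in the output.
    .replace(/<\//g, "<\\/")
    // "<!--" becomes "\u003c!--" (assumption: chosen because "\!" is not valid JSON).
    .replace(/<!--/g, "\\u003c!--");
}

const userComment = "hello!</script><!-- <script>window.alert('POWNED!');</script>";
const safe = serializeForScript(userComment);

console.log(safe.includes("</"));   // false
console.log(safe.includes("<!--")); // false

// The escapes are transparent: parsing gives back the original string.
console.log(JSON.parse(safe) === userComment); // true
```

This is only meant to show the mechanism; a maintained library handles more cases for you.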

If you're generating your html file with JavaScript / Node.js, I recommend using jsesc.

After installing jsesc, here is the code I am now using:

const jsesc = require("jsesc");

let userComment = "hello!</script><script>window.alert('POWNED!');</script>";

let htmlScript = `
<script>
var userComment = ${jsesc(userComment, { json: true, isScriptContext: true })}; /* SAFE! :) */
</script>
`;

And this is the html file that we get:

<!doctype html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Website!</title>
</head>
<body>
<script>
var userComment = "hello!<\/script><script>window.alert('POWNED!');<\/script>";
</script>
</body>
</html>

In this example, userComment can also be an object: jsesc will behave like JSON.stringify(), but safer, as it also escapes the </script, </style and <!-- sequences.

As a result, you can use it to escape a string destined for the CSS content: property, since </style> is escaped as well.

You can also use this code to escape the JSON you put in <script type="application/ld+json"></script> when producing JSON-LD (SEO).
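For the JSON-LD case, here is a dependency-free sketch of a slightly different technique (not jsesc, and not the </ to <\/ rule above): replacing every < with its \u003c escape, which any JSON parser turns back into <. The helper name toJsonLdScript is hypothetical:

```javascript
// Hypothetical helper: build a JSON-LD <script> element whose body contains
// no "<" at all, so it cannot contain "</script" or "<!--" either.
function toJsonLdScript(data) {
  const json = JSON.stringify(data, null, 2).replace(/</g, "\\u003c");
  return `<script type="application/ld+json">\n${json}\n</script>`;
}

const post = {
  "@context": "http://schema.org",
  "@type": "BlogPosting",
  description:
    "Do you know the html ETAGO problem? </script><script>window.alert('POWNED!');</script>",
};

const tag = toJsonLdScript(post);
console.log(tag);

// Extract the element body: from the end of the opening tag to the closing tag.
const body = tag.slice(tag.indexOf(">") + 1, tag.lastIndexOf("</script>"));

// A JSON-LD consumer parsing the body gets the original description back.
console.log(JSON.parse(body).description === post.description); // true
```

The only </script left in the output is the real closing tag at the very end.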

If you are looking for alternatives, here is another solution.
