I've just run into a bad surprise with HTML (HTML5, evergreen browsers).
# The end-tag open (ETAGO) delimiter problem
# Context: JavaScript
Until now, when I wanted to escape user data to be inserted inside a `<script>` tag in an HTML file, I was doing something like this:
```javascript
let userComment = "hello!";
let htmlScript = `
<script>
  var userComment = ${JSON.stringify(userComment)}; /* WRONG! */
</script>
`;
```
It would get inserted in my html file like this:
```html
<!doctype html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>Website!</title>
  </head>
  <body>
    <script>
      var userComment = "hello!";
    </script>
  </body>
</html>
```
Well, I've just discovered that if `userComment` contains `"hello!</script><script>window.alert('POWNED!');</script>"`, something interesting happens...
Let's have a look at the generated html:
```html
<!doctype html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>Website!</title>
  </head>
  <body>
    <script>
      var userComment = "hello!</script><script>window.alert('POWNED!');</script>";
    </script>
  </body>
</html>
```
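OK, the string is correctly escaped as far as JavaScript is concerned, so it should be fine... but no!
The HTML parser knows nothing about JavaScript string literals: it just takes whatever is between `<script>` and the first `</script>` tag it encounters after it, and hands that to the JavaScript engine. This is called the end-tag open (ETAGO) delimiter problem.
Here, the first script element ends up containing only `var userComment = "hello!` (a syntax error), and `<script>window.alert('POWNED!');</script>` becomes a brand new script element that runs normally.
So now, your safe-looking string is triggering an alert popup for your visitors... not cool.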
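To see exactly where the parser cuts the script, here is a simplified sketch of the end-tag check the HTML tokenizer performs inside a script element: it stops at `</script` (ASCII case-insensitive) followed by whitespace, `/` or `>`. The `scriptBody` helper is hypothetical and deliberately ignores the extra "escaped" states that `<!--` can trigger, so treat it as an illustration rather than a real parser.

```javascript
// Hypothetical helper: returns the text the HTML tokenizer would treat as the
// body of a <script> element, i.e. everything before the first "</script"
// (case-insensitive) that is followed by whitespace, "/" or ">".
// Simplification: the "<!--" escaped states of the real tokenizer are ignored.
function scriptBody(afterOpenTag) {
  const endTag = /<\/script[\t\n\f\r \/>]/i;
  const match = endTag.exec(afterOpenTag);
  return match ? afterOpenTag.slice(0, match.index) : afterOpenTag;
}

const userComment = "hello!</script><script>window.alert('POWNED!');</script>";
// What the vulnerable template puts right after the opening <script> tag.
const source = `var userComment = ${JSON.stringify(userComment)};\n</script>`;

console.log(scriptBody(source));
// var userComment = "hello!
```

Everything after the cut, starting with `<script>window.alert(...)`, is parsed as fresh markup.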
It can also happen in other contexts:
# Context: CSS style
```html
<style type="text/css">
  p {
    content: "</style><script>window.alert('POWNED!');</script>";
    background: green;
  }
</style>
```
# Context: JSON-LD (SEO)
```html
<script type="application/ld+json">
{
  "@context" : "http://schema.org",
  "@type" : "BlogPosting",
  "description": "Do you know the html ETAGO problem? </script><script>window.alert('POWNED!');</script>"
}
</script>
```
# Solution
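Basically, the solution is to escape `</` as `<\/` and `<!--` as `<\!--`, so the HTML parser never sees those sequences while the JavaScript string keeps the same value.
Note that `\!` is not a valid escape in JSON, so inside a JSON document (such as JSON-LD) `<!--` should be written as `\u003C!--` instead.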
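The two rules above can be sketched without any dependency, assuming the value is JSON-serializable. `serializeForScript` is a hypothetical name, not part of any library. Because `JSON.stringify` output can only contain `<` inside string literals, these replacements never touch the JSON structure itself; `\u003C!--` is used instead of `<\!--` so the result is valid JSON as well as valid JavaScript.

```javascript
// Minimal sketch (hypothetical helper, no jsesc): serialize a JSON-serializable
// value, then neutralize the sequences the HTML parser reacts to inside <script>.
function serializeForScript(value) {
  return JSON.stringify(value)
    // "</" -> "<\/": "\/" is a valid escape in both JavaScript and JSON.
    .replace(/<\//g, "<\\/")
    // "<!--" -> "\u003C!--": unlike "<\!--", this is also valid JSON.
    .replace(/<!--/g, "\\u003C!--");
}

const userComment = "hello!</script><script>window.alert('POWNED!');</script>";
console.log(`var userComment = ${serializeForScript(userComment)};`);
// var userComment = "hello!<\/script><script>window.alert('POWNED!');<\/script>";
```

Parsing the output back with `JSON.parse` (or evaluating it as JavaScript) yields the original string, so no information is lost.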
If you're generating your HTML file with JavaScript / Node.js, I recommend using jsesc.
After installing `jsesc`, here is the code I am now using:
```javascript
const jsesc = require("jsesc");

let userComment = "hello!</script><script>window.alert('POWNED!');</script>";
let htmlScript = `
<script>
  var userComment = ${jsesc(userComment, { json: true, isScriptContext: true })}; /* SAFE! :) */
</script>
`;
```
And this is the html file that we get:
```html
<!doctype html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>Website!</title>
  </head>
  <body>
    <script>
      var userComment = "hello!<\/script><script>window.alert('POWNED!');<\/script>";
    </script>
  </body>
</html>
```
In this example, `userComment` can also be an object: `jsesc` will behave like `JSON.stringify()`, but it is safer because, with `isScriptContext` enabled, it also escapes `</script`, `</style` and `<!--` sequences.
As a result, you can also use it to escape a string used in a CSS `content:` property, since `</style>` gets escaped as well.
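CSS strings also have their own escape syntax: a backslash, the hexadecimal code point, and an optional trailing space (so `<` becomes `\3C `). A minimal sketch of that swapped-in technique, using a hypothetical `cssString` helper, escapes every `<` so `</style` can never appear in the stylesheet text; `"`, `\` and line breaks are escaped too so the literal stays well-formed. The rule targets `p::before` here.

```javascript
// Hypothetical helper: builds a double-quoted CSS string literal using CSS hex
// escapes ("\" + hex code point + one space), which the CSS tokenizer decodes
// back to the original character. Escaping "<" means "</style" cannot appear.
function cssString(value) {
  const escaped = value.replace(/[\\"<\n\r\f]/g, (ch) =>
    "\\" + ch.codePointAt(0).toString(16).toUpperCase() + " "
  );
  return `"${escaped}"`;
}

const evil = "</style><script>window.alert('POWNED!');</script>";
console.log(`p::before { content: ${cssString(evil)}; }`);
// p::before { content: "\3C /style>\3C script>window.alert('POWNED!');\3C /script>"; }
```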
You can also use this code to escape the JSON you put in `<script type="application/ld+json"></script>` when producing JSON-LD (SEO).
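For JSON-LD, an even simpler rule also works: escape every `<` as `\u003C`. Since `<` can only appear inside JSON strings, the result is still valid JSON that decodes to the same data, and it contains neither `</script` nor `<!--`. `jsonLdScript` is a hypothetical helper, and this is a sketch of that alternative rule, not what jsesc does.

```javascript
// Hypothetical helper: renders a JSON-LD <script> element.
// Escapes every "<" as "\u003C"; "<" only occurs inside JSON strings, so the
// output is still valid JSON and cannot close the surrounding <script>.
function jsonLdScript(data) {
  const json = JSON.stringify(data, null, 2).replace(/</g, "\\u003C");
  return `<script type="application/ld+json">\n${json}\n</script>`;
}

const html = jsonLdScript({
  "@context": "http://schema.org",
  "@type": "BlogPosting",
  description:
    "Do you know the html ETAGO problem? </script><script>window.alert('POWNED!');</script>",
});
console.log(html);
```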
If you are looking for alternatives, here is another solution.