Malformed Internationalized Domain Name (IDN) Leads to Discovery of Vulnerability in IDN Libraries



Executive Summary

As part of our research for "Farsight Security Global Internationalized Domain Name (IDN) Homograph, Q2 2018 Report", Farsight Security discovered a bug in the popular libidn and libidn2 C libraries, which are used to build Internationalized Domain Names in Applications (IDNA)-aware software. Depending on how the calling code is written, this bug could lead to a security vulnerability in applications that trust the libraries' output. It occurs in the Punycode decoder when pathological inputs decode to illegal Unicode code point values.

We worked closely with the vendor to report and patch the vulnerability, but the fix only takes effect once it is deployed: application programmers and end-users should upgrade to the patched library versions.


To get the most from this article, the reader should be familiar with the following technologies:

The functions responsible for decoding Punycode into Unicode in both libidn and libidn2 can be coerced to generate invalid Unicode code point values yet return successfully. These resultant code point values are larger than the maximum valid Unicode code point of 0x10FFFF (1,114,111). Depending on how application code subsequently treats them, these values may result in a program crash or other undefined behavior, including possible arbitrary code execution.

The simplest Punycode string that triggers this behavior is xn--0000h, which decodes to a single "code point" value of U+127252 (1,208,914). This value exceeds 0x10FFFF and is therefore not a legal Unicode code point. This is shown below using a simple test program "punydecode" (available in Appendix A).

$ echo "xn--0000h" | punydecode -
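The same value can be reproduced without either library. What follows is a minimal sketch of the RFC 3492 section 6.2 decoding loop, written for illustration only: it is not the libidn or libidn2 source, and the function name sketch_punycode_decode and its check_range flag are hypothetical. With check_range set to 0 it behaves like an unchecked decoder; with check_range set to 1 it rejects any value above 0x10FFFF.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Punycode parameters from RFC 3492 section 5. */
enum { BASE = 36, TMIN = 1, TMAX = 26, SKEW = 38, DAMP = 700,
       INITIAL_BIAS = 72, INITIAL_N = 128 };

/* Digit value of a basic code point: a-z = 0..25, 0-9 = 26..35, else -1. */
static int digit_value(char c)
{
	if (c >= 'a' && c <= 'z') return c - 'a';
	if (c >= 'A' && c <= 'Z') return c - 'A';
	if (c >= '0' && c <= '9') return c - '0' + 26;
	return -1;
}

/* Bias adaptation, RFC 3492 section 6.1. */
static uint32_t adapt(uint32_t delta, uint32_t numpoints, int firsttime)
{
	uint32_t k = 0;

	delta = firsttime ? delta / DAMP : delta / 2;
	delta += delta / numpoints;
	while (delta > ((BASE - TMIN) * TMAX) / 2) {
		delta /= BASE - TMIN;
		k += BASE;
	}
	return k + (BASE - TMIN + 1) * delta / (delta + SKEW);
}

/*
 * Decode one label (ACE prefix already removed) into out[].
 * On entry *out_len is the capacity of out[] in elements; on success it
 * holds the number of decoded values. Returns 0 on success, -1 on error.
 * check_range is a hypothetical switch: 0 = no upper bound on values,
 * 1 = reject anything above 0x10FFFF.
 */
int sketch_punycode_decode(const char *in, uint32_t *out, size_t *out_len,
    int check_range)
{
	size_t in_len = strlen(in), cap = *out_len, n_out = 0, b = 0, pos, j;
	uint32_t n = INITIAL_N, i = 0, bias = INITIAL_BIAS;

	/* Basic code points are everything before the last '-' delimiter. */
	for (j = 0; j < in_len; j++)
		if (in[j] == '-') b = j;
	if (b > cap) return -1;
	for (j = 0; j < b; j++) {
		if ((unsigned char)in[j] >= 0x80) return -1;
		out[n_out++] = (unsigned char)in[j];
	}
	pos = b > 0 ? b + 1 : 0;

	while (pos < in_len) {
		uint32_t oldi = i, w = 1, k, t;
		int d;

		/* Decode one variable-length integer into i. */
		for (k = BASE; ; k += BASE) {
			if (pos >= in_len) return -1;
			d = digit_value(in[pos++]);
			if (d < 0) return -1;
			if ((uint32_t)d > (UINT32_MAX - i) / w) return -1;
			i += (uint32_t)d * w;
			t = k <= bias ? TMIN : k >= bias + TMAX ? TMAX : k - bias;
			if ((uint32_t)d < t) break;
			if (w > UINT32_MAX / (BASE - t)) return -1;
			w *= BASE - t;
		}
		bias = adapt(i - oldi, (uint32_t)n_out + 1, oldi == 0);
		if (i / (n_out + 1) > UINT32_MAX - n) return -1;
		n += i / (uint32_t)(n_out + 1);
		i %= (uint32_t)(n_out + 1);

		/* The range check whose absence is the subject of this article. */
		if (check_range && n > 0x10FFFF) return -1;

		if (n_out >= cap) return -1;
		memmove(out + i + 1, out + i, (n_out - i) * sizeof *out);
		out[i++] = n;
		n_out++;
	}
	*out_len = n_out;
	return 0;
}
```

For the input "0000h" the digits are 26, 26, 26, 26, 7 with weights 1, 35, 1225, 12250, 122500, so the delta is 26 + 910 + 31850 + 318500 + 857500 = 1,208,786. Adding the initial n of 128 gives 1,208,914, or 0x127252. Calling the sketch with check_range set to 0 returns success with that single value; with check_range set to 1 the same input is rejected.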


The libidn and libidn2 libraries are open source implementations of IDNA (libidn implements IDNA2003 while libidn2 implements IDNA2008). They both provide APIs to encode and decode internationalized domain names.

Inside the latest versions of both libraries (1.35 for libidn and 2.0.5 for libidn2) are two almost identical¹ functions responsible for decoding Punycode strings into Unicode code points. Libidn calls this function punycode_decode() while libidn2 calls it _idn2_punycode_decode()².

From here on out, we will refer to both functions as simply the "Punycode decoder".

The Punycode decoder is an implementation of the algorithm described in section 6.2 of RFC 3492. As it walks the input string, the Punycode decoder fills the output array with decoded code point values. The output array itself is typed to hold unsigned 32-bit integers while the Unicode code point space fits within 21 bits. This leaves 11 unused bits, and even a 21-bit value can exceed the maximum (0x127252 is one such value), so the array can hold many values that are not valid Unicode code points. The vulnerability is enabled by the lack of a sanity check to ensure decoded code points do not exceed the Unicode code point maximum of 0x10FFFF. As such, for offending input, unchecked decoded values are copied directly to the output array and returned to the caller.
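To illustrate why an unchecked value matters to the caller, consider a common next step: converting the decoded code points to UTF-8. The sketch below is an assumption about what downstream code might look like, not code taken from libidn, libidn2, or any particular application; the name utf8_encode_checked is hypothetical. It refuses values above 0x10FFFF (and surrogates, which are also not encodable) instead of emitting bytes for them.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Encode one code point as UTF-8 into out[].
 * Returns the number of bytes written (1 to 4), or 0 if cp cannot be
 * encoded: values above 0x10FFFF and the surrogate range D800..DFFF.
 *
 * Without the first check, 0x127252 would take the four-byte path and
 * produce a lead byte of F4 followed by A7, which is ill-formed UTF-8
 * (after F4 the second byte must be in 80..8F).
 */
size_t utf8_encode_checked(uint32_t cp, unsigned char out[4])
{
	if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
		return 0;

	if (cp < 0x80) {
		out[0] = (unsigned char)cp;
		return 1;
	}
	if (cp < 0x800) {
		out[0] = (unsigned char)(0xC0 | (cp >> 6));
		out[1] = (unsigned char)(0x80 | (cp & 0x3F));
		return 2;
	}
	if (cp < 0x10000) {
		out[0] = (unsigned char)(0xE0 | (cp >> 12));
		out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
		out[2] = (unsigned char)(0x80 | (cp & 0x3F));
		return 3;
	}
	out[0] = (unsigned char)(0xF0 | (cp >> 18));
	out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
	out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
	out[3] = (unsigned char)(0x80 | (cp & 0x3F));
	return 4;
}
```

Code that omits the range check and also sizes its buffers or lookup tables on the assumption that every input is at most 0x10FFFF is where the more serious failure modes would come from.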

The Fix

The bug can be fixed simply by checking for excessive code point values prior to insertion into the output array. Something as simple as the following will work:

/* decoding of basic string */
if (code_point > 0x10FFFF)
    return punycode_bad_input;
/* insertion into the output array */

A similar patch has been pushed to the libidn and libidn2 repositories and should be readily available.
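Applications that cannot upgrade immediately, or that carry their own copy of the Punycode source files, could add a guard on the caller's side. The following is a minimal sketch under that assumption; the helper name first_out_of_range and the UNICODE_MAX macro are hypothetical and are not part of either library's API.

```c
#include <stddef.h>
#include <stdint.h>

#define UNICODE_MAX 0x10FFFFu	/* largest valid Unicode code point */

/*
 * Scan a decoded label and return the index of the first value above
 * UNICODE_MAX, or n if every value is in range.
 */
size_t first_out_of_range(const uint32_t *cps, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (cps[i] > UNICODE_MAX)
			return i;
	return n;
}
```

A caller would run this over the output array after a successful decode and treat any return value smaller than the decoded length as bad input, discarding the label rather than processing it further.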

In Closing

For the remediation and disclosure of this security condition, Farsight worked directly with Tim Rühsen, the maintainer of libidn and libidn2. We would like to thank him for his prompt and detailed responses at every point in the process.

Finally, Farsight did not discover this vulnerability through a code audit, but rather, through an encounter with a malformed IDN in the wild. While we won't (currently) release details on the domain in question, we feel it's important to inform others that there are live hostnames out there that may trigger this bug, and thus that it is important to upgrade dependent libidn / libidn2 packages.

Appendix A: Punycode Decode Test Program

The following program can be used to check Punycode input strings for overflow. It expects input as single Punycode-encoded labels with or without the ACE prefix and can read from a file or a pipeline.

If there is no error, the output is colon separated as per the following:

input punycode:code point count:code points.

For conforming inputs punydecode will prepend a lowercase u+ before each code point:

$ echo "xn--8a" | punydecode -

For offending inputs it will prepend an uppercase U+ before each code point:

$ echo "xn--0000h" | punydecode -

Additionally, the program tests the reversibility of the input Punycode string and will emit an "encode mismatch" error if the decoded code points don't encode to the original Punycode.

To build punydecode.c, you'll need "idn2.h", "puny_decode.c", "puny_encode.c", and "punycode.h" from libidn2 to reside in the same directory. You can build with something like:

gcc -Wall -O0 -ggdb punydecode.c puny_decode.c puny_encode.c -o punydecode

/*
 * alabel punycode decoder
 *  Copyright (c) 2018 by Farsight Security, Inc.
 *  Licensed under the Apache License, Version 2.0 (the "License");
 *  you may not use this file except in compliance with the License.
 *  You may obtain a copy of the License at
 *  Unless required by applicable law or agreed to in writing, software
 *  distributed under the License is distributed on an "AS IS" BASIS,
 *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 *  See the License for the specific language governing permissions and
 *  limitations under the License.
 */

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <strings.h>
#include <locale.h>
#include <errno.h>

#include "idn2.h"
#include "punycode.h"

int
main(int argc, char **argv)
{
	int rc;
	FILE *f;
	char *line_buf = NULL, line[BUFSIZ], *p, alabel[BUFSIZ];
	ssize_t line_len;
	size_t i, line_cap = 0, ulabel_len, alabel_len;
	uint32_t ulabel[BUFSIZ] = {0};

	if (argc != 2) {
		fprintf(stderr, "usage: %s <infile> || cat <infile> | %s -\n", argv[0], argv[0]);
		return (EXIT_FAILURE);
	}

	if (strcmp(argv[1], "-") == 0)
		f = stdin;
	else {
		f = fopen(argv[1], "r");
		if (f == NULL) {
			fprintf(stderr, "error: unable to open %s: %s\n",
			    argv[1], strerror(errno));
			return (EXIT_FAILURE);
		}
	}

	while ((line_len = getline(&line_buf, &line_cap, f)) > 0) {
		/* skip lines that would not fit in the fixed-size buffers */
		if ((size_t)line_len >= sizeof (line)) {
			fprintf(stderr, "error: input line too long\n");
			continue;
		}
		strcpy(line, line_buf);
		p = line;
		/* strip the trailing newline, if there is one */
		if (line[line_len - 1] == '\n')
			line[line_len - 1] = '\0';

		/* skip the ACE prefix, if present */
		if (line[0] == 'x' && line[1] == 'n' && line[2] == '-' && line[3] == '-')
			p += 4;

		/* lengths are in/out element counts, so reset them for each label */
		ulabel_len = sizeof (ulabel) / sizeof (ulabel[0]);
		alabel_len = sizeof (alabel) - 1;

		rc = _idn2_punycode_decode(strlen(p), p, &ulabel_len, ulabel);
		if (rc != IDN2_OK) {
			fprintf(stderr, "%s:decode err: %d\n", p, rc);
			continue;
		}

		fprintf(stderr, "%s:%zu:", p, ulabel_len);
		for (i = 0; i < ulabel_len; i++) {
			if (ulabel[i] > 0x10FFFF)
				/* overflow */
				fprintf(stderr, "U+%04x", ulabel[i]);
			else
				fprintf(stderr, "u+%04x", ulabel[i]);
			if (i + 1 < ulabel_len)
				fprintf(stderr, ",");
		}

		/* check reversibility */
		rc = _idn2_punycode_encode(ulabel_len, ulabel, &alabel_len, alabel);
		if (rc != IDN2_OK) {
			fprintf(stderr, "%s:encode err: %d\n", p, rc);
			continue;
		}
		/* terminate explicitly rather than rely on the encoder to do so */
		alabel[alabel_len] = '\0';
		if (alabel_len > 0 && strncasecmp(alabel, p, strlen(p)) != 0)
			fprintf(stderr, ":encode mismatch %s\n", alabel);
		else
			fprintf(stderr, "\n");
	}

	free(line_buf);
	if (f != stdin)
		fclose(f);
	return (EXIT_SUCCESS);
}


¹ The only difference is libidn's support for case-awareness. Since IDNA2008 removes support for uppercase characters, libidn2 has no such support.

² This function is ostensibly private and not directly usable through the libidn2 API. In fact, access to it is "protected" by a call to the libunistring function u8_to_u32(), which validates the Punycode before handing it off to _idn2_punycode_decode(). However, the function is not static in scope and is externally accessible. According to the libidn2 README, the library is intended to be a drop-in replacement for libidn:

"This library is backwards (API) compatible with the libidn library. Replacing the idna.h header with idn2.h into a program is sufficient to switch the application from IDNA2003 to IDNA2008 as supported by this library."

As such, if an application programmer upgrades from libidn to libidn2, has an IDNA-based application that directly calls punycode_decode(), and does something like the following, the program will be vulnerable to the overflow:

extern _IDN2_API int
_idn2_punycode_encode (size_t input_length, const uint32_t input[],
size_t * output_length, char output[]);

extern int
_idn2_punycode_decode (size_t input_length, const char input[], size_t *
output_length, uint32_t output[]);

#define punycode_decode _idn2_punycode_decode
#define punycode_encode _idn2_punycode_encode

/* ...libidn-based code here ...*/

Furthermore, if an application programmer is concerned about bloat and/or performance, the Punycode source files might be cherry-picked directly from the library, bypassing any protections afforded by u8_to_u32().

Mike Schiffman is an IDNA2020 Hopeful for Farsight Security, Inc.