I am an identity thief, or "identithief" if you’re goofy like me. I first consciously stole an identity when I applied for a U.S. visa. When I typed my name into the visa appointment sign-up page, the system rejected it with the message "invalid characters found." My name, for the purposes of this article, is "Long Hog-Silver."
I’d spent my whole life spelling my name exactly as it appears on my passport, so I assumed the error was a system glitch. Fearing that changing the spelling might complicate the process, I kept retrying it unchanged, without success. After hours of frustration, I called a friend, who told me the problem was the hyphen in my last name. I removed the hyphen, entered "Long Hog Silver," and, hurray, the system accepted it. Identithief!
Turns out, I'm not alone. Beyoncé, Zoë Kravitz, Julia Louis-Dreyfus, Jean-Claude Van Damme, and Mary-Kate Olsen are all identithieves too, at least on certain documents. I’m willing to bet that these celebrities have financial or government documents where their names are incorrectly formatted—thanks to outdated systems that can’t handle accents, hyphens, or non-Latin characters.
If my experience as "Long Hog-Silver" is anything to go by, Jean-Claude Van Damme might have documents that read “Jean Claude Van Damme,” “Jeanclaude Vandamme,” or “Jean Claude Vandamme.” Van Damme may play a hero who massacres bad guys on screen, but in real life, he may be carrying around an identity that's been butchered by technology—just like mine.
The root of this problem lies in the design of many online systems, particularly in the United States, which are built on outdated technical standards like ASCII. ASCII is a character encoding system developed in the 1960s that supports just 128 characters: basic Latin letters, digits, a handful of symbols, and control codes. That was sufficient when the world was less connected and America was far less diverse, but ASCII has no code points for accented letters or non-Latin scripts, and the rigid input rules built around it often reject even characters it technically includes, like the hyphen in my last name. In essence, many digital platforms in America lack the infrastructure to recognize names like mine.
ASCII was designed so that early computers could talk to each other using the 26 English letters (upper- and lowercase), the digits, and a handful of punctuation marks and control codes. Soon after it was developed, the U.S. government adopted ASCII as a Federal Information Processing Standard (FIPS), making it the official encoding for federal systems. That’s why, when I type "Long Hog-Silver" into the visa application platform, a federal platform, the system still sees me as a glitch today.
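To make the two failure modes concrete, here is a rough sketch in Python. The strict form rule below is my assumption about how sign-up pages like the one I fought with behave; it is not the actual code behind any federal system.

```python
# Two separate failures: characters ASCII cannot represent at all,
# and characters it can represent that strict form rules still reject.
import re

names = ["Long Hog-Silver", "Beyoncé", "Zoë"]

for name in names:
    # Failure 1: ASCII simply has no code point for é or ë.
    try:
        name.encode("ascii")
        fits_ascii = True
    except UnicodeEncodeError:
        fits_ascii = False

    # Failure 2: an over-strict, ASCII-era allowlist (letters and
    # spaces only) rejects even the hyphen, which ASCII does include.
    passes_form_rule = bool(re.fullmatch(r"[A-Za-z ]+", name))

    print(f"{name!r}: fits ASCII={fits_ascii}, passes form rule={passes_form_rule}")
```

My hyphen survives ASCII encoding just fine; it is the allowlist written in ASCII’s shadow that flags me as "invalid characters found."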
The solution is the problem
There is a cybersecurity argument for sticking with ASCII and restricting special characters: it can help prevent injection attacks. Injection attacks occur when malicious data fed into a system causes it to execute unintended actions, like retrieving sensitive data. Quotes, semicolons, and even hyphens (two in a row open a comment in SQL) are the building blocks of many malicious payloads, so an aggressive allowlist of plain letters does shrink the attack surface.
However, relying solely on ASCII for security is like holding an umbrella in a hurricane: it might offer some protection, but it’s far from foolproof. Limiting inputs to ASCII does not inherently prevent injection attacks and can create a false sense of security. Attackers can still exploit ASCII-only systems because the most common payloads are themselves plain ASCII; the classic ' OR '1'='1 contains nothing a 1960s teletype couldn’t print.
Robust input handling, above all parameterized queries that treat user input as data rather than executable code, prevents injection far more effectively than restricting characters ever could. However, many older systems, especially in government, finance, and healthcare, were built around ASCII, and upgrading them would be costly and complex. That leaves simplicity and compatibility with legacy systems as the only real benefit of sticking with ASCII.
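Here is a minimal sketch of that point using Python’s built-in sqlite3 module; the table and the data are invented for illustration. The injection payload is pure ASCII, so a character restriction would never have stopped it, while a parameterized query neutralizes it and welcomes my hyphen at the same time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (last_name TEXT)")
conn.execute("INSERT INTO users VALUES ('Hog-Silver')")

# The classic injection payload: nothing but ASCII characters.
payload = "x' OR '1'='1"

# Unsafe: gluing input into the query lets the payload rewrite it.
unsafe = f"SELECT * FROM users WHERE last_name = '{payload}'"
print(conn.execute(unsafe).fetchall())        # leaks every row

# Safe: a parameterized query treats input as data, never as SQL,
# and is equally happy with hyphens, accents, or any Unicode name.
safe = "SELECT * FROM users WHERE last_name = ?"
print(conn.execute(safe, (payload,)).fetchall())      # []
print(conn.execute(safe, ("Hog-Silver",)).fetchall()) # [('Hog-Silver',)]
```

The defense lives in how the input is handled, not in which characters it contains.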
In the 1980s, the International Organization for Standardization (ISO) introduced ISO 8859, a family of extended encodings that added accented Latin letters, but by then ASCII was so deeply embedded in U.S. systems that upgrading would have been a monumental task. ISO followed with ISO/IEC 10646 in 1993, a universal character set developed in lockstep with Unicode: a system capable of representing over a million code points from virtually every writing system, and backward-compatible with ASCII.
Yet here we are in 2024, still dealing with the limitations of ASCII. When Unicode was introduced in 1991, Beyoncé was about her daughter’s age. Now, as the mother of an almost-teenage daughter, Beyoncé faces the same outdated systems that existed when she was that little girl. Her daughter cannot even type her mother’s name, é and all, into the outrageously outdated security question, "What is your mother’s maiden name?"
British jurist Lord Sumption famously said, “We are ruled by dead people because the basic rules of law are those which past generations have chosen.” This couldn’t be truer in the context of the technical limitations that restrict name input to just the 26 letters of the English alphabet.
Confirm the identithief
Some state governments, like Florida, New York, Texas, California, and Arkansas, have adopted Unicode in their systems to handle diverse names and special characters more accurately. As a result, my Arkansas driver's license accurately spells my name as "Long Hog-Silver." However, most financial institutions, including Visa and Mastercard, still rely heavily on ASCII encoding.
This causes a mismatch when I apply for a new card: my Arkansas ID lists my last name with a hyphen, but my issued card drops it entirely, producing versions like "Long Hogsilver," or splits my double-barreled surname in two as "Long Hog Silver." And yet their system accepts my license as sufficient proof of my identity. This mismatch raises real concerns about data integrity across systems.
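A short Python sketch shows how the butchering happens. The fallback rules below are my guesses at typical legacy behavior, not any institution’s documented logic, but they produce variants like the ones printed on my cards.

```python
# One Unicode-correct name, three legacy fallback rules, three
# mismatched records for the same person.
name_on_id = "Long Hog-Silver"

fallbacks = {
    "drop the hyphen":       name_on_id.replace("-", ""),
    "hyphen becomes space":  name_on_id.replace("-", " "),
    "flatten and uppercase": name_on_id.replace("-", "").upper(),
}

for rule, result in fallbacks.items():
    # A naive exact-string comparison against the license fails
    # for every variant, even though they all describe one person.
    print(f"{rule}: {result!r} matches ID: {result == name_on_id}")
```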
When systems process inconsistent data due to encoding problems, it creates vulnerabilities that attackers can exploit. Hackers can take advantage of these gaps to commit real identity theft (not identitheft), bypass authentication checks, or even create fake profiles using mismatched data across different systems.
This data inconsistency isn’t just a nuisance; it can have significant consequences. In 2018, a major voter registration issue in Georgia put 53,000 registrations on hold because the information in voter records didn’t precisely match data from other state agencies.
Inconsistent use of encoding standards is exactly the kind of thing that can contribute to such mismatches. Georgia has no specific regulation on which encoding its state agencies should use, leaving both ASCII and Unicode in circulation. Such variation might well have played a role in the discrepancies that affected voter registration in the state, fueling widespread concerns about voter disenfranchisement.
Beyond voter registration, these mismatches create deeper security risks. One key reason institutions, especially in finance like Visa and Mastercard, are slow to adopt Unicode is their reliance on legacy systems. These systems, built decades ago around ASCII, prioritize compatibility and simplicity. Decades after an encoding capable of consistent, universal data matching became available, U.S. regulations have yet to push for a widespread upgrade to Unicode, leaving financial institutions ruled by outdated technology. Dead people!
Bury the dead
Large-scale data integrity issues don’t just affect individuals; they undermine overall system security. Unicode offers a comprehensive solution by enabling the consistent representation of characters across different platforms and systems. Unlike ASCII, which is limited to 128 characters, Unicode has room for more than a million code points, letting it handle names with accents, hyphens, and characters from non-Latin alphabets accurately.
By adopting Unicode, systems can prevent the data mismatches that occur when one system reads a name or piece of data differently than another, ensuring that information like voter records, financial details, and personal identities stays consistent. This consistency strengthens data integrity, making it harder for attackers to exploit encoding gaps for identity theft or to bypass authentication checks. Ultimately, using Unicode across all systems could close the vulnerability created by encoding inconsistencies, improving both security and usability across platforms.
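Even within Unicode, consistency takes one deliberate step: normalization. The sketch below, plain standard-library Python, shows why. The é in "Beyoncé" can be stored two different ways that look identical on screen, and systems comparing raw strings will call them a mismatch unless both normalize first.

```python
import unicodedata

composed   = "Beyonc\u00e9"    # é as one precomposed code point
decomposed = "Beyonce\u0301"   # e followed by a combining accent

print(composed, decomposed)    # both render as Beyoncé
print(composed == decomposed)  # False: the raw strings differ
print(unicodedata.normalize("NFC", composed) ==
      unicodedata.normalize("NFC", decomposed))  # True after normalizing
```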
The saying goes that a fish rots from the head, but when it comes to encoding standards, the tail has been rotting for a long time; it is time the head felt the rot too and acted. Private organizations and various states have slowly started shifting toward Unicode. The biggest holdup for a complete move comes from regulatory pressure to conform to legacy government standards and the need to stay compatible with the old systems themselves, many of which belong to government institutions.
Some organizations have made halfway attempts, using both standards depending on the context. For example, a bank may allow accents and hyphens in its mobile app, but those same characters won’t appear on the bank card.
This dual approach highlights the reluctance to fully upgrade: institutions are caught between the desire for modernity and the pull of the outdated systems they must still talk to. Government standards, designed to preserve compatibility with those legacy systems, are the biggest obstacle to progress. It’s time to let go of the past. The future demands systems that can recognize our names as they truly are, not as butchered versions that turn us into identithieves.
Nenebi Tony (Anthony Owura-Akuaku) is an Arkansas-based Ghanaian writer with a focus on the intersection of technology, media, business, and law. He writes about the evolving relationship between technology and society, with a particular interest in how digital systems shape modern life and culture. His 2022 book, Everything that Happened and the People Who Made It, profiles Ghana's top 10 most influential entertainment brands of the 2010s.