Encoding Links

Dark corners of Unicode
Let’s Stop Ascribing Meaning to Code Points
UTF-8 – “The most elegant hack” (2013)
UTF-8 Everywhere (HN)
Not everything is UTF-8 (2020) (Lobsters)
Archie Markup Language (ArchieML) - Structured text format optimized for human writability.
Unidecode - Lossy ASCII transliterations of Unicode text.
BARE Message Encoding - Simple binary representation for structured application data. (Lobsters)
Explaining text representation in computers (2020)
Text Encoding: The Thing You Were Trying to Avoid (2020)
Amazon Ion - Richly-typed, self-describing, hierarchical data serialization format offering interchangeable binary and text representations. (HN) (HN)
Unicode In Five Minutes (2013)
Unicode support. What does that actually mean? (2020)
Fast UTF-8 validation (2020) (HN) (Code)
UTF-8 Illustrator
Coded Character Sets, History and Development (1980)
Awesome Code Points - Curated list of characters in Unicode, that have interesting (and maybe not widely known) features or are awesome in some other way.
Text rendering tests - Unicode’s test suite for text rendering engines.
Executable PNGs (2020) (Lobsters) (HN)
Unicode Proposal to Encode Subscripts/Superscripts for Mathematical Programming
Emoji Under the Hood (HN)
The history of UTF-8 as told by Rob Pike (Lobsters) (HN)
Concise Encoding - Friendly data format for human and machine. Ad-hoc, secure, with 1:1 compatible twin binary and text formats and rich type support. (Code)
Practical Reed-Solomon for Programmers (2021)
Apache Avro - Data serialization system. (Web)
Unicode sorting is hard & why browsers added special emoji matching to regexp (2021)
Any Encoding, Ever (2021) (HN)
casync - Content Addressable Data Synchronizer.
ANSI Escape Codes (HN)
Entropy coding in Oodle Data: Huffman coding (2021)
Fun with Morse Code
Latinendian vs Arabendian (2020)
ICU - International Components for Unicode (Code)
ruststep - STEP toolkit for Rust.
Overview of Serialization Technologies (2018)
Substrait - Cross-Language Serialization for Relational Algebra. (Code) (substrait-rs)
OpenH264 - Open Source H.264 Codec.
Unicode Normalization Forms: When ö ≠ ö (2021) (HN)
Planus - Alternative flatbuffer implementation.
TinyCBOR - Concise Binary Object Representation (CBOR) Library.
Cheat sheets for the Portable Document Format
Understanding UUIDs, ULIDs and String Representations (Lobsters) (HN)
How UTF-8 works (2022) (HN) (Lobsters)
What Every Programmer Absolutely, Positively Needs To Know About Encodings (2011) (HN)
Why I invented “dash encoding”, a new encoding scheme for URL paths (2022) (Tweet)
You Don't Know GIF – An analysis of a GIF file and some weird GIF features (2022)
DeGauss - Avro schema compatibility checker.
Hex: A Strategy Guide (HN)
trrs - CLI tool to transform data between different encodings.
Ask HN: Is there a tool to generate binary protocol figures out of a spec? (2022)
Plain Text - Dylan Beattie (2021)
Low Complexity Communication Codec (LC3)
ltp - High performance, readable, and maintainable, in-place encoding format.
Identity Crisis: Sequence v. UUID as Primary Key (Lobsters)
Concise Encoding - Secure data format for a modern world. (HN)
BIPF (Binary In-Place Format) Spec - Binary format designed for in-place (without parsing) reads, with schemaless json-like semantics.
New UUID Formats (2022) (HN)
UTF8.XYZ - Quick web app for fetching Unicode characters without extra fluff. (Code)
How to estimate disk space
muon - Compact and simple binary object notation. (Lobsters) (Doc)
Character encoding and UTF-8 (2022) (HN)
Type of Barcodes and Their Usage (HN)
Unicode Character Search - Search for Unicode Characters by name, codepoint or text. (Code)
Understanding Big Data File Formats (2022)
Free Lossless Audio Codec (FLAC)
How QR codes work (2022) (HN)
Lyra V2 – a better, faster, and more versatile speech codec (2022) (HN)
ULIDs are a great replacement for UUIDs
LXMF - Lightweight Extensible Message Format.
VRS - File format optimized to record & playback streams of sensor data, such as images, audio samples, and any other discrete sensors (IMU, temperature, etc.), stored in per-device streams of timestamped records.
"AVIF: Creating a new image format in the open" by Jon Bauman (2022)
Awesome Unique ID
Google explains why it's removing JPEG XL from the Chromium code base (2022) (Lobsters) (HN)
Elements Of a Great Markup Language (2022) (HN) (Lobsters)
The essence of Reed-Solomon coding (2022) (HN) (Lobsters)
msgpack-tools - Command-line tools for converting between MessagePack and JSON.
ADBC: Arrow Database Connectivity
hext - Binary File Markup Language.
MP4 file encoding explained visually (HN)
GraphAr - Open source, standard data file format for graph data storage and retrieval.
Quite OK Image is now my favorite asset format (2022) (HN)
PA - Native storage format based on arrow.
rsbkb - Command line tools to encode/decode things.
Hello, PNG (2023) (HN)
Schemaboi - Serialization / deserialization format designed to fill a similar niche as Protobuf or JSON.
Unicode Arrows
A Safer High Performance AV1 Decoder (2023) (Lobsters)
Image Codec Comparison (JXL vs. AVIF vs. WebP vs. JPG) (2023)