The lost art of XML — mmagueta

Kissaki@programming.dev · 6 days ago

Ephera@lemmy.ml · 6 days ago

IMHO one of the fundamental problems with XML for data serialization is illustrated in the article:

(person (name "Alice") (age 30))
[is serialized as]
<person>
  <name>Alice</name>
  <age>30</age>
</person>
Or with attributes:
<person name="Alice" age="30" />

The same data can be portrayed in two different ways. Whenever you serialize or deserialize data, you need to decide whether to read/write values from/to child nodes or attributes.

That’s because XML is a markup language. It’s great for typing up documents, e.g. to describe a user interface. It was not designed for taking programmatic data and serializing that out.
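To make the two layouts concrete, here is a minimal Python sketch using only the standard library. The `read_person` helper is made up for illustration; it shows that a reader which wants to accept both forms has to check attributes and child elements for every field.

```python
import xml.etree.ElementTree as ET

AS_CHILDREN = "<person><name>Alice</name><age>30</age></person>"
AS_ATTRIBUTES = '<person name="Alice" age="30" />'


def read_person(xml_text: str) -> dict:
    """Hypothetical reader that accepts either layout."""
    root = ET.fromstring(xml_text)
    person = {}
    for field in ("name", "age"):
        # Look for an attribute first, then fall back to a child element.
        value = root.get(field)
        if value is None:
            child = root.find(field)
            value = child.text if child is not None else None
        person[field] = value
    return person


print(read_person(AS_CHILDREN))
print(read_person(AS_ATTRIBUTES))
```

Both calls produce the same dictionary, but only because the reader was written to know about both conventions.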

atzanteol@sh.itjust.works · 6 days ago

This is your confusion, not an issue with XML.

Attributes tend to be “metadata”. You ever write HTML? It’s not confusing.

Feyd@programming.dev · edit-2 6 days ago

In HTML, which things are attributes and which things are tags are part of the spec. With XML that is being used for something arbitrary, someone is making the choice every time. They might have a different opinion than you do, or even the same opinion, but make different judgments on occasion. In JSON, there are fewer choices, so fewer chances for people to be surprised by other people’s choices.

atzanteol@sh.itjust.works · edit-2 6 days ago

I mean, yeah. But people don’t just do things randomly. Most people put data in the body and metadata in attributes just like html.

Ephera@lemmy.ml · 6 days ago

Having to make a decision isn’t my primary issue here (even though it can also be problematic, when you need to serialize domain-specific data for which you’re no expert). My issue is rather in that you have to write this decision down, so that it can be used for deserializing again. This just makes XML serialization code significantly more complex than JSON serialization code. Both in terms of the code becoming harder to understand, but also just lines of code needed.
I’ve somewhat come to expect fewer than a handful of lines of code for serializing an object from memory into a file. If you do that with XML, it will just slap everything into child nodes, which may be fine, but might also not be.
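A rough sketch of what I mean, in Python with only the standard library. The `Person` class and the `AS_ATTRIBUTE` mapping are assumptions for illustration, not how any particular XML library does it: the JSON side needs no layout decisions, while the XML side has to record which fields go into attributes.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import dataclass, asdict


@dataclass
class Person:
    name: str
    age: int


def to_json(person: Person) -> str:
    # No layout decisions: every field becomes a key.
    return json.dumps(asdict(person))


# The attribute-vs-child decision has to be written down somewhere so that
# the reading side can apply the same rule. (Hypothetical mapping.)
AS_ATTRIBUTE = {"age"}


def to_xml(person: Person) -> str:
    root = ET.Element("person")
    for key, value in asdict(person).items():
        if key in AS_ATTRIBUTE:
            root.set(key, str(value))
        else:
            ET.SubElement(root, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


alice = Person("Alice", 30)
print(to_json(alice))
print(to_xml(alice))
```

Dropping `AS_ATTRIBUTE` and putting everything into child nodes removes the extra bookkeeping, which is the "may be fine, but might also not be" case.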

atzanteol@sh.itjust.works · 6 days ago

Having to make a decision isn’t my primary issue here (even though it can also be problematic, when you need to serialize domain-specific data for which you’re no expert). My issue is rather in that you have to write this decision down, so that it can be used for deserializing again. This just makes XML serialization code significantly more complex than JSON serialization code. Both in terms of the code becoming harder to understand, but also just lines of code needed.

This is, without a doubt, the stupidest argument against XML I’ve ever heard. Nobody has trouble with using attributes vs. tag bodies. Nobody. There are much more credible complaints to be made about parsing performance, memory overhead, extra size, complexity when using things like namespaces, etc.

I’ve somewhat come to expect fewer than a handful of lines of code for serializing an object from memory into a file. If you do that with XML, it will just slap everything into child nodes, which may be fine, but might also not be.

No - it is fine to just use tag bodies. You don’t need to ever use attributes if you don’t want to. You’ve never actually used XML have you?

https://www.baeldung.com/jackson-xml-serialization-and-deserialization

Ephera@lemmy.ml · 6 days ago

Okay, dude, glad to have talked.

Feyd@programming.dev · 6 days ago

JSON also has arrays. In XML the practice to approximate arrays is to put the index as an attribute. It’s incredibly gross.

Kissaki@programming.dev · 6 days ago

In XML the practice to approximate arrays is to put the index as an attribute. It’s incredibly gross.

I don’t think I’ve seen that much if ever.

Typically, XML repeats tag names. Repeating keys are not possible in JSON, but are possible in XML.

<items>
  <item></item>
  <item></item>
  <item></item>
</items>

Feyd@programming.dev · edit-2 6 days ago

That’s correct, but the order of tags in XML is not meaningful, and if you parse then write that, it can change order according to the spec. Hence, what you put would be something like the following if it was intended to represent an array.

<items>
  <item index="1"></item>
  <item index="2"></item>
  <item index="3"></item>
</items>

Kissaki@programming.dev · edit-2 6 days ago

https://www.w3.org/TR/2004/REC-xml-infoset-20040204/

[children] An ordered list of child information items, in document order.

Does this not cover it?

Do you mean if you were to follow XML standard but not XML information set standard?

Feyd@programming.dev · 6 days ago

Information set isn’t a description of XML documents, but a description of what you have that you can write to XML, or what you’d get when you parse XML.

This is the key part from the document you linked

The information set of an XML document is defined to be the one obtained by parsing it according to the rules of the specification whose version corresponds to that of the document.

This is also a great example of the complexity of the XML specifications. Most people do not fully understand them, which is a negative aspect for a tool.

As an aside, you can have an enforced order in XML, but you have to also use XSD so you can specify xsd:sequence, which adds complexity and precludes ordered arrays in arbitrary documents.

Kissaki@programming.dev · edit-2 5 days ago

If the XML parser parses into an ordered representation (the XML information set), isn’t it then the deserializer’s choice how they map that to the programming language/type system they are deserializing to? So in a system with ordered arrays it would likely map to those?

If XML can be written in an ordered way, and the parsed XML information set has ordered children for those, I still don’t see where order gets lost or is impossible [to guarantee] in XML.
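As a quick check with one specific parser, here is a small Python sketch using the standard library’s `xml.etree.ElementTree`. It only shows the behaviour of that one implementation, not a guarantee about every toolchain: children come back in document order and are written back in the same order.

```python
import xml.etree.ElementTree as ET

SOURCE = "<items><item>first</item><item>second</item><item>third</item></items>"

root = ET.fromstring(SOURCE)

# Children are returned in document order, so mapping them to a list is direct.
values = [item.text for item in root.findall("item")]
print(values)

# Writing the tree back out keeps the same order.
round_tripped = ET.tostring(root, encoding="unicode")
print(round_tripped == SOURCE)
```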

Feyd@programming.dev · 5 days ago

You are correct that it is the deserializer’s choice. You are incorrect when you imply that it is a good idea to rely on behavior that isn’t enforced in the spec. A lot of people have been surprised when that assumption turns out to be wrong.

faint_marble_noise@programming.dev · 4 days ago

XML is not great for user interfaces at all.

Ephera@lemmy.ml · 4 days ago

Eh, I don’t think it’s the be-all and end-all of describing user interfaces, but it deals well with the deep nesting that UIs generally have, and the attributes allow throwing in metadata for certain elements, which is also something you frequently need in UIs.

At the very least, JSON, YAML, INI and TOML would be a lot worse.

faint_marble_noise@programming.dev · 4 days ago

Well, in my experience, working with Android XML GUIs is soul-crushing, while QML is much more pleasant. It is kinda like JSON, but not quite.

Ephera@lemmy.ml · edit-2 4 days ago

Yeah, fair enough. I was thinking in terms of the more general-purpose text formats. I have heard good things about QML, too…

Kissaki@programming.dev · 6 days ago

They can be used as alternatives. In MSBuild you can use attributes and subelements interchangeably, which, if you’re writing it, gives you a choice of preference. I typically prefer attributes for conciseness (vertical density), but switch to subelements once the length/number becomes a (significant) downside.

Of course that’s more of a human writing view. Your point about ambiguity in de-/serialization still stands at least until the interface defines expectation or behavior as a general mechanism one way or the other, or with specific schema.

epyon22@sh.itjust.works · 6 days ago

The fact that JSON serializes easily to basic data structures simplifies code so much. Most use cases don’t need fully semantic data storage, and for much of it you have to write the same amount of documentation about the data structures anyway. I’ll give XML one thing though: schemas are nice and easy there, but have a high barrier to entry in JSON.

Kissaki@programming.dev · 6 days ago

Most use cases don’t need fully semantic data storage

If both sides have a shared data model it’s a good base model without further needs. Anything else quickly becomes complicated because of the dynamic nature of JSON - at least if you want a robust or well-documented solution.

[object Object]@lemmy.world · 6 days ago

If both sides have a shared data model

If the sides don’t have a common understanding of the data structure, no format under the sun will help.

Kissaki@programming.dev · 6 days ago

The point is that there are degrees to readability, specificity, and obviousness, even without a common understanding. Self-describing data, much like self-describing code, is different from a dense serialization without much support in that regard.

lad@programming.dev · 6 days ago

Yeah, when the same API endpoint sometimes returns a string for an error, sometimes an object, and sometimes an array, JSON doesn’t help much in parsing the mess.
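A sketch of the kind of defensive code that ends up on the client side, in Python. The `error` and `message` field names are invented for illustration, and the sketch assumes the top-level response is a JSON object.

```python
import json


def _message(item) -> str:
    # An error entry may be a bare string or an object with a "message" key.
    if isinstance(item, dict):
        return str(item.get("message", item))
    return str(item)


def extract_errors(body: str) -> list[str]:
    """Normalize a hypothetical 'error' field that may be a string, object, or array."""
    error = json.loads(body).get("error")
    if error is None:
        return []
    if isinstance(error, list):
        return [_message(entry) for entry in error]
    return [_message(error)]


print(extract_errors('{"error": "boom"}'))
print(extract_errors('{"error": {"message": "boom"}}'))
print(extract_errors('{"error": ["a", {"message": "b"}]}'))
```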

Feyd@programming.dev · edit-2 6 days ago

Honestly, anyone pining for all the features of XML probably didn’t live through the time when XML was used for everything. It was actually a fucking nightmare to account for the existence of all those features because the fact they existed meant someone could use them and feed them into your system. They were also the source of a lot of security flaws.

This article looks like it was written by someone who wasn’t there, and they’re calling the people telling them the truth liars because they think the features they found on W3Schools look cool.

Diplomjodler@lemmy.world · 6 days ago

It’s true, though, that JSON is just better for most applications.

AnitaAmandaHuginskis@lemmy.world · edit-2 6 days ago

I love XML, when it is properly utilized. Which, in most cases, it is not, unfortunately.

JSON > CSV though, I fucking hate CSV. I do not get the appeal. “It’s easy to handle” – NO, it is not. It’s the “fuck whoever needs to handle this” of file “formats”.

JSON is a reasonable middle ground, I’ll give you that

thingsiplay@lemmy.ml · 6 days ago

Biggest problem is, CSV is not a standardized format like JSON. For very simple cases it could be used as a database-like format. But it depends on the parser, and that’s not ideal.

flying_sheep@lemmy.ml · 5 days ago

Exactly. I’ve seen so much data destroyed silently deep in some bioinformatics pipeline due to this that I’ve just become an anti CSV advocate.

Use literally anything else that doesn’t need out of band “I’m using this dialect” information that has to match to prevent data loss.

thingsiplay@lemmy.ml · 6 days ago

JSON is easier to parse, smaller, and lighter on resources, and that is important on the web. If you take into account all the features XML has, plus the entities, it gets big, slow, and complicated. Most data does not need to be a self-descriptive document when it is transferred over the web. Fundamentally, these are two different kinds of languages: XML is a general markup language for writing documents, while JSON is a generalized data structure with support for various data types supported by programming languages.

Kissaki@programming.dev · 6 days ago

while JSON is a generalized data structure with support for various data types supported by programming languages

Honestly, I find it surprising that you say “support for various data types supported by programming languages”. Data types are particularly weak in JSON when you go beyond JavaScript. Only number for numbers, no integer types, no date, no time, etc.
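A small Python illustration of the date case, using only the standard library. The record contents are made up, and ISO 8601 strings are just one common workaround, not something the JSON format prescribes.

```python
import json
from datetime import date

record = {"count": 3, "created": date(2024, 1, 31)}

# A date is not a JSON type, so the encoder refuses it by default.
try:
    json.dumps(record)
except TypeError as exc:
    print("cannot serialize:", exc)

# One common workaround: fall back to ISO 8601 strings.
encoded = json.dumps(record, default=lambda value: value.isoformat())
decoded = json.loads(encoded)

# The date comes back as a plain string; the reader has to know to parse it.
print(type(decoded["created"]).__name__)
print(date.fromisoformat(decoded["created"]) == record["created"])
```

The type information lives in the code on both ends rather than in the data itself.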

Regarding use, I see, at least to some degree, JSON outside of use for network transfer. For example, used for configuration files.

lehenry@lemmy.world · 6 days ago

While I understand the criticism of XPath and XSL, the fact that we have proper tools to query and transform XML, instead of the messy way of getting specific information from JSON, is also one of the strong points of XML.

deadbeef79000@lemmy.nz · 6 days ago

XSLT and XPath are entirely underrated. They are seriously powerful tools.

While you can approximate XSLT with a heap of coffee and a JSON parser it’s harder to keep it declarative.
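For a small taste of the declarative style, here is a Python sketch. Note the assumptions: the standard library’s `xml.etree.ElementTree` only implements a limited subset of XPath and no XSLT at all (full support needs a third-party library), and the sample document is made up.

```python
import xml.etree.ElementTree as ET

# Made-up sample data for illustration.
DOC = """
<library>
  <book year="1999"><title>XML in Practice</title></book>
  <book year="2012"><title>JSON for Everyone</title></book>
  <book year="2012"><title>Schemas Explained</title></book>
</library>
"""

root = ET.fromstring(DOC)

# One path expression selects the titles of all books from a given year,
# with no hand-written loops over the tree structure.
titles_2012 = [title.text for title in root.findall("./book[@year='2012']/title")]
print(titles_2012)
```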

Ephera@lemmy.ml · 6 days ago

There is JSONPath, at least: https://en.wikipedia.org/wiki/JSONPath

Kissaki@programming.dev · 6 days ago

Yeah, I wish I had something like XPath as consistently (in terms of availability and syntax) for JSON.

[object Object]@lemmy.world · 6 days ago

Has no one here heard of jq?

tyler@programming.dev · 6 days ago

You do? JSONata and JSONPath both exist and are very good.

entwine@programming.dev · 5 days ago

I agree with everything this article said. A lot of software would work better if devs took the time to learn and appreciate XML. Many times I’ve found myself reinventing shit XML gives you for free.

…But at the same time, if I’m working on a developer-facing product of any kind, I know that choosing XML over JSON is going to turn a lot of people away.

Phoenixz@lemmy.ca · 6 days ago

I’m sure XML has its uses

I’m also sure that for 99% of the applications out there, XML is overkill and overcomplicated, making things slower and more error-prone

Use JSON, and you’ll be fine. If you really really need XML then you probably already know why

TunaLobster@lemmy.world · 6 days ago

IMO, the best thing about YAML is the referencing. It’s super easy to reuse an object multiple times. It gives you the same kind of parent-child struct ability that programming languages have. Sure, XML can do it, but it’s not in every parser. cough Python built-in parser cough But then YAML also has no built-in parser, and doing DOM in things other than XML feels odd.

Feyd@programming.dev · 6 days ago

That capability is what enables billion laughs attacks, unfortunately, so not having it enabled in cases where external input is possible is wise.
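A deliberately tiny sketch of the mechanism, in Python. Since the standard library has no YAML parser, it uses XML entities, which is where the attack gets its name. Each entity references the previous one twice, so the text doubles per level; the classic attack document uses around ten levels with ten references each, which expands to roughly a billion copies. Newer Expat releases reportedly include limits against this kind of amplification, so treat this as an illustration of the growth, not of current parser behaviour under attack.

```python
import xml.etree.ElementTree as ET

# Three harmless levels: each entity contains the previous one twice.
DOC = """<!DOCTYPE lol [
  <!ENTITY a "ha">
  <!ENTITY b "&a;&a;">
  <!ENTITY c "&b;&b;">
]>
<lol>&c;</lol>"""

root = ET.fromstring(DOC)

# A single reference in the document body expands to four copies of "ha".
print(root.text)
print(len(root.text))
```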

A_norny_mousse@feddit.org · edit-2 5 days ago

I never understood why people would say JSON is superior, and why XML seemed to be getting rarer, but the author explains it:

XML was not abandoned because it was inadequate; it was abandoned because JavaScript won.

I’ve been using it ever since I started using Linux because my favorite window manager uses it, and because of a long-running pet project that is almost just as old: first I used XML tools to parse web pages, later I switched to dedicated data providers that offered both XML and JSON formats, and stuck to what I knew.

I’m guessing that another reason devs - especially web devs - prefer JSON over XML is that the latter uses more bytes to transport the same amount of raw data. One XML file will be somewhat larger than one JSON file with the same content. That advantage is of course dwarfed by all the other media and helper scripts - nay, frameworks - devs use to develop websites.

BTW, XML is very readable with syntax highlighting and easily editable if your code editor has some very basic completion for it. And it has comments!

Kissaki@programming.dev · 6 days ago

The readability and obviousness of XML can not be overstated. JSON is simple and dense (within the limit of text). But look at JSON alone, and all you can do is hope for named fields. Outside of that, you depend on context knowledge and specific structure and naming context.

Whenever I start editing json config files I have to be careful about trailing commas, structure with opening and closing parens, placement and field naming. The best you can do is offer a default-filled config file that already has the full structure.
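The trailing comma case is easy to reproduce with a strict parser. A minimal Python sketch, where the config contents and the `is_valid_json` helper are made up for illustration:

```python
import json

VALID = '{"name": "demo", "retries": 3}'
TRAILING_COMMA = '{"name": "demo", "retries": 3,}'


def is_valid_json(text: str) -> bool:
    """Return True if the standard-library parser accepts the text."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


print(is_valid_json(VALID))
print(is_valid_json(TRAILING_COMMA))
```

One stray comma left over after deleting the last entry is enough to make the whole file unreadable.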

While XML does not solve all of it, it certainly is more descriptive and more structured, easing many of those pain points.

It’s interesting that web tech had XML in the early stages of AJAX, the dynamic web. But in the end, we sent JSON through XMLHttpRequest. JSON won.

tyler@programming.dev · 6 days ago

You are clearly one of those people that never had to deal with xml in a production system. Even with proper syntax highlighting, dealing with xml is a nightmare, whether it’s for configuration or data transmission. People switched to JSON because it’s better. Period. And that’s an incredibly low bar to set, because I don’t think JSON is that good either.

Like another person said, all of these features of XML don’t make it nicer, they make it worse, because it means you have to be ready for any of those features even if they’re never used.

Feyd@programming.dev · 6 days ago

There are really good uses for XML. Mostly for making things similar to HTML. Like markup for Android UIs or XAML for WPF. For pretty much everything else the complexity only brings headaches

Colloidal@programming.dev · 6 days ago

ASN.1 crying in the corner.

arjen@piefed.social · 6 days ago

Preaching to the choir I like to sing in.

I didn’t know the link to S-Expressions, ty.