Know more about Python XML Parser

Cloudytechi
3 min read · Oct 6, 2021

XML stands for Extensible Markup Language and, like HTML, it is a markup language. Unlike HTML, however, XML has no predefined tags; we define our own custom tags based on the data we are storing in the XML document.

An XML document is often used to share, store, and structure data because it can easily be transferred between servers and systems. When it comes to data, Python is one of the most popular programming languages for processing and parsing it.

Fortunately, Python comes with a standard xml package that can parse XML documents and also write data to an XML file. This is what this tutorial calls the Python XML Parser.
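
Since the standard package can write XML as well as read it, here is a minimal sketch of the writing side using ElementTree. The tag names and values are illustrative samples, not a prescribed schema.

```python
import xml.etree.ElementTree as ET

# Build a small tree in memory (illustrative tags and values).
root = ET.Element("item")
record = ET.SubElement(root, "record")
ET.SubElement(record, "name").text = "Jameson"
ET.SubElement(record, "country").text = "South Africa"

# Serialize to a string. To write a file instead, one could call
# ET.ElementTree(root).write("out.xml"), where "out.xml" is a hypothetical path.
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Passing encoding="unicode" makes tostring() return a str rather than bytes, which is convenient for printing.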

In this Python XML Parser tutorial, we will walk through the Python XML minidom and ElementTree modules and learn how to parse an XML document in Python.

Python XML minidom and ElementTree module

The Python xml package includes two submodules, minidom and ElementTree, that we can use to parse an XML file in Python.

The minidom (Minimal DOM) module parses the XML file into a DOM (Document Object Model) structure, which is similar to the DOM used in JavaScript.

Although we can parse an XML file using minidom, ElementTree offers a more Pythonic way to parse an XML document in Python.
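
To make the difference concrete, the sketch below reads the same value with both modules. The inline XML string is a shortened sample used only so the snippet runs without a file.

```python
from xml.dom import minidom
import xml.etree.ElementTree as ET

# Shortened sample document, inlined for illustration.
XML_TEXT = "<item><record><name>Jameson</name></record></item>"

# minidom: locate the element node, then step into its text node.
dom = minidom.parseString(XML_TEXT)
name_dom = dom.getElementsByTagName("name")[0].firstChild.data

# ElementTree: findtext() takes a path and returns the text directly.
root = ET.fromstring(XML_TEXT)
name_et = root.findtext("record/name")

print(name_dom, name_et)
```

Both calls return the same string; the ElementTree version simply needs fewer steps to reach it.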

XML File

For all the examples in this tutorial, we will use the demo.xml file, which contains the following XML data:

#demo.xml

<item>
  <record>
    <name>Jameson</name>
    <phone>(080) 78168241</phone>
    <email>cursus.in.hendrerit@ipsumdolor.edu</email>
    <country>South Africa</country>
  </record>

  <record>
    <name>Colton</name>
    <phone>(026) 53458662</phone>
    <email>non@idmagna.ca</email>
    <country>Libya</country>
  </record>

  <record>
    <name>Dillon</name>
    <phone>(051) 96790901</phone>
    <email>Aliquam.ornare@Etiamlaoreetlibero.ca</email>
    <country>Madagascar</country>
  </record>

  <record>
    <name>Channing</name>
    <phone>(014) 98829753</phone>
    <email>faucibus.Morbi.vehicula@aliquamarcu.co.uk</email>
    <country>Korea, South</country>
  </record>
</item>

In the above example, you can see that the data is nested under custom <tags>. The root tag is <item>, which has <record> as a nested tag, which further has 4 more nested tags:

  1. <name>,
  2. <phone>,
  3. <email>, and
  4. <country>
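
One way to confirm this nesting is to walk the tree and print the tag names. The sketch below uses ElementTree (covered later in this tutorial) on a one-record excerpt of demo.xml, inlined so it runs without the file.

```python
import xml.etree.ElementTree as ET

# One-record excerpt of demo.xml, inlined for illustration.
XML_TEXT = """
<item>
  <record>
    <name>Jameson</name>
    <phone>(080) 78168241</phone>
    <email>cursus.in.hendrerit@ipsumdolor.edu</email>
    <country>South Africa</country>
  </record>
</item>
""".strip()

root = ET.fromstring(XML_TEXT)
print(root.tag)  # the root tag

# Iterating over an element yields its direct children.
for record in root:
    child_tags = [child.tag for child in record]
    print(record.tag, child_tags)
```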

Parse/Read XML Document in Python using minidom

minidom is a submodule of Python's standard xml package, which means you do not need to pip install anything to use it.

The minidom module parses the XML document into a Document Object Model (DOM), from which data can be extracted using the getElementsByTagName() method.

Syntax: To parse the XML document in Python using minidom

from xml.dom import minidom

minidom.parse("filename")
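
When the XML is already in memory rather than in a file, minidom also offers parseString(). A minimal sketch, using a shortened inline sample as the input:

```python
from xml.dom import minidom

# Shortened sample document, inlined for illustration.
XML_TEXT = "<item><record><name>Colton</name></record></item>"

# parseString() returns the same kind of Document object as parse().
doc = minidom.parseString(XML_TEXT)

root = doc.documentElement  # the root element node
print(root.tagName)

names = doc.getElementsByTagName("name")
print(len(names), names[0].firstChild.data)
```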

Example:

Let’s grab all the names and phone data from our demo.xml file.

from xml.dom import minidom

# parse the XML file
file = minidom.parse('demo.xml')

# grab all <record> tags
records = file.getElementsByTagName("record")

print("Name------>Phone")

for record in records:
    # access the <name> and <phone> nodes of every record
    name = record.getElementsByTagName("name")
    phone = record.getElementsByTagName("phone")

    # access the text data of name and phone
    print(name[0].firstChild.data, end="----->")
    print(phone[0].firstChild.data)

Output

Name------>Phone
Jameson----->(080) 78168241
Colton----->(026) 53458662
Dillon----->(051) 96790901
Channing----->(014) 98829753
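
The example above assumes every record has a non-empty <name> and <phone>. If a tag is missing, indexing with [0] raises an IndexError, and if a tag is empty, firstChild is None. The sketch below guards against both cases; first_text is a hypothetical helper name, not part of minidom.

```python
from xml.dom import minidom

def first_text(parent, tag, default=""):
    """Return the text of the first <tag> under parent, or default if missing or empty."""
    nodes = parent.getElementsByTagName(tag)
    if len(nodes) == 0 or nodes[0].firstChild is None:
        return default
    return nodes[0].firstChild.data

# Sample record with an empty <phone/> and no <email>, inlined for illustration.
XML_TEXT = "<item><record><name>Dillon</name><phone/></record></item>"
record = minidom.parseString(XML_TEXT).getElementsByTagName("record")[0]

print(first_text(record, "name"))
print(first_text(record, "phone", "n/a"))
print(first_text(record, "email", "n/a"))
```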

Parse/Read XML Document in Python Using ElementTree

The ElementTree module provides a simple and straightforward way to parse and read XML documents in Python. Just as minidom is a submodule of xml.dom, ElementTree is a submodule of xml.etree.

The ElementTree module parses the XML document into a tree-like structure whose root is the first <tag> of the XML file (<item> in our case).

Syntax:

import xml.etree.ElementTree as ET 

ET.parse('file_name.xml')
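
ET.parse() returns an ElementTree object, while ET.fromstring() parses a string and returns the root element directly. The sketch below shows both; it passes an io.StringIO object to parse() in place of a filename only so the snippet runs without a file on disk.

```python
import io
import xml.etree.ElementTree as ET

# Shortened sample document, inlined for illustration.
XML_TEXT = "<item><record><country>Libya</country></record></item>"

# parse() accepts a filename or a file-like object and returns an ElementTree.
tree = ET.parse(io.StringIO(XML_TEXT))
root_from_parse = tree.getroot()

# fromstring() returns the root Element, so no getroot() call is needed.
root_from_string = ET.fromstring(XML_TEXT)

print(type(tree).__name__, root_from_parse.tag, root_from_string.tag)
```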

Example

Using minidom we grabbed the name and phone data; now let's access the email and country data using ElementTree.

import xml.etree.ElementTree as ET

tree = ET.parse('demo.xml')

# get the root branch <item>
item = tree.getroot()

# loop through all <record> elements of <item>
for record in item.findall("record"):
    email = record.find("email").text
    country = record.find("country").text
    print(f"Email: {email},-------->Country:{country}")

Output

Email: cursus.in.hendrerit@ipsumdolor.edu,-------->Country:South Africa
Email: non@idmagna.ca,-------->Country:Libya
Email: Aliquam.ornare@Etiamlaoreetlibero.ca,-------->Country:Madagascar
Email: faucibus.Morbi.vehicula@aliquamarcu.co.uk,-------->Country:Korea, South
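
Because each <record> holds simple tag and text pairs, a common follow-up is to collect the records into plain Python dictionaries. This is a minimal sketch on a two-record inline sample, and it assumes every child of <record> contains only text.

```python
import xml.etree.ElementTree as ET

# Two-record sample, inlined for illustration.
XML_TEXT = """
<item>
  <record><name>Colton</name><country>Libya</country></record>
  <record><name>Dillon</name><country>Madagascar</country></record>
</item>
""".strip()

root = ET.fromstring(XML_TEXT)

# One dict per <record>, keyed by the child tag name.
records = [
    {child.tag: child.text for child in record}
    for record in root.findall("record")
]
print(records)
```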

Conclusion

That wraps up this tutorial on the Python XML Parser. As you can see, Python provides a built-in standard xml package to read and parse XML files. This tutorial covered two of its submodules that can parse an XML document:

  • minidom and
  • ElementTree

The minidom module follows the Document Object Model approach to parsing an XML file. The ElementTree module, on the other hand, represents the XML file as a tree of elements.
