Parsing and manipulating data

Most plug-ins need to retrieve data, extract meaningful content from it, and return that content to either a client or the media server itself. Because data available online comes in a variety of formats, the framework includes a number of parsing libraries that handle the most common data formats, and provides simple APIs for converting between these formats and regular Python objects.

XML

The XML API provides methods for converting between XML-formatted strings and trees of XML Element objects. New XML element trees can also be constructed. The underlying functionality is provided by the lxml etree and objectify libraries.

Note

It is strongly recommended that developers read lxml’s XPath Tutorial. Manipulating elements returned by the etree library using XPath is a very powerful way of finding and accessing data within an XML document. Learning to use XPath efficiently will greatly simplify the plug-in’s code.
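
As a rough illustration of the kind of lookup the note has in mind, the sketch below uses Python's standard-library xml.etree.ElementTree as a stand-in, because the framework's XML API is only available inside a running plug-in. ElementTree supports only a subset of XPath, so lxml's own XPath support should be expected to go further. The document content and element names are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document; element and attribute names are invented.
SAMPLE = """
<library>
  <video id="1" type="movie"><title>First</title></video>
  <video id="2" type="episode"><title>Second</title></video>
  <video id="3" type="movie"><title>Third</title></video>
</library>
""".strip()

root = ET.fromstring(SAMPLE)

# Select the titles of all <video> elements whose type attribute is "movie".
movie_titles = [el.text for el in root.findall(".//video[@type='movie']/title")]

# Find a single element by attribute value, then read another attribute.
second = root.find(".//video[@id='2']")
second_type = second.get("type")

print(movie_titles)
print(second_type)
```

A single path expression replaces what would otherwise be nested loops and attribute checks, which is the simplification the note refers to.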

XML.Element(name, text=None, **kwargs)

Returns a new XML element with the given name and text content. Any keyword arguments provided will be set as attributes.

Parameters:
  • name (str) – The name of the new element.
  • text (str) – The text content of the new element.
Return type: _Element

XML.StringFromElement(el, encoding='utf8')

Converts the XML element object el to a string representation using the given encoding.
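
The following sketch shows the behavior described for XML.Element and XML.StringFromElement using only the standard library. The helper make_element is hypothetical and merely mirrors the documented signature. It is not the framework's implementation, which is built on lxml.

```python
import xml.etree.ElementTree as ET

def make_element(name, text=None, **kwargs):
    """Hypothetical stand-in: build an element, setting kwargs as attributes."""
    # XML attribute values are strings, so convert each value explicitly.
    el = ET.Element(name, {key: str(value) for key, value in kwargs.items()})
    if text is not None:
        el.text = text
    return el

item = make_element("item", "Hello", id=42, lang="en")

# encoding="unicode" asks ElementTree for a str rather than encoded bytes.
serialized = ET.tostring(item, encoding="unicode")
print(serialized)
```

Inside a plug-in, the equivalent calls would presumably be XML.Element("item", "Hello", id=42, lang="en") followed by XML.StringFromElement(item).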

XML.ElementFromString(string, encoding=None)

Converts string to an XML element object.

Return type: _Element

XML.ElementFromURL(url, values=None, headers={}, cacheTime=None, encoding=None, errors=None, timeout=<object object at 0x10042bc40>, sleep=0)

Retrieves the content for a given HTTP request and parses it as XML using the above method.

Parameters:
  • url (str) – The URL to retrieve content from.
  • values (dict) – Values to pass as URL encoded content for a POST request.
  • headers (dict) – Custom HTTP headers to add to the request.
  • cacheTime (float) – The maximum age (in seconds) that cached data should still be considered valid.
  • timeout (float) – The maximum amount of time (in seconds) that the framework should wait for a response before aborting.
  • sleep (float) – The number of seconds the current thread should pause for if a network request was made, ensuring undue burden isn’t placed on web servers. If cached data was used, this value is ignored.
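
The interaction between cacheTime and sleep can be pictured with the sketch below. The class CachedFetcher and its arguments are hypothetical and only model the semantics described above: cached data is reused while it is young enough, and the pause happens only after a real request. The sketch assumes that cache_time=None means "do not use the cache", which may differ from the framework's actual default.

```python
import time

class CachedFetcher:
    """Hypothetical model of the cacheTime and sleep parameters."""

    def __init__(self, fetch, clock=time.time, pause=time.sleep):
        self._fetch = fetch   # callable(url) -> content; performs the request
        self._clock = clock   # returns the current time in seconds
        self._pause = pause   # blocks the current thread for N seconds
        self._cache = {}      # url -> (time stored, content)

    def get(self, url, cache_time=None, sleep=0):
        entry = self._cache.get(url)
        if entry is not None and cache_time is not None:
            stored_at, content = entry
            if self._clock() - stored_at <= cache_time:
                return content        # cache hit: no request, so no pause
        content = self._fetch(url)
        self._cache[url] = (self._clock(), content)
        if sleep:
            self._pause(sleep)        # pause only after a network request
        return content

# Demonstration with a fake request function instead of real network access.
calls, pauses = [], []

def fake_fetch(url):
    calls.append(url)
    return "<root/>"

fetcher = CachedFetcher(fake_fetch, pause=pauses.append)
first = fetcher.get("http://example.com/feed.xml", cache_time=60, sleep=0.5)
second = fetcher.get("http://example.com/feed.xml", cache_time=60, sleep=0.5)
print(len(calls), pauses)
```

The second call is served from the cache, so only one request and one pause are recorded.
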
XML.StringFromObject(obj, encoding='utf-8')

Attempts to create objectified XML from the given object.

XML.ObjectFromString(string)

Parses string as XML-formatted content and attempts to build a Python object using the objectify library.

Return type: ObjectifiedElement

XML.ObjectFromURL(url, values=None, headers={}, cacheTime=None, autoUpdate=False, encoding=None, errors=None, timeout=<object object at 0x10042bc40>, sleep=0)

Retrieves the content for a given HTTP request and parses it as objectified XML using the above method.

Parameters:
  • url (str) – The URL to retrieve content from.
  • values (dict) – Values to pass as URL encoded content for a POST request.
  • headers (dict) – Custom HTTP headers to add to the request.
  • cacheTime (float) – The maximum age (in seconds) that cached data should still be considered valid.
  • timeout (float) – The maximum amount of time (in seconds) that the framework should wait for a response before aborting.
  • sleep (float) – The number of seconds the current thread should pause for if a network request was made, ensuring undue burden isn’t placed on web servers. If cached data was used, this value is ignored.

HTML

The HTML API is similar to the XML API, but is better suited to parsing HTML content. It is powered by the lxml html library.
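
To show why a separate HTML parser is useful, the sketch below feeds the same sloppy markup to a strict XML parser and to a tolerant HTML parser. It uses the standard library's xml.etree.ElementTree and html.parser as stand-ins for the lxml-based APIs, and the page content and the LinkCollector class are invented for the example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Typical real-world HTML: the <br> tag is never closed.
PAGE = '<p>First line<br>Second line <a href="/next">next</a></p>'

# A strict XML parser rejects the unclosed tag.
try:
    ET.fromstring(PAGE)
    xml_ok = True
except ET.ParseError:
    xml_ok = False

class LinkCollector(HTMLParser):
    """Collects href values from <a> tags, tolerating sloppy markup."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(value for key, value in attrs if key == "href")

collector = LinkCollector()
collector.feed(PAGE)
print(xml_ok, collector.links)
```

The XML parse fails while the HTML parse still recovers the link, which is the situation the HTML API is intended for.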

HTML.Element(name, text=None, **kwargs)

Returns a new HTML element with the given name and text content. Any keyword arguments provided will be set as attributes.

Parameters:
  • name (str) – The name of the new element.
  • text (str) – The text content of the new element.
Return type: HtmlElement

HTML.StringFromElement(el, encoding='utf8')

Converts the HTML element object el to a string representation using the given encoding.

HTML.ElementFromString(string)

Converts string to an HTML element object.

Return type: HtmlElement

HTML.ElementFromURL(url, values=None, headers={}, cacheTime=None, encoding=None, errors=None, timeout=<object object at 0x10042bc40>, sleep=0)

Retrieves the content for a given HTTP request and parses it as HTML using the above method.

Parameters:
  • url (str) – The URL to retrieve content from.
  • values (dict) – Values to pass as URL encoded content for a POST request.
  • headers (dict) – Custom HTTP headers to add to the request.
  • cacheTime (float) – The maximum age (in seconds) that cached data should still be considered valid.
  • timeout (float) – The maximum amount of time (in seconds) that the framework should wait for a response before aborting.
  • sleep (float) – The number of seconds the current thread should pause for if a network request was made, ensuring undue burden isn’t placed on web servers. If cached data was used, this value is ignored.

JSON

The JSON API provides methods for easily converting JSON-formatted strings into Python objects, and vice versa.

More information about the JSON format can be found here.

Note

The framework includes two JSON parsers: one is fast but very strict, while the other is slower but more tolerant of errors. If the fast parser cannot parse a string, an error is logged indicating the position in the string where parsing failed. Where possible, the developer should check for and resolve these errors, as slow JSON parsing can have a severely detrimental effect on performance, especially on embedded systems.

JSON.StringFromObject(obj)

Converts the given object to a JSON-formatted string representation.

JSON.ObjectFromString(string, encoding=None)

Converts a JSON-formatted string into a Python object, usually a dictionary.
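
The round trip between Python objects and JSON strings, and the kind of positional error a strict parser reports, can be sketched with the standard library's json module. This is a stand-in for the framework's JSON API, and the sample data is invented. The framework's own parsers and log messages may behave differently.

```python
import json

# Python object -> JSON string -> Python object.
data = {"title": "Example", "tags": ["a", "b"], "rating": 4.5}
text = json.dumps(data)
restored = json.loads(text)

# A strict parser rejects the trailing comma and reports where it stopped.
broken = '{"title": "Example", }'
try:
    json.loads(broken)
    error_pos = None
except json.JSONDecodeError as exc:
    error_pos = exc.pos  # character offset at which parsing failed

print(restored == data, error_pos)
```

Fixing the source data, or cleaning the string before parsing, keeps the content on the fast path described in the note above.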

JSON.ObjectFromURL(url, values=None, headers={}, cacheTime=None, encoding=None, errors=None, timeout=<object object at 0x10042bc40>, sleep=0)

Retrieves the content for a given HTTP request and parses it as JSON-formatted content using the above method.

Parameters:
  • url (str) – The URL to retrieve content from.
  • values (dict) – Values to pass as URL encoded content for a POST request.
  • headers (dict) – Custom HTTP headers to add to the request.
  • cacheTime (float) – The maximum age (in seconds) that cached data should still be considered valid.
  • timeout (float) – The maximum amount of time (in seconds) that the framework should wait for a response before aborting.
  • sleep (float) – The number of seconds the current thread should pause for if a network request was made, ensuring undue burden isn’t placed on web servers. If cached data was used, this value is ignored.

YAML

YAML.ObjectFromString(string)

Parses the given YAML-formatted string and returns the object it represents.

YAML.ObjectFromURL(url, values=None, headers={}, cacheTime=None, encoding=None, errors=None, timeout=<object object at 0x10042bc40>, sleep=0)

Retrieves the content for a given HTTP request and parses it as YAML-formatted content using the above method.

Parameters:
  • url (str) – The URL to retrieve content from.
  • values (dict) – Values to pass as URL encoded content for a POST request.
  • headers (dict) – Custom HTTP headers to add to the request.
  • cacheTime (float) – The maximum age (in seconds) that cached data should still be considered valid.
  • timeout (float) – The maximum amount of time (in seconds) that the framework should wait for a response before aborting.
  • sleep (float) – The number of seconds the current thread should pause for if a network request was made, ensuring undue burden isn’t placed on web servers. If cached data was used, this value is ignored.

RSS

The RSS API provides methods for parsing content from RSS, RDF and ATOM feeds. The framework includes the excellent Universal Feed Parser library to provide this functionality.

For more information about the objects returned by the feed parser, please consult the documentation here.

RSS.FeedFromString(string)

Parses the given string as an RSS, RDF or ATOM feed (automatically detected).

RSS.FeedFromURL(url, values=None, headers={}, cacheTime=None, autoUpdate=False, encoding=None, errors=None, timeout=<object object at 0x10042bc40>, sleep=0)

Retrieves the content for a given HTTP request and parses it as an RSS, RDF or ATOM feed using the above method.

Parameters:
  • url (str) – The URL to retrieve content from.
  • values (dict) – Values to pass as URL encoded content for a POST request.
  • headers (dict) – Custom HTTP headers to add to the request.
  • cacheTime (float) – The maximum age (in seconds) that cached data should still be considered valid.
  • timeout (float) – The maximum amount of time (in seconds) that the framework should wait for a response before aborting.
  • sleep (float) – The number of seconds the current thread should pause for if a network request was made, ensuring undue burden isn’t placed on web servers. If cached data was used, this value is ignored.

Plist

The Plist API greatly simplifies handling content in Apple’s XML-based property list format. Using these methods, data can easily be converted between property lists and regular Python objects. The top-level object of a property list is usually a dictionary.

More information about the property list format can be found here.

Plist.StringFromObject(obj)

Converts a given object to a Plist-formatted string representation.

Plist.ObjectFromString(string)

Returns an object representing the given Plist-formatted string.
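
The conversion in both directions can be sketched with the standard library's plistlib module, used here as a stand-in for the framework's Plist API. The sample dictionary is invented, and the example assumes Python 3, where plistlib.dumps and plistlib.loads are available.

```python
import plistlib

# A dictionary is the usual top-level object of a property list.
settings = {"name": "Example", "enabled": True, "sizes": [1, 2, 3]}

# Python object -> XML property list (returned as bytes).
plist_bytes = plistlib.dumps(settings)

# XML property list -> Python object.
restored = plistlib.loads(plist_bytes)

print(plist_bytes.decode("utf-8"))
print(restored == settings)
```

Inside a plug-in, Plist.StringFromObject and Plist.ObjectFromString would presumably play the roles of dumps and loads.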

Plist.ObjectFromURL(url, values=None, headers={}, cacheTime=None, encoding=None, errors=None, timeout=<object object at 0x10042bc40>, sleep=0)

Retrieves the content for a given HTTP request and parses it as a Plist using the above method.

Parameters:
  • url (str) – The URL to retrieve content from.
  • values (dict) – Values to pass as URL encoded content for a POST request.
  • headers (dict) – Custom HTTP headers to add to the request.
  • cacheTime (float) – The maximum age (in seconds) that cached data should still be considered valid.
  • timeout (float) – The maximum amount of time (in seconds) that the framework should wait for a response before aborting.
  • sleep (float) – The number of seconds the current thread should pause for if a network request was made, ensuring undue burden isn’t placed on web servers. If cached data was used, this value is ignored.