Asynchronously Read Files in Python with Gio
Most people will learn how to read and write to files in Python using the built-in file objects. That works great for simple read/write operations on the local filesystem. However, if you need a more advanced I/O library, take a look at Gio.
Gio provides an abstract I/O API, so your code does not need to know what the underlying filesystem is. In other words, you can read files from various sources, including HTTP, FTP, and SSH. The Gio library also supports asynchronous operations and gives access to a ton of other useful information (MIME types, themed icon names, etc.).
Here is a closer look at asynchronously reading files in Python using Gio.
A Quick Note about GObject Introspection
These examples are based on PyGObject, in which the Python code accesses the underlying C libraries through a little bit of GObject Introspection magic. The API documentation is therefore written for C, but it's pretty easy to translate the C documentation to Python.
The Gio.File Object
The Gio.File object is the primary file and directory interface you work with when using the Gio library. You can use Gio.File objects to get all sorts of interesting information about a file and to perform various operations on it. They can be created from a URI, a local filesystem path, or a command line argument. I'll be using the URI to create Gio.File objects.
Creating Gio.File Objects
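    import sys
    from gi.repository import Gio

    # from a URI
    giofile = Gio.File.new_for_uri('http://www.micahcarrick.com')
    # from a local filesystem path
    giofile = Gio.File.new_for_path('/home/micah/')
    # from a single command line argument (a path or a URI)
    giofile = Gio.File.new_for_commandline_arg(sys.argv[1])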
Simple Asynchronous Read
The easiest way to asynchronously read a file is to use the load_contents_async() method. Since the file is read asynchronously, a callback function is defined to be called when the file has finished loading. The arguments passed to this callback function are defined by GAsyncReadyCallback.
In the callback function, a call to load_contents_finish() finishes the asynchronous read and returns a 3-tuple containing a boolean indicating whether the operation succeeded, the contents that were read, and the file's etag.
In the example below, a GLib main event loop is used since this program doesn't have a GUI. A main event loop is necessary; otherwise the program would likely reach the end and terminate before the asynchronous read has completed.
    #!/usr/bin/env python
    from gi.repository import GLib, Gio

    class GioAsyncReadExample(object):

        def read(self, uri):
            print "Reading %s" % uri
            giofile = Gio.File.new_for_uri(uri)
            giofile.load_contents_async(None, self._load_contents_cb, None)

        def _load_contents_cb(self, giofile, result, user_data=None):
            success, contents, etag = giofile.load_contents_finish(result)
            print "Finished reading %d bytes" % len(contents)
            loop.quit()

    if __name__ == "__main__":
        loop = GLib.MainLoop()
        reader = GioAsyncReadExample()
        reader.read("http://www.w3.org/TR/1999/REC-html401-19991224/html40.txt")
        print "Entering GLib main loop."
        loop.run()
This program will asynchronously read the contents of the HTML 4.01 Specification from the W3C website, display how many bytes it read, and then break out of the main event loop to terminate the application. The output looks like this:
    [micah@octopus Desktop]$ ./load_contents_async.py
    Reading http://www.w3.org/TR/1999/REC-html401-19991224/html40.txt
    Entering GLib main loop
    Finished reading 792284 bytes
There are plenty of things that can go wrong when reading a file. Does it exist? What if it is deleted in the middle of reading it? What if the internet goes down during a remote file read? What if you don't have permission to read it? The list goes on and on.
You can catch a GLib.GError exception when calling load_contents_finish() to handle the errors.
    try:
        success, contents, etag = giofile.load_contents_finish(result)
        print "Finished reading %d bytes" % len(contents)
    except GLib.GError as error:
        print str(error)
You can handle specific errors by checking for a specific 'Gio.IOErrorEnum' constant.
    try:
        success, contents, etag = giofile.load_contents_finish(result)
        print "Finished reading %d bytes" % len(contents)
    except GLib.GError as error:
        if error.code == Gio.IOErrorEnum.PERMISSION_DENIED:
            print "Sorry dude, you're not allowed to read this file."
        elif error.code == Gio.IOErrorEnum.IS_DIRECTORY:
            print "C'mon man, you can't read a directory."
        else:
            print str(error)
Cancelling Asynchronous Operations
The first argument passed to load_contents_async() is a Gio.Cancellable object, which provides the mechanism to cancel an asynchronous operation. In the simple example, None was passed and therefore there was no way to cancel the operation. In a GUI application, however, these operations may take a long time (e.g. a huge file over the network), and providing the end user with a means to cancel the operation is one of the benefits of using asynchronous I/O.
A Gio.Cancellable object can be re-used for all of an application's asynchronous operations. Its reset() method is called before it is passed to an async function to make sure it's not already flagged as being cancelled. A button or key press within the user interface could call the cancel() method of the Gio.Cancellable object at any time during the asynchronous operation.
The callback function handles the cancellation of an async operation by catching the GLib.GError exception raised by load_contents_finish() and checking for the Gio.IOErrorEnum.CANCELLED code.
    #!/usr/bin/env python
    from gi.repository import GLib, Gio

    class GioAsyncReadExample(object):

        def __init__(self):
            self._cancellable = Gio.Cancellable()

        def cancel(self):
            # This could be connected to the "clicked" signal of a Gtk.Button
            print "Canceling asynchronous operation..."
            self._cancellable.cancel()

        def read(self, uri):
            print "Reading %s" % uri
            giofile = Gio.File.new_for_uri(uri)
            self._cancellable.reset()
            giofile.load_contents_async(self._cancellable,
                                        self._load_contents_cb,
                                        None)

        def _load_contents_cb(self, giofile, result, user_data=None):
            try:
                success, contents, etag = giofile.load_contents_finish(result)
                print "Finished reading %d bytes" % len(contents)
            except GLib.GError as error:
                if error.code == Gio.IOErrorEnum.CANCELLED:
                    print "Alright then, we've aborted the read operation."
                else:
                    print str(error)
            loop.quit()

    if __name__ == "__main__":
        loop = GLib.MainLoop()
        reader = GioAsyncReadExample()
        reader.read("http://www.w3.org/TR/1999/REC-html401-19991224/html40.txt")
        reader.cancel()  # cancel the read in progress
        print "Entering GLib main loop"
        loop.run()
The contents returned from these read operations are not converted to any particular character encoding for you. When working with GTK+ most of the calls will be expecting UTF-8 text. Python's decode, encode, and unicode can be used to convert to and from various character sets.
    try:
        decoded = contents.decode("UTF-8")
    except UnicodeDecodeError:
        print "Error: Unknown character encoding. Expecting UTF-8"
Auto-detecting character encoding is way beyond the scope of this post (and is technically impossible to do reliably), but you could take a look at the Python Unicode HOWTO to get started. There are some libraries that do advanced character encoding detection. A "poor-man's" approach is to simply try decoding with a few common encodings.
    encodings = ['UTF-8', 'ISO-8859-15']
    document_encoding = None

    for encoding in encodings:
        try:
            decoded = contents.decode(encoding)
            document_encoding = encoding
            print "Auto-detected encoding as %s" % encoding
            break
        except UnicodeDecodeError:
            pass

    if not document_encoding:
        print "Unknown character encoding"
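The same fallback idea can be wrapped in a small function that returns both the decoded text and the encoding that worked, so the text can later be converted back before saving. This is only a sketch written for Python 3 (the listings in this post are Python 2); decode_with_fallback() and the sample byte strings are hypothetical names invented for the example. It also shows a limitation of the approach: with CPython's codec, ISO-8859-15 accepts every possible byte value, so once it is in the list the "unknown encoding" case is never reached and the result is a guess rather than a detection.

```python
def decode_with_fallback(data, encodings=("utf-8", "iso8859-15")):
    """Return (text, encoding) for the first encoding that decodes data.

    Returns (None, None) if none of the encodings can decode it.
    """
    for encoding in encodings:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    return None, None


# The same text stored in two different encodings.
utf8_bytes = "caf\u00e9 \u20ac".encode("utf-8")
latin9_bytes = "caf\u00e9 \u20ac".encode("iso8859-15")

utf8_text, utf8_guess = decode_with_fallback(utf8_bytes)
latin9_text, latin9_guess = decode_with_fallback(latin9_bytes)
print(utf8_guess, latin9_guess)

# Remembering the guessed encoding lets the original bytes be restored on save.
restored = latin9_text.encode(latin9_guess)

# Arbitrary binary data is still "decoded", because ISO-8859-15 never fails.
binary_text, binary_guess = decode_with_fallback(bytes(range(256)))
```

Putting the strictest encoding first matters: UTF-8 rejects most byte sequences that were not written as UTF-8, whereas the single-byte encodings accept almost anything.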
Putting It All Together
These concepts can be put together to build a simple GTK+ application which can asynchronously read files, including remote files, into a Gtk.TextView. A button allows the user to cancel the operation if it is taking too long. The character encoding of the document is detected (guessed) and the text is encoded to UTF-8 as required by the Gtk.TextView.
A Gtk.MessageDialog displays any errors that come back from the Gio library.
If you run this application in a terminal and open http://www.w3.org/TR/1999/REC-html401-19991224/html40.txt you will see that the "detected" character encoding is ISO-8859-15. If you then load my website http://www.micahcarrick.com the character encoding will be detected as UTF-8.
    #!/usr/bin/env python
    from gi.repository import GLib, Gio, Pango, Gtk

    class GioGtkExample(object):

        def __init__(self):
            self._cancellable = Gio.Cancellable()
            self._create_window()

        def run(self):
            """ Show window and enter GTK+ main event loop """
            self.window.show()
            Gtk.main()

        def _reset(self):
            """ Reset the UI to the ready state """
            self._cancellable.reset()
            self._entry.set_sensitive(True)
            self._open_button.set_sensitive(True)
            self._cancel_button.set_sensitive(False)

        def read(self, uri):
            """ Read the specified URI into the Gtk.TextView """
            # toggle widgets sensitivity
            self._entry.set_sensitive(False)
            self._open_button.set_sensitive(False)
            self._cancel_button.set_sensitive(True)

            # clear text view
            self._view.get_buffer().set_text("")

            # begin read operation
            giofile = Gio.File.new_for_uri(uri)
            giofile.load_contents_async(self._cancellable,
                                        self._load_contents_cb,
                                        None)

        def _load_contents_cb(self, giofile, result, user_data=None):
            """ Callback for Gio.File.load_contents_async() """
            try:
                success, contents, etag = giofile.load_contents_finish(result)
            except GLib.GError as error:
                if error.code != Gio.IOErrorEnum.CANCELLED:
                    # only show error dialog if NOT cancelled
                    self.error_dialog(str(error))
                self._reset()
                return

            encodings = ['UTF-8', 'ISO-8859-15']
            document_encoding = None

            for encoding in encodings:
                try:
                    decoded = contents.decode(encoding)
                    document_encoding = encoding
                    print "Auto-detected encoding as %s" % encoding
                    # if your application is going to save the file later then it
                    # should remember the encoding here so that it can convert it
                    # back to the original encoding before writing.
                    break
                except UnicodeDecodeError:
                    pass

            if document_encoding:
                # GTK+ always wants UTF-8 text
                self._view.get_buffer().set_text(decoded.encode('UTF-8'))
            else:
                self.error_dialog("Wrong character encoding. Expected UTF-8.")

            self._reset()

        def _create_window(self):
            """ Create main application window and widgets """
            # the entry allows user to enter a URI
            self._entry = Gtk.Entry()

            # the open button begins the async read
            self._open_button = Gtk.Button.new_from_stock(Gtk.STOCK_OPEN)
            self._open_button.connect("clicked",
                                      lambda b: self.read(self._entry.get_text()))

            # the cancel button calls cancel() on the Gio.Cancellable
            self._cancel_button = Gtk.Button.new_from_stock(Gtk.STOCK_CANCEL)
            self._cancel_button.set_sensitive(False)
            self._cancel_button.connect("clicked",
                                        lambda b: self._cancellable.cancel())

            hbox = Gtk.HBox()
            hbox.pack_start(Gtk.Label("URI:"), False, True, 2)
            hbox.pack_start(self._entry, True, True, 2)
            hbox.pack_start(self._open_button, False, True, 2)
            hbox.pack_start(self._cancel_button, False, True, 2)

            self._view = Gtk.TextView()
            font_desc = Pango.FontDescription("Monospace 10")
            self._view.modify_font(font_desc)

            sw = Gtk.ScrolledWindow()
            sw.set_shadow_type(Gtk.ShadowType.IN)
            sw.add(self._view)

            vbox = Gtk.VBox()
            vbox.pack_start(hbox, False, True, 2)
            vbox.pack_start(sw, True, True, 2)
            vbox.show_all()

            self.window = Gtk.Window()
            self.window.set_default_size(400, 300)
            self.window.set_border_width(2)
            self.window.set_title("Gio Async Read Example")
            self.window.connect("destroy", lambda w: Gtk.main_quit())
            self.window.add(vbox)

            # always show images on buttons
            Gtk.Settings.get_default().set_property("gtk-button-images", True)

        def error_dialog(self, message):
            """ Display a simple error dialog """
            dialog = Gtk.MessageDialog(self.window,
                                       Gtk.DialogFlags.MODAL |
                                       Gtk.DialogFlags.DESTROY_WITH_PARENT,
                                       Gtk.MessageType.ERROR,
                                       Gtk.ButtonsType.OK,
                                       message)
            dialog.set_title("Error")
            dialog.run()
            dialog.destroy()

    if __name__ == "__main__":
        app = GioGtkExample()
        app.run()
What About Asynchronously Writing Files?
The concepts for asynchronously writing files are pretty much the same. In fact, the concepts behind pretty much all of the asynchronous file operations are the same. See the replace_async() and append_to_async() methods to asynchronously write a file.