xelem.3.0

Package nl.fountain.xelem.lex

Provides classes that enable reading SpreadsheetML.

See:
          Description

Interface Summary
ExcelReaderFilter Recieve notification of parsing events and the construction of XLElements and transmit these events, values and instances to the listeners registered with this ExcelReaderFilter.
ExcelReaderListener Recieve notification of parsing events and the construction of XLElements.
 

Class Summary
DefaultExcelReaderFilter Passes all events, values and instances unfiltered.
DefaultExcelReaderListener Does effectively nothing.
ExcelReader Basic class for reading xml-spreadsheets of type spreadsheetML.
WorkbookListener A concrete implementation of ExcelReaderListener.
 

Package nl.fountain.xelem.lex Description

Provides classes that enable reading SpreadsheetML.

The basic class for reading xml-spreadsheets of type spreadsheetML is ExcelReader. It can deliver the contents of an xml-file or an xml-InputSource as a fully populated Workbook. This package also offers an event-based model with the ExcelReaderListener-interface which can be fine-tuned with an ExcelReaderFilter that acts as a wrapper for listeners.

 

Reading SpreadsheetML - a simple case

Suppose you want to do something. Well rather, suppose you have an Excel-file and you want to do something with it's contents and structure in Java. Here's what you got to do:
   ExcelReader reader = new ExcelReader();
   Workbook xlWorkbook = reader.getWorkbook("foo/bar/myFile.xml");
Now what you have is a fully populated XLWorkbook, that is, the (almost) complete representation of the Excel xml-file myFile.xml in Java class-instances. Now you can inspect it's structure, process it's contents, alter it or send it to the president.

OOoops! You got an Exception in thread "main" java.lang.OutOfMemoryError: Java heap space , did ya? The thing is that the above code works fine as long as your workbooks aren't too big. Every cell, row, worksheet and so on in the file that you're reading is stuffed as an object into the one encompassing object XLWorkbook. How much will go in there depends -among others- on your system and VM-settings. So what if you want to read large files and huge input sources?

Strategies for circumnavigating memory problems

There are several strategies to overcome the memory problems just mentioned:
  1. Cache memory-consuming XLElements in such a way that they can be found easily and loaded back on to dynamic memory at the time we need them. The Row-element would be a likely candidate for caching.
  2. Partial reading. Read only those parts of the workbook that we're interested in.
  3. With an event-based interface we could process essential parts of the workbook while reading. We would discard those parts as soon as we're finished processing and not have the need to keep them in memory.
The xelem.lex API realises strategies 2. and 3. What's more, the two strategies can be mixed and intermingled, procuring a flexible API that meets your every need.
[Mark that the event-based API will be faster. When for instance your object is to process rows from a workbook, with the event-based model you only loop through all of the rows of the workbook once. Building an XLWorkbook first and then iterate through all of it's rows would take two loops.]

Partial reading - setting the read area

When we know what part of the workbook we're interested in we can restrict reading to that particular part. More precise, we can tell ExcelReader which Area of the worksheets to read.
   ExcelReader reader = new ExcelReader();
   reader.setReadArea(new Area("E11:M16"));
   Workbook xlWorkbook = reader.getWorkbook("foo/bar/myFile.xml");
The XLWorkbook that is returned will contain SSWorksheets and SSTables containing only those cells, rows and columns that were found in the specified area of the succesive worksheets. Apart from cells, rows, columns, tables and worksheets, all other XLElements will be present and fully populated. Mark that the above code will treat all the worksheets in the workbook in the same way. If we want to set different read areas or clear the read area for individual worksheets we need other code. We'll come to that. First let's take a look at the event-based API.

The event-based model

We can register an ExcelReaderListener on the ExcelReader and recieve notification of parsing events and the construction of XLElements while reading. Here's a code-stub that enables us to process rows. We'll use an anonymous implementation of the DefaultExcelReaderListener.
   ExcelReader reader = new ExcelReader();
   reader.addExcelReaderListener(new DefaultExcelReaderListener() {
       public void setRow(int sheetIndex, String sheetName, Row row) {
           // process row and/or it's cells
           // ...
       }
   });
   reader.read("foo/bar/myFile.xml");
Of course we could do the same with a non-anonymous listener. And we could be listening to events other then the setRow-event as well.

When we're only interested in particalur parts of the worksheets we might set the read area:

   ExcelReader reader = new ExcelReader();
   reader.setReadArea(new Area("E11:M16"));
   reader.addExcelReaderListener(new DefaultExcelReaderListener() {
       public void setRow(int sheetIndex, String sheetName, Row row) {
           // process row and/or it's cells
           // ...
       }
   });
   reader.read("foo/bar/myFile.xml");
Now we will only be notified when the row is within the specified area and the row only contains cells that were found within that area.

Swapping read areas

Up till this far we have used the setReadArea-method to set one area for all the worksheets we were reading. Maybe we want to read different areas on different worksheets. Using the event-model we can set different read areas when different worksheets come along. (Or -for that matter- trigger what action so ever on any event.) Here's a SwappingAreaListener:
class SwappingAreaListener extends DefaultExcelReaderListener {
    
   private ExcelReader reader;

   public SwappingAreaListener(ExcelReader reader) {
      this.reader = reader;
   }
   
   // override method in DefaultExcelReaderListener
   public void startWorksheet(int sheetIndex, Worksheet sheet) {
      switch (sheetIndex) {
         case 1:
            reader.setReadArea(new Area("A1:C6"));
            break;
         case 2:
            reader.setReadArea(new Area("G11:G11"));
            break;
         default:
            reader.clearReadArea();
     }        
   }  
}
The SwappingAreaListener tells ExcelReader to only read particular areas on sheets 1 and 2 and to read the rest of the sheets unrestricted. Mind that the sheetIndex is 0-based. Here's how SwappingAreaListener might be fit in in code that tells ExcelReader to get a workbook:
   ExcelReader reader = new ExcelReader();
   SwappingAreaListener sal = new SwappingAreaListener(reader);
   reader.addExcelReaderListener(sal);
   Workbook xlWorkbook = reader.getWorkbook("foo/bar/myFile.xml");
The returned XLWorkbook will have it's sheets populated according to the restrictions imposed by SwappingAreaListener. Internally ExcelReader uses a WorkbookListener to build the workbook and events are being dispatched to both listeners. WorkbookListener uses these events to build and populate the workbook, SwappingAreaListener at the same time imposes restrictions. At the end of the read that is performed by the getWorkbook-method ExcelReader removes the workbooklistener and then returns the completed XLWorkbook. If you want to use the same ExcelReader to perform more reads that must not be restricted, do not forget to remove the SwappingAreaListener.

Using filters

An ExcelReaderFilter is an ExcelReaderListener that can register other listeners: it wraps other listeners. It's a sort of chain. The default implementations of these interfaces DefaultExcelReaderListener and DefaultExcelReaderFilter have contrary behaviour. Events send to DefaultExcelReaderListener drop dead unless it's subclass overrides the setXXX- or startXXX-method. Events send to DefaultExcelReaderFilter are propagated unfiltered to the registered listeners unless it's subclass overrides the method and takes appropriate action.

Filters and listeners can be configured in a variety of ways. For instance, we use an EmptyRowFilter to count the number of empty rows on a worksheet and abandon listening for further events on that sheet at a certain emptyRowCount. I'm sure your imagination, inventiveness and creativity go beyond that.


xelem.3.0