It is fairly easy to utilize the Apache POI API in Eclipse.
Download the most current version of POI from Apache and include the jars from the following directories in the Java Build Path in an Eclipse project's properties. "./" is the root directory of the downloaded API.
- ./lib
- ./ooxml-lib
- ./
After you have included all external libraries in the Java Build Path of your project, you can start using the API.
Reference an existing Excel file with POI
The below code snippet shows how an existing Excel file can be referenced with POI. Since there are different classes for different file types, a WorkbookFactory should be used to instantiate objects from corresponding classes. The factory will determine, if one of the following classes should be used.
- HSSFWorkbook for .xls (Excel 2003 or earlier versions)
- XSSFWorkbook .xlsx (Excel 2007 and later)
You however should not use one of the above classes. Instead you should use the interface Workbook to reference instances of one of the above classes.
Reference a sheet in an Excel Workbook
Referencing a sheet within a workbook works accordingly. Since the WorkbookFactory returned the correct implementation of the interface Workbook the rest is easy.
Use the method getSheet(String sheetName) from the interface Workbook to get an instantiation of the interface Sheet. A returned object will of course be of a class corresponding the Excel file type referenced by an instance of the interface Workbook.
Be careful with the imports though as there is another Sheet interface in a package related to PowerPoint sheets.
Determine last row in an Excel sheet
As Microsoft Excel VBA developers already will know, it can be quite tricky to determine the last row with data in an Excel sheet. Below is a method, which will work according to the Apache API docs.
However, I did not test it properly yet, it will give you a hint where to look for a solution. Apparently the method getLastRowNum() has to be considered unsafe whenever it returns zero. Then the method getPhysicalNumberOfRows() can return a more reliable result.
Be careful though, as this might still be not what you want. The last row might not match the last row with data but point to empty rows with formatting according to Microsoft Excel's UsedRange object.
I will test this properly and finish this article with some updates on things like this as they are good to know when working with POI.
POI Component APIs
Apache POI delivers several APIs for different document types. Here is a list from their homepage, which will give you a hint on where to look for APIs for different Microsoft document types.
- Excel (SS=HSSF+XSSF)
- Word (HWPF+XWPF)
- PowerPoint (HSLF+XSLF)
- OpenXML4J (OOXML)
- OLE2 Filesystem (POIFS)
- OLE2 Document Props (HPSF)
- Outlook (HSMF)
- Visio (HDGF)
- TNEF (HMEF)
- Publisher (HPBF)
Articles
Below you can find some resources regarding POI on my website.
Find last row in Excel file using POI
Reading and writing MS Excel files using POI
References
How to read from an Excel file using POI
http://www.avajava.com/tutorials/lessons/how-do-i-read-from-an-excel-file-using-poi.html
POI API documentation
http://poi.apache.org/apidocs/index.html
Apache POI homepage