Metadata extraction with Apache Tika
Tika defines a standard API and makes use of existing libraries like POI and PDFBox for its content extraction. At the time of writing, the current release of Tika is version 0.6, and the following file formats are already supported:
- HyperText Markup Language
- XML and derived formats
- Microsoft Office document formats
- OpenDocument Format
- Portable Document Format
- Electronic Publication Format
- Rich Text Format
- Compression and packaging formats
- Text formats
- Audio formats
- Image formats
- Video formats
- Java class files and archives
- The mbox format
The most important parts of the above code example are the use of the JpegParser to parse the .JPG file and the creation of the Metadata object, which the parser fills with the appropriate information.
Of course, in the above test case I only test for the camera model, but the Metadata object holds much more information than just that. All the fields found in the metadata of the image can be listed quite easily, for instance with the following method.
    // Prints every metadata field name together with its value.
    private void listAvailableMetaDataFields(final Metadata metadata) {
        // names() returns all field names; fetch the array once instead of on every iteration.
        for (String name : metadata.names()) {
            System.out.println(name + " : " + metadata.get(name));
        }
    }
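As a variation on the listing method above, here is a minimal sketch that returns the fields as sorted lines instead of printing them, which makes the result easier to check in a test. The class name MetadataFieldLister, the method formatFields and its parameters are my own assumptions and not part of Tika. The sketch deliberately takes a plain array of names and a lookup function so that it runs without Tika on the classpath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper, not part of Tika.
final class MetadataFieldLister {

    // Returns one "name : value" line per field, sorted by field name.
    // 'names' and 'lookup' are assumed stand-ins for the field names and
    // the value lookup of a metadata object.
    static List<String> formatFields(String[] names, Function<String, String> lookup) {
        String[] sorted = names.clone();   // leave the caller's array untouched
        Arrays.sort(sorted);
        List<String> lines = new ArrayList<>();
        for (String name : sorted) {
            lines.add(name + " : " + lookup.apply(name));
        }
        return lines;
    }

    public static void main(String[] args) {
        // Stand-in values for illustration only, not real output from Tika.
        Map<String, String> sample = new LinkedHashMap<>();
        sample.put("Model", "ExampleCam 1000");
        sample.put("Content-Type", "image/jpeg");
        String[] names = sample.keySet().toArray(new String[0]);
        for (String line : formatFields(names, sample::get)) {
            System.out.println(line);
        }
    }
}
```

With a Tika Metadata object this should be callable as formatFields(metadata.names(), metadata::get), since names() returns a String array and get(String) returns the value for a name.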