Releases: iipc/webarchive-commons
New features
- MetaData is now multivalued to support repeated WARC and HTTP headers. #98
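Repeated headers are common in both formats. HTTP responses often carry several Set-Cookie lines, and WARC 1.1 allows WARC-Concurrent-To to repeat. A single-valued map silently keeps only one of them. The sketch below shows the general idea of a multivalued header store: values are appended rather than overwritten, and a "first value" lookup keeps simple callers working. The class and method names (`MultiValuedHeaders`, `add`, `getFirst`, `getAll`) are hypothetical and are not the webarchive-commons MetaData API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: not the webarchive-commons MetaData API.
class MultiValuedHeaders {
    // Insertion-ordered map from header name to every value seen for it.
    private final Map<String, List<String>> headers = new LinkedHashMap<>();

    // Append a value instead of overwriting, so repeated headers are all kept.
    void add(String name, String value) {
        headers.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    // First value for a name (single-valued style lookup), or null if absent.
    String getFirst(String name) {
        List<String> values = headers.get(name);
        return (values == null || values.isEmpty()) ? null : values.get(0);
    }

    // All values for a name, in the order they were added; empty if absent.
    List<String> getAll(String name) {
        return headers.getOrDefault(name, List.of());
    }

    public static void main(String[] args) {
        MultiValuedHeaders h = new MultiValuedHeaders();
        h.add("Set-Cookie", "a=1");
        h.add("Set-Cookie", "b=2");
        h.add("WARC-Concurrent-To", "<urn:uuid:00000000-0000-0000-0000-000000000001>");
        System.out.println(h.getAll("Set-Cookie"));   // [a=1, b=2]
        System.out.println(h.getFirst("Set-Cookie")); // a=1
    }
}
```

This sketch compares header names case-sensitively for brevity. HTTP header names are case-insensitive, so a real store would likely normalize names before lookup.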
Dependency upgrades
- commons-io 2.18.0
- commons-lang 2.6
- guava 33.3.1-jre
- hadoop 3.4.1
- htmlparser 2.1
- httpcore 4.4.16
- json 20240303
- junit 4.13.2
Bug fixes
- Fixed URLParser and WaybackURLKeyMaker failing on URLs with IPv6 address hostnames #100
- WAT extractor: do not fail when a warcinfo record has no WARC-Filename field
- ExtractingParseObserver: extract rel, hreflang and type attributes
- ExtractingParseObserver: extract links from onClick attributes
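The IPv6 fix concerns hosts written as bracketed literals, such as `http://[2001:db8::1]:8080/`. An IPv6 address contains colons, so code that treats the first `:` in the authority as the host/port separator cuts the address apart. The sketch below contrasts that naive split with `java.net.URI`, whose `getHost()` returns an IPv6 literal still enclosed in square brackets. The class and method names are hypothetical. This is not the URLParser or WaybackURLKeyMaker implementation, and it does not claim either class uses `java.net.URI`.

```java
import java.net.URI;

// Hypothetical illustration: not the URLParser or WaybackURLKeyMaker code.
class Ipv6HostDemo {
    // Naive approach: treat everything before the first ':' as the host.
    // This breaks for IPv6 literals, which themselves contain colons.
    static String naiveHost(String authority) {
        int colon = authority.indexOf(':');
        return colon < 0 ? authority : authority.substring(0, colon);
    }

    // Bracket-aware approach: java.net.URI parses the IPv6 literal and
    // getHost() returns it enclosed in square brackets.
    static String uriHost(String url) {
        return URI.create(url).getHost();
    }

    public static void main(String[] args) {
        String url = "http://[2001:db8::1]:8080/index.html";
        System.out.println(naiveHost("[2001:db8::1]:8080")); // [2001
        System.out.println(uriHost(url));                     // [2001:db8::1]
        System.out.println(URI.create(url).getPort());        // 8080
    }
}
```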
Dependency Upgrades
- commons-collections 3.2.2
- commons-io 2.14.0
- dsiutils 2.2.8
- guava 33.3.0-jre
- hadoop 3.4.0 (now optional)
- pig 0.17.0
- org.json 20231013
Dependency Removals
- joda-time (was unused)
webarchive-commons-1.1.9 (2019-05-07)
Closed issues:
- CompressedWARCReader does not work for Common Crawl WARC files. #81
- Fixing bad dates in WARC file #80
- upgrade to commons-collections.jar 3.2.2 #76
Merged pull requests: