Which PHP libraries are the best to convert HTML to PDF, ePub, and PPT?

If you need to export your documents to PDF, ePub, and PPT (pptx), this article may come in handy. I recently worked as part of a team on an electronic educational web site, where educational materials were available online as articles and also had to be convertible from HTML to a PDF document, an ePub document, or a PowerPoint pptx document. Later, it was also required to render some articles into screenshots for presentations (mostly for PowerPoint). We reviewed the libraries that were available and came up with our choice of tools. And no, you don’t need expensive commercial solutions to do it!

1. The best PDF PHP Library? Phpwkhtmltopdf.

In my previous article from about a year ago, I advocated the use of the mPDF library for HTML to PDF conversion. This time, however, we had full control of the server, so we could also use binaries with PHP libraries as bridges. And there is a champion that can beat mPDF. That champion of HTML-to-PDF conversion is wkhtmltopdf.

Wkhtmltopdf is WebKit-based. The binaries can be downloaded from the downloads section of the wkhtmltopdf site. Debian and Ubuntu ship a reduced-functionality build in their repositories, so you should download wkhtmltopdf from the project's own site instead. You also need to install a few additional packages, such as xfonts-75dpi and xvfb, which are available from the distribution repositories. They are needed for server-side rendering.

After the binaries have been installed, you should install the PHP library, called phpwkhtmltopdf, from the phpwkhtmltopdf Github repository. If you use Composer, all the better.
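To give an idea of what the bridge looks like in practice, here is a minimal sketch. It assumes the library is the mikehaertl/phpwkhtmltopdf Composer package (namespace `mikehaertl\wkhtmlto`), that Composer's autoloader is already loaded, and that the binary lives at a path of your choosing. The helper names `buildPdfOptions` and `htmlToPdf` are my own, not part of the library.

```php
<?php
// Assumes Composer's autoloader has already been required elsewhere.
use mikehaertl\wkhtmlto\Pdf;

/**
 * Build the options array for the Pdf constructor.
 * $binary is a hypothetical install path; adjust it for your server.
 */
function buildPdfOptions(string $binary, bool $useXvfb): array
{
    $options = [
        'binary'   => $binary,   // location of the downloaded wkhtmltopdf binary
        'encoding' => 'UTF-8',   // passed through as wkhtmltopdf's --encoding
    ];
    if ($useXvfb) {
        // For builds that need an X server: run the command through xvfb-run.
        $options[] = 'use-xserver';
        $options['commandOptions'] = ['enableXvfb' => true];
    }
    return $options;
}

/**
 * Convert an HTML string, a URL, or a file path to a PDF file.
 * Returns null on success, or the error text reported by the library.
 */
function htmlToPdf(string $htmlOrUrl, string $target, array $options): ?string
{
    $pdf = new Pdf($options);
    $pdf->addPage($htmlOrUrl);
    return $pdf->saveAs($target) ? null : $pdf->getError();
}
```

A call could then look like `htmlToPdf('<html><body><h1>Hello</h1></body></html>', '/tmp/article.pdf', buildPdfOptions('/usr/local/bin/wkhtmltopdf', true))`. Whether you need the xvfb wrapper depends on the build you installed, so treat that flag as something to experiment with.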

Strengths and weaknesses of phpwkhtmltopdf. Wkhtmltopdf with the phpwkhtmltopdf library provides a far superior way of rendering and converting HTML to PDF than any of the pure PHP libraries. However, because HTML and PDF are different formats, the output will still not match the original page 100% in some cases.

2. The best ePub PHP Library? PHPePub.

The client also needed to export their HTML articles as ePub books. The articles themselves were organized hierarchically, and that hierarchy had to be reproduced as chapters in the ePub. Our choice is the PHPePub library. It can be downloaded from the PHPePub Github repository [https://github.com/Grandt/PHPePub]. It is easily installable with Composer and requires no additional binaries.
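The hierarchy-to-chapters idea can be sketched roughly as follows. This is not our production code: the article array shape (`title`, `html`, `children`), the helper names, and the identifier URL are all made up for illustration, and I am assuming the `PHPePub\Core\EPub` class from the grandt/phpepub package with Composer's autoloader already loaded.

```php
<?php
// Assumes Composer's autoloader has already been required elsewhere.
use PHPePub\Core\EPub;

/** Flatten a nested article tree into rows with a depth, parents first. */
function flattenArticles(array $articles, int $depth = 0): array
{
    $rows = [];
    foreach ($articles as $article) {
        $rows[] = ['depth' => $depth, 'title' => $article['title'], 'html' => $article['html']];
        $rows = array_merge($rows, flattenArticles($article['children'] ?? [], $depth + 1));
    }
    return $rows;
}

/** Wrap an HTML fragment in a minimal XHTML document for use as a chapter. */
function wrapXhtml(string $title, string $bodyHtml): string
{
    $safeTitle = htmlspecialchars($title, ENT_QUOTES | ENT_XML1, 'UTF-8');
    return '<?xml version="1.0" encoding="utf-8"?>' . "\n"
        . '<html xmlns="http://www.w3.org/1999/xhtml">' . "\n"
        . "<head><title>{$safeTitle}</title></head>\n"
        . "<body>\n<h1>{$safeTitle}</h1>\n{$bodyHtml}\n</body>\n</html>\n";
}

/** Build an ePub from an article tree and save it as $fileName inside $dir. */
function buildEpub(string $title, string $identifierUrl, array $articles, string $dir, string $fileName)
{
    $book = new EPub();
    $book->setTitle($title);
    $book->setIdentifier($identifierUrl, EPub::IDENTIFIER_URI); // hypothetical URL
    $book->setLanguage('en');

    $level = 0;
    foreach (flattenArticles($articles) as $i => $row) {
        // Move the table-of-contents level down or up to match the article depth.
        while ($level < $row['depth']) { $book->subLevel(); $level++; }
        while ($level > $row['depth']) { $book->backLevel(); $level--; }
        $book->addChapter(
            $row['title'],
            sprintf('Chapter%03d.html', $i + 1),
            wrapXhtml($row['title'], $row['html']),
            false,                   // do not auto-split long chapters
            EPub::EXTERNAL_REF_ADD   // ask the library to embed referenced images
        );
    }
    $book->rootLevel();
    $book->finalize();
    return $book->saveBook($fileName, $dir);
}
```

Flattening the tree first keeps the level bookkeeping in one simple loop, since each row is at most one level deeper than the row before it.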

Strengths and weaknesses of PHPePub. PHPePub will create a neat, valid ePub ebook for you. It will embed styles and images internally. PHPePub also supports part of the ePub3 format. That support is partial, though - it will not download and embed videos for you. You should also know that many ePub readers will strip most of your styles and substitute their own.

3. The best PPT (PowerPoint) PHP Library? PHPPresentation.

Now, this HTML to PPT part is a bit tricky. The first candidate we reviewed was OpenOffice running headless. If you look it up on the web, OpenOffice is the #1 choice for converting to PowerPoint. There are dozens of ways to do that: start OpenOffice headless as a Unix daemon and call it through other libraries (like jodconverter), or call OpenOffice from the command line. We decided not to use OpenOffice, though, because we needed the presentations to look close to the original, and OpenOffice failed to produce slides that looked anything like it. We decided to make screenshots instead and embed them in the presentation.

Thus, we chose PHPPresentation. PHPPresentation is not an HTML-to-PPT converter, though; it is a presentation creation library. It is good at creating presentations from scratch.

PHPPresentation is part of the PHPOffice open source PHP library set. The library can be downloaded from the PHPPresentation Github repository. It is best to use Composer to install it and its dependencies.
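Since we ended up embedding screenshots, here is a rough sketch of building a pptx with one image per slide. The helper names are mine, and the 960x720 pixel slide size is an assumption that matches the library's default 4:3 layout as far as I know; check it against your own layout before relying on it. Composer's autoloader is assumed to be loaded.

```php
<?php
// Assumes Composer's autoloader has already been required elsewhere.
use PhpOffice\PhpPresentation\IOFactory;
use PhpOffice\PhpPresentation\PhpPresentation;

/** Scale an image to fit inside a box, keeping its aspect ratio, and center it. */
function fitInside(int $imgW, int $imgH, int $boxW, int $boxH): array
{
    $scale = min($boxW / $imgW, $boxH / $imgH);
    $w = (int) round($imgW * $scale);
    $h = (int) round($imgH * $scale);
    return [
        'width'  => $w,
        'height' => $h,
        'x'      => intdiv($boxW - $w, 2),
        'y'      => intdiv($boxH - $h, 2),
    ];
}

/** Create a pptx file with one screenshot per slide. */
function screenshotsToPptx(array $imagePaths, string $target, int $slideW = 960, int $slideH = 720): void
{
    $presentation = new PhpPresentation();
    foreach (array_values($imagePaths) as $index => $path) {
        // A new presentation already contains one slide; reuse it for the first image.
        $slide = $index === 0 ? $presentation->getActiveSlide() : $presentation->createSlide();

        [$imgW, $imgH] = getimagesize($path);
        $box = fitInside($imgW, $imgH, $slideW, $slideH);

        $slide->createDrawingShape()
            ->setName('Screenshot ' . ($index + 1))
            ->setPath($path)
            ->setResizeProportional(false) // we already computed both dimensions
            ->setWidth($box['width'])
            ->setHeight($box['height'])
            ->setOffsetX($box['x'])
            ->setOffsetY($box['y']);
    }
    IOFactory::createWriter($presentation, 'PowerPoint2007')->save($target);
}
```

For example, `screenshotsToPptx(['/tmp/slide1.png', '/tmp/slide2.png'], '/tmp/article.pptx')` would write a two-slide deck, with both paths being placeholders here.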

Strengths and weaknesses of PHPPresentation. PHPPresentation is very handy for exporting the data into the MS Office PowerPoint format. However, it is not usable to export html markup into slides.

4. Create screenshots of site content? PhantomJS.

In presentation mode, the data exported to PPT and PDF needed to look as close to the original website markup as possible. This was needed for presentation purposes, but it was only partially achievable for PDF and not achievable at all for PPT. The decision was made to capture screenshots of the content in presentation mode and insert these screenshots into the PPT and PDF as slides.

After reviewing some of the options, we decided to use PhantomJS to render content and embed it as slides. PhantomJS renders content with the WebKit engine and is scripted in JavaScript; it ships as a standalone binary, although it is often installed through npm. At the time, the binaries of the newer PhantomJS were not yet working correctly because of a bug in SSL support. However, there are some custom builds available that work around the bug, one of which I used successfully. It is recommended to use the latest version of PhantomJS, because it has better support for CSS3 and embedded fonts.

After installing PhantomJS, you will need a PHP library to drive it. That library is PHPPhantomJS. It can be downloaded from the PHPPhantomJS Github repository. As usual, it is better to use Composer to install it. After the PHPPhantomJS library is installed, you will need to place links to the PhantomJS binaries in the library’s folder where required.
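Capturing a page can be sketched like this. I am assuming the library is the jonnyw/php-phantomjs package and its 4.x API (`JonnyW\PhantomJs\Client`); older versions set the binary path differently. The helper names, the default viewport size, and the idea of naming files by an MD5 of the URL are my own choices for illustration. Composer's autoloader is assumed to be loaded.

```php
<?php
// Assumes Composer's autoloader has already been required elsewhere.
use JonnyW\PhantomJs\Client;

/** Derive a stable screenshot file name for a URL inside $dir. */
function screenshotPath(string $dir, string $url): string
{
    return rtrim($dir, '/') . '/' . md5($url) . '.png';
}

/**
 * Render $url with PhantomJS and save the result as an image.
 * Returns the HTTP status code reported for the page.
 */
function captureScreenshot(
    string $url,
    string $outputFile,
    string $phantomBinary,   // hypothetical path to the PhantomJS binary
    int $width = 1280,
    int $height = 720
): int {
    $client = Client::getInstance();
    $client->getEngine()->setPath($phantomBinary);

    $request = $client->getMessageFactory()->createCaptureRequest($url, 'GET');
    $request->setOutputFile($outputFile);
    $request->setViewportSize($width, $height);
    // Capture a region of the same size as the viewport, from the top-left corner.
    $request->setCaptureDimensions($width, $height, 0, 0);

    $response = $client->getMessageFactory()->createResponse();
    $client->send($request, $response);

    return (int) $response->getStatus();
}
```

A typical call might be `captureScreenshot($url, screenshotPath('/tmp/shots', $url), '/usr/local/bin/phantomjs')`, after which the saved image can be placed on a slide.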

Strengths and weaknesses of PhantomJS. PhantomJS is one more binary to install and maintain on the server (Node.js is only involved if you install it through npm), and it may be somewhat tricky to install. However, it was the best tool available to us for rendering HTML into an image. Simple as that.

What is your experience using these libraries? Would you choose another solution for the same end?