Saturday, 23 July 2016

Convert DOC to HTML with Images

We will use the Open XML SDK and OpenXmlPowerTools to convert a Word document (.docx) into HTML.

Step 1

Install Required Package

Install-Package DocumentFormat.OpenXml

Install-Package OpenXmlPowerTools

Add Reference

Right-click your project in Solution Explorer,
then choose Add >> Reference and select System.Drawing and WindowsBase.

Follow the CODE Below
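
A minimal sketch of the conversion, modeled on the HtmlConverter sample that ships with OpenXmlPowerTools rather than on the repository code. The class name WordToHtml and the method name ToHtml are assumptions made for illustration. Images are embedded as base64 data URIs, so no image files are written to disk. Newer OpenXmlPowerTools releases rename HtmlConverter and HtmlConverterSettings to WmlToHtmlConverter and WmlToHtmlConverterSettings, so adjust the names if your package version needs it.

```csharp
using System;
using System.Drawing.Imaging;
using System.IO;
using System.Xml.Linq;
using DocumentFormat.OpenXml.Packaging;
using OpenXmlPowerTools;

// Hypothetical helper class; the names are illustrative, not from the original post.
public static class WordToHtml
{
    // Converts a .docx file to an HTML string with images inlined as data URIs.
    public static string ToHtml(string docxPath)
    {
        // Work on an in-memory copy so the source file is never modified.
        byte[] bytes = File.ReadAllBytes(docxPath);
        using (var stream = new MemoryStream())
        {
            stream.Write(bytes, 0, bytes.Length);
            using (WordprocessingDocument doc = WordprocessingDocument.Open(stream, true))
            {
                var settings = new HtmlConverterSettings
                {
                    PageTitle = Path.GetFileNameWithoutExtension(docxPath),
                    // Called once per image found in the document.
                    ImageHandler = imageInfo =>
                    {
                        using (var imageStream = new MemoryStream())
                        {
                            // Re-encode every image as PNG and embed it in the markup.
                            imageInfo.Bitmap.Save(imageStream, ImageFormat.Png);
                            string base64 = Convert.ToBase64String(imageStream.ToArray());
                            return new XElement(Xhtml.img,
                                new XAttribute(NoNamespace.src, "data:image/png;base64," + base64),
                                // Carries the width and height computed from the document.
                                imageInfo.ImgStyleAttribute,
                                imageInfo.AltText != null
                                    ? new XAttribute(NoNamespace.alt, imageInfo.AltText)
                                    : null);
                        }
                    }
                };

                XElement html = HtmlConverter.ConvertToHtml(doc, settings);
                return html.ToString(SaveOptions.DisableFormatting);
            }
        }
    }
}
```

Calling `File.WriteAllText("output.html", WordToHtml.ToHtml("input.docx"))` would then produce a single self-contained HTML file; both file names are placeholders. Embedding the images avoids broken image paths when the page is served from a different folder, at the cost of a larger HTML file.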

Fork me on GitHub

https://github.com/niisar/WordToHTML

12 comments:

  1. Does the OpenXml package support only docx files?

    1. Yes, buddy. It supports the later versions of MS Office. Find more detail here: https://en.wikipedia.org/wiki/Office_Open_XML#Application_support

  2. This is not a working solution. I am getting the error:
    Object reference not set to an instance of an object.

  3. Thanks, you have saved my day.

  4. I am getting a "File access denied" error, which refers to the images in the document.

  5. This code works fine, but I am having trouble retaining the image widths. After conversion from Word to HTML, the images take up the entire width.

    Do you have any solution for this?

  6. Unique post on DOCX - HTML Converter. Thanks for sharing with us…

  7. Hi,

    I have used the above method in my MVC application, with the help of the MS OpenXML documentation, to convert my docx report into an HTML file with images for preview purposes. I implemented the above logic to embed the images in the HTML, but the images still do not show. The image file path looks fine, yet when I open the HTML in a browser it displays only an empty image box. Inspecting the path in the browser console shows this error: Failed to load resource: the server responded with a status of 404 ()

  8. Hello,
    I have a problem with the code and more specifically with this line:
    Dim html As XElement = HtmlConverter.ConvertToHtml(wDoc:=doc, htmlConverterSettings:=settings)

    Every time, I get the error: System.ArgumentNullException: 'Value cannot be null.
    Parameter name: part'

    I can't find any solution, can you help me please? Thank you in advance.
    Best regards,

    Claude

  9. I am getting a "not supported" exception.

  10. Thank you for the code.

    I was able to open and parse the doc well, but I couldn't retrieve the images because I didn't know how to build the image handler.

    Your example worked the first time. Then I took the handler and put it in my own code, which saves the images to the database, and it worked like a charm.
