
Recently I wanted to convert my LINQ via C# tutorial into a Word document (.doc). The tasks are:

1. Download the content of the index page of the entire tutorial.
2. Interpret the index page and get the title/URI of each chapter and its sections.
3. Download the content of each chapter/section.
4. Merge all the contents into one well-formatted document.

There are several possible solutions, e.g.:

- Node.js: it is easy to use JavaScript to process the downloaded HTML DOM.
- C#: it is easier to use C# to implement the conversion to a Word document.
  - Open XML SDK: Open XML is a lower-level API for building the Word document.
  - VSTO (Visual Studio Tools for Office): a .dll from VSTO provides APIs to directly automate the Word application itself to build a document.

After searching around, I found the CsQuery library, which is available from NuGet: Install-Package CsQuery. It is a jQuery-like library for processing the DOM in C#.

Download index page HTML and all contents via CsQuery

The first steps are to download everything from this blog:

1. Download the HTML string of the index page, which is easy: just call WebClient.DownloadString.
2. In the downloaded HTML string, get the title of the tutorial from the title tag: indexPage.Text().
3. Get the article content of the index page (get rid of the HTML page header, footer, sidebar, article comments …): indexPage.
4. In the page content, get the title of each chapter, which is easy with the jQuery-style API: indexPage.Children("ol").Children("li").
   1. Get the URI of each section from the HTML hyperlink.
   2. Download the HTML string of each section.
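The index page steps (read the title, isolate the article content, walk the chapter list, collect each link) can be sketched with CsQuery roughly as follows. This is only a minimal sketch under stated assumptions: the sample HTML, the "article" selector, and the IndexPageSketch, IndexPage, and Chapter names are hypothetical and are not taken from the actual blog markup or code. It parses an inline string so it runs without network access; for a live page the string would come from WebClient.DownloadString.

```csharp
using System;
using System.Collections.Generic;
using CsQuery;

// Hypothetical result types, introduced only for this illustration.
public class Chapter
{
    public string Title { get; set; }
    public string Uri { get; set; }
}

public class IndexPage
{
    public string Title { get; set; }
    public List<Chapter> Chapters { get; set; }
}

public static class IndexPageSketch
{
    // Hypothetical stand-in for the downloaded index page; the real markup may differ.
    public const string SampleHtml =
        "<html><head><title>LINQ via C#</title></head><body><article><ol>" +
        "<li><a href=\"/chapter-1\">Introduction</a></li>" +
        "<li><a href=\"/chapter-2\">LINQ to Objects</a></li>" +
        "</ol></article></body></html>";

    public static IndexPage Parse(string html)
    {
        CQ indexPage = CQ.Create(html);

        // Title of the tutorial, read from the page's title element.
        string title = indexPage["title"].Text().Trim();

        // "article" is an assumed selector for the article content.
        CQ items = indexPage["article"].Children("ol").Children("li");

        List<Chapter> chapters = new List<Chapter>();
        foreach (IDomObject item in items)
        {
            // Wrap the raw DOM node again to get the jQuery-style API, then take its first link.
            CQ link = item.Cq().Find("a").First();
            chapters.Add(new Chapter { Title = link.Text().Trim(), Uri = link.Attr("href") });
        }

        return new IndexPage { Title = title, Chapters = chapters };
    }

    public static void Main()
    {
        // For a live page, something like:
        // string html = new System.Net.WebClient().DownloadString(indexUri);
        IndexPage page = Parse(SampleHtml);
        Console.WriteLine(page.Title);
        foreach (Chapter chapter in page.Chapters)
        {
            Console.WriteLine($"{chapter.Title} -> {chapter.Uri}");
        }
    }
}
```

Each collected Uri could then be passed to WebClient.DownloadString in the same way to fetch the chapter or section content.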
