
WPF HTML: Read a Website with WebRequest and HtmlAgilityPack

 

This example shows a WPF application in C# that reads a web page and extracts the details it contains.

It uses a WebRequest and an HttpWebResponse to download the page, and evaluates the HTML with the HtmlAgilityPack.

 

Preparation:

To work with HTML documents in WPF, install the HtmlAgilityPack package from ZZZ Projects via NuGet.

In Visual Studio: right-click References in Solution Explorer and choose Manage NuGet Packages.

This is an agile HTML parser that builds a read/write DOM and supports plain XPath or XSLT (you don't actually have to understand XPath or XSLT to use it). It is a .NET code library that lets you parse HTML files "out of the web". The parser is very tolerant of "real world" malformed HTML. The object model is very similar to the one System.Xml offers, but for HTML documents (or streams).

After the installation, HtmlAgilityPack appears as an entry under References:

HtmlAgilityPack

 

Background: the HTML document

 

Every web page on the internet is based on an HTML document.

You can see this in the browser by pressing F12, or by right-clicking a spot on the page and choosing Inspect.

In this example, every entry of a job results list on a web page is read; each entry links to a page with the detailed record.

 

Each element is processed as a node of the HTML document.

 

The div element with class="job-details" has to be found, and below it the link to the detail subpage:

<div class="job-details">

 

The link itself is an a element with an href attribute, nested below that div as a descendant element:

<a href="/de/job/eDiscovery-Helpdesk-Supervisor-Washington-D-C--1/">eDiscovery Helpdesk Supervisor - Washington, D.C. </a>

 

 

 

 

Application:

When you click the Start button in the WPF program, the HTML document of the website is downloaded first.

In the second step, the HTML document is searched for the nodes you are looking for.

 

In the first step, the HTML document is downloaded.

For this I use the following C# method, fx_read_Page(sURL).

The code first creates a WebRequest and then obtains an HttpWebResponse from it:

//< WebRequest and Response >

WebRequest objRequest = WebRequest.Create(sURL);

HttpWebResponse objResponse = (HttpWebResponse)objRequest.GetResponse();

//</ WebRequest and Response >

 

Then it opens the response stream with a StreamReader and reads the whole text into a string:

//< Stream and Reader >

Stream objDataStream = objResponse.GetResponseStream();

StreamReader TextReader = new StreamReader(objDataStream);

//</ Stream and Reader >

string sHTML = TextReader.ReadToEnd();

 

Finally, the string is loaded into a local HtmlDocument:

//*create and load to local HtmlDocument

HtmlDocument doc = new HtmlDocument();

doc.LoadHtml (sHTML);

 

Complete method fx_read_Page(string sURL):

private HtmlAgilityPack.HtmlDocument fx_read_Page(string sURL)

        {

            //------------< fx_read_Page() >------------

            //* get the HTML Document of a website-URL     

            //-< init >-

            //< WebRequest and Response >

            WebRequest objRequest = WebRequest.Create(sURL);

            //* using: response, stream and reader are disposed, so the connection is released after every call

            using (HttpWebResponse objResponse = (HttpWebResponse)objRequest.GetResponse())

            using (Stream objDataStream = objResponse.GetResponseStream())

            using (StreamReader TextReader = new StreamReader(objDataStream))

            {

                //</ WebRequest and Response >

                //-</ init >-



                //< download >

                //* Read Website to local String

                string sHTML = TextReader.ReadToEnd();

                //</ download >



                //< get HTMLdocument >

                //*create and load to local HtmlDocument

                HtmlDocument doc = new HtmlDocument();

                doc.LoadHtml(sHTML);

                //</ get HTMLdocument >



                //< output >

                return doc;

                //</ output >

            }

            //------------</ fx_read_Page() >------------

        }
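fx_read_Page runs synchronously inside the click handler, so the window stops responding until the page has been downloaded. A common alternative is System.Net.Http.HttpClient with async/await. The following is only a sketch under assumptions, not part of the original project: the class names PageLoader and StubHandler are hypothetical, and in a .NET Framework WPF project you may first need to add a reference to System.Net.Http. StubHandler exists only so the download path can be exercised without network access.

```csharp
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not in the original project): downloads a page without blocking the UI thread.
public static class PageLoader
{
    // One HttpClient shared for the lifetime of the app, instead of a new one per request.
    private static readonly HttpClient SharedClient = new HttpClient();

    public static Task<string> DownloadHtmlAsync(string url)
    {
        return DownloadHtmlAsync(SharedClient, url);
    }

    // Overload that takes the client, so a test can inject a stub handler.
    public static async Task<string> DownloadHtmlAsync(HttpClient client, string url)
    {
        using (HttpResponseMessage response = await client.GetAsync(url).ConfigureAwait(false))
        {
            // Throws HttpRequestException for 4xx/5xx status codes.
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync().ConfigureAwait(false);
        }
    }
}

// Hypothetical test double: answers every request with the same HTML, no network needed.
public sealed class StubHandler : HttpMessageHandler
{
    private readonly string _html;

    public StubHandler(string html)
    {
        _html = html;
    }

    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        var response = new HttpResponseMessage(HttpStatusCode.OK)
        {
            Content = new StringContent(_html),
            RequestMessage = request
        };
        return Task.FromResult(response);
    }
}
```

In the window, Button_Start_Click could then be declared private async void and call string sHTML = await PageLoader.DownloadHtmlAsync(Textbox_URL.Text); before passing the string to doc.LoadHtml(sHTML). Because the await in the event handler resumes on the UI thread, fx_Log can still write to the TextBox afterwards.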

 

 

Evaluate the HTML document

Next, the HTML document is searched for the relevant nodes.

 

First, all li elements with the class job-result-item are collected:

//< nodes >

HtmlNodeCollection  nodes = doc.DocumentNode.SelectNodes("//li[@class=\"job-result-item\"]"); //*find subnode with //

//</ nodes >
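The XPath //li[@class="job-result-item"] compares the whole class attribute, so an item whose markup reads class="job-result-item featured" would be skipped. The sketch below compares that exact match with a token-based XPath 1.0 alternative. To keep it runnable without the NuGet package, it evaluates both expressions with System.Xml on a small, well-formed sample; the sample markup and the name ClassSelectorDemo are made up for illustration. HtmlAgilityPack also evaluates XPath 1.0 expressions, so the same string should work in doc.DocumentNode.SelectNodes(...), but it is worth checking against the real page.

```csharp
using System.Xml;

// Hypothetical demo (not in the original project). System.Xml is used only so the
// XPath can be run without the HtmlAgilityPack package; it requires well-formed markup.
public static class ClassSelectorDemo
{
    // Matches only if the class attribute is exactly "job-result-item".
    public const string ExactXPath = "//li[@class='job-result-item']";

    // Matches if "job-result-item" is one of the space-separated class names.
    public const string TokenXPath =
        "//li[contains(concat(' ', normalize-space(@class), ' '), ' job-result-item ')]";

    public static int CountMatches(string markup, string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(markup);
        return doc.SelectNodes(xpath).Count;
    }
}
```

Padding the normalized attribute with spaces on both sides means only whole class names match, so a class such as job-result-item-old is not counted.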

 

Each matching li node is then searched for its first descendant div with the CSS class job-title:

HtmlNode nodeTitle = node.SelectSingleNode(".//div[@class=\"job-title\"]");

 

From this element, the title text and the link are read:

HtmlNode node_to_Detail = nodeTitle.SelectSingleNode("a");

string sTitle = node_to_Detail.InnerText;

string sURL_Detail_relative = node_to_Detail.GetAttributeValue("href", "");

string sURL_Detail_absolute = new Uri(baseUrl, sURL_Detail_relative).AbsoluteUri;
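new Uri(baseUrl, sURL_Detail_relative) resolves the href against the address of the list page using the standard relative-URL rules. The hypothetical helper below (LinkResolver is not part of the original project) wraps that call and returns null when the href is missing: GetAttributeValue("href", "") yields an empty string in that case, and resolving an empty string would typically just point back at the list page itself.

```csharp
using System;

// Hypothetical helper (not in the original project) around new Uri(baseUri, relative).
public static class LinkResolver
{
    // Returns the absolute URL for an href, or null if the href is missing or empty.
    public static string ToAbsoluteUrl(Uri baseUri, string href)
    {
        if (string.IsNullOrWhiteSpace(href))
        {
            return null; // GetAttributeValue("href", "") returns "" when there is no href
        }
        // Root-relative ("/de/..."), path-relative ("detail/1/") and absolute hrefs are all resolved by Uri.
        return new Uri(baseUri, href.Trim()).AbsoluteUri;
    }
}
```

With the default URL from the XAML as base, the href /de/job/eDiscovery-Helpdesk-Supervisor-Washington-D-C--1/ becomes https://www.computerfutures.com/de/job/eDiscovery-Helpdesk-Supervisor-Washington-D-C--1/; the ?locale=de query of the list page is not carried over to a root-relative link.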

 

 

 

Evaluate the HTML nodes (full loop)

            //< nodes >

            HtmlNodeCollection  nodes = doc.DocumentNode.SelectNodes("//li[@class=\"job-result-item\"]"); //*find subnode with //

            //* SelectNodes returns null (not an empty collection) when nothing matches

            if (nodes == null) { fx_Log("no job-result-item nodes found"); return; }

            //</ nodes >



            //------< @Loop: Detail Nodes >------

            foreach (HtmlNode node in nodes )

            {

                //----< In Detail-Node >----

                HtmlNode nodeTitle = node.SelectSingleNode(".//div[@class=\"job-title\"]"); //*find subnode with .//

                //* skip items without a title div or without a link inside it

                HtmlNode node_to_Detail = (nodeTitle != null) ? nodeTitle.SelectSingleNode("a") : null;

                if (node_to_Detail != null)

                {

                    //--< get a_href >--

                    string sTitle = HtmlEntity.DeEntitize(node_to_Detail.InnerText).Trim();   //*decode entities such as &amp;

                    string sURL_Detail_relative = node_to_Detail.GetAttributeValue("href", "");

                    string sURL_Detail_absolute = new Uri(baseUrl, sURL_Detail_relative).AbsoluteUri ;



                    //< print >

                    fx_Log(sTitle + " | " + sURL_Detail_absolute);

                    //</ print >

                    //--</ get a_href >--

                }

                //----</ In Detail-Node >----

            }

            //------</ @Loop: Detail Nodes >------

 

 

 

 

Complete code example in C#

using System;

using System.Collections.Generic;

using System.Text;

using System.Threading.Tasks;

using System.Windows;

//< add using >

using System.IO;                //*Stream, StreamReader

using System.Net;               //*WebRequest, HttpWebResponse

using HtmlAgilityPack;          //*HtmlDocument, HtmlNode, HtmlEntity

//</ add using >

 

namespace WebRobot_ComputerFutures

{

 

    public partial class MainWindow : Window

    {

 

 

        #region Form

        //--------------------< region: Form >---------------------

        public MainWindow()

        {

            InitializeComponent();

        }

        //--------------------</ region: Form >---------------------

        #endregion /Form

 

 

        #region Buttons

        //--------------------< region: Buttons >---------------------

        private void Button_Start_Click(object sender, RoutedEventArgs e)

        {

            //--------< Button_Start_Click() >--------

            string sURL = Textbox_URL.Text;

            var baseUrl = new Uri(sURL);

 

            HtmlDocument doc= fx_read_Page(sURL );

 

            //< nodes >

            HtmlNodeCollection  nodes = doc.DocumentNode.SelectNodes("//li[@class=\"job-result-item\"]"); //*find subnode with //

            //* SelectNodes returns null (not an empty collection) when nothing matches

            if (nodes == null) { fx_Log("no job-result-item nodes found"); return; }

            //</ nodes >



            //------< @Loop: Detail Nodes >------

            foreach (HtmlNode node in nodes )

            {

                //----< In Detail-Node >----

                HtmlNode nodeTitle = node.SelectSingleNode(".//div[@class=\"job-title\"]"); //*find subnode with .//

                //* skip items without a title div or without a link inside it

                HtmlNode node_to_Detail = (nodeTitle != null) ? nodeTitle.SelectSingleNode("a") : null;

                if (node_to_Detail != null)

                {

                    //--< get a_href >--

                    string sTitle = HtmlEntity.DeEntitize(node_to_Detail.InnerText).Trim();   //*decode entities such as &amp;

                    string sURL_Detail_relative = node_to_Detail.GetAttributeValue("href", "");

                    string sURL_Detail_absolute = new Uri(baseUrl, sURL_Detail_relative).AbsoluteUri ;



                    //< print >

                    fx_Log(sTitle + " | " + sURL_Detail_absolute);

                    //</ print >

                    //--</ get a_href >--

                }

                //----</ In Detail-Node >----

            }

            //------</ @Loop: Detail Nodes >------

            //--------</ Button_Start_Click() >--------

        }

        //--------------------</ region: Buttons >---------------------

        #endregion /Buttons

 

 

 

        #region Methods

        //--------------------< region: Methods >---------------------

        private HtmlAgilityPack.HtmlDocument fx_read_Page(string sURL)

        {

            //------------< fx_read_Page() >------------

            //* get the HTML Document of a website-URL     

            //-< init >-

            //< WebRequest and Response >

            WebRequest objRequest = WebRequest.Create(sURL);

            //* using: response, stream and reader are disposed, so the connection is released after every call

            using (HttpWebResponse objResponse = (HttpWebResponse)objRequest.GetResponse())

            using (Stream objDataStream = objResponse.GetResponseStream())

            using (StreamReader TextReader = new StreamReader(objDataStream))

            {

                //</ WebRequest and Response >

                //-</ init >-



                //< download >

                //* Read Website to local String

                string sHTML = TextReader.ReadToEnd();

                //</ download >



                //< get HTMLdocument >

                //*create and load to local HtmlDocument

                HtmlDocument doc = new HtmlDocument();

                doc.LoadHtml(sHTML);

                //</ get HTMLdocument >



                //< output >

                return doc;

                //</ output >

            }

            //------------</ fx_read_Page() >------------

        }

 

        //--------------------</ region: Methods >---------------------

        #endregion /Methods

 

 

 

 

 

 

        #region Sys

        //--------------------< region: Sys >---------------------

        private void fx_Log (string sLog)

        {

            //------------< fx_Log() >------------

            //* log Text to Textbox

            string sText = Textbox_Log.Text;

            sText = DateTime.Now + " " + sLog + Environment.NewLine + sText ;

            if (sText.Length > 50000) { sText = sText.Substring(0, 50000); }   //*new entries are on top, so keep the first 50000 characters and drop the oldest

            Textbox_Log.Text = sText;

            Textbox_Log.UpdateLayout();

            //------------</ fx_Log() >------------

        }

 

        //--------------------</ region: Sys >---------------------

        #endregion /Sys

    }

}
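fx_Log puts the newest entry at the top of the text, so when the 50,000-character cap is reached it is the end of the string (the oldest entries) that should be dropped, i.e. Substring(0, 50000) rather than Substring(50000). The hypothetical LogBuffer.Prepend below (not part of the original project) isolates that string handling from the TextBox so it can be checked on its own; it leaves out the DateTime.Now prefix to keep the result deterministic.

```csharp
using System;

// Hypothetical helper (not in the original project): the string handling of fx_Log without the TextBox.
public static class LogBuffer
{
    // Puts the new entry on top and, if the text is too long, cuts off the oldest text at the bottom.
    public static string Prepend(string existing, string entry, int maxLength)
    {
        string text = entry + Environment.NewLine + existing;
        return text.Length > maxLength ? text.Substring(0, maxLength) : text;
    }
}
```

Inside fx_Log this would correspond to Textbox_Log.Text = LogBuffer.Prepend(Textbox_Log.Text, DateTime.Now + " " + sLog, 50000);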

 

 

 

 

XAML frontend

The following XAML code defines the front end of the WPF application.

 

It contains the start button, a text box to enter the URL, and a text box to display the results.

<Window x:Class="WebRobot_ComputerFutures.MainWindow"

        xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"

        xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"

        xmlns:d="http://schemas.microsoft.com/expression/blend/2008"

        xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"

        xmlns:local="clr-namespace:WebRobot_ComputerFutures"

        mc:Ignorable="d"

        Title="MainWindow" Height="550" Width="720">

    <Grid>

 

        <Label x:Name="Label_URL" Content="Url:" HorizontalAlignment="Left" Margin="30,46,0,0" VerticalAlignment="Top"/>

        <TextBox x:Name="Textbox_URL" HorizontalAlignment="Left" Height="23" Margin="30,72,0,0" TextWrapping="Wrap" Text="https://www.computerfutures.com/jobs/temporary/?locale=de" VerticalAlignment="Top" Width="658"/>

 

        <Label x:Name="Label_Log" Content="log:" HorizontalAlignment="Left" Margin="29,100,0,0" VerticalAlignment="Top"/>

        <TextBox x:Name="Textbox_Log" HorizontalAlignment="Left" Height="385" Margin="29,126,0,0" TextWrapping="Wrap" Text=".." VerticalAlignment="Top" Width="659"/>

        <Button x:Name="Button_Start" Content="Start" HorizontalAlignment="Left" Margin="29,10,0,0" VerticalAlignment="Top" Width="75"

                Click="Button_Start_Click"

                />

 

 

    </Grid>

</Window>

 

 
Video Tutorial
https://www.youtube.com/watch?v=_ABWXvVSKEw


Contact for Jobs, Project Requests: raimund.popp@microsoft-programmierer.de