
 

 

This C# page shows how to write a robot as a Windows Forms (WinForms for short) application that browses web pages and collects data from them.

In principle, the data can be collected in two ways:

1.)   With the WebBrowser control

A browser is remote-controlled, and the information is read from the browser control.

2.)   Using WebRequest and WebResponse

Only the web addresses are requested, and the HTML text of the returned page is evaluated.
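The rest of this page follows the second approach. For comparison, the first approach could look roughly like the following sketch. The class name BrowserRobotForm and the address https://www.example.com/ are placeholders chosen for illustration and are not part of the original project.

```csharp
using System;
using System.Windows.Forms;

// Hypothetical demo form: remote-controls a WebBrowser control
// and reads the HTML once the page has finished loading.
public class BrowserRobotForm : Form
{
    private readonly WebBrowser _browser = new WebBrowser();

    public BrowserRobotForm(string url)
    {
        _browser.Dock = DockStyle.Fill;
        // Suppress script error dialogs so they do not block the robot.
        _browser.ScriptErrorsSuppressed = true;
        _browser.DocumentCompleted += Browser_DocumentCompleted;
        Controls.Add(_browser);
        _browser.Navigate(url);
    }

    private void Browser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        // DocumentCompleted can also fire for frames; react only to the top-level page.
        if (e.Url != _browser.Url) { return; }

        string html = _browser.DocumentText;
        Text = "Loaded " + html.Length + " characters";
    }

    [STAThread]
    public static void Main()
    {
        Application.EnableVisualStyles();
        Application.Run(new BrowserRobotForm("https://www.example.com/")); // placeholder URL
    }
}
```

The HTML string obtained in the event handler could then be passed to an HTML parser in the same way as the text read through WebResponse.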

 

Why Winforms?

As a platform I use Windows Forms with the Microsoft .NET Framework, because WinForms is the most stable option here. In WPF, script errors still cannot be suppressed, and in UWP (Universal Windows Platform) the functionality is reduced even further.

After all, this program is more of a machine than an interactive application and requires only a limited visual interface.

 

Getting started:

Create a Windows Classic Desktop application in Visual Studio.

Then add a start button, a text box for the URL, and one or more output fields (text boxes with Multiline = true) to the Form1 form. The code on this page uses the control names btnRead, tbxURL, tbxHTML, tbxLog and tbxResult.

 

HTML Evaluation with HtmlAgilityPack

Then add HtmlAgilityPack for evaluating the HTML. To do this, add a reference to HtmlAgilityPack under References using Add Reference.

 

 

To use the functions of HtmlAgilityPack, add a using directive at the top of the code file:

//< add using >

using html = HtmlAgilityPack;

//</ add using >

or

using HtmlAgilityPack;

 

 

Read the web page with C#

 

Read the HTML page

First, you retrieve the complete HTML text of a web page.

 

To do this, you create a WebRequest and HttpWebResponse object.

WebRequest objRequest = WebRequest.Create(sURL);

HttpWebResponse objResponse = (HttpWebResponse) objRequest.GetResponse();

 

Then you read the page into a local HtmlDocument:

Stream objDataStream = objResponse.GetResponseStream();

StreamReader TextReader = new StreamReader(objDataStream);

 

string sHTML = TextReader.ReadToEnd();

_doc = new html.HtmlDocument();

_doc.LoadHtml(sHTML);

 

 

C# method for reading the HTML text of a page

private void fl_Test()

        {

            //----------------< fl_Test() >----------------

            string sURL = tbxURL.Text;

 

            WebRequest objRequest = WebRequest.Create(sURL);

            HttpWebResponse objResponse = (HttpWebResponse) objRequest.GetResponse();

 

           

            //< read web page >

            Stream objDataStream = objResponse.GetResponseStream();

            StreamReader TextReader = new StreamReader(objDataStream);

           

            //< get HTMLdocument >

            string sHTML = TextReader.ReadToEnd();

            _doc = new html.HtmlDocument();

            _doc.LoadHtml(sHTML);

            //</ get HTMLdocument >

 

 

            //< cleanup >

            TextReader.Close();

            objDataStream.Close();

            objResponse.Close();

            //</ cleanup >

 

            //< evaluate >

 

            get_Results();

            //----------------</ fl_Test() >----------------

        }
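The method above closes the reader, stream and response by hand, so they stay open if an exception occurs before the cleanup section is reached. A minimal sketch of the same steps with using blocks is shown below. The helper class PageLoader, the method LoadDocument and the fallback address are hypothetical names introduced only for this example.

```csharp
using System;
using System.IO;
using System.Net;
using HtmlAgilityPack;

// Hypothetical helper, not part of the original project.
public static class PageLoader
{
    // Downloads the page at the given URL and parses it into an HtmlDocument.
    public static HtmlDocument LoadDocument(string url)
    {
        WebRequest request = WebRequest.Create(url);

        // The using blocks dispose the response, stream and reader
        // even if reading or parsing throws an exception.
        using (WebResponse response = request.GetResponse())
        using (Stream stream = response.GetResponseStream())
        using (StreamReader reader = new StreamReader(stream))
        {
            string html = reader.ReadToEnd();

            HtmlDocument doc = new HtmlDocument();
            doc.LoadHtml(html);
            return doc;
        }
    }

    public static void Main(string[] args)
    {
        // Placeholder URL, used when no address is passed on the command line.
        string url = args.Length > 0 ? args[0] : "https://www.example.com/";
        HtmlDocument doc = LoadDocument(url);
        Console.WriteLine("Parsed " + doc.DocumentNode.OuterHtml.Length + " characters");
    }
}
```

In the form, such a helper could be called as _doc = PageLoader.LoadDocument(tbxURL.Text); before get_Results() is run.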

 

 

 

Evaluate

Code for evaluating individual HTML elements of the HTML document

With .SelectNodes(xPath) you select all elements that have a certain HTML element type and, where applicable, a matching name or CSS class.

Then you can go through all nodes of the collection and evaluate them individually.

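            HtmlNodeCollection nodes = _doc.DocumentNode.SelectNodes("//h2[@class='text-module-begin']");

            // SelectNodes returns null when no element matches
            if (nodes == null) { return; }

            foreach (HtmlNode n in nodes)

            {

                HtmlNode a = n.SelectSingleNode("a");

                // skip headings without a link
                if (a == null) { continue; }

                string sTitel = a.InnerText;

                string sURL = a.GetAttributeValue("href","");

                sys_Add_Result(sTitel);

            }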

 

Searched HTML elements

In the example shown, all HTML elements that contain the links and titles of the eBay ads are to be collected.

If you inspect the page with the browser's developer tools (usually F12), you can see the HTML structure of the web page. In this example, all results are rendered as <h2 class="text-module-begin"> .. </h2>, and within each of these HTML nodes there is an <a href=...> link containing the title and the URL.
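The structure described above can be tried out without a network connection. The following sketch parses a made-up HTML string of the same shape and prints each title with its link. The sample markup, the class name XPathDemo and the base address https://www.example.com/ are assumptions for illustration only. Because href values are often relative, the sketch also resolves them against the base address.

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical console demo, not part of the original project.
public static class XPathDemo
{
    public static void Main()
    {
        // Made-up sample markup that mimics the structure described above.
        string sampleHtml =
            "<html><body>" +
            "<h2 class='text-module-begin'><a href='/ad/1'>First ad</a></h2>" +
            "<h2 class='text-module-begin'><a href='/ad/2'>Second ad</a></h2>" +
            "<h2 class='other'><a href='/x'>Not a result</a></h2>" +
            "</body></html>";

        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(sampleHtml);

        // Placeholder base address used to turn relative links into absolute ones.
        Uri baseUri = new Uri("https://www.example.com/");

        HtmlNodeCollection nodes =
            doc.DocumentNode.SelectNodes("//h2[@class='text-module-begin']");

        // SelectNodes returns null when nothing matches.
        if (nodes == null)
        {
            Console.WriteLine("No matching elements found.");
            return;
        }

        foreach (HtmlNode h2 in nodes)
        {
            HtmlNode link = h2.SelectSingleNode("a");
            if (link == null) { continue; }

            // Decode HTML entities such as &amp; in the visible title.
            string title = HtmlEntity.DeEntitize(link.InnerText).Trim();
            string href = link.GetAttributeValue("href", "");
            Uri absolute = new Uri(baseUri, href);

            Console.WriteLine(title + " -> " + absolute);
        }
    }
}
```

Only the two h2 elements with the class text-module-begin are selected; the third heading is skipped because its class does not match the XPath expression.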

 

Running the C# code from the form loads all links of the web page and, in this case, simply writes them to the text box.

 

 

 


 

 

Complete C# code

using HtmlAgilityPack;

using System;

using System.Collections.Generic;

using System.ComponentModel;

using System.Data;

using System.Drawing;

using System.IO;

using System.Linq;

using System.Net;

using System.Text;

using System.Threading.Tasks;

using System.Windows.Forms;

 

//< add using >

using html = HtmlAgilityPack;

//</ add using >

 

namespace webrobot_ebayKleinanzeigen

{

    public partial class frmWebRoboter : Form

    {

        //----------< global>----------

        html.HtmlDocument _doc;

        //----------</ global>----------

 

 

        #region Form

        //-------------------< region: Forms >------------------

        public frmWebRoboter()

        {

            InitializeComponent();

        }

        //-------------------</ region: Forms >------------------

        #endregion /Form

 

        #region Buttons

        //-------------------< region: Buttons >------------------

        private void btnRead_Click(object sender, EventArgs e)

        {

            fl_Test();

        }

        //-------------------</ region: Buttons >------------------

        #endregion

 

        #region Methods

        //-------------------< region: Methods >------------------

        private void fl_Test()

        {

            //----------------< fl_Test() >----------------

            string sURL = tbxURL.Text;

 

            sys_Add_Log("Get Website");

            WebRequest objRequest = WebRequest.Create(sURL);

            HttpWebResponse objResponse = (HttpWebResponse) objRequest.GetResponse();

 

            sys_Add_Log("/Get Website");

            sys_Add_Log("Read text");

            //< read web page >

            Stream objDataStream = objResponse.GetResponseStream();

            //</ read web page >

            //< read text >

            StreamReader TextReader = new StreamReader(objDataStream);

            //</ read text >

            //< get HTMLdocument >

            string sHTML = TextReader.ReadToEnd();

            _doc = new html.HtmlDocument();

            _doc.LoadHtml(sHTML);

            //</ get HTMLdocument >

            sys_Add_Log("/Read text");

            //< display >

            tbxHTML.Text = sHTML;

            //</ display >

            sys_Add_Log("/Display");

            //< cleanup >

            TextReader.Close();

            objDataStream.Close();

            objResponse.Close();

            //</ cleanup >

            //< evaluate >

 

            get_Results();

            //----------------</ fl_Test() >----------------

        }

 

       

 

        #region Evaluate

        private void get_Results()

        {

            //-------------------< get_Results() >-------------------

            HtmlNodeCollection nodes = _doc.DocumentNode.SelectNodes("//h2[@class='text-module-begin']");

            // SelectNodes returns null when no element matches
            if (nodes == null) { return; }

            foreach (HtmlNode n in nodes)

            {

                HtmlNode a = n.SelectSingleNode("a");

                // skip headings without a link
                if (a == null) { continue; }

                string sTitel = a.InnerText;

                string sURL = a.GetAttributeValue("href","");

                sys_Add_Result(sTitel);

            }

 

            //-------------------</ get_Results() >-------------------

        }

        #endregion

 

 

        //-------------------</ region: Methods >------------------

        #endregion

 

 

        #region System

        //-------------------< region: System Methods >------------------

        private void sys_Add_Log(string parText = "")

        {

            //----------------------< sys_Add_Log() >----------------------

            string sText = tbxLog.Text;

            sText = DateTime.Now.ToLongTimeString() + " " + parText + Environment.NewLine + sText;

            //< truncate automatically >

            if (sText.Length > 10000)

            { sText = sText.Substring(0, 10000); }  // keep the newest 10000 characters

            //</ truncate automatically >

            tbxLog.Text = sText;

            //----------------------</ sys_Add_Log() >----------------------

        }

 

        private void sys_Add_Result(string parText = "")

        {

            //----------------------< sys_Add_Result() >----------------------

            string sText = tbxResult.Text;

            sText = DateTime.Now.ToLongTimeString() + " " + parText + Environment.NewLine + sText;

            //< truncate automatically >

            if (sText.Length > 10000)

            { sText = sText.Substring(0, 10000); }  // keep the newest 10000 characters

            //</ truncate automatically >

            tbxResult.Text = sText;

            //----------------------</ sys_Add_Result() >----------------------

        }

        //-------------------</ region: System Methods >------------------

        #endregion

    }

}

 

 


Contact for Jobs, Project Requests: raimund.popp@microsoft-programmierer.de