Parsing is extracting information from a string. For example, you may want to extract certain parts from the HTML source code of a webpage. It's really hard to teach "parsing" to someone, because it's entirely dependent on what data you're working with. However, after some practice, you will start to notice patterns and realize that most parsing situations call for almost the same handful of VB functions:
Left
Mid
Right
Trim
InStr
InStrRev
Split, etc.
Note: In most cases, it's just a combination of InStr and Mid (avoid using the Split function if possible).
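To get a feel for what each of these functions returns, here is a minimal sketch. The sample string and the name DemoStringFunctions are made up for illustration. Run it and watch the Immediate window.

```vb
' Minimal sketch of the string functions listed above (VB6 syntax).
' strSample and DemoStringFunctions are hypothetical names.
Public Sub DemoStringFunctions()
    Dim strSample As String
    Dim strParts() As String

    strSample = "  name=John Smith  "

    ' Trim removes the leading and trailing spaces.
    strSample = Trim$(strSample)            ' "name=John Smith"

    Debug.Print Left$(strSample, 4)         ' name  (first 4 characters)
    Debug.Print Right$(strSample, 5)        ' Smith (last 5 characters)
    Debug.Print Mid$(strSample, 6, 4)       ' John  (4 characters from position 6)
    Debug.Print InStr(1, strSample, "=")    ' 5     (first "=" searching forward)
    Debug.Print InStrRev(strSample, " ")    ' 10    (last space searching backward)

    ' Split cuts the string at every "=" and returns a zero-based array.
    strParts = Split(strSample, "=")
    Debug.Print strParts(1)                 ' John Smith
End Sub
```

Note that positions in VB strings are 1-based, while the array returned by Split is 0-based.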
Parsing HTML
The HTML looks like this:
<html>
<head>
<title>Parse this</title>
</head>
<body>
<strong>Welcome!</strong>
<a href="#"><strong>User@yahoo.com</strong></a>
</body>
</html>
Follow these steps:
1. Find the starting string
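In this case, the e-mail address comes right after <strong>, but it is the 2nd
instance of <strong> (the first one wraps "Welcome!"). To land on the right one,
we include the anchor tag in front of it and search for <a href="#"><strong>.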
VB has a nice built-in function called InStr.
Dim lonPos As Long
Dim strStart As String
'The start string.
strStart = "<a href=""#""><strong>"
'Find the start string.
lonPos = InStr(1, HTML, strStart, vbTextCompare)
'1 - Where we start searching in the string (from the beginning).
'HTML - The string holding the HTML.
'strStart - What we're searching for.
'vbTextCompare - Case-insensitive search (more reliable for HTML).
'vbBinaryCompare - Faster, but it's case-sensitive.
If the search string was found, lonPos will contain the starting position. The
starting position would be the < in <a href="#".
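The difference between the two compare modes is easy to see with a small test. This is only a sketch: the uppercase HTML string and the name DemoCompareModes are assumptions for illustration.

```vb
' Sketch: how the compare argument changes what InStr finds.
' strHtml and DemoCompareModes are hypothetical names.
Public Sub DemoCompareModes()
    Dim strHtml As String

    ' Same markup as the example page, but written in uppercase.
    strHtml = "<A HREF=""#""><STRONG>User@yahoo.com</STRONG></A>"

    ' Text compare ignores case, so the lowercase search string still matches.
    Debug.Print InStr(1, strHtml, "<strong>", vbTextCompare)     ' 13

    ' Binary compare is case-sensitive, so nothing is found and InStr returns 0.
    Debug.Print InStr(1, strHtml, "<strong>", vbBinaryCompare)   ' 0
End Sub
```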
2. Find the ending string
The end would be </strong>.
So, all we do is use the InStr function again.
Except, this time, we will supply the function with lonPos and have it start
searching from there. If we searched from the beginning, it would find the
</strong> at the end of "Welcome!" instead.
Dim lonPos As Long, lonEnd As Long
Dim strStart As String, strEnd As String
Dim strEmail As String
'The start and end strings.
strStart = "<a href=""#""><strong>"
strEnd = "</strong>"
'Find the start string.
lonPos = InStr(1, HTML, strStart, vbTextCompare)
If lonPos > 0 Then
    'Move to the end of the start string
    'which happens to be the beginning of what we're looking for. :)
    lonPos = lonPos + Len(strStart)
    'Find the end string starting from where we found the start.
    lonEnd = InStr(lonPos, HTML, strEnd, vbTextCompare)
    If lonEnd > 0 Then
        'Now, we have the starting and ending position.
        'What we do is extract the information between them.
        'The length of data (e-mail address) will be:
        'lonEnd - lonPos
        strEmail = Mid$(HTML, lonPos, lonEnd - lonPos)
        'Done!
        MsgBox strEmail
    End If
End If
Notes on the code:
If lonPos > 0 Then
Checks if we found the start. If InStr didn't find it, it will return 0.
lonPos = lonPos + Len(strStart)
This will take us from the beginning of the start string (X<a href="#"><strong>)
to the end of the start string (<a href="#"><strong>X).
At the end of the start string is what we're looking for (the e-mail address).
lonEnd = InStr(lonPos, HTML, strEnd, vbTextCompare)
The search will start from lonPos and will find strEnd (</strong>).
If lonEnd > 0 Then
If InStr found the ending string (</strong>) then...
strEmail = Mid$(HTML, lonPos, lonEnd - lonPos)
We are using Mid to extract something from the middle of the string.
We start at lonPos. This is the first character of the e-mail address.
The third argument is a length, not an end position, so we pass lonEnd - lonPos.
That equals the length of the e-mail address, however long it is.
Done
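Step 1 mentioned that the e-mail sits after the 2nd instance of <strong>. The code above gets around that by using a longer start string, but another option is to call InStr repeatedly, each time starting just past the previous hit. Here is a rough sketch of that idea. FindNthInstance is a made-up helper name, not a built-in VB function.

```vb
' Sketch: return the position of the Nth occurrence of Find in Data,
' or 0 if there are fewer than N occurrences.
' FindNthInstance is a hypothetical helper, not part of VB.
Public Function FindNthInstance(ByVal Data As String, ByVal Find As String, _
        ByVal N As Long) As Long
    Dim lonPos As Long
    Dim lonCount As Long

    lonPos = 0
    For lonCount = 1 To N
        ' Start one character past the previous match.
        lonPos = InStr(lonPos + 1, Data, Find, vbTextCompare)
        ' Ran out of matches before reaching the Nth one.
        If lonPos = 0 Then Exit For
    Next lonCount

    FindNthInstance = lonPos
End Function
```

For example, FindNthInstance("<strong>a</strong><strong>b</strong>", "<strong>", 2) returns 19. Add Len("<strong>") to that and you are at the start of the data, just like lonPos in the walkthrough.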
As you can see, the entire process of that parsing routine was:
Find the start (InStr)
Find the end (InStr)
Extract the data between (Mid)
And you know what? That is the exact same process you will use 90% of the time
when you want to extract data from between two other strings.
Try this: Change the HTML. Change the e-mail address. Change the values of
strStart and strEnd in the code to match those of the HTML. Run the code. It
will work regardless.
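As one possible variation, here is the same find-start, find-end, extract routine pointed at the <title> tag of the example page instead of the e-mail address. The function name DemoExtractTitle and the shortened HTML string are assumptions for the sake of a self-contained example.

```vb
' Sketch: the same three steps, with different start and end strings.
' DemoExtractTitle is a hypothetical name; strHtml is a shortened copy
' of the example page.
Public Function DemoExtractTitle() As String
    Dim strHtml As String
    Dim strStart As String, strEnd As String
    Dim lonPos As Long, lonEnd As Long

    strHtml = "<html><head><title>Parse this</title></head></html>"
    strStart = "<title>"
    strEnd = "</title>"

    ' 1. Find the start string.
    lonPos = InStr(1, strHtml, strStart, vbTextCompare)
    If lonPos > 0 Then
        ' Move past the start string to the first character of the data.
        lonPos = lonPos + Len(strStart)
        ' 2. Find the end string, searching from the data onward.
        lonEnd = InStr(lonPos, strHtml, strEnd, vbTextCompare)
        If lonEnd > 0 Then
            ' 3. Extract what lies between them.
            DemoExtractTitle = Mid$(strHtml, lonPos, lonEnd - lonPos)
        End If
    End If
End Function
```

Calling MsgBox DemoExtractTitle() shows "Parse this". Only the two marker strings changed; the logic is untouched.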
Wrapping it up
Wrapping up the above code into a reusable function:
Private Function GetBetween(ByVal Start As Long, Data As String, _
        StartString As String, EndString As String, _
        Optional ByVal CompareMethod As VbCompareMethod = vbBinaryCompare) As String
    Dim lonStart As Long, lonEnd As Long
    '1. Find start string.
    lonStart = InStr(Start, Data, StartString, CompareMethod)
    If lonStart > 0 Then
        '2. Move to end of start string.
        lonStart = lonStart + Len(StartString)
        '3. Find end string.
        lonEnd = InStr(lonStart, Data, EndString, CompareMethod)
        If lonEnd > 0 Then
            '4. Extract data between start and end strings.
            GetBetween = Mid$(Data, lonStart, lonEnd - lonStart)
        End If
    End If
End Function
And if we were to use this function for this scenario, it would be:
strEmail = GetBetween(1, HTML, "<a href=""#""><strong>", "</strong>", _
        vbTextCompare)
MsgBox strEmail
1 - Where we start searching in the string (beginning).
HTML - The HTML we are working with.
<a href="#... - The start string.
</strong> - The end string.
vbTextCompare - Case-insensitive search (slower, but more reliable for HTML).
vbBinaryCompare - Case-sensitive search (faster, but more strict).
Note: Always remember the steps: find the start, find the end, extract
between.
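If the page contains several matches and you want all of them, the same three steps can be put in a loop. The following is only a sketch under that assumption. GetAllBetween is a made-up name, and returning a Collection is just one possible design.

```vb
' Sketch: collect every piece of text found between StartString and EndString.
' GetAllBetween is a hypothetical helper built on the same three steps.
Public Function GetAllBetween(ByVal Data As String, ByVal StartString As String, _
        ByVal EndString As String) As Collection
    Dim colResults As New Collection
    Dim lonStart As Long, lonEnd As Long

    ' Guard: empty markers would match everywhere.
    If Len(StartString) > 0 And Len(EndString) > 0 Then
        lonStart = InStr(1, Data, StartString, vbTextCompare)
        Do While lonStart > 0
            ' Move to the end of the start string.
            lonStart = lonStart + Len(StartString)
            ' Find the end string from there.
            lonEnd = InStr(lonStart, Data, EndString, vbTextCompare)
            If lonEnd = 0 Then Exit Do
            ' Extract between and store it.
            colResults.Add Mid$(Data, lonStart, lonEnd - lonStart)
            ' Look for the next start string after the end string just used.
            lonStart = InStr(lonEnd + Len(EndString), Data, StartString, vbTextCompare)
        Loop
    End If

    Set GetAllBetween = colResults
End Function
```

Called on the example page with "<strong>" and "</strong>", this should return a collection of two items: "Welcome!" and "User@yahoo.com".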