In part 1 of this series, Check your site for broken links in SharePoint Online, I looked at going through all the sites within a site collection.
In this post I'm continuing with the implementation of the Get-WebForBrokenLinks function.
[code lang=text]
Get-WebForBrokenLinks -Web $subweb
[/code]
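In case you don't have part 1 to hand: the call above assumes a connected PnP PowerShell session and a collection of subwebs, along these lines (just a sketch; the site URL is a placeholder and cmdlet names differ slightly between PnP PowerShell versions):
[code lang=text]
# Sketch: connect and collect all subwebs, roughly as in part 1 of this series
Connect-PnPOnline -Url "https://yourtenant.sharepoint.com/sites/yoursite" -Credentials (Get-Credential)
$subwebs = Get-PnPSubWebs -Recurse
foreach ($subweb in $subwebs) {
    Get-WebForBrokenLinks -Web $subweb
}
[/code]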
Before we can have a look at finding all broken links within a site, we need to identify where broken links may be stored. A quick look at SharePoint gives me the following locations:
- List items
- Pages
- Documents in libraries
- Web Parts
For now I'm going to look at the easiest option: list items.
I'm going to start with the function, making the lists available using Load and ExecuteQuery:
[code lang=text]
Function Get-WebForBrokenLinks {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory=$True, ValueFromPipeline=$True, ValueFromPipelineByPropertyName=$True, HelpMessage='Web to be scanned for broken links')]
        [Microsoft.SharePoint.Client.Web] $Web
    )
    begin {
        Write-Host "Scanning: " $Web.Url
    }
    process {
        # Load the lists collection for this web
        $Web.Context.Load($Web.Lists)
        $Web.Context.ExecuteQuery()
        ... # This is where the rest of the code needs to appear
    }
    end {
        Write-Host "Completed scanning: " $Web.Url
    }
}
[/code]
Now I need to go through the lists and the list items:
[code lang=text]
ForEach ($list in $Web.Lists) {
    # Get all the items in the list (using PnP PowerShell)
    $items = Get-PnPListItem -List $list
    foreach ($item in $items) {
        ....
    }
}
[/code]
So now I'm getting the items for all of my lists. It now becomes important to understand what types of fields SharePoint has, as we step through all the fields in all the items of all the lists.
[code lang=text]
foreach ($fieldValue in $item.FieldValues) {
    foreach ($value in $fieldValue.Values) {
        if ($null -ne $value) {
            switch ($value.GetType().Name) {
                ....
            }
        }
    }
}
[/code]
Now all we need to do is handle all the data types that may contain URLs. So what are the data types, and which ones could possibly contain a URL?
To find out, I added a default option to my switch:
[code lang=text]
default {
    $type = $value.GetType()
    Write-Error "Not supported type: $type"
}
[/code]
Then I kept rerunning my script until I had collected all the data types. I found the following data types in my lists:
- Guid
- Int32
- ContentTypeId
- DateTime
- FieldUserValue
- FieldLookupValue
- Boolean
- Double
- String[]
- FieldUrlValue
- String
Most of these couldn't possibly contain a URL, e.g. a Guid. Building up my switch, I get the following script:
[code lang=text]
switch ($value.GetType().Name) {
    "Guid"             { } # Ignore
    "Int32"            { } # Ignore
    "ContentTypeId"    { } # Ignore
    "DateTime"         { } # Ignore
    "FieldUserValue"   { } # Ignore
    "FieldLookupValue" { } # Ignore
    "Boolean"          { } # Ignore
    "Double"           { } # Ignore
    "String[]" {
        ...
    }
    "FieldUrlValue" {
        ...
    }
    "String" {
        ...
    }
    default {
        $type = $value.GetType()
        Write-Error "Not supported type: $type"
    }
}
[/code]
OK, so far I only need to write code for three field types. I'm going to start with FieldUrlValue. The reason this type is easier than String is that a String field may contain other text as well:
[code lang=text]
if ($value.Url.Contains("https://") -or $value.Url.Contains("http://")) {
    try {
        # A HEAD request is enough; anything other than a 200 means the link is broken
        if ((Invoke-WebRequest $value.Url -DisableKeepAlive -UseBasicParsing -Method Head).StatusCode -ne 200) {
            Write-Host "Broken link:" $value.Url
        }
    }
    catch {
        Write-Host "Broken link:" $value.Url
    }
}
[/code]
So we are now ready to answer the next critical question: how do I recognize a URL in text? I've seen solutions using regular expressions, and although that might be a good way (if you can get it to work!), I'm hoping I have found an easier way.
It all started by assuming that a URL doesn't contain a space. So if I have text containing a URL, a split by space gives me an array:
[code lang=text]
$string = "text https://sharepains.com/anylocation/anypage.html some more text"
$string.Split(" ")
[/code]
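To illustrate, the words that look like URLs can then be picked out of that array (just a quick sketch, not part of the final script):
[code lang=text]
# Keep only the words that contain a protocol prefix
$string = "text https://sharepains.com/anylocation/anypage.html some more text"
$string.Split(" ") | Where-Object { $_.Contains("http://") -or $_.Contains("https://") }
# -> https://sharepains.com/anylocation/anypage.html
[/code]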
OK, this will almost work, but not if there isn't a space before or after the URL. So other than spaces, what else could separate URLs from text?
I'm first having a look at the HTML:
[code lang=text]
<a href="http://testurl">Link</a>
[/code]
As all I'm interested in is getting a variable with a clean URL in it, I can simply split by the quote character (") as well.
For String fields this results in the following piece of code:
[code lang=text]
if ($value.Contains("https://") -or $value.Contains("http://")) {
    try {
        # First split the text into words
        $words = $value.Split(" ")
        foreach ($word in $words) {
            # Then split each word on the quote character to strip HTML attribute markup
            $quotesplitwords = $word.Split("`"")
            foreach ($quotesplitword in $quotesplitwords) {
                if ($quotesplitword.Contains("https://") -or $quotesplitword.Contains("http://")) {
                    if ((Invoke-WebRequest $quotesplitword -DisableKeepAlive -UseBasicParsing -Method Head).StatusCode -ne 200) {
                        Write-Host "Broken link:" $quotesplitword
                    }
                }
            }
        }
    }
    catch {
        Write-Host "Broken link:" $quotesplitword
    }
}
[/code]
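Since the same HEAD-request check is now repeated for the FieldUrlValue and String cases, it could be factored into a small helper. This is an optional sketch; the name Test-LinkBroken is my own and not part of the script above:
[code lang=text]
Function Test-LinkBroken {
    param(
        [Parameter(Mandatory=$True)]
        [string] $Url
    )
    try {
        # A HEAD request is enough; anything other than a 200 counts as broken
        $response = Invoke-WebRequest $Url -DisableKeepAlive -UseBasicParsing -Method Head
        return ($response.StatusCode -ne 200)
    }
    catch {
        # DNS failures, 404s and other errors end up here
        return $True
    }
}

# Usage:
if (Test-LinkBroken -Url $quotesplitword) {
    Write-Host "Broken link:" $quotesplitword
}
[/code]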
This code now gives only one kind of false positive: if a URL appears in text without being an actual clickable hyperlink, the script will still check it and may flag it up. In effect, any text fragment containing http gets checked, and is reported as a broken link if the request fails. For now I'm going to live with that, although I'm not sure whether this will be OK for the remaining locations that may contain broken links.
So this now covers finding broken URLs within list items. There is still quite a bit of work to do:
- Pages
- Documents in libraries
- Web Parts
But these elements will be covered in the next part of this series. Now that we have code that finds URLs within text, we are halfway there.
Can’t wait for part III!
Sorry, I had a client that needed this a few years ago, but we never got to fully implement the whole solution. If I get a client who does want the rest as well, then I might complete this series.
Using Invoke-WebRequest, I am able to find broken SP sites, but for site pages it is still returning a 200 status for broken pages. Any suggestions?
You could try to read the page and look at the content. A non-existing page will not have any content.
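For what it's worth, a rough sketch of that idea (the $pageUrl variable and the 500-character threshold are placeholders you would need to adjust for your tenant):
[code lang=text]
# Sketch: a page that returns 200 but has (almost) no content is probably broken
try {
    $response = Invoke-WebRequest $pageUrl -DisableKeepAlive -UseBasicParsing
    if ($response.StatusCode -eq 200 -and $response.Content.Length -lt 500) {
        Write-Host "Possibly broken page:" $pageUrl
    }
}
catch {
    Write-Host "Broken page:" $pageUrl
}
[/code]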
I guess there was no part 3? This definitely got me started, and I got it all sorted out for the most part now, but I found that the lists were harder to filter through than regular pages. Especially fields that were multi-line or allowed the special features. No matter what you do, the text comes out with multiple lines and won't filter properly. I ended up making a scratch file to dump the text to and then doing a Get-Content … -Raw.
Indeed, I never got to finish this series as the client didn't want this to be done.