Skip to main content

Lighthouse automatic run audit and report result for a web site

Lighthouse is a SEO, performance audit tool by Google Chrome. It provide many report about performance, SEO with suggestion to improve.
This guide I will try to write some bash script and/or NodeJS module to automated the process.
This guide include many tools:
ScreemingFrog (SEO tool, this require fee or you can cr*k :), search free key )
Lighthouse  and here https://developers.google.com/web/tools/lighthouse/
Lighthouse viewer https://googlechrome.github.io/lighthouse/viewer2x/
Git-Bash (some unix tool on fucking Windows)
Python3 (if you use Windows then find yourself how to install it).
NodeJS: Lighthouse and many tool is based on (or install required NodeJS)

Script to get only performance Optimized images report from a URL:

#!/bin/bash
FILES=./LH_json/*
for f in $FILES
do
  report_length=$(cat $f |  python3 -c "import sys, json; print(json.load(sys.stdin)['audits']['uses-optimized-images'])" | wc -m);
  report="$(cat $f | python3 -c  | fold -w 12 | head -n 1)"
  
  # Normal page without any image need optimize have report length 720 chars.
  # So we only care ab site page has report length longer than that. These report contain image optimize suggest.

  if [ "$report_length" -gt 720 ]; then
    #echo "$report"
    echo $f
    cat $f |  python3 -c "import sys, json; print(json.load(sys.stdin)['audits']['uses-optimized-images'])" 
  else 
    #echo "---"
    printf "\n"
  fi
  
  #  len=$(expr length 'monkey brains');
  # TODO grep exclude eg farm4. as external resources
  # loop through report image to get percent (if needed). or just get image url and page-url
done

use simple print(json.load...) will result in error json format.
JSON seem use double quote for dict data. single quote like this will cause error:


{'rawValue': 90, 'description': 'Optimize images', 'extendedInfo': {'value': {'wastedKb': 12, 'results': [{'fromProtocol': True, 'preview': {'url': 'http://yoursite/wp-content/themes/genesis/images/panel-image.jpg', 'mimeType': 'image/jpeg', 'type': 'thumbnail'}, 'totalMs': '380\xa0ms', 'url': 'http://yoursite/wp-content/themes/genesis/images/panel-image.jpg', 'totalBytes': 49508, 'wastedBytes': 12223, 'totalKb': '48\xa0KB', 'wastedKb': '12\xa0KB', 'isCrossOrigin': False, 'wastedMs': '90\xa0ms', 'potentialSavings': '12\xa0KB (25%)'}], 'wastedMs': 90}}, 'informative': True, 'score': 90, 'helpText': 'Optimized images load faster and consume less cellular data. [Learn more](https://developers.google.com/web/tools/lighthouse/audits/optimize-images).', 'details': {'header': 'View Details', 'type': 'table', 'itemHeaders': [{'text': '', 'type': 'text', 'itemType': 'thumbnail'}, {'text': 'URL', 'type': 'text', 'itemType': 'url'}, {'text': 'Original', 'type': 'text', 'itemType': 'text'}, {'text': 'Potential Savings', 'type': 'text', 'itemType': 'text'}], 'items': [[{'url': 'http://yoursite/wp-content/themes/genesis/images/panel-image.jpg', 'mimeType': 'image/jpeg', 'type': 'thumbnail'}, {'text': 'http://yoursite/wp-content/themes/genesis/images/panel-image.jpg', 'type': 'url'}, {'text': '48\xa0KB', 'type': 'text'}, {'text': '12\xa0KB (25%)', 'type': 'text'}]]}, 'displayValue': 'Potential savings of 12\xa0KB (~90\xa0ms)', 'scoringMode': 'binary', 'name': 'uses-optimized-images'}

So we have to put a command json.dumps() before this output:
cat $line  |  python3 -c "import sys, json; print(json.dumps( json.load(sys.stdin)['audits']['uses-optimized-images']) )" >> lh_opt_img.json

To get Lighthouse (LH) report from a URL:
lighthouse --quiet --chrome-flags="--headless"  --perf --output=json http://yoursite.test.co/courses/test-coures-1      --output-path course1.json

To get html instead of json output you can simple change output=json to output=html.
To view JSON autdit result go to Lighthouse viewer then browse json file to view.
If you have html output simple open it with chrome browser.
You can use trick to open many audit result:
In bash or git-bash type something like this:
open or chrome-browser path_to_html_result

Combine with python to get only wanted report
 lighthouse --quiet --chrome-flags="--headless"  --perf --output=json http://tafecourses.uat.pgtest.co/courses/massage-therapy/rockhampton/ |  python3 -c "import sys, json; print(json.load(sys.stdin)['audits']['uses-optimized-images'])" 

This example show the limit of one-line python should be. And we should move to complete python script solution or at least call another .py file from bash ?
https://unix.stackexchange.com/questions/116228/parse-json-using-python

Python script to grab JSON report, generate CSV expected output report.
import json
from collections import namedtuple

with open('lh_opt_img.json') as f:
for line in f:
# Parse JSON into an object with attributes corresponding to dict keys.
rp = json.loads(line, object_hook=lambda d: namedtuple('X', d.keys())(*d.values()))
# print line, TODO output CSV format
print '"' + str(rp.rawValue) +'(ms)","' + str(rp.extendedInfo.value.wastedKb) + '","' + rp.extendedInfo.value.results[0].totalKb.encode('utf-8') +'","'+ rp.extendedInfo.value.results[0].potentialSavings.encode('utf-8') + '","' + rp.extendedInfo.value.results[0].url.encode('utf-8') + '"'

...
Optimize image
Write Node module instead of bash.
Other report beside performance.
Using tool ScreamingFrog
Wordpress case study.
What can we learn from complex stylesheet like this ? There are many style CSS that can be learned here.
Not used image (size crop)
How to speed up LH scan
Coverage all image from TF => list URLs cover all img to scan
...

Comments

Post a Comment

Popular posts from this blog

Rand mm 10

https://stackoverflow.com/questions/2447791/define-vs-const Oh const vs define, many time I got unexpected interview question. As this one, I do not know much or try to study this. My work flow, and I believe of many programmer is that search topic only when we have task or job to tackle. We ignore many 'basic', 'fundamental' documents, RTFM is boring. So I think it is a trade off between the two way of study language. And I think there are a bridge or balanced way to extract both advantage of two method. There are some huge issue with programmer like me that prevent we master some technique that take only little time if doing properly. For example, some Red Hat certificate program, lesson, course that I have learned during Collage gave our exceptional useful when it cover almost all topic while working with Linux. I remember it called something like RHEL (RedHat Enterprise Linux) Certificate... I think there are many tons of documents, guide n books about Linux bu

Martin Fowler - Software Architecture - Making Architecture matter

  https://martinfowler.com/architecture/ One can appreciate the point of this presentation when one's sense of code smell is trained, functional and utilized. Those controlling the budget as well as developer leads should understand the design stamina hypothesis, so that the appropriate focus and priority is given to internal quality - otherwise pay a high price soon. Andrew Farrell 8 months ago I love that he was able to give an important lesson on the “How?” of software architecture at the very end: delegate decisions to those with the time to focus on them. Very nice and straight-forward talk about the value of software architecture For me, architecture is the distribution of complexity in a system. And also, how subsystems communicate with each other. A battle between craftmanship and the economics and economics always win... https://hackernoon.com/applying-clean-architecture-on-web-application-with-modular-pattern-7b11f1b89011 1. Independent of Frameworks 2. Testable 3. Indepe