Laracon 2024: Jess Archer: Analyzing Analytical Databases

Jess Archer taught attendees about analytical databases and how they compare to other more traditional databases.

I think this is the talk I learned the most from during the entire conference.

Definitions

  • OLTP (Online Transaction Processing): MySQL, PostgreSQL, SQLite, etc.
  • OLAP (Online Analytical Processing): SingleStore, ClickHouse, etc.

Her preference is ClickHouse; it’s free and open-source, and has excellent documentation and performance.

Comparisons

OLTP databases tend to be row-oriented: all of a row’s data is stored together on disk, organized around the row’s index.

OLAP databases tend to be column-oriented, storing each column of data together on disk. That makes aggregate queries like AVG() or SUM() much more performant, since the engine only has to read a single column’s file instead of reading entire rows the way an OLTP database would.

She had downloaded a dump (22GB compressed) of all Stack Overflow posts and imported it into both a MySQL database and a ClickHouse database to run queries live on stage.

Loading an average view count could take 5–6 seconds using MySQL, versus 27.5ms using ClickHouse.
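
The exact queries from the demo aren’t reproduced here, but the shape of that kind of query is simple; assuming a hypothetical posts table with a ViewCount column, it would be something like:

-- A column store only has to read the ViewCount column to answer this;
-- a row store has to read every full row.
SELECT AVG(ViewCount)
FROM posts;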

What’s the catch?

At least for ClickHouse, the ID field is not unique, meaning you can have multiple rows with the same ID, and selecting a row by ID requires a full table scan (adding LIMIT 1 can help by letting the engine “bail out” once a match has been found).
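
For example (hypothetical table and ID), without the LIMIT the engine keeps scanning even after it finds a match, because nothing guarantees that ID appears only once:

SELECT *
FROM posts
WHERE Id = 12345
LIMIT 1; -- lets the engine stop scanning at the first match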

Ordering: the table’s sort order should be designed around what the typical queries need, to prevent extra reads from disk.
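
A minimal sketch of what that looks like, with hypothetical columns: in ClickHouse, the ORDER BY on the table itself defines how the data is sorted on disk, so it should match the filters your typical queries use.

CREATE TABLE posts
(
    Id UInt32,
    CreationDate DateTime,
    ViewCount UInt32
)
ENGINE = MergeTree
-- Sort on disk by date first because most queries filter on a date range.
ORDER BY (CreationDate, Id);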

Inserts: bulk inserts are optimal, rather than single-row inserts (see the sketch after this list)

  • Each individual insert creates a “part” or folder on disk
  • The database engine will eventually merge and compact them (see the MergeTree engine)
  • The async_insert feature can also help
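
A rough sketch of both approaches, using the same hypothetical table as above:

-- One multi-row INSERT creates a single part on disk instead of one part per row.
INSERT INTO posts (Id, CreationDate, ViewCount) VALUES
    (1, '2024-08-01 10:00:00', 120),
    (2, '2024-08-01 10:00:05', 87),
    (3, '2024-08-01 10:00:09', 301);

-- Or let the server buffer small inserts and flush them in batches.
SET async_insert = 1;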

Updates: ideally, data is immutable so the engine doesn’t have to rewrite an entire file on disk

Deletes: can be optimized and automated; there’s a marker that indicates a row has been deleted, and at some point the engine will compact the files and remove those

Other Notes about ClickHouse

The LowCardinality column type: similar to an enum, but better; it builds a dictionary of the distinct values without requiring them to be declared up front.
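
A quick sketch with a hypothetical table; unlike an Enum, the possible values don’t have to be declared up front:

CREATE TABLE events
(
    Id UInt64,
    -- Stored as a dictionary of distinct strings plus small integer references
    Type LowCardinality(String)
)
ENGINE = MergeTree
ORDER BY Id;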

The ReplacingMergeTree engine: inserting and then updating an entry results in two entries on disk until the engine compacts the files; this engine provides a FINAL keyword that resolves the duplicates automatically during queries.
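
Roughly how that looks with a hypothetical table: the version column tells the engine which row should win when it merges, and FINAL deduplicates at query time until that happens.

CREATE TABLE posts
(
    Id UInt32,
    ViewCount UInt32,
    UpdatedAt DateTime
)
ENGINE = ReplacingMergeTree(UpdatedAt)
ORDER BY Id;

-- FINAL collapses duplicate rows even if the background merge hasn't run yet.
SELECT *
FROM posts FINAL
WHERE Id = 12345;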

ClickHouse can also easily fill gaps in time-series data, which would be more complicated with other database engines.
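
In ClickHouse this is a single ORDER BY modifier; a small sketch assuming a hypothetical CreatedAt column:

SELECT toDate(CreatedAt) AS day, count() AS total
FROM posts
GROUP BY day
-- Generate rows for the missing days (filled with default values) so charts have no gaps.
ORDER BY day WITH FILL;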

Packages

She also mentioned a few packages for using ClickHouse in a Laravel application.

A week later, I’m still thinking about this talk and how we could use ClickHouse to provide better features and performance for some of our clients.

Laracon 2024: Joe Dixon: Learn to Fly with Laravel Reverb

Joe Dixon explained how Laravel Reverb works using websockets to broadcast data to clients. It is very performant; he said that Laravel has just a single server handling thousands of connections for Forge and other products, including the upcoming Laravel Cloud.

Then he provided an impressive demo: he showed a Nintendo Switch-style controller he had designed using TailwindCSS, and proceeded to fly a drone, using Laravel Reverb to control it.

As if that weren’t enough, he showed how he could receive live telemetry data back from the drone (speed, altitude, temperature, and battery level) and display it on-screen. And then he turned on the camera, showing a live view of the audience!

I’ve been itching to try Reverb, and I have a couple of immediate uses for it…I just haven’t had the time yet!

Laracon 2024: Seb Armand: Scaling Laravel at Square

Seb Armand told some battle stories of how they have approached scaling Laravel at Square, one of the largest payment processors.

  • Reducing database load: eager-loading queries, using ElastiCache, and developing a Tag Tree Cache to cache multiple levels and recursively flush the relevant caches
  • Reducing bandwidth: using CDNs to move assets closer to end users
  • Reducing processing: using queues and deduplication
  • Further reducing processing: using batches and pipelines, and buffering/bundling tasks together

Laracon 2024: Mateus Guimaraes: Behind Laravel Octane

Mateus Guimaraes gave a deep dive into how Laravel Octane can massively improve the performance of your apps.

Main benefits:

  • Reduced latency by eliminating the framework boot step on every request
  • Increased performance
  • Lower cost by reducing CPU usage

Aaron Francis asked a follow-up question about which driver is best:

None of the apps I’m currently working on need this level of performance (yet), but I’d be interested to try Octane to see how it could improve performance even now.

One more note: Octane can run multiple processes concurrently to save time during a request:

<?php

use Illuminate\Support\Facades\DB;
use Illuminate\Support\Facades\Route;
use Laravel\Octane\Facades\Octane;

Route::get('foo', function () {
    // Run the three one-second queries concurrently instead of sequentially.
    Octane::concurrently([
        fn () => DB::select('SELECT SLEEP(1)'),
        fn () => DB::select('SELECT SLEEP(1)'),
        fn () => DB::select('SELECT SLEEP(1)'),
    ]);

    return ['foo' => 'bar'];
});

Laracon 2024: Philo Hermans: Livewire Beyond the Basics

Philo Hermans went deep into Livewire optimizations in his talk. A couple of key takeaways:

If your Livewire component calls an action that doesn’t need to re-render anything, you can skip re-rendering by using the #[Renderless] attribute. I think I can find some immediate use cases for this.

For high-traffic apps with read-replica databases, the sticky connection option can help guarantee consistency immediately after writes.

Optimistic UIs can improve perceived performance: use wire:loading.remove to remove an element immediately, before the server round-trip has completed, so the app feels snappier. I think I could also improve some UIs using this trick.

MySQL Table Size

Ever wondered which database or tables are taking up disk space on a MySQL/MariaDB server?

This query will provide the size of each table:

SELECT TABLE_SCHEMA AS `Database`,
TABLE_NAME AS `Table`,
ROUND(((DATA_LENGTH + INDEX_LENGTH) / 1024 / 1024), 2) AS `Size (MB)`,
ROUND((DATA_FREE / 1024 / 1024), 2) AS `Reclaimable Size (MB)`
FROM information_schema.TABLES
-- WHERE `TABLE_SCHEMA` = 'database_name'
ORDER BY (DATA_LENGTH + INDEX_LENGTH) DESC;

Database Platform Comparisons for Laravel Feature Tests

TL;DR: MySQL significantly outperforms MariaDB in my automated test suite.

The Problem

This Twitter thread prompted me to do a bit of research on database platforms for Laravel automated tests.

I’ve recently been building an ecommerce app based on Laravel. Partway through development, we added geometry fields to a couple of tables in order to determine distances. I’ve been using this spatial package, so SQLite was not an option for my test suite.

As soon as I switched the testing database driver from SQLite to MariaDB, my tests immediately took an extra 12–13 seconds to run, regardless of whether I ran the entire test suite, a single file, or just one test.

This significantly lengthened the feedback loop when making changes to code and re-running tests.

So when I saw Jack Ellis mention that he uses MySQL for his test suite, it made me curious if he had the same issue.

He said that one of his test files runs 39 tests in < 2 seconds, so apparently it’s not been a problem for him.

Context

  • I’m using the LazilyRefreshDatabase trait added in Laravel 8.62.0 on my entire test suite
  • I’m using squashed migrations
  • Many of my tables have constrained foreign keys referencing other tables

Comparisons

I decided to do some digging; here are comparisons using four different platforms for the same test in my application.

MariaDB

I’ve been using MariaDB as the main database platform on my development machine for years. Currently I’m on version 10.6.4.

In-Memory SQLite Database

I temporarily disabled the geometry features and tried an in-memory SQLite database (DB_CONNECTION=sqlite with DB_DATABASE=:memory:); it performed much better for the same tests:

SQLite File Database

I then tried with an SQLite file (DB_CONNECTION=sqlite), and it performed about the same:

MySQL 8

I have an installation of MySQL 8 set up for one app that uses some MySQL 8-specific features, and I figured why not give that a try too.

Here are the results:

Summary

For some reason, MariaDB takes approximately 12–13 seconds to tear down and recreate the database before starting to run tests, but MySQL is much faster.

While testing MariaDB, I opened the raw data directory for the database, and noticed chunks of files being removed and recreated at a time, so perhaps the foreign key constraints are (part of) the culprit here.

I do have 77 databases with ~3800 tables in my MariaDB installation, built up from various projects over the years. It seems unlikely, but theoretically possible, that the sheer size of the installation could be part of the problem too.

I think I’ll experiment with switching back to MySQL as my development platform of choice.

Have you run into this same issue? Have any tips or tricks? Let me know in the comments.

Counting Distinct Values in a Single Field

A quick MySQL snippet to count how many times a value appears in a single field—much easier to grok than multiple JOINs.

SELECT
COUNT(CASE WHEN meta_value = 'value1' THEN 1 END) AS value1,
COUNT(CASE WHEN meta_value = 'value2' THEN 1 END) AS value2
FROM wp_postmeta;

/** Results

| value1 | value2 |
|--------|--------|
| 75     | 56     |
*/

WordPress and WooCommerce at Scale with 500,000 Users

WordPress generally works fairly well on small-to-medium sites; on larger sites, it can run into performance issues because of the size of the database.

A current project I’m working on for LuminFire has a WooCommerce store with potentially 270,000+ customers, and that’s causing some issues with site performance. For sake of development, our dev store has 500,000+ users.


Generating Dummy Users

First, I generated a CSV file with 50,000 fake users using mockaroo.com and imported them using this plugin (I had to bump up my PHP memory limit to 4096MB and execution time to 240 seconds, and it still timed out a couple of times, but I deleted the users who had already been imported and then ran the import again).

Update: I have since learned about WC Smooth Generator, and it may have worked just as well or better.

I figured 50K unique users were plenty and that duplicating them via MySQL queries would be more efficient, so I wrote these queries to do the job.

The wp_users queries ran in seconds each, copying a batch of 50K rows at a time. The wp_usermeta queries, not so much…they took about 11 minutes each, since there were about 1.71 million rows to clone each time.
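
The gist with those queries isn’t reproduced here, but a hypothetical sketch of the approach for one wp_users batch (offsetting the IDs so they don’t collide, and tweaking logins/emails so the copies stay distinct) looks something like this:

-- Clone one batch of 50K users; repeat with a larger offset for each additional batch.
INSERT INTO wp_users
    (ID, user_login, user_pass, user_nicename, user_email, user_registered, display_name)
SELECT ID + 50000,
       CONCAT(user_login, '-1'),
       user_pass,
       user_nicename,
       CONCAT('1+', user_email),
       user_registered,
       display_name
FROM wp_users
WHERE ID BETWEEN 510024 AND 560023;
-- The matching wp_usermeta rows then need the same +50000 offset on user_id.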

In case it’s helpful to you, here are files with the dummy users:

  • CSV with 50K users
  • MySQL dump of wp_users table (30.4MB)
    • It does include the ID field since it needs to match the usermeta table.
    • The first user ID is 510024; if you try to import and have user IDs above 510024, you’ll have errors.
    • 500,000 rows
  • MySQL dump of wp_usermeta table (148.5MB)
    • It does not include the meta_id field (so no errors trying to overwrite existing IDs).
    • 17,000,000+ rows

WordPress User Dashboard

At the top of the user dashboard, WordPress typically displays a list of the user roles on your site, as well as the number of users in each role.

[Screenshot: user role counts at the top of the Users dashboard]

The number of users in each role is generated by the count_users function, which uses a resource-intensive SQL query. On our dev site, it takes 15+ seconds just to run the query.
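
The query count_users builds is roughly this shape (simplified here): one LIKE per role against the serialized capabilities stored in wp_usermeta, which can’t make effective use of an index on a table this large.

SELECT COUNT(NULLIF(meta_value LIKE '%"administrator"%', 0)) AS administrator,
       COUNT(NULLIF(meta_value LIKE '%"customer"%', 0)) AS customer,
       COUNT(*) AS total_users
FROM wp_usermeta
WHERE meta_key = 'wp_capabilities';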

This is a known issue and should be resolved in WordPress 5.0 (currently scheduled for release in late 2018), but we need this working much sooner.

WordPress Multisite drops the number of users per role and instead shows just a list of roles; that’s how WordPress.com and other large multisite networks avoid this performance hit.

The proposed patch on the ticket modifies the count_users function to behave similarly to WP Multisite: if there are more than 10,000 users (modifiable with the new wp_is_large_user_count filter), it doesn’t show the number of users per role.

Since the patch is for the development version of the WP code, it doesn’t apply cleanly to a production site, so I manually patched WP core. Here are several patch files for different versions of WordPress; make sure to use the appropriate patch for your version:

Core Patch File

You can apply this patch file by downloading it, opening your WordPress installation in a terminal, and running git apply <path-to-downloaded-patch-file>.

WooCommerce User Queries

The biggest performance hit I found was searching for customers when editing an order; it took 10–15 seconds for search results to be returned.

[Screenshot: searching for a customer while editing a WooCommerce order]

  • If searching by customer ID, the backend would respond pretty quickly.
  • Otherwise, it runs a full text search for the search term in the first and last name, which takes a while; see the code for full details.
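
The name search boils down to something roughly like this (simplified; the real query is built through WP_User_Query), and a leading-wildcard LIKE on meta_value can’t use an index, so it scans millions of wp_usermeta rows:

SELECT DISTINCT user_id
FROM wp_usermeta
WHERE meta_key IN ('first_name', 'last_name')
  AND meta_value LIKE '%smith%';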

Since orders aren’t manually assigned/reassigned too often, we decided this behavior was acceptable for now.

WooCommerce Customer Reports

The WooCommerce customer reports were the other major performance hit. For now, we simply disabled the customers reports since this particular customer already has a third-party system where they manage the customer data, so they’re not likely to use the WooCommerce reports.

Customers vs. Guests

Timing details:

  • Around 20–40 sec to load the page
  • Around 7 sec to get administrator users
  • Around 7 sec to get shop_manager users
  • Around 6 sec to get all other users

Here are the actual MySQL queries. In my staging environment, PHP ran out of memory; in my local environment, it took 305MB of memory(!) to load the report for 1 week (the default view).

As noted above, we gave up on optimizing this report since the customer doesn’t really need it.

Customers List

This was completely unusable; it took 300 seconds to load the page.

Here are the actual MySQL queries. Two JOINs in each query, with a full-text search on two columns, multiplied across three queries, was just too much on such a large table.

As noted above, we gave up on optimizing this report as well, since the customer doesn’t really need it.

WordPress Posts Dashboard

Update: I found another place where there’s a 15+ second wait. When bulk-editing posts (or any other post type that supports authors), the author dropdown causes a full search of the wp_usermeta table. I’ve updated the patch file in the gist above to include a fix for this.

[Screenshot: the author dropdown in the Posts bulk-edit panel]

Summary

In summary, having 500,000 users on a WordPress and WooCommerce site doesn’t hurt overall performance too much, as long as you apply an upcoming change to WP core and can get by without the WooCommerce customer reports.

We may explore further options to improve performance particularly when getting user roles from wp_usermeta, and I will update this post or add a new post if we find other enhancements.

Merge + Minify + Refresh Force Clear Caches

This plugin clears other page caches/proxies when the Merge + Minify + Refresh cache is regenerated, so users don’t end up with a cached page trying to load old static resources (CSS/JS files) that no longer exist.

Get the code at the GitHub repository. Continue reading “Merge + Minify + Refresh Force Clear Caches”