
Composite Endpoints Show Weaker Treatment Effects Than EDSS Alone in Multiple Sclerosis Trials

Adding walk tests to MS disability scores can mask treatment benefits

AI-generated summary of the cited source, checked by automated accuracy review.

Key Takeaway
Composite endpoints do not systematically improve treatment effect detection in multiple sclerosis trials.

This post-hoc analysis of individual patient data from ten phase III randomised controlled trials included 9,369 participants with relapsing-remitting or progressive multiple sclerosis. It evaluated composite endpoints constructed from the Expanded Disability Status Scale (EDSS), the timed 25-foot walk test, and the nine-hole peg test, with disability worsening confirmed at 24 weeks.
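In this analysis, disability worsening counted only if it was confirmed at 24 weeks, meaning the change had to persist rather than appear at a single visit. A simplified sketch of that confirmation idea for the EDSS follows. The thresholds (an increase of at least 1.5 points from a baseline of 0, 1.0 point from a baseline of 0.5 to 5.5, and 0.5 point from a higher baseline) are a convention often used in MS trials and are assumed here; the study itself applied harmonised criteria via the msprog package, whose exact settings the abstract does not list. The function names are hypothetical.

```python
from typing import List, Optional, Tuple

Visit = Tuple[float, float]  # (week since baseline, EDSS score)


def edss_threshold(baseline: float) -> float:
    """Minimum EDSS increase counted as worsening (assumed common convention)."""
    if baseline == 0:
        return 1.5
    if baseline <= 5.5:
        return 1.0
    return 0.5


def confirmed_worsening_week(visits: List[Visit], confirm_weeks: float = 24) -> Optional[float]:
    """Return the week of the first worsening that persists for `confirm_weeks`.

    The first visit is treated as baseline. A candidate worsening is confirmed
    only if every later visit stays at or above the worsening level until a
    visit at least `confirm_weeks` after the candidate is reached.
    """
    baseline = visits[0][1]
    target = baseline + edss_threshold(baseline)
    for i, (week, score) in enumerate(visits[1:], start=1):
        if score < target:
            continue  # no worsening at this visit
        for later_week, later_score in visits[i + 1:]:
            if later_score < target:
                break  # score improved again: this candidate is not confirmed
            if later_week - week >= confirm_weeks:
                return week  # still worse at least confirm_weeks later
    return None  # no confirmed worsening in the observed visits


# Hypothetical visit schedules (not study data).
print(confirmed_worsening_week([(0, 3.0), (12, 4.0), (24, 4.0), (36, 4.5)]))
print(confirmed_worsening_week([(0, 3.0), (12, 4.0), (24, 3.5), (48, 4.0)]))
```

In the first schedule the week-12 worsening is still present 24 weeks later, so the function returns 12. In the second, the score dips back at week 24, and the renewed worsening at week 48 has no later visit to confirm it, so the function returns None.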

Compared with the EDSS alone, OR-type composite endpoints (which count worsening on any component) showed weaker treatment effects. The largest reductions in sensitivity came from endpoints that included the timed 25-foot walk test (ΔZ up to +2.26; interaction p = 0.004). In progressive multiple sclerosis, combining the EDSS with the nine-hole peg test showed a numerically stronger treatment effect (ΔZ = -1.65), but the interaction test did not reach statistical significance (p = 0.051).
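Per the study's abstract, the Z-score behind these ΔZ values is the pooled log-hazard ratio divided by its standard error. A minimal sketch of that arithmetic follows. The hazard ratios and standard errors are hypothetical, not values from the study, and the abstract does not state the sign convention for ΔZ. The sketch assumes ΔZ = Z(composite) - Z(EDSS), which matches the reported signs (positive for weaker composites, negative for the numerically stronger EDSS plus nine-hole peg test combination) when a protective treatment gives a negative log-hazard ratio.

```python
import math


def z_score(hazard_ratio: float, se_log_hr: float) -> float:
    """Z-score: log-hazard ratio divided by its standard error."""
    return math.log(hazard_ratio) / se_log_hr


def delta_z(z_composite: float, z_reference: float) -> float:
    """Change in Z versus the reference endpoint (assumed sign convention).

    A protective treatment has a negative log-hazard ratio, so a more negative
    Z means a stronger signal. A positive delta therefore means the composite
    detects the treatment effect less strongly than the reference.
    """
    return z_composite - z_reference


# Hypothetical inputs for illustration only (not study data).
z_edss = z_score(hazard_ratio=0.75, se_log_hr=0.08)  # reference: EDSS alone
z_or = z_score(hazard_ratio=0.88, se_log_hr=0.06)    # an OR-type composite

print(f"Z(EDSS) = {z_edss:.2f}")
print(f"Z(OR composite) = {z_or:.2f}")
print(f"Delta Z = {delta_z(z_or, z_edss):+.2f}  (positive = weaker signal)")
```

With these made-up inputs, the composite has a smaller standard error (as more events would give it) but a hazard ratio closer to 1, so its Z is smaller in magnitude (about -2.13 versus -3.60) and ΔZ comes out positive, at about +1.47.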

Event rates varied widely by component: the timed walk test produced the highest rates (up to 46.8%) and the nine-hole peg test the lowest (as low as 2.1%). Safety data, including adverse events and serious adverse events, were not reported. The authors stress that a higher event rate does not make a better endpoint: the extra events captured by the timed walk test add noise that dilutes the treatment signal rather than amplifying it.
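Why can more events produce a weaker signal? A standard back-of-the-envelope tool is Schoenfeld's approximation for a 1:1 randomised trial: the standard error of the log-hazard ratio is roughly 2/√D, where D is the number of events, so the expected Z is about log(HR)·√D/2. The sketch below uses that textbook approximation, not the study's pooled Cox model, and its event counts and hazard ratios are hypothetical.

```python
import math


def approx_se_log_hr(events: int) -> float:
    """Schoenfeld approximation of SE(log HR) for 1:1 allocation."""
    return 2.0 / math.sqrt(events)


def approx_z(hazard_ratio: float, events: int) -> float:
    """Expected Z-score: log(HR) divided by its approximate standard error."""
    return math.log(hazard_ratio) / approx_se_log_hr(events)


# Hypothetical scenarios (not study data):
# a narrower endpoint with fewer events but a clear effect, and a broader
# endpoint that doubles the events while its hazard ratio drifts toward 1.
narrow = approx_z(hazard_ratio=0.75, events=400)
broad = approx_z(hazard_ratio=0.88, events=800)

print(f"Fewer events, HR 0.75: Z = {narrow:.2f}")
print(f"Twice the events, HR 0.88: Z = {broad:.2f}")
```

Doubling the events shrinks the standard error by a factor of √2, but that cannot make up for the hazard ratio moving from 0.75 to 0.88, so |Z| falls from about 2.88 to about 1.81.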

Combining a global disability measure with an upper limb measure is a promising direction for endpoint development in progressive multiple sclerosis trials, but it still needs validation. For now, the evidence does not support composite endpoints as a systematic improvement over the EDSS.

Imagine trying to hear a whisper in a noisy room. Researchers face a similar challenge when they try to measure whether a multiple sclerosis treatment is working. They want to know if a new drug stops the disease from getting worse. But sometimes, the tools they use to measure the disease make the good news harder to find.

For years, doctors have used a specific score to track how a patient's condition changes. It is called the Expanded Disability Status Scale. It rates disability from 0 to 10 using a neurological examination and, in its middle and upper range, how far a person can walk. It has been the gold standard for decades.

But here is the problem. This single score does not capture every aspect of neurological decline, and it is not very sensitive over the short time span of a clinical trial. To fix this, researchers proposed adding more tests. They believed that combining different measurements would create a clearer picture.

The Noise Problem

The new idea was to mix the standard score with other tests. One common addition is the timed 25-foot walk, in which patients walk about 7.6 metres as quickly as they safely can. Another, the nine-hole peg test, times how quickly a person can place nine small pegs into holes on a board and then remove them.
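In the study, these measures were combined in three ways: logical unions (OR-type, where worsening on any component counts), intersections (AND-type, where every component must worsen), and majority-vote structures. A minimal sketch of those rules follows, assuming each component has already been reduced to a yes/no flag for confirmed worsening in a given patient. The function and field names are hypothetical, and the real analysis worked with event times in survival models rather than simple flags.

```python
from typing import Mapping, Sequence

# Per-patient confirmed-worsening flags keyed by component (hypothetical names).
Flags = Mapping[str, bool]


def or_composite(flags: Flags, components: Sequence[str]) -> bool:
    """Logical union: an event if any listed component worsened."""
    return any(flags[c] for c in components)


def and_composite(flags: Flags, components: Sequence[str]) -> bool:
    """Logical intersection: an event only if every listed component worsened."""
    return all(flags[c] for c in components)


def majority_composite(flags: Flags, components: Sequence[str]) -> bool:
    """Majority vote: an event if more than half of the components worsened."""
    worsened = sum(flags[c] for c in components)
    return worsened * 2 > len(components)


# Hypothetical patient: walking speed worsened, EDSS and hand function did not.
patient = {"edss": False, "t25fw": True, "nhpt": False}
all_three = ["edss", "t25fw", "nhpt"]

print(or_composite(patient, all_three))        # True: one component is enough
print(and_composite(patient, all_three))       # False: needs all three
print(majority_composite(patient, all_three))  # False: 1 of 3 is not a majority
```

The example shows how an OR-type composite turns a walk-test change on its own into a disability-worsening event, which is how a frequently triggered component can dominate the event count.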

The goal was simple. If a drug works, it should show up in all these different ways. But the data told a different story. When researchers added the walk test to the main score, the combined measure detected the drug's effect less clearly than the main score alone.

Think of the walk test as a loud fan in the room. It creates a lot of noise. Even when a drug is working, the noise from the walk test can drown out the quiet signal of improvement. The overall score becomes less sensitive to the treatment.

Why The Walk Test Fails

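This happens because the walk test flags worsening far more often than the other measures, with event rates as high as 46.8%. Walking speed can change from one visit to the next for reasons that have little to do with the disease; a patient may simply be tired that day. Many of these extra events are noise rather than true worsening.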

When you mix this noisy data with the standard score, you dilute the true effect of the medicine. The math shows that the combined score actually makes it harder to prove a drug works. The study looked at thousands of patients across ten major trials. The result was clear: adding the walk test did not help.
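The dilution argument can be made concrete with a toy calculation. Suppose, as a simplifying assumption, that the noisy component triggers "worsening" at the same rate in both arms and independently of true EDSS worsening. An OR-type composite then counts an event when either happens, and the treated-to-control risk ratio is pulled toward 1 (no effect). All numbers are hypothetical, and risks over a fixed window stand in for the hazards the study actually modelled.

```python
def or_risk(p_true: float, p_noise: float) -> float:
    """Risk of an OR-composite event when the two components are independent."""
    return 1.0 - (1.0 - p_true) * (1.0 - p_noise)


def risk_ratio(p_treated: float, p_control: float) -> float:
    """Treated-arm risk divided by control-arm risk (1.0 means no effect)."""
    return p_treated / p_control


# Hypothetical risks of true (EDSS) worsening over the trial window.
control_edss, treated_edss = 0.20, 0.15
# Hypothetical noise: same rate in both arms, unaffected by the drug.
noise = 0.30

rr_edss = risk_ratio(treated_edss, control_edss)
rr_or = risk_ratio(or_risk(treated_edss, noise), or_risk(control_edss, noise))

print(f"EDSS alone:   risk ratio {rr_edss:.2f}")
print(f"OR composite: risk ratio {rr_or:.2f}")
```

Here the composite more than doubles the control-arm event risk (44% versus 20%), yet the risk ratio moves from 0.75 to about 0.92, so the drug looks much weaker even though its effect on true worsening is unchanged.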

A Better Tool For Some

Not all tests are bad news. The study found that the peg test is different. This test focuses on the hands and fingers. It measures fine motor skills that the walk test misses.

In progressive multiple sclerosis, hand function carries information that the standard score does not fully capture. When combined with the standard score, the peg test made the treatment effect look somewhat stronger than the standard score alone, although the difference fell just short of statistical significance (p = 0.051).

However, this only held for the progressive type of the disease. In trials of the more common relapsing type, the walk test still added noise, and no combination of tests outperformed the standard score alone.

The team analyzed data from nearly 9,400 people across ten trials. For each combination of tests, they estimated the treatment effect with survival models (Cox proportional hazards models) pooled across the trials, then compared how clearly that effect stood out against the same measurement using the standard score alone.

The numbers showed a surprising trend. Combinations that included the walk test weakened the evidence for the drugs the most. In the worst cases, the combined score substantially reduced the apparent benefit of a drug. That matters for patients waiting for new treatments.

If a trial fails because it uses a poor measure, patients may miss out on useful treatments. Scientists need to know which tools give the most honest picture. This study shows that not all extra tests are helpful. Sometimes, less is more.

So, what does this mean for the future of research? It means scientists need to be very careful about which tests they choose. They should not just add more tests hoping for better results. Quality matters more than quantity.

For trials focusing on progressive multiple sclerosis, the combination of the standard score and the peg test looks promising. This mix might help show whether a drug is slowing the disease. But in relapsing multiple sclerosis, no combination beat the standard score on its own.

What Happens Next

This discovery does not mean we stop testing patients. It means we must choose our tests wisely. Future trials will need to validate these new combinations before using them widely.

Researchers are already looking at other ways to measure disease activity. Maybe new imaging tools or blood tests will provide the clarity we need. Until then, the lesson is simple: do not let noisy data hide the truth about a treatment. Patients deserve the clearest possible view of their options.

Study Details

Study type: Post-hoc analysis of RCT data
Sample size: n = 9,369
Evidence: Level 2
Published: Apr 2026
Original Abstract
Disability worsening is the critical long-term outcome in multiple sclerosis, yet the Expanded Disability Status Scale incompletely captures neurological deterioration and has limited sensitivity in the short time windows of clinical trials. Composite endpoints incorporating functional measures have been proposed to address these limitations, but whether they reliably improve detection of treatment effects has not been established across trials. We conducted a post-hoc analysis of individual patient data from ten phase III randomised controlled trials (ASCEND, BRAVO, CONFIRM, DEFINE, EXPAND, INFORMS, OLYMPUS, OPERA I/II, and ORATORIO; n = 9,369), spanning relapsing-remitting and progressive multiple sclerosis. Confirmed disability worsening was defined using harmonised criteria with the msprog package and confirmed at 24 weeks. Treatment effects were estimated using Cox proportional hazards models and combined across trials in a one-stage individual patient data framework. Composite endpoints were constructed from the Expanded Disability Status Scale, the timed 25-foot walk test, and the nine-hole peg test using logical unions (OR-type), intersections (AND-type), and majority-vote structures. Sensitivity to treatment effect was quantified using Z-scores (the ratio of the pooled log-hazard ratio to its standard error) and compared to the Expanded Disability Status Scale reference using interaction tests. Event rates varied across components: the timed walk test generated the highest rates (up to 46.8%) while the nine-hole peg test generated the lowest (as low as 2.1%). OR-type composite endpoints showed weaker treatment effects than the Expanded Disability Status Scale alone, with the largest reductions in sensitivity observed for endpoints incorporating the timed walk test (ΔZ up to +2.26; interaction p = 0.004).
These findings were confirmed across disease subtypes and were pronounced in relapsing-remitting trials, where no composite endpoint outperformed the Expanded Disability Status Scale. In progressive multiple sclerosis, the combination of the Expanded Disability Status Scale and the nine-hole peg test showed numerically stronger treatment effects (ΔZ = -1.65), though interaction tests did not reach statistical significance (p = 0.051). Composite endpoints do not systematically improve treatment effect detection in multiple sclerosis trials. Increased event capture driven by the timed walk test introduces noise that dilutes the treatment signal rather than amplifying it, highlighting that event rate and endpoint quality are not interchangeable. Upper limb function assessed by the nine-hole peg test provides complementary and specific information, particularly in progressive disease. The combination of global disability and upper limb measures represents a promising direction for future endpoint development in progressive multiple sclerosis trials, warranting validation.