Publications by authors named "Franz Rothlauf"

Background: Urothelial Bladder Cancer (UBC) is a common cancer with a high risk of recurrence, which is influenced by the TNM classification, grading, age, and other factors. Recent studies show that Machine Learning (ML) algorithms predict recurrence reliably and accurately, and even outperform traditional approaches. However, most ML algorithms cannot process categorical input features, which must first be encoded into numerical values.
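
Encoding categorical features might look like the following Python sketch. It shows two widespread schemes: one-hot encoding for unordered categories and ordinal encoding for ordered ones such as tumor grades. The function names and example values are hypothetical, and this is not necessarily the encoding used or compared in the publication.

```python
from typing import List, Sequence, Tuple


def one_hot_encode(values: Sequence[str]) -> Tuple[List[str], List[List[int]]]:
    """Turn each categorical value into a 0/1 indicator vector.

    Categories are sorted so that the column order is deterministic.
    """
    categories = sorted(set(values))
    column = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[column[value]] = 1
        rows.append(row)
    return categories, rows


def ordinal_encode(values: Sequence[str], order: Sequence[str]) -> List[int]:
    """Map each value to its position in an explicit ordering (e.g. G1 < G2 < G3)."""
    rank = {cat: i for i, cat in enumerate(order)}
    return [rank[value] for value in values]


if __name__ == "__main__":
    # Hypothetical T categories for four patients
    columns, encoded = one_hot_encode(["pTa", "pT1", "pTa", "pT2"])
    print(columns)  # ['pT1', 'pT2', 'pTa']
    print(encoded)  # [[0, 0, 1], [1, 0, 0], [0, 0, 1], [0, 1, 0]]
    print(ordinal_encode(["G1", "G3", "G2"], order=["G1", "G2", "G3"]))  # [0, 2, 1]
```

One-hot encoding avoids imposing an order on the categories at the cost of one column per category; ordinal encoding keeps a single column but implicitly treats neighboring categories as equally spaced.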

Missing values (NA) occur frequently in cancer research, for reasons such as data protection, data loss, or missing follow-up data. Such incomplete patient information can affect prediction models and other data analyses. Imputation methods are one tool for dealing with NA.
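
The simplest imputation methods replace each missing value with a summary of the observed values in the same column. The Python sketch below shows mean imputation for a numeric feature and mode imputation for a categorical one, with None standing in for NA; it only illustrates the general idea and is not one of the specific methods evaluated in the publication.

```python
from collections import Counter
from statistics import mean
from typing import List, Optional, Sequence


def impute_mean(values: Sequence[Optional[float]]) -> List[float]:
    """Replace missing numeric values (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column without observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def impute_mode(values: Sequence[Optional[str]]) -> List[str]:
    """Replace missing categorical values (None) with the most frequent observed value.

    Ties are broken by first occurrence, as Counter.most_common does.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column without observed values")
    fill = Counter(observed).most_common(1)[0][0]
    return [fill if v is None else v for v in values]


if __name__ == "__main__":
    print(impute_mean([60.0, None, 70.0]))        # [60.0, 65.0, 70.0]
    print(impute_mode(["G2", None, "G2", "G3"]))  # ['G2', 'G2', 'G2', 'G3']
```

Single-value imputation like this ignores the uncertainty about the missing entries and can distort variances and correlations, which is one reason more elaborate approaches (for example model-based or multiple imputation) are often preferred.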

Background: Urothelial bladder cancer (UBC) is characterized by a high recurrence rate, which is predicted using scoring systems. However, recent studies show that Machine Learning (ML) models are superior to these scoring systems. Nevertheless, these ML approaches are rarely used in medical practice because most of them are black-box models that cannot adequately explain how a prediction is made.

Background: Cancer registries link a large number of electronic health records reported by medical institutions to already registered records of the matching individual and tumor. Records are automatically linked using deterministic and probabilistic approaches; machine learning is rarely used. Records that cannot be matched automatically with sufficient accuracy are typically processed manually.
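
The difference between deterministic and probabilistic linkage, and the fallback to manual processing, can be sketched as follows. The probabilistic part uses Fellegi-Sunter style match weights; the field names, m/u probabilities, and thresholds are invented for illustration, and the snippet does not represent the machine learning approach studied in the publication.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

Record = Dict[str, Optional[str]]

# Hypothetical m/u probabilities per field:
#   m = P(field agrees | both records describe the same person)
#   u = P(field agrees | the records describe different persons)
FIELD_PARAMS: Dict[str, Tuple[float, float]] = {
    "last_name": (0.95, 0.01),
    "first_name": (0.90, 0.05),
    "birth_date": (0.98, 0.001),
    "zip_code": (0.85, 0.10),
}


def deterministic_match(a: Record, b: Record,
                        keys: Sequence[str] = ("last_name", "first_name", "birth_date")) -> bool:
    """Link only if every key field is present and identical in both records."""
    return all(a.get(k) is not None and a.get(k) == b.get(k) for k in keys)


def match_weight(a: Record, b: Record) -> float:
    """Sum of log2 agreement/disagreement weights over all compared fields."""
    total = 0.0
    for field, (m, u) in FIELD_PARAMS.items():
        va, vb = a.get(field), b.get(field)
        if va is None or vb is None:
            continue  # a missing value contributes no evidence either way
        if va == vb:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def classify(a: Record, b: Record, upper: float = 10.0, lower: float = 0.0) -> str:
    """Accept at or above upper, reject at or below lower, send the rest to manual review."""
    w = match_weight(a, b)
    if w >= upper:
        return "match"
    if w <= lower:
        return "non-match"
    return "manual review"


if __name__ == "__main__":
    a = {"last_name": "Meyer", "first_name": "Anna",
         "birth_date": "1950-03-01", "zip_code": "55122"}
    b = dict(a, last_name="Maier", first_name="Hanna")  # spelling variants
    print(deterministic_match(a, b))  # False
    print(classify(a, b))             # manual review
```

The deterministic rule rejects the pair outright, while the probabilistic score lands between the two thresholds, mirroring how uncertain pairs end up in manual processing. In practice the m/u values are estimated from data (for example with an EM algorithm) and string comparators are used to tolerate typos.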

Genetic Programming (GP) often uses large training sets and requires all individuals to be evaluated on all training cases during selection. Random down-sampled lexicase selection evaluates individuals on only a random subset of the training cases, allowing for more individuals to be explored with the same number of program executions. However, sampling randomly can exclude important cases from the down-sample for a number of generations, while cases that measure the same behavior (synonymous cases) may be overused.
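
Random down-sampled lexicase selection, as described above, can be sketched in Python as follows. Each call draws a fresh random subset of the training cases, executes every program only on that subset, and then runs standard lexicase selection on the resulting error matrix. The parameter names are illustrative assumptions; this is the random baseline the abstract discusses, not the publication's proposed alternative.

```python
import random
from typing import Callable, List, Sequence, TypeVar

Program = TypeVar("Program")
Case = TypeVar("Case")


def lexicase_select(errors: Sequence[Sequence[float]], rng: random.Random) -> int:
    """Pick one individual index; errors[i][j] is individual i's error on case j."""
    candidates = list(range(len(errors)))
    order = list(range(len(errors[0])))
    rng.shuffle(order)  # a fresh random case ordering for every selection event
    for j in order:
        best = min(errors[i][j] for i in candidates)
        candidates = [i for i in candidates if errors[i][j] == best]
        if len(candidates) == 1:
            break
    return rng.choice(candidates)


def down_sampled_lexicase(population: Sequence[Program], cases: Sequence[Case],
                          error_fn: Callable[[Program, Case], float],
                          rate: float, num_parents: int,
                          rng: random.Random) -> List[Program]:
    """Select parents using only a random subset of the training cases.

    Intended to be called once per generation, so each generation gets a new sample.
    """
    k = max(1, round(rate * len(cases)))
    sample = rng.sample(list(cases), k)
    # Programs are executed only on the sampled cases, saving evaluations.
    errors = [[error_fn(p, c) for c in sample] for p in population]
    chosen = [lexicase_select(errors, rng) for _ in range(num_parents)]
    return [population[i] for i in chosen]
```

With rate = 0.1, each generation needs roughly a tenth of the program executions of full lexicase selection. Because the sample is uniform, a rare but important case can be missing for several generations while near-duplicate (synonymous) cases may be drawn repeatedly, which is exactly the weakness the abstract points out.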

Background: Cancer registries collect patient-specific information about cancer diseases. The collected information is verified and made available to clinical researchers, physicians, and patients. When processing information, cancer registries verify that the patient-specific records they collect are plausible.
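
Rule-based plausibility checks of this kind can be expressed as simple predicates over a record. The Python sketch below assumes a minimal, hypothetical record layout and a handful of illustrative rules; registries typically apply much larger rule sets, and this is not the method evaluated in the publication.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class TumorRecord:
    """Hypothetical, heavily simplified registry record."""
    birth_date: date
    diagnosis_date: date
    sex: str                      # "M", "F", or "U" (unknown)
    topography: str               # ICD-O-3 topography code, e.g. "C67.9"
    death_date: Optional[date] = None


def plausibility_errors(rec: TumorRecord, today: date) -> List[str]:
    """Return the violated rules; an empty list means no rule fired."""
    errors = []
    if rec.diagnosis_date < rec.birth_date:
        errors.append("diagnosis date before birth date")
    if rec.diagnosis_date > today:
        errors.append("diagnosis date in the future")
    if rec.death_date is not None and rec.death_date < rec.diagnosis_date:
        errors.append("death date before diagnosis date")
    # Sex-specific site: prostate (C61) is implausible for a female patient.
    if rec.topography.startswith("C61") and rec.sex == "F":
        errors.append("prostate topography for a female patient")
    return errors


if __name__ == "__main__":
    rec = TumorRecord(date(1950, 3, 1), date(2020, 5, 4), "M", "C67.9")
    print(plausibility_errors(rec, today=date(2024, 1, 1)))  # []
```

Records that trigger at least one rule would then be routed to a human reviewer rather than being rejected automatically.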

Linear Genetic Programming (LGP) represents programs as sequences of instructions and has a Directed Acyclic Graph (DAG) dataflow. The results of instructions are stored in registers that can be used as arguments by other instructions. Instructions that are disconnected from the main part of the program are called noneffective instructions, or structural introns.
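
A common way to find structural introns is a single backward scan over the program that tracks which registers are still needed to compute the output. The Python sketch below assumes a minimal two-operand instruction format and a tiny operator set; it is an illustration of the idea, not the representation or procedure used in the publication. It also checks that removing the introns leaves the output unchanged.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Set

OPS: Dict[str, Callable[[float, float], float]] = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
}


@dataclass(frozen=True)
class Instruction:
    dest: int   # register written by this instruction
    op: str     # key into OPS
    src1: int   # first operand register
    src2: int   # second operand register


def effective_mask(program: Sequence[Instruction], outputs: Set[int]) -> List[bool]:
    """Mark instructions connected to the output registers in the dataflow graph."""
    needed = set(outputs)
    mask = [False] * len(program)
    for i in range(len(program) - 1, -1, -1):
        ins = program[i]
        if ins.dest in needed:
            mask[i] = True
            needed.discard(ins.dest)             # earlier writes to dest are overwritten here
            needed.update((ins.src1, ins.src2))  # but this instruction's operands are needed
    return mask


def execute(program: Sequence[Instruction], registers: Sequence[float]) -> List[float]:
    """Run the program sequentially and return the final register contents."""
    regs = list(registers)
    for ins in program:
        regs[ins.dest] = OPS[ins.op](regs[ins.src1], regs[ins.src2])
    return regs


if __name__ == "__main__":
    prog = [
        Instruction(1, "+", 2, 3),  # r1 = r2 + r3  (effective)
        Instruction(3, "*", 1, 1),  # r3 = r1 * r1  (intron: only read by a later intron)
        Instruction(0, "-", 1, 2),  # r0 = r1 - r2  (effective, writes the output r0)
        Instruction(2, "*", 0, 3),  # r2 = r0 * r3  (intron: r2 is never used for r0)
    ]
    mask = effective_mask(prog, outputs={0})
    print(mask)  # [True, False, True, False]
    stripped = [ins for ins, keep in zip(prog, mask) if keep]
    start = [0.0, 1.0, 2.0, 3.0]
    print(execute(prog, start)[0], execute(stripped, start)[0])  # 3.0 3.0
```

Note that this scan only detects structural introns; instructions that are connected to the output but have no effect on its value (semantic introns) are not removed.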

This paper discusses how the use of redundant representations influences the performance of genetic and evolutionary algorithms. Representations are redundant if the number of genotypes exceeds the number of phenotypes. A distinction is made between synonymously and non-synonymously redundant representations.
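
To make the definition concrete, the Python sketch below enumerates two redundant encodings of a single phenotype bit by three genotype bits (8 genotypes for 2 phenotypes). It uses connectivity under single-bit mutation as a rough, assumed proxy for synonymity: majority voting keeps the genotypes of each phenotype in one neighborhood, while parity scatters them so that every single-bit mutation changes the phenotype. Both mappings and the proxy are illustrative choices, not definitions taken from the paper.

```python
from collections import defaultdict
from itertools import product
from typing import Callable, Dict, List, Tuple

Genotype = Tuple[int, ...]


def majority(g: Genotype) -> int:
    """Phenotype bit = majority vote of the genotype bits."""
    return int(2 * sum(g) > len(g))


def parity(g: Genotype) -> int:
    """Phenotype bit = XOR of the genotype bits."""
    return sum(g) % 2


def genotype_classes(mapping: Callable[[Genotype], int], length: int) -> Dict[int, List[Genotype]]:
    """Group all binary genotypes of the given length by the phenotype they encode."""
    classes: Dict[int, List[Genotype]] = defaultdict(list)
    for g in product((0, 1), repeat=length):
        classes[mapping(g)].append(g)
    return dict(classes)


def connected_by_mutation(genotypes: List[Genotype]) -> bool:
    """True if the genotypes form one component under single-bit flips."""
    members = set(genotypes)
    seen = {genotypes[0]}
    stack = [genotypes[0]]
    while stack:
        g = stack.pop()
        for i in range(len(g)):
            neighbor = g[:i] + (1 - g[i],) + g[i + 1:]
            if neighbor in members and neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return len(seen) == len(members)


def redundancy_report(mapping: Callable[[Genotype], int], length: int) -> Dict[int, Tuple[int, bool]]:
    """Per phenotype: (number of genotypes, connected under single-bit mutation?)."""
    return {p: (len(gs), connected_by_mutation(gs))
            for p, gs in sorted(genotype_classes(mapping, length).items())}


if __name__ == "__main__":
    print(redundancy_report(majority, 3))  # {0: (4, True), 1: (4, True)}
    print(redundancy_report(parity, 3))    # {0: (4, False), 1: (4, False)}
```

Both mappings are equally redundant (four genotypes per phenotype), yet they differ sharply in whether small genotypic changes preserve the phenotype, which is the distinction between synonymous and non-synonymous redundancy.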

When genetic and evolutionary algorithms are used for network design, the choice of representation scheme for constructing the genotype is important for algorithm performance. One of the most common representation schemes for networks is the characteristic vector representation. However, when this representation is used to encode trees, crossover and mutation generate invalid individuals that are either under- or over-specified.
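
The characteristic vector representation assigns one bit to each of the n(n-1)/2 possible edges of the complete graph on n nodes; a 1 means the edge belongs to the network. The Python sketch below decodes such a vector and checks with union-find whether it is a valid spanning tree. The labels rest on an assumed reading of the abstract's terms: "under-specified" means some nodes are left unconnected, "over-specified" means the edges contain a cycle.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Edge = Tuple[int, int]


def candidate_edges(n: int) -> List[Edge]:
    """All n*(n-1)/2 possible edges; bit k of the genotype refers to edge k."""
    return list(combinations(range(n), 2))


def decode(bits: Sequence[int], n: int) -> List[Edge]:
    """Return the edges whose bit is set."""
    edges = candidate_edges(n)
    if len(bits) != len(edges):
        raise ValueError(f"expected {len(edges)} bits for {n} nodes, got {len(bits)}")
    return [e for e, b in zip(edges, bits) if b]


def classify_vector(bits: Sequence[int], n: int) -> str:
    """Label a characteristic vector as a valid tree or an invalid individual."""
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    has_cycle = False
    components = n
    for u, v in decode(bits, n):
        ru, rv = find(u), find(v)
        if ru == rv:
            has_cycle = True  # u and v were already connected
        else:
            parent[ru] = rv
            components -= 1
    disconnected = components > 1
    if has_cycle and disconnected:
        return "under- and over-specified"
    if has_cycle:
        return "over-specified"
    if disconnected:
        return "under-specified"
    return "valid"


if __name__ == "__main__":
    # Edge order for n = 4: (0,1) (0,2) (0,3) (1,2) (1,3) (2,3)
    print(classify_vector([1, 1, 1, 0, 0, 0], 4))  # valid (star centred on node 0)
    print(classify_vector([1, 1, 0, 0, 0, 0], 4))  # under-specified
    print(classify_vector([1, 1, 1, 1, 0, 0], 4))  # over-specified
    # Every single-bit mutation of a valid tree yields an invalid individual.
    star = [1, 1, 1, 0, 0, 0]
    flips = [star[:k] + [1 - star[k]] + star[k + 1:] for k in range(len(star))]
    print(all(classify_vector(f, 4) != "valid" for f in flips))  # True
```

Since a spanning tree has exactly n-1 edges, setting any additional bit creates a cycle and clearing any set bit disconnects the tree, so a single bit-flip mutation never turns one valid tree into another. Such offspring typically have to be repaired or penalized.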
